more marcotm's comments

marcotm · on March 22, 2023

Thank you very much! Filtering by location (and role) is on my todo list, but it is trickier than it seems at first. And I totally agree that the buttons are confusing. Actually, the "sort" button does sort the jobs. It sorts by semantic similarity to the job you selected (using the GPT text embedding). As for the buttons (and probably other parts of the site) not being accessible: I apologize. This shouldn't be an afterthought.

flanbiscuit · on March 22, 2023

No worries! Your MVP looks great! I'm just starting a job search so this came at the right time. Thank you!

marcotm · on March 22, 2023

Exactly. I am sure you can get similar results with some "traditional" NLP skills, but the good (bad?) thing is that they are not required when using one of the newer LLMs.

marcotm · on March 22, 2023

Thanks! Haha, you're right. There's already a kind of rule for that in the original prompt I'm using, but the "competitive" thing somehow still slips through. Will fix it in the next version.

marcotm · on March 22, 2023

Thanks for the suggestion! It's on my todo list. For now, you can at least sort jobs by similarity to a selected job. It's the middle icon to the left of each entry (maybe not the most intuitive way to do it, though).

navane · on March 22, 2023

wow, that is both very neat and hard to find

marcotm · on March 22, 2023

I wanted to share a little side project of mine that I created while tinkering around with GPT-3.

The project uses the Algolia HN Search API [1] to retrieve the "Who is hiring?" posts from HN and then parses them with the help of GPT-3 / GPT-3.5 (I do not have API access to GPT-4, yet, but it already works quite well even with the older models). It then puts the job postings into a structured list that is hopefully easier to skim than the original postings. There are some additional features like sorting jobs by semantic similarity (based on the text embeddings from OpenAI). Filtering, sorting and saving favorites is implemented client-side, so your data and preferences remain local to your browser.

Originally, this wasn't even meant to be a public product, but if people find it useful (and HN is fine with it), I'll try to keep it running. I've also written a short article about how the parsing works behind the scenes [2]. It's quite amazing how easy many of the classic NLP tasks have become with the newer LLMs.

Happy to answer any questions about the project!

[1] https://hn.algolia.com

[2] https://marcotm.com/articles/information-extraction-with-lar...
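
For readers wondering what "sorting jobs by semantic similarity" can look like in practice, here is a minimal sketch. It is not the site's actual code: the post only says the sort is based on OpenAI text embeddings. This sketch assumes cosine similarity as the measure, and the job records, field names, and three-dimensional vectors are made up for illustration (real embeddings have far more dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def sort_by_similarity(jobs, selected_id):
    """Return the jobs ordered from most to least similar to the selected one."""
    selected = next(job for job in jobs if job["id"] == selected_id)
    return sorted(
        jobs,
        key=lambda job: cosine_similarity(job["embedding"], selected["embedding"]),
        reverse=True,
    )


# Hypothetical toy data; the "embedding" values are invented for this sketch.
jobs = [
    {"id": "a", "title": "Backend engineer", "embedding": [1.0, 0.0, 0.0]},
    {"id": "b", "title": "Data scientist", "embedding": [0.0, 1.0, 0.0]},
    {"id": "c", "title": "Platform engineer", "embedding": [0.9, 0.1, 0.0]},
]

ranked = sort_by_similarity(jobs, "a")
print([job["id"] for job in ranked])
```

The selected job always ranks first, since it is identical to itself. Because the vectors are computed ahead of time, a sort like this can run entirely in the browser.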

shagie · on March 22, 2023

You can make the intermediate step a bit more structured too via https://github.com/HackerNews/API

For example, for the March one it is ID 34983767 (from the Algolia search or a "there's only so many of them, here's a list that I'll add to each month").

You can then get a list of all the top level comments at https://hacker-news.firebaseio.com/v0/item/34983767.json?pri...

And then pulling up a comment at https://hacker-news.firebaseio.com/v0/item/35255027.json?pri... to not have to parse any of its child comments or the HTML of the page.

(late edit: and re-reading the blog post while not trying to pay half attention to a meeting... that is what you are doing)
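
The two-step lookup described in this comment can be sketched roughly as follows. It assumes the item endpoint and the `kids` field documented in the HackerNews/API repository. The helper names are hypothetical, and the sample payload is a hand-written stand-in, not real API output.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id):
    """Build the URL of a single item (story or comment)."""
    return f"{BASE_URL}/item/{item_id}.json"


def top_level_comment_ids(item):
    """IDs of an item's direct replies; the 'kids' field is absent when there are none."""
    return item.get("kids", [])


def fetch_item(item_id):
    """Download and decode one item. Needs network access; not called below."""
    with urlopen(item_url(item_id)) as response:
        return json.load(response)


# Hand-written stand-in for the thread item, only to show the shape of the data.
sample_thread = {"id": 34983767, "type": "story", "kids": [35255027]}

for comment_id in top_level_comment_ids(sample_thread):
    print(item_url(comment_id))
```

Each comment is then one `fetch_item` call away, and its child comments can be ignored simply by not following its own `kids`.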

marcotm · on March 22, 2023

Thanks for mentioning the Firebase-based API. I knew it existed, but somehow I went with the Algolia API by default. I use their HN search quite a bit, so that's probably why I stuck with them. (no affiliation)

whinvik · on March 22, 2023

This is really nice. I have 1 nitpicky comment on the blog. The font used is jarring for me to read.

ta1243 · on March 22, 2023

It's like I've stepped into an episode of futurama!

number6 · on March 22, 2023

I tried a similar thing today, parsing unstructured text (client Excel documents) and turning it into JSON. I ran into the problem that the output format changed and sometimes the JSON wasn't parsable.

Thanks for your prompt. It gives me some pointers on how to improve mine.

marcotm · on March 22, 2023

You're welcome! For the chat model, it definitely helps to let it know that you want valid, parsable JSON (and nothing else). Otherwise it tends to get chatty. ;-) Depending on your use case, you might even ask it to fix the JSON if it's not parsable.

number6 · on March 22, 2023

I had the problem that it changed the layout of the JSON file: {"data": [...]} or {"products":[...]}.

In your first example, you told GPT what data structure you expected. I added this to my prompt, and now it produces the JSON data consistently.
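
On the receiving side, one way to guard against both problems in this sub-thread (unparsable output and a changing top-level key) is to validate the reply before using it. This is only a sketch under assumptions: the function name and the `products` key are illustrative, and what to do on failure (retry, or ask the model to fix its JSON) is left to the caller.

```python
import json


def parse_model_output(raw, expected_key):
    """Return the list stored under expected_key, or None if the reply is unusable."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not valid JSON, e.g. the model added chatty text
    if not isinstance(parsed, dict):
        return None  # valid JSON, but not an object
    value = parsed.get(expected_key)
    if not isinstance(value, list):
        return None  # wrong layout, e.g. {"data": [...]} instead of {"products": [...]}
    return value


good = parse_model_output('{"products": [{"name": "x"}]}', "products")
wrong_key = parse_model_output('{"data": []}', "products")
chatty = parse_model_output('Sure! Here is the JSON: {"products": []}', "products")
print(good, wrong_key, chatty)
```

A `None` result is the signal to retry or to send the output back with a request to fix it.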

avinassh · on March 22, 2023

Any plans for making this open source?

marcotm · on March 22, 2023

The core ideas for extracting the information with GPT are already available in the blog post linked above. Those are exactly the prompts I'm using. The rest is just a pretty simple Nuxt web application. So I'm not sure if open sourcing my mediocre frontend code would be of any value. Is there anything in particular you would be interested in?

marcotm · on Jan 28, 2022

There's a Kickstarter for startups. It's Kickstarter. But looking at the current project guidelines, I think you are right that Kickstarter is not (or does not want to be) the right platform for pure software / SaaS / etc. startups (which I guess you are thinking about). So the equivalent would be something like backers pay x amount of money to get a lifetime, one-year, whatever license of the final product (in contrast to shares like in crowd investing). Funds will only be paid to the project initiator if they get above their pre-defined threshold (similar to Kickstarter). I definitely think that this could be valuable, but I'm also not sure if that space is not already covered by existing companies. Seems relatively straightforward. Somewhat similar: Early access titles on Steam, etc.

marcotm · on March 10, 2019

+1. Due to the small size of our company, my role still is somewhere between lead engineer and a pure managing position, but it's enough of the management schedule that I can no longer be a reliable partner for working on the core product. So I try to automate/optimize internal workflows with my coding skills, i.e., I build tools that improve things if they exist but do not break or block things if they're delayed due to too many meetings creeping in.