Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Even simpler, you can convert each PDF page to a PNG and ask gpt4 to simply transcribe the image. In my experience it's extremely accurate, more so than Tesseract or classic OCR.


That would cost like 100x as much though.


Not really. An A4 page at 75ppi — aka what used to be the standard "Web export" back in the day — is 620x877, and 1,000 of those images costs about $2 with the current pricing for gpt4o. Assuming there are about 500 words per page on an A4-sized page, and that each word is 0.75 tokens, that's ~666k tokens for $2. Given that gpt4o is $2.50/million tokens of text, using it for OCR is break-even with Tesseract + LLM, and a lot more accurate — especially once tables or columns are involved.

It's honestly shocking how much gpt4o with vision has simplified things.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: