Thanks for this. I tried using Tesseract over the weekend to extract text from a...

danvk · on Jan 19, 2015

I wouldn't say that Ocropus is well documented (this blog post was partially intended to address that). But it's at least written in easily hackable Python, whereas Tesseract is 30-year-old C/C++.

joaomsa · on Jan 19, 2015

My main gripe with Tesseract is how convoluted and poorly documented the training procedure is, even though training is critical to getting better results. I'll be sure to check out Ocropus.

danvk · on Jan 19, 2015

You'll enjoy my follow-up post then, which talks about training: http://www.danvk.org/2015/01/11/training-an-ocropus-ocr-mode...