Thanks for this. I tried using Tesseract over the weekend to extract text from a game screenshot and had no luck. The documentation for Tesseract is rather opaque; maybe I'll have better luck with Ocropus.
I wouldn't say that Ocropus is well-documented (this blog post was partially intended to address that). But it's at least written in easily hackable Python, whereas Tesseract is 30-year-old C/C++.
My main gripe with Tesseract is how convoluted and poorly documented the training procedure is, even though training is critical to getting better results. I'll be sure to check out Ocropus.