readText crash

asked 2013-04-17 14:16:06 -0500

blz

updated 2013-04-17 16:24:33 -0500

Hello, I've been trying to do OCR on some toy data, but the readText function causes a fatal error.

Below is a reproduction of the problem, in the hope that it is useful. Note that the readText function did not initially work on my OS (the latest Ubuntu), so I followed the steps here to create a symlink to libjpeg.

Any advice would be extremely helpful.

Thanks in advance!

SimpleCV:1> import numpy as np
SimpleCV:2> img = Image(captchas[0]).scale(2.0)
SimpleCV:3> img = img.blur().threshold(190)
SimpleCV:4> blobs = img.findBlobs()
SimpleCV:5> stdev = np.std([b.area() for b in blobs])
SimpleCV:6> let = [b for b in blobs if b.area() > (.8 * stdev)]
SimpleCV:7> let.sort(key=lambda b: b.coordinates()[0])
SimpleCV:8> let[0].blobMask().invert().readText()

Empty page!!
Fatal Python error: (pygame parachute) Segmentation Fault
Aborted (core dumped)
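
For anyone following the filtering step in the session above, here is a minimal sketch in plain Python of what the area filter does. The areas are invented numbers, not measurements from my captchas, and `statistics.pstdev` stands in for `np.std` (both compute the population standard deviation by default).

```python
import statistics

# Hypothetical blob areas in pixels; invented for illustration only.
areas = [12, 15, 900, 1100, 950, 20, 1020]

# np.std defaults to the population standard deviation, as does pstdev.
stdev = statistics.pstdev(areas)

# Keep only blobs whose area exceeds 0.8 * stdev, mirroring the session's filter.
cutoff = 0.8 * stdev
kept = [a for a in areas if a > cutoff]

print(cutoff)
print(kept)
```

With these made-up areas the small specks fall below the cutoff and only the four letter-sized blobs remain. Note that the cutoff depends on the spread of the areas, so a different mix of noise and letters could let specks through or drop real letters.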

EDIT: Here are a few examples of what I'm working with. Some of these characters are indeed difficult for Tesseract to read: the J looks like an I, and the R is (strangely) also reported as an I.

@kscottz, it should be noted that running Tesseract from the command line does produce good output. I've managed to get reasonably good results with python-tesseract as well, but the API is a mess and the documentation is nonexistent. Perhaps the problem is that I'm running readText on the output of blobMask instead of on an image object?
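
One way to probe that theory without touching Tesseract would be to check whether the mask has any usable content before it goes to OCR, since "Empty page!!" suggests Tesseract found nothing to read. Below is a rough sketch in plain Python, assuming the mask has been converted to a nested list of pixel values. The function names and the thresholds are made up for illustration; none of this is SimpleCV API.

```python
def foreground_fraction(mask):
    """Return the fraction of non-zero pixels in a 2D list of pixel values."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    lit = sum(1 for row in mask for px in row if px)
    return float(lit) / total


def worth_ocr(mask, low=0.02, high=0.98):
    """Hypothetical guard: reject masks that are nearly all background or all foreground."""
    frac = foreground_fraction(mask)
    return low <= frac <= high


# A blank 10x10 mask, and one with a crude vertical stroke drawn in column 4.
blank = [[0] * 10 for _ in range(10)]
stroke = [[255 if x == 4 else 0 for x in range(10)] for y in range(10)]

print(worth_ocr(blank))
print(worth_ocr(stroke))
```

A guard like this would not explain the segmentation fault itself, but it would show whether the crash only happens when the mask is effectively empty.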

Any further advice would be useful!

@kscottz: please see my edit!

blz ( 2013-04-17 16:24:52 -0500 )

answered 2013-04-17 15:13:36 -0500

kscottz

I would also be hesitant to toss Tesseract a highly modified image unless you were certain that the input was indeed readable. Can we see your input image so we can replicate the error?
