PyTesser: a Python OCR library for recognizing text CAPTCHAs

PyTesser

PyTesser is an Optical Character Recognition module for Python. It takes as input an image or image file and outputs a string.

PyTesser uses the Tesseract OCR engine, converting images to an accepted format and calling the Tesseract executable as an external script. A Windows executable is provided along with the Python scripts. The scripts should work on other operating systems as well.
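
The "external script" approach described above can be sketched roughly as follows. This is not PyTesser's actual source. It is a minimal Python 3 illustration that assumes a `tesseract` executable is on the PATH and relies on the standard command form `tesseract <image> <output_base>`, which writes the result to `<output_base>.txt`. The helper names `build_tesseract_command` and `image_file_to_text` are hypothetical.

```python
import os
import subprocess
import tempfile


def build_tesseract_command(image_path, output_base, executable="tesseract"):
    """Return the argument list for `tesseract <image> <output_base>`."""
    return [executable, image_path, output_base]


def image_file_to_text(image_path, executable="tesseract"):
    """Run Tesseract on an image file and return the recognized text.

    Hypothetical helper for illustration; not part of PyTesser.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        output_base = os.path.join(tmp_dir, "ocr_result")
        command = build_tesseract_command(image_path, output_base, executable)
        # Raises CalledProcessError if Tesseract exits with a non-zero status.
        subprocess.run(command, check=True, capture_output=True)
        # Tesseract appends ".txt" to the output base name.
        with open(output_base + ".txt", encoding="utf-8") as result_file:
            return result_file.read().strip()
```

Calling `image_file_to_text('fnord.tif')` would then play roughly the same role as `image_file_to_string` in the usage example, provided Tesseract is installed and can read the image format.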

Dependencies

PIL is required to work with images in memory. PyTesser has been tested with Python 2.4 on Windows XP.

Usage Example
>>> from pytesser import *
>>> image = Image.open('fnord.tif')  # Open image object using PIL
>>> print image_to_string(image)     # Run tesseract.exe on image
fnord
>>> print image_file_to_string('fnord.tif')
fnord

(more examples in README)

PyTesser download:

http://code.google.com/p/pytesser/

Tesseract OCR engine download:

http://code.google.com/p/tesseract-ocr/

PIL official download:

http://www.pythonware.com/products/pil/

I gave it a quick try, and it works quite well on English text.

About 智足者富

http://chenpeng.info
