Linux.com: Google's Tesseract OCR Engine is a Quantum Leap Forward
Sep 29, 2006, 12:00
By Nathan Willis
"The open source optical character recognition (OCR) landscape
got dramatically better recently when Google released the Tesseract
OCR engine as open source software.
"The Tesseract code was written at Hewlett-Packard in the 1980s
and '90s. In 1995, it was one of the top-tier performers at UNLV's
OCR competition, but when HP withdrew from the OCR software
marketplace, the code languished. Then in 2005, HP handed off the
code to UNLV's Information Science Research Institute (ISRI), an
academic center doing ongoing research into OCR and related topics.
ISRI discovered that original Tesseract developer Ray Smith was now
an employee at Google, and asked the search engine giant if it was
interested in the code. Google spent a few months updating the code
to compile on modern operating systems, and released it on