---

Linux.com: Google’s Tesseract OCR Engine is a Quantum Leap Forward

“The open source optical character recognition (OCR) landscape
got dramatically better recently when Google released the Tesseract
OCR engine as open source software.

“The Tesseract code was written at Hewlett-Packard in the 1980s
and ’90s. In 1995, it was one of the top-tier performers at UNLV’s
OCR competition, but when HP withdrew from the OCR software
marketplace, the code languished. Then in 2005, HP handed off the
code to UNLV’s Information Science Research Institute (ISRI), an
academic center doing ongoing research into OCR and related topics.
ISRI discovered that original Tesseract developer Ray Smith was now
an employee at Google, and asked the search engine giant if it was
interested in the code. Google spent a few months updating the code
to compile on modern operating systems, and released it on
SourceForge.net…”
