Google Code Blog: Announcing Tesseract OCR
Sep 05, 2006, 15:45
By Luc Vincent
"We wanted to let you all know that a few months ago we quietly
released--or actually re-released--an Optical Character Recognition
(OCR) engine into open source. You might wonder why Google is
interested in OCR? In a nutshell, we are all about making
information available to users, and when this information is in a
paper document, OCR is the process by which we can convert the
pages of this document into text that can then be used for
indexing.
"This particular OCR engine, called Tesseract, was in fact not
originally developed at Google! It was developed at Hewlett Packard
Laboratories between 1985 and 1995..."