Optical and intelligent character recognition (OCR and ICR)

Optical character recognition (OCR) is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text.

Intelligent character recognition (ICR) is an advanced form of optical character recognition, often applied to handwriting recognition, in which the computer learns fonts and different styles of handwriting.

Today the platform for OCR has shifted from standalone PCs to web-based applications (cloud computing) and mobile devices.

A comparison of optical character recognition software is available on Wikipedia.

One of the best open source OCR engines available today is Tesseract. Tesseract was originally developed as proprietary software at Hewlett-Packard between 1985 and 1995. In 2005 it was released as open source by Hewlett-Packard and the University of Nevada, Las Vegas (UNLV). Tesseract development has been sponsored by Google since 2006. It is now released under the Apache License, Version 2.0.
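For readers who want to try Tesseract from a script, the following is a minimal sketch rather than an official interface. It assumes the tesseract command-line program is installed and on the PATH, and that the installed release accepts "stdout" as the output name (recent releases do). The helper names build_tesseract_command and recognize_text are hypothetical and exist only for this illustration.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build the argument list for the tesseract CLI (hypothetical helper).

    Using "stdout" as the output name asks tesseract to print the
    recognized text instead of writing it to a .txt file.
    """
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def recognize_text(image_path, lang="eng"):
    """Run tesseract on one image file and return the recognized text."""
    # Fail early with a clear message if the program is not installed.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,  # collect the recognized text from stdout
        text=True,            # decode output as str rather than bytes
        check=True,           # raise CalledProcessError if tesseract fails
    )
    return result.stdout
```

Calling recognize_text with the path of a scanned page would return its text as a string. The lang argument selects one of the installed language data files, with "eng" being the usual default.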

Several free OCR tools are available for desktop PCs and as online services. Most of them are based on Tesseract; GOCR is a separate engine:

FreeOCR, v3 (2010), by Ralph Richardson; based on Tesseract 2.04
GOCR
Tesseract-OCR
Free-OCR.com

