CuneiForm (software)


CuneiForm Cognitive OpenOCR is a freely distributed open-source OCR system developed by the Russian software company Cognitive Technologies.
CuneiForm OCR was developed by Cognitive Technologies as a commercial product in 1993. The system was bundled with popular scanners, multifunction printers (MFPs) and software sold in Russia and the rest of the world, including products from Corel (Corel Draw), Hewlett-Packard, Epson, Xerox, Samsung, Brother, Mustek, OKI, Canon and Olivetti.

In 2008 Cognitive Technologies released the program’s source code.

Features

CuneiForm converts electronic copies of paper documents and image files into an editable form while preserving the structure and fonts of the original document, in either automatic or semi-automatic mode. The system includes two components: one for processing single documents and one for batch processing.
The list of languages supported by the system:
In addition, the system supports mixed Russian and English text. Recognition of other language mixtures is supported only in a branch developed by Andrei Borovsky in 2009. Training the system to recognize other languages is difficult because each language depends on a .dat file whose structure and development method have not been disclosed by the developers.

History

1993 - Cognitive Technologies signed an OEM contract with Corel, under the terms of which the Cognitive recognition library was embedded in the Corel Draw 3.0 package, which was popular in the publishing industry.
1994 - Contract with Hewlett-Packard to supply CuneiForm OCR with all HP scanners imported into Russia. This was the first HP contract with a Russian software company.
1995 - Contract with the Japanese corporation Epson to supply its scanners with CuneiForm OCR. An OEM contract was also signed with Brother Corporation, one of the world's largest manufacturers of fax machines, laser printers, scanners and other office equipment. Under the agreement, the new Brother IC-150 roller scanner was shipped worldwide with Cognitive software for scanning and recognition.
1996 - OEM agreement with Samsung Information Systems America, one of the world's largest manufacturers of monitors, fax machines, laser printers, MFPs and other office equipment. Under the agreement, the new Samsung OFFICE MASTER OML-8630A multifunction device was to be shipped worldwide with the Cognitive Cuneiform LE optical character recognition system.
Adaptive Recognition - a method that combines two types of printed character recognition algorithms: multifont and omnifont. The system generates an internal font for each input document from its well-printed characters, dynamically adjusting to the specific input symbols. The method thus combines the universality and technological efficiency of the omnifont approach with the high accuracy of font-specific (multifont) recognition, which dramatically improves the recognition rate.
1997 - First use of neural network-based technologies in CuneiForm. The neural network character recognition algorithms work as follows: the image of the character to be recognized is scaled to a fixed standard size. The luminance values of the normalized image are used as the inputs of the neural network. The number of network outputs equals the number of recognizable characters. The recognition result is the symbol that corresponds to the maximum value in the network's output vector.
1999
2001 - OEM contract with Canon to equip its scanners and multifunction devices for Eastern Europe with Cognitive Technologies' CuneiForm OCR software.
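
The neural network recognition scheme described in the 1997 entry (normalize the character image, feed its luminance values to the network, pick the symbol with the largest output) can be illustrated with a minimal Python sketch. This is not CuneiForm's actual code: the function names, the nearest-neighbour resampling, the 8x8 default size and the single dense layer standing in for the network are all assumptions made for illustration, and the weights would have to come from training that is not shown here.

```python
from typing import List, Sequence

Image = List[List[float]]  # rows of luminance values, e.g. in [0.0, 1.0]


def normalize(image: Image, size: int = 8) -> Image:
    """Resample a character image to size x size (nearest-neighbour; an assumption)."""
    rows, cols = len(image), len(image[0])
    return [
        [image[r * rows // size][c * cols // size] for c in range(size)]
        for r in range(size)
    ]


def forward(inputs: Sequence[float],
            weights: Sequence[Sequence[float]],
            biases: Sequence[float]) -> List[float]:
    """Hypothetical one-layer network: one output score per recognizable character."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def recognize(image: Image,
              alphabet: str,
              weights: Sequence[Sequence[float]],
              biases: Sequence[float],
              size: int = 8) -> str:
    """Return the symbol whose network output is the largest."""
    pattern = normalize(image, size)
    # The luminance values of the normalized pattern are the network inputs.
    inputs = [value for row in pattern for value in row]
    outputs = forward(inputs, weights, biases)
    if len(outputs) != len(alphabet):
        raise ValueError("the network needs one output per recognizable character")
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return alphabet[best]
```

With a two-symbol alphabet and a 2x2 normalized size, for example, `weights` would hold two rows of four values each, and `recognize` would return whichever of the two symbols scores higher for the given image.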

Development prospects