The history of optical character recognition (OCR) has its roots in telegraphy. Shortly before the start of World War I, physicist Emanuel Goldberg invented a machine that could read characters and convert them into telegraph code. In the 1920s, he went a step further and created the first electronic document retrieval system.
Companies were already microfilming their financial records, but at the time, quickly retrieving specific records from reels of film was nearly impossible. To overcome this, Goldberg combined a photoelectric cell with a movie projector to recognize patterns on the film. By reusing existing technologies, he took the first steps toward automating record keeping.
The US patent for his Statistical Machine was later purchased by IBM. Since then, OCR technology has proliferated, and companies around the world rely on it to reduce the overhead of converting paper documents into usable data.
How optical character recognition works
The first step of OCR is to scan the physical document. Once all the pages are scanned, the OCR software converts each one into a two-color, black-and-white version. The scanned image, or bitmap, is then analyzed for light and dark areas: dark areas are identified as characters to be recognized, and light areas as background.
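The light/dark split described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular OCR product does it: it assumes the scan is already a grayscale grid of intensities from 0 (black) to 255 (white) and uses a fixed, hypothetical threshold of 128, whereas real tools often pick the threshold adaptively.

```python
# Minimal binarization sketch. A grayscale scan is modeled as a list of rows of
# pixel intensities (0 = black, 255 = white). The fixed threshold of 128 is an
# illustrative assumption; real OCR tools often choose it adaptively.

def binarize(gray, threshold=128):
    """Return a bitmap where 1 marks a dark (character) pixel and 0 marks background."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def render(bitmap):
    """Draw a bitmap as text: '#' for dark pixels, '.' for background."""
    return "\n".join("".join("#" if bit else "." for bit in row) for row in bitmap)


if __name__ == "__main__":
    # Hypothetical 5x5 scan of a vertical stroke with light-gray smudges around it.
    scan = [
        [250, 240, 30, 245, 250],
        [248, 200, 25, 210, 252],
        [251, 235, 40, 190, 249],
        [247, 160, 35, 170, 250],
        [252, 245, 20, 240, 251],
    ]
    print(render(binarize(scan)))
```

Every pixel darker than the threshold is treated as part of a character, so the light-gray smudges fall away as background and only the dark vertical stroke remains.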
The dark areas are then processed further to find letters and digits. OCR programs vary in technique, but they generally work on one character, word, or block of text at a time. Characters are then identified using one of two main approaches:
Pattern recognition – OCR programs are given examples of text in various fonts and formats, which they compare against the scanned document to recognize its characters.
Feature detection – OCR programs apply rules about the characteristics of a specific letter or number to recognize characters in the scanned document.
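Working one character at a time and recognizing it by pattern can be sketched as follows. This is a toy illustration under stated assumptions: the page is already a black-and-white bitmap, characters in a line are separated by at least one blank column, and the 3x3 templates for "I", "L", and "T" are hypothetical stand-ins for the font examples a real program would be trained on. Each segmented glyph is assigned the template whose pixels agree with it most often.

```python
# Toy sketch: split a line into characters, then recognize each by template
# matching. The 3x3 templates are hypothetical toy data, not a real font.

TEMPLATES = {
    "I": ["###",
          ".#.",
          "###"],
    "L": ["#..",
          "#..",
          "###"],
    "T": ["###",
          ".#.",
          ".#."],
}


def to_bits(rows):
    """Convert a list of '#'/'.' strings into a 2-D list of 1/0 pixels."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def segment(bitmap):
    """Split a line of text into glyphs at columns that contain no dark pixels."""
    width = len(bitmap[0])
    glyphs, start = [], None
    for x in range(width + 1):
        has_ink = x < width and any(row[x] for row in bitmap)
        if has_ink and start is None:
            start = x  # a glyph begins
        elif not has_ink and start is not None:
            glyphs.append([row[start:x] for row in bitmap])  # a glyph ends
            start = None
    return glyphs


def similarity(a, b):
    """Count pixels that agree; assumes both bitmaps have the same shape."""
    return sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(glyph):
    """Return the template label whose pixels best agree with the glyph."""
    return max(TEMPLATES, key=lambda label: similarity(glyph, to_bits(TEMPLATES[label])))


if __name__ == "__main__":
    line = to_bits(["###.#...###",
                    ".#..#....#.",
                    "###.###..#."])
    print("".join(recognize(g) for g in segment(line)))  # prints: ILT
```

Counting agreeing pixels is the simplest possible comparison; it only works here because every glyph has the same size as the templates. Real systems normalize size and position first and use far larger sets of examples.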
Characteristics could include the number of angled lines, crossed lines, or curves in a character. For example, a capital letter A might be stored as two diagonal lines that meet at the top, with a horizontal line between them.
Once a character is identified, it is converted to a character code, such as ASCII, that computer systems can use for further processing. Users should then correct basic errors and check that complex layouts were handled correctly before saving the document for future use.
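Feature detection and the final conversion to ASCII can be sketched in the same spirit. This is a toy example, not the rule set of any real OCR engine: it assumes that stroke extraction has already happened, so each glyph arrives as hypothetical counts of diagonal, horizontal, vertical, curved, and crossing strokes, and the small rule table, including the letter A as two diagonals plus a crossbar, is illustrative only.

```python
# Toy feature-detection sketch. Stroke extraction from pixels is assumed to have
# already happened: each glyph arrives as hypothetical counts of strokes.
# The rule table is illustrative only, not taken from any real OCR engine.
from typing import Dict, Optional, Tuple

FEATURE_NAMES = ("diagonal", "horizontal", "vertical", "curve", "crossing")

# Each rule gives the exact counts, in FEATURE_NAMES order, for a capital letter.
RULES: Dict[str, Tuple[int, ...]] = {
    "A": (2, 1, 0, 0, 0),  # two diagonals meeting at the top, plus a crossbar
    "H": (0, 1, 2, 0, 0),  # two verticals joined by a horizontal
    "V": (2, 0, 0, 0, 0),  # two diagonals, no crossbar
    "X": (2, 0, 0, 0, 1),  # two diagonals that cross each other
    "O": (0, 0, 0, 1, 0),  # a single closed curve
}


def classify(features: Dict[str, int]) -> Optional[str]:
    """Return the letter whose rule matches the feature counts exactly, else None."""
    counts = tuple(features.get(name, 0) for name in FEATURE_NAMES)  # missing -> 0
    for letter, rule in RULES.items():
        if counts == rule:
            return letter
    return None


def to_ascii(letter: str) -> int:
    """Convert a recognized character to its ASCII code, e.g. 'A' -> 65."""
    return ord(letter)


if __name__ == "__main__":
    letter = classify({"diagonal": 2, "horizontal": 1})
    if letter is not None:
        print(letter, to_ascii(letter))  # prints: A 65
```

Returning `None` for an unmatched glyph, rather than guessing, mirrors why users still need to review OCR output: anything the rules cannot place has to be corrected by hand.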