| OCR
Schwab uses
OCR in conjunction with other translation programs, and notes that
the increased attention to detail in deciphering written text means
his software, OmniPage Pro 9.0, can handle unusual French and Spanish
letters. "They've tried to make it as user friendly as possible,"
he says, "where different accents, diacritical marks, aren't
as cumbersome to deal with."
Similarly, the latest versions of OCR software have cleared
another early hurdle: tables, charts, and other non-text marks
on scanned documents. When Schwab received the latest upgrade to
OmniPage, he tested its ability to reproduce charts by scanning
the economic tables in The London Economist. "I really wanted
to make the system squeak," he says. It managed to recognize
the small tables and eight-point font with only one error. "It
screwed up on one thing," recalls Schwab, "but when you're
getting a less than one percent screwup it becomes a joke."
One area that
still trips up most OCR programs is paper and printing quality.
A document with many wrinkles, a coffee stain, and correction marks
is probably going to be more prone to OCR error than a clean sheet
fresh off the printer. Says Miller, "If you mangle the page
the recognition is not going to be as accurate."
Fred F. Ross, author of OCR with a Smile, a guide to OCR scanning,
points out that some programs handle quality issues differently.
"One document may be a dot matrix document," says Ross,
"and there is definitely one software that will do better [with
that type of document] than the other."
A simple solution for addressing varied results is to run more than
one OCR program against a scanned document. Quickly comparing the
errors among the programs is an easy way to see which software
handles different types of scanning best. "For a small business
owner to go and buy two or three pieces of software is a very prudent
decision," says Ross. "In the long run they will save
multiples on labor in form of editing."
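
For readers who want to try the comparison Ross describes, here is a minimal Python sketch. It assumes each OCR program has already saved its result as plain text; the program names and sample strings are invented for illustration, and the word-by-word diff uses Python's standard difflib module rather than any feature of the OCR packages themselves.

```python
import difflib
from itertools import combinations


def disagreements(text_a: str, text_b: str) -> list[tuple[str, str]]:
    """Return (words_in_a, words_in_b) pairs where two OCR outputs differ."""
    a, b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


def compare_outputs(outputs: dict[str, str]) -> dict[tuple[str, str], list[tuple[str, str]]]:
    """Compare every pair of OCR outputs, keyed by (program, program)."""
    return {
        (n1, n2): disagreements(outputs[n1], outputs[n2])
        for n1, n2 in combinations(outputs, 2)
    }


if __name__ == "__main__":
    # Hypothetical outputs from three OCR programs for the same scanned line.
    sample = {
        "program_a": "the quick brown fox",
        "program_b": "the qu1ck brown fox",
        "program_c": "the quick brown fox",
    }
    for pair, diffs in compare_outputs(sample).items():
        print(pair, diffs)
```

The sketch only shows where the programs disagree; it does not decide which one is right. A word that two programs read the same way and a third reads differently is a good place for a human editor to look first.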
Ross points out that for big OCR projects, a few errors per page can
add up. "If you have a 50-page document and scan it on three
programs, you can cut down the error rate from 15 to seven errors
per page and save 400 errors on the document that would take you
two or three hours to fix on the other end," he says.
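
Ross's arithmetic can be checked with a short Python sketch. The function names are made up for illustration; the figures are the ones he quotes, and the per-correction time is simply what his two-to-three-hour estimate implies, not a measured value.

```python
def errors_saved(pages: int, errors_before: int, errors_after: int) -> int:
    """Total errors avoided across a document when the per-page rate drops."""
    return pages * (errors_before - errors_after)


def seconds_per_fix(hours: float, errors: int) -> float:
    """Average seconds spent per correction, given total editing hours."""
    return hours * 3600 / errors


if __name__ == "__main__":
    saved = errors_saved(pages=50, errors_before=15, errors_after=7)
    print(saved)  # 400
    # Two to three hours of editing for those errors works out to:
    print(seconds_per_fix(2, saved), seconds_per_fix(3, saved))  # 18.0 27.0
```

Eight fewer errors on each of 50 pages gives the 400 Ross mentions, which at his estimate comes to roughly 18 to 27 seconds per correction.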
|