Linux Users' Group of Davis
Page last updated:
2007 Apr 11 10:50

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

[vox-tech] OCR notes

Hi everyone,

I am about to embark on an exciting adventure into the land of optical 
character recognition (OCR), processing nearly 1,000 documents and extracting 
numbers from them. I am interested in any anecdotal wisdom regarding:

1. efficient scanning parameters:
color / BW / grayscale

2. pre-processing steps one might do with imagemagick

3. any filtering that one might do to get ready for the OCR

I plan to use Google's new OCR project, ocropus, which currently uses 
the 'tesseract' engine. Naive attempts to OCR these documents are giving 
marginal accuracy, so any help is appreciated. Vertical and horizontal lines 
on the original documents are confusing the OCR, so removing them might be a 
start. I have thought about extracting each 'cell' of data with imagemagick, 
and then running the resulting mini-images through the OCR... that might be a 
last resort though...
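
To make the line-removal and cell-extraction ideas concrete, here is a rough 
sketch in plain Python. It assumes the scan has already been loaded as rows of 
grayscale values (0 = black, 255 = white) and that the page is reasonably 
straight; the function names and the threshold values are hypothetical and 
untuned, not anything taken from ocropus, tesseract or imagemagick.

```python
def find_ruled_lines(image, dark_threshold=128, line_fraction=0.8):
    """Return (rows, cols): indices of pixel rows/columns that look like ruled lines.

    image is a list of equal-length rows of grayscale ints (0 = black, 255 = white).
    A row or column counts as a line when at least `line_fraction` of its
    pixels are darker than `dark_threshold`. Both defaults are guesses.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    rows = [y for y in range(height)
            if sum(p < dark_threshold for p in image[y]) >= line_fraction * width]
    cols = [x for x in range(width)
            if sum(image[y][x] < dark_threshold for y in range(height))
            >= line_fraction * height]
    return rows, cols


def remove_ruled_lines(image, white=255, **kwargs):
    """Return a copy of image with the detected line rows/columns painted white."""
    rows, cols = find_ruled_lines(image, **kwargs)
    cleaned = [list(row) for row in image]  # copy, so the input is left untouched
    for y in rows:
        cleaned[y] = [white] * len(cleaned[y])
    for x in cols:
        for row in cleaned:
            row[x] = white
    return cleaned


def spans_between(lines, size):
    """Return half-open (start, stop) index ranges not covered by any line.

    Applied to the row indices and then the column indices, this gives the
    vertical and horizontal extent of each table 'cell'.
    """
    spans, start = [], 0
    for i in sorted(lines) + [size]:
        if i > start:
            spans.append((start, i))
        start = i + 1
    return spans
```

Painting a whole row or column white will also clip any digit that touches a 
line, and a skewed scan will spread each line over several rows, so this is 
only a starting point for experimenting.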


Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis