Checking the OCR samples

Having just proof-read OCR samples from six county survey volumes (my first task on this project), I have been astonished at the sheer quality of the OCR output. Scanning technology has certainly come a long way since I first saw grainy scanned texts over 20 years ago. The samples come from early volumes, with all the attendant problems: certain fonts and characters were unavailable to the printers, and uneven inking on the page makes it hard to distinguish bold from unbolded text. The OCR technology seems to have dealt very well with these difficulties and has faithfully reproduced pretty much everything on the page – to the point where some things (such as the printers’ marks at the foot of some pages) will have to be removed. The main issues arising are macrons missing from place-name elements, the treatment of footnotes, and how addenda/corrigenda will be incorporated. These issues are unrelated to the OCR process and will require an editorial decision.
