[MGSA-L] Searchable Greek text - the first 600+ volumes

Chris Williams troianovagroup at hotmail.com
Sat Dec 21 09:59:24 PST 2013

This is of considerable significance: a searchable resource of Greek text drawn from the large and growing body of Greek literature scanned as PDFs; the archive now includes Dindorf’s Zonaras. Bruce Robertson and Federico Boschetti have applied sophisticated OCR (Optical Character Recognition) techniques to scanned volumes of literature available via archive.org. The resulting text is easily searchable online, and it can also be downloaded and searched in bulk using inexpensive, reliable indexing software such as dtSearch. It should now be possible to convert much of the out-of-copyright Greek text available online, classical or otherwise, for this purpose. They deserve our thanks!
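
For anyone who would rather try a quick search of downloaded text without dedicated indexing software, here is a minimal sketch in Python. It assumes the OCR output has been saved as plain UTF-8 text and read in as a list of lines; the function names fold_greek and search_lines are my own, hypothetical, and not part of the Lace site or of dtSearch. It strips accents and breathings before comparing, so an unaccented query still finds polytonic words.

```python
import unicodedata


def fold_greek(text: str) -> str:
    """Reduce polytonic Greek to a bare lowercase form for matching.

    Hypothetical helper: decomposes each character (NFD), drops the
    combining marks (accents, breathings, iota subscript), lowercases,
    and treats final sigma like ordinary sigma.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold().replace("ς", "σ")


def search_lines(lines, query):
    """Return (line_number, line) pairs whose folded text contains the folded query."""
    needle = fold_greek(query)
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if needle in fold_greek(line)
    ]


if __name__ == "__main__":
    sample = ["μῆνιν ἄειδε θεὰ", "ἄνδρα μοι ἔννεπε"]
    for number, line in search_lines(sample, "μηνιν"):
        print(number, line)
```

This is a plain substring match, so it makes no allowance for OCR misreadings or for word boundaries; proper indexing software will do much better on a large collection.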


Date: Thu, 12 Dec 2013 23:23:22 -0400
From: bruce.g.robertson at GMAIL.COM
Subject: [DIGITALCLASSICIST] Greek OCR: The First 600 Volumes

Dear all,  
I'd like to announce the result of last year's campaign to OCR a significant portion of the polytonic Greek volumes on archive.org using a specially allocated HPC environment from Compute Canada. 

At http://heml.mta.ca/lace you'll find page images for over 600 volumes, with corresponding OCR output and freely downloadable archives of all stages of processing. 

A selection of interesting volumes and ideas for their use appears at http://heml.mta.ca/lace/gallery

Yours sincerely,

Bruce Robertson, 
Head of Classics, Mount Allison University
