Digitization of full-text documents before publishing on the Internet: A case study reviewing the latest optical character recognition technologies
Library software review
McClean, Clare M
Abstract: This article reviews the strengths and weaknesses of a selection of the latest optical character recognition (OCR) software packages used to digitize the full content of paper documents before publishing on the Internet. The available digitization options and the key stages of the conversion process are outlined. The experience gained by Eurotext, a U.K.-based electronic libraries project, is documented. The Eurotext project has developed a collaborative, interuniversity resource bank to enhance access to learning materials relating to the European Union. The resource bank benefits both students and lecturers by giving them access, via the World Wide Web, to a single source of the key official documents in a wide range of subject areas. The knowledge acquired during the development of this service is of particular value to organizations intending to embark on a major digitization project.