Library Technology Guides

OCLC’s new Web Harvester captures Web content to add to digital collections

Press Release: OCLC [July 29, 2008]

Copyright (c) 2008 OCLC


DUBLIN, Ohio, July 29, 2008 - OCLC is now offering Web Harvester, a new product that allows libraries and other cultural heritage institutions to capture Web content and add it to their digital collections managed by OCLC’s CONTENTdm Digital Collection Management Software. OCLC’s Web Harvester addresses the need to store and provide access to otherwise highly transient information resources that exist solely on Web sites.

OCLC’s Web Harvester evolved from collaboration with several state libraries, state archives and universities over a period of seven years. Participants emphasized the increasing importance of collecting and managing Web-based content as information resources move online yet remain within libraries’ and archives’ collection scopes.

The Web Harvester is integrated into library workflows, allowing library staff to capture content as part of the cataloging process. The captured content is then sent to the organization’s digital collections where it can be managed with other CONTENTdm digital content.

"With the Web Harvester, we’re expanding our digital services to more fully support the full range of digital collection creation and management," said Greg Zick, Vice President, OCLC Digital Collection Services. "The Web offers vast amounts of important and valuable information resources, which libraries and cultural heritage institutions want to add to their collections and manage over time. The Web Harvester gives those organizations an additional solution for growing their digital collections with this important content."

Release of the Web Harvester demonstrates OCLC’s commitment to provide solutions for the entire digital life cycle, said Mr. Zick.

The Web Harvester is accessed via the Connexion client, OCLC’s powerful cataloging service, and captures content ranging from single Web-based documents to entire Web sites. Once the content is retrieved, users can review it and add it to a collection managed by OCLC’s CONTENTdm software, a complete solution for storing, managing and delivering a library’s digital collections to the Web. Once in CONTENTdm, the Web content can be accessed and managed in conjunction with other digital collections. Harvested items are discoverable from WorldCat.org, WorldCat Local and the CONTENTdm Web interface.

For additional security, master files of the captured content can also be ingested into the OCLC Digital Archive, the service for long-term storage of originals and master files from libraries’ digital collections.

The Web Harvester is an optional product for current Hosting users of CONTENTdm to expand their ability to collect, manage and provide access to digital content.

The Georgetown Law Library and the State Law Libraries of Maryland and Virginia have been using OCLC’s Web Harvester in a pilot project called The Chesapeake Project, a digital preservation program established to preserve and ensure permanent access to vital legal information currently available in digital formats on the World Wide Web.

"All three libraries participating in The Chesapeake Project are pleased with the project’s progress throughout its first year, and are enthusiastic about the prospect of continuing the project beyond its pilot phase," said Sarah Rhodes, Digital Preservation Librarian, Georgetown University Law Library, and member of the Chesapeake Project.

For more information about the Chesapeake Project, see the project Web site at http://cdm266901.cdmhost.com/.

Libraries or other cultural heritage institutions interested in more information about OCLC’s Web Harvester should send e-mail to digitalcollections@oclc.org.

About OCLC

Founded in 1967 and headquartered in Dublin, Ohio, OCLC is a nonprofit library service and research organization that has provided computer-based cataloging, reference, resource sharing, eContent, preservation, library management and Web services to 60,000 libraries in 112 countries and territories. OCLC and its member libraries worldwide have created and maintain WorldCat, the world’s richest online resource for finding library materials. For more information, visit www.oclc.org.

Publication Year: 2008
Type of Material: Press Release
Language: English
Issue: July 29, 2008
Publisher: OCLC
Company: OCLC
Products: CONTENTdm
Subject: Product announcements
Record Number: 13427
Last Update: 2012-12-29 14:06:47
Date Created: 2008-07-30 06:53:13