One of the fundamental issues facing libraries involves the long-term preservation of information. While this issue applies to all aspects of a library's collection, my main concern as a technologist is with materials in electronic form. We need to ensure that our collections will be available for all future generations, regardless of their format.
Materials printed on paper have proven to be truly long lasting. Manuscripts written hundreds, if not thousands, of years ago have survived to our generation. When we think of preserving digital information for posterity, the ideal continues to be eons, not mere decades.
As libraries, information centers, museums, and archives become involved in creating collections of digitized materials, enormous challenges emerge when it comes to guaranteeing permanent access. Digital technologies have only been around for the last few decades, and lack a track record of long-term preservation.
As I follow the recent trends and initiatives related to digital collections and digital preservation, one theme that continually arises involves the Open Archival Information System (OAIS) reference model. This framework describes a set of principles and methodologies involved in the creation of digital collections that can be trusted to provide permanent storage of and access to their materials. We'll take a quick look at OAIS later in the article.
The challenge of changing technologies
Technologies involved in the storage and presentation of digital information change continuously. Removable disks have evolved through many formats over the last 20 years. Remember the 5¼-inch double-density floppy disks that held 360KB of data? How many computers remain equipped with drives that can read them today? Even 3½-inch disk drives seem to be fading from the scene. Optical discs race through the same transitions. In my collection of old computer gear, I recently came across a 12-inch laser disc. Very few computers would be able to read the data on it today. CD-ROMs have been a mainstay for many years, but now DVD is gaining a foothold. Given the history of technology, we can only assume that any given method of storing information will be transient, and the life expectancy of any given format may even become shorter as time goes on.
Another problem revolves around the various file formats used to store information and the software used to create and view these formats. The way that text, images, sound, and video can be stored and accessed by computers has gone through many transitions. The software involved with creating and viewing digital information evolves rapidly, as do the hardware platforms and operating systems involved. Word processors, graphics programs, and video editing environments evolve quickly, and there are no guarantees that each new version will be backwards compatible with previous versions.
Digital media fall prey to physical deterioration. It is a common misconception that digital information, once recorded onto standard media, will last forever. Magnetic tape, optical discs, and hard disk drives all have limitations on how long their data will last if left unattended. Some CD-ROM manufacturers claim life expectancies of 200 years without data loss, but such assertions are disputed, and environmental conditions can definitely accelerate deterioration. Optical media generally offer longer life than magnetic, but neither approaches the ideal of permanence. Yet the real problem is obsolescence rather than deterioration. Even if the data were to stay intact on the media for 200 years, would there be playback equipment available? Given the rapid change in hardware already noted, it seems highly unlikely that compatible equipment would exist.
I don't intend to paint a negative picture regarding the long-term viability of digital information. My point is to show the challenges that must be met. Strategies, methodologies, and technologies for preservation are being developed and refined, and are a key component of any digital collection.
Basic methods for preserving digital content
The long-term preservation of digital material requires a commitment to an ongoing set of processes to move digitized materials through each generation of technology. A strict adherence to all appropriate standards will increase the likelihood that digital objects will migrate into the next generation with the least loss of content, greatest efficiency, and the lowest cost.
We know that there is no permanent and durable medium for storing digital information. This leads to the need to test and refresh information at appropriate intervals. Copying data onto fresh media can restart the life-expectancy clock. If, for example, we assume that a CD-Recordable disc will last for 20 years, burning a new copy every 10 years would be a reasonable process to follow to preserve the data, assuming that this medium continues to be viable.
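The arithmetic behind a refresh cycle like this one is simple enough to sketch in a few lines of Python. This is only an illustration: the function name, the `safety_factor` parameter (refresh at half the assumed life expectancy, as in the 20-year and 10-year figures above), and the planning horizon are my own assumptions, not part of any preservation standard.

```python
def refresh_years(first_year: int, life_expectancy_years: int,
                  horizon_years: int, safety_factor: float = 0.5) -> list[int]:
    """Return the calendar years in which a fresh copy should be made.

    Assumption: a copy is refreshed after a fixed fraction (safety_factor)
    of the medium's assumed life expectancy has elapsed.
    """
    if life_expectancy_years <= 0:
        raise ValueError("life expectancy must be positive")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in the range (0, 1]")

    # Refresh interval in whole years, never less than one year.
    interval = max(1, int(life_expectancy_years * safety_factor))

    # One refresh per interval, up to and including the planning horizon.
    return [first_year + offset
            for offset in range(interval, horizon_years + 1, interval)]


if __name__ == "__main__":
    # A disc written in 2002, assumed to last 20 years, planned over 50 years.
    print(refresh_years(2002, 20, 50))
```

The schedule only says when to copy. It says nothing about whether the target medium will still be a sensible choice at each of those dates, which is the harder question.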
New media options continually emerge, with each successive generation offering more capacity at a lower cost. Given this trend, it's likely that in most cases information will not be refreshed onto the same type of media, but will be moved onto the next generation. It wouldn't make sense, for example, to refresh data held on 3½-inch diskettes onto the same kind of media. Rather, it would be more practical to copy the data onto CD-ROM or DVD.
Many digitized collections reside on online storage systems of some sort. These devices offer instant access to information, are highly reliable, and can be built for very large capacities. Yet we do not expect any given online system to last forever. Storage systems typically last 3-5 years before they are replaced by faster and better ones, so cycles of data migration in the online storage arena can be very short. These systems are also volatile. It's important to follow strict data management and disaster planning procedures to ensure that system failures do not result in loss of information.
One of the most important strategies related to digital preservation involves taking advantage of open standards whenever available and appropriate, avoiding proprietary data formats when possible. Even though standards change and evolve, they are much more stable and reliable than proprietary formats that may be tied to the commercial success of a single vendor. Given a choice, for example, between digitizing video into MPEG, an international standard, and a proprietary format such as RealMedia or Windows Media, it makes sense to use the standards-based format. This often means creating a master copy in the standard format and deriving from it proprietary versions that can be used by a wider audience. In one project that I'm involved with, our strategy involves digitizing video into MPEG-2 format for preservation, and generating RealMedia files to be used for end-user access.
While these and other methods can be used as informal guidelines, the issue of digital preservation really demands a more structured and formal approach. To this end, a number of efforts have taken place in recent years that provide a structured framework for the design of an archive that meets the requirements of long-term preservation in addition to other areas of functionality. The culmination of these efforts is the Open Archival Information System, or OAIS.
OAIS: A comprehensive framework for digital preservation
The complexities involved with changing technologies and the need to create permanent digital collections have sparked a number of initiatives among some of the major players. Some of the important activities in this arena include the CEDARS project in the UK (http://www.leeds.ac.uk/cedars/); NEDLIB, a collaborative project of several European national libraries (http://www.kb.nl/coop/nedlib/); and the National Library of Australia's PADI initiative (http://www.nla.gov.au/padi/). OCLC and the Research Libraries Group (RLG) have been collaborating on this issue and have issued a report titled “Attributes of a Trusted Digital Repository: Meeting the Needs of Research Resources” (www.rlg.org/longterm/attributes01.pdf).
The most definitive model for creating long-lasting digital archives is known as OAIS, the Open Archival Information System. This model was developed by the Consultative Committee for Space Data Systems of NASA, and is under consideration for approval as an international standard. While developed for space data, the reference model is widely accepted as applicable to a very broad range of data types. The vocabulary and concepts specified in OAIS have become integrated into the language of the digital library community.
The official document that presents the OAIS reference model (www.ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf) is long (148 pages) and detailed, written in the terse, precise language typical of standards documents. I've found a couple of good summaries that present the framework in understandable language. These include Brian Lavoie's “Meeting the challenges of digital preservation: the OAIS reference model” (OCLC Newsletter, No. 243 www2.oclc.org/oclc/pdf/news243.pdf) and the last section of the OCLC/RLG report mentioned above.
The OAIS model places great emphasis on the long-term maintenance of digital collections. It recognizes that archives must deal with issues of changing technology and standards, taking into consideration the evolution of the computing environment. It offers strategies for dealing with the inevitable changes that will occur in operating systems, databases, and all the tools used to access digital information. Space doesn't allow me to describe the entire OAIS framework, but I do want to mention a few of its characteristics and some of its terminology. I encourage those involved in creating digital collections to take a closer look at this model, if they are not already familiar with it.
OAIS does not dwell on specific technologies, but rather focuses on the various relationships, concepts, and processes that apply to the overall problem of digital preservation. It is a comprehensive model or framework that addresses the issues related to designing and operating an archive that is independent of any specific hardware and software components that may happen to be current today.
Relationships defined in the model include producers, consumers, management, and the archive itself. The producers are the creators or owners of the content that the archive will preserve. Consumers make use of content in the archive. While there may be a set of general users, special consideration is given to the Designated Community: those who have a direct interest in and understanding of the content of the archive. Managers have authority over the archive and make decisions regarding its operation.
Many processes related to the creation and operation of an archive are outlined in the framework. Ingestion is the function of accepting information from producers and integrating it into the archive, taking any measures necessary to put it in the proper form for permanent storage. The Archival Storage function involves the actual storage of the material, including capabilities to accept new items through ingestion and to retrieve items as needed. This part of the model takes responsibility for the ongoing maintenance of the archived material, such as migration to new media, error checking, disaster recovery, and the like. The Administrative function manages the routine operation of the archive; the Data Management function deals with metadata that describe each item and performs requests typically associated with a database such as queries, result sets, and reports. The Access function allows consumers, especially those of the designated community, to find and use items held in the archive.
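The error checking mentioned under Archival Storage is often handled by recording a checksum for each file when it enters storage and recomputing it at intervals. The sketch below shows one way that might look in Python. OAIS does not prescribe SHA-256 or any particular procedure; the function names and the in-memory manifest are hypothetical choices made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(root: Path) -> dict[str, str]:
    """Build a manifest mapping each file's relative path to its checksum."""
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or no longer match."""
    problems = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(relative_path)
    return problems
```

In practice the manifest would be stored separately from the files it describes, and any path returned by `audit` would be restored from another copy.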
Another set of key concepts in OAIS involves the way that items are structured, or “packaged.” The basic Information Package involves the digital object and all its associated metadata. The Submission Information Package applies to the Ingestion process; the Archival Information Package is the form that the object and related metadata take after being placed into archival storage; and the Dissemination Information Package includes all the components necessary to provide access, including the digital object, metadata, and the current software tools appropriate for viewing the object.
The importance of the OAIS framework is that it provides a comprehensive and well-accepted methodology for creating digital collections in a way that ensures permanence. It may well be the case that not all the organizations that create digital collections will be able to address or implement every aspect of the complete system. It does, however, serve as a benchmark that can be used to assess current efforts at building digital archives and to guide the design of future ones.
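To make the three package types a little more concrete, here is a minimal Python sketch of how an object might move from submission to storage to access. The reference model describes these packages conceptually and does not define classes or field names, so everything here (the class layout, the `archive_id` and `viewer` fields, and the key normalization done at ingestion) is an assumption of mine for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmissionInformationPackage:
    content: bytes   # the digital object as delivered by the producer
    metadata: dict   # descriptive metadata supplied with it


@dataclass(frozen=True)
class ArchivalInformationPackage:
    archive_id: str  # hypothetical identifier assigned by the archive
    content: bytes
    metadata: dict


@dataclass(frozen=True)
class DisseminationInformationPackage:
    content: bytes
    metadata: dict
    viewer: str      # hypothetical name of a tool suited to viewing the object


def ingest(sip: SubmissionInformationPackage,
           archive_id: str) -> ArchivalInformationPackage:
    """Put a submission into the form used for storage.

    As an example of a measure taken at ingestion, metadata keys are
    normalized to lowercase (an assumption, not an OAIS requirement).
    """
    metadata = {key.lower(): value for key, value in sip.metadata.items()}
    return ArchivalInformationPackage(archive_id, sip.content, metadata)


def disseminate(aip: ArchivalInformationPackage,
                viewer: str) -> DisseminationInformationPackage:
    """Bundle a stored object with what a consumer needs to use it."""
    return DisseminationInformationPackage(aip.content, dict(aip.metadata), viewer)
```

A submission created as `SubmissionInformationPackage(b"...", {"Title": "Letter"})` would pass through `ingest` and then `disseminate`, with the same content carried forward at each stage.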
Whether or not one chooses to follow the OAIS model, the issue of digital preservation is a vital one. I think that we all want the words, art, and other creations of our age to survive into the distant future to at least the same degree as the works of the past have survived into our time. With careful planning and ongoing commitments of resources, the ideal of long-term access to our digitized collections can be realized. The challenge continues to revolve around the fact that cycles of technology are increasingly short and that the future is infinitely long.