Archiving and Preserving PDF Files

RLG DigiNews [February 15, 2001]


Abstract: Since its release in mid-1993, Adobe Portable Document Format (PDF) has become a widely used standard for electronic document distribution in many institutional settings worldwide. Much of its popularity comes from its ability to faithfully encode both the text and the visual appearance of source documents, preserving their fonts, formatting, colors, and graphics. PDF files can be viewed, navigated, and printed with the free Adobe Acrobat Reader, which is available on all major computing platforms. PDF has many applications and is commonly used to publish government, public, and academic documents. Many of the electronic journals and other digital resources acquired by libraries are published in PDF format.

As libraries grow more dependent on electronic resources, they need to consider how they can preserve these resources for the long term. Many libraries retain back runs of print journals that are over 100 years old and are still consulted by researchers. No digital technology has lasted nearly that long, and many data formats have already become obsolete and difficult to read in far less time. This document discusses ways that libraries can plan for the preservation of electronic journals and other digital resources in PDF format. After a brief discussion of the file specifications and the future plans for PDF, the article focuses on issues related to the preservation of PDF files.

Review by Roy Tennant:

The Adobe Acrobat Portable Document Format (PDF) is not generally considered the format of choice for long-term preservation of digital documents. But, as Ockerbloom points out, neither should it be considered completely unsuitable for preservation. Although Adobe Systems, Inc. controls the format, the specification is freely published and widely implemented. Third-party software (including open source applications) is available that can manipulate the format in various ways, including migrating documents to a different format. This article is the best explication I've seen of the format, of the ways in which documents in this format can be "rescued" or migrated into another format, and of the pitfalls and opportunities along the way. Ockerbloom includes specific steps institutions can take to reduce their exposure to document disaster down the road. As he says, "in summary, it is reasonable, given careful techniques..., for institutions to collect documents in PDF format with the expectation that they can be archived and preserved indefinitely, even as computer technology and standards advance." Is this a defensible statement? Time will certainly tell, but meanwhile, I'm much more convinced of it than I was before reading this piece.

Publication Year: 2001
Type of Material: Article
Language: English
Published in: RLG DigiNews
Publication Info: Volume 5, Number 1
Issue: February 15, 2001
Publisher: Research Libraries Group
Place of Publication: Mountain View, CA
Notes: John Mark Ockerbloom is Digital Library Architect and Planner, University of Pennsylvania. Courtesy of: Current Cites 12(2) (February 2001), ISSN: 1060-2356. Copyright 2001 by the Regents of the University of California. All rights reserved.
Subject: Digital preservation
Record Number: 8699
Last Update: 2012-12-29 14:06:47