Digitizing historic photos can be an important activity for libraries. Creating digital representations of photographs broadens access to them and helps preserve them for future generations of scholars and researchers. Digitizing historic photographs not only falls within a library's role in helping to preserve cultural heritage, but it also provides an opportunity to deepen its level of engagement with its community.
The standards and technical components needed for a digitizing project will vary according to the scale and scope of the project and the resources available. Such a project also involves several components including the format of the media files created through the digitization of the photographs, the type of digital storage in which the files will be placed, the metadata that describes the content of each photograph, and the applications used to manage and provide access to the digital collection.
The format of the digital image files will include specifications such as the resolution of the digitized files, the tonal resolution, and the type of media file produced. The resolution of the image will vary depending on the size of the original object, the level of quality expected to be achieved, the capabilities of the digitizing equipment, and the quantity of storage available. Only a few years ago, the typical resolution used to digitize images would have been 300 dpi (dots per inch or pixels per inch). The capabilities of digital cameras and the capacity of affordable storage now make it possible to capture digital images at much higher resolutions. In most cases, a scan of 600 dpi will produce a high-resolution image that captures all of the visual information of the original.
The size and quality of the original material will factor into determining the resolution. 35mm slides, for example, require much higher resolution to achieve the same level of quality as photographic prints. Scanning 35mm slides at 2,000–4,000 dpi should yield excellent results.
Large-format items, such as maps or posters, would ideally be scanned at the same resolutions as standard-sized prints. Such scans may require significantly more storage and special digitizing equipment. Not all digitizing labs will include large-format scanners or cameras. If you are outsourcing large-format materials, ensure that the vendor has access to the appropriate digitizing equipment and expect to pay significantly more for these materials than for smaller prints or slides.
Digitizing projects will also need to specify the tonal depth for the digital files produced. If the original images are color, the digital files should be produced with at least a 24-bit RGB profile. Black and white images should not be digitized as simple bitonal images, but rather with at least an 8-bit grayscale.
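To see how resolution and tonal depth translate into storage, it can help to work through the arithmetic. The following is a minimal sketch in Python rather than a prescribed method: the function name and the sample dimensions (an 8 x 10 inch print and a 36 x 24 mm slide frame) are illustrative assumptions, and the figures describe uncompressed image data only, not the size of any particular file format.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the uncompressed size of a scan, in bytes.

    width_in / height_in are the dimensions of the original in inches;
    bits_per_pixel is the tonal depth (24 for RGB color, 8 for grayscale).
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Hypothetical 8 x 10 inch print scanned at 600 dpi.
print_color = uncompressed_size_bytes(8, 10, 600, 24)  # 24-bit RGB
print_gray = uncompressed_size_bytes(8, 10, 600, 8)    # 8-bit grayscale

# Hypothetical 35mm slide frame (36 x 24 mm) scanned at 4,000 dpi in color.
slide_color = uncompressed_size_bytes(36 / 25.4, 24 / 25.4, 4000, 24)

for label, size in [
    ("8x10 print, 600 dpi, 24-bit RGB", print_color),
    ("8x10 print, 600 dpi, 8-bit gray", print_gray),
    ("35mm slide, 4000 dpi, 24-bit RGB", slide_color),
]:
    print(f"{label}: {size / 1_000_000:.1f} MB")
```

Under these assumptions, the color print comes to 86.4 MB of raw image data and the grayscale version to one third of that, which gives a rough basis for multiplying by the number of items in a collection when planning storage.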
Most projects require two categories of digital images: preservation and access. The preservation images are meant to be the highest quality and resolution. Although in most cases the original prints, slides, or negatives will also be preserved, these digital masters provide an additional layer of protection should the original materials become lost or damaged. Another set of digital files can be derived from the digital masters in a format suitable for presentation through web-based interfaces. The files for access will usually have substantially lower resolution than the masters, typically 72 dpi, and require only a fraction of the digital storage and less bandwidth for presentation.
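The difference in scale between a master and its access copy can be illustrated with a short calculation. This is a sketch under stated assumptions: the function name is hypothetical, the master is taken to be a 4,800 x 6,000 pixel scan (an 8 x 10 inch print at 600 dpi), and the comparison counts pixels only, before any compression is applied.

```python
def derivative_dimensions(master_width_px, master_height_px, master_dpi, access_dpi):
    """Return the pixel dimensions of an access copy scaled down from a master.

    The scale factor is the ratio of the access resolution to the master
    resolution, so the aspect ratio is preserved.
    """
    width = max(1, round(master_width_px * access_dpi / master_dpi))
    height = max(1, round(master_height_px * access_dpi / master_dpi))
    return width, height


# Hypothetical master: 8 x 10 inch print captured at 600 dpi.
master = (4800, 6000)
access = derivative_dimensions(master[0], master[1], master_dpi=600, access_dpi=72)

# Share of the master's pixels that the access copy retains.
pixel_fraction = (access[0] * access[1]) / (master[0] * master[1])

print(f"Access copy: {access[0]} x {access[1]} pixels")
print(f"Pixels retained: {pixel_fraction:.2%}")
```

In this example the access copy is 576 x 720 pixels, about 1.4 percent of the pixels in the master, which is why derivatives place far lighter demands on storage and bandwidth.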
The library will also need to specify the technical format of the digital files to be produced. TIFF files have been a mainstay for digital masters and should be provided if possible. If a digital camera is used to capture the images, it is also beneficial to receive the RAW files produced. Both TIFF and RAW files can be quite large, especially for larger items captured at high resolution. These files are well suited for digital preservation but do not work well for presentation via web-based interfaces. More web-friendly formats can be derived from the digital masters for viewing through access systems, either produced in advance or dynamically within a digital collections management application. JPEG 2000 has become widely adopted as a compressed image format because it offers a lossless mode that does not introduce any loss of information. Standard JPEG images are also commonly used, but their compression does discard some information.
The technical specifications of digitizing cannot necessarily be determined by a rote formula. They will depend on many factors, such as the conditions of the original materials, the capabilities of the available equipment, as well as the budget available. The curators of a collection will usually work with a technical expert to design specifications that meet the requirements of the project. The final specifications should achieve the highest image quality possible relative to the conditions of the original materials, the budget available, the capacity of available storage, and any requirements of the applications used to manage, preserve, and provide access to the images in the collection.
In order to function as a coherent digital collection, the digitized images need to be associated with metadata that describes what is represented in each image, including names, dates, locations, and other descriptive information. Defining the structure and standards of the metadata is part of the initial design process of a digital collection. Decisions will need to be made regarding the general metadata standard to be followed, such as Dublin Core, and how each of the available fields will be structured and populated. The ways in which all names and places are recorded will need to be standardized in order for the collection to be easily browsed and searched. In some cases, the metadata design might include use of established vocabularies or ontologies, such as the Art & Architecture Thesaurus (AAT) offered by the Getty Research Institute. Use of these standards will be especially important if the local digital collection will be linked to other regional, state, or national collections. The design of the metadata schema will make a big difference in how well researchers will be able to search or browse the collection and in its interoperability with other information systems.
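As an illustration of what a simple Dublin Core description might look like, the sketch below builds a record using Python's standard XML library. The namespace is the standard Dublin Core element set; everything else is an assumption made for the example, including the `record` wrapper element, the `build_record` helper, and the sample values, which do not describe a real photograph. A production schema would follow whatever profile the library and its collection management system have agreed on.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Build an XML record from a mapping of Dublin Core element names
    to lists of values. Elements may repeat, so each value gets its own tag."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        for value in values:
            element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
            element.text = value
    return record


# Invented sample values, for illustration only.
photo = {
    "identifier": ["photo-0001"],
    "title": ["Main Street looking north"],
    "date": ["1925"],
    "coverage": ["Springfield"],
    "subject": ["streets", "storefronts"],
    "type": ["StillImage"],
    "format": ["image/tiff"],
}

xml_text = ET.tostring(build_record(photo), encoding="unicode")
print(xml_text)
```

Recording each subject term as a separate element, and drawing those terms from an agreed vocabulary, is the kind of design decision that makes later browsing and cross-collection searching more reliable.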
The creation of metadata for each image will require an investment of time. In most cases, it will take much longer to describe an image than to digitize it. Some metadata elements may be available from documentation associated with the original photographs. Even if it is not entirely complete, this information can provide a good starting point for a more complete description of the photograph based on additional research.
Some libraries and museums have had good results in using crowdsourcing to discover additional information about historic photos. Once the image is digitized and preliminary metadata has been created, the organization can make the images available publicly and invite community members to contribute information on persons, places, and dates that they recognize. Librarians or collection curators can then follow up to confirm any contributed metadata. Crowdsourcing not only helps the library enrich the metadata for its digital collection, but also represents an opportunity to strengthen its engagement with its community members.
Some type of collections management tool or digital asset management system will need to be implemented to bring together the digitized images and the descriptive metadata to create interfaces for searching, browsing, and viewing the digital images. These products will also provide the library with tools for ingesting the images, importing or entering metadata, and for other aspects of managing the collection.
Many different collection management tools are available for libraries and other types of institutions involved with managing digital images. In the library arena, commercial products such as OCLC's CONTENTdm or Ex Libris Rosetta have been widely implemented. Those interested in open source tools can consider products such as Fedora, Samvera, or Islandora. Several companies and non-profit organizations offer implementation and support services for open source products to libraries that may not have extensive expertise in-house.
All these factors lead us back to the question regarding the features to specify for metadata creation and user-friendly patron interfaces. As we have noted, digitizing and providing access to a collection of photographs involves many layers of standards, workflow, and technical infrastructure. The specific features and capabilities of the technology products will also depend on the complexity and scale of the project. Collections including hundreds of thousands or millions of images will naturally require more industrial-strength technical infrastructure than smaller collections. As with other categories of technology products, we can't expect a one-size-fits-all solution. Finding the best environment for a project such as digitizing historic photographs will involve a thorough review of the expectations and requirements of the proposed collection and should take into consideration a broader digitization program that the library might want to incorporate into its ongoing operations.