I have a long-standing interest in the technologies related to managing digital content. One of my most challenging roles during my tenure at the Vanderbilt University Libraries involved oversight of the Vanderbilt Television News Archive and its transformation from videotape recordings to a large-scale collection of digital video. Other major projects I worked with that were related to digital media included helping develop the Global Music Archive, particularly its Digital Collection of East African Recordings, as well as numerous collections of and interfaces for digital photographs. I've also worked through several generations of technology for my personal collection of many tens of thousands of digital photographs. Digitizing physical media and managing digitally captured content has been a constant in my career in library technologies.
These 3 decades of involvement in digital media technologies have seen remarkable changes. The current era of cloud technologies brings powerful, large-scale technologies at lower price points. Libraries, however, continue to struggle with the cost and complexity of implementing the technical infrastructure for managing their digital media collections. New opportunities for more flexible access to digital media may be enabled by initiatives such as the International Image Interoperability Framework (IIIF; iiif.io), which is quickly gaining traction in the library arena. As libraries increasingly include digital media within the scope of their collections, it is important to take advantage of any technologies that might enable more efficiency and mitigate costs.
Cloud Technologies Reduce Cost and Increase Scalability
Today's technology scene offers a range of options for working with even the largest digital collections with more efficiency and reliability and at lower cost than ever before. Consumer technologies and social media have driven the development of global infrastructure capable of handling photos, video, and other media at incredible scale. Large-scale storage is increasingly a commodity service with diminishing costs.
The emergence of infrastructure-as-a-service (IaaS) has enabled new possibilities for the ways that libraries manage their digital media collections. In the mid-2000s, when I was involved with weighing options for managing a digital video collection exceeding 160TB, cloud-based storage options were prohibitively expensive. The costs, at the time, would have been in the hundreds of thousands of dollars per year for storage alone, with additional charges for the bandwidth for uploading the digital files and for streaming to end users. Those factors led us to a storage strategy consisting of offline storage on DVD discs for high-quality working copies and local storage arrays for streaming of downscaled lightweight versions. The storage arrays, although quite expensive, were a one-time capital expenditure with minimal ongoing costs. The expense of storage and bandwidth also impacted decisions on the video formats used. We created two versions of each media file: a full-resolution version in MPEG format that we considered the master file and a downscaled lightweight version suitable for streaming. Costs and technology limitations led to many compromises relative to our ideal vision for management and access of this large-scale digital collection.
Today's technology environment offers a much more flexible set of options for libraries to consider as they develop their digital media collections. The charges for online storage from services such as the Amazon Simple Storage Service (S3) and its competitors in the IaaS arena in most cases should fall below the cost of purchasing local storage equipment. Operating redundant storage from multiple providers is increasingly feasible. Services such as Amazon Glacier can enable libraries to keep copies of their digital assets in offline storage as part of their disaster planning and digital preservation strategies. At least for low-level storage, technology options have expanded while costs have decreased.
Although storage costs have generally declined over the last couple of decades, large-scale projects will continue to need substantial funding. It is important to have realistic expectations, understanding that managed storage with built-in redundancy and high availability will always require at least a moderate level of financial investment. The software and systems needed to ingest, describe, and provide access to the digital media represent additional layers of technical infrastructure also needed for these collections.
Libraries benefit from the technology advances driven by the major web destinations (such as Amazon, Facebook, Twitter, and Google). The ease of uploading and sharing content through social networks, for example, sets a very high bar for user experience. Libraries can benefit from the advances made in the broader tech sphere in the realm of digital media management, including easy ways to share or ingest content, automatically generate metadata, provide facial recognition, and offer other tools.
Fragmented Digital Collection Infrastructure
Libraries do not operate at the scale of the global web destinations. Even the largest digital library collections are dwarfed by the media shared through the major social networks. This lack of scale means that libraries can miss out on some of the benefits of cloud-based technologies.
As with many other areas of library technology, libraries generally initiate digital collections via individual standalone implementations within each organization rather than through large-scale shared repositories. The typical model for the development of a digital media collection involves an implementation of one of the digital collection management products dedicated to a single library or consortium. These implementations consist of a storage infrastructure for the digital objects, a metadata management system, and interfaces for end-user access to the content. A variety of products have been established as the basis for library digital collections, including OCLC's CONTENTdm, Ex Libris' Rosetta, and open source tools such as Fedora, Samvera (formerly Hydra), and Greenstone.
This model of institutional instances of digital library systems can be seen as a fragmented approach that results in less impact and higher costs compared to a possible alternative based on widely shared technical infrastructure. Rather than individual implementations of systems to manage digital collections for each institution, I can imagine a Flickr-like global infrastructure available to libraries and other cultural institutions for sharing digital video, audio, and images. Libraries have a history of cooperative ventures for sharing bibliographic metadata at regional, national, and global levels, but this model has not necessarily been widely implemented for the storage and management of digital objects.
Managing digital objects on behalf of libraries and related organizations through a shared global technical platform has the possibility of achieving a scale that's able to substantially reduce costs while offering a high level of sophisticated services and options. I see the possibility of shared infrastructure emphasizing lower-level storage services for digital objects and their administrative and descriptive metadata. The platform might expose a set of APIs to enable each institution to provide an interface to the items they contribute, featuring branding, search, and display preferences. This imagined platform would also have to include a layer of access and authorization to allow the deposit of items under copyright that cannot be displayed to users beyond those established by the depositing institution. Hopefully, most of the items deposited on such a globally shared platform would be freely accessible, resulting in massive amounts of library-curated digital media content available to researchers.
I posit this idea of a globally shared platform for digital media oriented to libraries not so much as a project I see as likely to be fulfilled, but more as a way of pointing out the current reality of fragmented systems. I'm not optimistic that such a project would emerge in the near future, and I am not aware of any organization that's seeking this type of centralized approach. Such a global platform for digital media for libraries would be fraught with challenges in governance, business models, and technical deployment. The idea of libraries shifting from retail to wholesale models of technology infrastructure does seem enticing.
IIIF
One of the most interesting developments in the digital media arena is the increased adoption of IIIF. This framework, implemented as a family of standardized APIs, enables any compliant interface, with its own search and display features, to address any compliant repository of images. IIIF allows researchers to use a single interface to work with many different image collections. It facilitates the creation of virtual collections, where materials owned by different institutions can be brought together. It might be possible, for example, to seamlessly present a manuscript when its individual pages reside and have been digitized among multiple libraries or museums.
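As a rough illustration of how such a virtual collection could be described, the sketch below builds a minimal manifest in the general shape of the IIIF Presentation API 3.0, with each page image served by a different institution. The function name build_virtual_manifest, the example.org hostnames, and the page dimensions are assumptions made for illustration; they are not drawn from any real repository.

```python
import json


def build_virtual_manifest(manifest_id, label, pages):
    """Assemble a minimal Presentation-API-style manifest (hypothetical helper).

    `pages` is a list of dicts with the keys image_url, width, and height.
    Each page becomes one canvas, painted with a single image that may live
    on any institution's image server.
    """
    canvases = []
    for number, page in enumerate(pages, start=1):
        canvas_id = f"{manifest_id}/canvas/{number}"
        canvases.append({
            "id": canvas_id,
            "type": "Canvas",
            "width": page["width"],
            "height": page["height"],
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/annotation",
                    "type": "Annotation",
                    "motivation": "painting",
                    "body": {
                        "id": page["image_url"],
                        "type": "Image",
                        "format": "image/jpeg",
                        "width": page["width"],
                        "height": page["height"],
                    },
                    # The annotation targets the canvas it paints onto.
                    "target": canvas_id,
                }],
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": manifest_id,
        "type": "Manifest",
        "label": {"en": [label]},
        "items": canvases,
    }


if __name__ == "__main__":
    # Two pages of one manuscript, held by two (made-up) institutions.
    manifest = build_virtual_manifest(
        "https://example.org/iiif/ms-1/manifest",
        "Reunited manuscript",
        [
            {"image_url": "https://library-a.example.org/iiif/f1r/full/max/0/default.jpg",
             "width": 3000, "height": 4000},
            {"image_url": "https://museum-b.example.org/iiif/f2r/full/max/0/default.jpg",
             "width": 3000, "height": 4000},
        ],
    )
    print(json.dumps(manifest, indent=2))
```

A compliant viewer given a manifest along these lines would page through the canvases in order, fetching each image from whichever institution hosts it.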
IIIF consists of four API clusters addressing different aspects of the interactions between image repositories and viewing interfaces. The Presentation API enables exchange of information regarding the structure of each image or set of images within a repository so that they can be properly represented in a viewing interface. The Image Delivery API manages the requests for images that an interface sends to a repository, including the specific image needed and the desired resolution, rotation, or subregion to be delivered. The Search API supports searching within the metadata and text associated with an image. The Authentication API enables access to images on repositories with restricted content. IIIF was originally proposed in 2011 and has been increasingly adopted among the developers of image management and presentation software.
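To make the image request concrete, the following sketch assembles a request URL following the pattern published in the IIIF Image API specification, in which the identifier is followed by region, size, rotation, quality, and format segments. The helper name iiif_image_url, the server address, and the identifier are hypothetical and chosen only for illustration.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation=0, quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL (hypothetical helper).

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    The identifier is percent-encoded so that characters such as "/" or
    spaces do not get mistaken for URL path separators.
    """
    encoded_id = quote(identifier, safe="")
    return (f"{base.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


if __name__ == "__main__":
    base = "https://images.example.org/iiif"  # made-up image server

    # The whole image at the largest size the server offers.
    print(iiif_image_url(base, "ms-123/page 1"))

    # An 800x600 pixel region starting at x=100, y=200, scaled to a
    # width of 400 pixels and rotated 90 degrees.
    print(iiif_image_url(base, "ms-123/page 1",
                         region="100,200,800,600", size="400,", rotation=90))
```

Because every compliant server is expected to understand the same URL pattern, a viewer can request thumbnails, zoomed details, or rotated views from any repository without repository-specific code.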
Examples of image viewers that support IIIF are the open source products Mirador (projectmirador.org), OpenSeadragon (openseadragon.github.io), and Universal Viewer (universalviewer.io). IIIF-compliant image repository platforms include the open source Loris server. Library-oriented products that support IIIF include CONTENTdm and Rosetta.
The relatively rapid development and adoption of IIIF have enabled a new level of flexibility and efficiency within the digital image community of developers and researchers. Further advancement of the capabilities of IIIF and wider adoption should result in a more unified ecosystem of access to digital images across an expanding universe of compliant repositories. I also see IIIF as a vehicle for providing institutionally branded and customized access to images managed on shared infrastructure. The combination of interoperability via IIIF and large-scale shared repositories could significantly decrease the cost of managing images and other types of media for libraries and enable access for researchers to expansive collections while preserving the customized interfaces, branding, and access control inherent to the current environment of standalone repositories.