Talk of mass digitization generally brings to mind large-scale projects to scan huge collections of books. The Google Library Print project, the Open Content Alliance, and others have taken on incredibly ambitious projects to digitize enormous numbers of books in some of the world's biggest libraries. Digitization of book collections stands to fundamentally change how individuals find and gain access to the written word. My column for the May 2008 issue of CIL explored the need for deeper search techniques to take advantage of the value of large collections of digitized books. This month, let's take a look at the opportunities made possible through the digitization of multimedia content.
Digitizing brings at least the same level of benefit to other types of collections as we've experienced for books and journals. Therefore, mass digitization needs to address the realms of sight and sound - especially in this age where rich media find a wide and receptive audience on the web. Sites such as Flickr and YouTube exemplify the heightened interest in photos and video. Distribution of digital sound through podcasting has become a routine and expected form of syndicating radio, news, lectures, and other verbal content. Given the current preferences toward rich media content, it seems that libraries should seek out any possibilities for mass digitization of multimedia content in addition to projects related to books and articles.
Vanderbilt Builds a Television News Archive
One of my responsibilities involves administering the Vanderbilt Television News Archive as its executive director. My work with this unit of the library has provided me a great opportunity to apply a variety of technologies to preserve and provide access to a unique and valuable collection and to develop a sustainable business plan.
This archive plays a unique and important role in the arena of national news archiving. Prior to the founding of the Vanderbilt Television News Archive, no comprehensive preservation of national news programming took place. Even the networks themselves didn't keep copies of the newscasts after they were broadcast; the videotapes were erased and reused after a certain period.
Given the importance of the national news in documenting the events of the world for the American public, Paul Simpson, the founder of the Vanderbilt Television News Archive, considered it vital to save these broadcasts for posterity. In the same way that we expect to go to a library to read previous issues of newspapers going back through history, it's also important to view news broadcasts of times past. Imagine a world where at the end of the day all newspapers were collected and shredded. Without our archive, many national television news broadcasts would have disappeared after their initial broadcast. A great deal of the national news broadcasts prior to when we began recording are unfortunately lost.
The Vanderbilt Television News Archive systematically records, preserves, and provides access to news programming of the U.S. national television networks and ranks as the largest and most comprehensive archive of its kind. Its collection includes a comprehensive run of the evening news programs of ABC, CBS, and NBC since Aug. 5, 1968, an hour per day of CNN since 1995, and an hour of Fox News daily since 2004. In addition to the evening news programs, a collection of special news programs includes a variety of other events covered on the national news - presidential speeches, election coverage, political conventions, major wars, and other major events. Some of the highlights of the collection of news specials include coverage of the Watergate hearings, the State of the Union and other major presidential speeches, the Gulf War, continuous coverage of the events of 9/11, and the wars in Afghanistan and Iraq. Between the evening news and the special collections, the material recorded accumulates to more than 40,000 hours of video content. And what started out as a 3-month experiment to demonstrate the viability of creating a library of television news has managed to survive 4 decades, despite incredible technical, legal, and financial challenges.
In the 1970s, the archive faced its greatest legal crisis when one of the networks filed a lawsuit claiming violation of its copyright. Archiving television content involves complex intellectual property issues. The original networks clearly own the content, but what rights does a third party, such as our archive, have to record, preserve, and provide access to that content? The networks claimed exclusive access. But it also seemed unfortunate for the public to be denied access. A revision of the U.S. copyright law was already underway in 1974 at the time of the lawsuit. Sen. Howard Baker introduced a provision that created an exemption allowing libraries to make recordings of televised news programs and to loan them to the public. When the new copyright law that included this "Vanderbilt clause" was enacted in 1976, both parties withdrew from the lawsuit.
Financial issues also threatened the archive. Operating an off-air television archive requires considerable staff, recording equipment, and media. On more than one occasion, the archive faced threats of closure due to high costs of operation and limited possibilities for income.
Digitizing the Vanderbilt Television News Archive
For the last 5 years, we've been working hard at the Vanderbilt Television News Archive to digitize our collection of national news broadcasts. In 2002, the National Science Foundation (NSF) funded a 1-year project to explore technologies and methodologies that could be applied to begin recording newscasts digitally and to digitize our existing videotape collection. We put these technologies to work right away. In 2003, we switched our off-air recording from U-Matic videotape to MPEG-2 digital files. The National Endowment for the Humanities (NEH) provided support for a 2-year project, beginning in 2004, to digitize our collection of more than 30,000 evening news programs. Once that effort was completed in 2006, we commenced digitizing our collection of specials, also with funding from the NEH. That project ended in August 2008, resulting in a fully digitized collection of about 40,000 hours of video content.
Digitizing this collection presented a number of challenges, including technical standards, metadata, storage capacity, digital preservation, and equipment maintenance. During the NSF grant phase, we selected the standards, profiles, and practices that have been used subsequently. In the digital video arena, one of the key trade-offs involves balancing video quality against file size. With our collection of off-air recordings, the already compromised quality of the original material allows us to use more aggressive compression without perceptible degradation. For those with an interest, we settled on an MPEG-2 profile with compression to a bitrate of 6 megabits per second (Mbps).
This project involved a constant battle with bandwidth and storage. The digital files produced required about 3GB of storage per hour of content, which, when you consider we are dealing with 40,000 hours of content, requires about 120 terabytes of storage. Our storage strategy involved making a copy of each MPEG-2 file on DVD for our local working collection, sending a copy to the Library of Congress for long-term preservation, and rendering a copy in RealMedia at a lower bitrate and smaller screen resolution for access through our cluster of streaming servers. Moving these files produced considerable strain on our local networks and, at least initially, defied the use of the internet for transfer to the Library of Congress. During the first grant project, we physically shipped files on 500GB drives. We were able to bypass physical shipping during the second project and made use of the higher bandwidth available through Internet2 to transport the files.
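For readers who want to check the arithmetic, here is a rough sketch of the storage estimate. It assumes the bitrate is read as 6 megabits per second and uses decimal units (1GB = 1,000,000,000 bytes); the function names are illustrative only and are not part of any tool used at the archive.

```python
# Back-of-the-envelope storage estimate for a digitized video collection.
# Assumptions (not from the archive's own tooling): bitrate in megabits
# per second, decimal units (1 GB = 10**9 bytes, 1 TB = 1,000 GB).

def gb_per_hour(bitrate_mbps: float) -> float:
    """Approximate gigabytes needed for one hour of video at a given bitrate."""
    bits_per_hour = bitrate_mbps * 1_000_000 * 3600  # bits in one hour
    bytes_per_hour = bits_per_hour / 8               # 8 bits per byte
    return bytes_per_hour / 1_000_000_000            # bytes -> gigabytes


def total_terabytes(hours: float, gb_each_hour: float) -> float:
    """Approximate terabytes needed for a collection of the given length."""
    return hours * gb_each_hour / 1000               # gigabytes -> terabytes


if __name__ == "__main__":
    per_hour = gb_per_hour(6)
    print(f"About {per_hour:.1f} GB per hour of video")
    # Rounding up to 3 GB per hour leaves headroom for audio and overhead.
    print(f"About {total_terabytes(40_000, 3):.0f} TB for 40,000 hours")
```

The raw video rate works out to a little under 3GB per hour, which is consistent with the rounded planning figure of about 3GB per hour and roughly 120 terabytes overall.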
To handle this quantity of material, we had to implement efficient workflows that could process the collection within the allotted time frame and budget. Whenever possible, we created software tools to automate routine tasks. These tools avoided manual handling of files, allowing operators to submit each digitized MPEG-2 file through a web interface. This initiated a process that copied it to an intermediate storage server, transcoded it into RealMedia format, and deposited it onto the streaming media server cluster. Even though we automated processes where we could, the project still involved an incredible amount of hands-on work, requiring about five full-time personnel in addition to regular staff of the archive.
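The shape of that automated workflow can be sketched in a few lines. This is only an illustration under stated assumptions, not the archive's actual software: the function names, the directory layout, and the ".rm" file extension are hypothetical, and the transcoding step is a placeholder that simply copies bytes where a real encoder would be called.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List


def process_submission(
    source: Path,
    staging_dir: Path,
    streaming_dir: Path,
    transcode: Callable[[Path, Path], None],
) -> Path:
    """Copy one digitized file to staging, transcode it, and deposit the result."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    streaming_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: copy the submitted file to intermediate storage.
    staged = staging_dir / source.name
    shutil.copy2(source, staged)

    # Step 2 and 3: produce the access copy directly in the streaming area.
    access_copy = streaming_dir / (source.stem + ".rm")  # hypothetical naming
    transcode(staged, access_copy)
    return access_copy


def placeholder_transcode(src: Path, dst: Path) -> None:
    """Stand-in for a real encoder: it only copies bytes, it does not transcode."""
    shutil.copyfile(src, dst)


def run_demo() -> List[str]:
    """Run the sketch on a dummy file and list what lands in the streaming area."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "incoming" / "broadcast.mpg"
        source.parent.mkdir()
        source.write_bytes(b"placeholder video bytes")
        process_submission(
            source, root / "staging", root / "streaming", placeholder_transcode
        )
        return sorted(p.name for p in (root / "streaming").iterdir())


if __name__ == "__main__":
    print(run_demo())
```

Passing the transcoder in as a function keeps the copy and deposit steps independent of whatever encoding tool is actually used, which is the part of the workflow most likely to change over time.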
Benefits of Digitization
The transition to a digital collection has presented the Vanderbilt Television News Archive with many advantages not available when it was tied to a physical videotape collection, especially in the ways that we are able to provide access to the collection. When the collection resided on videotape, the two options for access were on-site viewing and making copies to ship to those who requested material on loan. The loan service included making copies of either whole programs or compilations of selected segments onto a videotape mailed to the individual making the request. We charged service fees to offset the expense of providing the service.
As the collection shifted from videotape to digital formats, we gained a lot of efficiency in the loan service. Not only were we able to move from VHS videotape to DVD, but we can also make compilations and duplications much more quickly when working from digital files instead of retrieving physical tapes.
The ability to deliver access to the collection through streaming video opens up opportunities that would have never been possible with the videotape collection. For visitors on-site within our library and in the archive's offices, we allow immediate viewing of the entire collection through streaming video. Unfortunately, limits on copyright preclude providing streaming video access to the general public throughout the web. That is because the exemption that gives us the ability to loan physical copies of materials does not apply to streaming without the explicit permission of the originating network as the primary copyright holder. We have, however, worked out agreements with two of the networks represented in our collection to provide streaming access of their content to colleges and universities based in the U.S. and Canada.
These agreements provide the basis for a subscription product that we offer to eligible institutions. This resource includes full access to our database of more than 880,000 abstracts and catalog records that describe the collection, with direct links to the streaming video for authorized networks. We offer these subscriptions at an annual fee, which contributes toward the operational expenses of the archive. This is one way in which shifting to digital technologies has improved the financial environment. We're working hard to increase the number of institutional subscribers to help us further improve the financial support of the archive.
A Promising Future
Even though we have already seen great benefits from digitizing our collection of news broadcasts, I think that we have by no means exhausted the advantages enabled by the technology. Other possibilities include finding ways to provide deeper search capabilities for the content. While technology cannot resolve the copyright issues that prevent us from opening up the content to the world, it can help us increase the impact and value of the collection and further strengthen the sustainability of the operation to allow it to continue for another 40 years and beyond.
When we began our efforts to digitize the collection 5 years ago, we faced many challenges that have moderated over that time. Costs related to storage, bandwidth, and equipment have since dropped significantly. Interest in and use of video and other multimedia content have risen explosively. These factors not only make multimedia digitization projects more affordable and practical to achieve but also point to the importance of expanding library collections with content rich with video, images, and sound.