Libraries have been increasingly involved in the creation of digital library collections and other content repositories. It's not unusual at all for libraries to be involved in digitizing collections of manuscripts, photographs, newspapers, postcards, or multimedia content such as audio recordings and radio or television programming. Academic libraries have become increasingly active in building repositories of content produced within their institutions and in other aspects of publishing scholarly content as well. While books and journals - both the print and electronic variety - stand as the mainstays of library content offerings, it's the collections of rare, historic, or local material that provide the greatest opportunity to increase the impact that the library makes on its community. These collections of nontraditional content provide the best jewels of the library, and delivering them digitally can amplify their impact tremendously.
Digital collections offer libraries another tremendous benefit - they can be leveraged to draw additional users into the library's web presence. This addresses one of my main concerns regarding library strategies, which involves positioning the library's web presence as the key destination for library users, second only to the library's physical facilities. (I defer to others when it comes to issues involving "Library as Place.") After all, libraries compete in an increasingly crowded landscape of information providers on the web. We must work hard to draw in our users so that they can take advantage of the authoritative content and services we provide for them rather than relying on the offerings of the open web.
As digital collections can be exploited to strengthen the library's visibility on the web, they should be part of any unified search environment that libraries adopt to enrich and fortify the impact of this representation of the library's collection. In this month's column, I offer for consideration three techniques for maximizing the exposure of the content in digital collections: search engine optimization, the use of RSS feeds, and the incorporation of content from digital collections into discovery interfaces.
Search Engine Optimization
One of the essential elements in the deployment of a digital collection involves implementing a technical infrastructure that will ensure that its content is incorporated into Google and other search engines. In my April 2006 column, I described the process we followed to make the Vanderbilt Television News Archive's content available to search engines, which resulted in a dramatic increase in the use of the resource. We saw growth in the volume of search activity and in the number of loan requests, which in turn resulted in increased income from the fees we charge for the service.
The techniques of search engine optimization, which is frequently referred to as SEO in the ecommerce arena, can be applied to almost any digital collection to draw in new visitors. The basic technique involves making the contents of the collection available to the indexing process of Google and other search engines and ensuring an easy path into the resource as users click through the results found on the search engine. The scheme consists of two key components.
The first step in preparing a digital repository is to create a permanent URL for each item in the digital collection, one that deep links directly to the item's page as presented by the system. The link must be unique and reliable, free from session tokens and other artifacts that change from one visit to the next.
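As a rough illustration of what "free from session tokens and other artifacts" can mean in practice, the following Python sketch reduces an item URL to a stable form. The parameter names in SESSION_PARAMS are assumptions chosen for the example, not a list taken from any particular repository system; adjust them to whatever your software actually appends.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of session-related query parameters; adjust for your system.
SESSION_PARAMS = {"sessionid", "sid", "jsessionid", "phpsessid", "token"}


def permanent_url(url: str) -> str:
    """Return a stable form of an item URL with session artifacts removed."""
    parts = urlsplit(url)
    # Drop ";jsessionid=..." style path parameters that some servers append.
    path = parts.path.split(";", 1)[0]
    # Keep only the query parameters that identify the item, in a stable order.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    )
    # Lowercase the host and discard the fragment so equivalent links match.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), path, urlencode(query), "")
    )
```

For example, permanent_url("http://Example.org/item;jsessionid=ABC?id=42&sid=xyz#top") yields "http://example.org/item?id=42", which is the kind of link that is safe to hand to a search engine.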
The second step involves offering a systematic way for the search engines to crawl through all of the URLs that represent the collection. The most efficient approach involves the use of the sitemap protocol, which presents a systematic description of the resources that are available on your site. Providing a sitemap greatly improves the efficiency with which the indexing bots navigate through the content on your site; it also gives you the opportunity to specify the highest-priority items to be indexed. The sitemap consists of one or more XML documents that list, for each item in your collection, its unique URL, the date it was added or modified, and its relative priority (see www.sitemaps.org). The date information helps the search engine bots incrementally add new content from your resource to their index without the need to constantly reharvest the entire set. The sitemap protocol was originally proposed by Google and is now used by all the major search engines. The sitemap file can be produced automatically, usually through a script that operates in conjunction with the software that you use to manage your digital collection. If the system that you use does not already have this capability, a programmer should be able to develop a script to build the sitemaps fairly easily. Once you have the sitemap files in place, you can place a reference to them in the robots.txt file on your server. Google also allows you to register your sitemap through its Webmaster Tools.
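A script of the kind described above might look something like the following Python sketch. It assumes that your collection software can supply, for each item, a permanent URL, a date in YYYY-MM-DD form, and a priority between 0.0 and 1.0; the function name and the shape of the input are inventions for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(items):
    """Build sitemap XML from (url, last_modified, priority) tuples.

    last_modified is a 'YYYY-MM-DD' string; priority is a float from 0.0 to 1.0.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, priority in items:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' in the URL automatically.
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder items; in practice these come from your collection database.
    sample = [
        ("http://example.org/item?id=1", "2009-06-01", 0.8),
        ("http://example.org/item?id=2", "2009-06-15", 0.5),
    ]
    with open("sitemap.xml", "w", encoding="utf-8") as handle:
        handle.write(build_sitemap(sample))
```

The reference in robots.txt is then a single line of the form "Sitemap: http://example.org/sitemap.xml", with example.org standing in for your own server.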
RSS Feeds
Another technique for improving the exposure of your collection is the generous use of RSS feeds. In today's world, RSS ranks as one of the top technologies for distributing content, and, properly implemented, it can serve as an effective tool for attracting traffic to your site. The basic strategy involves offering RSS feeds that will attract the interest of the potential users of your site. For a digital collection, feeds might include featured items that represent the best gems of the collection or those most recently added. It's also helpful to offer RSS representations of search results. This allows your users to subscribe to a feed that represents their specific area of interest and provides a mechanism to bring them back each time you add new content in that area. The tricks to using RSS to increase the traffic on your site are providing just enough information within the feed about each piece of content it describes to spark interest and offering a link back to the content's full representation on your site to complete the thought. Think of RSS as a syndication service to distribute content as well as an advertising ploy to entice potential users to visit your website.
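To make the "just enough information, plus a link back" idea concrete, here is a minimal Python sketch that builds an RSS 2.0 feed of recently added items. The item fields (title, url, summary, added) and the 200-character teaser length are assumptions made for the example; a feed of search results could be built the same way from the records a query returns.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_feed(title, site_url, description, items, teaser_len=200):
    """Build an RSS 2.0 feed whose items carry a short teaser and a link back.

    Each item is a dict with 'title', 'url', 'summary', and 'added' (a
    timezone-aware datetime). These field names are illustrative only.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = description
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        # The permanent URL serves as both the link and the unique identifier.
        ET.SubElement(node, "link").text = item["url"]
        ET.SubElement(node, "guid").text = item["url"]
        # Truncate the summary so the feed sparks interest without
        # replacing a visit to the full record on the site.
        summary = item["summary"]
        if len(summary) > teaser_len:
            summary = summary[:teaser_len].rstrip() + "..."
        ET.SubElement(node, "description").text = summary
        # RSS expects dates in the RFC 822 style that format_datetime produces.
        ET.SubElement(node, "pubDate").text = format_datetime(item["added"])
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    recent = [{
        "title": "Postcard of Main Street",
        "url": "http://example.org/item?id=42",
        "summary": "A hand-colored postcard showing the town square.",
        "added": datetime(2009, 6, 1, tzinfo=timezone.utc),
    }]
    print(build_feed("Recently added", "http://example.org/",
                     "New items in the digital collection", recent))
```

Using the permanent item URL as the guid means feed readers recognize an item they have already seen, and the link element gives the reader the path back to the full representation on your site.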
Digital Collections in Next-Gen Discovery Interfaces
One of the major trends in the library automation arena involves the increasing adoption of a new genre of discovery interfaces that aim to provide improved access to library-supplied content. Some of the major products in this category include AquaBrowser from Medialab Solutions, Primo from Ex Libris, VuFind (an open source alternative) initially developed at Villanova University, Encore from Innovative Interfaces, Enterprise from SirsiDynix, Summon from Serials Solutions, and Visualizer from VTLS. I've been following these new products closely; I really appreciate the way that they bring modern web interface techniques to the way that libraries present their content to their users, offering features such as relevancy ranking of results, facets to guide users from broad results to specific items of interest, better visual presentation, "did you mean?" style suggestions, and user-supplied ratings, reviews, and tags.
The aspect of these products that I consider most important involves increasing the breadth of the content they address. On their websites, libraries had long offered web OPACs as their primary search tool for access to the content of their integrated library systems (ILS), including books, journal titles, and the like. Access to the content of articles would be provided through other means such as citation databases, A-Z lists of ejournal titles, or a federated search platform. Libraries that offer digital collections or institutional repositories offer links to those resources as separate destinations.
These new discovery products embrace the concept of bringing the content of all these resources within a unified search environment. So far, it's been more of an ideal than an actual practice. Most of the implementations that I've seen of these new products continue to focus too narrowly on the contents of the ILS. They have only made baby steps into incorporating the larger realm of content represented in the other aspects of library content.
Libraries that have implemented one of these new discovery interfaces should be able to increase the exposure of their digital collections. In the same way that the MARC records from the ILS are fed into these products to support searching for traditional library content, the metadata from digital collections and other repositories can be funneled into the indexes of these discovery platforms. By including records from digital collections in the results of the library's primary search tool, this content gains a much higher level of exposure. These new interfaces can be considered the first line of attack in providing access to a library's resources for users, facilitating the process of helping users discover the many types of resources available to them. The inclusion of digital collections and other repositories in these discovery systems helps break the stereotypical concept that libraries deal only with books and magazines and shows users that we are involved with other types of content, such as image and video, that will also be of interest to them.
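How the metadata gets funneled in depends entirely on the discovery product, but the general shape of the task is a mapping from each collection's records to the fields the index expects. The Python sketch below assumes Dublin Core style records held as dictionaries of lists and a made-up set of index field names; neither reflects the schema of any specific product.

```python
def to_index_doc(record, collection, item_url):
    """Map a Dublin Core style record to a flat document for a search index.

    record is a dict mapping element names to lists of values. The output
    field names are hypothetical; match them to your discovery platform.
    """
    def first(field):
        # Return the first value of a repeatable element, or "" if absent.
        values = record.get(field, [])
        return values[0] if values else ""

    return {
        # Prefix the identifier so records from different sources never clash.
        "id": f"{collection}:{first('identifier')}",
        "title": first("title"),
        "creator": record.get("creator", []),
        "subject": record.get("subject", []),
        # A format facet lets users see images and video beside books.
        "format": first("type") or "Unknown",
        "collection": collection,
        # The permanent URL leads from the search result to the item itself.
        "url": item_url,
    }


if __name__ == "__main__":
    sample = {
        "identifier": ["pc-0042"],
        "title": ["Postcard of Main Street"],
        "subject": ["Postcards", "Streets"],
        "type": ["Image"],
    }
    print(to_index_doc(sample, "postcards", "http://example.org/item?id=42"))
```

Carrying the collection name and a format value into the index is one way to let the interface's facets surface digital collection items alongside the records drawn from the ILS.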
Creating digital collections involves a great deal of effort and can represent an enormous investment of resources, including personnel and technical infrastructure. As these resources are created, I think that it's important to realize that they can have an impact beyond the scope of their content - they can be an integral part of a larger strategy to improve the library's standing in the community it serves. They can serve to help draw in new visitors and reshape the image of the library as a modern and relevant information provider on the web.