I continue to be impressed with the ongoing improvements in library resource discovery services. These services aim to allow library users to access library resources with the same level of ease that they experience with the general Web through search engines like Google or Bing. But ease of use isn't the only concern. It's imperative that these discovery services deliver search results consistent with library values, which differ considerably from those that prevail on the commercial Web.
For library discovery services, the universe of content addressed corresponds to all of the material that a library considers its collection, or in some contexts it might be the collective body of material held across all libraries. It spans representations of the physical items owned by the library; the material within electronic resources to which libraries subscribe; digital materials created by libraries, archives, or heritage institutions; and other material selected by librarians as appropriate for a library collection. Even though the search mechanisms may be “Google-like,” the scope of search excludes extraneous and unreliable content.
Given this ideal scope of all the material in the library's collection, the discovery services are working toward increasingly comprehensive coverage. To review the basics, the genre of Web-scale or index-based discovery services relies on access to the materials in order to generate the index. Publishers of electronic resources, for example, would provide a discovery service provider one-time access to the full text of articles or to their corresponding metadata. Once the content is in the index, library patrons who use the discovery service see these items in search results; when selected, an item links to the content provider's server for access. Not all providers are able or willing to provide copies of their content to the creators of discovery services, resulting in some gaps relative to the ideal scope of discovery. Fortunately, these gaps are closing. I observe a fairly steady stream of announcements of partnerships between content providers and developers of discovery services. With many of the general resources already covered, many of the recent announcements involve foreign language or specialized materials. I'm optimistic that gaps in coverage in the discovery services will continue to narrow.
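For readers who want to see the index-based model in miniature, the following is a rough sketch, not a description of how any particular product works. It assumes a simple in-memory inverted index; the class name `DiscoveryIndex`, its methods, and the `provider.example` URLs are all hypothetical. Content is ingested once, searches run against the index alone, and each result carries a link back to the content provider's server.

```python
from collections import defaultdict


class DiscoveryIndex:
    """Toy central index (hypothetical): ingest once, search locally, link out."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> ids of records containing it
        self._records = {}                 # record id -> (title, provider URL)

    def ingest(self, record_id, title, text, provider_url):
        # The provider's text is read once to build the index; only the
        # title and a link back to the provider are kept for display.
        self._records[record_id] = (title, provider_url)
        for term in set(text.lower().split()):
            self._postings[term].add(record_id)

    def search(self, query):
        # Return (title, provider URL) for records containing every query term.
        terms = query.lower().split()
        if not terms:
            return []
        matches = set.intersection(
            *(self._postings.get(term, set()) for term in terms)
        )
        return [self._records[rid] for rid in sorted(matches)]


index = DiscoveryIndex()
index.ingest("a1", "Coral Reef Ecology",
             "coral reef ecology survey of reef fish",
             "https://provider.example/articles/a1")
index.ingest("a2", "Reef Fish Behavior",
             "reef fish behavior",
             "https://provider.example/articles/a2")

results = index.search("reef fish")
```

A provider that never supplies its content simply has no entries in the index, which is the kind of coverage gap described above.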
The other area where the state of the art in library discovery is gaining definition is in relevancy ranking and techniques to guide users toward the most important or interesting items. Commercial search engines have a lot of advantages in this area. The sheer popularity of an item, as gauged by the number of times it is selected in results, together with the number and quality of inbound links and other factors, contributes to sequencing the result candidates that match a user's query terms. These search engines benefit from very sophisticated relevancy algorithms and vast amounts of use data. They also track, or at least infer, lots of data about the person entering the query, such as physical location, search history, and other behaviors within the search environment, and often beyond it. When taken together, all these factors yield almost magical relevancy ranking so that the desired item almost always appears at the top of the results list. But the values behind the search process are commercial, mostly related to optimizing ad revenues.
We expect library discovery services to operate on the basis of an entirely different set of principles or values. Relevancy must be calculated according to such things as scholarly or literary value and alignment with the interests or discipline of the searcher. Some of the techniques of the general search engines may be helpful, but these library-oriented discovery services must also exploit other clues as they guide users to content.
It's important to get beyond sole reliance on matches of the keywords in a user's query. In many cases, such reliance can lead to poor results. From a keyword perspective alone, secondary or derivative materials often seem stronger matches than the related primary work. I often use a simplistic “Harry Potter” search example to observe relevancy. In the absence of use-related factors, the results will usually display the analytical materials and obscure formats ahead of the J.K. Rowling novels. When ranking relies on counting word occurrences, books about Harry Potter will seem disproportionately relevant. Adding consideration of even simple measures such as the number of copies held or circulation frequency would easily identify the items of more likely interest.
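The effect can be shown with a small sketch. Everything in it is an assumption made for illustration: the sample records, the holdings and circulation figures, and the scoring formula (keyword count plus the logarithms of copies held and circulations) are invented and do not represent the ranking used by any discovery product.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    text: str          # searchable text, e.g. title plus description
    copies_held: int   # hypothetical holdings count
    circulations: int  # hypothetical checkout count


def keyword_score(record, query):
    # Count how often each query term occurs in the record's text.
    words = record.text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def blended_score(record, query, use_weight=1.0):
    # Keyword count plus a dampened use signal (an assumed formula).
    # Records with no keyword match stay at zero, however popular.
    matches = keyword_score(record, query)
    if matches == 0:
        return 0.0
    use = math.log1p(record.copies_held) + math.log1p(record.circulations)
    return matches + use_weight * use


def rank(records, query, scorer):
    # Titles ordered from highest to lowest score.
    ordered = sorted(records, key=lambda r: scorer(r, query), reverse=True)
    return [r.title for r in ordered]


records = [
    Record("Critical Perspectives on Harry Potter",
           "critical perspectives on harry potter essays on the "
           "harry potter novels and harry potter fandom", 2, 15),
    Record("Harry Potter and the Sorcerer's Stone",
           "harry potter and the sorcerer's stone novel", 40, 900),
    Record("Gardening Basics", "gardening basics for beginners", 5, 50),
]

by_keywords = rank(records, "Harry Potter", keyword_score)
by_blend = rank(records, "Harry Potter", blended_score)
```

With these invented figures, `by_keywords` puts the critical study first because it repeats the query terms more often, while `by_blend` moves the novel to the top. The logarithm keeps a heavily circulated item from overwhelming the text match entirely, and `use_weight` is a tuning knob with no particular recommended value.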
The current phase of development of library discovery services appears focused on increasing the sophistication of relevancy and tuning the interfaces to deliver better results from library collections. This issue of Smart Libraries Newsletter includes a feature on Summon 2.0, which employs a variety of interesting techniques to improve the quality of its search performance.
We can be sure that the other competitors in Web-scale library resource discovery services are also refining and enhancing their products. The ScholarRank technology developed by Ex Libris as the basis for Primo's relevancy ranking comes to mind. ScholarRank technology makes use of inferences mined from link resolver logs and a variety of other factors to calculate relevancy based on scholarly importance.
Today discovery services exist mostly as an interface that is somewhat separate from the core automation system used in a library. For access to the local physical collections, the online catalog associated with the ILS remains available and is often preferred by librarians or expert users for accessing these materials. Discovery services, lacking some of the nuances and structure of online catalogs, in many cases have not won over these users. I anticipate a time when online catalogs will become extinct. It's imperative that the capabilities of discovery services satisfy all types of users by the time these transitions take place. The pace of development in this area gives me some encouragement that the state of the art of discovery services will ultimately meet the expectations of librarians and the full spectrum of library users.