Discoverability: the Ultimate Goal

Copyright (c) 2014 Association for Library Collections and Technical Services, American Library Association

Abstract: Libraries have a long history of employing technology to make their collections available to their users. A succession of automation system components, interfaces, and search and retrieval technologies have been employed by libraries as ever more powerful means to bring their collections to library users. This progression, both in terms of finding materials and in gaining access to them, has continually expanded in the scope of content addressed and in better ease of use. Each of these advances in patron interfaces for discovering library collections has benefited from formal or informal standards or best practices that bring agreement and alignment among diverse technological and operational practices. The problem of providing access to library content, when distributed among different organizations employing different computer systems, data formats, and communications protocols demands finding common ground. Groups, such as NISO, that facilitate the development of standards, protocols, and recommended practices have been essential in each phase in the history of library discovery and delivery.


Introduction

Libraries have a long history of employing technology to make their collections available to their users. A succession of automation system components, interfaces, and search and retrieval technologies have been employed by libraries as ever more powerful means to bring their collections to library users. This progression, both in terms of finding materials and in gaining access to them, has continually expanded in the scope of content addressed and in better ease of use. Each of these advances in patron interfaces for discovering library collections has benefited from formal or informal standards or best practices that bring agreement and alignment among diverse technological and operational practices. The problem of providing access to library content, when distributed among different organizations employing different computer systems, data formats, and communications protocols demands finding common ground. Groups, such as NISO, that facilitate the development of standards, protocols, and recommended practices have been essential in each phase in the history of library discovery and delivery.

Discovery represents the critical functionality of enabling users to know what material is available to them through their library. But with discovery also comes the responsibility for fulfillment. It is, therefore, essential that any discovery operation also include an effective set of services that makes the items available to the user, either through facilitating the delivery of a physical resource, by linking to or direct presentation of electronic material, or other appropriate actions.

This chapter will review each of the major generations of tools that libraries have offered through the history of automated systems, highlighting the standards, protocols, or recommended practices that contributed to their effectiveness or efficiency. The discussion begins with the online catalogs associated with the integrated library system, which once addressed library collections fairly comprehensively, but have since become somewhat relegated to specialized interfaces due to the proliferation of digital content. Union catalogs and consortial borrowing systems foster collaboration among groups of libraries, offering a broader universe of discovery along with the challenge of employing effective technologies for patron requests and fulfillment. Metasearch technologies became a pragmatic tool to offer patrons a simplified way to access the growing number of electronic resources offered by libraries, though with some significant limitations. A genre of discovery interfaces improved the functionality of the ILS-based online catalog, often with the capability to expand the scope of search to article collections via metasearch. In the latest and current phase, index-based discovery services have emerged with an ambitious intent to provide access to all the components of library collections, including electronic and print materials, through a centralized index. Each of these phases of discovery has built upon the technologies of its predecessors, benefited from the cumulative body of standards efforts, and prompted new initiatives to address issues inherent in each progressive approach.

Online Catalogs

Functional context

The online catalog module of the integrated library system (ILS) provides public access to the content it manages and corresponding services. The ILS was originally designed to manage a reasonably complete representation of library collections, emerging at a time when collections were composed entirely of physical materials. For print materials, the ILS provides a very detailed set of tools for acquisition, description, management, and fulfillment.

As library collections came to include increasing proportions of digital content, the ILS mostly retained its focus on print materials, with other applications developed to handle the management and access of non-print formats.

The functionality of the online public access catalog, or OPAC in common library parlance, varies among the different products, though with a large set of features generally available across all. Expected features include:

  • Keyword search, with the ability to specify a general search across all fields or to select that the terms be limited by categories, such as author, title, subject, date, publisher, library location, or other designators.
  • Boolean search capabilities, allowing users to create advanced queries that include or exclude combinations of terms and fields to identify a precise set of results.
  • Collection browsing according to structured indexes, usually corresponding to standard authority files, including names and subjects. Many systems also include the ability to browse by call numbers or other structured fields.
  • Viewing lists of records, with selected information displaying in the brief listing and, when the patron selects a record, a more complete view of the metadata along with appropriate request features.
  • Patron self-service capabilities, including the ability to view or modify personal profile details such as postal address, e-mail address, and phone number, to set preferences, to view lists of materials currently charged, to request renewals of items, and to place holds on items of interest currently charged to other patrons.

The OPAC interface has evolved through the different generations of integrated library systems. The early systems based on mainframes or minicomputers offered text-based terminal interfaces operated through cryptic commands or menus. With the advent of client/server computing, online catalogs took the form of graphical user interfaces based on environments such as IBM OS/2, Microsoft Windows, or the Apple Macintosh. Today, all online catalogs rely on interfaces that operate through Web browsers. But though the style of presentation has evolved through interface technologies, the conceptual approach and scope have remained relatively constant.

Supporting standards

As dedicated modules of proprietary applications, online catalogs generally operate outside of the realm of standards. They do not interoperate with external or third-party systems, but make use of any pragmatic programming techniques needed to gain access to the databases and functionality of their associated ILS. Online catalogs are generally tightly bound to the ILS of a given vendor and are not expected to operate with competing products. In some cases, a vendor with multiple ILS products may develop a common online catalog interface, again using proprietary programming interfaces rather than standard protocols. We will see below that the genre of discovery interfaces is designed to operate across ILS products of different vendors using standard protocols or connectivity layers.

While online catalog connectivity remains in the realm of proprietary programming, online catalogs indirectly rely on standards implemented within the underlying ILS. These standards would include:

  • MARC information exchange format (ANSI/NISO Z39.2 and ISO 2709)
  • MARC 21 Format for Bibliographic Data
  • MARC 21 Format for Authority Data
  • MARC 21 Format for Holdings Data
  • Anglo-American Cataloguing Rules, Second Edition (AACR2)
  • Functional Requirements for Bibliographic Records (FRBR)
  • Resource Description and Access (RDA)
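The MARC information exchange format named in the first item above can be illustrated with a short sketch. The Python below is a minimal illustration of the ISO 2709 record layout (a 24-character leader, a directory of 12-character entries, and variable fields closed by terminator bytes), not a production parser. The function names and the sample record content are invented for illustration; a real system would normally rely on an established MARC library.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"
LEADER_LENGTH = 24
ENTRY_LENGTH = 12  # 3-character tag + 4-digit field length + 5-digit start offset


def build_record(fields):
    """Serialize (tag, data) pairs into a minimal ISO 2709 record (illustrative)."""
    directory = b""
    body = b""
    for tag, data in fields:
        field = data + FIELD_TERMINATOR
        directory += f"{tag}{len(field):04d}{len(body):05d}".encode("ascii")
        body += field
    directory += FIELD_TERMINATOR
    base_address = LEADER_LENGTH + len(directory)
    record_length = base_address + len(body) + len(RECORD_TERMINATOR)
    # Leader: record length (00-04), status and type codes (05-09),
    # indicator and subfield code counts (10-11), base address of data (12-16),
    # then typical MARC 21 values for the remaining positions (17-23).
    leader = f"{record_length:05d}nam a22{base_address:05d} a 4500".encode("ascii")
    return leader + directory + body + RECORD_TERMINATOR


def parse_record(raw):
    """Use the leader and directory to slice each field out of the record."""
    leader = raw[:LEADER_LENGTH].decode("ascii")
    base_address = int(leader[12:17])
    directory = raw[LEADER_LENGTH:base_address - 1]  # drop the field terminator
    fields = []
    for i in range(0, len(directory), ENTRY_LENGTH):
        entry = directory[i:i + ENTRY_LENGTH].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        begin = base_address + start
        fields.append((tag, raw[begin:begin + length - 1]))  # omit the terminator
    return fields


def split_subfields(data):
    """Split a data field into its two indicators and (code, value) pairs."""
    indicators = data[:2].decode("utf-8")
    parts = data[2:].split(SUBFIELD_DELIMITER)[1:]
    return indicators, [(p[:1].decode("utf-8"), p[1:].decode("utf-8")) for p in parts]


# Invented sample record: a control number and a title statement.
sample = build_record([
    ("001", b"demo0001"),
    ("245", b"10" + SUBFIELD_DELIMITER + b"aDiscoverability :"
            + SUBFIELD_DELIMITER + b"bthe ultimate goal"),
])
parsed = parse_record(sample)
title_indicators, title_subfields = split_subfields(parsed[1][1])
```

Because the directory records the length and offset of every field, a receiving system can locate any field without scanning the whole record, which is part of what makes the format practical for exchanging records among different systems.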

Other library-specific protocols in play might include Z39.50, used to retrieve MARC records from bibliographic utilities or from other libraries. Most online catalogs do not use Z39.50 as the mechanism for communicating with their underlying ILS, but rather tend to use proprietary mechanisms. Nonetheless, the existence of the Z39.50 standard allows disparate ILS systems to respond to queries from each other.

The ILS may optionally make use of general database access syntax such as Structured Query Language (SQL) or database connectivity layers such as Open Database Connectivity (ODBC).

The presentation of the browser-based online catalog interface will follow World Wide Web Consortium (W3C) standards and the underlying Internet protocols, such as:

  • Hypertext Markup Language (HTML)
  • Cascading Style Sheets (CSS)
  • Hypertext Transfer Protocol (HTTP) for clear-text transport of pages across the local network or the Internet
  • Secure Sockets Layer (SSL) for the secure and encrypted delivery of webpages with sensitive content
  • Transmission Control Protocol/Internet Protocol (TCP/IP), v4 or v6, for the low-level transport of data

Online catalogs, as well as any other patron interface product, may be required to adhere to standards or recommended practices that make them accessible to persons with disabilities, such as Section 508 of the Rehabilitation Act.

Authentication issues

To the extent possible, libraries prefer to avoid making patrons log in multiple times to gain access to the resources and services to which they are entitled. In an academic library environment, for example, libraries may want to take advantage of the authentication service provided as part of the campus infrastructure, allowing a single set of credentials to provide access to electronic mail, courseware or learning management systems, student accounts, and library-provided services. Such a unified login capability requires that the ILS and other library applications be able to rely on an external authentication service rather than credentials, typically a username and PIN, stored in a patron record.

Some of the common protocols used for external authentication include:

  • Lightweight Directory Access Protocol (LDAP)
  • Active Directory, a proprietary set of authentication and network services from Microsoft
  • Central Authentication Service (CAS)
  • Shibboleth, a federated identity protocol used especially in scenarios where the authentication might cross organizational boundaries

In some cases, the library system may serve as the authoritative authentication service. In scenarios such as direct consortial borrowing, automated interlibrary loan brokering systems, or document delivery, external applications may depend on the ILS to validate the credentials of a user. Protocols commonly used to support ILS-based authentication include the Standard Interchange Protocol (SIP) and the NISO Circulation Interchange Protocol (NCIP). SIP was originally developed by 3M Corporation in support of self-service and related products, and has been adopted widely throughout this industry sector. In 2012, 3M transferred ongoing responsibility for SIP to NISO, and as of this writing the protocol is in the process of formal consensus standardization. NCIP defines a set of transactions related to communication with the circulation module of an ILS for resource sharing, self-service, and other use cases.

Implementation Examples

Every ILS available will include one or more online catalog modules, often given their own marketing name. Examples of the major US and international vendors and their respective online catalog offerings include:

  • SirsiDynix® online catalog products include the e-Library™ online catalog for Symphony® and iPAC for Horizon™. Enterprise® is a discovery interface now offered for either ILS.
  • The Library Corporation (TLC) offers the LS2 PAC that operates with both its Library•Solution® and Carl•X™ ILS products.
  • Ex Libris® offers the WebVoyage online catalog for its Voyager® ILS and the Aleph Web OPAC for Aleph®.
  • Innovative Interfaces, Inc. offers WebPAC Pro as the online catalog module for its Millennium™ ILS. The company has also developed Encore™ as a discovery interface.
  • Polaris Library Systems offers PowerPAC™ as the online catalog for its Polaris ILS.
  • VTLS offers online catalog products for its Virtua ILS, including the original Chameleon, and the Chamo discovery interface.
  • The open source Evergreen ILS™ includes the TPAC dedicated online catalog.
  • The open source Koha® ILS includes a dedicated online catalog.

It should also be noted that some automation products, especially those in the new Library Services Platform category, do not include traditional online catalog products but are designed to operate with a discovery service for patron interaction.

The technical structure of the ILS relies on functional modules that use proprietary programming to communicate with shared databases. The proprietary connections among modules result in the ILS as a monolithic platform: modules cannot be mixed and matched among competing vendor offerings. It is not possible, for example, to follow a “best-of-breed” approach, assembling an ILS based on the modules considered to have the best functionality among the competing products. Standard protocols have not been developed that govern the communication among ILS modules.

While online catalogs provide the starting point for the development of computerized library discovery, they generally remain in the realm of proprietary interactions with their associated ILS products, without the need for standards in the way that they communicate with their internal architecture. But as libraries have sought broader approaches to the discovery of library collections, both across different organizations and institutions and for deeper access into material types not directly managed within the ILS, there has been an increasing need to move away from proprietary interactions to standards independent of any given product's internal architecture.

Union Catalogs and Consortial Borrowing Systems

Functional context

The ILS has traditionally been implemented to provide automation and discovery for the collection of a single library organization, including those that include multiple facilities, consortia of multiple independent library organizations, or regional systems that serve the libraries within a given geographic area or governmental entity. Whether the ILS serves one library or many, its online catalog remains directly tied to the content it manages.

In order to support resource sharing activities among groups of libraries, each of which may already be automated with its own ILS, union catalogs provide a wider universe of discovery, usually equipped with additional functionality for managing requests and routing materials. A union catalog includes the bibliographic records and holdings information from each of the ILS implementations of the participating institutions. There may be some scenarios where libraries designate specific portions of their collections as not available for sharing with partner institutions, which may likewise be excluded from union catalogs.

Union catalogs provide the discovery portion of a resource sharing environment. Once a patron discovers an item of interest, other components will manage the request and support its fulfillment. Some resource sharing environments might depend on library personnel to manage the fulfillment of the request. A webform presented to the user would generate a message to library personnel at the respective libraries, who would then manually process the fulfillment of the request. In a high-volume environment, a manually operated resource sharing arrangement would be difficult to sustain.

In an unmediated request arrangement, the resource sharing transaction can be entirely automated, eliminating the need for staff to be involved in most aspects of the fulfillment process. An automated unmediated consortial borrowing environment might perform a series of actions something like the following sequence:

  • Authenticate the user to validate eligibility to place the request.
  • Present a request form that allows the user to make any relevant selections about the request, such as the preferred pick-up location.
  • Initiate a hold transaction to the circulation module of the ILS of the library that owns the item.
  • Generate any needed messages to library personnel.
  • Library personnel at the owning location would pull the item and route it to the library associated with the patron making the request.
  • Patron will be notified the item is available.
  • Item is charged to the patron and picked up.
  • Standard circulation features and notices take effect.
  • Patron returns the item.
  • Item is routed back to the original owning library.

In this direct consortial borrowing workflow, library personnel are only involved in the essential material-handling tasks such as pulling requested items from the shelves, routing to the receiving library, routing returned items, and then re-shelving them.
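The sequence above can be sketched as a simple chain of states. The Python below is only an illustrative model: the state names, the transition table, and the marking of which steps need staff are an interpretation of the workflow described here, not the behavior of any particular resource sharing product.

```python
from enum import Enum, auto


class State(Enum):
    AUTHENTICATED = auto()
    REQUEST_SUBMITTED = auto()
    HOLD_PLACED = auto()
    IN_TRANSIT_TO_PICKUP = auto()
    READY_FOR_PICKUP = auto()
    CHARGED_TO_PATRON = auto()
    RETURNED = auto()
    IN_TRANSIT_TO_OWNER = auto()
    RESHELVED = auto()


# Each state maps to (next state, whether staff must physically handle the item).
TRANSITIONS = {
    State.AUTHENTICATED: (State.REQUEST_SUBMITTED, False),
    State.REQUEST_SUBMITTED: (State.HOLD_PLACED, False),
    State.HOLD_PLACED: (State.IN_TRANSIT_TO_PICKUP, True),    # pull and route the item
    State.IN_TRANSIT_TO_PICKUP: (State.READY_FOR_PICKUP, False),
    State.READY_FOR_PICKUP: (State.CHARGED_TO_PATRON, False),
    State.CHARGED_TO_PATRON: (State.RETURNED, False),
    State.RETURNED: (State.IN_TRANSIT_TO_OWNER, True),        # route the item back
    State.IN_TRANSIT_TO_OWNER: (State.RESHELVED, True),       # re-shelve the item
}


def run_request(start=State.AUTHENTICATED):
    """Walk one request to completion; return (states visited, staff-handled steps)."""
    state = start
    visited = [state]
    staff_steps = []
    while state in TRANSITIONS:
        next_state, needs_staff = TRANSITIONS[state]
        if needs_staff:
            staff_steps.append(next_state)
        state = next_state
        visited.append(state)
    return visited, staff_steps


visited, staff_steps = run_request()
```

In this model only three of the eight transitions require staff, which mirrors the point that an unmediated arrangement confines personnel to material handling.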

At least two different strategies have been widely used for creating union catalogs: one approach creates a centralized combined bibliographic database, and another uses federated search technology to form a virtual combined catalog.

Physical Union Catalogs

One model for creating a union catalog relies on a centralized database populated from each of the individual ILS implementations of the participating organizations. Each of these systems would export all of its records for the initial load, and then would provide incremental updates of added, removed, or modified records on a routine schedule, or have a mechanism for synchronizing changes made in real time.

A centralized database populated with records from each of the constituent ILS implementations can provide very fast performance and can scale to handle very large numbers of records. The disadvantage lies in the complexity and overhead of its maintenance. Records are inherently duplicated between the bibliographic database of each ILS and that of the union catalog platform.
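The load-and-update cycle of a physical union catalog can be sketched in a few lines. The Python below is a hedged illustration only: the in-memory dictionary, the match key (written here as an ISBN-style string), the library codes, and the action names are all assumptions made for the example, and real union catalogs use far more elaborate record matching.

```python
def apply_updates(union, library, updates):
    """Apply one library's batch of changes to the union catalog.

    union   -- dict mapping a match key to {"title": str, "holdings": set}
    library -- code of the contributing library (hypothetical)
    updates -- iterable of (action, key, title); action is add, modify, or remove
    """
    for action, key, title in updates:
        if action in ("add", "modify"):
            entry = union.setdefault(key, {"title": title, "holdings": set()})
            entry["title"] = title
            entry["holdings"].add(library)
        elif action == "remove":
            entry = union.get(key)
            if entry is not None:
                entry["holdings"].discard(library)
                if not entry["holdings"]:
                    del union[key]  # no participating library holds it any longer
        else:
            raise ValueError(f"unknown action: {action}")
    return union


union_catalog = {}
# Initial loads from two hypothetical libraries.
apply_updates(union_catalog, "LIB_A", [("add", "isbn:111", "Title One"),
                                       ("add", "isbn:222", "Title Two")])
apply_updates(union_catalog, "LIB_B", [("add", "isbn:111", "Title One")])
# A later incremental update: LIB_A withdraws its copy of the first title.
apply_updates(union_catalog, "LIB_A", [("remove", "isbn:111", None)])
```

Every change in a local ILS has to be replayed against the central database in this way, which is the maintenance overhead noted above.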

Virtual Union Catalogs

An alternative approach involves not creating a physical consolidated catalog, but rather providing union catalog functionality through a simultaneous search of each of the participating ILS instances. This method avoids the overhead in maintaining a centralized bibliographic database and keeping it synchronized with each ILS. Since each ILS is searched in real time, the results always reflect current holdings. The main disadvantage of this approach lies in performance and scalability. Casting a live search among target systems becomes increasingly unwieldy as the number increases.
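A rough sketch of the simultaneous search follows. The targets here are simulated in-process functions with artificial delays, standing in for live queries to each ILS; the names, sample titles, and timeout value are assumptions for illustration only.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def make_target(name, titles, delay):
    """Return a simulated ILS search function that answers after `delay` seconds."""
    def search(term):
        time.sleep(delay)
        return [(name, t) for t in titles if term.lower() in t.lower()]
    return search


def federated_search(targets, term, timeout):
    """Query every target in parallel; keep only the answers that arrive in time."""
    pool = ThreadPoolExecutor(max_workers=len(targets))
    futures = {pool.submit(fn, term): name for name, fn in targets.items()}
    done, not_done = wait(list(futures), timeout=timeout)
    pool.shutdown(wait=False)  # do not hold the user's response for stragglers
    hits = []
    for future in done:
        hits.extend(future.result())
    timed_out = sorted(futures[f] for f in not_done)
    return sorted(hits), timed_out


targets = {
    "library_a": make_target("library_a", ["Library Discovery", "Cataloging Basics"], 0.0),
    "library_b": make_target("library_b", ["Discovery Services"], 0.0),
    "library_c": make_target("library_c", ["Discovery at Scale"], 0.5),  # slow target
}
hits, timed_out = federated_search(targets, "discovery", timeout=0.2)
```

The response can be no faster than the slowest target the search is willing to wait for, and any target that misses the timeout silently drops out of the results. Both effects grow with the number of participating systems.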

Supporting standards

Virtual union catalogs require a standard search and retrieval protocol for presenting a query and retrieving records across a network of interconnected ILS systems. Having to support simultaneous search using a variety of proprietary techniques native to each ILS system would be incredibly complex and would likely not yield very consistent results.

  • Z39.50 was developed as a standard search and retrieval protocol for MARC-based bibliographic systems, though as we will see below, it can also be used with other metadata structures.
  • NISO Circulation Interchange Protocol, or NCIP, provides support for the request and fulfillment transactions. NCIP is designed to automate the interaction with the circulation system of an ILS, with a variety of transactions defined relating to patron and item records.
  • ISO 10160 / ISO 10161, commonly known as ISO ILL, can be employed to route requests that cannot be satisfied by the institutions participating in the consortial borrowing group to an external interlibrary loan system.

These standards—developed in the context of bibliographic data exchange, consolidated library catalogs, and resource sharing networks—have allowed library automation systems created by different vendors to effectively interoperate for discovery and resource fulfillment.

Implementation Examples

The genre of virtual catalog and direct consortial borrowing systems has narrowed over the years as major products have been withdrawn. Some of the current products in this area include the following:

  • Index Data has created the Pazpar2 metasearch middleware and the MasterKey discovery platform which can be used to create a virtual catalog or a local index of harvested metadata.
  • Relais International offers the Relais D2D (Discovery to Delivery) platform that uses Index Data's MasterKey components for discovery and Relais' own request management system for unmediated consortial borrowing or interlibrary loan.
  • Innovative Interfaces, Inc. developed INN-Reach to provide centralized search and unmediated consortial borrowing capabilities. INN-Reach was originally developed to connect together a network of Millennium systems but now supports other ILS products through NCIP transactions.
  • Auto-Graphics, Inc. offers the SHAREit™ resource sharing environment, which can be deployed for configurations using either physical union catalogs or virtual catalogs.
  • OCLC® offers the WorldCat Navigator™, which is based on the ZPORTAL™ and VDX™ (Virtual Document eXchange) originally developed by a company called Fretwell-Downing Informatics that it acquired in 2005.

Metasearch: Portals to Aggregate Library Resources

Functional Context

For the last decade or so, libraries have been making major investments in subscriptions to electronic resources, including aggregated databases of articles and e-journal collections. In the early phase, when the numbers of these resources were modest, libraries might present lists of the major packages to which they subscribed on their website, and would create different finding aids to help guide researchers to the resources that might contain articles or other information related to the topic of interest. As these resources proliferated, the research process became quite complex, with library users needing to navigate through dozens, if not hundreds, of packages. These resources each come with their own search interfaces, which often work quite differently from one to another.

At the same time, search engines on the Web, especially Google, created a general expectation from a user's perspective, that a single search box should be able to provide access to a vast body of information. The general web search engines are easy to use, requiring no training or instruction, and present results using relevancy rankings that generally succeed in presenting the items of most interest at the top of the result list. Libraries naturally sought to offer their users the same power and ease of use for their collections of resources.

Following the same model established for virtual union catalogs and other consolidated bibliographic search services, a model of metasearch was developed to provide a simplified way for library patrons to search multiple electronic resources at once. These metasearch platforms accept a search query from a user, cast it to the selected resource targets and then receive, collate, sort, and present the results. This approach requires real-time communications with the resource targets, with each parallel session transmitting the query and receiving the response. Only a limited number of these parallel communications streams can be maintained simultaneously, with each subject to the relative performance of the network and target servers.

For libraries with very large numbers of total information resources, and with metasearch technology practically limited to a handful of simultaneous connections, these platforms would often require the creation of groups of targets based on subject categories or would use a default set of general interest databases.

One of the key challenges in this method lies in the communications protocol used between the metasearch platform and the information targets. In the early days of metasearch, few of the information resources to which libraries subscribed offered a mechanism designed for computer-to-computer communication. The metasearch platforms would therefore need to mimic a session as if a person were accessing the target service via a web browser. This process of session emulation would send queries as if sent via the native interface and capture the webpages issued in response. The metasearch engine would then need to parse these webpages, which were designed for humans to view, into records by reverse engineering the HTML coding of each page delivered. While it is possible to predict the structure of these pages, even the slightest changes made by the information provider would cause the parsing algorithms to fail and require reprogramming.

Part of the metasearch framework included creating modules, usually called connectors, programmed with all the data needed to communicate with each of the resource targets. Each connector would describe any protocols supported by the target and the rules needed to parse the server responses into discrete records. For those that use session emulation, connectors would be programmed with the markers needed to identify records associated with the response and each field within the record. Connectors have to be defined for each of the potential targets in a metasearch environment and maintained as needed to accommodate changes affecting the session parsing algorithms. The development and maintenance of connectors became a specialized service provided to metasearch platform implementers. While connectors eventually were implemented in code packages that would work with metasearch engines from different vendors, no formal standards emerged in this arena.
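The fragility of session emulation is easy to demonstrate. The sketch below models a connector as a set of regular-expression markers for one target; the HTML structure, class names, and sample records are invented for illustration and do not correspond to any actual provider or connector format.

```python
import re

# Hypothetical connector: markers that locate each record and each field
# within the HTML that one target returns to a browser.
CONNECTOR = {
    "record": re.compile(r'<div class="hit">(.*?)</div>', re.S),
    "fields": {
        "title": re.compile(r'<span class="ti">(.*?)</span>'),
        "author": re.compile(r'<span class="au">(.*?)</span>'),
    },
}


def parse_results(html, connector):
    """Turn a human-oriented result page into a list of record dictionaries."""
    records = []
    for block in connector["record"].findall(html):
        record = {}
        for name, pattern in connector["fields"].items():
            match = pattern.search(block)
            if match:
                record[name] = match.group(1).strip()
        records.append(record)
    return records


PAGE = (
    '<div class="hit"><span class="ti">Library Discovery</span>'
    '<span class="au">A. Author</span></div>'
    '<div class="hit"><span class="ti">Metasearch Basics</span>'
    '<span class="au">B. Writer</span></div>'
)
records = parse_results(PAGE, CONNECTOR)

# A cosmetic redesign by the provider renames one class; the same connector
# now finds nothing until someone reprograms its markers.
REDESIGNED = PAGE.replace('class="hit"', 'class="result"')
records_after_redesign = parse_results(REDESIGNED, CONNECTOR)
```

The parser raises no error after the redesign; it simply returns no records, which is why connectors needed ongoing monitoring and maintenance.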

Metasearch also introduced new demands on the servers of information providers. In addition to the sessions conducted directly by users, the servers also had to handle the additional set of sessions conducted via metasearch platforms. For services that might be commonly configured within a general search group, the servers may be asked to respond to almost every search conducted through that library's metasearch environment. Out of the many searches routed to information providers' servers via metasearch engines, only a small portion result in content selections by users.

As metasearch products became more widely implemented in libraries, it became desirable to create more efficient and less fragile ways for them to communicate with resource targets. Instead of the session emulation technique, many information providers began to offer access to their content via more efficient protocols. Some were able to establish responders using existing search and retrieval protocols such as Z39.50 or SRU (Search/Retrieve via URL), which provides the functionality of Z39.50 using modern web-oriented protocols such as SOAP (Simple Object Access Protocol) or REST (Representational State Transfer). In addition to these established protocols, some providers offered specialized responders, often called XML gateways, designed to accommodate requests from metasearch platforms.
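As an illustration of how a structured protocol replaces page parsing, the sketch below builds an SRU searchRetrieve request as a plain URL. The base address is a placeholder, and the index name in the query and the record schema are assumptions; what a given responder actually supports would be advertised by that server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sru_search_url(base_url, cql_query, start_record=1, maximum_records=10,
                   record_schema="marcxml", version="1.2"):
    """Build an SRU searchRetrieve URL carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,  # assumes the server offers this schema
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and an example CQL query.
url = sru_search_url("https://sru.example.org/catalog",
                     'dc.title = "library discovery"')
decoded = parse_qs(urlsplit(url).query)
```

The response to such a request is an XML document with explicitly delimited records, so the client does not depend on the visual layout of a provider's webpages.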

NISO Metasearch Initiative

Metasearch provided a pragmatic approach to providing library users access to content from a variety of providers. Yet, no standard protocols were available to support the technical interactions between metasearch platforms and resource targets. The expanding use of metasearch brought considerable implications for the range of organizations involved, including metasearch platform developers and information resource providers, as well as the libraries that implement these products.

To address a diverse set of issues that arose out of this approach, the NISO Metasearch Initiative was launched, beginning in May of 2003. The initiative included a variety of activities including a strategy meeting (May 2003), a workshop (October 2003), and the establishment of workgroups on Access Management, Collection and Service Descriptions, and Search/Retrieve. A survey was conducted by the Search and Retrieve task group to gather information from content providers regarding awareness, policies, and technical details related to support for metasearch.

The NISO Metasearch Initiative did not attempt to create one all-encompassing standard, but rather worked to define standards and recommended practices in multiple areas of interest. Standards developed included Collection Description Specification (NISO Z39.91-200x) and Information Retrieval Service Description Specification (Z39.92-200x). Recommended practices issued include those for authentication and access methods (NISO RP-2005-01), Results Set Metadata (NISO RP-2005-02), Citation Level Data Elements (NISO RP-2005-03), and the NISO Metasearch XML Gateway Implementers Guide (NISO RP-2006-02).

Implementation Examples

Metasearch tools oriented to libraries began to emerge in about the mid-1990s, with the popularity of the products taking off around 2001.

  • Ex Libris developed a metasearch product named MetaLib in July 2000.
  • Deep Web Technologies™ offers the Explorit™ federated search platform. Its technology is used to power portals with specialized content, such as Science.gov and the xSearch service offered by Stanford University Libraries.
  • Serials Solutions® introduced its metasearch platform in January 2005, originally named Central Search and reengineered and rebranded as 360 Search in December 2009.
  • Auto-Graphics offers the AGent Search platform, which has been available since about 2001 and has been implemented for large-scale projects such as access to statewide databases in Connecticut.
  • EBSCO Information Services offers a federated search capability called EBSCOhost Integrated Search that can be used to extend EBSCOhost® and EBSCO Discovery Service with results from additional resources not otherwise available on those platforms.
  • MuseGlobal developed its MuseSearch technology as early as 1993. The company did not sell directly to libraries, but rather licensed its technology to ILS vendors and other technology companies interested in offering metasearch platforms. MuseGlobal developed expertise and capacity for the creation of connectors for the large number of information resources of interest to libraries and also exploited its technologies in other industries outside the library arena.

A few federated search products have been discontinued, largely through business acquisitions.

  • WebFeat developed metasearch technology that it sold both directly to libraries and consortia as well as through licensing arrangements with ILS vendors. WebFeat was acquired by ProQuest and merged into Serials Solutions in April 2008. Serials Solutions has since discontinued the WebFeat platform, incorporating some of its technologies and migrating its customers to its own 360 Search metasearch product.
  • Endeavor Information Systems developed a federated search platform called ENCompass for Resource Access. This product was discontinued in 2006 following the company's merger with Ex Libris.

Products based on metasearch technologies continue to see some use today, although interest in this approach is waning as index-based search gains steam as the preferred approach for broad-based discovery of library resources. Metasearch continues to be well regarded in some areas, especially those involving specialized information with a limited number of potential resource targets.

Next-Generation Online Catalogs

Functional Context

Another thread of development resulted in the creation of a genre of what have been called discovery interfaces or next-generation library catalogs. These products break away from the model of the online catalog as a module of an ILS. They address a broader range of locally-managed content based on an index of harvested metadata, often supplemented with metasearch technology to include external subscription-based electronic resources.

These discovery interfaces generally include features such as faceted navigation, more visually appealing interfaces than the traditional catalogs (often dressed up with cover art images), and enhanced metadata such as tables of contents, summaries, or reviews. Most are based on modern search and retrieval technology, such as Apache SOLR™, that enables rapid, relevancy-based searching, the use of facets to narrow search results, and other features not necessarily available in ILS online catalogs. The index of a discovery interface would include the MARC records from the library's ILS, as well as metadata from other sources such as institutional repositories, digital collections, or other content resources managed by the library.
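The faceted navigation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular discovery product or search engine implements it; the sample records, field names, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical records as they might sit in a discovery index.
RECORDS = [
    {"title": "Cataloging Basics", "format": "Book", "language": "English"},
    {"title": "Metadata Quarterly", "format": "Journal", "language": "English"},
    {"title": "Bibliotheken heute", "format": "Book", "language": "German"},
    {"title": "Local History Photographs", "format": "Image", "language": "English"},
    {"title": "Search Interfaces", "format": "Book", "language": "English"},
]


def facet_counts(records, field):
    """Count how many records in a result set carry each value of a field."""
    counts = Counter()
    for record in records:
        value = record.get(field)
        if value is not None:
            counts[value] += 1
    # Most frequent values first, as facet lists are usually displayed.
    return counts.most_common()


def apply_facet(records, field, value):
    """Narrow a result set to the records matching one selected facet value."""
    return [record for record in records if record.get(field) == value]


if __name__ == "__main__":
    print(facet_counts(RECORDS, "format"))
    books = apply_facet(RECORDS, "format", "Book")
    print(facet_counts(books, "language"))
```

Selecting a facet simply filters the current result set, and the counts are then recomputed on what remains, which is what lets a patron narrow a search step by step.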

While addressing a broader range of content, the discovery interfaces must also offer many of the same features to patrons as the online catalog module of the ILS. These features would include the ability to view the current status of an item (such as whether it should be on the shelf or is checked out to another patron), place holds on materials, view and manage a user's account details, list items currently charged, and request renewals. In general all the item status and patron self-service features should be available.

ILS-DI: Integrated Library System – Discovery Interface Specifications

These discovery interfaces are usually intended to work with any ILS, so they must rely on standard protocols or APIs, yet be able to provide functionality that online catalogs accomplish through proprietary programming. Connecting discovery interfaces to integrated library systems pressed the need for a generalized model of data exchange and real-time system interactions.

The Digital Library Federation, which has since been subsumed into the Council on Library and Information Resources (CLIR), charged a task group that began work in August 2007 to propose specifications related to the interoperability between discovery interfaces and integrated library systems. The ILS-DI task group published a Technical Recommendation in December 2008 that describes “an API for the effective interoperation between integrated library systems and external discovery applications.” The API included multiple levels of interoperability, beginning with Basic Discovery Interfaces that provide a new interface for search and retrieval, but rely on invoking the existing online catalog of the ILS for availability and patron features. Each increasing level brings in more functionality from the online catalog to the discovery interface, up to Level 4 which provides a robust set of functionality without the need for users to interact with the native online catalog of the ILS.

Examples of functions described in the API include:

  • HarvestBibliographicRecords
  • HarvestExpandedRecords
  • GetAvailability
  • GoToBibliographicRequestPage
  • LookupPatron
  • AuthenticatePatron
  • GetPatronStatus
  • RenewLoan

The ILS-DI specification made use of existing standards or protocols, when applicable, as the “bindings” that could be used to implement a given function. The HarvestBibliographicRecords function, needed to harvest and synchronize records from the ILS into the index of the discovery interface, could, for example, use OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). In other cases, such as GetAvailability, a RESTful web service was defined as the recommended binding rather than NCIP.
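As a rough illustration of the OAI-PMH binding mentioned above, the following Python sketch builds a ListRecords request for incremental harvesting and reads the resumption token from a response. The verb and argument names (metadataPrefix, from, resumptionToken) and the XML namespace come from the OAI-PMH protocol; the base URL, the function names, and the sample response are assumptions made for illustration, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical, heavily trimmed response; a real one also carries the records.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>batch-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    if resumption_token:
        # A resumptionToken is sent alone; it replaces the other arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # Only records added or changed since this date are returned.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def extract_resumption_token(response_xml):
    """Return the token for the next batch, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


if __name__ == "__main__":
    # https://ils.example.org/oai is a placeholder, not a real endpoint.
    print(build_list_records_url("https://ils.example.org/oai", from_date="2013-01-01"))
    print(extract_resumption_token(SAMPLE_RESPONSE))
```

A harvester would repeat the request with each returned token until none remains, then remember the harvest date so the next run can ask only for changes.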

In the five years since the ILS-DI specification was issued, the kind of functionality it describes, even at the highest level, is routinely implemented in discovery interfaces. Following the April 2009 merger of the Digital Library Federation into CLIR, the ILS-DI specification has not been updated nor is there any specific organization formally charged with its maintenance. The ILS-DI today serves more as a reference model for the functionality expected in a discovery interface and the interoperability needed with the ILS. The mechanisms for the interoperability used today may differ.

Implementation Examples

A variety of products have been created that fall within the genre of discovery interfaces, including both proprietary and open source options.

  • Medialab solutions, based in Amsterdam, The Netherlands, developed AquaBrowser Library as a discovery interface designed to operate with any ILS. One of its prominent features is a word cloud of concepts reflected in a search result to explore related terms. AquaBrowser® was acquired by Bowker in 2007 and was subsequently shifted within ProQuest to its Serials Solutions division.
  • Ex Libris launched Primo® as its discovery service in 2006, designed for academic and research libraries. Primo was extended with a discovery index, Primo Central, in 2009.
  • Innovative Interfaces, Inc. offers Encore as its discovery interface that features a single search box. Encore was enhanced with the ability to include access to article databases in 2010 through a set of web service APIs. In 2013 Innovative struck a deal with EBSCO to integrate the EBSCO Discovery Index into Encore.
  • BiblioCommons offers a discovery interface (hybrid with index-based discovery) geared toward public libraries, incorporating a variety of social networking characteristics.
  • SirsiDynix offers its Enterprise discovery interface which can be used with either its Symphony or Horizon ILS products. Enterprise is not currently used with ILS products from other vendors.
  • Axiell, a library automation vendor that operates primarily in the United Kingdom and Scandinavia, offers Axiell Arena, which includes the features of a discovery interface, but is also able to provide a comprehensive web presence for a public library.
  • Infor™ Library and Information Solutions developed Iguana as both a discovery interface and library portal platform for public libraries.

Index-Based Discovery

Functional Context

Building on all of the approaches that have come in earlier phases, index-based discovery aims to provide the widest scope and most granular level of access to library collections. The general strategy employed by these discovery services is the use of a centralized index built from representations of all the components of a library collection, including content from the ILS and other local repositories, the digital collection, and the article-level databases and e-journal packages to which a library subscribes. This index-based model provides immediate access to search results, without the target limitations and performance bottlenecks of metasearch. Indexing all the content centrally also enables more sophisticated techniques for calculating the relevancy of results compared to what was possible in metasearch platforms.

Library users no longer have to search with the native interface of each of the many resources available to them through their library, but instead are able to use a single interface that addresses the content in all—or at least most—of the collections. The native interfaces may continue to be used by those who want to take advantage of some specialized features or want to continue to work with a select number of resources appropriate for their discipline. Index-based search supplements, but does not necessarily replace, the specialized native interfaces offered by each of the content providers.

The library-oriented index-based discovery services aim to deliver many of the qualities of the general search engines for the Web, though scoped to the content available through the library and tuned for access to academic or scholarly materials. The general search engines have shaped expectations that library patrons have in how to access information, particularly the provision of a single search box to query all available content. A technical architecture based on a single index promises to overcome many of the limitations of previous library discovery models and to meet user expectations for a fast and easy way to search. In earlier phases, it was not deemed feasible to create such a vast index, but with today's technology such an ambitious approach is well within practical limits. This genre of products is often termed “web-scale discovery,” reflecting the qualities shared with the general search engines.

The index-based discovery model offers the potential for delivering results to library users much more rapidly than possible through metasearch technologies. This approach also relieves the servers of the content providers of processing numerous real-time search queries broadcast from metasearch engines. For searchers, making use of the library's discovery service rather than a resource's native interface shifts the processing overhead from real-time demands on operational servers to batch transfers of metadata that can be managed through non-production servers or off-peak periods.

To effectively represent a library's collection, the index needs to be populated with quite a variety of content components:

  • Bibliographic records from the library's integrated library system

    To represent the library's physical holdings, the index-based discovery services follow much the same model as the earlier discovery interfaces, generally following some level of the ILS-DI specifications.

  • Metadata records from local digital collection platforms, institutional repositories, or other local information systems
  • Content from each of the library's electronic resources, including aggregated databases of article collections, individual e-journal subscriptions, reference resources, and abstracting and indexing services
  • Open access content, including materials from commercial packages outside of the library's subscriptions

For each of these categories, the material indexed includes metadata records, such as MARC bibliographic records or citations. When possible, the full text of the items can also be indexed, which can greatly expand the search capabilities. Indexing the full text of materials increases the size of the index and requires a more sophisticated approach to calculating relevancy in search results so the sheer number of keyword matches does not inappropriately skew results.
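One way to keep raw keyword counts from dominating, sketched below in Python, is to weight matches by field and to dampen term frequency with a logarithm. This is an illustrative assumption, not the ranking method of any discovery service; the field names, the weights, and the sample documents are hypothetical.

```python
import math

# Hypothetical weights: a match in the title counts for more than one in the body.
FIELD_WEIGHTS = {"title": 5.0, "abstract": 2.0, "full_text": 1.0}


def score(document, term):
    """Score one document for one search term.

    log1p(tf) grows slowly, so the thirtieth occurrence of a word in the
    full text adds far less than the first occurrence did.
    """
    term = term.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = document.get(field, "").lower().split()
        term_frequency = words.count(term)
        total += weight * math.log1p(term_frequency)
    return total


# Document A is about the topic; document B merely repeats the word in its body.
DOC_A = {"title": "Metasearch in libraries", "full_text": "metasearch connects targets"}
DOC_B = {"title": "Annual report", "full_text": "metasearch " * 30}

if __name__ == "__main__":
    for name, doc in (("A", DOC_A), ("B", DOC_B)):
        print(name, round(score(doc, "metasearch"), 3))
```

With plain counting, document B would win on its thirty full-text matches; with field weights and dampening, the document whose title names the topic ranks higher.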

Building a comprehensive index requires large investments in technology infrastructure and in the allocation of personnel. The technology platform needs to include servers and storage scaled to support a massive index and the search activity of large numbers of simultaneous users. Personnel are required for technical tasks such as software development, interface design, or index maintenance. Other activities include executing the business processes necessary to develop agreements with the entire universe of organizations that provide content to libraries, coordinate the transfer of data and metadata, and perpetually synchronize the index among all the many sources. So far, the creation of index-based discovery services has resided within the domain of large, frequently commercial, organizations. Such an endeavor would be beyond what most individual libraries or consortia would have the resources to carry out.

The discovery index ideally represents the broad body of content of interest to many different library customers, but when a patron performs a search through the discovery service, results need to be scoped according to the subscriptions and collections held by the user's particular library. A key aspect of the implementation of a discovery service is defining the profile of available content. The knowledgebase of e-journal holdings and library subscription profiles are generally integrated into the functional infrastructure of index-based discovery services. The discovery service often ties into the components in place for the access and management of electronic resources, especially those related to OpenURL-based link resolution. The profiles associated with the institution's link resolver can generally be applied to govern the scope of search of an index-based discovery service.
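The scoping step can be pictured as a filter applied to hits from the shared index. The Python sketch below assumes, purely for illustration, that each indexed item is tagged with the content package it belongs to and that a library's profile is a set of subscribed package identifiers; real services draw this information from a knowledgebase, and all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedItem:
    title: str
    package: str              # hypothetical identifier of the content package
    open_access: bool = False


def scope_results(results, subscribed_packages):
    """Keep only hits the library's patrons are entitled to reach.

    An item stays in the result list if it is open access or if it belongs
    to a package listed in the library's subscription profile.
    """
    return [
        item for item in results
        if item.open_access or item.package in subscribed_packages
    ]


# Hypothetical hits from the central index and one library's profile.
CENTRAL_INDEX_HITS = [
    IndexedItem("Article A", "package-alpha"),
    IndexedItem("Article B", "package-beta"),
    IndexedItem("Article C", "package-gamma", open_access=True),
]
LIBRARY_PROFILE = {"package-alpha"}

if __name__ == "__main__":
    for item in scope_results(CENTRAL_INDEX_HITS, LIBRARY_PROFILE):
        print(item.title)
```

The same index serves every customer; only the profile passed to the filter changes from one library to the next.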

Dynamics between Content Providers and Discovery Services

Index-based discovery relies on creating and maintaining an index that represents the totality of a library's collection—print, electronic, and digital components. One of the most difficult challenges in the creation of index-based discovery services lies in obtaining citation metadata and even the full text of each of the articles, book chapters, and other materials represented in the broad body of subscription products and open access resources available to libraries.

This model of search depends on an arrangement where content producers provide a copy of their information assets to discovery providers solely for the purpose of indexing, not for resource delivery. A developer of an index-based discovery service would ideally work with the organizations that supply content to libraries to receive citations or full text of all the material to be processed into its index. Once represented in the index, the material will be included in relevant search results, subject to a library's content licenses. When an item, such as a journal article, is selected for full-text delivery, a link is presented to the copy that resides on the publisher's server. Discovery services do not aim to re-publish content, but rather to provide an efficient way to bring users to that content. The success of this search model depends on partnerships where library-oriented content providers provide copies of their materials to discovery service creators, with safeguards in place to protect that content from unauthorized access.

One of the critical concepts in the index-based discovery service ecosystem relates to mutual subscribers. The materials that a content provider contributes to a discovery service creator should be made available only to patrons associated with a library that subscribes to that product. It is essential that discovery services not give patrons access to licensed materials beyond what their library's subscriptions entitle them to.

To achieve the goal of complete comprehensive coverage of library collections, all content providers would need to cooperate with the discovery service creators. Yet, participation is not currently universal. Many publishers do not see the benefit of having their materials represented in discovery services or may have other reasons for not contributing their materials.

Open Discovery Initiative

The NISO Open Discovery Initiative was established to improve the ecosystem of index-based discovery services. Its work explores the possibilities for defining a more transparent, consistent, and efficient set of interactions between the creators of these services and the content providers whose resources they represent—to ultimately benefit the libraries that acquire these services and the library patrons that use them. Some of the complications that have arisen in the index-based discovery arena include the uneven participation by content providers, the difficulty that libraries experience in determining the content covered in a service, knowing whether search results are presented objectively without bias toward a given publisher, consistently measuring usage, the provision of materials by each content provider in different formats and in varying levels of completeness, and the lack of standard mechanisms for the transmission of data between content providers and discovery service creators.

Several major products have been developed based on the model of indexed search, as listed below in the implementation examples. Their indexes have been built based on private agreements and pragmatic practices between content providers and discovery service creators. The adoption of these index-based discovery services has become relatively widespread in many libraries. These products represent a significant investment for libraries, and thus it is increasingly in the broader interest of the community to address some of the issues that hinder their impact.

Libraries have come to expect the content to which they subscribe to be made available within the discovery service in which they have invested. The effectiveness of these tools is diminished when pockets of library resources are not represented or when the scope of content covered or the depth of indexing is not transparent. A clear understanding is needed of the content coverage and level of indexing libraries can expect in any given discovery service: specifically, what articles are available and whether they are indexed in full text, by citations only, or both; whether the indexing is accomplished from structured metadata or directly through the full text; and when thesauri, structured vocabularies, or abstracts are included in the index.

To explore interest in forming a group to address these issues, an invitational exploratory meeting was held at the American Library Association Annual Conference in June 2011. Participants agreed that an initiative would be beneficial to the community and a proposal was subsequently developed and presented to NISO to create a workgroup to address issues in this arena. The proposal was accepted by NISO and the group formed under the Discovery to Delivery Topic Committee.

The Open Discovery Initiative (ODI) includes three key stakeholder groups—content providers, discovery service creators, and libraries—both in the composition of its working group members and in the topics addressed. The ODI aims to address the issues of identifying and mitigating the barriers to participation of content providers in contributing their materials, providing transparency related to what products are included and at what level they are indexed in a given discovery service product, and providing assurance to both libraries and publishers that discovery services do not bias results in favor of any given provider's content.

The main output of the Open Discovery Initiative will take the form of a Recommended Practice, expected in the Fall of 2013, that will provide an overview of the discovery services domain, identification of critical issues, and recommendations on streamlining the process by which information providers, discovery service providers, and librarians work together to better serve libraries and their users. One of the key activities in the information gathering phase involved the creation and execution of a survey that summarizes perspectives on current discovery services from each of the stakeholder groups. The work of the Open Discovery Initiative is anticipated to be complete by the time this book is published. The ODI webpage should be consulted for the latest information. [Note and caveat: Marshall Breeding, the author of this chapter, is co-chair of the Open Discovery Initiative, along with Jenny Walker.]

Supporting Standards

The search model based on a central index of all the potential resources also builds on the work of the Open Archives Initiative. Beginning in 1999 at the Santa Fe Convention, the idea of an index of pre-harvested metadata began to be established as preferable to real-time queries to multiple resource targets. When addressing the problem of providing access to the many repositories of pre-print articles maintained by specific disciplines and organizations, the model of search that prevailed involved harvesting metadata from each repository of interest and building a centralized service. In support of this centralized search service model, which prevailed over the metasearch model, the Open Archives Initiative Protocol for Metadata Harvesting was developed and continues as a mechanism frequently used in the transfer of metadata from content providers to those organizations involved in operating discovery services. Work is currently underway on ResourceSync, a joint project between the Open Archives Initiative and NISO, to revise and modernize OAI-PMH and to extend its capabilities to handle very large-scale resources efficiently and to synchronize not only the metadata but the associated full-text or digital objects as well.

The index-based discovery services also incorporate some of the technologies and standards from the preceding generations of products. Interoperability with the ILS for current status of physical items and for patron account and self-service features builds on the ILS-DI reference model established in the genre of next-generation library catalogs. The conventions of faceted navigation and other search and retrieval techniques continue to be prominent with index-based search services.

Context-sensitive linking based on OpenURLs resolved with the support of knowledge bases of e-journal holdings naturally continues to play an important role in discovery services, though some increasingly display pre-resolved links, avoiding the need for users to interact with link resolver selection menus.
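To make the linking step concrete, the Python sketch below assembles an OpenURL in the key/encoded-value form for a journal article. The url_ver value, the journal metadata format identifier, and the rft.* key names follow the OpenURL standard (ANSI/NISO Z39.88-2004); the resolver address, the function name, and the citation itself are made up for illustration.

```python
from urllib.parse import urlencode


def build_openurl(resolver_base, citation):
    """Build a key/encoded-value OpenURL describing a journal article.

    `citation` maps referent keys such as "atitle", "jtitle", "issn",
    "volume", "spage", and "date" to their values.
    """
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    for key, value in citation.items():
        # Referent (the cited item) keys carry the "rft." prefix.
        params["rft." + key] = value
    return resolver_base + "?" + urlencode(params)


# A fabricated citation; the ISSN and titles do not refer to a real journal.
SAMPLE_CITATION = {
    "atitle": "Discovery at scale",
    "jtitle": "Journal of Example Studies",
    "issn": "1234-5678",
    "volume": "12",
    "spage": "45",
    "date": "2013",
}

if __name__ == "__main__":
    # https://resolver.example.edu/openurl stands in for a library's link resolver.
    print(build_openurl("https://resolver.example.edu/openurl", SAMPLE_CITATION))
```

The link resolver receiving such a URL consults the library's knowledgebase of holdings to decide which copy of the article, if any, the patron can reach. A pre-resolved link skips this round trip by storing the final target in advance.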

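In summary, the standards supporting index-based discovery include OAI-PMH (and possibly ResourceSync in the future) for harvesting metadata; SIP and NCIP for communication with the underlying ILS; and OpenURL for link resolution, although many services use pre-resolved links to full text or embedded full text.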

Implementation Examples

Products currently available for index-based discovery include the following:

  • Serials Solutions launched Summon®, the first index-based discovery service, in 2009. A significantly enhanced product, Summon 2.0, was developed for release in 2013.
  • Ex Libris created Primo Central with a central article-level index addressing the content of research and academic libraries. Primo Central was designed to work with the company's Primo® discovery interface.
  • EBSCO Discovery Service, based on the EBSCOhost platform and interface, also indexes content from a library's non-EBSCO content providers.
  • OCLC WorldCat® Local builds on the WorldCat.org database, extended with article-level metadata.

Conclusion

This chapter has taken a tour through the history of library discovery, describing the evolution of the functionality expected, the constant expansion of content addressed, and the technologies employed. It has given emphasis to the standards, recommended practices, and informal collaborative efforts that have contributed to the success of each generation or variation of these tools. In each phase, new tools attempted to make improvements over earlier efforts and make the best use of available search and retrieval technologies, information workflows, and user interface conventions. Yet, ever heightening end-user expectations and the availability of ever more powerful technology platforms drove new cycles of innovation that resulted in new models of library discovery tools.

A steady progression of technologies, methodologies, and standards has provided ever more powerful support for products and services that facilitate discoverability and access to library collections, whether owned, licensed, or open access. Today, the state of the art in this arena takes the form of index-based discovery services. We should not, naturally, expect that this model of search represents the end of the road in this critical aspect of library technologies. While it's difficult to know in the midst of any given technology cycle what will come next, we can anticipate that the use of linked data will eventually re-shape the realm of library discovery. Linked data has recently become a hotbed of interest in the library community, with many efforts underway in the development and deployment of projects. We might anticipate that future discovery interfaces will take advantage of the browsing and structured exploration of related resources enabled through linked data to create a powerful extension to the index-oriented search and retrieval technologies.

Publication Year:2014
Type of Material:Chapter
Language English
Published in: The Critical Component: Standards in the Information Exchange Environment
Publisher:Association for Library Collections and Technical Services, American Library Association
Place of Publication:Baltimore, MD
ISBN:8100-7445
Record Number:18577
Last Update:2018-09-02 07:19:54
Date Created:2013-11-06 16:19:39