Providing access to electronic resources is a core role of libraries. Academic libraries devote most of their collection budgets to subscription fees for these resources and therefore have a critical interest in making sure that they can be easily accessed by their patrons. Many of these resources cannot be accessed freely by the general public on the web but must be restricted to those associated with a subscribing institution. Those not associated with a library subscribing to the resource might be able to purchase a personal subscription or pay for individual articles. Although an increasing portion of new scholarship is being published as open access, the model of restricted subscription-based resources remains a fundamental part of the library information ecosystem and requires a technical solution.
The longstanding technical approach for enabling access to restricted resources for individuals associated with subscribing institutions has been based on IP authentication. Basically, if a person connects via a device associated with one of the network addresses assigned to an institution that has purchased a subscription, then access is allowed. Access from any other address encounters a paywall, a page that offers other alternatives to view the article, such as paying a per-article charge or prompting for an individual or institutional login.
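The address check described above can be sketched in a few lines of Python. This is a minimal illustration, not any publisher's actual implementation: the registry `SUBSCRIBER_NETWORKS`, the institution names, and the functions `find_subscriber` and `handle_request` are all hypothetical, and the address ranges are reserved documentation ranges rather than real institutional networks.

```python
import ipaddress
from typing import Optional

# Hypothetical registry kept by a publisher: subscriber -> registered ranges.
# These are reserved documentation ranges, not real institutional networks.
SUBSCRIBER_NETWORKS = {
    "Example University": ["192.0.2.0/24", "2001:db8:1::/48"],
    "Sample College": ["198.51.100.0/25"],
}


def find_subscriber(client_ip: str) -> Optional[str]:
    """Return the subscriber whose registered ranges contain client_ip, else None."""
    address = ipaddress.ip_address(client_ip)
    for subscriber, ranges in SUBSCRIBER_NETWORKS.items():
        for cidr in ranges:
            network = ipaddress.ip_network(cidr)
            # Only compare addresses and networks of the same IP version.
            if address.version == network.version and address in network:
                return subscriber
    return None


def handle_request(client_ip: str) -> str:
    """Grant access for a recognized address; otherwise fall back to the paywall."""
    subscriber = find_subscriber(client_ip)
    if subscriber is None:
        return "paywall"
    return f"access granted via {subscriber}"
```

Note that the check sees only an address, never a person, which is why the approach is transparent on campus and fails for the same user off campus.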
IP authentication assumes that a person's institutional identity can be associated with the IP address of the device used to access the content. Each device must have a globally unique IP address and blocks of these addresses are assigned for use by the networks of educational institutions. The scheme of using IP addresses to validate access to restricted scholarly content worked well in the early phase of electronic resources but has been problematic ever since.
This type of authentication requires libraries to provide lists of the IP addresses associated with their institution to each of the publishers from which they purchase subscriptions. The IP addresses associated with an institution change over time, requiring continual updates to be distributed to each publisher. Pressures related to the shortage of IPv4 addresses and the transition to IPv6 further complicate the problem. In most cases, academic institutions with large networks will use generic IP addresses internally, which are translated to an external global IP address for access to resources beyond the local network. This process of network address translation (NAT) can also be problematic for cases when only specific departments subscribe to a resource but are not differentiated in the IP address seen by the publisher.
From the perspective of those accessing the resources, IP authentication is completely transparent when access originates from the campus network. Since the publisher recognizes the IP address of the user, no additional steps are required to view or download the resource. IP authentication works well at preserving the privacy of those accessing restricted resources. No specific information regarding the identity of the researcher is passed into the publisher's infrastructure other than an institutional IP address.
The major failing of IP authentication relates to the reality that an increasing portion of access by people associated with an institution and authorized to access resources originates from IP addresses outside of its network. The students, faculty, and staff members of an institution expect to access library resources from off-campus locations that will not carry the institutional IP addresses. This problem is exacerbated by the skyrocketing use of mobile devices that may not use an institutional IP address even if used on campus.
Problems with IP Authentication
The problem of providing access to electronic resources is even more difficult for other types of libraries. Public libraries often acquire electronic databases and other content resources that likewise need to be restricted to their direct patrons. Since the constituency of authorized patrons comes from diverse and unspecified IP addresses, the scheme of IP authentication does not work well at all for public libraries. It is common for public libraries to provide a login page where their patrons can enter their library card number or PIN to access resources. For programs offering statewide access to resources, it is possible to use geolocation services to identify persons accessing resources within the authorized service area. Geolocation techniques offer a relatively low degree of accuracy, though they are generally within the tolerances accepted by publishers.
IP authentication represents a significant burden both for the providers of proprietary resources and for libraries. Each publisher must maintain a registry of the IP addresses associated with each of its customers and continuously update that registry. Libraries likewise must maintain a list of the valid IP addresses associated with their campus, which then need to be transmitted to each vendor when initiating a subscription and updated as the institutional network inevitably changes. For institutions that maintain separate subscriptions for specific schools or departments, the corresponding IP address sets must also be managed and distributed.
The effort involved in IP authentication is substantial for both libraries and publishers. A company called RedLink has developed a service to assist libraries and publishers with managing IP addresses and other access credentials. The RedLink Network is a free service that enables libraries to provide their IP addresses in one place, which is then available to all their content vendors. Publishers can then retrieve IP addresses from the RedLink Network for all participating customers instead of receiving them individually. In addition to its free services, RedLink sells products, such as the Library Dashboard and Publisher Dashboard, that offer statistical reporting tools and other features.
The main technical approach that libraries have implemented to provide access to restricted resources for users not associated with institutional addresses involves the use of proxy services. A proxy service operates by performing some type of authentication for users outside the institutional network, and once a user is authenticated, it conducts the resource request through an authorized IP address. From the publisher perspective, access continues to rely on IP authentication, requiring no additional effort. Proxy servers can use a variety of mechanisms to validate users, such as a SIP request to the library's integrated library system, an internal database of users and passwords, or an institutional authentication service.
Access to a proxy service usually involves using a modified form of a URL to access restricted resources. The modified URL prepends the base URL of the proxy server to the resource URL. For example, the URL www.restrictedresource.com would be rewritten as proxy.myinstitution.edu/?url=www.restrictedresource.com. Once the session is initiated, the proxy service will rewrite all links displayed from that resource to incorporate its own URL so that they remain valid and authorized. To access resources via a proxy server, patrons usually need to reach them through a catalog, discovery service, or finding aid that provides the link in the modified form. Users accessing restricted resources via search engines or through bookmarked URLs from off-campus addresses will encounter the paywall.
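The URL modification described above might look like the following Python sketch. It assumes the `?url=` form used in the example; `PROXY_BASE`, `to_proxied_url`, and `from_proxied_url` are hypothetical names, and an actual proxy product may expect a different URL pattern.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical proxy address, modeled on the example in the text.
PROXY_BASE = "https://proxy.myinstitution.edu/"


def to_proxied_url(resource_url: str) -> str:
    """Wrap a resource URL in the proxy's ?url= form."""
    # Percent-encode the target so its own query string survives intact.
    return f"{PROXY_BASE}?url={quote(resource_url, safe='')}"


def from_proxied_url(proxied_url: str) -> str:
    """Recover the original resource URL from a proxied link."""
    query = urlsplit(proxied_url).query
    return parse_qs(query)["url"][0]
```

Percent-encoding the target is a design choice made here so that a resource URL containing its own `?` or `&` characters is not confused with the proxy's query string. A catalog or discovery service would apply something like `to_proxied_url` to every link it displays for a restricted resource.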
The maintenance of a proxy server represents significant effort for a library since the URL for each resource must be registered in the proxy server, and all references to that resource in the library's environment must be adjusted. Most proxy servers record each transaction they process in a log file. These logs can be processed to produce statistics on the resources accessed via the proxy server. Some universities even channel on-campus access through their proxy server so that it will be included within those statistics. It should be noted that proxy server logs represent only a subset of overall access since they do not include the many on-campus users who access resources outside the library-provided interface or those who access resources from off campus using other forms of authentication.
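As a rough illustration of how such logs can be turned into statistics, the Python sketch below counts requests per resource host. The log layout (timestamp, session token, target URL, separated by whitespace) and the sample entries are invented for this example; real proxy logs use their own configurable formats.

```python
from collections import Counter
from urllib.parse import urlsplit

# Invented sample log: timestamp, session token, requested URL.
SAMPLE_LOG = """\
2018-01-15T09:12:03 session-a https://www.restrictedresource.com/article/1
2018-01-15T09:13:47 session-a https://www.restrictedresource.com/article/2
2018-01-15T10:02:11 session-b https://journals.example.org/issue/7
"""


def count_requests_by_host(log_text: str) -> Counter:
    """Tally proxied requests per resource host."""
    counts = Counter()
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        _timestamp, _session, url = fields
        host = urlsplit(url).hostname
        if host:
            counts[host] += 1
    return counts
```

Calling `count_requests_by_host(SAMPLE_LOG).most_common()` lists hosts from most to least requested. Any such tally reflects only traffic that actually passed through the proxy.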
A key weakness of proxy servers lies in the possibility of uncontrolled access to all remote resources if a single set of authentication credentials becomes compromised. Any individual gaining such access could perform wholesale downloads of restricted content from any of the providers available through institutional subscriptions. Most publishers monitor their services for such occurrences and will quickly disable access for the entire institution until the proxy issue is resolved.
The proxy service most widely used in libraries is EZproxy, an OCLC product. It was originally developed in about 1999 by Chris Zagar and sold through his company, Useful Utilities. OCLC acquired EZproxy from Useful Utilities in January 2008 (see the March 2008 issue of Smart Libraries Newsletter). Since its acquisition, OCLC has continued to develop EZproxy, which is now available in its sixth major version and is offered both as software for installation on a local Linux or Windows server and as a hosted service. EZproxy supports multiple authentication methods including LDAP (Lightweight Directory Access Protocol), CAS (Central Authentication Service), SIP2 (Standard Interchange Protocol Version 2.0), and Shibboleth.
Although IP authentication was initially a pragmatic solution, it has become increasingly problematic as the internet has evolved. But despite its limitations, this method persists as the dominant approach used to provide access to restricted scholarly resources even as more modern approaches have emerged. The main alternatives to IP authentication rely on some type of federated identity management. Almost all educational institutions today have some type of centralized authentication service used to provide secure access to all technology-based services. Rather than each application maintaining its own login scheme, most can instead rely on an external authentication service. These services can be based on Active Directory, LDAP, Kerberos, or other technology with a mechanism to validate the credentials of a user. Most institutional networks also offer a single sign-on capability so that once the user has performed the login sequence successfully for one application, access to other applications is granted without having to log in again. Some of the common single sign-on protocols include CAS, SAML (Security Assertion Markup Language), and Kerberos.
While single sign-on implementations work within a given institutional network, the problem of providing access to restricted scholarly resources extends beyond that domain. Federated authentication has emerged as the main architecture able to solve this problem. The basic idea is that each institutional network implements its own scheme to authenticate its users, and access to services in another domain is allowed based on the trust between domains. A service does not need to know the identity of an individual making a request from an external domain. It only requires a mechanism indicating that the user was definitively authenticated within their home domain and that there is a previously established trust relationship among the domains that comprise the federation. In some cases, generic attributes are passed across domains to inform the authorization of resources, ideally without revealing personally identifiable information.
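The trust relationship described above can be illustrated with a deliberately simplified Python sketch. It is not SAML or Shibboleth: real federations exchange signed XML assertions verified with public keys drawn from federation metadata, whereas this example substitutes a shared-key HMAC signature over JSON so that it runs with the standard library alone. `FEDERATION_KEYS`, `issue_assertion`, `accept_assertion`, the issuer name, and the attribute names are all hypothetical.

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical federation metadata: trusted identity provider -> verification key.
# A shared secret stands in for the public-key signatures real federations use.
FEDERATION_KEYS = {
    "idp.example-university.edu": b"demo-key-university",
}


def issue_assertion(idp: str, key: bytes, attributes: dict) -> dict:
    """Home institution: vouch for an authenticated user using generic attributes only."""
    payload = json.dumps({"issuer": idp, "attributes": attributes}, sort_keys=True)
    signature = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def accept_assertion(assertion: dict) -> Optional[dict]:
    """Service: return the attributes if the issuer is trusted and the signature checks out."""
    try:
        payload = assertion["payload"]
        signature = assertion["signature"]
        claims = json.loads(payload)
        key = FEDERATION_KEYS[claims["issuer"]]  # unknown issuers are rejected
        attributes = claims["attributes"]
    except (KeyError, TypeError, ValueError):
        return None
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return None
    return attributes
```

The service never sees a name or password. It learns only that a trusted home institution authenticated someone and, in this sketch, a generic attribute such as `{"affiliation": "student"}` on which to base authorization.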
Several federated authentication services have been implemented, including:
- Shibboleth, an Internet2 initiative initially launched in about 2000, which has steadily gained adoption. All of the components involved in Shibboleth are available as open source software, and it is designed to protect the privacy of users as they access resources across domains. Shibboleth is based on SAML and includes Identity Provider, Service Provider, and metadata aggregation components.
- OpenAthens, which is a single sign-on service based on Shibboleth and SAML. OpenAthens is offered by the nonprofit organization Eduserv based in the United Kingdom. OpenAthens has been adopted by over 2,000 organizations, including many higher education institutions in the UK and internationally as well as by healthcare and research organizations, such as the National Health Service (NHS). In October 2017, Eduserv launched OpenAthens Cloud, which provides a less complex way for content providers to enable access to their resources compared to the locally hosted option.
RA21: Resource Access for the 21st Century
Providing access to remote restricted resources remains an unsettled issue. No single service or architecture has gained universal adoption, and basic IP authentication remains widely used despite its problems and limitations. Both publishers and libraries have strong interests in finding solutions that are technically sound, that have a low level of difficulty and expense to implement, and that ensure privacy of access. The International Association of Scientific, Technical, and Medical Publishers (STM) and the National Information Standards Organization (NISO) have created a new initiative called RA21: Resource Access for the 21st Century. Launched in 2016, RA21 aims to solve the problems associated with providing selective access to information resources on the web and to finally end the dependency on IP authentication. The initiative will work toward defining recommended practices, taking advantage of relevant standards and protocols, and will not define a specific technical solution. The initiative's principles center on open solutions that avoid proprietary software or protocols, that can be implemented with a low threshold of difficulty, and that are neutral relative to any technology or content vendors.
Three pilot projects are currently underway, two in the academic sector and a third in the business environment. Each project is based on a different implementation of SAML to achieve federated identity management.
- The Corporate Pilot will validate the use of SAML technologies among pharmaceutical companies affiliated with the Pharma Documentation Ring (see https://ra21.org/index.php/pilot-programs/universal-resource-access-ura-pilot/).
- Academic pilots:
  - Privacy Preserving Persistent WAYF is based on Shibboleth but incorporates additional information, such as the email domain, into metadata exchanged across the federation. This additional information is termed WAYF (where are you from) hints (see https://ra21.org/index.php/pilot-programs/p3-wayf-pilot/).
  - WAYF Cloud project aims to validate the use of a cloud service for the exchange of data among publisher platforms (see https://ra21.org/index.php/pilot-programs/wayf-cloud-pilot/).
The pilot phase of the RA21 initiative is expected to run through early 2018 and may be followed by the publication of a NISO recommended practice that could foster future implementations.
For more information, see https://ra21.org.