What changes can we implement to ensure our catalog and discovery services are protecting our patrons' privacy? How can we help ensure our patrons' confidentiality in regard to reader privacy, reading statistics, and digital information access?
Safeguarding the privacy of patrons as they use library resources is one of the basic values of the library profession. In the context of library catalog and discovery services, protecting privacy involves several different areas of concern.
A typical session that includes a patron using a library search tool must be treated with the same degree of care as the data related to the borrowing of a physical item from the library. The need to protect circulation records is well accepted and the related technical and operational processes are followed by almost all libraries that lend materials to the public. When an individual searches for content on a library catalog or discovery interface, it involves personal data of equal or higher sensitivity than circulation. The network address, browser cookies, or other identifiers can easily be resolved to the personal identity of the searcher. The text entered into the query box, identifying content of interest, is transmitted across the internet from the web browser to the servers processing the search. The search results returned to the user and items selected, links to resources, or even specific items read or downloaded become part of a package representing a very sensitive transaction between an individual and the library.
Given the sensitivity of these data, there are steps that should be taken to ensure patron privacy. The first and most fundamental action is to use encryption. If the transaction is not encrypted, it can be captured by unknown third parties using readily available network eavesdropping software or equipment. Configuring the service to use the HTTPS protocol encrypts the traffic between the patron's web browser and the server, so that queries and results cannot practically be read while in transit. All ecommerce sites depend on this protocol to protect credit card numbers and other sensitive financial data. Configuring library websites and the servers running catalogs and other search services to use HTTPS should today be considered an essential requirement. Chrome and other web browsers currently display prominent warnings for any site that continues to run the unencrypted HTTP protocol. I have been tracking the use of HTTPS vs HTTP on library websites for the last several years. Recent scans of all of the public library websites in the US reveal that about 15 percent still use HTTP; about 6 percent of academic libraries use HTTP.
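As a rough illustration of one piece of an HTTPS configuration, the sketch below computes the secure address that a plain HTTP request could be redirected to. The function name `https_equivalent` and the host `catalog.example.org` are hypothetical, and the sketch assumes the HTTPS service listens on the default port; in practice the redirect is usually configured in the web server or by the hosting vendor rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

# A widely used response header that asks browsers to keep using HTTPS
# for this host for the next year (value in seconds).
HSTS_HEADER = ("Strict-Transport-Security", "max-age=31536000")


def https_equivalent(url: str) -> str:
    """Return the HTTPS form of a URL (hypothetical helper).

    Assumption: the secure service runs on the default HTTPS port,
    so any explicit port on the original URL is dropped.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        return url  # already secure, nothing to change
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"  # IPv6 literals need brackets in a URL
    return urlunsplit(("https", host, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(https_equivalent("http://catalog.example.org/search?q=privacy"))
```

A server would typically answer the plain HTTP request with a permanent redirect to this address and include the header above on its HTTPS responses.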
Other measures can be taken to protect data possibly stored on the search service and falling within the bounds of patron privacy. Almost all search services create logs or other types of records for each transaction. These logs support important functions such as statistical reporting and analytics. To ensure privacy, it is essential to anonymize these records. This can be accomplished by truncating IP addresses to identify only users' network or domain and not a specific device. It is important to ensure that all copies of the transaction be anonymized, including raw web server logs in addition to transactions captured in databases within the application.
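The truncation described above can be sketched in a few lines using Python's standard `ipaddress` module. The function names and the prefix lengths (24 bits for IPv4, 48 bits for IPv6) are assumptions chosen for illustration, not a recommendation from any particular product; the log line format assumed here is one where the client address is the first space-separated field.

```python
import ipaddress


def truncate_ip(address: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero out the host portion of an IP address (hypothetical helper).

    The result identifies a network rather than a specific device.
    Raises ValueError if `address` is not a valid IP address.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us build the network from a host address
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def anonymize_log_line(line: str) -> str:
    """Truncate the leading client address of a log line.

    Assumption: the client address is the first space-separated field.
    Lines that do not start with a valid address are returned unchanged.
    """
    first, sep, rest = line.partition(" ")
    try:
        return truncate_ip(first) + sep + rest
    except ValueError:
        return line


if __name__ == "__main__":
    print(truncate_ip("192.0.2.143"))
    print(anonymize_log_line('192.0.2.143 - - "GET /search HTTP/1.1" 200'))
```

This sketch only handles the address field. The same pass would need to run over every copy of the transaction, including raw web server logs and any records the application stores in its own database.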
Many, if not most, of the search services offered by the library will be implemented on the technical infrastructure provided by an external vendor. The major indexed discovery services such as EBSCO Discovery Service, Ex Libris Primo and Summon, and WorldCat Discovery services are almost always deployed this way. Socially oriented services such as those from BiblioCommons tap into an even greater set of personalized data, likewise hosted on vendor-provided infrastructure. In these cases the library must work closely with the vendor to ensure that the technical operation of the service matches its expectations in regard to the treatment of personally identifiable data and the opt-in or opt-out retention of search history, and that the vendor's privacy policies and the technical behavior of the system match the library's own privacy policies.
Limiting the collection of personal data can be counter to the interest of the library in delivering personalized services and in performing detailed analytics on the usage of its services. Expectations regarding these capabilities are set by the commercial environment that puts massive effort into extracting all possible personalized data from both online and in-person activities. Libraries, consistent with our distinct interest in protecting private data, cannot necessarily replicate the full extent of personalization and targeted marketing seen in the commercial arena. It is possible, however, to build effective services based on anonymized data, category and demographic data, as well as opt-in personalized data. This difference in values means that any marketing and analytics services used by libraries need to be built around a different set of assumptions than those developed for the commercial arena. That requirement does not necessarily mean avoiding commercial customer relationship management or marketing engines, but rather populating them and using them in ways that respect the library's privacy policies and practices.