Smart Libraries Q&A: Analytics and Patron Privacy

Smart Libraries Newsletter [November 2018]

Analytics provide valuable information to organizations regarding the effectiveness of their websites. Libraries care about their virtual presence and benefit from measuring the level of use their site receives, along with detailed information on the types of users accessing the site, the devices used, and many other details that may help them improve its content, design, and technical implementation.

Many different tools are available for website use analysis, including Google Analytics, which is not only one of the most powerful services but can also be implemented without financial cost. As with other free services, the provider may receive value through other types of currency. In this case, the currency takes the form of detailed data describing the use patterns of all the sites taking advantage of the service. The data also include the IP address of the user requesting each resource. Libraries need to make informed decisions regarding how user and usage data are shared with external commercial organizations and whether sharing the user data associated with specific resources accessed on the library website fits within their policies. Given that most libraries have policies to protect patron privacy, personally identifiable data must be handled very carefully. Libraries routinely implement procedures that limit the collection of patron data and carefully control its dissemination to other parties: within the library, within its parent institution, and especially to any external third parties.

Google Analytics and competing services operate within an area of sensitive interactions among multiple stakeholders. Google provides a free tier of its Analytics service, primarily to optimize its advertising business and to help commercial sites maximize income. Organizations using its advertising services rely on Analytics to measure the impact of their spending and to inform the design and workflow of their sites. In return for the free service, Google gains important data and insight into use patterns from the data collected. Libraries must understand that use of Google Analytics involves transmitting data about every page request to Google servers.

This data includes direct information regarding the user and the page delivered, as well as indirect information related to associations to other pages that this user may have visited in other contexts.

Google offers a configuration option for Google Analytics that anonymizes the user's IP address shortly after it reaches Google's collection servers, before it is stored or processed. This option must be set in the code snippet that is added to each page to enable Google Analytics. Unless this configuration setting is activated, Google will associate each page use captured with a specific IP address. Setting the ‘anonymizeIp’ parameter to “true” will instead mask the last portion of the IP address, making it more difficult to associate access to any given resource with a specific user (see Figure 2). The default version of the code snippet generated from the Google Analytics console does not include this configuration option, meaning that by default the full IP address is collected and transmitted. In most cases, this code is pasted into a single place in the content management system for the website so that it is included on each page delivered. For more information, see the article “IP Anonymization in Analytics: A Technical Explanation of How Analytics Anonymizes IP Addresses” on the Google Analytics Help page (https://support.google.com/analytics/answer/2763052?hl=en).
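As a minimal sketch of where this setting goes, the following assumes the analytics.js command queue (the global `ga` function defined by the Google Analytics snippet). The helper name `configureAnalytics` and the property ID `UA-XXXXX-Y` are hypothetical placeholders for illustration, and a simple recorder stands in for the real `ga` function so the sketch can run outside a browser.

```javascript
// Hypothetical helper: issues analytics.js commands with IP anonymization on.
// `ga` is the command-queue function that the Google Analytics snippet defines;
// `propertyId` is a placeholder such as 'UA-XXXXX-Y'.
function configureAnalytics(ga, propertyId) {
  ga('create', propertyId, 'auto');
  // Set before the hit is sent so that the pageview carries the setting.
  ga('set', 'anonymizeIp', true);
  ga('send', 'pageview');
}

// Stand-in recorder (an assumption for this sketch, not part of Google's code).
const issued = [];
configureAnalytics((...args) => issued.push(args), 'UA-XXXXX-Y');
console.log(issued);
```

On a live page, the same three calls would appear in the embedded snippet, with the real `ga` function in place of the recorder. The ordering matters: a pageview sent before the `set` call would not carry the anonymization setting.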

Figure 2: Example of a webpage script for Google Analytics to anonymize IP address
Figure 3: Enabling the displayfeatures plug-in

Another possible behavior of Google Analytics enables tracking of the user through advertising networks. This option is configured either within the embedded code snippet or through the Google Analytics console. In the snippet, advertising tracking is enabled by activating the “displayfeatures” plug-in (see Figure 3). Do not add this line to your code snippet if you want a higher level of privacy for your patrons.
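One rough way to check an existing snippet for this line is a simple text search. The sketch below assumes the analytics.js form `ga('require', 'displayfeatures')`; the function name `enablesDisplayFeatures` and both sample snippets are hypothetical.

```javascript
// Hypothetical check: does a tracking snippet enable the displayfeatures plug-in?
// Matches ga('require', 'displayfeatures') with either quote style.
function enablesDisplayFeatures(snippetSource) {
  const pattern = /ga\(\s*['"]require['"]\s*,\s*['"]displayfeatures['"]/;
  return pattern.test(snippetSource);
}

// Two sample snippets with a placeholder property ID.
const privateSnippet =
  "ga('create', 'UA-XXXXX-Y', 'auto'); ga('set', 'anonymizeIp', true); ga('send', 'pageview');";
const trackedSnippet =
  "ga('create', 'UA-XXXXX-Y', 'auto'); ga('require', 'displayfeatures'); ga('send', 'pageview');";

console.log(enablesDisplayFeatures(privateSnippet)); // false
console.log(enablesDisplayFeatures(trackedSnippet)); // true
```

A check like this inspects only the page code. It cannot detect advertising features that were switched on through the Google Analytics console.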

This feature can also be set through the Google Analytics console. To check this setting, navigate to the ADMIN page, select the web property to be configured, and under the “.js Tracking Info” group, select “Data Collection.” The page titled “Data Collection for Advertising Features” will include two toggles, one for Remarketing and the other for Advertising Reporting Features (see Figure 4). Both should be set to off to prevent user activity from being tracked in the advertising networks.

The page “Data Collection for Advertising Features” states the consequences of enabling these features: “Note: By enabling the toggles below, you enable Google Analytics to automatically collect data about your traffic. If you don't want to collect data for advertising features, then you need to turn off both toggles as well as ensure that you have not manually enabled any advertising features data collection in your Google Analytics tags.” For more detailed information, see Eric Hellman's article, “How to Enable/Disable Privacy Protection in Google Analytics (It's Easy to Get Wrong!)” at https://go-to-hellman.blogspot.com/2017/02/how-to-enabledisable-privacy-protection.html.

Figure 4: Google Analytics Remarketing and Advertising Reporting Features

An alternative model of website analytics relies on server logs instead of the page tagging technique used by Google Analytics. All web servers, unless configured otherwise, record every page request into a locally stored log file. Using software to analyze a web server log file does not require sharing user traffic with any third party. These tools range from very simple reports to those that provide comprehensive site analysis as sophisticated as Google Analytics. As an example, AWStats is an open source log analysis tool, which has been continually developed since about 2000.
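To illustrate the idea at its simplest, the sketch below counts successful page requests per path from a few access-log lines. It assumes the Common Log Format; the function name `countPageRequests`, the sample paths, and the documentation-range IP addresses are all hypothetical, and a real tool such as AWStats does far more than this.

```javascript
// Counts successful GET requests per path from access-log lines.
// Assumes the Common Log Format; adjust the pattern for other layouts.
function countPageRequests(logLines) {
  const linePattern = /^\S+ \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3})/;
  const counts = new Map();
  for (const line of logLines) {
    const match = linePattern.exec(line);
    if (!match) continue; // skip lines that do not parse
    const [, method, path, status] = match;
    if (method !== 'GET' || !status.startsWith('2')) continue;
    counts.set(path, (counts.get(path) || 0) + 1);
  }
  return counts;
}

// Made-up sample entries for illustration only.
const sampleLog = [
  '203.0.113.7 - - [05/Nov/2018:10:15:32 -0600] "GET /hours.html HTTP/1.1" 200 5120',
  '203.0.113.9 - - [05/Nov/2018:10:16:01 -0600] "GET /hours.html HTTP/1.1" 200 5120',
  '198.51.100.4 - - [05/Nov/2018:10:17:45 -0600] "GET /missing.html HTTP/1.1" 404 312',
  '198.51.100.4 - - [05/Nov/2018:10:18:02 -0600] "GET /catalog HTTP/1.1" 200 20480',
];

const pageCounts = countPageRequests(sampleLog);
console.log(pageCounts);
```

Because the log file never leaves the library's own server, a report built this way involves no third party.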

Google Analytics is one of many different tracking tags that may be embedded on web pages. Each tracking tag operates differently. Most will transmit some type of information to a third party. To provide an environment that protects patron privacy, it is important to know about every tracking tag and to understand what data are transmitted and what entities receive that data directly or indirectly.

In some cases, page tracking tags are placed on a library page intentionally to activate some desired feature or service. It is also possible for tags to be enabled accidentally, or as a side effect of a widget or service embedded on the page. It's tempting to add features by borrowing JavaScript or other code snippets from other sites without understanding all the details of how they work. Even experienced developers can introduce tracking mechanisms unintentionally, making it important to survey your site periodically, especially after a redesign or other major changes, to ensure that it does not contain unintended tracking tags.
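A periodic survey can start with something as simple as listing the external hosts from which a page loads scripts. The following is a rough sketch under stated assumptions: the function name `findThirdPartyScriptHosts` and the example.org and example.net addresses are hypothetical, and a regular-expression scan of static HTML will miss tags that other scripts inject at runtime.

```javascript
// Lists hosts of external scripts referenced in a page's HTML.
// A regex scan is a rough approximation; it misses tags injected at runtime.
function findThirdPartyScriptHosts(html, pageUrl) {
  const ownHost = new URL(pageUrl).hostname;
  const scriptSrc = /<script\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/gi;
  const hosts = new Set();
  for (const match of html.matchAll(scriptSrc)) {
    // Resolve relative and protocol-relative URLs against the page URL.
    const host = new URL(match[1], pageUrl).hostname;
    if (host !== ownHost) hosts.add(host);
  }
  return [...hosts].sort();
}

// Made-up page markup for illustration only.
const sampleHtml = `
  <script src="/js/site.js"></script>
  <script async src="https://www.google-analytics.com/analytics.js"></script>
  <script src="//widgets.example.net/chat.js"></script>`;

const foundHosts = findThirdPartyScriptHosts(sampleHtml, 'https://library.example.org/');
console.log(foundHosts);
```

Any host on the resulting list that the library cannot account for is a candidate for closer review. Browser-based tools remain the more thorough option because they observe what actually loads.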

Libraries can perform a comprehensive assessment of their environment to determine all tracking tags present. Many tools are available, including Ghostery, which operates as a browser plug-in. Using Ghostery, a site using only the Google Analytics tag would be reported as shown in Figure 5. As libraries deploy technology-based applications or services, they must take into consideration many factors related to the security and privacy of patron data.

Figure 5: Example of the browser plug-in Ghostery, which displays all tracking tags on your website

An important aspect of a patron privacy policy involves the non-collection or protection of personally identifiable information. Most web-based applications and services are designed to collect data on every aspect of their use. Personal information has become the currency of the internet, making user and usage data a valuable commodity. Any web server, for example, will come with a default configuration that logs all activity, usually tied to an IP address. Depending on the context, the IP address itself may not reveal a specific person but may be combined with other data points to triangulate activity to a specific person. In response to this possibility, libraries may opt to scrub web server logs of IP addresses or other personal information.
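As one possible approach to scrubbing, the sketch below zeroes the final octet of each IPv4 address in a log line, similar in spirit to the masking described for Google Analytics. The function name `maskIPv4Addresses` and the sample log entry are hypothetical, and IPv6 addresses are not handled here.

```javascript
// Zeroes the final octet of every IPv4-style address found in a log line.
// IPv6 addresses are left untouched in this sketch.
function maskIPv4Addresses(logLine) {
  const ipv4 = /\b(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}\b/g;
  return logLine.replace(ipv4, '$1.0');
}

// Made-up log entry using a documentation-range address.
const rawLine =
  '203.0.113.57 - - [05/Nov/2018:10:15:32 -0600] "GET /hours.html HTTP/1.1" 200 5120';
const scrubbedLine = maskIPv4Addresses(rawLine);
console.log(scrubbedLine);
```

The pattern matches any four dot-separated numbers, so it could also alter version strings of that shape elsewhere in a line. Masking one octet reduces precision but does not guarantee anonymity, so libraries with stricter policies may prefer to remove the address field entirely.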

Most library-specific applications have been developed to limit or anonymize activity that can be linked to a specific individual. Products such as discovery services, integrated library systems, library services platforms, or others specifically developed for libraries will almost always include configuration options that can be enabled to limit collection of personal data. Libraries may need to be more careful when deploying applications or components designed for commercial use. It is likely that these services will need to be carefully configured to reflect the user privacy policies of the library.

Any general-purpose content management systems, authentication services, or other components should be carefully reviewed regarding how they collect, store, and transmit personally identifying data. In broader terms, it is important to periodically audit the library's entire technical infrastructure, including vendor-provided services, to ensure that the actual behavior of all software and services conforms to the organization's stated privacy policies.

Publication Year: 2018
Type of Material: Article
Language: English
Published in: Smart Libraries Newsletter
Publication Info: Volume 38 Number 11
Issue: November 2018
Page(s): 6-7
Publisher: ALA TechSource
Place of Publication: Chicago, IL
ISSN: 1541-8820
Record Number: 24073