Analytics provide valuable information to organizations regarding the effectiveness of their websites. Libraries value their virtual presence and benefit from measuring the level of use their site receives and from gathering detailed information on the types of users accessing the site, the devices used, and many other details that may help them improve its content, design, and technical implementation.
Many different tools are available for website use analysis, including Google Analytics, which is not only one of the most powerful services but can also be implemented at no financial cost. As with other free services, the provider may receive value through other types of currency. In this case, the currency takes the form of detailed data describing the use patterns of all the sites taking advantage of the service. The data also include the IP address of the user requesting each resource. Libraries need to make informed decisions regarding how user and usage data are shared with external commercial organizations and whether sharing the user data associated with specific resources accessed on the library website fits within their policies. Given that most libraries have policies to protect patron privacy, personally identifiable data must be handled very carefully. Libraries routinely implement procedures that limit the collection of patron data and carefully control its dissemination to other parties: within the library, within its parent institution, and especially to any external third parties.
Google Analytics and competing services operate within an area of sensitive interactions among multiple stakeholders. Google provides its analytics service as a free product, primarily to optimize its advertising business and to help commercial sites maximize income. Organizations using its advertising services rely on its Analytics service to measure the impact of their spending and to inform the design and workflow of their site. Although the service has a free tier, Google gains important data and insight into use patterns from the data collected. Libraries must understand that use of Google Analytics involves transmitting data about every page request to Google servers.
These data include direct information regarding the user and the page delivered, as well as indirect information related to associations with other pages that this user may have visited in other contexts.
Google offers a configuration option for Google Analytics that anonymizes data before it is transmitted to its servers. This option must be set in the code snippet that is added to each page to enable Google Analytics. Unless this configuration setting is activated, Google will associate each page use captured with a specific IP address. Setting the “anonymizeIp” parameter to “true” will instead mask the last portion of the IP address, making it more difficult to associate access to any given resource with a specific user (see Figure 2). The default version of the code snippet generated from the Google Analytics console does not include this configuration option, meaning that by default it collects and transmits the full IP address. In most cases, this code will be pasted into a single place in the content management system for the website so that it is presented on each page delivered. For more information, see the article “IP Anonymization in Analytics: A Technical Explanation of How Analytics Anonymizes IP Addresses” on the Google Analytics Help page (https://support.google.com/analytics/answer/2763052?hl=en).
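To illustrate what masking the last portion of an address means, the following Python sketch zeroes the trailing bits of an address. It assumes, following the Google help article cited above, that the final octet of an IPv4 address and the final 80 bits of an IPv6 address are set to zero. The function name mask_ip is hypothetical, and the code is only an illustration of the idea, not Google's implementation.

```python
import ipaddress


def mask_ip(address: str) -> str:
    """Return the address with its trailing bits set to zero."""
    ip = ipaddress.ip_address(address)
    # Assumption: keep the first 24 bits of an IPv4 address (drop the last
    # octet) and the first 48 bits of an IPv6 address (drop the last 80 bits).
    keep_bits = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{address}/{keep_bits}", strict=False)
    return str(network.network_address)


# Addresses from the ranges reserved for documentation.
print(mask_ip("192.0.2.143"))
print(mask_ip("2001:db8:85a3:8d3:1319:8a2e:370:7348"))
```

Under these assumptions, every IPv4 address that shares its first three octets is reduced to the same value, so a masked address identifies a network segment and not an individual computer.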
Another possible behavior of Google Analytics enables tracking of the user through advertising networks. This option is configured either from within the embedded code snippet described above or through the Google Analytics console. Advertising tracking is enabled by activating the “displayfeatures” plug-in (see Figure 3). Do not add this line to your code snippet if you want a higher level of privacy for your patrons.
This feature can also be set through the Google Analytics console. To check this setting, navigate to the ADMIN page, select the web property to be configured, and under the “.js Tracking Info” group, select “Data Collection.” The page titled “Data Collection for Advertising Features” will include two toggles, one for Remarketing and the other for Advertising Reporting Features (see Figure 4). Both should be set to off to prevent user activity from being tracked in the advertising networks.
The page “Data Collection for Advertising Features” states the consequences of enabling these features: “Note: By enabling the toggles below, you enable Google Analytics to automatically collect data about your traffic. If you don't want to collect data for advertising features, then you need to turn off both toggles as well as ensure that you have not manually enabled any advertising features data collection in your Google Analytics tags.” For more detailed information, see Eric Hellman's article, “How to Enable/Disable Privacy Protection in Google Analytics (It's Easy to Get Wrong!)” at https://go-to-hellman.blogspot.com/2017/02/how-to-enabledisable-privacy-protection.html.
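A library can check the tags on its own pages for these two settings. The Python sketch below searches a page's HTML for the analytics.js calls that set “anonymizeIp” to true and that require the “displayfeatures” plug-in. The function name audit_snippet and the sample markup are hypothetical, and the patterns assume the analytics.js form of the snippet; other versions of the tag use different syntax and would need different patterns.

```python
import re

# Patterns for the two settings as they would appear in an analytics.js snippet.
ANONYMIZE_RE = re.compile(
    r"""ga\(\s*['"]set['"]\s*,\s*['"]anonymizeIp['"]\s*,\s*true\s*\)"""
)
DISPLAY_RE = re.compile(
    r"""ga\(\s*['"]require['"]\s*,\s*['"]displayfeatures['"]\s*\)"""
)


def audit_snippet(html: str) -> dict:
    """Report whether each setting is present in the page source."""
    return {
        "anonymize_ip": bool(ANONYMIZE_RE.search(html)),
        "display_features": bool(DISPLAY_RE.search(html)),
    }


# Hypothetical page fragment with a placeholder property ID.
SAMPLE = """
<script>
ga('create', 'UA-XXXXX-Y', 'auto');
ga('set', 'anonymizeIp', true);
ga('send', 'pageview');
</script>
"""

print(audit_snippet(SAMPLE))
```

A check of this kind covers only what is written in the page source. The Remarketing and Advertising Reporting Features toggles live in the Google Analytics console and still need to be reviewed there.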
An alternative model of website analytics relies on server logs instead of the page tagging technique used by Google Analytics. All web servers, unless configured otherwise, record every page request into a locally stored log file. Using software to analyze a web server log file does not require sharing user traffic with any third party. These tools range from very simple reports to those that provide comprehensive site analysis as sophisticated as Google Analytics. As an example, AWStats is an open source log analysis tool, which has been continually developed since about 2000.
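As a minimal sketch of the log-based approach, the Python example below counts successful page requests from a few log lines. It assumes the server writes the widely used Common Log Format; the sample lines, addresses, and paths are invented for illustration, and the function name count_page_requests is hypothetical. A dedicated tool such as AWStats does far more than this.

```python
import re
from collections import Counter

# Assumption: Common Log Format, e.g.
# host ident user [time] "METHOD path protocol" status size
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_page_requests(lines):
    """Count successful GET requests per path."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("method") == "GET" and match.group("status") == "200":
            counts[match.group("path")] += 1
    return counts


# Invented sample lines; a real script would read the server's log file.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2017:13:55:36 -0700] "GET /hours.html HTTP/1.1" 200 2326',
    '192.0.2.11 - - [10/Oct/2017:13:56:01 -0700] "GET /hours.html HTTP/1.1" 200 2326',
    '192.0.2.10 - - [10/Oct/2017:13:57:12 -0700] "GET /catalog HTTP/1.1" 200 5120',
    '192.0.2.12 - - [10/Oct/2017:13:58:40 -0700] "GET /missing HTTP/1.1" 404 209',
]

for path, hits in count_page_requests(SAMPLE_LOG).most_common():
    print(path, hits)
```

Because the analysis runs on a file the library already holds, no request data leave the server. The log itself still contains IP addresses, so its retention and access should follow the library's privacy policies.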
Google Analytics is one of many different tracking tags that may be embedded on web pages. Each tracking tag operates differently. Most will transmit some type of information to a third party. To provide an environment that protects patron privacy, it is important to know about every tracking tag and to understand what data are transmitted and what entities receive that data directly or indirectly.
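A rough first pass at such an inventory can be sketched in Python by listing the external hosts that a page's HTML loads resources from. The class and function names, the sample page, and the host library.example.org are all hypothetical. The sketch reads only the static markup, so any tag injected by inline JavaScript at run time will not appear; a browser-based tool that observes actual network requests gives a fuller picture.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class SourceCollector(HTMLParser):
    """Collect the src attribute of script, img, and iframe elements."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "img", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def third_party_hosts(html, site_host):
    """Return the sorted hosts, other than site_host, referenced by the page."""
    collector = SourceCollector()
    collector.feed(html)
    hosts = set()
    for src in collector.sources:
        host = urlparse(src).hostname  # None for relative URLs
        if host and host != site_host:
            hosts.add(host)
    return sorted(hosts)


# Hypothetical page that loads one external script.
SAMPLE_PAGE = """
<html><head>
<script src="https://www.google-analytics.com/analytics.js"></script>
<script src="/js/site.js"></script>
</head><body>
<img src="https://library.example.org/logo.png">
</body></html>
"""

print(third_party_hosts(SAMPLE_PAGE, "library.example.org"))
```

Each host in the resulting list is a party that receives at least the visitor's IP address and the address of the page viewed, which makes the list a reasonable starting point for reviewing what each tag transmits.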
Libraries can perform a comprehensive assessment of their environment to determine all tracking tags present. Many tools are available, including Ghostery, which operates as a browser plug-in. Using Ghostery, a site using only the Google Analytics tag would be reported as shown in Figure 5.
As libraries deploy technology-based applications or services, they must take into consideration many factors related to the security and privacy of patron data.
Most library-specific applications have been developed to limit or anonymize activity that can be linked to a specific individual. Products such as discovery services, integrated library systems, library services platforms, or others specifically developed for libraries will almost always include configuration options that can be enabled to limit collection of personal data. Libraries may need to be more careful when deploying applications or components designed for commercial use. It is likely that these services will need to be carefully configured to reflect the user privacy policies of the library.
Any general-purpose content management system, authentication service, or other component should be carefully reviewed regarding how it collects, stores, and transmits personally identifying data. In broader terms, it is important to periodically audit the library's entire technical infrastructure, including vendor-provided services, to ensure that the actual behavior of all software and services conforms to the organization's stated privacy policies.