Libraries, like most other organizations, face increasing pressure to strengthen their impact and continually optimize how they allocate resources. Collecting data, detecting patterns, and creating readily interpretable visualizations have become important tools for managers. Organizations increasingly value data-driven decision making and are investing in tools that provide those capabilities. Opportunities for collecting and exploiting data abound, and they are used aggressively in the commercial and social networking arenas. This pervasive drive to exploit data presents challenges for libraries that strive to use it to benefit their organizations yet stay within the bounds of their professional values and privacy policies.
Libraries have a long tradition of counting events and actions and producing statistics that describe their collections and operations. Advancing into the realm of analytics represents a natural step forward. Technologies available today provide powerful tools to guide library managers in shaping collections and services to best serve their constituents. Analytical tools can be very useful for measuring performance and anticipating future outcomes. Daily operational decisions and organizational strategies can be informed by patterns and trends in relevant data, presented through well-designed visualizations. Libraries' requirements for data management and analytical tools differ from those of other organizations, especially commercial firms, which take a more aggressive approach to personal data. Libraries have a strong interest in offering personalized services to their users, but they also impose strict limits on the use of personally identifiable information. These limits do not apply to all categories of data: the ethical framework for the privacy of library patrons differs from the one that applies to library workers and to other data generated by operational applications.
Patron Privacy: A Basic Assumption
Libraries hold a high standard of privacy for their users. This approach to privacy enables patrons to access library materials confidentially, without concern that their information will be shared with any other individual or organization. Not only do libraries protect the personal details of their patrons more strictly than would be the case in a commercial context, but they also retain data about patron interactions with collection items and services only within strict parameters.
Library operations require collecting some data while a transaction is active, but once the transaction is complete, those data are removed or anonymized. Patterns of data retention consistent with library ethics were first established for physical items and the manual or automated circulation systems that manage lending transactions. Library lending links a record representing the patron with a record representing a collection item. Until the patron returns the item, this active link supports communications such as overdue, fine, or recall notices.
Once the transaction is complete, the link between the patron record and the item record is released. Any logs are then sanitized to eliminate transaction details that could be used to reconstruct the patron's use of the item. Rather than deleting transaction records entirely, libraries generally replace any data elements that directly identify the patron with placeholders that anonymously preserve the categorical information needed for statistics and analysis.
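The anonymization step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not any ILS vendor's implementation: the `LoanRecord` fields, the `ANONYMIZED` placeholder, and the function name are all hypothetical. Completed loans lose their patron identifier but keep the patron category and item data needed for statistics; active loans keep the link so notices can still be sent.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional

# Hypothetical placeholder that stands in for the patron once a loan closes.
ANONYMIZED = "ANONYMIZED"


@dataclass(frozen=True)
class LoanRecord:
    # Illustrative fields only; real ILS and LSP schemas differ.
    patron_id: str
    patron_category: str  # e.g., "faculty", "undergraduate"
    item_id: str
    item_collection: str
    checkout_date: date
    return_date: Optional[date]  # None while the loan is still active


def anonymize_completed_loans(loans: List[LoanRecord]) -> List[LoanRecord]:
    """Replace the patron identifier on returned loans with a placeholder.

    Active loans keep the patron link so overdue, fine, or recall notices
    can still be sent; completed loans keep only categorical data.
    The input records are not modified; new records are returned.
    """
    cleaned = []
    for loan in loans:
        if loan.return_date is not None:
            loan = replace(loan, patron_id=ANONYMIZED)
        cleaned.append(loan)
    return cleaned
```

In practice a library might also coarsen dates or drop item-level detail, since the fields that remain could still aid re-identification when combined with other data.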
These privacy-enhancing practices likewise apply to digital transactions, despite complications arising from the many platforms and networks involved, each of which has the opportunity to collect detailed data. Commercial providers of digital content may not share library patron privacy values. Amazon, for example, gains access to some data when libraries lend ebooks to patrons using Kindle readers. Despite these complications, libraries strive to enforce the confidentiality of patron transactions for their digital services.
Other Organizational Data
Data related to tasks performed by library workers do not necessarily fall within the same level of protection as applies to patron transactions. Library workers do not expect anonymity as they perform their work on systems provided by the library or its parent institution. Integrated library systems, for example, will capture productivity statistics that can be used for assessment of individual workers or the institution. Library workers often use the email and other communications systems of their parent organizations, which do not necessarily guarantee employee privacy and may be subject to freedom of information act requests. Analytics tools used by libraries should be able to differentiate the privacy and legal frameworks associated with each category of data.
Performance and Assessment
Libraries expect their business systems to generate data that can measure the performance and impact of their resources, including personnel, collections, and finances. Data about tasks performed within a system inform decisions about how many personnel are needed to meet demand and how to allocate them as workloads change. Libraries especially benefit from data related to collections. On the back end, pricing and vendor data enable libraries to evaluate the value, performance, and competitiveness of their suppliers. Given constrained budgets, libraries must use all available means to build collections tuned to the information needs of their constituents and to base selection, renewal, and cancellation decisions on impact and cost-effectiveness.
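Cost per use, annual cost divided by annual uses, is one common measure behind the renewal and cancellation decisions described above. The sketch below assumes use counts come from a source such as COUNTER reports and that the review threshold is a local policy choice; the `Subscription` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subscription:
    # Hypothetical record; field names are illustrative.
    title: str
    annual_cost: float  # in the library's currency
    annual_uses: int    # e.g., a COUNTER request count for the year


def cost_per_use(sub: Subscription) -> float:
    """Annual cost divided by annual uses; infinity when a title was never used."""
    if sub.annual_uses == 0:
        return float("inf")
    return sub.annual_cost / sub.annual_uses


def review_candidates(subs: List[Subscription], threshold: float) -> List[Subscription]:
    """Titles whose cost per use exceeds a locally chosen threshold, worst first."""
    flagged = [s for s in subs if cost_per_use(s) > threshold]
    return sorted(flagged, key=cost_per_use, reverse=True)
```

Cost per use is only one input; a high figure flags a title for review rather than deciding its cancellation.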
Collection Use and Impact
Libraries also depend on data related to the use of collection items, provided these data stay within the confines of privacy constraints. For both physical and digital items, libraries depend on data describing the volume of use and the categories of users. The circulation modules of integrated library systems (ILS) and library services platforms (LSP) provide definitive data on the use of physical items. Measuring the use of digital items is more complex and draws on diverse sources, such as publisher-supplied COUNTER statistics, proxy servers, or link resolvers. The key concern for libraries is that any data collected about resource usage describe only the categories of users accessing materials, never specific users.
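As a minimal sketch of category-only usage data, the function below aggregates access events (for example, parsed from proxy server logs) into counts per resource and patron category. The event format, the user-to-category lookup, and the `"unknown"` fallback are assumptions for illustration; usernames are consulted only to find a category and do not appear in the output.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def usage_by_category(
    events: Iterable[Tuple[str, str]],
    categories: Dict[str, str],
) -> Counter:
    """Aggregate (user, resource) access events into counts per (resource, category).

    Usernames are used only to look up a patron category; they are not
    stored in the result. Users with no known category count as "unknown".
    """
    counts: Counter = Counter()
    for user, resource in events:
        category = categories.get(user, "unknown")
        counts[(resource, category)] += 1
    return counts
```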
Reports: Basic Counts and Statistics
All library automation products offer some level of reports and statistics. Integrated library systems, for example, produce basic reports of daily, monthly, and annual circulation totals by category, and breakdowns of new acquisitions by discipline. These products ship with standard reports generated automatically and may also support custom reports that can be programmed to produce lists or tables from any category of data available in the system. Custom reporting capabilities may include procedures to count or extract data using SQL queries or APIs. Reports provide simple representations of data but do not necessarily offer additional perspective or insight.
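A custom SQL report of the kind described above might look like the following sketch, which uses Python's built-in `sqlite3` module against a hypothetical `circulation` table holding only anonymized fields (checkout date as ISO text, patron category, collection). Actual ILS schemas and reporting interfaces vary by vendor.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical schema: dates stored as ISO-8601 text (YYYY-MM-DD); no patron IDs.
SCHEMA = """
CREATE TABLE circulation (
    checkout_date   TEXT NOT NULL,
    patron_category TEXT NOT NULL,
    item_collection TEXT NOT NULL
)
"""

# Monthly circulation totals broken down by patron category.
MONTHLY_BY_CATEGORY = """
SELECT substr(checkout_date, 1, 7) AS month,
       patron_category,
       COUNT(*) AS checkouts
FROM circulation
GROUP BY month, patron_category
ORDER BY month, patron_category
"""


def load_sample(rows: Iterable[Tuple[str, str, str]]) -> sqlite3.Connection:
    """Build an in-memory database for demonstration purposes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO circulation VALUES (?, ?, ?)", rows)
    return conn


def monthly_circulation_by_category(conn: sqlite3.Connection) -> List[Tuple[str, str, int]]:
    """Return (month, patron_category, checkouts) rows in month order."""
    return conn.execute(MONTHLY_BY_CATEGORY).fetchall()
```

Because the table holds no patron identifiers, the report can only ever describe categories of users, which matches the privacy constraint discussed earlier.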
Analytics: Beyond Statistics and Reports
In recent years, expectations have moved beyond basic reporting and statistics toward more sophisticated analytical tools. Analytics enable organizations to explore available data more fully to answer operational or strategic questions. These tools scale to very large data sets and can combine data from many categories and sources. They include advanced computational capabilities that can extract patterns and trends that may not be apparent in standard reports. Some incorporate machine learning and other aspects of artificial intelligence to analyze large and complex data sets.
Data tables can be difficult to interpret, so most analytics packages create visual representations that summarize data and trends. These visualizations make complex data or computational analysis accessible to managers and administrators. An analytics package will also let users explore the data underlying any graph or visualization. It is common to combine several visualizations on a single page to form a dashboard that gives a broad overview of related activities. Dashboards serve as landing pages that lead into more detailed data summaries or graphs.
While reports address data within a single system, analytics can incorporate data from many disparate sources. In the library context, an analytics package can bring together multiple categories of data from the ILS, usage data from content providers, selected data from student information systems, census or other demographic data sets, building entrance counters, or Wi-Fi and other network traffic data.
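Combining sources generally means aligning them on a shared key such as a time period. The sketch below performs a simple outer join of monthly counts in plain Python; the source names (gate counts, checkouts, e-journal requests) are hypothetical, and commercial analytics engines do this at far larger scale with their own data models.

```python
from typing import Dict, List

# Each source maps a month ("YYYY-MM") to a count; names are illustrative.
MonthlySeries = Dict[str, int]


def combine_monthly_sources(sources: Dict[str, MonthlySeries]) -> List[Dict[str, object]]:
    """Outer-join month-keyed metrics from several sources into one row per month.

    A month missing from a source gets None rather than 0, so gaps in
    data collection are not mistaken for zero activity.
    """
    months = sorted({month for series in sources.values() for month in series})
    rows = []
    for month in months:
        row: Dict[str, object] = {"month": month}
        for name, series in sources.items():
            row[name] = series.get(month)
        rows.append(row)
    return rows
```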
Underlying Technologies
Most library-oriented products rely on commercial analytics engines. General-purpose analytics engines can be licensed as infrastructure components, in the same way that an integrated library system makes use of a relational database. Major options include Oracle Business Intelligence, Tableau, MicroStrategy, IBM Cognos Analytics, and SAP Analytics Cloud. Each analytics engine offers different capabilities, integration options, and data models. In most cases, the vendor's choice of analytics engine will be transparent to the library. Library-specific analytics modules or platforms provide front-end interfaces, data connectors, and import tools. Libraries will want to pay careful attention to any data-sharing or privacy terms associated with the analytics service.
Ethical Issues and Professional Values
Data warehouses and analytical engines have powerful capabilities. In a commercial environment, these tools can combine data from many sources to compile detailed profiles of preferences and spending for an individual consumer. These data in turn power advertising and ecommerce platforms to enhance revenue through highly personalized marketing techniques.
Libraries must approach these technologies cautiously. Even when libraries follow standard approaches to anonymizing data sets, they should also take care to prevent transactions from being re-identified through triangulation with other data sets. Advertising networks routinely associate usage data with specific individuals, even when the sources lack definitive identifiers, through techniques such as correlating network traffic data.
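One widely used safeguard against the triangulation risk described above is a k-anonymity check: every combination of quasi-identifiers (fields such as patron category or department that are not direct identifiers but could be matched against other data sets) should appear in at least k records. The sketch below, with hypothetical field names, flags combinations that fall short. Passing the check reduces but does not eliminate re-identification risk; for instance, if everyone in a group borrowed the same item, the group still reveals that fact.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def small_groups(
    records: Iterable[Mapping[str, str]],
    quasi_identifiers: Sequence[str],
    k: int,
) -> Dict[Tuple[str, ...], int]:
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(tuple(r[f] for f in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


def is_k_anonymous(
    records: Iterable[Mapping[str, str]],
    quasi_identifiers: Sequence[str],
    k: int,
) -> bool:
    """True when every quasi-identifier combination appears at least k times."""
    return not small_groups(records, quasi_identifiers, k)
```

Groups that fail the check would typically be generalized (for example, merging departments into broader divisions) or suppressed before a data set is shared.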
Data analytics enable many scenarios that may benefit library decision making but that raise ethical questions. Historical data on items borrowed or accessed digitally enable multiple services, and patrons may expect to view such lists in their profiles. Libraries can offer patrons the option to retain these data, but the specific terms may not be consistently understood. Is it consistent with library ethics to collect these data by default, with an option to opt out? Should library systems use these data in recommendation engines? If such nonanonymized data exist, it may be challenging to ensure that they are not shared beyond the boundaries defined in privacy statements.
Libraries should also be vigilant to ensure that analytics or data practices do not introduce bias or inequities into their information ecosystems. Strategies or intentions that libraries conceive to provide materials and services to vulnerable or underserved segments of their communities may not necessarily be addressed algorithmically. Recommendation services can also introduce unintended biases. Both in the collection and presentation of content items, libraries must oversee any analytic-driven processes to avoid bias. Principles of diversity and inclusion should be applied to the data, algorithms, and analytics that shape the performance of library systems and platforms.
Indispensable Technologies
These concerns do not diminish the value of data and analytical tools for libraries. It is important for libraries to invest in technologies that help them manage their resources and carry out their missions. This genre of technology products wields substantial power and can be deployed well within the bounds of library ethics to produce significant benefits. Libraries that implement these tools need to be aware of their implications for privacy and bias and configure them accordingly.
An Expanding Genre
Many of the major library technology vendors offer enhanced analytics packages integrated into their platforms or as optional product offerings.
- Alma Analytics: included as a built-in component of the Alma library services platform. https://exlibrisgroup.com/products/alma-library-services-platform/alma-analytics
- BLUEcloud Analytics: an optional component of the BLUEcloud suite that can work with either the Symphony or Horizon ILS products. https://www.sirsidynix.com/bluecloud-analytics
- CollectionHQ: an analytics package oriented to library collection management, offered by a division of Baker & Taylor. https://www.collectionhq.com
- Gale Analytics: combines local ILS data with public demographic data sets. https://www.gale.com/databases/gale-analytics
- OCLC WorldShare Report Designer: provided with WorldShare Management Services or the Tipasa ILL workflow manager. https://www.oclc.org/en/worldshare-report-designer.html
Analytics offerings for libraries continue to expand. This issue of Smart Libraries Newsletter features the latest new offering, Panorama from EBSCO Information Services.