Mining Data for Library Decision Support

Computers in Libraries [June 2013] The Systems Librarian

A library's technology infrastructure should be well aligned with its broad strategic mission, supporting its operations in ways that result in the best possible service. In today's economically challenging environment, libraries lean on technology to automate their daily activities, to accomplish as much as possible with as few staff as possible, and to stretch their collection funds as far as they will go.

Each layer of technology in place must also provide the strongest possible tools for producing analytical data that measure the performance and efficiency of the related areas. The churn of daily activity on a system produces an enormous amount of data that, with the right tools and processes in place, can be exploited to refine workflows, analyze budget allocations and expenditures, or inform other operational or strategic decisions. Libraries increasingly strive to practice data-driven management of their resources as they shape their collections, deploy their personnel, and design their virtual presence.

As library collections become more complex, distributed between print and digital formats, producing and dynamically presenting management data becomes ever more challenging, especially when separate applications manage different collection components. The basic vision embraced in the new-generation library services platforms provides the means to unify the workflows related to the management of library resources, and it also provides an opportunity to bring together statistical and usage data across different collection components.

One of the key characteristics of new-generation library management relates to this area of analytics and data. Many of the recently launched products aim to extend the basic level of reporting - common in earlier-generation systems - into the realm of more sophisticated decision support systems. Integrated library systems typically have basic reporting capabilities covering collection statistics, budget and fund expenditures and levels, and patron use. Basic reports, or ad hoc queries, do not necessarily provide the full level of analytics needed to drive resource allocation and management decisions.

Integrated Library System Reporting Tools

Integrated library systems have offered reporting modules from the beginning, with capabilities such as counting or listing collection items by category or subject area, viewing detailed circulation statistics, and producing financial reports related to acquisitions or other operational scenarios.
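
The core of such a report is grouping and counting. A minimal sketch in Python, assuming item records exported from the ILS with hypothetical `subject` and `checkouts` fields (not any particular vendor's schema):

```python
from collections import Counter, defaultdict

# Hypothetical item records as they might be exported from an ILS.
# The field names ("subject", "checkouts") are assumptions, not a vendor schema.
items = [
    {"barcode": "31001", "subject": "History", "checkouts": 4},
    {"barcode": "31002", "subject": "History", "checkouts": 0},
    {"barcode": "31003", "subject": "Science", "checkouts": 7},
]


def count_by_subject(records):
    """Count items per subject area, the classic ILS collection report."""
    return Counter(r["subject"] for r in records)


def checkouts_by_subject(records):
    """Total checkouts per subject area, a basic circulation statistic."""
    totals = defaultdict(int)
    for r in records:
        totals[r["subject"]] += r["checkouts"]
    return dict(totals)


print(dict(count_by_subject(items)))  # {'History': 2, 'Science': 1}
print(checkouts_by_subject(items))    # {'History': 4, 'Science': 7}
```

A real reporting module runs equivalent queries against the system's own database; the sketch only shows the shape of the computation.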

While some ILS products have more sophisticated and complete reporting capabilities than others, the key limitation lies in not having access to data in a broader context. To the extent that materials are managed outside of the ILS, measuring them lies beyond the scope of ILS reporting tools. Most ILS reporting modules also don't go beyond listing and presenting data in response to query parameters. Many libraries funnel that data into another system, along with data from other sources, to perform more complex analyses. Products such as Director's Station, from SirsiDynix, provide an additional layer of analytic and presentation tools based on data derived from the ILS. But the key challenge lies in integrating data from the many other sources relevant to library collections, operations, and patron use that fall beyond the bounds of the ILS. The ILS can provide a comprehensive reporting and analytics function for the library only to the extent that it manages all aspects of the library's activities. When the ILS manages only a portion of the library's materials, its reporting and analytics can contribute only that component of the comprehensive analysis.
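
Funneling data into another system usually means joining separate exports on a shared identifier. A minimal sketch, assuming two hypothetical exports (print circulation from the ILS and e-resource usage from a separate platform) keyed by the same title identifier; the identifiers and figures are invented, and real data would need matching on ISSN or ISBN:

```python
# Hypothetical exports keyed by a shared title identifier (values invented).
ils_circulation = {"title-A": 12, "title-B": 3}
eresource_usage = {"title-B": 140, "title-C": 55}


def combined_use(print_use, electronic_use):
    """Merge usage per title; a title absent from one source counts as 0 there."""
    titles = sorted(set(print_use) | set(electronic_use))
    return {
        t: {"print": print_use.get(t, 0), "electronic": electronic_use.get(t, 0)}
        for t in titles
    }


for title, use in combined_use(ils_circulation, eresource_usage).items():
    print(title, use)
# title-A {'print': 12, 'electronic': 0}
# title-B {'print': 3, 'electronic': 140}
# title-C {'print': 0, 'electronic': 55}
```

An ILS reporting module alone would see only the print column; the electronic figures, and title-C entirely, come from outside it.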

Library Services Platforms Embrace Analytics

The new generation of library services platforms aims to manage a broader representation of library collections, including electronic and digital materials in addition to print. A broader scope of management brings the potential to use that operational data for reporting and analysis. One of the themes I hear from the organizations creating each of the new library services platforms is an emphasis on leveraging the data generated to power decision support. As I review the functional descriptions of the products in the library services platform genre, most appear to be trying to address this issue.

Innovative Interfaces, Inc. announced Decision Center in January 2012; by July of that year, the product was in use by libraries in its early adopter program. It mines operational data from Millennium or Sierra to help libraries identify candidates for weeding, selection, or transfer; manage budget allocations; or handle other operational activities. (See ducts/decisioncenter.shtml.)
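
Identifying weeding candidates often comes down to a rule over circulation history. The sketch below is a generic illustration, not Decision Center's actual logic; the field names, the roughly three-year idle window, and the checkout threshold are all assumptions:

```python
from datetime import date, timedelta

# Hypothetical item records; fields and values are invented for illustration.
items = [
    {"barcode": "31001", "last_checkout": date(2009, 3, 1), "total_checkouts": 2},
    {"barcode": "31002", "last_checkout": date(2013, 1, 15), "total_checkouts": 9},
    {"barcode": "31003", "last_checkout": None, "total_checkouts": 0},
]


def weeding_candidates(records, as_of, idle_days=3 * 365, max_checkouts=2):
    """Barcodes of items idle longer than idle_days with few lifetime checkouts.

    Both thresholds are assumptions; a library would set its own policy.
    """
    cutoff = as_of - timedelta(days=idle_days)
    candidates = []
    for r in records:
        last = r["last_checkout"]
        idle = last is None or last < cutoff  # never circulated counts as idle
        if idle and r["total_checkouts"] <= max_checkouts:
            candidates.append(r["barcode"])
    return candidates


print(weeding_candidates(items, as_of=date(2013, 6, 1)))  # ['31001', '31003']
```

The result is a list for staff review rather than an automatic withdrawal decision.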

Ex Libris Ltd. provides a variety of tools in Alma Analytics that draw on usage and operational data generated within Alma to inform operational decisions based on "purchasing trends, comparative analysis, and even predictive analysis." (See gory/ALAAnnual2012_alma_analytics.)

Serials Solutions has identified collection analytics as a key priority, announcing Intota Assessment as the initial offering within its new library services platform. Intota Assessment not only will make use of the library's own collection and usage data, but it will also leverage resources within the ProQuest arsenal (such as the Serials Solutions knowledgebase of e-resource data, Books in Print, and Ulrich's) to support library decisions in operations and collection development. (See intota-assessment-available-in-2013.)

Supplemental Utilities

Most libraries rely on a variety of applications to manage their physical and electronic collections and to provide discovery and access to them. A few products have emerged that stand outside any of the core library automation products; these extract data from each relevant resource to provide comprehensive reporting and analysis. I am aware of a couple of products that follow this approach. collectionHQ, developed by Glasgow-based Bridgeall Libraries Ltd. and acquired by Baker & Taylor in 2011, specializes in tools to support the management of print collections and has been adopted by more than 7,000 public libraries worldwide. (See

Logi Analytics, headquartered in McLean, Va., has developed a comprehensive platform called Logi Info. It is designed to be used by a wide variety of organizations to create custom tools for reporting, analytics, web portals, mobile applications, and other data-oriented activities. Its product was originally introduced into the library arena as Logi Insight, and Logi Analytics has worked with academic libraries including Brown University, New York University, and Purdue University to track and analyze the use of print and electronic resources. (See

Some libraries have created their own tools for mining operational data for reporting and analysis. Harvard University, for example, is developing the Harvard Library Analytics Toolkit as open source software. It will create visualizations of data in support of collection development and other activities. (See lab/proj/library-analytics-toolkit.)

Global Analytics

Another level of analysis requires data beyond what the library's local automation systems manage. When making collection decisions, for example, it is essential to have detailed data from a larger universe, such as available titles that the library does not hold and the usage patterns other libraries experience with them. Libraries may benefit from the ability to tap into analytics generated at a level above the local collection rather than having to compare titles one by one manually. Useful data might include analysis of the library's holdings within a given discipline relative to the material available, and comparisons with the holdings of peer institutions.
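
At its core, this kind of comparison is set arithmetic over holdings. A minimal sketch, assuming the library's and its peers' holdings in one discipline are available as sets of shared record identifiers; all identifiers and peer names below are invented, and usage data would be layered on top:

```python
# Hypothetical holdings for one discipline, as sets of record identifiers.
local = {"1001", "1002", "1003", "1004"}
peers = {
    "Peer A": {"1002", "1003", "1005"},
    "Peer B": {"1003", "1005", "1006"},
}


def compare_holdings(local, peers):
    """Summarize unique holdings, possible gaps, and overlap with each peer."""
    peer_union = set().union(*peers.values())
    return {
        "unique_to_local": sorted(local - peer_union),
        "held_by_peers_only": sorted(peer_union - local),  # candidate gaps
        "overlap": {name: len(local & held) for name, held in peers.items()},
    }


result = compare_holdings(local, peers)
# result["unique_to_local"]    -> ['1001', '1004']
# result["held_by_peers_only"] -> ['1005', '1006']
# result["overlap"]            -> {'Peer A': 2, 'Peer B': 1}
```

A service working from a union catalog performs the same kind of arithmetic across millions of records, which is why it is impractical to do title by title by hand.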

OCLC, for example, has offered the WorldCat Collection Analysis service for many years, providing libraries with data about their collections, such as subject strengths, unique items, and other characteristics determined through WorldCat holdings. This service will become one component of the upcoming WorldShare Analytics, part of the expanding set of services based on OCLC's new WorldShare Platform. (See https://www.oclc.org/collection-analysis.en.html.)

Comprehensive Website Analytics

In addition to mining different aspects of the data generated by applications used behind the scenes, it is vital to measure, analyze, and assess the activity that takes place on the library's virtual presence - including its website and all of the services offered to patrons for access to its collections and services. The data from these applications represents another component of the comprehensive analytics for the organization as well as the basis of analysis focused on website design and user experience.

Making sense of the totality of the use of the library's resources can be quite a challenge given the many components involved. Beyond the basic website - which may be delivered through a content management system such as Drupal - other patron-facing interfaces might include the online catalog associated with the ILS, a discovery service, link resolver, digital collection platform, and institutional repositories.

Within the virtual presence of the library, each of these individual components likely has at least some capabilities to log and report user activity. It can be quite a challenge, however, to aggregate usage data across all these components in a way that provides a complete view of user activity. It seems essential to understand clearly how library patrons navigate the website and how they move among its various applications and content offerings. It is relatively easy to measure the churn of activity within each component, but assessing the overall success of a library's virtual presence can be much more complicated.
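
Aggregation across components amounts to merging each application's log events into one time-ordered stream per visitor session. A minimal sketch, assuming the events have already been normalized into a common shape and, crucially, that every component records a shared session identifier; that assumption is usually the hardest part in practice, and the application and action names are invented:

```python
from collections import defaultdict

# Hypothetical normalized log events from several library applications.
events = [
    {"session": "s1", "ts": 3, "app": "link resolver", "action": "fulltext"},
    {"session": "s1", "ts": 1, "app": "website", "action": "home"},
    {"session": "s1", "ts": 2, "app": "discovery", "action": "search"},
    {"session": "s2", "ts": 1, "app": "catalog", "action": "search"},
]


def session_paths(events):
    """Merge events from all components into one time-ordered path per session."""
    by_session = defaultdict(list)
    for ev in events:
        by_session[ev["session"]].append(ev)
    return {
        sid: [f'{ev["app"]}:{ev["action"]}' for ev in sorted(evs, key=lambda x: x["ts"])]
        for sid, evs in by_session.items()
    }


print(session_paths(events)["s1"])
# ['website:home', 'discovery:search', 'link resolver:fulltext']
```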

For patron interfaces, raw quantitative reports may not be good indicators of success. A large number of transactions within a user session might reflect a frustrated user trying desperately to find an item of interest, while a short session might represent an efficient search. High search statistics may look impressive, but it seems desirable to have a relatively low ratio of search-related activities to content connections. It takes quite a bit of data and analysis across several different web-based applications to assess the effectiveness of the library's web presence and to inform ongoing adjustments in its design.
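
The ratio of search-related activities to content connections can be computed per session. A minimal sketch, assuming each session is a list of action labels; which actions count as "search" or "content" is an assumption each library would define for its own applications, and these labels are invented:

```python
# Assumed classification of actions (labels invented for illustration).
SEARCH_ACTIONS = {"search", "refine", "next_page"}
CONTENT_ACTIONS = {"fulltext", "download", "request"}

sessions = {
    "s1": ["search", "fulltext"],
    "s2": ["search", "refine", "next_page", "refine", "search"],
    "s3": ["search", "refine", "download", "fulltext"],
}


def search_to_content_ratio(actions):
    """Searches per content connection; infinite if no content was reached."""
    searches = sum(a in SEARCH_ACTIONS for a in actions)
    content = sum(a in CONTENT_ACTIONS for a in actions)
    return searches / content if content else float("inf")


for sid, actions in sessions.items():
    print(sid, search_to_content_ratio(actions))
# s1 1.0
# s2 inf
# s3 1.0
```

Session s2 has the most activity but never reaches content, exactly the pattern that a raw count of transactions would reward rather than flag.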

Successful analytics for a library's virtual presence should be able to follow the paths of visitors as they enter the site and move into any of the applications or resources it contains. To fully evaluate the success of the library's virtual presence, it's essential to understand the entire paths visitors traverse, not just simple counts of how many times each component was used.
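
The difference between component counts and path analysis shows up even in a tiny example. A minimal sketch, assuming per-session paths have already been assembled; the paths below are invented:

```python
from collections import Counter

# Hypothetical per-session paths across library applications (invented data).
paths = [
    ("website", "discovery", "link resolver"),
    ("website", "catalog", "discovery", "link resolver"),
    ("website", "discovery", "link resolver"),
    ("discovery", "discovery", "discovery"),
]

# Simple counts: how often each component was used.
component_counts = Counter(step for path in paths for step in path)
# Path analysis: which complete routes visitors took, and where they entered.
path_counts = Counter(paths)
entry_points = Counter(path[0] for path in paths)

print(component_counts["discovery"])  # 6
print(path_counts.most_common(1))
# [(('website', 'discovery', 'link resolver'), 2)]
print(entry_points["website"])  # 3
```

The component count credits the discovery layer with six uses, but only the paths reveal that half of them came from a single session that never reached content.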

One of the techniques that can be used to measure the success of the virtual presence involves setting specific actions as goals. Google Analytics, for example, offers the capability of defining goals as well as a sequence of pages that funnel users toward a goal. Once defined, the overall performance and success rate of these goals can be monitored. The analytics of a library's website should be used not only to report activity, but also as a tool to improve usability or guide decision making about the tools and services offered.
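
The funnel idea can be sketched independently of any analytics product. This illustrates the concept only, not the Google Analytics API or its exact matching rules; the goal (an interlibrary loan request) and the page paths are hypothetical:

```python
# Hypothetical funnel of pages leading to a goal; page paths are invented.
FUNNEL = ["/ill", "/ill/form", "/ill/confirm"]

sessions = [
    ["/", "/ill", "/ill/form", "/ill/confirm"],
    ["/", "/ill", "/ill/form"],
    ["/ill", "/hours"],
    ["/", "/catalog"],
]


def funnel_counts(sessions, funnel):
    """Count sessions reaching each step, with steps required in order.

    Other pages may occur between steps; only the order matters.
    """
    counts = [0] * len(funnel)
    for pages in sessions:
        step = 0
        for page in pages:
            if step < len(funnel) and page == funnel[step]:
                step += 1
        for i in range(step):
            counts[i] += 1
    return counts


counts = funnel_counts(sessions, FUNNEL)
print(counts)                      # [3, 2, 1]
print(counts[-1] / len(sessions))  # 0.25
```

Allowing intervening pages between steps is a design choice; a stricter funnel that requires consecutive pages would report lower numbers for the same traffic.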

Actionable Data

As analytics become more sophisticated and extend into wider areas of library activity, they create opportunities to help libraries shape their collections, reallocate resources, and fine-tune the workflows of the tasks they perform. It seems important to get beyond reporting that simply reflects past activity; the data needs to be used in a more predictive way to inform ongoing operational decisions and influence future actions.

Libraries want to respond to the needs they identify as quickly as possible, but some areas offer only limited flexibility. While data-driven management may be the ideal, it would be unrealistic to assume that data can be the sole factor in every decision. Libraries often find themselves caught between fixed organizational structures or policies and a more dynamic, agile management approach.

For any library, better data, with the appropriate analysis and context, has great potential to help improve every aspect of its operations. Given that libraries use many different applications and services that create immense amounts of data through the churn of daily use, leaving that data untapped would be an unfortunate missed opportunity. Whether through the reporting and analytic capabilities available in those systems, specialized business decision systems, or local programming, I would urge libraries not to let that data lie fallow, but to exploit it to its full potential.

Publication Year: 2013
Type of Material: Article
Language: English
Published in: Computers in Libraries
Publication Info: Volume 33 Number 05
Issue: June 2013
Publisher: Information Today
Series: Systems Librarian
Place of Publication: Medford, NJ
Notes: Systems Librarian Column