One of the realities of today's libraries and information centers is that we provide more services via the Web, with fewer face-to-face encounters with our users. Although the delivery of information and library services via the Web is well suited to today's environment, it's difficult to know what your users really want and whether they use your site successfully. In this month's edition of the Systems Librarian, I'll focus on ways that a library can gain insight into the effectiveness of its website through the analysis of web server logs and by creating additional routines that record data about your website's visitors and that help you establish communication channels with them.
Web server log analysis
The first place to start in understanding the use of your website is the log files that it automatically produces. All web servers generate detailed logs of page requests. For every page it delivers, the server creates a record in the log file that includes the page requested, the exact date and time, and the network address of the computer making the request. Optionally, Web server logs can be configured to record the page that linked the user to the requested page and the version of Web browser used. These logs can provide a wealth of valuable information about the use of the website. They measure the overall volume of use, reveal patterns of how visitors navigate through the site, and show basic geographic demographics about the site's visitors.
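To make the structure of a log record concrete, here is a minimal sketch in Python that splits one line into the fields described above. It assumes the server writes the widely used Common or Combined Log Format; the sample line, the addresses, and the function name parse_line are invented for illustration, and your own server's format may differ.

```python
import re
from datetime import datetime

# Assumed layout (Combined Log Format):
# host ident user [time] "request" status bytes "referer" "user-agent"
# The last two quoted fields are optional, as in the plain Common Log Format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict of fields from one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the bracketed timestamp into a timezone-aware datetime.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record

# Hypothetical sample line; the address and URLs are placeholders.
sample = (
    '192.0.2.10 - - [12/Nov/2002:14:05:31 -0600] "GET /hours.html HTTP/1.0" '
    '200 5120 "http://www.example.org/index.html" '
    '"Mozilla/4.0 (compatible; MSIE 6.0)"'
)
record = parse_line(sample)
print(record["host"], record["path"], record["status"], record["referer"])
```

Each field maps to one of the items listed above: the network address, the date and time, the page requested, and, when the server is configured to record them, the referring page and browser.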
A number of software products are available that perform detailed analysis of Web server logs. They range from sophisticated—and expensive—commercial packages such as the WebTrends Analysis Suite from NetIQ (www.netiq.com) to freely available Open Source packages such as Analog (www.analog.cx). A good Web server log analysis system will not only provide basic numbers of page views, but will track user sessions, the average number of pages viewed per session, which pages visitors used to enter and leave the site, and lots more.
Server log analysis provides a wealth of information about how users interact with your website. You can easily see what pages visitors select most frequently and which ones are neglected. In reviewing the frequency of views for each page, you may discover problems related to the usability of your site. Pages that you think may be of significant interest may have a low number of requests if they are difficult to find within your site.
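The kind of page-frequency report described here can be approximated with a few lines of Python. This is only a sketch, assuming Common Log Format lines; the sample records and the list of known_pages are made up, and a real analysis package does far more.

```python
from collections import Counter

def page_counts(lines):
    """Count successful (status 200) requests for each page."""
    counts = Counter()
    for line in lines:
        parts = line.split('"')          # the request sits between the first quotes
        if len(parts) < 3:
            continue                     # skip lines that do not look like log records
        request = parts[1].split()       # e.g. ['GET', '/index.html', 'HTTP/1.0']
        status = parts[2].split()        # e.g. ['200', '5120']
        if len(request) >= 2 and status and status[0] == "200":
            counts[request[1]] += 1
    return counts

# Hypothetical log records for illustration only.
sample_log = [
    '192.0.2.10 - - [12/Nov/2002:14:05:31 -0600] "GET /index.html HTTP/1.0" 200 5120',
    '192.0.2.10 - - [12/Nov/2002:14:06:02 -0600] "GET /databases.html HTTP/1.0" 200 2048',
    '198.51.100.7 - - [12/Nov/2002:14:07:45 -0600] "GET /index.html HTTP/1.0" 200 5120',
    '198.51.100.7 - - [12/Nov/2002:14:08:10 -0600] "GET /missing.html HTTP/1.0" 404 310',
    '203.0.113.4 - - [12/Nov/2002:15:12:00 -0600] "GET /index.html HTTP/1.0" 200 5120',
]

counts = page_counts(sample_log)

# Pages you know exist but that never appear in the log are the neglected ones.
known_pages = ["/index.html", "/databases.html", "/ill-request.html"]
neglected = [page for page in known_pages if counts[page] == 0]

print(counts.most_common())
print(neglected)
```

Comparing the counts against a list of pages you expect to be popular is one simple way to spot content that visitors are not finding.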
In addition to processing the server's access log with analysis software, you can also learn about use patterns by reading through the file manually. You might, for example, randomly select a few user sessions and follow their progress page-by-page as they navigate through the website. Patterns of use or problems in usability may become apparent that were not obvious through statistical analysis alone.
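Following a session by hand is easier if the requests are first grouped by visitor. The sketch below does this in Python under two assumptions that are not part of any standard: requests from the same network address belong to the same visitor, and a gap of more than 30 minutes starts a new session. Both are approximations, since proxies and shared computers blur the picture. The sample records are invented.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity cutoff, adjust to taste

def parse(line):
    """Pull the host, timestamp, and requested path out of one log line."""
    host = line.split()[0]
    stamp = line[line.index("[") + 1 : line.index("]")]
    when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
    path = line.split('"')[1].split()[1]
    return host, when, path

def sessions(lines):
    """Group requests by host, starting a new session after a long gap."""
    result = []
    latest = {}  # host -> (time of last request, list of paths in that session)
    for host, when, path in sorted(map(parse, lines), key=lambda r: r[1]):
        last = latest.get(host)
        if last is None or when - last[0] > SESSION_GAP:
            pages = []                    # begin a new session for this host
            result.append((host, pages))
        else:
            pages = last[1]               # continue the host's current session
        pages.append(path)
        latest[host] = (when, pages)
    return result

# Hypothetical log records for illustration only.
sample_log = [
    '192.0.2.10 - - [12/Nov/2002:14:05:31 -0600] "GET /index.html HTTP/1.0" 200 5120',
    '192.0.2.10 - - [12/Nov/2002:14:06:02 -0600] "GET /databases.html HTTP/1.0" 200 2048',
    '198.51.100.7 - - [12/Nov/2002:14:07:45 -0600] "GET /index.html HTTP/1.0" 200 5120',
    '198.51.100.7 - - [12/Nov/2002:14:08:10 -0600] "GET /hours.html HTTP/1.0" 200 900',
    '192.0.2.10 - - [12/Nov/2002:16:00:00 -0600] "GET /index.html HTTP/1.0" 200 5120',
]

visits = sessions(sample_log)
for host, pages in visits:
    print(host, " -> ".join(pages))
```

Printing each session as a chain of pages makes it easy to pick a few at random and read them as click paths.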
Even the most sophisticated Web server log analysis packages have limitations on what they can reveal about the use of your website. They help you understand the volume of use and the popularity of individual pages within your site but do little to tell you who your users are. In order to capture this information, you can use scripts and databases to collect and store information about your site's visitors, which in turn can be used to communicate with them.
Program your site for two-way communication
One of the ways to make your website more dynamic involves setting up channels of communication with your users. At a minimum, every website should have a Web form or mail link that invites comments and suggestions, but more powerful techniques are possible if you are able to run scripts on your server. Going beyond static HTML content requires the use of some type of scripting language. Though it takes a little more technical ability, the pay-offs are well worth the effort. I tend to use Perl as the scripting language for the websites I manage, but alternatives such as ASP and PHP work equally well.
Optional registration. Offering the ability for frequent users to become subscribers to your site is one way to enable communications with them. On my Library Technology Guides website, for example, I provide a form that allows visitors to register for monthly news updates. The form collects their name and e-mail address and provides a space for comments. Once submitted, the information goes into a simple database. Each month, a script runs that uses this database to send an e-mail message to each of the subscribers that summarizes all the recent developments in the library automation industry. If your website covers multiple disciplines and has a high volume of content, then you might give users the ability to select their specific areas of interest and how frequently they want to receive updates.
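As a rough illustration of the registration workflow just described, the following Python sketch stores subscribers in a small database and builds one update message per subscriber. It is not the script that runs Library Technology Guides; the table layout, function names, and addresses are all hypothetical, and SQLite stands in for whatever simple database you prefer.

```python
import sqlite3
from email.message import EmailMessage

def create_store(conn):
    """Create the subscriber table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS subscribers ("
        "email TEXT PRIMARY KEY, name TEXT NOT NULL, comments TEXT)"
    )

def register(conn, name, email, comments=""):
    """Save one registration; a repeat e-mail address replaces the old row."""
    conn.execute(
        "INSERT OR REPLACE INTO subscribers (email, name, comments) VALUES (?, ?, ?)",
        (email.strip().lower(), name.strip(), comments.strip()),
    )
    conn.commit()

def build_updates(conn, subject, body, sender):
    """Return one e-mail message per subscriber, ready to hand to a mail server."""
    messages = []
    rows = conn.execute("SELECT name, email FROM subscribers ORDER BY email")
    for name, email in rows:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = email
        msg["Subject"] = subject
        msg.set_content(f"Dear {name},\n\n{body}\n")
        messages.append(msg)
    return messages

# Hypothetical data; an in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
create_store(conn)
register(conn, "Ann", "Ann@Example.org", "Great site")
register(conn, "Bob", "bob@example.org")
register(conn, "Ann", "ann@example.org")  # same address again: still one row

messages = build_updates(
    conn,
    subject="Monthly news update",
    body="Here is a summary of this month's developments.",
    sender="updates@example.org",
)
print(len(messages), [m["To"] for m in messages])
```

The messages are only built here, not sent. A scheduled job could pass each one to smtplib's SMTP.send_message once a month.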
With Library Technology Guides, registration is entirely voluntary. All content on the site is available regardless of whether you've registered. Not surprisingly, only a very small number of the site's visitors take the time to register. You can expect only one or two percent of site visitors to register if the process is voluntary and if no additional resources are provided to subscribers.
Mandatory registration. A more aggressive approach might require visitors to register in order to gain access to the main content of the site. For the Vanderbilt Television News Archive, we have a strong need to understand the types of individuals that use the resource. In order to gather this information, the site requires first-time users to register before they are allowed to search the main database. We make the registration process as simple as possible, while collecting the level of information needed for in-depth use analysis. Some may perceive that holding back content from non-registered users is a bit heavy-handed, but experience shows that very few visitors register without such enticements. It is also consistent with what many users have experienced on the commercial web, where they can browse through a limited amount of information for free but must pay for full access, often labeled as “premium content.”
Application logs. Although web servers automatically record each page requested in their access log, you can often extend the amount of information related to the use of the website by creating additional log files. When web pages are created dynamically by a script, it's often possible for that script to create its own set of log files. Such application logs are especially useful for database-oriented sites. I find it enormously helpful to generate a log that contains the query formulated by the user, the number of records the database returned, and the number of records viewed.
Reviewing these application logs reveals whether users understand the kind of information available in the database, whether they are formulating queries that work well for the database, and whether they are receiving appropriate results. With knowledge of the most frequently requested search terms, the content of the database can be refined and optimized to return better results. In many cases I've changed the default search types and search operators based on patterns I've observed in the database application logs.
Application logs are not difficult to create. In a Web-enabled database site, scripts create the web pages that display the results of a database query. With just a few additional commands, the script can be programmed to record statistical information about the query and its results to a log file at the same time it generates the Web page viewed by the end-user. I rely extensively on these application-generated log files on the websites that I've built, and find that they give me much more useful information about website activity than the built-in server logs. Application logs are completely customizable, and can be set up to capture specific pieces of information that are difficult or impossible to obtain from the server's native log files.
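The "few additional commands" might look something like the following Python sketch. It writes one tab-delimited record per search (time, query, records found, records viewed) and then summarizes the most frequent queries and the ones that found nothing. The field layout and function names are assumptions made for this example, not the format used on any particular site.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

def log_search(log_file, query, records_found, records_viewed):
    """Append one tab-delimited record describing a search to the application log."""
    writer = csv.writer(log_file, delimiter="\t", lineterminator="\n")
    writer.writerow([
        datetime.now(timezone.utc).isoformat(timespec="seconds"),
        query,
        records_found,
        records_viewed,
    ])

def summarize(log_file):
    """Return (count per query, count per query that found no records)."""
    counts, zero_hits = Counter(), Counter()
    for _, query, found, _ in csv.reader(log_file, delimiter="\t"):
        key = query.strip().lower()      # treat 'Election' and 'election' alike
        counts[key] += 1
        if int(found) == 0:
            zero_hits[key] += 1
    return counts, zero_hits

# An in-memory buffer stands in for a real file opened in append mode.
log = io.StringIO()
log_search(log, "election 2000", 42, 3)
log_search(log, "Election 2000", 42, 1)
log_search(log, "hurricane andrew footage", 0, 0)

log.seek(0)
counts, zero_hits = summarize(log)
print(counts.most_common(1))
print(list(zero_hits))
```

In a live script, the call to log_search would sit next to the code that renders the results page, so that every query is recorded as it is answered. Queries that repeatedly find nothing are good candidates for changing defaults or adding content.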
Putting use statistics to work
A healthy website needs constant attention; a neglected one quickly becomes irrelevant. It is vital to continually monitor the usage data generated by your website to understand how its visitors interact with the resources available. At a basic level, it is interesting to see if the overall use of your website is growing. If it is flat or declining, you know that something is wrong. As you add new content and features, it is always helpful to study the impact of the change on the visitors to the website. Statistical measurements and log file analysis are valuable tools in understanding whether the changes you make have a positive or negative effect on the usability of the website.
Information available in your server logs can also be used to help you make decisions about potential changes. If you are tempted, for example, to use features of HTML that may not work in all Web browsers, then study your Web logs first so that you'll know what portion of your users will be inconvenienced. The range of countries noted in your log files may help you decide whether you need to offer pages in other languages.
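To estimate what portion of your users a browser-specific feature would affect, you could tally the browser field of the log, as in this Python sketch. It assumes the server records the browser as the last quoted field (Combined Log Format). The substring rules in FAMILIES are deliberately crude and purely illustrative, and the sample records are invented.

```python
from collections import Counter

# Illustrative substring rules, checked in order; extend them to suit your logs.
# Opera is tested first because its browser string may also mention MSIE.
FAMILIES = [
    ("Opera", "Opera"),
    ("MSIE", "Internet Explorer"),
    ("Mozilla", "Netscape/Mozilla"),
]

def browser_family(agent):
    """Map a raw browser string to a broad family name."""
    for token, family in FAMILIES:
        if token in agent:
            return family
    return "Other"

def browser_share(lines):
    """Return the percentage of requests made by each browser family."""
    counts = Counter()
    for line in lines:
        parts = line.split('"')
        if len(parts) >= 6:              # request, referer, and browser all present
            counts[browser_family(parts[5])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {family: 100.0 * n / total for family, n in counts.items()}

# Hypothetical log records for illustration only.
sample_log = [
    '192.0.2.10 - - [12/Nov/2002:14:05:31 -0600] "GET / HTTP/1.0" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.11 - - [12/Nov/2002:14:06:31 -0600] "GET / HTTP/1.0" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"',
    '192.0.2.12 - - [12/Nov/2002:14:07:31 -0600] "GET / HTTP/1.0" 200 5120 "-" "Mozilla/4.79 [en] (Windows NT 5.0; U)"',
    '192.0.2.13 - - [12/Nov/2002:14:08:31 -0600] "GET / HTTP/1.0" 200 5120 "-" "Lynx/2.8.4rel.1 libwww-FM/2.14"',
]

share = browser_share(sample_log)
for family, percent in sorted(share.items()):
    print(f"{family}: {percent:.0f}%")
```

Counting requests rather than visitors overstates heavy users, so treat the percentages as a rough guide rather than an exact measure.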
A new venue
This month's Systems Librarian column is the last to be published in Information Today. As part of a major redesign effort, the Systems Today section, including the Systems Librarian column, will no longer be a standard feature of Information Today. Beginning in 2003, look for my Systems Librarian column in Computers in Libraries, also published by Information Today, Inc.