Addicted to Data

Computers in Libraries [February 2006] The Systems Librarian

I've never been one to do things in a small way. It seems that for the last decade I've been obsessed with creating databases on the Web and working to fill them up. That common thread weaves itself through the projects that I gravitate toward in both my work at the library and in my personal time at home. While some enjoy video games as a diversion, my Type-A personality drives me to spend at least some of my leisure time plugging away at something productive. When I'm not pounding out my next column or frantically finishing up an article that's due, you're likely to find me at the computer working with one of my database projects. I'm not complaining; I find it relaxing and enjoy the challenge of ferreting out information and stashing it into a database.

In previous columns, I've mentioned some of the databases that I've created as part of my work at Vanderbilt. These include the technical systems that underlie the Vanderbilt Television News Archive, an image management system for our photographic archives, an interface for displaying and comparing art images in the classroom, an oral history repository, and many others. They all rely on a common infrastructure that I developed in Perl for Web-accessible databases.

In this month's column, I'll talk about some of the projects I do at home on my non-Vanderbilt time. I'll discuss some recent work I've done on my Library Technology Guides (LTG) site, additions I've made to the lib-web-cats online directory of libraries, and applications I created for managing my family's photographs. Much of the development work that I do for my personal collections benefits my Vanderbilt projects, and vice versa.

Library Technology Guides

I've mentioned Library Technology Guides in previous columns. While the site resides on a server in the library, I work on its content almost entirely on my own time. LTG includes several database components. All of these, including a directory of the companies that produce library automation software, a bibliographic database of literature of the field, and a full-text archive of news releases and announcements from all the automation companies, are related to library automation in some way. lib-web-cats functions as a standalone directory of libraries, but its primary purpose is to help me track the automation systems used in libraries throughout the world. Each of the components of Library Technology Guides supports my research interests in library automation and provides the raw data for the articles I write and the studies I conduct in this field. I'm also interested in making sure that this information is widely available to anyone else who might be interested. Last month's column, "Designing Sites to Distribute Content via Various Mechanisms," included a description of how I added an RSS feed to the site's current news feature to better disseminate the industry announcements that I add to the site practically every day. In 2005, I posted 427 news releases to LTG.

Library Technology Guides gives me a chance to manage a Web site with a great deal of freedom. While I try to make sure that it's always up and operational for all the users who have come to rely on it, I can make changes without the organizational complexities that apply to a Web site for a large university library system. For example, I recently implemented a redesign that moved away from using tables to manage the layout of the site in favor of an all-CSS (cascading style sheet) approach. While it's not perfect, it gave me a chance to try this technique out on my own before we tackle such a change at Vanderbilt.


The lib-web-cats component of Library Technology Guides has been one of my pet projects for many years. I've been working on this database of libraries for about a decade, and it has grown steadily. I initially focused on making sure that the largest and most prominent libraries were represented, and while lib-web-cats includes libraries from throughout the world, it definitely has a slant toward those located in North America. Now I'm interested in making the directory more comprehensive so that I can use it as the basis for future studies of library automation trends that would require data from all the libraries within a given sector.

The category that I'm currently focused on is the one for public libraries in the U.S. As of mid-December last year, I'd added 12,033 listings to lib-web-cats, bringing the total number up to 24,355. While in the past I've added libraries essentially one at a time, for the U.S. public libraries category I was able to take advantage of external data resources to add libraries en masse. The National Center for Education Statistics (NCES) gathers information on libraries each year and makes its data files available for download. It also allows them to be repurposed (see libraries). The NCES data file for 2003, the most recent year for which data is available, describes 9,211 administrative entities representing a total of 17,299 library outlets. All of these libraries are now included in lib-web-cats.

My strategy for making lib-web-cats comprehensive for U.S. public libraries involved using the NCES data as a reference. I wrote a Perl program that loaded the NCES data into lib-web-cats. As the program processed each NCES record, it attempted to match it against an existing record in lib-web-cats. When the program found a match, it tagged the existing record with the NCES identification number; if it didn't find a match, it created a new record. That part was easy, and it created more than 12,000 new entries. I'm working on the harder part now, which is to flesh out all these new skeletal records. The NCES listings come with basic information, such as the library's name, organization, and address, but they lack the info I track in lib-web-cats, such as the URLs of the libraries' Web sites and online catalogs and various details regarding their library automation environments. The benefit of the NCES data is that it provides a comprehensive list of libraries. Once I know that a library exists, it's fairly quick work to obtain the additional data. At the time that I wrote this column, in mid-December, I expected to finish the U.S. public library project by the end of the year and then to move on to a similar project for U.S. academic libraries. I believe that an analysis of the completed data will provide some interesting results.
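The match-or-create pass described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the Perl program itself. The field names (name, city, state, nces_id), the matching rule, the placeholder sample values, and the in-memory list standing in for the database are all assumptions made for the example.

```python
import re


def normalize(text):
    """Lowercase, drop punctuation, and collapse spaces so near-identical names match."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return " ".join(cleaned.split())


def match_key(record):
    # Hypothetical matching rule: same normalized name and city, same state.
    return (normalize(record["name"]), normalize(record["city"]), record["state"].upper())


def load_nces(existing, nces_records):
    """Tag matching listings with their NCES ID; add skeletal listings otherwise.

    `existing` is modified in place. Returns a (tagged, created) pair of counts.
    """
    index = {match_key(rec): rec for rec in existing}
    tagged = created = 0
    for nces in nces_records:
        key = match_key(nces)
        found = index.get(key)
        if found is not None:
            # Existing listing: record the NCES identification number on it.
            found["nces_id"] = nces["nces_id"]
            tagged += 1
        else:
            # No match: create a skeletal listing to be fleshed out later.
            skeleton = {
                "name": nces["name"],
                "city": nces["city"],
                "state": nces["state"],
                "nces_id": nces["nces_id"],
                "website": None,
                "catalog_url": None,
            }
            existing.append(skeleton)
            index[key] = skeleton
            created += 1
    return tagged, created
```

Calling load_nces(listings, nces_rows) on a list of existing listings and a list of parsed NCES rows returns how many listings were tagged and how many skeletal ones were created. Exact-key matching like this misses libraries whose names differ substantially between the two sources, so a real run would still need manual review of the leftovers.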

lib-web-cats also allows libraries to add or update listings themselves. Not unlike the wiki concept, the community of people who use lib-web-cats can contribute to its upkeep. I welcome all my CIL readers to visit the site and review their listings.

My Site for Family Pics

lib-web-cats resides on hardware at Vanderbilt, because its level of use requires a more industrial-strength Internet connection. I also have a server on my home network that uses a software infrastructure similar to my Vanderbilt projects and that hosts a few of my personal Web sites. It connects to the Internet via DSL. The version of residential DSL service to which I subscribe is asymmetric: traffic coming into my home flows at a speedy 3 Mbps, but traffic going out to the Internet takes a slower pace of 384 Kbps. So accessing the Web sites hosted on my home network isn't blazingly fast for external users, but it is adequate, given the sites' limited set of visitors.
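A quick back-of-the-envelope calculation puts the two line rates above in perspective. The sketch below assumes a hypothetical 500 KB image, takes the slower rate to be 384 kilobits per second (the usual figure for this class of DSL service), and ignores protocol overhead. Only the two line rates come from the column.

```python
def transfer_seconds(size_bytes, rate_bits_per_sec):
    """Ideal transfer time: payload size in bits divided by the line rate."""
    return size_bytes * 8 / rate_bits_per_sec


IMAGE_BYTES = 500 * 1000  # hypothetical 500 KB intermediate-sized image
FAST_LINK = 3_000_000     # 3 Mbps
SLOW_LINK = 384_000       # 384 Kbps (assumed to mean kilobits per second)

fast = transfer_seconds(IMAGE_BYTES, FAST_LINK)
slow = transfer_seconds(IMAGE_BYTES, SLOW_LINK)
print(f"{fast:.1f} s at 3 Mbps, {slow:.1f} s at 384 Kbps")
# prints "1.3 s at 3 Mbps, 10.4 s at 384 Kbps"
```

Under these assumptions the slower direction takes roughly eight times as long per image, which is tolerable for a handful of relatives but would not suit a heavily used site.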

Another one of my recreational projects is the Web site on my home network that manages our family photographs. It uses an interface similar to the one I developed at work for managing the photographic archives of the library's Special Collections unit. I simplified the metadata structure and refined the Web interface to improve its usability.

While setting up the infrastructure was fairly easy, the process of pouring in the content is never-ending. The family photo collection includes both digital photographs that we've taken in the last couple of years and scans of prints. I scan each photo at 600-dpi resolution or higher and use Photoshop to perform color correction, fix imperfections, and create the thumbnails and intermediate-sized images needed for the interface.

As with any other digitizing project, creating the metadata associated with each item is just as important and time-consuming as the digitizing process itself. But the effort pays off. The ability to call up photos by dates, names, places, and events makes a collection of family photos much more useful and interesting. We've settled disagreements many times about when or where something happened by using the database.
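As a rough illustration of why the metadata pays off, the sketch below stores a few descriptive fields per photo and looks items up by person and year. It uses Python's built-in sqlite3 module. The table layout, column names, and sample rows are invented for the example and are not the structure of the actual family photo database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photo (
        id    INTEGER PRIMARY KEY,
        taken TEXT,   -- ISO date, e.g. 1998-07-04
        place TEXT,
        event TEXT
    );
    CREATE TABLE photo_person (
        photo_id INTEGER REFERENCES photo(id),
        name     TEXT
    );
""")

# Invented sample records; a real collection would hold thousands.
conn.executemany("INSERT INTO photo VALUES (?, ?, ?, ?)", [
    (1, "1998-07-04", "Lake house", "Family reunion"),
    (2, "2004-12-25", "Home", "Holiday dinner"),
])
conn.executemany("INSERT INTO photo_person VALUES (?, ?)", [
    (1, "Aunt Ruth"), (1, "Grandpa Joe"), (2, "Aunt Ruth"),
])


def photos_of(person, year=None):
    """Return (date, place, event) rows for a person, optionally limited to one year."""
    sql = ("SELECT p.taken, p.place, p.event FROM photo p "
           "JOIN photo_person pp ON pp.photo_id = p.id WHERE pp.name = ?")
    params = [person]
    if year is not None:
        sql += " AND substr(p.taken, 1, 4) = ?"
        params.append(str(year))
    sql += " ORDER BY p.taken"
    return conn.execute(sql, params).fetchall()
```

With this layout, photos_of("Aunt Ruth", 2004) returns only the holiday dinner row, which is the kind of lookup that settles a "when was that?" argument. Keeping people in a separate table lets one photo carry any number of names.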

I especially like the idea that everyone in the family across the country can get to the pictures. While I don't advertise the URL of the Web site to the general public and restrict access through usernames and passwords, I do provide access to friends and relatives.

Through this effort, I've become established as the archivist of our family photos for the immediate family and, increasingly, for the extended family. A lot of my relatives are shutterbugs. So far, I've added about 10,000 photographs to the database, and I'm not seeing any light at the end of the tunnel. The photos, which were taken in varied parts of the globe, range from the early 1900s to the present.

This project also has a digital preservation component. Recent national disasters showed us how easy it is to lose irreplaceable items such as photo albums. In addition to the two copies of the photographs I keep on the servers, I also have copies on DVD that I store away from the house.

It's interesting to compare this approach with Web sites such as Flickr. Flickr, a site for storing and sharing photos on the Web, has become enormously popular. It's easy to register for a free account that allows you to upload about 20 megabytes of photos per month. Flickr relies on tagging to organize photos. Tagging is an informal metadata scheme, often called a folksonomy, that allows users to make up their own classification terms instead of using those from a formal thesaurus. Not only can you tag your own photos, but any other Flickr member can too. Folksonomies have gotten a lot of attention lately in library discussions about how nonlibrary applications are beginning to pick up on the importance of metadata.
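The idea behind tagging can be shown with a very small data structure: a map from each free-form term to the set of photos that carry it. The sketch below is only an illustration of the concept in Python. The class and method names are hypothetical, and it makes no claim about how Flickr itself stores or searches tags.

```python
from collections import defaultdict


class TagIndex:
    """Toy folksonomy: any term a user invents becomes a valid tag."""

    def __init__(self):
        self._photos_by_tag = defaultdict(set)

    def tag(self, photo_id, *tags):
        # No controlled vocabulary: terms are only trimmed and lowercased.
        for term in tags:
            self._photos_by_tag[term.strip().lower()].add(photo_id)

    def find(self, *tags):
        """Return the sorted IDs of photos that carry every one of the given tags."""
        if not tags:
            return []
        sets = [self._photos_by_tag.get(term.strip().lower(), set()) for term in tags]
        return sorted(set.intersection(*sets))
```

For example, after index.tag(1, "Beach", "sunset") and index.tag(2, "beach"), calling index.find("beach") returns both photos while index.find("beach", "sunset") returns only the first. The contrast with a formal thesaurus is that nothing stops one person from tagging "beach" and another "seaside" for the same thing.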

Manage Your Digital Life

While my approach of using full-blown library applications to manage personal collections may be a bit unusual, I do think it reflects problems that arise as our lives become increasingly digital. I think that the rapid transition to digital photography and downloadable music and video is but the beginning of a shift to technologies that will have huge implications for our lives. The transition from handwritten letters to e-mail for personal correspondence happened years ago. Electronic banking and online bill paying eliminate many paper financial records. Yet most households have few tools to help them keep track of all this digital stuff. How many times have I heard about the loss of years' worth of e-mails or even digital photos because of a computer failure? We're all going to need content management systems to help us handle our personal collections of digital media.

While this column makes it sound like I work all the time at home, that's not quite true. I have my nondigital diversions as well. We watch lots of movies on DVD, for example. Did I mention I created a Web database for our DVD collection? Oh well, maybe I am addicted to data after all.

Publication Year: 2006
Type of Material: Article
Language: English
Published in: Computers in Libraries
Publication Info: Volume 26 Number 02
Issue: February 2006
Publisher: Information Today
Series: Systems Librarian
Place of Publication: Medford, NJ
Notes: Systems Librarian Column
Record Number: 11809