In this “Systems Librarian” column I cover a wide variety of topics that I hope will appeal to technology-minded librarians and others who work in libraries and information centers. Over the last year, the themes I've covered have included reports and analysis of major events in the library automation arena, general industry trends, emerging products and technologies, and descriptions of projects and programs related to library technology. As I begin a new year with this column, I encourage readers to write to me with suggestions of topics they would like to see covered.
Many of the projects in which I've been engaged over the last few months have involved creating Web-enabled databases. As I work on these projects, I've been consistently impressed with the flexibility and power of the Perl programming language. For a glimpse of what can be accomplished with Perl, take a look at my “Library Technology Guides” website (staffweb.library.vanderbilt.edu/breeding/ltg.html) or some of the database-driven Web pages on Vanderbilt's Heard Library site (www.library.vanderbilt.edu). In this month's column, I would like to talk about Perl and why systems librarians and other library technical staff might consider adding it to their repertoire of skills.
Perl stands as one of the most popular programming languages today, with about one million users. While not necessarily best suited for every programming task, it thrives on tasks related to text processing, Web interfaces, and database access, all staples of the typical library technical environment. If you work in this kind of environment and haven't yet learned a programming language, Perl is a great place to start.
Unsurpassed text-processing capabilities
Perl excels at dealing with textual information. For highly complex mathematical needs, most programmers prefer C or other languages. But when the task at hand involves manipulating text, Perl stands above all others. It's not that other languages cannot do the same kinds of things with text; rather, Perl can do in one or two statements what might take dozens of lines of code in other languages. Here are some of Perl's characteristics that make it great for working with text:
Flexible variable structure. Perl's variables, known as scalars, arrays, and hashes, can all be loaded up with text strings of any length, with no need to declare in advance the length of a variable or how it's going to be used.
Pattern matching and substitution. One of Perl's most powerful features is its ability to match patterns, known as regular expressions. This aspect of Perl is almost a language in itself. The programmer can specify which character, type of character, or group of characters to match, in any position within a piece of text. Once found, matching text can be substituted and transformed in just about any way. This capability comes in very handy for parsing free-form text into a structured record, transforming text from one form into another, or making global substitutions throughout a body of text.
Parsing. A common library programming task involves taking some sort of delimited file and converting it into a database structure. Perl's “split” function is close to magic for such operations: just identify the character or group of characters that separates the elements, provide some variable names, and it's done.
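To make the three features above concrete, here is a minimal sketch that parses one line of a pipe-delimited export into a hash with split, tidies every field with substitutions, and pulls a year out of free-form text with a pattern match. The field layout, the sample record, and the parse_record sub are hypothetical, invented for illustration rather than taken from any particular library system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical field layout for a pipe-delimited export file.
my @fields = qw(title author year publisher);

# Split one delimited line into a hash keyed by field name.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    my %rec;
    # split takes a pattern; the pipe is escaped because | is a regex metacharacter.
    @rec{@fields} = split /\|/, $line;
    # values() returns aliases, so these substitutions change the hash itself.
    for my $value (values %rec) {
        next unless defined $value;
        $value =~ s/\s+/ /g;         # collapse runs of whitespace to one space
        $value =~ s/^\s+|\s+$//g;    # trim leading and trailing spaces
    }
    # Pattern match: keep only a four-digit year from text such as "c2000."
    if (defined $rec{year} && $rec{year} =~ /(\d{4})/) {
        $rec{year} = $1;
    }
    return \%rec;
}

# Made-up sample record with messy spacing and a "c" date prefix.
my $rec = parse_record("Programming   Perl |Wall, Larry| c2000. |O'Reilly & Associates\n");
print "$_: $rec->{$_}\n" for @fields;
```

Run as-is, the script prints each field on its own line: title `Programming Perl`, author `Wall, Larry`, year `2000`, and publisher `O'Reilly & Associates`. Processing a whole file is just a matter of calling parse_record inside a `while (my $line = <$fh>)` loop.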
A Web-friendly programming environment
A great deal of the technical work done in the library relates to making information available on the Web. Though Perl was originally developed before the Web came about, it now includes a number of modules that make creating Web-enabled applications easy. The basic model for Web programming is called the Common Gateway Interface, or CGI. CGI defines how a Web server passes data, such as the contents of a submitted form, to an external program, and how that program's output is returned to the browser. With the CGI module (CGI.pm), Perl can process data sent from Web forms, generate pages dynamically, and handle other Web-related tasks. Perl is one of the most frequently used Web programming languages.
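Here is a minimal CGI.pm sketch, assuming the module is installed (current Perl releases no longer bundle it, so it may need to come from CPAN). The form field name `term` and the render_page sub are hypothetical. The script reads one submitted value, escapes it so user input cannot inject HTML, and returns a page preceded by the HTTP header a CGI program must send.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;    # assumption: CGI.pm is installed

# Build a results page for a hypothetical search form field named "term".
sub render_page {
    my ($q) = @_;
    my $term = $q->param('term');
    $term = '' unless defined $term;
    # escapeHTML turns <, >, & and quotes into entities so input is shown, not run.
    my $safe = $q->escapeHTML($term);
    return $q->header(-type => 'text/html', -charset => 'utf-8')
         . "<html><head><title>Search</title></head><body>\n"
         . "<p>You searched for: $safe</p>\n"
         . "</body></html>\n";
}

# Under a Web server, CGI->new reads the submitted form data on its own.
print render_page(CGI->new);
```

For testing outside a Web server, `CGI->new` also accepts a hash reference of parameters, for example `CGI->new({ term => 'Perl' })`.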
Another typical library application involves taking information stored in databases and making it available to users on the Web. Perl can easily communicate with databases through software interfaces such as DBI and ODBC. Every database has its own proprietary methods for managing data internally. While it might be possible to write a program that talks directly to each database's proprietary procedures, a more flexible approach uses a software layer that can translate between the program and many different database products. In this model, a piece of software stands between the database's own internal command structure and that of the programming language.
DBI. Perl's DBI module provides a database-independent interface: the same Perl program can interact with any database for which a DBI driver is available. You can find DBI drivers for almost all of the common commercial and Open Source databases.
ODBC. An alternative approach to dealing with databases involves ODBC (Open Database Connectivity), a model originally conceived by Microsoft. ODBC follows the same abstract model for communicating with databases and is supported by Microsoft Access, SQL Server, and many non-Microsoft databases. Perl programs can reach ODBC data sources through modules such as Win32::ODBC or the DBI driver DBD::ODBC.
SQL. Whether you use DBI or ODBC, the way you construct Perl programs to access data is much the same. Both approaches depend on statements sent to the database in Structured Query Language, or SQL. Very much the lingua franca of databases, SQL can be used to query and update a database and even to define its structure.
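The following sketch shows the DBI pattern described above: connect through a driver, then send SQL to define a table, insert rows, and query them. It assumes the DBD::SQLite driver is installed so it can run against a throwaway in-memory database; the table, its columns, the sample journal records, and the titles_for_subject sub are all invented. To reach an ODBC data source instead, you would change the connection string to the `dbi:ODBC:` form used by DBD::ODBC, and the rest of the calls would stay largely the same.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; ":memory:" means nothing is written to disk.
my $dsn = 'dbi:SQLite:dbname=:memory:';
my $dbh = DBI->connect($dsn, '', '', { RaiseError => 1, AutoCommit => 1 });

# SQL can define structure...
$dbh->do('CREATE TABLE journals (title TEXT, issn TEXT, subject TEXT)');

# ...update data (the ? placeholders are filled in by execute)...
my $insert = $dbh->prepare(
    'INSERT INTO journals (title, issn, subject) VALUES (?, ?, ?)');
$insert->execute(@$_) for (
    ['Journal A', '0000-0001', 'Chemistry'],    # made-up sample rows
    ['Journal B', '0000-0002', 'History'],
    ['Journal C', '0000-0003', 'Chemistry'],
);

# ...and query it, returning the first column of every matching row.
sub titles_for_subject {
    my ($dbh, $subject) = @_;
    my $rows = $dbh->selectcol_arrayref(
        'SELECT title FROM journals WHERE subject = ? ORDER BY title',
        undef, $subject);
    return @$rows;
}

print "$_\n" for titles_for_subject($dbh, 'Chemistry');
```

Placeholders let DBI hand values to the database separately from the SQL text, which avoids quoting problems with titles that contain apostrophes.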
Perl, though originally associated with the Unix operating system, is available for almost every major computing environment. If you work with any of the variants of Unix, such as Sun Solaris, IBM's AIX, HP's HP-UX, IRIX from Silicon Graphics, Inc., or any of the distributions of Linux, Perl will almost certainly be included as part of the basic operating system. While Perl doesn't automatically come with the Windows or Macintosh operating systems, it's readily available. MacPerl, the most common version of Perl for the Mac, is available from www.macperl.com. The easiest way to get Perl for any version of the Windows operating system is from a company called ActiveState (www.activestate.com), which offers its ActivePerl distribution for free and sells a suite of development tools for the serious programmer. Novell also includes Perl in its NetWare network operating system, though far fewer modules are available for it than for the Unix distributions.
I tend to work on both Unix and Windows systems. Having a scripting language that works just the same on both platforms often proves to be convenient. It's common, for example, to build prototypes on a small Windows-based system, leading up to a production implementation on a Unix server.
Perl operates the same way whether it resides on Unix or Windows. In most cases, the only adjustment needed involves the way you reference file names: Windows uses backslashes to separate the components of a file name, while Unix uses forward slashes. Another occasional complication involves the characters that mark the end of a line in a text file (Windows uses a carriage return followed by a line feed, Unix a line feed alone), but Perl can generally accommodate that difference transparently.
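Here is a small sketch of both portability issues. It uses the core File::Spec module to build a path with the separator of whatever platform the script runs on, plus a hypothetical clean_line helper that strips either line-ending style. The file name is made up. The helper matters mainly when a text file is copied between systems without conversion, since Perl already translates line endings when it reads a text file on its native platform.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# catfile joins components with the running platform's separator:
# data/patrons.txt on Unix, data\patrons.txt on Windows.
my $path = File::Spec->catfile('data', 'patrons.txt');
print "$path\n";

# Remove a trailing Unix (LF) or Windows (CR LF) line ending.
sub clean_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

# The "|" marks where the cleaned string ends, showing no stray \r remains.
print clean_line("Smith, Jane\r\n"), "|\n";
```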
A community of users
As one gets involved with Perl, it soon becomes apparent that Perl is not just a programming language but a community of users built around the language and its culture. This community offers a number of resources for those learning and using Perl. The 300+ modules that have been created to extend Perl's functionality have all been contributed by its users. Many sources of information about Perl are available, including newsgroups, Web sites, books, journals, and magazines. Perl user groups, often called Perl Mongers, have formed at many colleges and universities and in many cities. New users of Perl will find abundant resources to help them learn it.
Perl Version 1.00 was created in 1987 by Larry Wall. A programmer at Unisys, Wall had previously developed other Unix utilities, such as “rn,” a program for reading USENET news, and other system administration tools. Wall also has a background in linguistics and a strong interest in the intersection between human languages and computer languages. He now works full-time for O'Reilly & Associates, a publishing company focused on technology, and remains the chief architect of Perl.
Larry Wall's original plan was for Perl to be a tool for Unix system administrators, replacing utilities such as “sed”, “awk”, and “sh”. While these Unix utilities haven't disappeared, Perl has grown far beyond that original role. Each subsequent version of Perl has been a step in an evolutionary development, adding new features and capabilities while maintaining compatibility with its predecessors. The version of Perl in current use, Version 5.6, represents a very mature and stable programming language.
The definitive resource for Perl is Programming Perl by Larry Wall, Tom Christiansen, and Jon Orwant, published by O'Reilly & Associates. Now in its third edition, the book is affectionately known as the “Camel Book,” a reference to its distinctive cover art. The book presents a general introduction to Perl and its culture and then offers a detailed guide to using all the features of the language.
CPAN, or the Comprehensive Perl Archive Network, is a very large collection of Perl software and documentation. If you're looking for any software related to Perl, this is a great place to start. CPAN can be found on the Web at www.cpan.org.
The price is right
One of the basic rules of Perl is that Perl is free. You shouldn't have to pay for the Perl interpreter, which is the software that makes Perl programs work on a computer. Those with a technical bent may want to start with the source code distribution. Ready-built binary executable versions are also readily available. Either way, it shouldn't cost you a dime.
That isn't to say that applications written in Perl have to be free. Nothing in the Perl license prohibits commercial use. Several companies sell integrated development environments or other enhancements to Perl. Perl is used to build many commercial applications, as well as many that follow the Open Source model.
The Future of Perl
The development of Perl continues. While the current Version 5.6 can be considered a mature programming language, the language still needs to evolve along with other aspects of technology. On December 18 of this year, Perl will be 14 years old, which is fairly ancient in the realm of computer languages. Perl's designers want the language to remain relevant well into the future, but they see the need to introduce some changes. The next step planned for Perl is a bold one: Version 6 will give Perl much more of an object-oriented flavor. All versions until now have been backward compatible, meaning that programs written in older versions of the language continue to work with the current version. Not so with Perl 6. Some of the planned changes will likely require translating existing Perl 5 programs to work with Perl 6. Larry Wall writes about the future of Perl in a series of essays called the Apocalypses. These essays and others can be read at http://www.perl.com/pub/au/Wall_Larry.
Perl is by no means the only programming language around. Many other alternatives might better fit your technical environment and your programming abilities.
For those serious about developing large-scale or high-performance applications, C and C++ continue to be very highly regarded. C is a compiled language: once written, C source code is passed through a piece of software called a compiler, which translates it into an executable binary program. The disadvantage of this model is that the program must be recompiled for each operating system on which it is intended to run. The advantage is speed: compiled programs generally run much faster than programs written in interpreted languages such as Perl. Recent versions of Perl, by the way, include the ability to translate Perl scripts into executable programs, gaining some of this performance boost.
Java, though a relative newcomer to the programming arena, has quickly become a mainstay for Web programming. Learning to program in Java is a bit more difficult than learning Perl. Sun Microsystems introduced Java as a programming language that would be ubiquitous across all computing platforms. Though it has become well accepted for Web-related applications, it has not completely lived up to its original ambitions. Disputes between Microsoft and Sun regarding Java have also slowed its advancement.
PHP (www.php.net) is a scripting language specifically designed for the Web. Many find PHP to be a faster and easier way to build Web applications than Perl or other programming languages. PHP is often used in conjunction with MySQL (http://www.mysql.com/), an Open Source database management system, to build dynamic Web applications.
Perl for Library Use
I hope that my brief sketch of Perl's features, culture, and history has aroused your interest in learning more about this handy programming language. Many technical problems that commonly arise in libraries are well suited to Perl's abilities. While I'm not advocating that all systems librarians become programmers, those who have an interest in developing their technical skills might find learning Perl a nice way to gain a little programming experience. Perl, unlike many programming languages, lets you start small without knowing a great deal about programming or the nuances of the language itself. But the more you work with it, the more it unfolds itself to you. Over time you will find yourself whipping out Perl scripts that solve day-to-day problems, automate routine tasks, or really spiff up your library's Web site.