It seems that everyone is talking about Web 2.0. This new vision of the Web enables greater interactivity, more user control of information, radical personalization, the development of online communities, and more democratic management of information. Supporting technologies include blogs, wikis, RSS, podcasts, tagging, XML, and Web services. [Editor's Note: For more information on Web 2.0, see the March 2006 issue of The Information Advisor.] While I wholeheartedly agree that we need to be thinking about Web 2.0 concepts, I worry that we haven't yet fully achieved Web 1.0. Basic Web functionality needs to become ubiquitous as we move forward on building Web 2.0 concepts and technologies.
Today, there's an uneven deployment of Web technologies in libraries. On the one hand, I see many well-designed Web sites that deliver library services to users. Lots of these sites already embrace many aspects of Web 2.0. I also see a large number of libraries that either have no Web site at all or that try to get by with a site that's underdeveloped and not able to meet the expectations of the current generation of Web users.
I made these observations based on many hours of looking at all types of library Web sites. One of my long-standing projects is maintaining the lib-web-cats online directory of libraries. I've created entries for more than 25,000 libraries, and I've looked at the Web sites of those libraries that have them. That experience has given me a bit of perspective.
First, let me say that there are lots of outstanding Web sites out there. I expect large municipal libraries and university libraries to have top-notch sites, and I'm rarely disappointed. I also see Web sites of smaller libraries that strike me as first-rate. These sites reflect the work of clever, artistic, and Web-savvy folks.
Catching Up to Web 1.0
That said, I also see a vast number of libraries that either have no Web sites at all or have sites that are seriously outdated, often in content but particularly in terms of visual appeal and functionality.
I've recently been working to make lib-web-cats more inclusive, especially for U.S. public libraries. I used a comprehensive list of these libraries provided by the National Center for Education Statistics to systematically add all the U.S. public libraries that were previously missing. With the skeletal records in place, I began the process of manually adding the URLs for each library's Web site and online catalog. It became apparent to me right away just how many libraries have no Web site at all. Those without sites tend to be small and rural, but some are medium-sized. I think these omissions present an important issue: A lot of libraries still need to get to Web 1.0.
I'm also surprised by the number of libraries that still offer sites from the early Web. Remember your first Web site? It probably had a garish background, used every font size and style you could think of (because it was possible), and had animated GIF images. But it got the basic job done. As I plow my way through the lists of library Web sites, it's hard not to notice how many of these early sites remain.
For many libraries, putting up even a basic Web site can be a challenge. They don't have technical staff to run a server and have limited local expertise in site design and coding. A site that a library might have commissioned from a consultant or volunteer 5 years ago may look a little stale by today's standards. A Web site's content, design, and appearance require constant updating.
Libraries that don't have the facilities to maintain their Web servers or the local technical expertise to design and code a professional-looking Web site may have several options. Many state or regional library agencies and consortia offer Web site design and hosting services for their members. Lots of libraries take advantage of commercial services. In today's environment, it seems that all libraries need to have some representation online. It would be beneficial for those not yet on the Web to have easy and convenient ways to establish an effective presence.
Creating Valid HTML
Another issue that I've taken up lately is the technical soundness of library Web sites. We expect Web pages to load correctly and to look about the same regardless of which browser we use. The Web is intended to be a universe of interoperability based on internationally established standards. Whether we write code by hand or use some authoring tool, all Web authors are aware that their pages live behind the scenes in some flavor of HTML. It's the HTML that tells a browser how to present the page.
When it comes to which version of HTML to use, the Web gives us lots of leeway. You can use whatever you deem best. If you're worried about supporting older Web browsers, you might need to use HTML 4.0. Today, XHTML 1.0 is generally the preferred version. But it's important to follow the syntax and tags specifically allowed by the HTML version you specify in the headers of your Web pages.
I don't consider myself an expert on the aesthetic aspects of Web design or usability, so I generally refrain from making comments on the finer points in those areas. But I do try to stay on top of the behind-the-scenes technologies, and I spend a lot of time building applications that generate dynamic Web pages. I hope that sites such as Library Technology Guides and the Vanderbilt Television News Archive reflect my technical ability. I know they reflect my aesthetic inability, since they both suffer from a plain, no-frills design. But they're coded to HTML standards and are validated.
One of the main rules I follow as I build Web sites is that the underlying code must be consistent and valid. Perl scripts generate most of the pages. My basic approach involves writing a script to produce the page, making the necessary adjustments so that it displays all the required information, and then validating the HTML. If I see any errors either in the visual presentation of the page or in the validity of the underlying code, I iteratively adjust the script until it looks right and validates.
I'm a strong believer in coding Web pages to standards. After all, we wouldn't want records in our library databases that didn't properly conform to the MARC standards. Standards-compliant pages will migrate well as your Web server environment evolves, and they will be much more likely to display correctly and consistently for site visitors.
Checking Validity
While many tools exist for validating HTML, I prefer to use the service that's available from the World Wide Web Consortium (W3C; http://validator.w3.org). Since these folks establish the standards, I trust their validator. It's easy to use: just copy and paste your page's URL into the "Validate by URL" box and click the "check" button.
Given the different versions of HTML, what's valid depends on the rules of the version you've selected. First, the validator reads the headers of your page to see which rules apply. It looks for the HTML version to use and the character set to expect. I prefer to use XHTML 1.0 and the "latin-1" character set, which is specified in the ISO-8859-1 standard. The code below illustrates a Web page header.
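This is just a minimal sketch, assuming the XHTML 1.0 Transitional document type and a placeholder page title:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <!-- The declared character set must match the encoding the page actually uses -->
      <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
      <title>Sample Library Home Page</title>
    </head>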
The validator becomes most angry when a Web page doesn't provide the proper information in the header. If you forget to specify the HTML version, you'll see something like "No DOCTYPE found! Attempting validation with HTML 4.01 Transitional." In other words: "I can't validate your page if you don't tell me which rules to follow." But if you supply a proper DOCTYPE declaration in your header, the validator will systematically examine your Web page to see whether you followed all the rules set within that standard.
The validator will tell you when you've made a mistake, show you where it is, and give advice on how to fix it. Some of the most basic and easy-to-fix errors involve forgetting to close tags and improperly nesting tags. Some pages rely on multiple layers of nested tables for the page layout. It's easy to lose track of each table and cell and where each row ends and begins. This can lead to validation problems. The validator will also complain if you use characters in the text of your page that aren't supported by the encoding rules you specified. You may need to translate some special characters into alternate representations to achieve perfect validation. Lots of things can go wrong, but it's usually fairly easy to identify and fix errors so that you end up with perfect code. If you don't go through the validation process, you'll probably have errors in your pages that you don't even know about.
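As a small illustration (the markup here is hypothetical), the first fragment below fails validation because the tags overlap and the paragraph is never closed; the second fragment fixes both problems:

    <!-- Invalid: <i> is closed after <b>, and <p> is never closed -->
    <p><b>New <i>arrivals</b></i>

    <!-- Valid: every tag is properly nested and closed -->
    <p><b>New <i>arrivals</i></b></p>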
Forgive and Forget?
One of the features of the latest Web browsers is that they don't complain when they encounter mistakes on Web pages. All of the major browsers will attempt to render a page even if it contains errors. In most cases, the errors do not cause major problems in the page's visual display. But there are times when coding mistakes will lead to inconsistencies and problems. So if you leave errors in the page, it may look OK in your browser, but other browsers may make different assumptions as they attempt to work around your mistakes, and the results can differ.
It's a good thing that browsers exhibit such forgiving behavior. It would be very frustrating if pages refused to load because of minor syntactical infractions.
What's not so good is that the various Web-authoring tools allow you to create pages with errors. While popular tools such as Macromedia's Dreamweaver include a built-in validator, they don't necessarily prevent you from generating pages with errors. Whether you code by hand or use an authoring tool, you have to take a proactive approach to ensure that your pages conform to standards and are valid.
While HTML tends to forgive errors, XML-based applications demand strict adherence to the rules. Most XML parsers will reject an XML document that contains even a single error. Documents that follow all the syntactical rules are considered well-formed XML. Those that also conform to the associated Document Type Definition or schema can be processed as valid XML. Many of the technologies connected with Web 2.0 involve XML. So get used to precise coding practices if you want to move forward to some of the more dynamic, interactive, and XML-oriented Web applications.
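As a quick illustration (the element names here are invented), a browser would happily render HTML with a missing closing tag, but an XML parser will refuse to process the document until it is well-formed:

    <!-- Well-formed XML: every element is closed and properly nested -->
    <record>
      <title>Annual Report</title>
      <year>2005</year>
    </record>

    <!-- Not well-formed: the missing </year> causes most parsers to reject the entire document -->
    <record>
      <title>Annual Report</title>
      <year>2005
    </record>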
Let's Take a Survey
I recently conducted a small exercise to learn how well library Web sites conform to coding standards. Using a listing on my Library Technology Guides site (http://www.librarytechnology.org/arl.pl) and the W3C validation service, I attempted to validate the main page of each of the 123 members of the Association of Research Libraries (ARL). These libraries all have very sophisticated Web sites and should have technically proficient site administrators.
The results surprised me. My initial hypothesis was that the majority of the pages would be valid. I knew there were a few issues on my own institution's Web site, but I figured we were the exception. To my amazement, only 21 out of the 123 Web sites passed validation. Four sites had only a single error, 11 had two to five errors, and seven had more than 100 errors.
On a pass-fail basis, the ARL members that passed were CISTI, Dartmouth College, Duke University, the Library of Congress, McMaster University, the National Library of Medicine, the Research Libraries of New York Public Library, New York University, Oklahoma State University, Princeton University, the University of Arizona, UCLA, the University of California-San Diego, the University of California-Davis, the University of Chicago, the University of Kentucky, the University of Maryland, the University of Minnesota, the University of Saskatchewan, the University of Texas-Austin, and the University of Wisconsin-Madison. Most of these specified "XHTML 1.0 Transitional" as their Document Type Definition. New York University, the University of California-San Diego, the University of California-Davis, the University of Kentucky, and the University of Texas-Austin each get extra credit for conforming to "XHTML 1.0 Strict."
Given the unexpected 82 percent failure rate, I decided to check other types of Web sites. Large urban public libraries didn't fare any better. I maintain a list of the Urban Libraries Council's 136 members. I didn't take the time to work through the list comprehensively, but after seeing only one or two valid sites out of the first 30, I anticipated findings similar to those for ARL's member sites. So much for libraries. Next, I tested the validity of each of the commercial ILS vendors' sites. Most failed. Only Follett and Keystone Systems passed validation. Naturally, the Librarians' Internet Index (http://www.lii.org) passed.
Overall, library Web sites didn't fare so well in my survey. What about commercial sites? Again, I was surprised. Surely the high-powered sites on the commercial Web would deliver standards-compliant pages. That isn't so. Google had 50 validation errors, eBay had 248, Yahoo! had 270, CNN had 68, Flickr had 15, and Amazon had 1,292.
Overall, this little experiment proved my hypothesis wrong. I expected to find a high proportion of Web sites that follow coding standards. In reality, most sites fail to completely conform to them.
How do I interpret these results? Judging from them alone, you'd think it's a wonder that the Web works at all. But in fact it works quite well. While following standards does lead to increased reliability and better Web site operation in the long term, in practice, Web sites riddled with validation errors usually work fine. Nevertheless, I believe that creating Web pages with valid HTML is well worth the time.
Shoring Up the Infrastructure for Web 2.0
As we progress toward Web 2.0-style interfaces, this lack of adherence to standards may come back to haunt us. If Web 2.0 requires greater use of XML-based technologies, we're moving into a realm that is far less forgiving of sloppy coding. Improving our technical practices now, so that the current generation of Web sites relies on valid XHTML and CSS, will give us a stronger footing as we move forward with next-generation Web technologies.