It's not exactly the epic struggle of good versus evil, but year after year we find ourselves fighting against a constant onslaught of obnoxious and predatory activity on the Internet. There seems to be no end to spam, phishing, viruses, worms, and other malicious activity. This month, we'll take a look at some of the recent forms of attack out on the Internet and I'll offer a couple of tips on how to keep your site a bit more secure.
When I started studying and writing about Internet security issues more than a decade ago, we anticipated future generations of network infrastructure with more stringent security built in that would eliminate obnoxious and dangerous activity. While great strides have been made in network security technologies, they are mostly deployed in internal networks. These security layers rely on industrial-strength authentication and authorization, along with strong encryption. Applying this level of security to the Internet at large would result in a restrictive environment that would have a huge impact on its openness. It's important to keep in mind that while it's annoying to deal with floods of spam and to maintain constant vigilance for viruses and worms, that's the price we pay for free and open access. That doesn't mean that there isn't room for improvement, but it illustrates the compromises and trade-offs involved in balancing strong security with convenient access.
Spam Still Proliferates
Spam continues to be one of the top annoyances of the Internet. In many cases it's more than just an annoyance. Much of the spam out there relates to phishing and identity theft, which causes economic and personal damage to its victims. While it's not exactly big news that the onslaught of spam continues, it's important to take note of its recent evolution and of what we in libraries need to do to avoid unwitting complicity in its spread.
As incredible as it seems, spam continues to be a profitable activity. The phenomenon of spam (unsolicited commercial email) involves sending messages to millions of recipients in the hope that a few of them will buy the sender's wares or fall for a scam. Even when the odds are 1 in 10,000 that a recipient will bite, spammers can spin out many millions of messages at very little cost, so many of them continue to find it worth their while. It takes only an infinitesimal number of positive results from a spam barrage to create an adequate payoff to the spammer and to the scoundrels that hired him.
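To put rough numbers on those odds, here is a small Python sketch. The 1-in-10,000 response rate is the figure quoted above; the volume of five million messages is purely an assumed value chosen for illustration.

```python
def expected_responses(messages_sent: int, odds: int) -> float:
    """Expected number of recipients who respond, at odds of 1 in `odds`."""
    return messages_sent / odds


# 5,000,000 is an assumed volume for illustration, not a measured one.
print(expected_responses(5_000_000, 10_000))  # 500.0
```

Even at such long odds, a single barrage of that size would be expected to draw several hundred responses.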
Spam's Internal Anatomy
In times past, spammers were able to churn out messages by the millions by taking advantage of open mail relays out on the Internet. Let's take a quick look at the path of an email message.
In most cases, you create a message in a client of some sort that allows you to use basic word processing tools to compose and edit. Once you press the Send button, your email client makes contact with its designated mail server, engaging in a conversation following the rules specified by SMTP (Simple Mail Transfer Protocol). True to its name, SMTP requires only a few fields to formulate a valid email message and send it on its way. A mail message consists of a header, which includes structured fields such as "to," "from," "cc," "bcc," "date," and "subject"; and the message body. The mail server essentially stamps each outgoing message with its own domain name, denoting its origin.
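The header-plus-body structure described above can be seen with Python's standard `email` package. This is only a minimal sketch: the addresses and subject line are hypothetical placeholders, and a real mail client would add further fields of its own.

```python
from email.message import EmailMessage
from email.utils import formatdate


def build_message() -> EmailMessage:
    """Assemble a message from a few header fields and a body."""
    msg = EmailMessage()
    # All addresses here are hypothetical placeholders.
    msg["From"] = "reader@example.org"
    msg["To"] = "editor@example.org"
    msg["Cc"] = "colleague@example.org"
    msg["Date"] = formatdate(localtime=True)
    msg["Subject"] = "Question about this month's column"
    msg.set_content("A short message body goes here.")
    return msg


# Serialized form: header lines, one blank line, then the message body.
print(build_message().as_string())
```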
Before a mail server generates a message and relays it on to the Internet, it usually checks to be sure that the mail client generating the message belongs to its designated organization. Some mail servers enforce tightly controlled access, requiring usernames and passwords or digital certificates. Others simply check that the mail sender falls within their own local network. Either way, mail servers normally do not provide service to those outside their immediate organization.
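The decision a well-behaved mail server makes can be sketched in a few lines of Python. This is an illustration of the idea only, not how any particular mail server is configured; the function name `may_relay` and the address range are assumptions made for the example.

```python
import ipaddress

# Hypothetical address range standing in for the organization's own network.
LOCAL_NETWORK = ipaddress.ip_network("192.0.2.0/24")


def may_relay(client_ip: str, authenticated: bool = False) -> bool:
    """Relay outgoing mail only for authenticated users or local clients."""
    if authenticated:
        return True
    return ipaddress.ip_address(client_ip) in LOCAL_NETWORK


print(may_relay("192.0.2.15"))                        # True: inside the local network
print(may_relay("203.0.113.9"))                       # False: outside and unauthenticated
print(may_relay("203.0.113.9", authenticated=True))   # True: outside but logged in
```

A server that answered True to everyone, regardless of origin or login, would be relaying mail for anyone who asked.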
The classic spammer strategy is scanning pages on the Web in search of email addresses and popping them into a database. These databases of potential addresses are then fed into automated scripts which churn out the messages we see in our mailboxes as spam. Many of the viruses and worms churning on the Internet are designed to hijack ordinary PCs connected to broadband networks to spew out huge quantities of email messages.
Closing Open Mail Relays
A mail server that has been configured to generate mail without regard to the sender of the message is considered an "open relay." These open mail relays exist to the delight of mail spammers. While humans generate mail messages one at a time through a mail client such as Microsoft Outlook, spammers generate messages programmatically through scripts and huge databases of email addresses. Spammers constantly scour the Internet for open mail relays that they can commandeer to ship their goods.
One of the tactics in the battle against spam is eliminating open mail relays. All legitimate mail servers enforce some type of verification to lock out spammers. Any organization that allows an open relay risks appearing on "blacklists" that many mail systems check before accepting incoming messages. If your organization appears on a blacklist, then even legitimate mail from that domain name may not be delivered. These blacklists provide strong motivation to organizations to ensure that their systems administrators keep their mail servers under tight control.
Port 25 Filtering
Another tactic to disrupt the spammers is shutting down unauthorized activity on port 25, the network socket that delivers most email. Most ISPs (Internet service providers) and organizational networks enforce technical restrictions on their networks to curtail spam activity. A typical ISP will provide a mail server for its own customers but will block access to others. As an added measure to reduce opportunities for generating spam, ISPs tend to block all activity on port 25, except for their own customers accessing their own mail servers.
While port 25 mail filtering helps reduce spam, it also makes it a bit more complicated to use some types of mail services. If, for example, you need to access the mail system of your workplace from home, you might have difficulties. When my ISP initiated port 25 filtering, I had to adjust the configuration of my email client to use my ISP's mail server for outgoing mail, while using Vanderbilt's mail servers for incoming mail. Fortunately, the Mulberry Mail client that we use allows multiple configuration profiles, so I can easily switch between the configuration that works at home on my ISP's network and the one configured for use at Vanderbilt. (Port 25 mail filtering does not affect mail systems that you use through your Web browser, only those that use specialized mail software.)
New Tactic: Focus on Forms
Strategies such as eliminating open mail relays and filtering port 25 have closed off much of the infrastructure once used by spammers. But this hasn't necessarily reduced the amount of unsolicited and obnoxious content coming our way. The spammers continue to find new ways to exploit the system in this cat-and-mouse game.
Fill-in Web forms have become the favorite targets of spammers for the last year or so. Many organizations and individuals have moved away from providing an email address and have moved to Web forms as a means for visitors to get in contact. It's a lot harder for spammers to get usable email addresses these days. That's probably a good thing. Unfortunately, the spammers have devised methods of attack using these Web forms. These attacks go far beyond simple forms designed to send a mail message. Web forms of all types have become targets, including those associated with updating databases, online directories, wikis, and just about any online Web-based system.
In their never-ending quest to distribute content, the spammers scour the Web in search of forms that they can use either to generate email messages or to get their content or links to their content on the Web. In the same way that earlier spambots harvested individual email addresses, today's formbots feed on Web sites with forms. So if your Web site includes forms, it's essential to take certain precautions.
My Web site, for example, has a Web form for sending me a message. I set this up in an attempt to reduce the spam that I get by having a link to my email address on my site. No such luck. I get just as much spam through the Web form as I do through my regular email address.
I also see some of the same kind of abuse on my lib-web-cats online directory of libraries. I provide a Web form to allow librarians to submit the information necessary to be included in the directory. These forms require the submitter to identify themselves with a name and email address, and the fields are each checked for validity. I check the entries submitted each day to avoid duplicates in the directory and to touch up the data so that it is consistent within the directory. It's not unusual at all to see entries created by spambots that link to some of the more unseemly parts of the Web instead of to libraries.
While it's highly annoying to have to keep such a close watch on the submissions, it's interesting to see the level of sophistication programmed into these bots. They quickly learn how to fill out the forms with the required responses and to make valid selections. The URLs they provide are especially dangerous. The bots often supply URLs with embedded scripts that have undesirable effects. One variety redirects a user's Web browser to another (undesirable) site as soon as the user views a page that includes the URL as a link.
The formbots know Web forms and their associated processing scripts inside out. They understand all the different elements of the form and the commonly used parameters, and they can cycle through permutations until they hit upon valid responses. Keep in mind that we're dealing with automated scripts that have infinite patience and unlimited time.
Ways of Fighting Back
If your Web site offers forms, then you too are a potential target for some of this abuse. Fortunately, there are measures that you can take to mitigate or even eliminate these problems.
The first line of defense against the formbots involves rigid checking and validation of the data submitted through the form. It's important to filter out or to reject any data submitted that may be potentially dangerous. One should routinely filter out all special characters unless there is a specific need. Most fields should be checked for reasonable length. Be wary of excessively long fields. So whether you use PHP, Perl, ASP, or some other scripting language for form processing, you might want to beef up the validation of form data.
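As one illustration of this kind of checking, here is a minimal sketch in Python. The field names, the length limits, and the set of permitted characters are all assumptions made for the example; a real form would need rules suited to its own fields.

```python
import re

# Hypothetical field names and length limits for a simple contact form.
MAX_LENGTHS = {"name": 80, "comment": 1000}

# Anything other than letters, digits, spaces, and a little punctuation is removed.
DISALLOWED = re.compile(r"[^A-Za-z0-9 .,'-]")


def clean_field(field: str, value: str) -> str:
    """Reject over-long values, then strip out special characters."""
    value = value.strip()
    if len(value) > MAX_LENGTHS[field]:
        raise ValueError(f"{field} is longer than {MAX_LENGTHS[field]} characters")
    return DISALLOWED.sub("", value)


print(clean_field("name", "  Ann O'Neil<>;  "))  # Ann O'Neil
```

Rejecting an over-long value outright, rather than trimming it, is a deliberate choice in this sketch: a field stuffed far beyond its limit is more likely an attack than a typing mistake.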
It's especially important to pay special attention to fields related to email addresses. One of the favorite techniques of spammers is email header injection. A typical email form requests the email address of the sender. Rather than supply a simple email address, a formbot may attempt to submit data that subverts the form processor to spew out other mail messages. By injecting line feeds and additional mail headers, an unchecked email address in a mail processing form can turn your Web server into an unwitting spam generator. If your form generates email, it's essential that the processing script strip out any line feeds, check for embedded mail headers, and reject addresses that are suspiciously long.
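The three checks just described (strip line feeds, look for embedded headers, reject over-long addresses) might be sketched in Python as follows. The function name `safe_sender`, the 254-character limit, the list of header names, and the deliberately simple address pattern are assumptions for illustration, not a complete address validator.

```python
import re

MAX_ADDRESS_LENGTH = 254  # assumed cutoff for a "suspiciously long" address

# Header names a formbot might try to smuggle into the address field.
HEADER_NAME = re.compile(r"(to|cc|bcc|from|subject|content-type)\s*:", re.IGNORECASE)

# Deliberately simple shape check: something@something.something, no spaces.
SIMPLE_ADDRESS = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def safe_sender(raw: str) -> str:
    """Return a cleaned sender address or raise ValueError."""
    # Remove carriage returns and line feeds, which separate mail headers.
    address = re.sub(r"[\r\n]", "", raw).strip()
    if len(address) > MAX_ADDRESS_LENGTH:
        raise ValueError("address is suspiciously long")
    if HEADER_NAME.search(address):
        raise ValueError("address contains an embedded mail header")
    if not SIMPLE_ADDRESS.fullmatch(address):
        raise ValueError("not a plain email address")
    return address


print(safe_sender(" patron@example.org\n"))  # patron@example.org
```

A submission such as `"patron@example.org\r\nBcc: victim@example.com"` raises `ValueError` here, so the form processor never passes the extra header along to the mail system.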
Create a 'Captcha'
No matter how well you validate your forms, you may find that you need to take more aggressive action to curtail the formbots. Enter the "captcha," or the Completely Automated Public Turing test to tell Computers and Humans Apart. The idea involves posing questions in your form that cannot be answered by a computer, but are easily handled by a person.
One of the most common captchas involves presenting a group of letters within the form, but as a distorted graphic. The graphic is clear enough to be easily recognized by a person, but altered enough to prevent a computer from successfully performing OCR to decipher the letters. In order for the form to be accepted, the user must type the correct sequence of characters as represented in the distorted image. This type of captcha, while fairly complex to implement, can virtually eliminate abuse of Web forms by bots. Ecommerce and other sites that require a high degree of security routinely employ this technique.
The downside of this type of captcha is the difficulty it poses for those with visual impairments. The captchas stymie those who depend on screen readers or other assistive technology.
It's possible to construct a simple captcha that will be fairly effective in deterring formbots. For example, you might include in your form a question that poses a trivial problem for a human, but might be harder for a formbot to correctly answer. You can ask a simple math problem posed verbally, such as "add six and five" or other simple questions like "what color is a red rose?" Don't use the same question all the time, but cycle through them. While this approach might not be stringent enough for high-security applications, I've found it effective for less critical applications.
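A rotating question of this sort could be sketched in Python along these lines. The two questions come from the examples above; the function names and the accepted answers are assumptions, and the sketch presumes the question's index is remembered on the server (in a session, for example) rather than sent back in the form.

```python
import random

# Each entry pairs a question with the set of answers accepted for it.
QUESTIONS = [
    ("Add six and five.", {"11", "eleven"}),
    ("What color is a red rose?", {"red"}),
]


def pick_question(rng=random):
    """Choose a question at random; return its index and its text."""
    index = rng.randrange(len(QUESTIONS))
    return index, QUESTIONS[index][0]


def check_answer(index: int, response: str) -> bool:
    """Compare the visitor's response with the accepted answers."""
    return response.strip().lower() in QUESTIONS[index][1]


index, text = pick_question()
print(text)
print(check_answer(0, " Eleven "))  # True
print(check_answer(1, "blue"))      # False
```

Ignoring case and surrounding spaces keeps the test trivial for people, who answer in many slightly different ways.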
There's No End in Sight
The battle against the purveyors of spam and other forms of attack on the Internet continues. While we've looked at some small tricks that help in the current round, we can be sure that they're good only for the short term. The main point is to stay vigilant, monitor the threats of the day, and implement the appropriate security responses. While time and resources for library systems always seem to be stretched too thin, we can't afford to neglect important security issues.