It is now commonplace for libraries to subscribe to a number of electronic resources, typically accessed through standard web browsers from the Internet or local servers. Electronic resources may include citation databases, Internet-accessible full-text journals, other databases or other Internet sites. Managing access to these resources poses several problems.
The most widely discussed of these is the problem of changing URLs, either due to relocation of the resource by the provider or due to a change of provider. This is a particular problem when the same URL is referred to in several places within the organisation, for example in multiple Web pages and in the library's catalogue: a change to a URL can require many changes locally, with the risk that some references will be forgotten. One solution to this problem which has received wide support is the use of a PURL (Persistent URL) service. This takes care of the problem of providers changing their URLs, but not that of the library changing providers, which will become a major issue as more resources are provided through "aggregator" services.
A third problem arises when the user is required to use different URLs depending on the user's location: one URL for local intranet users, another for users connecting from outside of the intranet.
These problems have been addressed at this library by the use of a CGI script, "director". The discussion of this script will form the remainder of this paper.
Director works by hiding the actual access URL for a resource from the user. The user is given a URL for a resource which points to the director script and includes a single parameter identifying the resource required. The director script uses this ID to look up the resource in a table of resources. The table entry contains the information director needs to construct the actual URL, which it then returns to the user's browser in an HTTP Location: response header.
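The request flow described above can be sketched in Perl roughly as follows. This is a minimal sketch under stated assumptions, not the actual director source: the sub name director_response, the flat %url_for table standing in for the lookup and URL construction steps, the assumption that the query string consists of the resource ID alone, and the 404 response for an unknown ID are all illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for director's lookup and URL construction:
# maps a resource ID straight to a finished access URL.
my %url_for = (
    CC => 'http://melba.unilinc.edu.au/ovidweb/ovidweb.cgi',
);

# Return the complete CGI response for a resource ID as a string.
sub director_response {
    my ($id) = @_;
    my $url = defined $id ? $url_for{$id} : undef;

    # Unknown or missing ID: report an error rather than redirect
    # (assumed behaviour, not described in the text).
    return "Status: 404 Not Found\nContent-Type: text/plain\n\nUnknown resource\n"
        unless defined $url;

    # A CGI script that outputs only a Location header asks the web
    # server to redirect the browser to that address.
    return "Location: $url\n\n";
}

# Assumption: the query string is the resource ID and nothing else.
print director_response($ENV{QUERY_STRING});
```

Because the user only ever sees the director URL, a change of address or provider would need a change in one place, the table, rather than in every page that links to the resource.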
At this library, the URL for a resource will look similar to this:
where 'CC' is the ID code for a particular resource (in this case "Current Contents"). Director will use the code 'CC' to look up its Resource Description Table. This table reveals that the resource is accessed through the HTTP protocol, using the address 'melba.unilinc.edu.au/ovidweb/ovidweb.cgi', and from this information the script returns the HTTP header
which redirects the user's browser to the required address.
We will now take a more detailed look at how the script works.
The Resource Description Table
The core of director is the Resource Description Table (RDT), which is used to define individual resources. This table is a Perl hash, meaning that a key (the resource ID) is used to locate the resource entry in the table. This makes for a very efficient lookup. The format of an RDT entry is:
ID => [ type, param1, ... ],
where 'ID' is the unique identifier for the resource, 'type' defines the category of resource, and 'param1, ...' defines a series of parameter values to be used in constructing the URL for that resource.
For example, the entry for the example used above (Current Contents) looks like this:
CC => [ 'http', 'melba.unilinc.edu.au/ovidweb/ovidweb.cgi' ],
The 'type' is 'http' (which is the trivial case), and the parameter (only one in this case) is 'melba.unilinc.edu.au/ovidweb/ovidweb.cgi'. The ID is 'CC', which is the value indicated in the original URL.
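As an illustration of the structure just described, the following sketch holds the RDT as a Perl hash of array references and splits an entry into its type and parameters. The variable name %rdt and the sub lookup_resource are assumptions made for this example; only the CC entry itself is taken from the text.

```perl
use strict;
use warnings;

# Resource Description Table: ID => [ type, param1, ... ]
my %rdt = (
    CC => [ 'http', 'melba.unilinc.edu.au/ovidweb/ovidweb.cgi' ],
);

# Look up an ID and split its entry into the type and the parameters.
# Returns an empty list when the ID is not in the table.
sub lookup_resource {
    my ($id) = @_;
    my $entry = $rdt{$id} or return;
    my ($type, @params) = @$entry;
    return ($type, @params);
}

my ($type, @params) = lookup_resource('CC');
print join(' ', 'type:', $type, 'params:', @params), "\n";
```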
The Type Table
The Type Table provides director with a template for constructing the URL for each resource of a given type. (In Perl terms, the template is just a format string for the printf statement.) The format of a Type Table entry is (very simply):
type => 'template'
An example, for the 'http' type, is:
http => 'http://%s'
where each instance of "%s" in the format string is replaced by a parameter from the Resource Description Table entry. Note that in this case, the string 'http://' can be omitted from the RDT parameter, since it is already contained in the template. Only variable data needs to be supplied in the RDT.
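The way the two tables combine can be sketched as follows, using Perl's sprintf to fill the template with the RDT parameters. The names %type_table, %rdt and build_url are assumptions for illustration; the two table entries are those given in the text.

```perl
use strict;
use warnings;

# Type Table: type => template (a printf-style format string)
my %type_table = (
    http => 'http://%s',
);

# Resource Description Table: ID => [ type, param1, ... ]
my %rdt = (
    CC => [ 'http', 'melba.unilinc.edu.au/ovidweb/ovidweb.cgi' ],
);

# Build the access URL for a resource ID. Returns nothing when the ID
# or its type is not defined.
sub build_url {
    my ($id) = @_;
    my $entry = $rdt{$id} or return;
    my ($type, @params) = @$entry;
    my $template = $type_table{$type} or return;

    # Each %s in the template consumes one parameter, in order.
    return sprintf($template, @params);
}

print build_url('CC'), "\n";
```

Keeping the fixed part of the address in the template means that a change affecting every resource of one type would need only the template to be edited.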
Because director is written in Perl, it is quite easy to accommodate special cases simply by writing additional code to cater for them. This is a major advantage of director over other approaches.
In the example mentioned in the introduction, we wished to direct the user to different addresses depending on their browser type, version and platform. The script treats the resource type 'webspirs' as a special case, for which it chooses between templates on that basis. The information is provided by the browser in the User-Agent string, which reaches the script as part of the standard CGI request data.
So, an RDT entry for one of our ERL databases might look like this:
SIAL => [ 'webspirs', 'serials', 'serials' ],
And director will use one or other of these templates according to the user's browser, version and platform:
webspirs3 => 'http://digital.library.adelaide.edu.au/cgi-bin/webspirs.cgi?sp.username=%s&sp.password=%s',
webspirs4 => 'http://digital.library.adelaide.edu.au:8590/?sp.username=%s&sp.password=%s',
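The special case might be handled along the following lines. The text does not state the actual rule by which director chooses between the two templates, so the test in pick_webspirs_type below (a User-Agent of "Mozilla/4" or later selects webspirs4, anything else webspirs3) is purely an assumption for illustration, as are the sub and variable names. The table entries are those shown above.

```perl
use strict;
use warnings;

my %type_table = (
    webspirs3 => 'http://digital.library.adelaide.edu.au/cgi-bin/webspirs.cgi?sp.username=%s&sp.password=%s',
    webspirs4 => 'http://digital.library.adelaide.edu.au:8590/?sp.username=%s&sp.password=%s',
);

my %rdt = (
    SIAL => [ 'webspirs', 'serials', 'serials' ],
);

# ASSUMPTION: the real selection rule is not given in the text. Here a
# User-Agent beginning "Mozilla/4" (or a higher major version) selects
# the webspirs4 template; everything else falls back to webspirs3.
sub pick_webspirs_type {
    my ($user_agent) = @_;
    if (defined $user_agent && $user_agent =~ m{^Mozilla/(\d+)} && $1 >= 4) {
        return 'webspirs4';
    }
    return 'webspirs3';
}

sub build_url {
    my ($id, $user_agent) = @_;
    my $entry = $rdt{$id} or return;
    my ($type, @params) = @$entry;

    # Special case: replace the generic type with a concrete template name.
    $type = pick_webspirs_type($user_agent) if $type eq 'webspirs';

    my $template = $type_table{$type} or return;
    return sprintf($template, @params);
}

# A CGI script receives the User-Agent string in HTTP_USER_AGENT.
print build_url('SIAL', $ENV{HTTP_USER_AGENT}), "\n";
```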
The URL seen by the user for this example would be:
With the current version, the RDT is generated with inline code -- meaning that the hash has to be built by Perl each time the script is run. This is not a problem now, but with a larger number of resources it may become inefficient. A solution would be to pre-build the table into a DBM file, which would be opened and read by director.
Using a DBM file would have another benefit, in that it would make it possible, through additional scripts, for staff to maintain RDT entries, for example through another CGI interface.
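One way the DBM approach could look is sketched below, using the SDBM_File module bundled with Perl. This is not part of the current script: the choice of SDBM_File, the temporary file location, the tab-separated encoding of each entry and the sub name lookup_resource are all assumptions made for the example.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;                  # DBM implementation bundled with Perl
use File::Temp qw(tempdir);

# DBM values are plain strings, so each entry's list is stored as one
# tab-separated string (assumes no parameter contains a tab).
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/rdt";          # hypothetical location for the DBM file

# Maintenance side: write the table once, outside the CGI script.
{
    tie(my %dbm, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666)
        or die "Cannot create $path: $!";
    $dbm{CC}   = join "\t", 'http', 'melba.unilinc.edu.au/ovidweb/ovidweb.cgi';
    $dbm{SIAL} = join "\t", 'webspirs', 'serials', 'serials';
    untie %dbm;
}

# director side: open the file read-only and fetch a single entry,
# without rebuilding the whole table in memory.
sub lookup_resource {
    my ($id) = @_;
    tie(my %dbm, 'SDBM_File', $path, O_RDONLY, 0)
        or die "Cannot open $path: $!";
    my $value = $dbm{$id};
    untie %dbm;
    return defined $value ? split(/\t/, $value) : ();
}

my ($type, @params) = lookup_resource('SIAL');
print join(' ', $type, @params), "\n";
```

With this arrangement a separate maintenance script would only need to write to the DBM file; director itself would not have to be edited when a resource is added or changed.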