We've been hearing a lot about "web-scale" lately. It has become one of the buzzwords of the library technology arena, and it's often used rather loosely. While the term has taken on something of a marketing bent, it also characterizes some important trends and strategies that allow libraries to capitalize on today's large-scale technology platforms. Though web-scale does not lend itself to precise definition, it represents an important shift in the way that libraries engage with technology. Within the realm of library technologies, web-scale is a concept worth exploring as one of the new alternatives on the horizon.
What Is Web-Scale?
In the library community, the creators of several discovery products have described their offerings as web-scale, and the term has since been applied to a class of library management products as well. But more than an architecture applied to specific products, web-scale represents a new paradigm for how libraries operate, both in their internal operations and in the way they provide access to their collections and services, by leveraging current internet technologies and concepts.
As I think about web-scale things, four characteristics come to mind:
- Large-scale technology platforms
- Applications delivered through multitenant software as a service
- Massively aggregated approaches to data
- Highly cooperative arrangements among participating libraries
Web-scale things combine cloud computing, highly shared data models, and expansive aggregation of library-related resources.
Web-Scale Enters the Library Vocabulary
As far as I can tell, the term web-scale was first brought into the library vocabulary by Lorcan Dempsey, a vice president and chief strategist of OCLC. In January 2007, he wrote a blog posting that noted how the term had already been used by organizations such as Amazon to describe its infinitely scalable infrastructure services, such as the Elastic Compute Cloud (EC2) or Simple Storage Service (S3). Dempsey noted, " 'Web-scale' refers to how major web presences architect systems and services to scale as use grows. But it also seems evocative in a broader way of the general attributes of the large gravitational hubs which are such a feature of the current web. ..." (http://orweblog.oclc.org/archives/001238.html).
OCLC has since taken web-scale as the fundamental concept behind the way that it provides its products and services through its global platform. OCLC's WorldCat, originally created as a cataloging utility, has spawned an expanding suite of services, such as WorldCat Local and, more recently, Web-Scale Management Services, which folds the conceptual label into its product name. OCLC clearly sees itself operating services at "web-scale."
Use of the term has gained momentum. Though some dismiss it as a gimmicky marketing term, it echoes some of the major technology trends that affect libraries today. Whether or not the term resonates with you, its underlying concepts are increasingly pervasive in the products and services available to libraries.
Web-Scale in Product Strategies
OCLC does not hold exclusive claim to the concept of web-scale. Other organizations, such as Serials Solutions, have explicitly adopted the term, applying it initially to the Summon discovery service and, more recently, to a planned offering for library management. Serials Solutions states that it plans to give that product a formal name; for now, it describes it as its "web-scale management solution." Even though OCLC and Serials Solutions both clearly embrace these core concepts, we can expect each to take a distinctive functional approach in its product offerings.
For both OCLC and Serials Solutions, web-scale stands as one of the basic principles that define their strategic product development, and they use the term explicitly. Other organizations follow many of the same principles, even if they do not necessarily use the term.
Though initially used within the library arena by these two organizations, web-scale has caught on as a general term that covers a broader set of products sharing similar characteristics. Beyond the organizations that have claimed the moniker for themselves, other products logically fit within this category. Products such as Ex Libris Primo, EBSCO Discovery Service, OCLC's WorldCat Local, and Serials Solutions' Summon, all based on massive indexes, can be considered web-scale discovery services. Library management platforms in the web-scale arena would include OCLC's Web-Scale Management Services, Ex Libris' Alma, and Serials Solutions' web-scale management solution. We can expect other products to emerge alongside these initial offerings, and existing products to evolve in this direction.
BiblioCommons might also be considered web-scale as a discovery service for public libraries. It's implemented through multitenant software as a service, incorporates social networking concepts, aggregates users across all libraries implementing the service, and addresses the full domain of content appropriate to public libraries.
It's also possible to design systems with characteristics of both web-scale and local computing. The new Alma system from Ex Libris, for example, uses a tiered data model with a Community Zone, embodying the shared data model typical of web-scale systems, and a Local Zone holding institution-specific data.
Not all members of the new generation of automation products follow the web-scale approach. The open source Kuali OLE project, for example, will initially be deployed individually at the partner institutions rather than through a multitenant software-as-a-service model. While the design will include some elements of the web-scale approach, and there may be shared implementations in the future, the basic design initially seems to focus on implementation within the enterprise infrastructure of the partner institutions.
Web-Scale as a Scalable Technology Platform
We can easily recognize that web-scale implies large-scale systems. It carries a connotation of massive scope, size, or extent. We all understand the enormous expansiveness of the web, and the term borrows it as a point of reference for characterizing services that operate within narrower domains, such as those related to libraries.
Web-scale also has a connection with cloud computing, though the terms are not synonymous. Any web-scale service would, almost by definition, be deployed using some kind of cloud-computing infrastructure. In practical terms, we see all of the products within the web-scale category offered through software as a service rather than as software for local implementation.
But web-scale means more than large; it also implies the ability to expand without constraint. By leveraging cloud-computing technologies such as software as a service and infrastructure as a service, products based on this model will be able to grow in proportion to the number of libraries that adopt the service, the expanding scope of content that libraries manage, and the many other factors we can expect to increase over time. Nearly every dimension of what libraries do is undergoing rapid change and requires technologies that can keep pace, unlike many of the legacy systems that have held libraries back in meeting these challenges.
Both "cloud computing" and "web-scale" are used more as general characterizations than as precise terms. Cloud computing more correctly encompasses specific virtualized computing deployment models such as software as a service, infrastructure as a service, or platform as a service. Even older techniques, such as vendor-hosted implementations of client/server systems, formerly called the application service provider model, have recently been labeled as cloud computing. But this latter approach does not fit the web-scale model, since it continues the mode of individualized systems rather than emphasizing collectively built, cooperative systems.
Cooperative Data Models and Workflow Management
In itself, software as a service relieves the library of the need to expend its limited resources on maintaining server hardware and operating systems. A web-scale system takes the benefit a step further, layering on data models and workflows that enable new levels of efficiency.
Web-scale computing stands in direct contrast to the traditional model of library automation, built around a server housed and operated on premises with an isolated, self-contained database. Web-scale computing offers the potential to bring together the collective efforts of many libraries to create systems more powerful than those possible through many separate and independent implementations.
The new slate of web-scale products for library management generally emphasizes collectively built knowledgebases. Rather than having each library separately and redundantly maintain such details as vendor addresses, it might be possible, for example, to have a shared vendor file. Libraries participating in the system can simply link to an existing supplier's record, adding contacts, notes, or other local details if needed. The same idea carries over to bibliographic records describing collection items. Instead of having each library maintain its own separate bibliographic database, each library attaches its holdings and items to a shared record. This approach to data could be applied to other aspects of library management. While this model comes with some disadvantages as well as efficiencies, workflows based on shared knowledgebases are characteristic of the web-scale approach.
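The shared-record model described above can be sketched as a small data structure. This is a minimal illustration under our own assumptions, not how any of the products named here actually store their data: the class and method names (`SharedKnowledgebase`, `SharedVendor`, `LocalVendorDetails`, `correct_vendor_address`) and all identifiers are hypothetical. Each vendor is described once in a shared file; each library keeps only its own notes and contacts, keyed to the shared record, and registers holdings against a shared bibliographic record instead of copying it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SharedVendor:
    """Hypothetical vendor record maintained once, for all libraries."""
    vendor_id: str
    name: str
    address: str


@dataclass
class LocalVendorDetails:
    """Hypothetical institution-specific details layered on a shared record."""
    contacts: list[str] = field(default_factory=list)
    notes: list[str] = field(default_factory=list)


class SharedKnowledgebase:
    """Sketch of a collectively maintained knowledgebase with local overlays."""

    def __init__(self) -> None:
        self.vendors: dict[str, SharedVendor] = {}
        # (library_id, vendor_id) -> that library's private details
        self.local: dict[tuple[str, str], LocalVendorDetails] = {}
        # shared bibliographic record id -> ids of libraries holding the item
        self.holdings: dict[str, set[str]] = {}

    def add_vendor(self, vendor: SharedVendor) -> None:
        self.vendors[vendor.vendor_id] = vendor

    def correct_vendor_address(self, vendor_id: str, address: str) -> None:
        # A single edit to the shared record is seen by every library.
        self.vendors[vendor_id].address = address

    def attach_local_note(self, library_id: str, vendor_id: str, note: str) -> None:
        # Libraries link to an existing shared record; they never copy it.
        if vendor_id not in self.vendors:
            raise KeyError(f"no shared record for vendor {vendor_id!r}")
        details = self.local.setdefault((library_id, vendor_id), LocalVendorDetails())
        details.notes.append(note)

    def view_vendor(self, library_id: str, vendor_id: str) -> dict:
        # Merge the shared record with this library's local details, if any.
        shared = self.vendors[vendor_id]
        details = self.local.get((library_id, vendor_id), LocalVendorDetails())
        return {
            "name": shared.name,
            "address": shared.address,
            "contacts": list(details.contacts),
            "notes": list(details.notes),
        }

    def add_holding(self, library_id: str, record_id: str) -> None:
        # Attach a library's holding to a shared bibliographic record.
        self.holdings.setdefault(record_id, set()).add(library_id)


if __name__ == "__main__":
    kb = SharedKnowledgebase()
    kb.add_vendor(SharedVendor("v1", "Example Books Ltd.", "1 Main St."))
    kb.attach_local_note("lib-a", "v1", "Net 30 invoicing")
    kb.correct_vendor_address("v1", "2 Main St.")
    print(kb.view_vendor("lib-a", "v1"))
    print(kb.view_vendor("lib-b", "v1"))
```

Because the address lives only in the shared record, one correction reaches every participating library, while each library's notes stay private to it. The same property illustrates one possible disadvantage: an erroneous edit to a shared record also propagates everywhere.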
Expansive Scope of Search
One of the main trends in discovery interfaces is the extension of search to an ever-wider scope. The first round of online catalog replacements, sometimes characterized as next-generation library catalogs, brought more modern interface techniques and more powerful relevancy-based search technologies for access to a library's local collections. Some of these products also integrated metasearch technologies to extend access to at least some of the materials represented in a library's subscriptions to ejournal databases or other scholarly content from external providers.
Web-scale evokes a more expansive view of the scope of search in library discovery products. The online catalogs of integrated library systems, and even the initial wave of next-generation catalogs, address content managed directly by the local library. A new genre of web-scale discovery solutions attempts to provide instant access to the broadest view of library content, including not only local collections but also the vast amount of material represented in externally provided resources. A comprehensive view of library collections today consists of many components: physical print and media collections, locally created digital collections, subscriptions to ejournals and databases, ebook collections, and selected free materials on the web such as open access scholarly journals or digital collections. Search at this scale cannot be achieved by querying every information resource in real time, as earlier metasearch products did; instead, it requires gathering and indexing the content of interest in advance.
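The contrast between real-time metasearch and indexing in advance can be made concrete with a toy inverted index. The sketch below is an assumption-laden illustration, not the architecture of any particular discovery product: the source names and records in `SOURCES` are invented, and the functions `build_index` and `search` are hypothetical. Records from every source are harvested into one index before any user searches, so answering a query touches only the index and never contacts the sources themselves.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Invented sample records standing in for harvested content from several
# kinds of sources: (record_id, title) pairs keyed by a hypothetical source name.
SOURCES = {
    "local_catalog": [("cat-1", "Library automation systems")],
    "ejournal_provider": [("ej-7", "Cloud computing in academic libraries")],
    "open_access_repo": [("oa-3", "Open access journals and library discovery")],
}


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(sources: dict[str, list[tuple[str, str]]]):
    """Harvest every source ahead of time into a single inverted index."""
    index: dict[str, set[tuple[str, str]]] = defaultdict(set)
    titles: dict[tuple[str, str], str] = {}
    for source, records in sources.items():
        for record_id, title in records:
            key = (source, record_id)
            titles[key] = title
            for term in tokenize(title):
                index[term].add(key)
    return index, titles


def search(index, titles, query: str) -> list[str]:
    """Answer a query from the pre-built index alone; no source is contacted."""
    terms = tokenize(query)
    if not terms:
        return []
    # Simple Boolean AND over all query terms.
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(titles[key] for key in hits)


if __name__ == "__main__":
    index, titles = build_index(SOURCES)
    print(search(index, titles, "library"))
    print(search(index, titles, "cloud computing"))
```

For brevity the sketch uses Boolean AND matching and sorts results alphabetically; the discovery services discussed here rank results by relevancy, which this example does not attempt.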
In the context of library discovery products, web-scale isn't the same as the content of the web. Rather, the ideal scope encompasses the entire body of content provided by the user's own library, or even by the collective collections of multiple libraries or of libraries worldwide. Even in the broadest view, web-scale discovery products return results drawn from content that libraries have selected as reasonably trustworthy.
General web search engines yield results including all types of materials from all sources. While these services deliver much larger quantities of results for any given query, the searcher must be very careful about the reliability of the items returned. Libraries work hard to teach information literacy skills to their users, such as distinguishing reliable items from unreliable ones in the results of general search engines. Since the body of information they address comes from reliable, library-selected sources, discovery products should provide users with convenient access to higher-quality resources than they will find through general search tools.
Perspective Among Competing Alternatives
While web-scale products may be an interesting and growing segment of the library automation scene, we're at a very early stage of the development, marketing, and adoption cycles. Discovery products based on this model have gained quite a bit of traction, with a steady expansion of their underlying indexes, broader participation by the publishers and other content providers, and maturing software platforms. The competition among web-scale management products has just begun. We can expect things to heat up considerably in this arena over the next year or so.
The emergence of these so-called web-scale products does not necessarily entail a widespread or rapid decline of products based on localized computing. Not only do the cycles of change turn slowly in the library technology arena, but many libraries will continue to prefer the control and other characteristics of locally implemented automation systems. Depending on a variety of factors, some libraries will find web-scale products less attractive than others will. Over the course of the next decade, I expect to see these web-scale products gain a larger presence in the library technology arena, though I do not expect any single approach to gain absolute dominance. The web-scale model adds an interesting and important alternative to the mix of choices available to libraries as they develop their future automation strategies.