Commercial digital libraries and the academic community: how new firms might develop new relationships between

D-Lib magazine [January 2001]


Copyright (c) 2001 Corporation for National Research Initiatives

The surge in dot com funding has brought us commercial ventures such as netLibrary [1], ebrary [2], and Questia [3], which are creating electronic libraries with collections that include tens of thousands of books. And they are growing fast: although modest by the standards of print collections, these commercial digital libraries already dwarf even the largest non-profit collections.

Whatever their technical strategies, these enterprises provide at least one enormous service. They make available in electronic form materials that are protected by copyright and controlled by publishers who depend upon sales revenues. The challenge is enormous: these new digital libraries must negotiate agreements for tens of thousands of books and then convert these materials into a useful electronic format. Nevertheless, developing a good business plan and building the library, challenging as those tasks may be, may prove to be the easy part. Developing a productive relationship with higher education requires more.

First, academics have grown wary of commercial publishers. Many feel that the chain from author to audience has already grown too long and expensive. The appearance of new for-profit digital libraries means that the consumers of information will be asked to support yet another organization (with its accompanying salaries and investors). Universities are more complex than publishing houses, and much as they could use extra revenue, it is by no means clear that it is in the interest of universities to mortgage their intellectual property to subscription services. The better a university, the less dependent it is upon income streams such as tuition: big universities derive their fundamental strength from augmenting their endowments and attracting research funds. Nevertheless, the ultimate source of their strength is prestige. Prestige draws gifts and grants. The prestige of academia as a whole strengthens the social contract between higher education and society. In cases where the social contract is clear and popular, the results are tremendous: projects such as the Human Genome Project and agencies such as NIH flourish because the American people understand that such research may help them live longer, healthier lives. By contrast, the National Endowment for the Humanities (NEH) was almost eliminated and the National Endowment for the Arts (NEA) permanently crippled in 1995 because the population felt (and not without some justification) that humanists and artists largely disdained the general public. The near-death experience forced the NEH and (at least some) humanists to begin rethinking the relationship of their work to society. The strategic mission of institutions individually, and of higher education as a whole, is to reestablish public prestige by demonstrating that they are contributing to society. To achieve this goal, the broader the audience is, the better.

The implications for the new commercial digital libraries are substantial. Publishers need an income stream; universities and academics, while happy to have income, fundamentally need exposure. As a rule of thumb, anything published in university presses is written for prestige. Economic pressures have forced university presses to worry more about print runs, but this pressure has, I believe, lessened our ability to communicate complex arguments to our colleagues. Insofar as the new commercial digital libraries increase exposure, they advance the interests of higher education.

Every new entrant into the commercial market needs to reflect on the problem of who owns scholarly output. Universities, in my experience, are less interested in owning their faculty's work than in making sure the faculty assign only non-exclusive rights. Although publishers hold exclusive rights to most content, that model has provoked widespread dissatisfaction. In the print world, where libraries build up permanent collections of inexpensive books, copyright is not viewed as a problem: the physical reproduction of print has been so expensive that it is simpler to buy books from a publisher. In the electronic world, however, the marginal cost of additional copies is essentially zero; when (some) print publishers use their monopolistic positions to drive up prices and create mechanisms that protect copyright by restricting the spread of information, many academics question the tradition whereby faculty transfer exclusive rights to publishers.

If push comes to shove, universities have a very strong case that they contribute the overwhelming majority of resources to scholarly publication. Consider this rough and very conservative estimate: Assume that it takes two years of labor to produce a solid academic book (this is conservative as it assumes a professor publishes a major book every six years, working part time during the academic year, full time in the summer and full time during a six month sabbatical). Even paying a minimal salary ($40,000/year before benefits), the university has invested at least $100,000 in that book. How much money does the publisher really invest with editorial advice and even modern copyediting (i.e., stylistic editing and intensive XML tagging)? Note that many of the most prestigious book series are edited by professors. Faculty advisors do the work for free or for a pittance because of the immense power and patronage they receive. The money people at universities can do the math also. Universities end up paying for the editing through their library acquisitions budgets. They are perfectly capable of reallocating that money so that they do the editing themselves and publish the books electronically.

Consider another figure. If 25,000 of the books in the original release for a commercial digital library, for example, are university press books or written by professors, then you are looking at a $2.5 billion investment of academic labor. Huge as a commercial digital library's investment may be, it is at least an order of magnitude smaller than the investment made by the university community. When we consider the cost of creating content, a strong case can be made that any commercial entity adds only a drop of value to the immense pool of knowledge fed and funded by academia.
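The arithmetic behind these two estimates can be laid out in a few lines. What follows is only a back-of-the-envelope sketch using the figures quoted above; the variable names are illustrative, and the reading of the $100,000 per-book figure as salary plus benefits and overhead is an assumption, since two years at the stated minimal salary comes to $80,000 before benefits.

```python
# Back-of-the-envelope check of the essay's estimates.
# Every figure is taken from the text; none is measured data.

YEARS_PER_BOOK = 2            # years of labor assumed for a solid academic book
MINIMAL_SALARY = 40_000       # dollars per year, before benefits
COST_PER_BOOK = 100_000       # essay's per-book figure (assumed here to include
                              # benefits and overhead on top of bare salary)
ACADEMIC_BOOKS = 25_000       # university press or professor-written titles


def academic_investment(books: int, cost_per_book: int) -> int:
    """Total university investment in dollars for a number of books."""
    return books * cost_per_book


salary_only = YEARS_PER_BOOK * MINIMAL_SALARY
total = academic_investment(ACADEMIC_BOOKS, COST_PER_BOOK)

# "At least an order of magnitude smaller" implies a commercial
# investment of no more than one tenth of the academic total.
implied_commercial_ceiling = total // 10

print(f"Salary alone per book:       ${salary_only:,}")
print(f"Academic investment (total): ${total:,}")
print(f"Implied commercial ceiling:  ${implied_commercial_ceiling:,}")
```

On these assumptions the academic total is $2.5 billion, and a commercial investment "an order of magnitude smaller" would be no more than $250 million.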

Already, projects such as the Open Archives Initiative, the American Memory Project at the Library of Congress, the Perseus Digital Library, and others make their data freely available. These may or may not be the wave of the future, but the new commercial digital libraries need to decide how (and whether) they can survive if such projects do become the rule. What happens if all the data is made freely available and all the core software is available under open source licenses? Can they make money?

I suspect that they can. The academic world invests a substantial amount of money, through both individual and library purchases, to acquire third party information. The new digital library companies have an opportunity to explore new models of partnership with higher education; the current economic and social structures certainly seem problematic. I do not know what those new models should be, but I offer as a kind of "open letter" to commercial digital libraries the following ideas for ways they might pursue productive relationships with higher education:

  • "Dot coms" must recognize that they are commercial firms and that they will be viewed as such. Always start by making it clear that you need to make money, even if you go on to stress the public benefit you hope to provide.
  • Academics are clannish and pay close attention to who starts and who runs a company. Corporate web sites that present a team of managers with few academics may attract investors but can alienate academics. Many times I have heard librarians and administrators remark that they really don't want to work with commercial firms if they can help it; if, however, a web company was founded by an academic, they take a second and much more positive look at that company.
  • Academics in general and humanists in particular are also professional cynics. They may make short-term deals based on your firm's current management team, but no responsible leader will establish a long-term relationship with you based on what she thinks of the current management team or of your personal ideals. Everyone has to assume that, if a start-up company is profitable, it will be acquired by a larger commercial concern; for this reason, no one can give any commercial digital library the benefit of the doubt on any long-term deal or assume that it won't exercise the full power of its market position as ruthlessly as possible.
  • Make it a public policy that your company seeks only non-exclusive rights. Academics want a "second source" to information. They have had plenty of experience with monopoly capitalism, and while they may accept your position as a necessary evil, in their eyes, you will be just another publisher out to make a buck. Your power will be the aggregate of materials and services that you provide. You may get market advantage from exclusive deals, but taking the high road of non-exclusivity will net you more in the long run.
  • Entering large numbers of books is "the easy part." The real issue is how to help people improve the questions they can ask. Giving scholars an on-line research library (an electronic environment where all the books were on-line and all the citations had been converted into active links) would be an important first step, but still only a first step. The hard part comes in developing the tools people really need. Such tools tend to be domain specific and often are not obvious. Some commercial systems are beautifully organized to perform the staggering job of converting and delivering hundreds of thousands of books, but I am not sure whether any are really designed to reach the next level.
  • In structuring data, avoid proprietary formats. Use of proprietary structures will provoke resentment and resignation at best, public outrage at worst. Academics love open standards; that's one reason why UNIX has flourished in universities. Publish your format. Make it possible for other people to create documents that conform to your service without your having to do the data conversion. If you have a brilliant new Document Type Definition (DTD) that lets you add new services, for example, the XML community will figure out what the underlying data structure is anyway. It may be that your tags will only bear fruit in the future (that's a situation we know very well, having worked with SGML for almost fifteen years), but the ill-will caused by a proprietary DTD could really damage your standing. Many academics love Adobe, for example, because PostScript is a public standard and because Adobe is pushing other open standards (such as SVG).
  • Learn from the Genome debate. There is an emerging consensus about how to balance public good and private interest, allowing people to patent some things but making sure that enough information is in the public domain to help push science and industry. The analogue in the humanities would be the following: "the primary sources are free; copyright protects the secondary sources with the notes and interpretation."
  • Make a sharp distinction between public domain data and the copyrighted data you license. Allocate 10% of your data conversion program to public domain materials and give the XML source texts out freely under something like the GNU General Public License (GPL), meaning that you must be credited and that any modifications people make must be freely distributed. Academics love things like open source. The more you can do with something like the GPL, the more people will trust you and the more you will dramatize the fact that you aren't just another publisher. One member of a commercial digital library firm mentioned with justified disgust that one publisher wanted royalties for its reprint of Moby Dick. Lots of people have had the same experience; use the shortsightedness of the publishers to strengthen your own company. If you start with 90% protected and 10% fully open materials, you still have a huge added value to sell. And you get the advantage that people (who unreasonably want everything for free) will direct their resentment against the publishers and not your company. Every person who uses the public domain sources will see your company's name and URL. Banner ads may not prove effective for many companies, but discreet "PBS style" sponsorship links may be ideal for new commercial firms seeking credibility as well as exposure. Imagine the press that you would receive if you were able to "tithe" your data conversion. "Questia/netLibrary/ebrary, etc. presents the Library of Congress with 5,000 key source texts for literature and history, with promises of 20,000 more in the coming two years."
  • Make tagged sources available. However much money you invest, you can't possibly pay for the kind of careful tagging and editing that many complex sources really require. Such "level 5" tagging (to use an LC taxonomy) requires knowledge of the source materials and the subject. Even with clever software, this sort of editing is very laborious. But this sort of editing is exactly what scholars have traditionally done. There is a growing number of young academics who combine domain knowledge and technical expertise; let them do the hard work and let them validate the documents (another reason to publish your formats!). Make sure the sources you release are fully tagged, however: individuals are going to be less likely to add value to your e-texts if they know they are putting back elaborate tags that you have stripped out.
  • In general, look for places where you can help institutions as well as students. A lot of the most expensive tasks that drive up overhead and eat up budgets could, in a world of fast networks, be outsourced. Library functions are only one example.
  • Your investment in technical infrastructure is very important. Even in the most radical open source community, people accept the notion of paying for better service. The Perseus Digital Library, for example, has many strengths, but we are outgrowing our infrastructure. Thus far, we have been able to add capacity in tiny ($5-10K) increments. The next jump probably reflects a $500K investment. Furthermore, a free site such as ours cannot guarantee 24x7 service. People use the site at their own risk with no guarantees.
  • Provide access to expert librarians. My own institution, for example, is a major university, but we have only one humanities librarian. It is impossible for him to keep abreast of everything going on in the humanities. If I want to find out if there are microfilms of nineteenth century British newspapers, he is not much better placed to answer this question than I am. If I had access to a librarian specializing in 19th century British history and literature, the answer would be authoritative and fast in coming. Collect a group of domain experts to provide authoritative information. They could do so by e-mail and phone; perhaps better still might be having queries be public so that everyone could see the questions and answers. You can collect domain experts very quickly and immediately provide a service far more expert than all but the strongest schools have. Thus, even before you have the on-line books, you can have on-line reference librarians to level the playing field between the small schools and the great universities.
  • If you have particular expertise at structuring information, offer this as a service. Projects such as mine can get basic XML tagging from India and China, but the important work (the structure that really adds value to the document) has to be done in-house. Real tagging (like cataloguing) requires serious expertise and (for now at least) benefits greatly from an investment in customized software tools. OCLC has done pretty well selling catalogue records. Creating XML is even harder. The key is being able to draw on expertise not available in a data entry firm off-shore. I know how much we pay for outsourcing my data entry. You ought to be able to get great rates given your volume. Why not turn this into a profit center? My university, for example, is developing the infrastructure to convert its publications into high quality XML. You could probably do this more efficiently than we could in-house. Even after you have entered every back list book you want, you will have plenty of documents coming through. This industry plays to the insecurities and aspirations of the universities. They want to own the data. Let them own it. Come to us and we will structure your documents for you. You can then put them up on your own sites. We will, however, also offer added value services that people can pay for and, if we do the work, then we guarantee that the data will derive the maximum benefit from our system. If someone else can do better with their software, more power to them. Ultimately, publishers may have to go this route anyway. Why not get there first? And helping universities maintain non-exclusive rights to their intellectual property would make you heroes in many circles.
