OCLC's stance on linked data and open data has become a key topic of interest for libraries. OCLC has been involved with the Semantic Web and linked data as research projects for several years. It has released open data licenses for some of its ancillary products. Where the massive WorldCat database, OCLC's prime strategic asset, fits into plans for open data remains a pressing concern for librarians. Through a series of actions, OCLC has demonstrated a progressively more open approach to the information assets under its stewardship.
OCLC Steps into the Realm of Linked Data
Linked data and the degree to which it should be openly available is not an entirely new area of interest for OCLC. Since about 2009, the organization has made a number of forays into exposing information assets as linked data, with such initiatives as dewey.info, FAST, and VIAF.
Dewey.info was launched as an experimental linked data service in August 2009, applying the principles of linked data to the Dewey Decimal Classification scheme. OCLC acquired Dewey Decimal Classification from the Lake Placid Education Foundation in 1988. The dewey.info service exposes each of the Dewey concepts as RDF triples available through a URI and as Web pages. OCLC makes the data underlying dewey.info available under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 license, meaning that reuse is allowed with attribution for non-commercial purposes and that no derivative works can be created based on the data (see: http://creativecommons.org/licenses/by-nc-nd/3.0/).
In December 2011, OCLC released the Faceted Application of Subject Terminology (FAST) as an experimental linked data service under the Open Data Commons Attribution License. The FAST authority file was created through collaboration between OCLC and the Library of Congress. FAST derives from the Library of Congress Subject Headings, breaking the long pre-coordinated strings into simpler terms better suited for discovery services and other interfaces that employ faceted navigation.
The Virtual International Authority File, or VIAF, recently became a service under OCLC's sole stewardship, governed by an open data arrangement. VIAF aggregates a variety of international name authority files and makes them available through a Web-based service with an underlying linked data model. VIAF has been underway since April 1998 as an experimental project, initially including the Library of Congress, the German National Library (Deutsche Nationalbibliothek), and OCLC. These organizations formally joined as the VIAF Consortium in 2003, which has since expanded to include the National Library of France (Bibliothèque nationale de France). More than 22 organizations currently contribute to the project. In September 2009, through an initiative of OCLC Research, VIAF was released as linked data.
In April 2012, VIAF transitioned to become a production service of OCLC, with the understanding that it would continue to be open and freely available under the Open Data Commons Attribution License (ODC-BY), which includes the freedom for others, including commercial organizations, to share, create, and adapt the data with the provision that the original source be attributed (see http://opendatacommons.org/category/odc-by/). With this transition, OCLC's operational activities in VIAF moved from its Research division to the groups within the organization responsible for production-level services.
WorldCat Rights and Responsibilities Guidelines and Open Data Licenses
Currently, WorldCat, the organization's top strategic asset, remains governed by the “Rights and Responsibilities for the OCLC Cooperative,” a document that outlines the policies regarding the use of records in WorldCat. OCLC's policies regarding the ability to share WorldCat records have long been controversial. OCLC constituted a Record Use Policy Council in September 2009. The council created the current Rights and Responsibilities guidelines through a process that included opportunities for the broader OCLC community to provide input. Among other issues, the Rights and Responsibilities document, which took effect in August 2010, aims to advise libraries on sharing records outside of the OCLC membership. OCLC positions the Rights and Responsibilities document as reflecting the norms for its membership rather than as binding contractual terms. Yet, in a time when the library community resonates with the principles of open data, the usage guidelines stated in the Rights and Responsibilities policies remain a source of creative tension (http://www.oclc.org/worldcat/recorduse/policy/default.htm).
One action not necessarily endorsed under the Rights and Responsibilities policies is representing a library's collection as a public release of WorldCat records that it did not create. A library has claim to original records it produces directly. However, WorldCat records that represent its holdings but were created by other OCLC members are subject to the Rights and Responsibilities guidelines.
The National Library of Sweden Chooses Not to Join OCLC
In December 2011, the National Library of Sweden announced that it would not participate in WorldCat, specifically citing the constraints implied through the Rights and Responsibilities policies as in conflict with its strategy to openly share data. Libraries that participate in Libris, the Swedish National Union Catalog, expect to be free to take their records out and place them in other systems, a scenario that might be problematic if those records were derived from WorldCat. Participation in Europeana, for example, requires the CC0 (Creative Commons zero) public domain license, which is inconsistent with OCLC's recommended policies. OCLC reports that there are continuing discussions with the National Library of Sweden on this issue.
Jim Michalko, Vice President, OCLC Research Library Partnership, wrote a subsequent blog post addressing this issue, stating that while the free public release of records would be inconsistent with the guidelines of the WorldCat Rights and Responsibilities, libraries ultimately can follow their own discretion in how they release their records to other organizations. The post mentioned that the licensing option OCLC has followed for FAST, the Open Data Commons Attribution License, could be applied to how libraries release their own cataloging data that involves WorldCat. According to a comment on a blog posting by Eric van Lubeek, OCLC's Managing Director EMEA: “…on 18 April the OCLC Global Council in order to advise the OCLC Board of Trustees passed a resolution endorsing the Open Data Commons Attribution License (ODC-BY) as consistent with the WorldCat Rights and Responsibilities for the OCLC Cooperative.”
Harvard University Releases Complete Bibliographic Database
In April 2012, Harvard University exercised its discretion and released its entire bibliographic database, including over 12 million catalog records from its 73 libraries, under the CC0 public domain license. This action, consistent with Harvard's mandates regarding open access to its creative output, goes beyond the Open Data Commons Attribution License that OCLC mentions as consistent with WorldCat Rights and Responsibilities. Harvard did work with OCLC as it considered the terms under which the data would be released, and it agreed to recommend that those who use the data give attribution and follow the community norms consistent with OCLC's guidelines, even though CC0 does not require those actions. Harvard released its bibliographic records as a simple repository of MARC records, not as linked data.
EBSCO Makes Commercial Use of Harvard University Bibliographic Records
Less than two weeks after Harvard's release of its catalog records, in May 2012, EBSCO announced that these records had been incorporated into the base index of EBSCO Discovery Service. Libraries that subscribe to EDS would have access to these records, an example of the commercial use allowed under the CC0 public domain license.
Ex Libris to Incorporate Harvard Records into Alma
Ex Libris has also announced its intentions to make commercial use of Harvard's bibliographic records. The company will incorporate these records into the Alma Community Catalog, a repository of data shared by all of the organizations that use the company's next-generation automation platform. Alma follows a hybrid data model. Its Community Zone includes resources such as the Alma Community Catalog, where participants can associate their holdings with shared records. Its Local Zone is for resources managed separately by an institution. Ex Libris will also index Harvard's Digital Access to Scholarship (dash.harvard.edu), an open access repository of articles based on the institution's research. Harvard University makes the metadata associated with DASH available under CC0, though it restricts the articles themselves to noncommercial personal, teaching, and research use. Harvard requests, but does not legally require, attribution for use of the DASH metadata. It also requests that any derivative works be made available under the same terms.
A Linked Data Evangelist at OCLC
Richard Wallis, a longtime proponent of the Semantic Web and linked data, joined OCLC in April 2012 in a position titled Technology Evangelist. The arrival of Wallis comes at a time when OCLC has been increasingly involved in the exploration of linked data relative to its strategic information assets and as interest in the Semantic Web and linked data becomes more prevalent in the broader library arena. Wallis is expected to help communicate OCLC's activities in these areas both internally and to the library community and to help the organization explore and navigate its strategic path in this area.
Wallis departed Talis in January 2012 to establish a consulting company named Data Liberate. He will continue to be based in Birmingham in the United Kingdom.