What steps can the library take to improve the discoverability of items in the institutional repository?
Libraries operating digital repositories can take several steps to ensure that the content can be easily found through the various search engines. Some of these tasks are related to general search engine optimization that would apply to any website, while others are related to the broader ecosystem of institutional repositories.
Search engine optimization techniques include tasks that ensure that each unique resource on a website can be identified and indexed by Google, Google Scholar, Microsoft Bing, Microsoft Academic, and others likely to be used by students and researchers. Even in the academic arena most literature searches begin with the general web. Search engine optimization can increase access to items in an institutional repository. Some of these techniques include:
- Verify that the robots.txt file does not block the automated crawlers used by the search engines, which are identified by their user agent signatures. Some user agents may need to be excluded to block bots that harvest content aggressively or that download and republish content to unauthorized sites, but the major search engines follow standard practices and should not burden web servers or degrade performance. The robots.txt file can optionally specify the location of a sitemap.
- Generate and regularly update a comprehensive sitemap of all unique resources held in the repository. Sitemaps must follow the XML syntax as defined for the protocol.1 Sitemaps improve the efficiency of search engine indexing, though most search engines will also crawl the links presented on the site's pages in addition to those systematically listed in the sitemap. Including a link to a resource in the sitemap does not guarantee that it will be harvested and indexed.
- The repository should offer a unique landing page for each abstract or full text document.
- Each page delivered by the repository should embed machine-readable metadata to enable detection and indexing of resources. This structured metadata represents the citation details and unique identifiers for the resource, making it easier for Google Scholar, Microsoft Academic, and other scholarly search tools to ingest. Google Scholar suggests presenting the embedded metadata following the style developed jointly with HighWire Press.2 The metadata should include all authors, appropriate keywords or subject tags, and the original date of publication. The date the item was uploaded to the repository should not be confused with the official publication date.
- Any embedded metadata must be consistent with the visual presentation of the page. Even though structured metadata improves the quality and efficiency of automated indexing, Google fundamentally indexes what human visitors to the website see. Search engines may penalize or disable indexing for sites with inconsistent or misleading metadata.
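As an illustration of the embedded metadata described in the list above, the following minimal Python sketch renders HighWire Press style `<meta>` tags for a single landing page. The tag names (`citation_title`, `citation_author`, `citation_publication_date`, `citation_pdf_url`) are those listed in Google Scholar's inclusion guidelines; the record layout, the function name `highwire_meta_tags`, and the example URL are hypothetical and would need to be mapped onto a real repository's data model.

```python
from html import escape


def highwire_meta_tags(record):
    """Return HighWire Press style <meta> lines for one repository item.

    `record` is a hypothetical dict; adapt the keys to the local platform.
    """
    tags = [("citation_title", record["title"])]
    # One citation_author tag per author, in the order listed on the paper.
    tags += [("citation_author", name) for name in record["authors"]]
    # Use the original publication date, not the repository upload date.
    tags.append(("citation_publication_date", record["publication_date"]))
    if record.get("pdf_url"):
        tags.append(("citation_pdf_url", record["pdf_url"]))
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags
    )


# Hypothetical example record.
example = {
    "title": "Water Quality in the Upper Basin",
    "authors": ["Rivera, Ana", "Chen, Li"],
    "publication_date": "2019/05/14",
    "upload_date": "2021/02/01",  # deliberately never emitted
    "pdf_url": "https://repository.example.edu/bitstream/123/456/paper.pdf",
}

print(highwire_meta_tags(example))
```

The same title, authors, and date should also appear in the visible page, since the embedded metadata must match what human visitors see.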
Most repository platforms should offer configuration options consistent with search engine optimization. DSpace, for example, includes tools to automatically generate robots.txt and sitemaps.3 Institutional repositories based on Fedora, TIND IR, CONTENTdm, or Digital Commons will each have their own procedures for search engine optimization and content syndication.
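Where a platform's built-in tools need to be checked or supplemented, the robots.txt and sitemap tasks can be sketched with the Python standard library. This is a minimal sketch under stated assumptions rather than any platform's own method: the robots.txt rules, the `BadBot` user agent, the helper names, and the `repository.example.edu` URLs are all hypothetical. It uses `urllib.robotparser` to test whether a given crawler may fetch an item page and `xml.etree.ElementTree` to build a sitemap in the sitemaps.org XML format.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def crawler_allowed(robots_txt, user_agent, url):
    """Return True if the robots.txt text lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical robots.txt: one aggressive bot is excluded, others are allowed
# everywhere except an administrative area.
robots_txt = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /admin/
Sitemap: https://repository.example.edu/sitemap.xml
"""

item = "https://repository.example.edu/handle/123/456"
print(crawler_allowed(robots_txt, "Googlebot", item))
print(crawler_allowed(robots_txt, "BadBot", item))
print(build_sitemap([(item, "2021-02-01")]))
```

A check like this can be run against the live robots.txt after each configuration change to confirm that the major crawlers are still permitted to reach item landing pages.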
In addition to search engine optimization, libraries may take further measures to ensure that the server is well positioned in the ecosystem of open access repositories. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) enables a repository to systematically contribute metadata to relevant search services. OAIster, originally developed and deployed by the University of Michigan, enables repository items to be included in a broad index of open access content, available at oaister.worldcat.org.4 This service was acquired by OCLC in 2009.
Libraries can also add their open access repository content to the database maintained by Unpaywall, which offers a popular browser plug-in that helps researchers gain access to open access copies of scholarly articles. (See https://unpaywall.org/sources to register a repository.) Additional services that facilitate access to open access content include:
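To show what an OAI-PMH exchange looks like from the harvester's side, here is a minimal Python sketch. The `ListRecords` verb, the `oai_dc` metadata prefix, the `resumptionToken` element, and the XML namespaces come from the OAI-PMH specification; the base URL, the helper names, and the sample response are hypothetical, and the sketch parses an inline sample rather than contacting a live server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, title))
    # A non-empty token means the harvester should request another page.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token or None


# Hypothetical, heavily trimmed response for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.edu:123/456</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Water Quality in the Upper Basin</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.edu/oai/request"))
print(parse_records(SAMPLE_RESPONSE))
```

Requesting the repository's own endpoint in this way is a quick test of what external aggregators will actually receive, including how complete the Dublin Core fields are.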
- OpenDOAR (https://v2.sherpa.ac.uk/opendoar),
- Directory of Open Access Journals (https://doaj.org),
- CORE (https://core.ac.uk).
Items in an institutional repository can also see increased use when they are included within the library's discovery service or online catalog. The major discovery interfaces, such as VuFind, Blacklight, Primo, Summon, and EBSCO Discovery Service, support the use of OAI-PMH to extract and maintain metadata from a local repository for inclusion into the library's main search index. Items found through the discovery interface would link to the full text of the item in the repository. Inclusion in the default library search tool may be the most effective way to increase use, especially when the content of the repository includes items of local or institutional interest. Most users may not think of visiting the search interface of the repository directly.
Search engine optimization and syndication of data with relevant search services can improve the mechanics of discoverability of the content held in an institutional repository.
More fundamental issues include the nature of the content and the quality of the metadata. The impact of efficiently propagating metadata will be greatly inhibited if the items are not thoroughly described. Sparse or inconsistent metadata will impede discoverability both in the repository's own search interface and in the external services that rely on that metadata. Most of all, the interest of the content will drive increased use of a repository. If the repository holds secondary copies of items held in more prominent destinations, placement in search results may be weakened. As with any other content resource, the key challenge concerns aligning content with the targeted audience. Libraries face obstacles in attracting high-impact content and producing rich metadata. The technical tasks in improving discoverability can be solved more easily.
Notes
- “Sitemaps XML format,” sitemaps.org. https://www.sitemaps.org/protocol.html
- “Inclusion Guidelines for Webmasters,” Google Scholar. https://scholar.google.com/intl/en-us/scholar/inclusion.html#indexing
- “Search Engine Optimization,” Confluence. https://wiki.lyrasis.org/display/DSDOC5x/Search+Engine+Optimization
- “The OAIster database,” OCLC. https://www.oclc.org/en/oaister.html