Library Technology Guides

Current News Service and Archive

Press Release: OCLC [August 14, 2023]

Leveraging machine learning technology as part of ongoing WorldCat quality measures

OCLC Metadata Quality teams implement a variety of measures--both manual and automated--to improve the quality and usefulness of WorldCat data. These extensive and ongoing efforts ensure that WorldCat data supports the needs of our membership and our global network of thousands of libraries across a wide range of services. As the technologies and tools that allow us to do this important work evolve, we are continually exploring new methods for enriching, repairing, and de-duplicating WorldCat records--data that powers the global discovery and sharing of library resources.

Cleaning up duplicate records is one of the most impactful ways to improve the quality of WorldCat. Manual efforts by metadata professionals--and technology like our duplicate detection software--have had significant success in reducing the number of duplicates. And now we're leveraging machine learning to accelerate that progress.

In December 2022, we invited the cataloging community to participate in a data labeling exercise to validate our machine learning model's understanding of duplicate records in WorldCat. During the subsequent four and a half months, 336 total users labeled more than 34,000 "possible duplicates" using a simple, intuitive, online interface. Thank you to every individual who participated in the project--your collaboration helps advance the profession and the mission of libraries worldwide.

Leveraging the data we collected, we can now better scale the resolution of duplicate records in WorldCat, saving countless hours of time and improving the experience for the entire library community.

We will soon implement the machine learning model as part of our ongoing efforts to mitigate and resolve duplicate records in WorldCat. On 19 August 2023, an initial run of one million records (500,000 pairs) will be processed through the machine learning algorithm. This will result in 500,000 duplicate record merges in WorldCat, which will improve cataloging, discovery, and interlibrary loan experiences for both library staff and end users.

The initial run will include only records for print books published in English, French, German, Italian, and Spanish. We recommend that libraries not using WMS enable WorldCat updates in Collection Manager to ensure they receive the updated OCN for held records that were merged. If you suspect an incorrect merge, report it to bibchange@oclc.org. WorldCat Metadata Quality staff can view the history of merged records and recover them if needed.
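
As a rough illustration of why the updated OCN matters for a local system: when two records are merged, a local record that still points at the superseded number needs to be remapped to the one that was kept. The sketch below is not OCLC's process and does not use any Collection Manager API; the `merges` mapping, the `resolve_ocn` and `update_holdings` helpers, and all sample numbers are hypothetical.

```python
def resolve_ocn(ocn: str, merges: dict[str, str]) -> str:
    """Follow superseded -> retained links until reaching an OCN that was not merged away."""
    seen = set()
    while ocn in merges:
        if ocn in seen:
            # A well-formed merge history should never loop; fail loudly if it does.
            raise ValueError(f"cycle in merge map at {ocn}")
        seen.add(ocn)
        ocn = merges[ocn]
    return ocn


def update_holdings(holdings: dict[str, str], merges: dict[str, str]) -> dict[str, str]:
    """Return a copy of the local-id -> OCN table with every OCN remapped."""
    return {local_id: resolve_ocn(ocn, merges) for local_id, ocn in holdings.items()}


# Hypothetical data: record 1002 was merged into 1001, which was later merged into 1000.
merges = {"1002": "1001", "1001": "1000"}
holdings = {"b1": "1002", "b2": "2000", "b3": "1001"}

updated = update_holdings(holdings, merges)
print(updated)
```

Following the chain rather than doing a single lookup is a design choice for the case where a retained record is itself merged again later; records that were never merged pass through unchanged.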

For additional information about the project and using the machine learning model to merge duplicate records in WorldCat, please read our Hanging Together blog post.


Publication Year: 2023
Type of Material: Press Release
Language: English
Date Issued: August 14, 2023
Publisher: OCLC
Company: OCLC
Permalink: https://librarytechnology.org/pr/29115/leveraging-machine-learning-technology-as-part-of-ongoing-worldcat-quality-measures

DocumentID: 29115 Created: 2023-08-14 12:10:02 Last Modified: 2024-11-29 12:43:07