BLOOMINGTON, Ind. -- Indiana University's Data to Insight Center (D2I) will lead the first investigation of non-consumptive research on a major mass-digitized collection of content, a project funded by a $600,000 grant from the Alfred P. Sloan Foundation. D2I's partners on the project include the HathiTrust Research Center (HTRC) and the University of Michigan's Department of Electrical Engineering and Computer Science.
"This funding will enable us to pursue a research track around non-consumptive research uses of the HathiTrust digital corpus," said principal investigator Beth Plale, professor in the IU Bloomington School of Informatics and Computing and director of the Data to Insight Center. "At the end of the project we expect to have cyberinfrastructure in place that successfully demonstrates that non-consumptive research can be carried out safely under the conditions of unintended malicious user algorithms."
Non-consumptive research involves computational analysis of one or more books without the researcher having the ability to reassemble the collection. Rather than reading the material, researchers use specialized algorithms to analyze the text as a massive data set. The Sloan grant will help ensure that this work can be conducted in a secure environment.
In some cases, HTRC would own the algorithms that researchers use, so HTRC needs to examine the security requirements for users, algorithms and data, all within the context of the suite of algorithms available in the Software Environment for the Advancement of Scholarly Research (SEASR).
In other cases, researchers would own and submit their own algorithms. For these cases, the Sloan Foundation funding will be used to create a prototype of what Plale called a "data capsule framework," which would give scholars the freedom to experiment with new algorithms on a huge body of information while technological "trust but verify" mechanisms confirm compliance with non-consumptive research policy.
Without reading the materials themselves, researchers using their own complex algorithms might analyze such massive data sets for anything from simple word repetition to complex linguistic structures or the evolution of word usage across time, space or even demographic class.
The HathiTrust repository contains almost 8.6 million digitized volumes, and about 2.2 million of those -- roughly 26 percent -- are in the public domain and currently available for non-consumptive research.
The model for implementing non-consumptive research is founded on a principle of "trust but verify": researchers should generally be trusted to do the right thing and be given the freedom to carry out creative research, but mechanisms are in place to ensure good behavior and adherence to the rules. The security aspects of the project build on research by Atul Prakash of the University of Michigan, who is also a principal investigator on the project with Plale.
Leveraging cyberinfrastructure at Indiana University, including FutureGrid, and at the University of Illinois at Urbana-Champaign, the HTRC will provision a secure computational and data environment. "This collaborative cyberinfrastructure test-bed will serve as a proving ground for our research agenda around non-consumptive uses of the collection," said Robert H. McDonald, associate director in the IU Data to Insight Center and another principal investigator on the project.
"In defining new methods of non-consumptive research of the HathiTrust digital corpus, the HathiTrust Research Center and the IU Data to Insight Center are enabling research faculty and the HathiTrust partner libraries to engage in groundbreaking new research across the corpus while maintaining the security and integrity of the collection and the researcher's fair-use access to its content," said Brenda Johnson, Ruth Lilly Dean of Libraries at Indiana University.
For questions about the HathiTrust Research Center and its Non-Consumptive Research Agenda, contact Beth Plale at 812-855-4373.
About HathiTrust
HathiTrust was created in 2008 through a partnership of the 12-university consortium known as the Committee on Institutional Cooperation (CIC), the 11 university libraries of the University of California system and the University of Virginia. Since then, HathiTrust has grown to encompass the research libraries of more than 50 institutions. HathiTrust was built to give libraries a means to archive and provide access to their digital content, whether scanned volumes, special collections or born-digital materials. Preserving materials for the long term has long been a mission and driving force of leading research libraries. Their collections, accumulated over centuries, represent a treasury of cultural heritage and an investment in the broad public good of promoting scholarship and advancing knowledge. Representing these resources in digital form expands opportunities for innovative use in research, teaching and learning, but it must be done with careful attention to effective solutions for the curation and long-term preservation of digital assets.
About the HathiTrust Research Center
The HathiTrust Research Center is a new collaborative launched jointly by Indiana University and the University of Illinois, along with the HathiTrust Digital Library, that is working to better understand security implications and computational use of the HathiTrust Digital Library. The HathiTrust Research Center (HTRC) will enable access to published works in the public domain (as well as limited access to works under copyright) stored within HathiTrust.
About the Data to Insight Center
The Data to Insight Center (D2I) undertakes research to harness the vast stores of digital data being produced by modern computational resources, allowing scientists and companies to make better use of these data and find the important meaning that lies within them. D2I creates tools and visualizations for working with very large data sets, develops methods to ensure data provenance (quality and authenticity), and builds methods for listing and discovering data sets. D2I is part of the Pervasive Technology Institute (PTI). Funded by a $15 million grant from the Lilly Endowment, Inc., PTI is dedicated to the development and delivery of innovative information technology and policy to advance research, education, industry, and society.
About the University of Michigan Department of Electrical Engineering and Computer Science
The University of Michigan's Department of Electrical Engineering and Computer Science (EECS) charts new ground in space communications, quantum computing, high-speed lasers, low-power computing, network security, biomedical and environmental sensors, learning technology, and everything in between. EECS faculty are at the forefront of new and emerging technologies, and train their students to compete with the best in today's global society.