The Data Conservancy (DC) has recently collaborated with Data Rescue Boulder to complete a data rescue pilot. Through our conversations, both groups realized that there is an important opportunity to develop a model for data rescue collection development.
Data Rescue (DR) Boulder has operated at an impressive scale. As of May 1, DR Boulder had rescued over 12 billion research data records, amounting to 6 PB and nearly 3 million datasets on its servers. DC has been developing archiving and preservation capabilities for institutions, particularly libraries, that handle data varying widely in semantics, discipline, and other respects. DR Boulder recognized the need for sustained institutional infrastructure, and DC offered the tools and capabilities to transfer, describe, archive, and preserve the data within an institutional repository context. Using software components from DC and the Open Science Framework (OSF), we have demonstrated a pilot for coordinating data collection development and transfer, guided by faculty input.
Associate Professor Ben Zaitchik from the Department of Earth and Planetary Sciences (EPS) at Johns Hopkins University identified a subset of the published data from DR Boulder that were relevant to his research and teaching.
For the pilot, we organized the subset of data into an OSF project: https://osf.io/grhz7/. We will continue to add more data as the pilot progresses, based on feedback from Ben. He has indicated that he will evaluate how useful it is to have the rescued data organized within this OSF project, compared with accessing the data through the original sources.
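For readers who would like to inspect the contents of the OSF project programmatically, the following is a minimal sketch using the public OSF API (v2). It is not part of the pilot tooling. It assumes the project is public, that its files are held in OSF Storage (the `osfstorage` provider) rather than an add-on, and that a single page of results is enough; the function names are our own, chosen for illustration.

```python
import json
from urllib.request import urlopen

# Base URL of the public OSF API, version 2.
OSF_API = "https://api.osf.io/v2"


def files_url(node_id, provider="osfstorage"):
    """Build the file-listing URL for an OSF project (node).

    Assumption: files live in OSF Storage; other storage add-ons
    would need a different provider name.
    """
    return f"{OSF_API}/nodes/{node_id}/files/{provider}/"


def summarize(listing):
    """Extract (name, kind) pairs from a JSON:API file listing.

    'kind' is "file" or "folder" in OSF responses.
    """
    return [
        (item["attributes"]["name"], item["attributes"]["kind"])
        for item in listing.get("data", [])
    ]


def list_files(node_id):
    """Fetch and summarize the top-level entries of a public project.

    Only the first page of results is read; pagination is not handled.
    """
    with urlopen(files_url(node_id)) as resp:
        return summarize(json.load(resp))
```

Calling `list_files("grhz7")` would return the top-level files and folders of the pilot project, provided the assumptions above hold.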
Once we identified the data, we implemented two different workflows to demonstrate how data could be brought into a Fedora institutional repository. The slides and links to videos below outline the workflows, as well as the software and services supporting them:
- Fedora API-X (https://wiki.duraspace.org/display/FF/Design+-+API+Extension+Architecture)
- RMap (http://rmap-project.info)
- DC Packaging tools
- Fedora repository
Slides:
- Data Curation through OSF
- Data Curation through DC Packaging Tool
- Connecting Data with RMap
We have created the following video demonstrations of the workflows, which are available on the Data Conservancy YouTube channel:
- Using a Fedora institutional repository to preserve rescued data in OSF project (https://www.youtube.com/watch?v=gYnYWDcY8Vs)
- Data curation, archiving, and access with the Data Conservancy, Fedora and OSF (https://www.youtube.com/watch?v=TQZRMTWtfKc&t=2s)
- RMap and OSF integration: linking rescued data to its original sources and other repositories (https://www.youtube.com/watch?v=lr7eL1vJqJo)
It is important to note that the work represented in the slides and videos above was a pilot effort. We offer it as a model for consideration. We believe it addresses many of the issues that have been identified by both Data Rescue and Data Refuge, but it is not a complete solution. We need help from software developers to complete the technical work. We need UI/UX experts to refine our tools. We need project managers to coordinate work across multiple organizations. Most importantly, we need other libraries to engage faculty and to create use cases that identify subsets of data and how those data are used.
We have documented our processes and lessons learned in the following OSF project: https://osf.io/d8tnq/
We welcome your feedback and, more importantly, your participation!