The Data Conservancy is a growing community promoting data preservation and re-use across disciplines with tools and services.
Johns Hopkins University, specifically the Sheridan Libraries, serves as the home institution of the Data Conservancy (DC) and the location of the first DC instance. DC has been able to utilize JHU’s many academic resources, including distinguished faculty, researchers and robust data sets, to help develop a sustainable data repository infrastructure. Members of the JHU DC unit are embedded in the many different facets of the project, serving on teams for three objectives: infrastructure research and development (IRD), broader impacts (BI) and sustainability.
Our mission is to provide expertise, tools, and training to help researchers create and promote open science within their teams and institutions. Promoting these practices within the research funding and publishing communities accelerates scientific progress.
DuraSpace provided open source software, notably the Fedora Repository software, which serves as a reference implementation for archival storage in the Data Conservancy technical framework. With the help of their consultants, DuraSpace provided the DC IRD team with technical expertise and software development in support of data curation, as well as experience in the process of developing, deploying and sustaining open technologies and open source software.
ELOKA (Exchange for Local Observations and Knowledge of the Arctic) facilitates the collection, preservation, exchange, and use of local observations and knowledge of the Arctic. ELOKA provides data management and user support, and fosters collaboration between resident Arctic experts and visiting researchers.
Researchers in the astronomy, life sciences, earth sciences and social sciences domains make up the primary user communities of DC, but data in the collection will also be valuable to educators, students, citizen scientists, policy makers and others who can benefit from access to DC data. Participating in the ESIP Federation helps to ensure that development of DC systems is informed by the distributed knowledge and experience of ESIP Federation members and that DC systems interoperate with existing and developing frameworks in the Earth sciences.
The Hesburgh Libraries at the University of Notre Dame is committed to the preservation and sharing of research data. Through research, development, and community collaboration, we create tools and services that can be reused by like-minded institutions and contribute to local and national efforts that advance open knowledge sharing. At Notre Dame we use a model that integrates data management consultation, data curation, and the development of new technologies to serve all disciplines and streamline the research lifecycle.
IEEE is the world’s largest technical professional organization dedicated to advancing technology for the benefit of humanity. IEEE and its members inspire a global community to innovate for a better tomorrow through its highly cited publications, conferences, technology standards, and professional and educational activities. IEEE is the trusted “voice” for engineering, computing, and technology information around the globe.
The Data Conservancy, in partnership with IEEE and Portico, is working to create a service to link publications with data. This service will also preserve the myriad relationships that often exist between and among scholarly and scientific publications and associated datasets. Portico, a not-for-profit preservation service, previously provided a permanent archive of scholarly literature, as well as guidance in the design of technical architecture and on the structure of operational roles, responsibilities and guidelines. Additionally, Portico was able to use its archival infrastructure as a test bed for the DC export and replication functionality.
The National Snow and Ice Data Center (NSIDC) is an established repository for Earth science data and information related to the world’s frozen regions. NSIDC contributes its deep perspective on data management to help inform Data Conservancy designs and practices.
The Marine Biological Laboratory (MBL) led the DC’s Life Sciences working group. MBL called on its experience with the innovative use of names to improve access to and organization of biodiversity data. The group built the tools for the feature extraction framework that indexes incoming content and improves discovery of biodiversity data located in any and all data conservancies.
Cornell University is primarily involved in the activities of the IRD team and previously developed a general data framework for observations that include core attributes such as identity, date/time stamp and location and accommodate discipline-specific features. Their work created one of DC’s first pilot programs, an arxiv.org collaboration that allows data sets to be deposited into the Data Conservancy upon submission.
The University of Illinois at Urbana-Champaign contributes research and education through the data curation initiatives at the Center for Informatics Research in Science and Scholarship (CIRSS). CIRSS participates on the Broader Impacts team, and DC research and development are directly informing two professional education programs: the Data Curation Specialization in the MSLIS at GSLIS and the Biological Information Specialists master’s in the campus-wide bioinformatics program.
The DC activities at UCAR and the National Center for Atmospheric Research (NCAR) were conducted through the NCAR Library. The focus of their work was the DC diversity efforts, primarily through their well-established SOARS and Advanced Study Programs. Their work was informed by analysis of data practices conducted by Illinois and research at UCLA.
DLSciences Work for the Data Conservancy
SEAD, a second-round DataNet award recipient based at the University of Michigan, is dedicated to the development of community data services supporting the emerging field of sustainability science. Data Conservancy’s partnership with SEAD will help foster enhancements to the current Data Conservancy code structure, with the prospect of creating a more robust, sustainable data archive, particularly as it relates to connections with institutional repositories.
Tessella have been influential in providing oversight to the IRD team and developing the technical aspects of the DC system development. This includes the development of system architecture, component design, interface definitions, technology and tool selection, physical data models, component development and system integration and deployment. They help to ensure that the DC infrastructure provides a comprehensive set of services and information models that meets the identified needs of the scientists.
The Center for Embedded Networked Sensing (CENS) is the locus of the Data Conservancy activities at UCLA. Team members investigated scientific practices in the field of astronomy to identify the data curation needs and requirements of this scientific community. The primary focus of UCLA work is a case study of the Sloan Digital Sky Survey (SDSS), the Space Telescope Science Institute (STScI) archives, and follow-on projects such as Pan-STARRS and the Large Synoptic Survey Telescope (LSST).