Solving the Data Management Mandate

by Jane Beitler, NSIDC

Ian H., a postdoc in biological sciences at Johns Hopkins University (JHU), seems a little out of his element as he looks over the criteria for the data management plan that he must submit with his research proposal. But across the desk, Betsy Gunia, a data consultant with JHU’s new Data Management Services (DMS), is prepared to help him sort through the issues to meet this requirement.

Since January 2011, proposals to the National Science Foundation (NSF) must include a data management plan. The new requirement spurred JHU to implement a cross-campus service to help researchers comply. JHU was able to jump-start its program by tapping the knowledge and the initial software stack developed by the Data Conservancy.

Figure 1. Graduate students at a biological research station collect data on meadow plant populations in an alpine valley. Biological data such as these may be of interest to other fields, such as climate studies. How will those researchers discover and access these data?

Driving data management

More broadly, NSF and institutions like JHU are working together to ensure that the data generated by today’s data-intensive research can be fully tapped to support scientific progress. The curation of scientific data is seen as a means to collect, organize, validate, and preserve data so that scientists can find new ways to address the grand research challenges that face society.

Data management ensures that the valuable products of research are not invisible or lost to future inquiry. “We think of the data as assets for the institution,” said Barbara Pralle, who heads the JHU DMS. Data may stand as proof of the researcher’s work and methods. Climate or astronomy data may capture a snapshot in time that future researchers may see as a record of change or evolving processes. Interdisciplinary studies may also depend on access to data generated by specialists working in several fields.

For the individual researcher, making a road map for data management is not really possible unless roads actually exist. The facilities, expertise, and methodologies for research data management have yet to become standard fixtures of the academic infrastructure, like research computing or libraries.

The JHU DMS is a realization of the need for such a data management infrastructure. Launched July 1, 2011, it had already served 36 researchers by the end of 2011. The DMS supplies both the expertise and the practical means to accomplish data management.

“We offer in-depth data management planning, and can actually help them prepare and transfer that data into our JHU Data Archive,” Pralle said. “Part of that service includes holding data for a defined period of time, until it is determined what is going to happen to it: will it stay in our archive, or be transferred elsewhere.”

Putting it in practice

The new NSF requirement for data management plans recognizes that the foundations for good data management are laid before research even begins. Gunia said, “We basically go over three topics with the researchers. First, how are you planning to manage your data while you are conducting your research? We call this operational data management.” Data that are poorly organized, poorly described, or inadequately stored during or following research may become lost or useless over the long term.

“Secondly, how are you planning to manage your data after your research is complete?” Gunia continued. “And third, how are you planning to share your data with other researchers and the general public? We look into practices in their field—such as metadata standards, and data repositories that would be available to them.”

While some fields may have access to discipline-specific repositories, frequently no designated archive exists. Gunia said, “They can use the JHU Data Archive if no data repositories are available—and if the data merit archiving.” Gunia asks questions to help researchers think about the need for archiving. “Your data management plan is going to be reviewed by your peers in the proposal stage,” Gunia said. “So if the norm in your community is to share your research data openly, and you do not state that you will do so in your data management plan, your peers may judge your data management plan poorly, which will impact the overall quality of your proposal.”

“If your peers would think you should archive it, then you should write that into your data management plan, because you will have to share your data with your peers.”

These basic questions lead to a discussion of specific plans to be undertaken before, during, and after data collection. Gunia said, “PIs often have never explicitly thought about the different types of data they produce, or even what is data, or what else can be done with the data. We talk to the PIs about metadata, which describe the data. A light bulb will go off over their head: hey, that could be useful to another group. If I take this one step of adding metadata to a file, then other people can use it in different ways.”

The resulting data management plan addresses not only where data are to be archived, but also how they will be financially sustained. Pralle said, “When data are to be archived in the JHU Data Archive, it is for a defined period of time. It is scoped according to our financial model, and then included in their proposal as an item for the grant to fund.” Typically, this means holding the data five years post project. “Well before those five years are up, we work with the researcher to determine the next step. This might mean renewing for another five years. That’s likely to be a different fee structure,” Pralle said.

A blueprint for data management

The JHU Data Archive is an instance of the Data Conservancy System software, which is especially well suited for a campus-wide service. The Data Conservancy suite of applications and services, developed collaboratively by several institutions including JHU, is discipline agnostic. This enables Pralle and Gunia to work with researchers from fields as diverse as social, behavioral, and economic sciences, engineering, mathematics, physical sciences, and computer and information sciences. “It’s rare to be working across the disciplines like this,” Pralle said. “I don’t know of any other data archive that is supporting such a wide range of domains.”

The architecture of the Data Conservancy System also ensures that data are not locked in discipline-specific silos. Today’s big science questions may require data across multiple fields, but trying to search across data from multiple disciplines can result in an apples-and-oranges problem, such as a lack of equivalent terms or parameters. To solve this problem, the DCS uses an approach that places importance on data over documents. As a result, the DCS “feature extraction” allows data from multiple projects to be brought together and queried through spatial, temporal, and taxonomical data structures. “The way it searches digital objects is incredibly sophisticated,” Gunia said.
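The idea of querying shared features rather than documents can be illustrated with a minimal sketch. It is not the DCS implementation or its API. The `Feature` record, the `query` function, and the sample catalog are all hypothetical, and the code only assumes that each dataset has been reduced to a location, a time span, and an optional taxon.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Feature:
    """Hypothetical record of features extracted from one dataset."""
    dataset_id: str
    discipline: str
    lat: float
    lon: float
    start: date
    end: date
    taxon: Optional[str] = None


def query(features, bbox=None, period=None, taxon=None):
    """Return ids of datasets that match every criterion supplied.

    bbox is (min_lat, min_lon, max_lat, max_lon); period is (start, end).
    """
    hits = []
    for f in features:
        if bbox is not None:
            min_lat, min_lon, max_lat, max_lon = bbox
            if not (min_lat <= f.lat <= max_lat and min_lon <= f.lon <= max_lon):
                continue
        if period is not None:
            # Keep datasets whose time span overlaps the requested period.
            if f.end < period[0] or f.start > period[1]:
                continue
        if taxon is not None and f.taxon != taxon:
            continue
        hits.append(f.dataset_id)
    return hits


# Invented sample holdings from two disciplines.
CATALOG = [
    Feature("meadow-plants", "biology", 39.0, -107.0,
            date(2009, 6, 1), date(2009, 9, 1), "Poaceae"),
    Feature("snow-depth", "climate", 39.1, -106.9,
            date(2008, 11, 1), date(2009, 6, 15)),
    Feature("coastal-birds", "biology", 38.3, -76.4,
            date(2009, 5, 1), date(2009, 8, 1), "Aves"),
]

alpine_june = query(
    CATALOG,
    bbox=(38.5, -108.0, 39.5, -106.0),
    period=(date(2009, 6, 1), date(2009, 6, 30)),
)
```

In this sketch, a search by place and time returns both the biological and the climate dataset, even though the two projects share no vocabulary, because the query runs against the common spatial and temporal features.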

The DCS also comes with technologies that electronically share information on its holdings with other data systems, based on leading metadata and interoperability standards. By thus exposing its holdings, the DCS can facilitate collaboration, enabling researchers to find somebody else’s data products and assess the applicability of those data to their research.

As of this writing, the Data Conservancy System is operational and has several active instances, though development continues. Pralle said, “We’re taking advantage of the early core system, and are also contributing to further development of the system. We are engaged in the planning process, and we are also providing a software developer to the team.”

Figure 2. This block diagram shows the conceptual elements of the Data Conservancy Service, right, and the elements of the Open Archival Information System, a standard to which the DCS was mapped. The Data Conservancy’s architecture provides a flexible, interoperable framework that can evolve and be sustained over the long term.

Just add people

By solving the fundamental technical and conceptual problems of data management, the Data Conservancy System can help research institutions quickly gear up a sustainable, well-designed, and interoperable practice for managing their research data. DCS provides the blueprint, the systems, and the knowledge, while the institution supplies the people and education needed for a successful adoption.

At JHU, the launch of the DMS is gaining momentum. Pralle said, “We have engaged in marketing and communication campaigns across the institution. We’ve also communicated with the deans of schools that receive NSF funding and our colleagues in research administration. The latter are steering PIs to us, reminding them that this service exists. We’ve gone out and met with department administrators who are aware when proposals are being prepared; we’ve gone out to our library colleagues. It’s a timing issue, trying to catch that individual at the point when they need the service.”

To learn more about the JHU Data Management Services, visit http://dmp.data.jhu.edu.

For more details on the Data Conservancy System, its features, services, and implementation, read the detailed Data Conservancy Blueprint document.
