An Op-Ed Piece about Data Rescue and Libraries by Sayeed Choudhury

We recently posted a piece about our data rescue pilot in collaboration with Data Rescue Boulder. That piece offers information about the process, tools, lessons learned, and outcomes.

I am writing this op-ed piece to outline the motivation, background, and context for this pilot. In a fundamental sense, this op-ed piece is a call to action.

Johns Hopkins has been involved with both Data Refuge and Data Rescue Boulder. Our libraries’ NDSR resident, Elizabeth England, helped organize the Data Refuge event at Georgetown University. I will note that Data Refuge and Data Rescue Boulder have a common goal: to curate data such that researchers can use them effectively. However, the two have adopted very different approaches.

Data Conservancy long-time partner Ruth Duerr of the Ronin Institute introduced me to Joan Saez of Data Rescue (DR) Boulder. This introduction represents an important part of our story. Both Ruth and Joan are data managers. For years, they have worked directly with their respective designated communities (in the OAIS sense). One of the best outcomes of the DataNet grant that launched the Data Conservancy is this connection to data managers like Ruth who make connections to other data managers like Joan. Both of them are well connected to the individuals from federal data centers who have unique, tacit, and invaluable knowledge necessary for effective data rescue.

As I spoke with Joan over a series of emails and phone calls, it became clear to both of us that we had an important opportunity to bridge the high-throughput efforts of DR Boulder and the data preservation work of the Data Conservancy. This realization led to the data rescue pilot effort between the two groups.

I approached Associate Professor Ben Zaitchik from the Department of Earth and Planetary Sciences (EPS) at Johns Hopkins University regarding this data rescue pilot. Ben agreed to identify a subset of DR Boulder data that would be relevant to his research and teaching. More importantly, he shared the sense of urgency to conduct this pilot. Since the announcement of data management plan (DMP) guidelines or requirements and the Obama Administration White House OSTP memorandum on public access, research libraries have expected a rise in demand for data management services. While there has been some increase, one could reasonably argue that supply has been misaligned with demand. The current concerns about data access or availability have generated urgency in a manner that was not present before. In the sense of unintended consequences, the current threats to data present an opportunity or moment in time.

JHU Libraries have been focused on data management for 15 years. Libraries that were involved with the NSF DataNet program have been focused on data management for 10 years. With the announcement of DMP guidelines and the OSTP memo, many research libraries have been focused on data management for 5 years. As a community, we have had years to consider the data management needs of our researchers.

No doubt, discussions to date about data rescue have highlighted important topics for consideration such as community engagement, metadata, packaging, provenance, preservation, and linked data. But I would urge that we act on our lessons learned and developed capabilities, especially when our faculty, one of our most important designated communities, may have a sense of urgency.

If data are indeed a new form of collections, then having a faculty member identify subsets of data relevant for research and teaching and having libraries curate and steward those data within local preservation frameworks seems like a model for data rescue collection development.

I appreciate the balance between planning and acting. But I think it is telling that much of libraries’ activity to date has been related to data management planning. This raises the question: planning for what? I’m reminded of a quote that leadership is about making a decision; if it’s the right one, that’s a bonus. While this may seem facetious, it highlights the importance of appreciating that, in some circumstances, it is indeed better to act and make mistakes rather than wait, perhaps in vain, for the ideal opportunity and moment in time to converge perfectly.

The Data Rescue Boulder and Data Conservancy data rescue pilot offers a model for action. Undoubtedly, there are ways in which our approach can be improved. I welcome feedback in this regard. But I also hope that feedback or comments are ultimately aimed at developing further actions rather than plans. And if you are ready to act, then I definitely want to hear from you.