The library matrix | Data Conservancy

by Agnieszka Gautier, NSIDC

A view into the George Peabody Library at Johns Hopkins University, Maryland. Photo credit: Matthew Petroff, flickr.com

Libraries are knowledge warehouses. Not so long ago, content came in the form of books or other paper documents, recordings, photographs, and other visual media. This stuff was tangible. Now that clunky card catalog with its index cards has become archaic, even for the library. More and more, information is in the form of digital data, existing in what may as well be the ether for most users.

Though technology has shifted, the librarian still functions as the in-between, effectively connecting people to information. “Data curation is still brand new,” says David Fearon, a Data Management Consultant, or a new type of librarian, for Johns Hopkins University (JHU) Data Management Services. “We’re still finding our way and learning what users need.”

The mean of meaning

The library still operates as a hub for research. As a result, many universities are finding libraries to be good places to operate data repositories. “There’s been a push over the last decade, and especially in the last three or four years, in data sharing, particularly for public-funded research,” Fearon says. Data repositories should reduce redundancy, or such is the hope. The idea is simple enough: more value can be squeezed out of research when data can be reused, expanded upon, or checked for reproducibility of the original research.

But information still needs to be organized, not just plugged randomly into some software. “Researchers shouldn’t automatically assume that the process of archiving data will just be posting a file to a website or making an attachment to an email or putting it in some remote server,” Fearon says. “It needs to have a little more labor attached to it.”

Researchers collect and analyze data. That beast of collected data then needs to be tamed. For instance, how does research get cataloged? Researchers have their specific terminology depending on their field, and sometimes the same words are used to mean different things. Language itself is challenging, so what about metadata labels? Dates and locations may seem straightforward, but perhaps keywords are not. A degree of standardization must exist.

An artist's depiction of a present-day librarian, who works with tangible material as well as data management within computer systems. Photo credit: David Fearon

Ask and you shall receive

Big data needs to be organized. Its labeling needs to make sense; otherwise, the information gets lost within a matrix of jargon. Good data collection is a matter of skill, both in the actual depositing of data and in the smart software that retrieves information sensibly and easily. This is where data management consultants come in. The current version of the Data Conservancy isn’t regarded as self-service software. Consultants have been made available to researchers and other users at the depositing end and at the retrieval end.

“We’ve been doing the bug testing more or less from the beginning,” Fearon says about the new Data Conservancy software. According to Fearon, its organization works well, especially for various types of collections, whether that means a single research project with multiple parts or different phases of a project. The organization is flexible enough to determine the stage of a project. It packages data efficiently. About the product, Fearon says, “There will be a lot of power under the hood. It’s going to do the robust data preservation things you don’t see in other products.”

By offering data repositories like the Data Conservancy, libraries are able to bring data discovery resources to researchers and other users, who might not be able to find material on their own. Libraries are literally moving beyond their walls. “The library expands the support of the research life cycle rather than just applying reference sources to journal articles,” Fearon says. Present-day librarians help researchers with their proposals, while making sure data is well managed, disseminated, and preserved. “Archiving data isn’t really an easy self-serving thing. It takes a bit of labor but you get a better product as a result, and that’s why we’ve set up a library service to help with that.”

Visit the Software section of the Data Conservancy site for more information about the Data Conservancy software system.