APS Comments on Plans for Data Repositories

On January 17, 2020, the Office of Science and Technology Policy (OSTP) issued a request for comments on a set of draft desirable characteristics of repositories for managing and sharing data resulting from federally funded research. APS journals already encourage authors to make the data underlying their conclusions available through public repositories. As the government seeks to make more data available through deposition in repositories, APS offered comments on how those repositories should operate. Earlier this year APS also provided input on the NIH draft plan for data management and sharing.

Dear Dr. Droegemeier:

The American Physiological Society (APS) appreciates the opportunity to submit remarks in response to the request for comments on draft desirable characteristics for repositories for managing and sharing data resulting from federally funded research. APS publishes 15 scientific journals, and its publication policies already encourage authors to “make data that underlie the conclusions reported in the article freely available via public repositories or available to readers upon request.”

As a general comment on the implementation of data deposition policies for federally-funded research, the government should consider the costs and administrative burdens associated with data deposition and should seek to harmonize requirements across federal agencies to the greatest extent possible.

With respect to the specific characteristics detailed in the federal register notice, APS offers the following comments on selected provisions.

A. The use of Persistent Unique Identifiers (PUIDs) for data submissions is absolutely necessary for locating deposited data. Only in rare instances should data become unavailable once it has been deposited. As noted in (B.), long-term sustainability of data repositories is important, and each repository should have back-up plans to preserve and transfer data if it needs to shut down. Federal agencies will need to determine how to fund long-term data storage that extends beyond the end of each award period, preferably for a much longer time.

C. Standard terminology should be used as much as possible to describe data sets. This should include clear annotation, and definitions should be provided as needed. Important questions about metadata include: What metadata will be required? To what extent will an accompanying description of methods be required along with the data for the purposes of replicating experimental results?

D. Curation and quality assurance are highly desirable for data repositories, but it is not clear how this expertise will be provided. How will submitted data be evaluated for quality? Current costs for data storage are sometimes significant depending on the volume of data, and the addition of curation and quality assurance will add to those costs, which must be considered. Where shared data has not undergone peer review in the context of publication, how will the quality of the data be assessed? Will it be evaluated before or after it is made public? As data from all federally funded projects begins to accumulate, the sheer volume of the data available will limit the ability of the scientific community to examine it and provide meaningful review via informal crowdsourcing.

E, F. Repositories should be designed to provide ease of access both for scientists depositing the data and for users accessing it.

G. Tracking data citation through the use of PUIDs is straightforward, but more details are needed about how repositories would track data usage. Will users be required to create unique sign-in profiles?

H. Repositories should be able to provide access to data in a manner that is automatically consistent with any necessary restrictions on access and reuse such as intellectual property concerns.

J. Research generates an enormous range of data types. Therefore, it will be difficult and perhaps impossible to develop a common format for depositing data into databases. In some cases, specialized software may be required to access and view the data, for example imaging data from different sources. How to make the necessary software available and how to ensure long-term compatibility between the software and the data should be considered in the development of repositories. A critical question is also what constitutes “data”. Many labs generate thousands of individual data points or sets each day. Do they all need an individual PUID? Are they treated individually or as a data collective for each experiment or set of experiments?

K. Repositories should maintain information about any changes made to data or metadata deposited in them. In addition, they should have security measures in place to ensure that information is not changed in an inappropriate or fraudulent manner after deposition.

As OSTP works to increase access to the results of federally-funded research, APS appreciates the opportunity to provide input. We hope we will have the opportunity for continued conversations on these complex and important topics.

Sincerely,
Meredith Hay, Ph.D.
President