Data archiving is a good investment
We have found that ongoing financial investment in data-archiving infrastructure yields an impressive scientific return, and believe that it should be whole-heartedly supported by research funding agencies (see, for example, http://go.nature.com/nzftf3).
We used Dryad (see http://datadryad.org), an international, open, cost-effective data repository for the biological sciences, to estimate the cost of archiving data from more than 10,000 publications. We found that these could be curated and the data preserved at an annual cost of about US$400,000.
As an example of how much research is typically published per grant dollar, core grants in population and community ecology from the US National Science Foundation averaged 3–4 publications per $100,000 of grant between 2000 and 2005 (S. Reyes, A. Tessier and S. Mazer, unpublished results). That is, $400,000 invested in original research resulted in about 16 papers.
Dryad cannot yet tell us how effective data archives are in facilitating primary research publications, but the Gene Expression Omnibus (GEO) database at the US National Center for Biotechnology Information offers some insight. To estimate data reuse, we searched the full text of articles in PubMed Central for mention of any of the 2,711 data sets deposited in GEO in 2007. We excluded articles whose authors' names overlapped with those depositing the data set. Extrapolating the 338 hits in PubMed Central to all of PubMed, we estimate that the GEO 2007 data sets made third-party contributions to more than 1,150 published articles by the end of 2010, and reuse continues to accumulate rapidly (H. A. Piwowar, T. J. Vision and M. C. Whitlock Dryad Digital Repository doi:10.5061/dryad.j1fd7; 2011).
Assuming that Dryad has a comparable rate of reuse and collects at least 2,500 data sets annually, an investment of US$400,000 in one year should contribute to more than 1,000 papers in the next four years, far more than the roughly 16 papers that the same sum typically yields when spent on original research.
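The comparison above can be retraced with a few lines of arithmetic. The sketch below is illustrative only: it uses the figures quoted in this letter, treats "more than 1,150" reuse articles as a lower bound, and assumes (as the letter does) that the GEO reuse rate carries over to Dryad. The function and constant names are ours, not part of any published analysis.

```python
# Figures quoted in the letter (lower bounds where the text says "more than").
GEO_DATASETS_2007 = 2711         # data sets deposited in GEO in 2007
GEO_REUSE_PAPERS = 1150          # estimated third-party articles by end of 2010
DRYAD_DATASETS_PER_YEAR = 2500   # assumed annual Dryad intake
ANNUAL_COST_USD = 400_000        # estimated annual archiving cost
PUBS_PER_100K = (3, 4)           # NSF core-grant publications per US$100,000


def papers_per_dataset(reuse_papers, datasets):
    """Average number of third-party papers attributed to one data set."""
    return reuse_papers / datasets


def archive_papers(datasets_per_year, rate):
    """Papers expected from one year's deposits at the given reuse rate."""
    return datasets_per_year * rate


def grant_papers(budget_usd, pubs_per_100k):
    """Papers expected if the same budget funded original research."""
    return budget_usd / 100_000 * pubs_per_100k


# Assumption: Dryad data sets are reused at the same rate as GEO data sets.
rate = papers_per_dataset(GEO_REUSE_PAPERS, GEO_DATASETS_2007)
archive = archive_papers(DRYAD_DATASETS_PER_YEAR, rate)
grant_low, grant_high = (grant_papers(ANNUAL_COST_USD, p) for p in PUBS_PER_100K)

print(f"Reuse rate: {rate:.2f} papers per data set")
print(f"Archiving:  about {archive:.0f} papers per US${ANNUAL_COST_USD:,}")
print(f"Research:   {grant_low:.0f} to {grant_high:.0f} papers per US${ANNUAL_COST_USD:,}")
```

Under these assumptions the reuse rate works out to roughly 0.42 papers per data set, which gives a little over 1,000 papers for one year of deposits, against 12 to 16 papers for the same sum spent on core grants.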
Author information
Authors and Affiliations
- Dryad, and the National Evolutionary Synthesis Center, Durham, North Carolina, USA
Heather A. Piwowar
- Dryad, and the University of North Carolina, Chapel Hill, North Carolina, USA
Todd J. Vision
- Dryad, and the University of British Columbia, Vancouver, Canada
Michael C. Whitlock
Corresponding author
Correspondence to Heather A. Piwowar.
Ethics declarations
Competing interests
Heather Piwowar and Todd Vision receive research support from the Dryad data repository project.
About this article
Cite this article
Piwowar, H., Vision, T. & Whitlock, M. Data archiving is a good investment. Nature 473, 285 (2011). https://doi.org/10.1038/473285a
- Published: 18 May 2011
- Issue Date: 19 May 2011
- DOI: https://doi.org/10.1038/473285a