Data archiving is a good investment

We have found that ongoing financial investment in data-archiving infrastructure yields an impressive scientific return, and believe that it should be wholeheartedly supported by research funding agencies (see, for example, http://go.nature.com/nzftf3).

We used Dryad (see http://datadryad.org), an international, open, cost-effective data repository for the biological sciences, to estimate the cost of archiving data from more than 10,000 publications. We found that these could be curated and the data preserved at an annual cost of about US$400,000.
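Dividing the annual cost by the number of publications gives a rough per-publication archiving cost. The minimal sketch below uses only the two figures quoted above; the variable names are illustrative, and because the letter says "more than 10,000 publications", the result is an upper bound.

```python
# Figures quoted in the letter for Dryad (one year of curation and preservation).
annual_cost_usd = 400_000
publications_archived = 10_000  # letter says "more than 10,000", so this is a floor

# Upper bound on the cost of archiving the data behind one publication.
cost_per_publication = annual_cost_usd / publications_archived
print(f"Archiving cost per publication: at most about ${cost_per_publication:,.0f}")
```

With these inputs the sketch prints an upper bound of about $40 per archived publication.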

As an example of how much research is typically published per grant dollar, core grants in population and community ecology from the US National Science Foundation averaged 3–4 publications per $100,000 of grant between 2000 and 2005 (S. Reyes, A. Tessier and S. Mazer, unpublished results). That is, $400,000 invested in original research resulted in about 16 papers.
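The scaling from $100,000 to $400,000 can be made explicit. This is a minimal sketch using only the NSF figures quoted above (variable names are illustrative); it shows that 3–4 papers per $100,000 corresponds to 12–16 papers per $400,000, so the letter's "about 16" sits at the upper end of that range.

```python
# NSF core grants in population and community ecology, 2000-2005 (letter's figures).
papers_per_100k_low, papers_per_100k_high = 3, 4
budget_usd = 400_000

scale = budget_usd / 100_000  # number of $100,000 units in the budget
low = papers_per_100k_low * scale
high = papers_per_100k_high * scale
print(f"${budget_usd:,} of research grants -> {low:.0f}-{high:.0f} papers")
```

This prints `$400,000 of research grants -> 12-16 papers`.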

Dryad cannot yet tell us how effective data archives are in facilitating primary research publications, but the Gene Expression Omnibus (GEO) database at the US National Center for Biotechnology Information offers some insight. To estimate data reuse, we searched the full text of articles in PubMed Central for mentions of any of the 2,711 data sets deposited in GEO in 2007. We excluded articles whose authors overlapped with the depositors of the data set. Extrapolating the 338 hits in PubMed Central to all of PubMed, we estimate that the GEO 2007 data sets made third-party contributions to more than 1,150 published articles by the end of 2010, and reuse continues to accumulate rapidly (H. A. Piwowar, T. J. Vision and M. C. Whitlock Dryad Digital Repository doi:10.5061/dryad.j1fd7; 2011).
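Two ratios follow directly from the GEO numbers: the factor implied by going from 338 PubMed Central hits to more than 1,150 PubMed articles, and the number of third-party papers per deposited data set. The sketch below back-calculates both from the quoted figures; the letter does not describe its extrapolation method, so the scaling factor here is only what the numbers imply, not the authors' procedure.

```python
# GEO figures quoted in the letter (data sets deposited in 2007, reuse counted to end of 2010).
datasets_2007 = 2_711
pmc_hits = 338            # third-party mentions found in PubMed Central full text
pubmed_estimate = 1_150   # extrapolated estimate (a lower bound) across all of PubMed

# Back-calculated ratios; the extrapolation method itself is not given in the letter.
implied_scaling = pubmed_estimate / pmc_hits
reuse_per_dataset = pubmed_estimate / datasets_2007
print(f"Implied PMC-to-PubMed scaling factor: ~{implied_scaling:.1f}")
print(f"Third-party papers per data set (through 2010): ~{reuse_per_dataset:.2f}")
```

This gives a scaling factor of about 3.4 and roughly 0.42 third-party papers per data set over about four years.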

Assuming that Dryad has a comparable rate of reuse and collects at least 2,500 data sets annually, an investment of $400,000 in one year should contribute to more than 1,000 papers in the next four years: far more than the roughly 16 papers that the same $400,000 yields when spent on original research.
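The projection combines the GEO reuse rate with Dryad's assumed annual intake. This minimal sketch restates that arithmetic; it carries over the letter's assumption that Dryad data are reused at GEO's 2007 rate, and it compares against the letter's upper figure of 16 papers for $400,000 of research grants (variable names are illustrative).

```python
# Assumption from the letter: Dryad's reuse rate matches GEO's 2007 cohort.
geo_reuse_per_dataset = 1_150 / 2_711  # third-party papers per data set over ~4 years
dryad_datasets_per_year = 2_500        # lower bound on annual deposits
research_papers = 16                   # letter's figure for $400,000 of NSF grants

# Papers expected to draw on one year's worth of archived data.
archive_papers = dryad_datasets_per_year * geo_reuse_per_dataset

print(f"Papers supported by one year of archiving: ~{archive_papers:,.0f}")
print(f"Ratio to the same spending on original research: ~{archive_papers / research_papers:.0f}x")
```

This yields about 1,060 papers, roughly 66 times the research-grant figure; using the lower bound of 12 papers instead of 16 would make the ratio larger still.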

Author information

Authors and Affiliations

  1. Dryad, and the National Evolutionary Synthesis Center, Durham, North Carolina, USA
    Heather A. Piwowar
  2. Dryad, and the University of North Carolina, Chapel Hill, North Carolina, USA
    Todd J. Vision
  3. Dryad, and the University of British Columbia, Vancouver, Canada
    Michael C. Whitlock

Corresponding author

Correspondence to Heather A. Piwowar.

Ethics declarations

Competing interests

Heather Piwowar and Todd Vision receive research support from the Dryad data repository project.

About this article

Cite this article

Piwowar, H., Vision, T. & Whitlock, M. Data archiving is a good investment. Nature 473, 285 (2011). https://doi.org/10.1038/473285a
