The case for cloud computing in genome informatics

Lincoln D Stein. Genome Biol. 2010.

Abstract

With DNA sequencing now getting cheaper more quickly than data storage or computation, the time may have come for genome informatics to migrate to the cloud.

Figures

Figure 1

The old genome informatics ecosystem. Under the traditional flow of genome information, sequencing laboratories transmit raw and interpreted sequencing information across the internet to one of several sequencing archives. This information is accessed either directly by casual users or indirectly via a website run by one of the value-added genome integrators. Power users typically download large datasets from the archives onto their local compute clusters for computationally intensive number crunching. Under this model, the sequencing archives, value-added integrators and power users all maintain their own compute and storage clusters and keep local copies of the sequencing datasets.

Figure 2

Historical trends in storage prices versus DNA sequencing costs. The blue squares show the historical cost of disk storage, expressed in megabytes per US dollar. The long-term trend (blue line, which is straight because the plot is logarithmic) shows exponential growth in storage per dollar with a doubling time of roughly 1.5 years. The cost of DNA sequencing, expressed in base pairs per dollar, is shown by the red triangles. It follows an exponential curve (yellow line) with a doubling time slightly longer than that of disk storage until 2004, when next-generation sequencing (NGS) causes an inflection in the curve to a doubling time of less than 6 months (red line). These curves are not corrected for inflation or for the 'fully loaded' cost of sequencing and disk storage, which would include personnel costs, depreciation and overhead.
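
To make the doubling times in the Figure 2 caption concrete, the following is a minimal sketch of the arithmetic, not code from the article. It assumes pure exponential growth, uses the caption's roughly 1.5-year doubling time for storage per dollar, and takes 6 months as an upper bound on the post-2004 sequencing doubling time (the caption says "less than 6 months"). The function names `growth_factor` and `relative_gap` and the 3-year horizon are illustrative choices.

```python
def growth_factor(years: float, doubling_time_years: float) -> float:
    """Multiplicative growth after `years` for a quantity that doubles
    every `doubling_time_years` (pure exponential growth assumed)."""
    return 2.0 ** (years / doubling_time_years)


def relative_gap(years: float, fast_doubling: float, slow_doubling: float) -> float:
    """How many times further the faster-doubling quantity has grown
    than the slower-doubling one over the same period."""
    return growth_factor(years, fast_doubling) / growth_factor(years, slow_doubling)


if __name__ == "__main__":
    horizon = 3.0               # years; illustrative choice, not from the article
    storage_doubling = 1.5      # years, from the Figure 2 caption (approximate)
    sequencing_doubling = 0.5   # years; assumed upper bound ("less than 6 months")

    storage = growth_factor(horizon, storage_doubling)
    sequencing = growth_factor(horizon, sequencing_doubling)
    gap = relative_gap(horizon, sequencing_doubling, storage_doubling)

    print(f"Storage per dollar grows {storage:.0f}x in {horizon:.0f} years")
    print(f"Base pairs per dollar grow at least {sequencing:.0f}x in {horizon:.0f} years")
    print(f"Sequencing output per dollar outpaces storage per dollar by at least {gap:.0f}x")
```

Under these assumptions, three years gives a 4-fold gain in storage per dollar against at least a 64-fold gain in base pairs per dollar, which is the widening gap the abstract points to.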

Figure 3

The 'new' genome informatics ecosystem based on cloud computing. In this model, the community's storage and compute resources are co-located in a 'cloud' maintained by a large service provider. The sequence archives and value-added integrators maintain servers and storage systems within the cloud, and use more or less capacity as needed for daily and seasonal fluctuations in usage. Casual users continue to access the data via the websites of the archives and integrators, but power users now have the option of creating virtual on-demand compute clusters within the cloud, which have direct access to the sequencing datasets.

Cited by

Collaborative cloud-enabled tools allow rapid, reproducible biological insights.
Ragan-Kelley B, Walters WA, McDonald D, Riley J, Granger BE, Gonzalez A, Knight R, Perez F, Caporaso JG. ISME J. 2013 Mar;7(3):461-4. doi: 10.1038/ismej.2012.123. Epub 2012 Oct 25. PMID: 23096404. Free PMC article.
Recommendations on e-infrastructures for next-generation sequencing.
Spjuth O, Bongcam-Rudloff E, Dahlberg J, Dahlö M, Kallio A, Pireddu L, Vezzi F, Korpelainen E. Gigascience. 2016 Jun 7;5:26. doi: 10.1186/s13742-016-0132-7. PMID: 27267963. Free PMC article. Review.
Methods of integrating data to uncover genotype-phenotype interactions.
Ritchie MD, Holzinger ER, Li R, Pendergrass SA, Kim D. Nat Rev Genet. 2015 Feb;16(2):85-97. doi: 10.1038/nrg3868. Epub 2015 Jan 13. PMID: 25582081. Review.
Novel bioinformatic developments for exome sequencing.
Lelieveld SH, Veltman JA, Gilissen C. Hum Genet. 2016 Jun;135(6):603-14. doi: 10.1007/s00439-016-1658-6. Epub 2016 Apr 13. PMID: 27075447. Free PMC article. Review.
Adaptive efficient compression of genomes.
Wandelt S, Leser U. Algorithms Mol Biol. 2012 Nov 12;7(1):30. doi: 10.1186/1748-7188-7-30. PMID: 23146997. Free PMC article.
