NCBI GEO: archive for functional genomics data sets--10 years on

Nucleic Acids Res. 2011 Jan;39(Database issue):D1005-10.

doi: 10.1093/nar/gkq1184. Epub 2010 Nov 21.

Tanya Barrett, Dennis B Troup, Stephen E Wilhite, Pierre Ledoux, Carlos Evangelista, Irene F Kim, Maxim Tomashevsky, Kimberly A Marshall, Katherine H Phillippy, Patti M Sherman, Rolf N Muertter, Michelle Holko, Oluwabukunmi Ayanbule, Andrey Yefanov, Alexandra Soboleva

Tanya Barrett et al. Nucleic Acids Res. 2011 Jan.

Abstract

A decade ago, the Gene Expression Omnibus (GEO) database was established at the National Center for Biotechnology Information (NCBI). The original objective of GEO was to serve as a public repository for high-throughput gene expression data generated mostly by microarray technology. However, the research community quickly applied microarrays to non-gene-expression studies, including examination of genome copy number variation and genome-wide profiling of DNA-binding proteins. Because the GEO database was designed with a flexible structure, it was possible to quickly adapt the repository to store these data types. More recently, as the microarray community switches to next-generation sequencing technologies, GEO has again adapted to host these data sets. Today, GEO stores over 20,000 microarray- and sequence-based functional genomics studies, and continues to handle the majority of direct high-throughput data submissions from the research community. Multiple mechanisms are provided to help users effectively search, browse, download and visualize the data at the level of individual genes or entire studies. This paper describes recent database enhancements, including new search and data representation tools, as well as a brief review of how the community uses GEO data. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.

Figures

Figure 1.

A timeline of GEO database growth, development and events. The chart represents cumulative growth of publicly available Sample records from 2000 to September 2010. A further 80,000 Samples are currently held private until published, making a total of about 550,000 Samples. The current rate of submission and processing is over 10,000 Samples per month. (A) First data uploaded to database. (B) MIAME proposal is published, outlining the minimum information that should be included when describing a microarray experiment (2). (C) Nature journals announce requirement for microarray data deposit to public databases (4). (D) Reviewer access mechanism enabled, allowing anonymous confidential review of pre-published data. (E and inset 1) GEO Profiles database released, enabling search and visualization of individual gene expression charts. (F and inset 2) Interactive pre-computed cluster heatmaps released, allowing users to view and select regions of interesting gene expression patterns. (G) Major database modifications released, aimed at better support of MIAME elements. (H) GEO increases enforcement of provision of raw data. (I) Bioconductor GEOquery package published, allowing GEO data to be imported into the R environment (22). (J) GEOarchive spreadsheet submission format released, enabling rapid batch deposit of data. (K) All GEO Series records re-classified according to technology and experiment type, making it simple to locate studies of a specific type; types are listed in Table 1. (L) Improvements to DataSet Browser and accompanying analysis tools panel implemented. (M and inset 3) First release of next-generation sequence tracks on NCBI's Sequence Viewer. These tracks were generated in support of the NIH Roadmap Epigenomics project, http://www.ncbi.nlm.nih.gov/geo/roadmap/epigenomics/. (N) Links generated to NCBI's new Epigenomics resource (7), which applies advanced curation and genome browser tracks for hundreds of next-generation sequence Samples derived from GEO. (O) Advanced Search tool released, helping users construct complex queries.

References

    1. Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30:207-210.
    2. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat. Genet. 2001;29:365-371.
    3. Parkinson H, Kapushesky M, Kolesnikov N, Rustici G, Shojatalab M, Abeygunawardena N, Berube H, Dylag M, Emam I, Farne A, et al. ArrayExpress update--from an archive of functional genomics experiments to the atlas of gene expression. Nucleic Acids Res. 2009;37:D868-D872.
    4. Microarray standards at last. Nature. 2002;419:323. Available at http://www.nature.com/nature/journal/v419/n6905/full/419323a.html.
    5. Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Marshall KA, et al. NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res. 2009;37:D885-D890.
