Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges

Nature Reviews Genetics volume 9, pages 678–688 (2008)

Abstract

Biology is an information-driven science. Large-scale data sets from genomics, physiology, population genetics and imaging are driving research at a dizzying rate. Simultaneously, interdisciplinary collaborations among experimental biologists, theorists, statisticians and computer scientists have become the key to making effective use of these data sets. However, too many biologists have trouble accessing and using these electronic data sets and tools effectively. A 'cyberinfrastructure' is a combination of databases, network protocols and computational services that brings people, information and computational tools together to perform science in this information-driven world. This article reviews the components of a biological cyberinfrastructure, discusses current and pending implementations, and notes the many challenges that lie ahead.

Acknowledgements

I wish to thank the staff of myGrid, BIRN, caBIG, iPlant, EcoliHub and nanoHub for their assistance during the research phase of this Review. I would also like to thank the three anonymous reviewers who took the time to review this article in manuscript stage and to make comments and suggestions. This work was supported in part by a grant from the National Science Foundation Division of Emerging Frontiers (0735191).

Author information

Authors and Affiliations

  1. Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, New York, 11724, USA
    Lincoln D. Stein
  2. Ontario Institute for Cancer Research, 101 College Street, Toronto, M5G 1L7, Ontario, Canada
    Lincoln D. Stein

Authors

  1. Lincoln D. Stein

Ethics declarations

Competing interests

In the interests of full disclosure, the author has been directly or indirectly involved in the following projects discussed in this article: DAS, GMOD, BioMOBY, SSWAP, caBIG and iPC.

Glossary

WIKI

A popular web page authoring system that allows individuals to collaborate on large communal documents. Wikipedia is the best known example, but there are many tens of thousands of WIKIs in use. The name comes from the Hawaiian word for quick.

Ontology

An enumeration of the concepts used in a particular domain of knowledge, their definitions and the relationships between them.

Web service

A web-based resource that can be programmatically invoked to perform a database search or a computation, or to provide some other service.

Web Services Description Language

(WSDL). An XML-based language used to describe the nature of SOAP web services.

Simple Object Access Protocol

(SOAP). The dominant messaging protocol for defining and invoking web services.

OWL

A dyslexic acronym for Web Ontology Language. It is an XML-based language used to describe ontologies. A variant of OWL called OWL Description Logics (OWL DL) is particularly suited for creating semantic webs of ontologies that can be traversed by reasoning engines.

Representational State Transfer

(REST). An alternative web services protocol that is sometimes more suitable than SOAP for particular web services.

Semantic web

An interrelated network of ontologies that together describe resources available on the web.
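
The glossary's definition of an ontology (concepts, their definitions and the relationships between them) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Term` class, the three toy terms, the `is_a` relationship name and the `ancestors` helper are hypothetical and are not taken from the article or from any real ontology.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One concept in a toy ontology (hypothetical structure)."""
    term_id: str
    definition: str
    is_a: list[str] = field(default_factory=list)  # ids of parent concepts


# Illustrative terms only; not drawn from a real controlled vocabulary.
ONTOLOGY = {
    "gene_product": Term("gene_product", "A molecule encoded by a gene."),
    "protein": Term("protein", "A gene product made of amino acids.", ["gene_product"]),
    "enzyme": Term("enzyme", "A protein that catalyses a reaction.", ["protein"]),
}


def ancestors(term_id: str, ontology: dict[str, Term]) -> set[str]:
    """Return every concept reachable from term_id by following is_a links."""
    seen: set[str] = set()
    stack = list(ontology[term_id].is_a)
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(ontology[parent].is_a)
    return seen
```

Following the relationships in this way is, in miniature, what lets a resource annotated with a specific concept also be found under a more general one: here `ancestors("enzyme", ONTOLOGY)` collects both `protein` and `gene_product`.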

Cite this article

Stein, L. Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges. Nat Rev Genet 9, 678–688 (2008). https://doi.org/10.1038/nrg2414