Toward a plant genomics initiative: Thoughts on the value of cross-species and cross-genera comparisons in the grasses

Abstract

Comparative genomics offers unparalleled opportunities to integrate historically distinct disciplines, to link disparate biological kingdoms, and to bridge basic and applied science. Cross-species, cross-genera, and cross-kingdom comparisons are proving key to understanding how genes are structured, how gene structure relates to gene function, and how changes in DNA have given rise to the biological diversity on the planet. Applying genomics to crop species offers special opportunities to combine sequence information with the vast reservoirs of historical information associated with crops and their evolution. The grasses provide a particularly well-developed system for building tools that facilitate comparative genetic interpretation among members of a diverse and evolutionarily successful family. Rice is advantageous for genomic sequencing because of its small, diploid genome, whereas each of the other grasses provides complementary genetic information that will help extract meaning from the sequence data. Because of the importance of the cereals to the human food chain, developments in this area can lead directly to opportunities for improving the health and productivity of our food systems and for promoting the sustainable use of natural resources.


As originally conceived, the Plant Genome Initiative was a focused effort to apply genomic research to the study of maize and maize improvement, aimed at enhancing the competitive position of one of the major commodities of the United States. That this kind of scientific research can lead directly to product development and opportunities for economic growth is important. However, there is a longer-term, perhaps visionary, reason for seriously contemplating what a broad-based, scientifically driven “Plant Genomics Initiative” could mean to people around the world.

Genomic research is profoundly altering the way biologists think about living things. This branch of science seeks to understand how genes and genomes are structured, how they function, and how they evolved. Inquiry is based on sophisticated computational analysis of large data sets generated by high-throughput sequencing and chip-reading robots. It has the power to unify disparate kingdoms, build bridges across ancient genetic voids, and establish new channels of communication across commodity- or species-specific domains. With increasing pressure on the natural resource base to provide food, shelter, and a habitable environment for living things on the planet, the biological sciences, and genetics in particular, are likely to become a focal point of public opinion and scientific investigation as we enter the 21st century. The interpretation and use of new scientific insights and technological potential may play a major role in determining whether the years ahead are a time of hope and optimism for the planet and its people or a time of divisiveness and alienation. The promise is that biological investigation will yield products and ideas that enable us to live better: to breathe cleaner air, drink purer water, and eat healthier food. But science is a process more profound than product development, and its long-term potential is to help create a vision that enables us to live together more harmoniously, more finely tuned to our own biological rhythms and those of the world we live in, than we can conceive of today.

At this time, genome research has been driven primarily by the human genome project and its spin-offs (the sequencing of entire microbial genomes and of other “model” organisms such as Drosophila, Caenorhabditis, and Arabidopsis). Genomics has profoundly changed the kind of information available to biologists, the kinds of questions they ask, and the research strategies they employ. Although most biologists still specialize in a specific organism or a specific biochemical, physiological, or agronomic pathway as their entry point into the study of living things, the quantity and quality of data that can reasonably be accessed or generated over the course of a study have grown exponentially. High-throughput sequencing and computerized data access are now an entry point rather than a final achievement in the quest to understand the workings of the biological world. Groups of investigators far removed in time, space, and organizational orbit from the machines that generate raw sequence data will be involved in deciphering its meaning. Genomics is fueling research and product development in the pharmaceutical, agricultural, and basic sciences today, and the volume, depth, and diversity of available sequence and mapping data have stimulated the interest and imagination of biologists around the world.

As we contemplate the potential of a “Plant Genomics Project” for agriculture, it becomes clear that there are both parallels with and differences from the human and microbial genome projects that have come before. Plant genomics will be oriented toward using genomic research to enhance the productivity, quality, and sustainability of our food production systems. Agricultural species, with their rich history of genetic improvement, their extensive germplasm collections, and their intimate associations with human history and culture throughout the world, bring new challenges and possibilities. Sophisticated and comprehensive computational tools will be needed to integrate not only sequence data but also information on genotypic performance, pedigree relationships, and germplasm diversity, so that genomic data can be interpreted in ways that are useful to agricultural scientists and geneticists.

In the United States, federally funded agricultural research is largely supported by the U.S. Department of Agriculture, and over the past 10–12 years one of the major national research initiatives in agriculture has been the Plant Genome Project. This project has supported the development of molecular genetic maps for more than 50 crop species (R. Phillips, National Research Initiative, personal communication), and the maps have been used to localize genes and quantitative trait loci (QTLs), to select for those genes in plant breeding, and to clone genes positionally. The proliferation of molecular mapping activities was predicated on two working hypotheses: first, that the maps would be useful in the genetic improvement of the target crops and that this would benefit U.S. agriculture; and second, that a separate map had to be developed for each crop because mapping was a commodity-specific endeavor. Thus, for several years, species-specific maps were developed independently, in relative isolation from one another.

In the late 1980s, these assumptions began to give way in the face of molecular linkage data suggesting that, among diverse relatives within a plant family, large regions of chromosomes had been inherited intact from a common ancestor (1, 2). Within a few years, this led to the idea that an entire plant family could be studied as a single genetic system (3–6), and the richest accumulation of data supporting this concept has come from the grass family. The family consists of five principal subfamilies and about 10,000 diverse species, including some of our most important agricultural food crops, such as rice, wheat, barley, rye, oat, maize, sorghum, millet, and sugarcane (7). Despite the wide diversity of forms and habitat specialization among the grasses, an underlying consistency in the general plant body and seed-bearing unit is observable among members of the Gramineae, an observation that was well documented by Vavilov from the early 1930s into the 1940s (8).

Over the last decade, extensive, species-specific maps were developed for many members of the Gramineae including rice (Oryza), wheat (Triticum), barley (Hordeum), rye (Secale), oats (Avena), pearl millet (Pennisetum), maize (Zea), sorghum (Sorghum), and sugarcane (Saccharum) (http://probe.nalusda.gov:8300). Comparative mapping among these genera by using a common set of cDNA clones suggests that the same genes are arranged in identical order along large tracts of chromosome. This opens the door to the concept that all of these crops contain essentially the same set of genes, duplicated extensively in some genomes (notably maize and wheat), rearranged through translocation and other mutational events over time, but still recognizably the same ancestral set of genes.
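To make the idea of conserved gene order concrete, the Python sketch below scans two species-specific maps that share a set of anchor probes and reports runs of consecutive shared probes that stay on one chromosome in each map and keep a consistent order (forward or inverted). The function name `collinear_blocks`, the three-anchor minimum, and the toy map positions are hypothetical illustrations, not the procedure or data used in the studies cited here; real analyses must also tolerate small local rearrangements and mapping error, which this greedy scan does not.

```python
from typing import Dict, List, Tuple

# (chromosome, position in cM) of an anchor probe on one species-specific map
MapPos = Tuple[str, float]
Block = Tuple[str, str, List[str]]  # (chromosome in map A, chromosome in map B, probes)


def collinear_blocks(map_a: Dict[str, MapPos], map_b: Dict[str, MapPos],
                     min_anchors: int = 3) -> List[Block]:
    """Greedy scan for runs of shared anchors that stay on one chromosome in each
    map and keep a consistent order (forward or inverted) in map B."""
    # Shared probes in map A order: by chromosome name, then by cM position.
    shared = sorted((p for p in map_a if p in map_b), key=lambda p: map_a[p])
    blocks: List[Block] = []
    run: List[str] = []
    direction = 0  # +1 forward, -1 inverted, 0 not yet known

    for probe in shared:
        if run:
            prev = run[-1]
            step = map_b[probe][1] - map_b[prev][1]
            step_dir = (step > 0) - (step < 0)
            if (map_a[probe][0] == map_a[prev][0]
                    and map_b[probe][0] == map_b[prev][0]
                    and step_dir != 0 and direction in (0, step_dir)):
                run.append(probe)        # extend the current collinear run
                direction = step_dir
                continue
            if len(run) >= min_anchors:  # run is broken; keep it if long enough
                blocks.append((map_a[run[0]][0], map_b[run[0]][0], run))
        run = [probe]                    # start a new run at this probe
        direction = 0
    if len(run) >= min_anchors:
        blocks.append((map_a[run[0]][0], map_b[run[0]][0], run))
    return blocks


# Toy maps; probe names, chromosomes, and positions are invented.
MAP_A = {"p1": ("A1", 10.0), "p2": ("A1", 20.0), "p3": ("A1", 30.0), "p4": ("A1", 40.0),
         "p5": ("A2", 5.0), "p6": ("A2", 15.0), "p7": ("A2", 25.0)}
MAP_B = {"p1": ("B3", 50.0), "p2": ("B3", 60.0), "p3": ("B3", 70.0), "p4": ("B7", 12.0),
         "p5": ("B1", 90.0), "p6": ("B1", 80.0), "p7": ("B1", 70.0)}

print(collinear_blocks(MAP_A, MAP_B))
```

In the toy data, probes p1 to p3 form a forward block between A1 and B3, p5 to p7 form an inverted block between A2 and B1, and p4 breaks the first run because it maps to a different chromosome in map B.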

To arrive at this conclusion, the species-specific maps were aligned by using a set of “anchor” cDNA probes. This set of probes was developed at Cornell University by Van Deynze et al. (9) and has been distributed over the last 2 years to more than 50 research groups for mapping experiments. Linkage among anchored loci has allowed the identification of homoeologous regions of distantly related genomes and provides instantaneous coordination of mapping results among independent investigators. The set consists of 152 cDNAs selected from rice, oat, barley, and wheat libraries (67 from rice, 63 from oat, 21 from barley, and 1 from wheat) that met the following criteria: (i) they hybridized to the majority of grass species surveyed (rice, wheat, barley, oat, maize, sorghum, sugarcane) in Southern analysis, (ii) they appeared to be low or single copy in rice (the grass species with the smallest genome), and (iii) they provided optimal genome coverage when mapped onto existing species-specific linkage maps of rice, oat, wheat, barley, and maize (9).
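The three selection criteria can be read as a filter followed by a coverage step. The Python sketch below renders them under stated assumptions: a “majority” means more than half of the seven surveyed species, “low or single copy” is taken as at most two copies in rice, and “optimal genome coverage” is approximated by greedily picking the probe that fills the most not-yet-covered map bins. The `Candidate` class, `select_anchors` function, bin scheme, and toy data are hypothetical; the source does not describe how Van Deynze et al. (9) actually optimized coverage.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple

# Species surveyed by Southern analysis, as listed in criterion (i).
SURVEYED = frozenset({"rice", "wheat", "barley", "oat", "maize", "sorghum", "sugarcane"})

# (species, chromosome, bin index) for a hypothetical fixed-width binning of each map
Bin = Tuple[str, str, int]


@dataclass(frozen=True)
class Candidate:
    name: str
    hybridizes_to: FrozenSet[str]  # species giving a Southern signal
    rice_copy_number: int          # estimated copy number in rice
    bins: FrozenSet[Bin]           # map bins where the probe has been placed


def select_anchors(cands: List[Candidate], max_probes: int,
                   max_rice_copies: int = 2) -> List[str]:
    # (i) signal in a majority of surveyed species; (ii) low/single copy in rice.
    # The copy-number cutoff of 2 is an assumption, not a published threshold.
    passing = sorted(
        (c for c in cands
         if len(c.hybridizes_to & SURVEYED) > len(SURVEYED) / 2
         and c.rice_copy_number <= max_rice_copies),
        key=lambda c: c.name)
    # (iii) greedy coverage: repeatedly add the probe that covers the most new bins
    # (ties go to the alphabetically first name because max() keeps the first maximum).
    covered: Set[Bin] = set()
    chosen: List[str] = []
    while passing and len(chosen) < max_probes:
        best = max(passing, key=lambda c: len(c.bins - covered))
        if not best.bins - covered:
            break  # no remaining probe adds coverage
        chosen.append(best.name)
        covered |= best.bins
        passing.remove(best)
    return chosen


# Invented candidates illustrating each criterion.
EXAMPLE = [
    Candidate("probe1", frozenset({"rice", "wheat", "barley", "oat", "maize"}), 1,
              frozenset({("rice", "1", 0), ("rice", "1", 1), ("maize", "3", 4)})),
    Candidate("probe2", frozenset({"rice", "oat"}), 1, frozenset({("rice", "2", 0)})),
    Candidate("probe3", SURVEYED, 5, frozenset({("oat", "A", 2)})),
    Candidate("probe4", SURVEYED, 1, frozenset({("rice", "1", 0), ("oat", "A", 2)})),
    Candidate("probe5", frozenset({"rice", "wheat", "barley", "sorghum"}), 2,
              frozenset({("rice", "1", 1)})),
]
print(select_anchors(EXAMPLE, max_probes=10))
```

Here probe2 fails criterion (i), probe3 fails criterion (ii), and probe5 passes both filters but is dropped because its only bin is already covered. A greedy heuristic was chosen for brevity; it does not guarantee a truly optimal set.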

The anchor probes have been end-sequenced at both the 5′ and the 3′ ends (GenBank accession nos. AA231638–AA231938; also available in the RiceGenes database, http://probe.nalusda.gov:8300), and a blastx homology search showed that 78% had significant similarity (scores exceeding 100) to protein sequences of known genes in the National Center for Biotechnology Information databases. Efforts are underway to localize these grass anchor probes on maps of more distantly related plant species (i.e., dicots and gymnosperms) to determine how far homoeologous relationships extend across larger evolutionary distances. The evaluation of macrosynteny, based on the conserved order of genes (cDNAs) along a chromosomal segment, will be complemented by evaluations of microsynteny, based on sequencing of large tracts of genomic DNA. Comparative sequencing of multiple genomic regions anchored by homologous cDNAs, both within the grass family, as initiated by Chen et al. (10), and between more distantly related groups of plants, will provide critical data for determining the most efficient way to align sequence information across diverse plant species.
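The 78% figure is a simple proportion, but two details matter when computing such a number: each probe has two end reads (5′ and 3′), and probes with no hit at all still belong in the denominator. The Python sketch below assumes a hypothetical three-column, tab-separated summary (read ID, subject ID, score) and a hypothetical `name/5` / `name/3` read-naming convention; it is not the pipeline used for the published analysis, and the example rows are invented.

```python
import csv
import io
from typing import Dict, Iterable, List


def best_score_per_probe(rows: Iterable[List[str]]) -> Dict[str, float]:
    """Keep the highest blastx score per probe, pooling its 5' and 3' end reads."""
    best: Dict[str, float] = {}
    for read_id, _subject, score in rows:
        probe = read_id.rsplit("/", 1)[0]  # hypothetical "name/5" or "name/3" read IDs
        value = float(score)
        if value > best.get(probe, float("-inf")):
            best[probe] = value
    return best


def fraction_similar(probes: List[str], best: Dict[str, float],
                     threshold: float = 100.0) -> float:
    """Share of ALL queried probes (hit or no hit) whose best score exceeds threshold.
    probes must be non-empty."""
    return sum(best.get(p, 0.0) > threshold for p in probes) / len(probes)


# Invented example rows: read ID, subject ID, score (tab-separated).
EXAMPLE_TSV = "probeA/5\tP1\t180\nprobeA/3\tP2\t95\nprobeB/5\tP3\t60\nprobeC/3\tP4\t130\n"
best = best_score_per_probe(csv.reader(io.StringIO(EXAMPLE_TSV), delimiter="\t"))
print(fraction_similar(["probeA", "probeB", "probeC", "probeD"], best))
```

In the example, probeD has no hit but is still counted, so two of four probes pass the threshold.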

Treating the grasses as a unified genetic system depends on the development of computational tools that facilitate cross-genome comparisons. To identify and visualize conserved linkage blocks (homoeologous segments) in multiple genera simultaneously, an interactive display has been developed in the RiceGenes database (E. Paul, M. Blinstrub, and S.R.M., curators; http://probe.nalusda.gov:8300). This display currently integrates information from rice, maize, oat, and wheat comparative maps and allows the user to observe the alignment of homoeologous chromosome segments from multiple grass genera simultaneously. The display encourages the user to consider internal duplications within a single genome, a feature particularly characteristic of maize, and provides notes about genes or QTLs occurring in a particular region of a linkage map to help align genes with putative or known function across members of the family. This dynamic display is a powerful and versatile tool for examining evolutionary relationships among the grasses and making cross-genera comparisons. Evaluation of macrosynteny and higher-resolution studies of microsynteny in the grasses will need to be expanded to accommodate parallel studies in other plant families. Further computational tools that clarify similarities and differences among species and genera at all levels of resolution are vitally needed.
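One simple way to flag the kind of internal duplication the display highlights in maize is to count, within a single genome, how often the same anchor probe detects loci on two different chromosomes; chromosome pairs that share several such duplicated anchors are candidate duplicated segments. The Python sketch below does only this coarse count and ignores marker order, which a fuller analysis would also check. The function name, the two-anchor threshold, and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def duplicated_chromosome_pairs(loci: Dict[str, List[str]],
                                min_shared: int = 2) -> List[Tuple[Pair, int]]:
    """loci maps each anchor probe to the chromosomes on which it detects a locus
    in ONE genome. Returns chromosome pairs sharing at least min_shared probes,
    most-shared first."""
    counts: Counter = Counter()
    for chroms in loci.values():
        # every pair of distinct chromosomes hit by the same probe
        for pair in combinations(sorted(set(chroms)), 2):
            counts[pair] += 1
    hits = [(pair, n) for pair, n in counts.items() if n >= min_shared]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Toy single-genome data; probes and chromosome assignments are invented.
LOCI = {"a1": ["1", "5"], "a2": ["1", "5"], "a3": ["1", "5", "9"],
        "a4": ["2", "7"], "a5": ["2", "7"], "a6": ["3"]}
print(duplicated_chromosome_pairs(LOCI))
```

Pairs such as ("1", "9"), supported by a single probe, fall below the threshold and are not reported.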

Rice is a particularly valuable reference point for comparative studies involving monocots because of its small, diploid genome (430 Mb per haploid cell) (11), its well-developed genetic and physical maps, and the initiation of an international effort to completely sequence the genome (ftp://genome1.bio.bnl.gov/pub/maize/rice.hml). It has a large, publicly available germplasm collection (84,000 accessions) and an active research community, reflecting its importance as a staple food crop. Information is currently available on more than 3,000 mapped DNA markers (12, 13) and more than 150 morphological mutants (14). There is an average spacing of 1 DNA marker every 0.5 cM, and an average DNA/cM ratio of 250–300 kb/cM (15). Physical mapping based on yeast artificial chromosome (YAC) and bacterial artificial chromosome (BAC) contigs has progressed rapidly, and groups in Japan (16) and China (17) report more than 50% coverage of the rice genome. Public availability of YAC (16), BAC (18, 19), and cosmid libraries (20) has facilitated high-resolution mapping aimed at gene cloning via a positional/candidate approach (20–25). More than 15,000 expressed sequence tags and complete cDNA sequences of rice are available in public databases, providing an important body of data for sequence comparisons with other organisms.
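As a rough consistency check on the rice figures, assuming markers are evenly spaced and the kb/cM ratio is uniform along the chromosomes (neither is strictly true, because recombination rates vary along chromosomes), the genome size and the physical-to-genetic ratio imply a total map length that can be compared with the length implied by the marker count:

```latex
\begin{aligned}
\text{length from genome size} &\approx \frac{430\,000\ \text{kb}}{250\text{--}300\ \text{kb/cM}}
  \approx 1\,430\text{--}1\,720\ \text{cM} \\
\text{length from markers} &\approx 3\,000\ \text{markers} \times 0.5\ \text{cM/marker} = 1\,500\ \text{cM}
\end{aligned}
```

The marker-implied length falls within the range implied by the genome size, so the quoted figures are mutually consistent at this level of approximation.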

To take advantage of the growing ability to use sequence information to bridge the investigation of evolutionarily divergent organisms, it is necessary to integrate classical and molecular information across diverse data domains. One of the greatest challenges posed by the Plant Genome Initiative will be to make instructive use of the genetic information contained in the large reservoirs of crop plant germplasm accumulated over many decades, and of the agricultural community’s historical familiarity with the performance characteristics, crossing histories, and environmental adaptation of crop species. Although sequencing is an attractive component of a Plant Genome Initiative because it can be accomplished rapidly, economically, and in known quantities, the ability to extract meaning from sequence information is an equally critical feature of a successful program. A valuable key to deciphering the meaning of biological sequence information lies in humans’ historical familiarity with the plants that underwrite their food, fuel, fiber, and pharmaceutical requirements. Finding a way to extract information from this historical repertoire of human knowledge and link it with the tools of modern molecular genetics and bioinformatics is a unique and fascinating challenge for the Plant Genome Initiative.

Biologists today have the opportunity to study the particulars of what makes an organism uniquely suited to its way of life while at the same time learning through its DNA how it is related to others. In a Plant Genomics Initiative, focusing attention on the diverse members of the grass family would provide a coherent body of information for investigating comparative aspects of chromosome structure, DNA sequence, and, ultimately, gene function in some of our most important agricultural species. Sequencing of the rice genome would provide a useful template for comparisons among grass family members and would also provide the basis for monocot–dicot comparisons, with Arabidopsis sequencing well underway. Agricultural species in general, and grass genomes in particular, offer a wealth of opportunity to explore natural variants of genes that have been selected as useful modifications of basic biochemical pathways and, linked to basic studies of plant growth and development, provide the basis for discovering or generating novel genetic variation for plant improvement. As agricultural scientists begin to look beyond their historically commodity-centered perspectives and contemplate how to address the global demand for food in the 21st century, and biologists around the world look for meaning in raw sequence data, a Plant Genomics Initiative that brings together the skills and expertise of these two communities will present a compelling case for the future of science and technology.

Acknowledgments

Thanks to B. Wilson, M. Sorrells, and S. Tanksley for substantive discussions about comparative genomics and plant improvement and to M. Sorrells, B. Park, S. Cartinhour, and their research groups for collaboration on development of the anchor probes.

Footnotes

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. AA231638–AA231938).

Eun, M. Y., Hahn, J. H., Cho, Y. G., Lee, M. C., Yoon, U. H., Kwon, K. R., Yi, B. Y. & Chung, T. Y., FAO/IAEA International Symposium on the Use of Induced Mutations and Molecular Techniques for Crop Improvement, June 19–23, 1995, Vienna, Austria.

References