Comparative genomics of the eukaryotes
Comparative Study
Science. 2000 Mar 24;287(5461):2204-15.
doi: 10.1126/science.287.5461.2204.
M D Yandell, J R Wortman, G L Gabor Miklos, C R Nelson, I K Hariharan, M E Fortini, P W Li, R Apweiler, W Fleischmann, J M Cherry, S Henikoff, M P Skupski, S Misra, M Ashburner, E Birney, M S Boguski, T Brody, P Brokstein, S E Celniker, S A Chervitz, D Coates, A Cravchik, A Gabrielian, R F Galle, W M Gelbart, R A George, L S Goldstein, F Gong, P Guan, N L Harris, B A Hay, R A Hoskins, J Li, Z Li, R O Hynes, S J Jones, P M Kuehl, B Lemaitre, J T Littleton, D K Morrison, C Mungall, P H O'Farrell, O K Pickeral, C Shue, L B Vosshall, J Zhang, Q Zhao, X H Zheng, S Lewis
- PMID: 10731134
- PMCID: PMC2754258
- DOI: 10.1126/science.287.5461.2204
G M Rubin et al. Science. 2000.
Abstract
A comparative analysis of the genomes of Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae, and of the proteins they are predicted to encode, was undertaken in the context of cellular, developmental, and evolutionary processes. The nonredundant protein sets of flies and worms are similar in size and are only twice that of yeast, but different gene families are expanded in each genome, and the multidomain proteins and signaling pathways of the fly and worm are far more complex than those of yeast. The fly has orthologs to 177 of the 289 human disease genes examined and provides the foundation for rapid analysis of some of the basic processes involved in human disease.
Figures
Fig. 1
Fly (F), worm (W), and yeast (Y) genes showing similarity to human disease genes. This collection of human disease genes was selected to represent a cross section of human pathophysiology and is not comprehensive. The selection criteria require that the gene is actually mutated, altered, amplified, or deleted in a human disease, as opposed to having a function deduced from experiments on model organisms or in cell culture. Due to redundancy in gene and protein sequence databases, a single reference sequence for each gene had to be chosen. Most reference sequences represent the longest mRNA of several alternatives in GenBank. Authoritative sources in the literature and electronic databases [Online Mendelian Inheritance in Man (OMIM)] were also consulted. In all, 289 protein sequences met these criteria. These were used as queries to search a database consisting of the sum total of gene products (38,860) found in the complete genomes of fly, worm, and yeast. 12,953 was used as the effective database size (the z parameter in BLAST). BLASTP searches were conducted as described for full genome searches, except for the z parameter. To control for potential frameshift errors in the Drosophila genome sequence, searches against a six-frame translation of the entire genome (using TBLASTN) were also conducted with the disease gene sequences using the z parameter above. Only two cases in which matches to genomic sequence were better than to the predicted protein were found, and these were manually corrected to reflect the better TBLASTN scores in the table. Results are scaled according to various levels of statistical significance, reflecting a level of confidence in either evolutionary homology or functional similarity. 
White boxes represent BLAST E values >1 × 10⁻⁶, indicating no or weak similarity; light blue boxes represent E values in the range of 1 × 10⁻⁶ to 1 × 10⁻⁴⁰; purple boxes represent E values in the range of 1 × 10⁻⁴⁰ to 1 × 10⁻¹⁰⁰; and dark blue boxes represent E values <1 × 10⁻¹⁰⁰, indicating the highest degree of sequence conservation. Actual E values can be found in the Web supplement to this figure (62), where links to OMIM and GenBank may also be found. A plus sign indicates our best estimate that the corresponding Drosophila gene product is the functional equivalent of the human protein, based on degree of sequence similarity, InterPro domain composition, and supporting biological evidence, when available. A minus sign indicates that we were unable to identify a likely functional equivalent of the human protein.
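The color scale in the Fig. 1 legend amounts to binning each BLAST E value into one of four categories. A minimal Python sketch of that binning follows. The function name `classify_e_value` is hypothetical, and the handling of values that fall exactly on 1e-40 is an assumption, since the legend does not say which bin owns that boundary.

```python
def classify_e_value(e_value: float) -> str:
    """Map a BLAST E value to the box color described in the Fig. 1 legend.

    Assumption: a value exactly equal to 1e-40 goes to the more significant
    bin (purple). The legend itself fixes the other two boundaries:
    only values > 1e-6 are white, and only values < 1e-100 are dark blue.
    """
    if e_value < 0:
        raise ValueError("E values cannot be negative")
    if e_value > 1e-6:
        return "white"       # no or weak similarity
    if e_value > 1e-40:
        return "light blue"  # 1e-6 down to 1e-40
    if e_value >= 1e-100:
        return "purple"      # 1e-40 down to 1e-100
    return "dark blue"       # < 1e-100, highest conservation


if __name__ == "__main__":
    # Illustrative values only; these are not E values from the paper.
    for value in (1e-3, 1e-20, 1e-50, 1e-150, 0.0):
        print(value, classify_e_value(value))
```

An E value reported as 0.0 falls in the dark blue bin, since it is below every threshold.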
References
- Fleischmann RD, et al. Science. 1995;269:496.
- C. elegans data were taken from A C. elegans Database (ACEDB) release WS8.
- Local gene duplications were determined by searching for N similar genes within 2N genes on each arm. For example, if three similar genes are found within a region containing six genes, this counts as one cluster of three genes. Genes were judged to be similar if a BLASTP High Scoring Pair (HSP) with a score of 200 or more existed between them. Histone gene clusters were not included. C. elegans data were taken from ACEDB release WS8, containing 18,424 genes.
- More information about GO is available at http://www.geneontology.org/. The Gene Ontology project provides terms for categorizing gene products on the basis of their molecular function, biological role, and cellular location using controlled vocabularies.
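The local gene duplication criterion in the reference notes above (N similar genes within a window of 2N genes on one arm) can be sketched in Python as follows. This is one possible reading of the rule, not the authors' implementation. It assumes genes have already been grouped into families, for example by linking pairs that share a BLASTP HSP with a score of 200 or more, and that each arm is given as a list of family labels in chromosomal order. The function name `find_local_clusters`, the label format, and the greedy choice of the largest qualifying cluster are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def find_local_clusters(
    families: List[Optional[str]],
) -> List[Tuple[str, List[int]]]:
    """Find clusters of N similar genes lying within a window of 2N genes.

    families[i] is the (assumed, precomputed) family label of the i-th gene
    along one chromosome arm, or None if the gene has no similar partner.
    Returns (family, gene positions) for each cluster with N >= 2.
    """
    # Collect the ordered positions of every family member on the arm.
    positions: Dict[str, List[int]] = defaultdict(list)
    for index, family in enumerate(families):
        if family is not None:
            positions[family].append(index)

    clusters: List[Tuple[str, List[int]]] = []
    for family, pos in positions.items():
        i = 0
        while i < len(pos) - 1:
            best_j = None
            for j in range(i + 1, len(pos)):
                n_similar = j - i + 1         # members from pos[i] to pos[j]
                window = pos[j] - pos[i] + 1  # genes spanned, inclusive
                if window <= 2 * n_similar:
                    best_j = j                # keep the largest qualifying cluster
            if best_j is None:
                i += 1
            else:
                clusters.append((family, pos[i:best_j + 1]))
                i = best_j + 1                # do not reuse genes in another cluster
    return clusters


if __name__ == "__main__":
    # Toy arm: three "A" genes within six genes, two "B" genes too far apart.
    arm = ["A", None, "A", None, None, "A", "B", None, None, None, None, "B"]
    print(find_local_clusters(arm))
```

On the toy arm, the three "A" genes span six genes and are reported as one cluster of three, matching the worked example in the note, while the two "B" genes span six genes and do not qualify as a cluster of two. Excluding histone gene clusters, as the note specifies, would be a separate filtering step not shown here.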
Grants and funding
- P50HG00750/HG/NHGRI NIH HHS/United States
- R01 GM037193/GM/NIGMS NIH HHS/United States
- R01 GM060988/GM/NIGMS NIH HHS/United States
- P41HG00739/HG/NHGRI NIH HHS/United States
- R01 NS040296/NS/NINDS NIH HHS/United States