Evgenia Kriventseva - Academia.edu
Papers by Evgenia Kriventseva
Nucleic Acids Research, 2008
The concept of orthology is widely used to relate genes across different species using comparative genomics, and it provides the basis for inferring gene function. Here we present the web accessible OrthoDB database that catalogs groups of orthologous genes in a hierarchical manner, at each radiation of the species phylogeny, from more general groups to more fine-grained delineations between closely related species. We used a COG-like and Inparanoid-like ortholog delineation procedure on the basis of all-against-all Smith-Waterman sequence comparisons to analyze 58 eukaryotic genomes, focusing on vertebrates, insects and fungi to facilitate further comparative studies. The database is freely available at http://cegg.unige.ch/orthodb.
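To make "all-against-all Smith-Waterman sequence comparisons" concrete, here is a minimal sketch of the Smith-Waterman local alignment score and an all-pairs loop over a set of sequences. It is an illustration only: the scoring scheme (match +2, mismatch -1, linear gap -2) and the function names are assumptions, whereas real protein comparisons use a substitution matrix such as BLOSUM62 with affine gap penalties, and this is not the OrthoDB pipeline code.

```python
from itertools import combinations


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of a and b.

    Assumed toy scoring: fixed match/mismatch values and a linear gap
    penalty (real protein searches use a substitution matrix and affine gaps).
    """
    prev = [0] * (len(b) + 1)  # previous DP row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so an alignment can restart.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def all_against_all(seqs: dict[str, str]) -> dict[tuple[str, str], int]:
    """Score every unordered pair of sequences once (hypothetical helper)."""
    return {(x, y): smith_waterman_score(seqs[x], seqs[y])
            for x, y in combinations(sorted(seqs), 2)}
```

The quadratic dynamic-programming cost per pair, multiplied by the number of pairs, is why genome-scale all-against-all comparisons are computationally heavy.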
Genome Research, 2005
Here, we present an analysis of 215,634 EST and cDNA sequences of a major vector of human malaria Anopheles gambiae structured into the AnoEST database. The expressed sequences are grouped into clusters using genomic sequence as template and associated with inferred functional annotation, including the following: corresponding Ensembl gene prediction, putative orthologous genes in other species, homology to known proteins, protein domains, associated Gene Ontology terms, and corresponding classification into broad GO-slim functional groups. AnoEST is a vital resource for interpretation of expression profiles derived using recently developed A. gambiae cDNA microarrays. Using these cDNA microarrays, we have experimentally confirmed the expression of 7961 clusters during mosquito development. Of these, 3100 are not associated with currently predicted genes. Moreover, we found that clusters with confirmed expression are nonbiased with respect to the current gene annotation or homology ...
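One simple way to read "grouped into clusters using genomic sequence as template" is: sequences whose genome alignments overlap on the same chromosome and strand belong to one cluster. The sketch below assumes the genome mapping has already been done and uses a hypothetical `Hit` record; it is an interval-merging illustration, not the AnoEST procedure, which the abstract does not spell out.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical genome mapping of one EST (1-based, inclusive coordinates)."""
    est_id: str
    chrom: str
    strand: str
    start: int
    end: int


def cluster_by_genomic_overlap(hits: list[Hit]) -> list[list[str]]:
    """Merge ESTs whose mapped intervals overlap on the same chromosome and strand."""
    clusters: list[list[str]] = []
    current_ids: list[str] = []
    current_key = None
    current_end = -1
    for h in sorted(hits, key=lambda h: (h.chrom, h.strand, h.start, h.end)):
        key = (h.chrom, h.strand)
        if key == current_key and h.start <= current_end:
            # Overlaps the running cluster: extend it.
            current_ids.append(h.est_id)
            current_end = max(current_end, h.end)
        else:
            # New locus: close the previous cluster and start another.
            if current_ids:
                clusters.append(current_ids)
            current_ids = [h.est_id]
            current_key = key
            current_end = h.end
    if current_ids:
        clusters.append(current_ids)
    return clusters
```

Sorting by locus first makes a single linear pass sufficient, because any EST that overlaps the running cluster must appear before the first one that starts past its end.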
Trends in Genetics (TIG), 2003
A large-scale analysis of protein isoforms arising from alternative splicing shows that alternative splicing tends to insert or delete complete protein domains more frequently than expected by chance, whereas disruption of domains and other structural modules is less frequent. If domain regions are disrupted, the functional effect, as predicted from 3D structure, is frequently equivalent to removal of the entire domain. Also, short alternative splicing events within domains, which might preserve folded structure, target functional residues more frequently than expected. Thus, it seems that positive selection has had a major role in the evolution of alternative splicing.
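The distinction between inserting or deleting a complete domain and disrupting one comes down to how a spliced segment's boundaries fall relative to domain boundaries. The sketch below classifies one segment against a list of domain intervals; the three labels and the rule "any partial overlap counts as disruption" are assumptions made for illustration, not the classification used in the study.

```python
def classify_splice_event(seg_start: int, seg_end: int,
                          domains: list[tuple[int, int]]) -> str:
    """Classify an alternatively spliced protein segment against domain intervals.

    Coordinates are 1-based inclusive residue positions. Hypothetical labels:
    'whole_domain'    - segment fully covers every domain it touches
    'disrupts_domain' - segment cuts through at least one domain
    'outside_domains' - segment touches no domain
    """
    whole = partial = False
    for d_start, d_end in domains:
        if seg_end < d_start or seg_start > d_end:
            continue  # no overlap with this domain
        if seg_start <= d_start and seg_end >= d_end:
            whole = True  # domain lies entirely inside the segment
        else:
            partial = True  # a domain boundary falls inside the segment
    if partial:
        return "disrupts_domain"
    if whole:
        return "whole_domain"
    return "outside_domains"
```

Counting these labels over many isoform pairs, and comparing against segments placed at random, is the kind of test that supports an "expected by chance" comparison.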
The newly assembled Bos taurus genome sequence enables the linkage of bovine milk and lactation data with other mammalian genomes.
Nucleic Acids Research, 2001
The CluSTr (Clusters of SWISS-PROT and TrEMBL proteins) database offers an automatic classification of SWISS-PROT and TrEMBL proteins into groups of related proteins. The clustering is based on analysis of all pairwise comparisons between protein sequences. Analysis has been carried out for different levels of protein similarity, yielding a hierarchical organisation of clusters. The database provides links to InterPro, which
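Clustering "for different levels of protein similarity, yielding a hierarchical organisation" can be illustrated with single-linkage clustering at several score thresholds: raising the threshold splits clusters, and single linkage guarantees that each stricter cluster nests inside a looser one. The abstract does not name the linkage rule, so single linkage, the union-find helper and the toy scores below are assumptions for illustration.

```python
def single_linkage(items: list[str], scores: dict[tuple[str, str], float],
                   threshold: float) -> list[list[str]]:
    """Connected components of the graph whose edges have score >= threshold."""
    parent = {x: x for x in items}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), s in scores.items():
        if s >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
    groups: dict[str, list[str]] = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())


def cluster_hierarchy(items: list[str], scores: dict[tuple[str, str], float],
                      thresholds: list[float]) -> dict[float, list[list[str]]]:
    """One clustering per similarity level (hypothetical helper)."""
    return {t: single_linkage(items, scores, t) for t in sorted(thresholds)}
```

With pairwise scores P1-P2 = 90, P2-P3 = 40 and P3-P4 = 10, a threshold of 50 keeps only {P1, P2} together, 30 adds P3, and 5 merges all four proteins, so the levels form a nested hierarchy.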
Science, 2007
The Liverpool (LVP) strain of Ae. aegypti is one of the most commonly used laboratory strains around the world and has often been employed for studies of mosquito physiology, genetics and vector competence. It has also been used to build genomic DNA and cDNA library resources. LVP is tolerant to intense inbreeding while maintaining relevant phenotypes. The strain originated from West Africa and was maintained at the Liverpool School of Tropical Medicine starting in 1936, and was selected for susceptibility to the filarioid nematode Brugia malayi (1). Subsequently, a substrain LVP sbm was selected for greater susceptibility by single pair breeding and showed greatly reduced DNA polymorphism across the entire genome (2, 3).
Science, 2009
second model, the two main conditions were parametrically modulated by the two categories, respectively (SOM, S5.1). The activation of the precuneus was higher for hard dominance-solvable games than for easy ones ( and table S10). The activation of the insula was higher for the highly focal coordination games than for less focal ones. Previous studies also found that precuneus activity increased when the number of planned moves increased (40, 41). The higher demand for memory-related imagery and memory retrieval may explain the greater precuneus activation in hard dominance-solvable games. In highly focal coordination games, the participants may have felt quite strongly that the pool students must notice the same salient feature. This may explain why insula activation correlates with NCI.
Proceedings of the National Academy of Sciences, 2010
Nucleic Acids Research, 2001
The SWISS-PROT group at EBI has developed the Proteome Analysis Database utilising existing resources and providing comparative analysis of the predicted protein coding sequences of the complete genomes of bacteria, archaea and eukaryotes (http://www.ebi.ac.uk/proteome/). The two main projects used, InterPro and CluSTr, give a new perspective on families, domains and sites and cover 31-67% (InterPro statistics) of the proteins from each of the complete genomes. CluSTr covers the three complete eukaryotic genomes and the incomplete human genome data. The Proteome Analysis Database is accompanied by a program that has been designed to carry out InterPro proteome comparisons for any one proteome against any other one or more of the proteomes in the database.
Nucleic Acids Research, 2003
The Proteome Analysis database (http://www.ebi.ac.uk/proteome/) has been developed by the Sequence Database Group at EBI utilizing existing resources and providing comparative analysis of the predicted protein coding sequences of the complete genomes of bacteria, archaea and eukaryotes. Three main projects are used, InterPro, CluSTr and GO Slim, to give an overview of families, domains, sites, and functions of the
Nucleic Acids Research, 2009
MicroRNAs (miRNAs) are short, non-protein coding RNAs that direct the widespread phenomenon of post-transcriptional regulation of metazoan genes. The mature ~22-nt long RNA molecules are processed from genome-encoded stem-loop structured precursor genes. Hundreds of such genes have been experimentally validated in vertebrate genomes, yet their discovery remains challenging, and substantially higher numbers have been estimated. The miROrtho database (http://cegg.unige.ch/mirortho) presents the results of a comprehensive computational survey of miRNA gene candidates across the majority of sequenced metazoan genomes. We designed and applied a three-tier analysis pipeline: (i) an SVM-based ab initio screen for potent hairpins, plus homologs of known miRNAs, (ii) an orthology delineation procedure and (iii) an SVM-based classifier of the ortholog multiple sequence alignments. The web interface provides direct access to putative miRNA annotations, ortholog multiple alignments, RNA secondary structure conservation, and sequence data. The miROrtho data are conceptually complementary to the miRBase catalog of experimentally verified miRNA sequences, providing a consistent comparative genomics perspective as well as identifying many novel miRNA genes with strong evolutionary support.
Nucleic Acids Research, 2011
The concept of homology drives speculation on a gene's function in any given species when its biological roles in other species are characterized. With reference to a specific species radiation, homologous relations define orthologs, i.e. descendants from a single gene of the ancestor. The large-scale delineation of gene genealogies is a challenging task, and the numerous approaches to the problem reflect the importance of the concept of orthology as a cornerstone for comparative studies. Here, we present the updated OrthoDB catalog of eukaryotic orthologs, delineated in an explicitly hierarchical manner at each radiation of the species phylogeny for over 100 species of vertebrates, arthropods and fungi (including the metazoa level). New database features include functional annotations, and quantification of evolutionary divergence and relations among orthologous groups. The interface features extended phyletic profile querying and enhanced text-based searches. The ever-increasing sampling of sequenced eukaryotic genomes brings a clearer account of the majority of gene genealogies that will facilitate informed hypotheses of gene function in newly sequenced genomes. Furthermore, uniform analysis across lineages as different as vertebrates, arthropods and fungi with divergence levels varying from several to hundreds of millions of years will provide essential data for uncovering and quantifying long-term trends of gene evolution. OrthoDB is freely accessible from
Nature, Jan 24, 2008
Tribolium castaneum is a member of the most species-rich eukaryotic order, a powerful model organism for the study of generalized insect development, and an important pest of stored agricultural products. We describe its genome sequence here. This omnivorous beetle has evolved the ability to interact with a diverse chemical environment, as shown by large expansions in odorant and gustatory receptors, as well as P450 and other detoxification enzymes. Development in Tribolium is more representative of other insects than is Drosophila, a fact reflected in gene content and function. For example, Tribolium has retained more ancestral genes involved in cell-cell communication than Drosophila, some being expressed in the growth zone crucial for axial elongation in short-germ development. Systemic RNA interference in T. castaneum functions differently from that in Caenorhabditis elegans, but nevertheless offers similar power for the elucidation of gene function and identification of targets...
Briefings in Bioinformatics, 2002
The applications of InterPro span a range of biologically important areas that include automatic annotation of protein sequences and genome analysis. In automatic annotation of protein sequences, InterPro has been utilised to provide reliable characterisation of sequences, identifying them as candidates for functional annotation. Rules based on the InterPro characterisation are stored and operated through a database called RuleBase. RuleBase