Anne Morgat - Academia.edu
Papers by Anne Morgat
Nucleic Acids Research, Jan 2, 2015
MetaNetX is a repository of genome-scale metabolic networks (GSMNs) and biochemical pathways from a number of major resources, imported into a common namespace of chemical compounds, reactions, cellular compartments (namely MNXref) and proteins. The MetaNetX.org website (http://www.metanetx.org/) provides access to these integrated data as well as a variety of tools that allow users to import their own GSMNs, map them to the MNXref reconciliation, and manipulate, compare, analyze, simulate (using flux balance analysis) and export the resulting GSMNs. MNXref and MetaNetX are regularly updated and freely available.
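The core idea of reconciliation, mapping identifiers from different source namespaces onto one shared namespace so that networks become directly comparable, can be sketched as a dictionary lookup. This is a minimal illustration under stated assumptions, not MetaNetX code or data: the cross-reference table and every identifier in it (`srcA:glc`, `mnx:C1`, ...) are hypothetical placeholders.

```python
# Hypothetical cross-reference table: source-namespace ID -> common ID.
# Real MNXref identifiers and mappings are NOT reproduced here.
XREF = {
    "srcA:glc": "mnx:C1",
    "srcB:glucose": "mnx:C1",
    "srcA:atp": "mnx:C2",
    "srcB:ATP": "mnx:C2",
    "srcA:pyr": "mnx:C3",
}

def to_common(ids, xref):
    """Map source IDs to common IDs; return (set of mapped IDs, list of unmapped IDs)."""
    mapped, unmapped = set(), []
    for source_id in ids:
        if source_id in xref:
            mapped.add(xref[source_id])
        else:
            unmapped.append(source_id)
    return mapped, unmapped

# Two toy models whose compounds use different source namespaces.
model_a = ["srcA:glc", "srcA:atp", "srcA:pyr"]
model_b = ["srcB:glucose", "srcB:ATP", "srcB:lac"]

a, a_missing = to_common(model_a, XREF)
b, b_missing = to_common(model_b, XREF)
shared = a & b   # {"mnx:C1", "mnx:C2"}: present in both after reconciliation
only_a = a - b   # {"mnx:C3"}
```

Once both models speak the same namespace, comparison reduces to set operations; identifiers with no cross-reference (here `srcB:lac`) are reported rather than silently dropped.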
Current Opinion in Drug Discovery & Development, 2003
The development of genomic and post-genomic technologies has created an explosion in the quantity, diversity and availability of both biological data and methods of analysis. Biologists now face the problem of using all these resources to convert raw data into valuable new knowledge. This review presents software platforms designed to handle data and/or methods in the context of genome analysis.
Nucleic Acids Research, 2009
Nucleic Acids Research, 2010
The primary mission of UniProt is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community.
Nucleic Acids Research, 2011
The primary mission of the Universal Protein Resource (UniProt) is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniProt Consortium, which consists of groups from the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt comprises four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is updated and distributed every 4 weeks and can be accessed online for searches or download at http://www.uniprot.org.
Nucleic Acids Research, 2012
UniPathway (http://www.unipathway.org) is a fully manually curated resource for the representation and annotation of metabolic pathways. UniPathway provides explicit representations of enzyme-catalyzed and spontaneous chemical reactions, as well as a hierarchical representation of metabolic pathways. This hierarchy uses linear subpathways as the basic building block for the assembly of larger and more complex pathways, including species-specific pathway variants. All of the pathway data in UniPathway has been extensively cross-linked to existing pathway resources such as KEGG and MetaCyc, as well as sequence resources such as the UniProt KnowledgeBase (UniProtKB), for which UniPathway provides a controlled vocabulary for pathway annotation. We introduce here the basic concepts underlying the UniPathway resource, with the aim of allowing users to fully exploit the information provided by UniPathway.
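A hierarchy in which linear subpathways are the building blocks of larger pathways can be modelled as a small data structure, with a check that each subpathway really is linear. The sketch below is hypothetical: the class names, fields and the "one main substrate, one main product per step" simplification are assumptions for illustration, not UniPathway's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Reaction:
    name: str
    substrate: str  # simplification: one main substrate per step
    product: str    # and one main product per step

@dataclass
class SubPathway:
    name: str
    steps: list  # ordered Reaction objects

    def is_linear(self) -> bool:
        # Each step must consume the main product of the step before it.
        return all(prev.product == nxt.substrate
                   for prev, nxt in zip(self.steps, self.steps[1:]))

@dataclass
class Pathway:
    name: str
    subpathways: list = field(default_factory=list)  # ordered SubPathway objects

    def reactions(self):
        # Flatten the hierarchy into one ordered list of reactions.
        return [r for sp in self.subpathways for r in sp.steps]

# Toy example with hypothetical compounds A -> B -> C -> D
part1 = SubPathway("part 1", [Reaction("r1", "A", "B"), Reaction("r2", "B", "C")])
part2 = SubPathway("part 2", [Reaction("r3", "C", "D")])
toy = Pathway("toy pathway", [part1, part2])
```

Keeping the linear chain as the unit makes variants cheap to express: a species-specific pathway can reuse most subpathways and swap only the one that differs.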
Nucleic Acids Research, 2014
Microbiology, 2013
Continuous updating of the genome sequence of Bacillus subtilis, the model of the Firmicutes, is a basic requirement of the biology community. In this work, new genomic objects have been included (toxin/antitoxin genes and small RNA genes) and the metabolic network has been entirely updated. The curated view of the validated metabolic pathways present in the organism as of 2012 shows several significant differences from pathways present in the other bacterial reference, Escherichia coli: variants in the synthesis of cofactors (thiamine, biotin, bacillithiol), amino acids (lysine, methionine) and branched-chain fatty acids, and in tRNA modification and RNA degradation. In this new version, gene products that are enzymes or transporters are explicitly linked to the biochemical reactions of the RHEA reaction resource (http://www.ebi.ac.uk/rhea/), while novel compound entries have been created in the database Chemical Entities of Biological Interest (http://www.ebi.ac.uk/chebi/). The newly annotated sequence is deposited at the International Nucleotide Sequence Data Collaboration with accession number AL009126.4.
Journal of Bacteriology, 2006
Ehrlichia ruminantium is the causative agent of heartwater, a major tick-borne disease of livesto... more Ehrlichia ruminantium is the causative agent of heartwater, a major tick-borne disease of livestock in Africa that has been introduced in the Caribbean and is threatening to emerge and spread on the American mainland. We sequenced the complete genomes of two strains of E. ruminantium of differing phenotypes, strains Gardel (Erga; 1,499,920 bp), from the island of Guadeloupe, and Welgevonden (Erwe; 1,512,977 bp), originating in South Africa and maintained in Guadeloupe in a different cell environment. Comparative genomic analysis of these two strains was performed with the recently published parent strain of Erwe (Erwo) and other Rickettsiales (Anaplasma, Wolbachia, and Rickettsia spp.). Gene order is highly conserved between the E. ruminantium strains and with A. marginale. In contrast, there is very little conservation of gene order with members of the Rickettsiaceae. However, gene order may be locally conserved, as illustrated by the tuf operons. Eighteen truncated protein-encoding sequences (CDSs) differentiate Erga from Erwe/Erwo, whereas four other truncated CDSs differentiate Erwe from Erwo. Moreover, E. ruminantium displays the lowest coding ratio observed among bacteria due to unusually long intergenic regions. This is related to an active process of genome expansion/contraction targeted at tandem repeats in noncoding regions and based on the addition or removal of ca. 150-bp tandem units. This process seems to be specific to E. ruminantium and is not observed in the other Rickettsiales.
Infection, Genetics and Evolution, 2008
Ehrlichia ruminantium is the causative agent of heartwater, a major tick-borne disease of livestock in Africa that has been introduced into the Caribbean and is threatening to emerge and spread on the American mainland. Complete genome sequencing was performed for two isolates of E. ruminantium of differing phenotype: isolate Gardel (Erga) from Guadeloupe Island and isolate Welgevonden (Erwe), originating from South Africa and maintained in Guadeloupe. The type strain of E. ruminantium (Erwo), previously isolated and sequenced in South Africa, is identical to Erwe with respect to the target genes; together they form the Erwe/Erwo complex. Comparative analysis of the genomes shows 49 unique CDS and 28 truncated CDS differentiating Erga from Erwe/Erwo. Three regions of accumulated differences (RAD) acting as mutational hot spots were identified in E. ruminantium. Ten CDS (six unique and four truncated) corresponding to major genomic changes (deletions or extensive mutations) were considered as targets for differential diagnosis of four isolates of E. ruminantium: Erga, Erwe/Erwo, Senegal and Umpala. Pairs of PCR primers were developed for each target gene. PCR analysis of the target genes generated strain-specific patterns not only for Erga and Erwe/Erwo, as predicted by comparative genomics, but also for isolates Senegal and Umpala. The target genes identified by bacterial comparative genomics are shown to be highly efficient for strain-specific PCR diagnosis of E. ruminantium and for further vaccine management tools.
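The diagnostic logic, in which each isolate should give a distinct presence/absence profile across the target genes, can be checked mechanically, and the same check can search for the smallest panel of targets that still separates every isolate. A minimal sketch, assuming PCR results are recorded as 1 (amplicon) or 0 (no amplicon); the isolate labels, target names and calls below are hypothetical placeholders, not the published results.

```python
from itertools import combinations

# Hypothetical amplification calls: isolate -> one 0/1 call per target gene.
TARGETS = ["t1", "t2", "t3"]
CALLS = {
    "isolate_A": (1, 0, 1),
    "isolate_B": (0, 1, 1),
    "isolate_C": (1, 1, 0),
    "isolate_D": (0, 0, 1),
}

def distinguishes(calls, target_idx):
    """True if the chosen targets give every isolate a unique pattern."""
    patterns = [tuple(p[i] for i in target_idx) for p in calls.values()]
    return len(set(patterns)) == len(patterns)

def smallest_panel(calls, n_targets):
    """Smallest subset of target indices that separates all isolates, or None."""
    for k in range(1, n_targets + 1):
        for idx in combinations(range(n_targets), k):
            if distinguishes(calls, idx):
                return idx
    return None

panel = smallest_panel(CALLS, len(TARGETS))    # (0, 1)
panel_names = [TARGETS[i] for i in panel]      # ["t1", "t2"]
```

The exhaustive search is fine for a handful of targets such as the ten CDS above; it grows exponentially, so larger marker sets would need a greedy or ILP formulation.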
Nucleic Acids Research, 2005
The Universal Protein Resource (UniProt) provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information. Formed by uniting the Swiss-Prot, TrEMBL and PIR protein database activities, the UniProt Consortium produces three layers of protein sequence databases: the UniProt Archive (UniParc), the UniProt Knowledgebase (UniProt) and the UniProt Reference (UniRef) databases. The UniProt Knowledgebase is a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase with extensive cross-references. This centrepiece consists of two sections: UniProt/Swiss-Prot, with fully manually curated entries, and UniProt/TrEMBL, enriched with automated classification and annotation. During 2004, tens of thousands of Knowledgebase records were manually annotated or updated; we introduced a new comment line topic, TOXIC DOSE, to store information on the acute toxicity of a toxin; the UniProt keyword l...
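Comment topics such as TOXIC DOSE live in the `CC` lines of a UniProtKB flat-file entry, where each block starts with `-!- TOPIC:` and continuation lines are indented. A minimal parsing sketch, assuming that layout (two-letter line code, three spaces, then content); the entry below, including its ID line and all comment text, is a hypothetical placeholder, not a real UniProt record.

```python
def comment_topics(flat_entry: str) -> dict:
    """Collect CC comment blocks from a UniProtKB-style flat-file entry.

    Returns {topic: text}; a topic occurring twice is joined with ' | '.
    """
    topics = {}
    current = None
    for line in flat_entry.splitlines():
        if not line.startswith("CC   "):
            current = None
            continue
        body = line[5:]  # content after the 'CC' code and three spaces
        if body.startswith("-!- "):
            # New comment block: 'TOPIC: text'
            topic, _, text = body[4:].partition(":")
            current = topic.strip()
            text = text.strip()
            topics[current] = topics[current] + " | " + text if current in topics else text
        elif body.startswith("    ") and current is not None:
            # Indented continuation of the current block.
            topics[current] += " " + body.strip()
        else:
            # Anything else (e.g. the copyright block) ends the current topic.
            current = None
    return topics

# Hypothetical entry for illustration only.
ENTRY = """\
ID   EXAMPLE_TOXIN           Reviewed;          60 AA.
CC   -!- FUNCTION: Placeholder function text that
CC       continues on a second line.
CC   -!- TOXIC DOSE: LD(50) is <value> mg/kg (placeholder).
CC   ---------------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium
CC   ---------------------------------------------------------------------------
SQ   SEQUENCE   60 AA;
"""

topics = comment_topics(ENTRY)
# topics["TOXIC DOSE"] -> "LD(50) is <value> mg/kg (placeholder)."
```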
Briefings in Bioinformatics, 2014
Bioinformatics, 1993
Hydrophobic cluster analysis (HCA) is an efficient method for analysing and comparing the amino acid sequences of proteins. It relies on two-dimensional representations of the sequences, currently generated by simple plotting programs running on microcomputers. Two interactive programs, MANSEK and SUNHCA, are described here that run on VAX and Sun workstations, respectively. These programs allow the display of several protein sequences in the form of two-dimensional helical plots suitable for HCA. Several tedious, repetitive and time-consuming steps of HCA have been eliminated by implementing features such as interactive on-screen manipulation (zoom, translation) of the plots and calculation of HCA scores on segments chosen by the user. Plots on paper can be obtained through hard copies or plotting subroutines.
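The helical plot behind HCA places each residue on an alpha helix (3.6 residues per turn, so 100 degrees per residue, with a 1.5 Å rise), unrolls the cylinder into a flat net and duplicates it so clusters crossing the cut stay contiguous. A simplified sketch of that geometry, not MANSEK or SUNHCA code; the hydrophobic set V, I, L, F, M, Y, W is the one commonly used in HCA and is assumed here, and cluster delineation is left out.

```python
HYDROPHOBIC = set("VILFMYW")  # residues treated as hydrophobic (assumed HCA set)
DEG_PER_RESIDUE = 100.0       # alpha helix: 3.6 residues per turn
RISE_PER_RESIDUE = 1.5        # angstroms along the helix axis

def helical_net(seq):
    """Place each residue on an unrolled alpha-helix cylinder.

    Returns (residue, x, angle_deg, is_hydrophobic) tuples, where x is the
    position along the helix axis and angle_deg lies in [0, 360).
    """
    net = []
    for i, aa in enumerate(seq.upper()):
        x = i * RISE_PER_RESIDUE
        angle = (i * DEG_PER_RESIDUE) % 360.0
        net.append((aa, x, angle, aa in HYDROPHOBIC))
    return net

def duplicated_net(net):
    # Copy the net one full turn higher so neighbours across the cut stay adjacent.
    return net + [(aa, x, angle + 360.0, h) for aa, x, angle, h in net]

net = helical_net("MKVLA")
# angles: 0, 100, 200, 300, 40; hydrophobic: M, V, L
```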
Bioinformatics, 2005
Motivation: Modern comparative genomics is not restricted to sequences but also involves the comparison of metabolic pathways or protein-protein interactions. Central to this approach is the concept of neighbourhood between entities (genes, proteins, chemical compounds). There is therefore a growing need for new methods that merge connectivity information from different biological sources in order to infer functional coupling. Results: We present a generic approach to merge the information from two or more graphs representing biological data. The method is based on two concepts. The first, the correspondence multigraph, precisely defines how correspondence is established between the primary data-graphs. The second, the common connected components, defines which property of the multigraph is searched for. Although this problem has been stated informally in the past few years, we give here a formal and general statement together with an exact algorithm to solve it. Availability: The algorithm presented in this paper has been implemented in C. Source code is freely available for download at:
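To illustrate the common connected components idea in its simplest case, two graphs over the same vertex set (an identity correspondence rather than the general correspondence multigraph), the sketch below uses naive partition refinement: split a vertex set by the connected components of one graph, then the other, and repeat until every part is connected in both. It is an illustrative assumption, not the paper's exact algorithm or its C implementation. Refinement never splits a true common component, because a set connected in a graph always lies within one component of that graph restricted to any superset.

```python
from collections import defaultdict

def adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def components(vertices, adj):
    """Connected components of the subgraph induced by `vertices`."""
    vertices = set(vertices)
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # iterative depth-first search inside the induced subgraph
            v = stack.pop()
            comp.add(v)
            for w in adj.get(v, ()):
                if w in vertices and w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps

def common_connected_components(vertices, edges1, edges2):
    """Partition `vertices` into maximal sets connected in both graphs."""
    adj1, adj2 = adjacency(edges1), adjacency(edges2)
    result, work = [], [set(vertices)]
    while work:
        part = work.pop()
        if not part:
            continue
        for adj in (adj1, adj2):
            comps = components(part, adj)
            if len(comps) > 1:
                work.extend(comps)  # split and re-examine each piece
                break
        else:
            result.append(part)     # connected in both graphs
    return result

# Toy data: graph 1 could be gene adjacency on a chromosome,
# graph 2 links between genes of successive reactions (both hypothetical).
genome = [(1, 2), (2, 3), (3, 4), (4, 5)]
metabolic = [(1, 3), (2, 3), (4, 5)]
ccc = common_connected_components(range(1, 6), genome, metabolic)
# -> parts {1, 2, 3} and {4, 5}
```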
Annals of the New York Academy of Sciences, 2006
The tick-borne Rickettsiale Ehrlichia ruminantium (E. ruminantium) is the causative agent of heartwater in Africa and the Caribbean. Heartwater, which is responsible for major livestock losses in Africa, also represents a threat to the American mainland. Three complete genomes corresponding to two groups of differing phenotypes, Gardel and Welgevonden, have recently been described. One genome (Erga) represents the Gardel group from Guadeloupe Island and two genomes (Erwo and Erwe) belong to the Welgevonden group. Erwo, isolated in South Africa, is the parental strain of Erwe, which was maintained for 18 years in Guadeloupe under culture conditions different from those of Erwo. The three strains have genomes of differing sizes: 1,499,920 bp, 1,512,977 bp and 1,516,355 bp for Erga, Erwe and Erwo, respectively. Gene sequences and order are highly conserved between the three strains, although several gene truncations could be pinpointed, most of them occurring within three regions of accumulated differences (RAD). E. ruminantium displays a strong leading/lagging compositional bias inducing a strand-specific codon usage. Finally, a striking feature of E. ruminantium is the presence of long intergenic regions containing tandem repeats. These repeats are at the origin of an active process, specific to E. ruminantium, of genome expansion/contraction based on the addition or removal of tandem units.
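A leading/lagging compositional bias is commonly visualised with GC skew, (G - C)/(G + C), computed in windows along the chromosome; in many bacteria the cumulative skew changes direction near the replication origin and terminus. The sketch below is a generic illustration under simple assumptions (a plain DNA string, fixed non-overlapping windows), not the analysis used in the paper.

```python
from itertools import accumulate

def gc_skew(seq, window):
    """GC skew (G - C) / (G + C) for consecutive, non-overlapping windows.

    A trailing partial window is ignored; windows with no G or C give 0.0.
    """
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews

def cumulative_skew(seq, window):
    # Running sum of window skews; its extremes are the usual origin/terminus hints.
    return list(accumulate(gc_skew(seq, window)))

skews = gc_skew("GGGCCCCGGGGG", 4)         # [0.5, -0.5, 1.0]
cum = cumulative_skew("GGGCCCCGGGGG", 4)   # [0.5, 0.0, 1.0]
```

A strand-specific codon usage, as described above, is what makes such skew plots informative: genes on the leading strand carry a different base composition from those on the lagging strand.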
Nucleic Acids Research, 2012
Rhea (http://www.ebi.ac.uk/rhea) is a comprehensive resource of expert-curated biochemical reactions. Rhea provides a non-redundant set of chemical transformations for use in a broad spectrum of applications, including metabolic network reconstruction and pathway inference. Rhea includes enzyme-catalyzed reactions (covering the IUBMB Enzyme Nomenclature list), transport reactions and spontaneously occurring reactions. Rhea reactions are described using chemical species from the Chemical Entities of Biological Interest ontology (ChEBI) and are stoichiometrically balanced for mass and charge. They are extensively manually curated, with links to source literature and other public resources on metabolism, including enzyme and pathway databases. This cross-referencing facilitates the mapping and reconciliation of common reactions and compounds between distinct resources, which is a common first step in the reconstruction of genome-scale metabolic networks and models.
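Mass and charge balance can be verified by summing element counts and charges, weighted by stoichiometric coefficients, on each side of a reaction. A minimal sketch: the formula parser only handles flat formulas (no parentheses, hydrates or isotopes), and the example reaction, hexokinase (ATP + D-glucose = ADP + D-glucose 6-phosphate + H+, written with the charged species typical at physiological pH), is chosen for illustration rather than quoted from a Rhea entry.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Element counts for a flat formula such as 'C6H12O6' (no parentheses)."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts

def side_totals(side):
    """side: list of (coefficient, formula, charge) tuples."""
    atoms, charge = Counter(), 0
    for coef, formula, z in side:
        for element, n in parse_formula(formula).items():
            atoms[element] += coef * n
        charge += coef * z
    return atoms, charge

def is_balanced(left, right):
    # Balanced when both element counts and total charge match.
    return side_totals(left) == side_totals(right)

# ATP + D-glucose = ADP + D-glucose 6-phosphate + H+
left = [(1, "C10H12N5O13P3", -4),   # ATP(4-)
        (1, "C6H12O6", 0)]          # D-glucose
right = [(1, "C10H12N5O10P2", -3),  # ADP(3-)
         (1, "C6H11O9P", -2),       # D-glucose 6-phosphate(2-)
         (1, "H", 1)]               # H+
```

Dropping the proton from the right-hand side leaves one hydrogen and one positive charge unaccounted for, so `is_balanced(left, right[:2])` returns False.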