Jean-François Taly | Center for Genomic Regulation

Papers by Jean-François Taly

ABC methods for model choice in Gibbs random fields: Application to protein 3D structure prediction

Gibbs random fields are polymorphous statistical models that can be used to analyse different types of dependence, in particular for spatially correlated data. However, when those models are faced with the challenge of selecting a dependence structure from many, the use of standard model choice methods is hampered by the unavailability of the normalising constant in the Gibbs likelihood. In particular, from a Bayesian perspective, the computation of the posterior probabilities of the models under competition requires special likelihood-free simulation techniques like the Approximate Bayesian Computation (ABC) algorithm that is intensively used in population genetics. We show in this paper how to implement an ABC algorithm geared towards model choice in the general setting of Gibbs random fields, demonstrating in particular that there exists a sufficient statistic across models. The accuracy of the approximation to the posterior probabilities can be further improved by importance sampling on the distribution of the models. The practical aspects of the method are detailed through two applications: the test of an iid Bernoulli model versus a first-order Markov chain, and the choice of a folding structure for a protein of Thermotoga maritima involved in signal transduction processes.
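
The first application mentioned in this abstract (iid Bernoulli versus first-order Markov chain) can be illustrated with a minimal rejection-ABC sketch. This is not the authors' implementation: the uniform priors, the symmetric "stay" parametrisation of the Markov chain, the L1 distance, the tolerance `eps`, and all function names are assumptions chosen for illustration. Under this parametrisation the pair (number of ones, number of equal neighbours) concatenates the sufficient statistics of the two models, which is the idea behind a statistic that is sufficient across models.

```python
import random


def simulate_bernoulli(n, p, rng):
    """Model 0 (assumed form): n iid draws with P(x_i = 1) = p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def simulate_markov(n, q, rng):
    """Model 1 (assumed form): symmetric first-order chain that keeps
    the previous state with probability q; uniform initial state."""
    x = [1 if rng.random() < 0.5 else 0]
    for _ in range(n - 1):
        prev = x[-1]
        x.append(prev if rng.random() < q else 1 - prev)
    return x


def summary(x):
    """Concatenated statistics: (number of ones, number of equal neighbours)."""
    s0 = sum(x)
    s1 = sum(1 for a, b in zip(x, x[1:]) if a == b)
    return (s0, s1)


def abc_model_choice(observed, n_draws=20000, eps=2, seed=0):
    """Rejection ABC: estimate posterior model probabilities as the
    share of each model among the accepted simulations."""
    rng = random.Random(seed)
    n = len(observed)
    s_obs = summary(observed)
    accepted = [0, 0]
    for _ in range(n_draws):
        m = rng.randrange(2)      # uniform prior over the two models
        theta = rng.random()      # uniform(0, 1) prior on the parameter
        if m == 0:
            sim = simulate_bernoulli(n, theta, rng)
        else:
            sim = simulate_markov(n, theta, rng)
        s_sim = summary(sim)
        dist = abs(s_sim[0] - s_obs[0]) + abs(s_sim[1] - s_obs[1])
        if dist <= eps:
            accepted[m] += 1
    total = sum(accepted)
    if total == 0:
        return None               # nothing accepted: tolerance too strict
    return [a / total for a in accepted]


if __name__ == "__main__":
    data = [0] * 10 + [1] * 10    # toy sequence with strong persistence
    print(abc_model_choice(data))
```

For the toy sequence above, which switches state only once, most accepted simulations should come from the Markov chain model. Shrinking `eps` tightens the approximation at the cost of fewer accepted draws.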

Large-Scale Screening of a Targeted Enterococcus faecalis Mutant Library Identifies Envelope Fitness Factors

PLOS One, 2011

The spread of antibiotic resistance among bacteria responsible for nosocomial and community-acquired infections calls for novel therapeutic or prophylactic targets and for innovative pathogen-specific antibacterial compounds. Major challenges are posed by opportunistic pathogens belonging to the low GC% Gram-positive bacteria. Among those, Enterococcus faecalis is a leading cause of hospital-acquired infections associated with life-threatening issues and increased hospital costs. To better understand the molecular properties of enterococci that may be required for virulence, and that may explain the emergence of these bacteria in nosocomial infections, we performed the first large-scale functional analysis of E. faecalis V583, the first vancomycin-resistant isolate from a human bloodstream infection. E. faecalis V583 is within the high-risk clonal complex 2 group, which comprises mostly isolates derived from hospital infections worldwide. We conducted broad-range screenings of candidate genes likely involved in host adaptation (e.g., colonization and/or virulence). For this purpose, a library was constructed of targeted insertion mutations in 177 genes encoding putative surface or stress-response factors. Individual mutants were subsequently tested for their i) resistance to oxidative stress, ii) antibiotic resistance, iii) resistance to opsonophagocytosis, iv) adherence to the human colon carcinoma Caco-2 epithelial cells and v) virulence in a surrogate insect model. Our results identified a number of factors that are involved in the interaction between enterococci and their host environments. Their predicted functions highlight the importance of cell envelope glycopolymers in E. faecalis host adaptation. This study provides a valuable genetic database for understanding the steps leading E. faecalis to opportunistic virulence.

Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures

Nature Protocols, 2011

T-Coffee (Tree-based Consistency Objective Function for Alignment Evaluation) is a versatile multiple sequence alignment (MSA) method suitable for aligning most types of biological sequences. The main strength of T-Coffee is its ability to combine third-party aligners and to integrate structural (or homology) information when building MSAs. The series of protocols presented here shows how the package can be used to multiply align protein, RNA and DNA sequences. The protein section shows how users can select the most suitable T-Coffee mode for their data set. Detailed protocols include T-Coffee, the default mode; M-Coffee, a meta version able to combine several third-party aligners into one; PSI (position-specific iterated)-Coffee, the homology-extended mode suitable for remote homologs; and Expresso, the structure-based multiple aligner. We then also show how the T-RMSD (tree based on root mean square deviation) option can be used to produce a functionally informative structure-based clustering. RNA alignment procedures are described for using R-Coffee, a mode able to use predicted RNA secondary structures when aligning RNA sequences. DNA alignments are illustrated with Pro-Coffee, a multiple aligner specific to promoter regions. We also present some of the many reformatting utilities bundled with T-Coffee. The package is open-source freeware available from http://www.tcoffee.org/.

ABC likelihood-free methods for model choice in Gibbs random fields

Gibbs random fields (GRF) are polymorphous statistical models that can be used to analyse different types of dependence, in particular for spatially correlated data. However, when those models are faced with the challenge of selecting a dependence structure from many, the use of standard model choice methods is hampered by the unavailability of the normalising constant in the Gibbs likelihood.

Protein secondary structure assignment revisited: a detailed analysis of different assignment methods

BMC Structural Biology, 2005

Background: A number of methods are now available to perform automatic assignment of periodic secondary structures from atomic coordinates, based on different characteristics of the secondary structures. In general these methods exhibit a broad consensus as to the location of most helix and strand core segments in protein structures. However, the termini of the segments are often ill-defined and it is difficult to decide unambiguously which residues at the edge of the segments have to be included. In addition, there is a "twilight zone" where secondary structure segments depart significantly from the idealized models of Pauling and Corey. For these segments, one has to decide whether the observed structural variations are merely distortions or whether they constitute a break in the secondary structure.
Methods: To address these problems, we have developed a method for secondary structure assignment, called KAKSI. Assignments made by KAKSI are compared with assignments given by DSSP, STRIDE, XTLSSTR, PSEA and SECSTR, as well as secondary structures found in PDB files, on 4 datasets (X-ray structures in different resolution ranges, NMR structures).
Results: A detailed comparison of KAKSI assignments with those of STRIDE and PSEA reveals that KAKSI assigns slightly longer helices and strands than STRIDE in cases of one-to-one correspondence between the segments. However, KAKSI also tends to favor the assignment of several short helices when STRIDE and PSEA assign longer, kinked helices. Helices assigned by KAKSI have geometrical characteristics close to those described in the PDB. They are more linear than helices assigned by other methods. The same tendency to split long segments is observed for strands, although less systematically. We present a number of cases of secondary structure assignments that illustrate this behavior.
Conclusion: Our method provides valuable assignments which favor the regularity of secondary structure segments.

Can molecular dynamics simulations help in discriminating correct from erroneous protein 3D models?

BMC Bioinformatics, 2008

Background: Recent approaches for predicting the three-dimensional (3D) structure of proteins, such as de novo or fold recognition methods, mostly rely on simplified energy potential functions and a reduced representation of the polypeptide chain. These simplifications facilitate the exploration of the protein conformational space but do not fully capture the subtle relationship that exists between the amino acid sequence and its native structure. It has been proposed that physics-based energy functions together with techniques for sampling the conformational space, e.g., Monte Carlo or molecular dynamics (MD) simulations, are better suited to the task of modelling proteins at higher resolutions than those of models obtained with the former type of methods. In this study we monitor different protein structural properties along MD trajectories to discriminate correct from erroneous models. These models are based on the sequence-structure alignments provided by our fold recognition method, FROST. We define correct models as being built from alignments of sequences with structures similar to their native structures and erroneous models from alignments of sequences with structures unrelated to their native structures.

Cyclosporin A Treatment of Leishmania donovani Reveals Stage-Specific Functions of Cyclophilins in Parasite Proliferation and Viability

PLOS Neglected Tropical Diseases, 2010

Background: Cyclosporin A (CsA) has important anti-microbial activity against parasites of the genus Leishmania, suggesting CsA-binding cyclophilins (CyPs) as potential drug targets. However, no information is available on the genetic diversity of this important protein family, and the mechanisms underlying the cytotoxic effects of CsA on intracellular amastigotes are only poorly understood. Here, we performed a first genome-wide analysis of Leishmania CyPs and investigated the effects of CsA on host-free L. donovani amastigotes in order to elucidate the relevance of these parasite proteins for drug development.
