Morten Nielsen | Technical University of Denmark (DTU)
Papers by Morten Nielsen
The concept of traces has been introduced for describing the non-sequential behaviour of concurrent systems via their sequential observations. Traces represent concurrent processes in the same way as strings represent sequential ones. The theory of traces can be used as a tool for reasoning about nets, and it is hoped that by applying this theory one can obtain a calculus of concurrent processes analogous to that available for sequential systems. The following topics will be discussed: algebraic properties of traces, trace models of some concurrency phenomena, fixed-point calculus for finding the behaviour of nets, modularity, and some applications of the presented theory.
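The analogy between traces and strings can be made concrete with a small sketch. It assumes the usual formulation in which a trace is a class of strings that differ only in the order of adjacent independent (concurrent) actions; the independence relation `INDEPENDENT` and the action names are hypothetical, chosen only for illustration. The check relies on the standard projection property: two strings denote the same trace exactly when they agree on every single action and on every pair of dependent actions.

```python
from itertools import combinations


def project(word, letters):
    """Keep only the symbols of `word` that belong to `letters`."""
    return "".join(ch for ch in word if ch in letters)


def trace_equivalent(u, v, independent):
    """Return True if the strings u and v denote the same trace.

    `independent` is a set of frozensets {a, b} of actions that may commute.
    """
    alphabet = set(u) | set(v)
    # Each action must occur the same number of times in both strings.
    for a in alphabet:
        if project(u, {a}) != project(v, {a}):
            return False
    # Dependent actions must occur in the same relative order.
    for a, b in combinations(sorted(alphabet), 2):
        if frozenset((a, b)) not in independent:
            if project(u, {a, b}) != project(v, {a, b}):
                return False
    return True


# Hypothetical independence relation: actions a and b are concurrent.
INDEPENDENT = {frozenset("ab")}

print(trace_equivalent("abc", "bac", INDEPENDENT))  # True
print(trace_equivalent("abc", "acb", INDEPENDENT))  # False
```

Here `abc` and `bac` are two sequential observations of the same concurrent behaviour, whereas `acb` reorders the dependent actions `b` and `c` and is therefore a different trace.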
Applied and Computational Harmonic Analysis, 2007
The purpose of this paper is to study sparse representations of signals from a general dictionary in a Banach space. For so-called localized frames in Hilbert spaces, the canonical frame coefficients are shown to provide a near sparsest expansion for several sparseness measures. However, for frames which are not localized, this no longer holds true and sparse representations may depend strongly on the choice of the sparseness measure. A large class of admissible sparseness measures is introduced, and we give sufficient conditions for having a unique sparse representation of a signal from the dictionary w.r.t. such a sparseness measure. Moreover, we give sufficient conditions on a signal such that the simple solution of a linear programming problem simultaneously solves all the non-convex (and generally hard combinatorial) problems of sparsest representation of the signal w.r.t. arbitrary admissible sparseness measures.
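To illustrate the linear programming problem mentioned above, the following is a sketch in a simplified finite-dimensional setting, which is an assumption made here and not the paper's general Banach-space framework. The dictionary is taken to be a matrix $D \in \mathbb{R}^{m \times n}$, the signal a vector $x \in \mathbb{R}^m$, and the convex sparseness measure the $\ell^1$ norm; the symbols $D$, $x$, $c$, $u$, $v$ are introduced only for this illustration.

```latex
\begin{align*}
(P_1)\quad & \min_{c \in \mathbb{R}^n} \|c\|_1
  \quad \text{subject to } Dc = x, \\
(\mathrm{LP})\quad & \min_{u, v \in \mathbb{R}^n} \sum_{i=1}^{n} (u_i + v_i)
  \quad \text{subject to } D(u - v) = x,\; u \ge 0,\; v \ge 0 .
\end{align*}
```

Writing $c = u - v$ with $u, v \ge 0$ turns $(P_1)$ into the linear program $(\mathrm{LP})$: at an optimum, $u_i v_i = 0$ for every $i$, so $\sum_i (u_i + v_i) = \|c\|_1$. A typical non-convex counterpart would replace $\|c\|_1$ by the number of nonzero coefficients of $c$.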
IEEE Transactions on Information Theory, 2003
IEEE Transactions on Information Theory, 2003
Protein Science, 2006
Discovery of discontinuous B-cell epitopes is a major challenge in vaccine design. Previous epitope prediction methods have mostly been based on protein sequences and are not very effective. Here, we present DiscoTope, a novel method for discontinuous epitope prediction that uses protein three-dimensional structural data. The method is based on amino acid statistics, spatial information, and surface accessibility in a compiled data set of discontinuous epitopes determined by X-ray crystallography of antibody/antigen protein complexes. DiscoTope is the first method to focus explicitly on discontinuous epitopes. We show that the new structure-based method has a better performance for predicting residues of discontinuous epitopes than methods based solely on sequence information, and that it can successfully predict epitope residues that have been identified by different techniques. DiscoTope detects 15.5% of residues located in discontinuous epitopes with a specificity of 95%. At this level of specificity, the conventional Parker hydrophilicity scale for predicting linear B-cell epitopes identifies only 11.0% of residues located in discontinuous epitopes. Predictions by the DiscoTope method can guide experimental epitope mapping in both rational vaccine design and development of diagnostic tools, and may lead to more efficient epitope identification.
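The two figures quoted above (fraction of epitope residues detected, and specificity) are per-residue sensitivity and specificity at a fixed score threshold. A minimal sketch of how such numbers are computed follows; the scores, labels, and threshold are invented toy values, not DiscoTope output.

```python
def sensitivity_specificity(scores, is_epitope, threshold):
    """Per-residue sensitivity and specificity when residues scoring
    at or above `threshold` are predicted to be epitope residues."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and a for p, a in zip(predicted, is_epitope))
    fn = sum((not p) and a for p, a in zip(predicted, is_epitope))
    fp = sum(p and (not a) for p, a in zip(predicted, is_epitope))
    tn = sum((not p) and (not a) for p, a in zip(predicted, is_epitope))
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (hypothetical): one score and one true label per residue.
scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.3]
is_epitope = [True, True, False, False, False, False]

print(sensitivity_specificity(scores, is_epitope, threshold=0.5))  # (0.5, 0.75)
```

Raising the threshold trades sensitivity for specificity, which is why methods are compared at a common specificity level such as 95%.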
PLOS Computational Biology, 2006
Recognition of peptides bound to major histocompatibility complex (MHC) class I molecules by T lymphocytes is an essential part of immune surveillance. Each MHC allele has a characteristic peptide binding preference, which can be captured in prediction algorithms, allowing for the rapid scan of entire pathogen proteomes for peptides likely to bind MHC. Here we make public a large set of 48,828 quantitative peptide-binding affinity measurements relating to 48 different mouse, human, macaque, and chimpanzee MHC class I alleles. We use this data to establish a set of benchmark predictions with one neural network method and two matrix-based prediction methods extensively utilized in our groups. In general, the neural network outperforms the matrix-based predictions mainly due to its ability to generalize even on a small amount of data. We also retrieved predictions from tools publicly available on the internet. While differences in the data used to generate these predictions hamper direct comparisons, we do conclude that tools based on combinatorial peptide libraries perform remarkably well. The transparent prediction evaluation on this dataset provides tool developers with a benchmark for comparison of newly developed prediction methods. In addition, to generate and evaluate our own prediction methods, we have established an easily extensible web-based prediction framework that allows automated side-by-side comparisons of prediction methods implemented by experts. This is an advance over the current practice of tool developers having to generate reference predictions themselves, which can lead to underestimating the performance of prediction methods they are not as familiar with as their own.
The overall goal of this effort is to provide a transparent prediction evaluation allowing bioinformaticians to identify promising features of prediction methods and providing guidance to immunologists regarding the reliability of prediction tools. Citation: Peters B, Bui HH, Frankild S, Nielsen M, Lundegaard C, et al. (2006) A community resource benchmarking predictions of peptide binding to MHC-I molecules. PLoS Comput Biol 2(6): e65.
Immunogenetics, 2004
Major histocompatibility complex (MHC) proteins are encoded by extremely polymorphic genes and play a crucial role in immunity. However, not all genetically different MHC molecules are functionally different. Sette and Sidney (1999) have defined nine HLA class I supertypes and showed that with only nine main functional binding specificities it is possible to cover the binding properties of almost all known HLA class I molecules. Here we present a comprehensive study of the functional relationship between all HLA molecules with known specificities in a uniform and automated way. We have developed a novel method for clustering sequence motifs. We construct hidden Markov models for HLA class I molecules using a Gibbs sampling procedure and use the similarities among these to define clusters of specificities. These clusters are extensions of the previously suggested ones. We suggest splitting some of the alleles in the A1 supertype into a new A26 supertype, and some of the alleles in the B27 supertype into a new B39 supertype. Furthermore, the B8 alleles may define their own supertype. We also use the published specificities for a number of HLA-DR types to define clusters with similar specificities. We report that the previously observed specificities of these class II molecules can be clustered into nine classes, which only partly correspond to the serological classification. We show that classification of HLA molecules may be done in a uniform and automated way. The definition of clusters allows for selection of representative HLA molecules that can cover the HLA specificity space better. This makes it possible to target most of the known HLA alleles with known specificities using only a few peptides, and may be used in construction of vaccines. Supplementary material is available at http://www.cbs.dtu.dk/researchgroups/immunology/supertypes.html.
Bioinformatics/computer Applications in The Biosciences, 2004
Motivation: Prediction of which peptides will bind a specific major histocompatibility complex (MHC) constitutes an important step in identifying potential T-cell epitopes suitable as vaccine candidates. MHC class II binding peptides have a broad length distribution, complicating such predictions. Thus, identifying the correct alignment is a crucial part of identifying the core of an MHC class II binding motif. In this context, we wish to describe a novel Gibbs motif sampler method ideally suited for recognizing such weak sequence motifs. The method is based on the Gibbs sampling method, and it incorporates novel features optimized for the task of recognizing the binding motif of MHC classes I and II. The method locates the binding motif in a set of sequences and characterizes the motif in terms of a weight-matrix. Subsequently, the weight-matrix can be applied to effectively identifying potential MHC binding peptides and to guiding the process of rational vaccine design. Results: We apply the motif sampler method to the complex problem of MHC class II binding. The input to the method is amino acid peptide sequences extracted from the public databases of SYFPEITHI and MHCPEP and known to bind to the MHC class II complex HLA-DR4(B1*0401). Prior identification of information-rich (anchor) positions in the binding motif is shown to improve the predictive performance of the Gibbs sampler. Similarly, a consensus solution obtained from an ensemble average over suboptimal solutions is shown to outperform the use of a single optimal solution. In a large-scale benchmark calculation, the performance is quantified using relative operating characteristic (ROC) curve plots and we make a detailed comparison of the performance with that of both the TEPITOPE method and a weight-matrix derived ...
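As a rough illustration of how a weight-matrix characterizes a motif and is then applied to new peptides, here is a minimal sketch. It is not the Gibbs sampler described above: it assumes the binding cores are already aligned and covers only the scoring step. The uniform background frequencies, the pseudocount, the toy 3-residue cores, and the function names `build_weight_matrix` and `best_core` are all assumptions made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_weight_matrix(cores, pseudocount=1.0):
    """Log-odds weight matrix from equal-length, pre-aligned cores.

    Assumes a uniform background and a flat pseudocount (illustrative only).
    """
    background = 1.0 / len(AMINO_ACIDS)
    total = len(cores) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(len(cores[0])):
        counts = Counter(core[pos] for core in cores)
        matrix.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return matrix


def best_core(peptide, matrix):
    """Slide the matrix along the peptide; return the best window and score."""
    width = len(matrix)
    scored = [
        (sum(matrix[i][aa] for i, aa in enumerate(peptide[start:start + width])),
         start)
        for start in range(len(peptide) - width + 1)
    ]
    score, start = max(scored)
    return peptide[start:start + width], score


# Toy aligned cores (hypothetical, 3 residues wide for brevity).
matrix = build_weight_matrix(["FKA", "FRA", "YKA"])
core, score = best_core("GGFKAGG", matrix)
print(core)  # FKA
```

Sliding the matrix over every window is what lets a fixed-width motif be located inside peptides of varying length, which is the alignment problem the abstract highlights for MHC class II.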
Protein Science, 2003
In this paper we describe an improved neural network method to predict T-cell class I epitopes. A novel input representation has been developed consisting of a combination of sparse encoding, Blosum encoding, and input derived from hidden Markov models. We demonstrate that the combination of several neural networks derived using different sequence-encoding schemes has a performance superior to neural networks derived using a single sequence-encoding scheme. The new method is shown to have a performance that is substantially higher than that of other methods. By use of mutual information calculations we show that peptides that bind to the HLA A*0204 complex display a signal of higher-order sequence correlations. Neural networks are ideally suited to integrate such higher-order correlations when predicting the binding affinity. It is this feature, combined with the use of several neural networks derived from different and novel sequence-encoding schemes and the ability of the neural network to be trained on data consisting of continuous binding affinities, that gives the new method an improved performance. The difference in predictive performance between the neural network methods and that of the matrix-driven methods is found to be most significant for peptides that bind strongly to the HLA molecule, confirming that the signal of higher-order sequence correlation is most strongly present in high-binding peptides. Finally, we use the method to predict T-cell epitopes for the genome of hepatitis C virus and discuss possible applications of the prediction method to guide the process of rational vaccine design.
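Of the input representations named above, sparse encoding is the simplest to show. The sketch below assumes the common convention of one 20-element indicator vector per residue; the alphabet ordering, the function name `sparse_encode`, and the example nonamer are illustrative choices, and the Blosum and hidden Markov model inputs are not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sparse_encode(peptide):
    """Encode a peptide as concatenated 20-element indicator vectors,
    one per residue (assumed convention; ordering is arbitrary)."""
    vector = []
    for residue in peptide:
        one_hot = [0.0] * len(AMINO_ACIDS)
        one_hot[AMINO_ACIDS.index(residue)] = 1.0
        vector.extend(one_hot)
    return vector


# An arbitrary example nonamer: 9 residues give 9 * 20 = 180 inputs.
encoded = sparse_encode("GILGFVFTL")
print(len(encoded), sum(encoded))  # 180 9.0
```

A sparse encoding treats all amino acids as equally dissimilar, whereas a Blosum-style encoding lets chemically similar residues produce similar inputs.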
Proteins-structure Function and Bioinformatics, 2000
Secondary structure prediction involving up to 800 neural network predictions has been developed, by use of novel methods such as output expansion and a unique balloting procedure. An overall performance of 77.2%-80.2% (77.9%-80.6% mean per-chain) for three-state (helix, strand, coil) prediction was obtained when evaluated on a commonly used set of 126 protein chains. The method uses profiles made by position-specific scoring matrices as input, while at the output level it predicts on three consecutive residues simultaneously. The predictions arise from tenfold cross-validated training and testing of 1032 protein sequences, using a scheme with primary structure neural networks followed by structure filtering neural networks. With respect to blind prediction, this work is preliminary and awaits evaluation by CASP4. Proteins 2000;41:17-20.
European Journal of Immunology, 2005
Reverse immunogenetic approaches attempt to optimize the selection of candidate epitopes, and thus minimize the experimental effort needed to identify new epitopes. When predicting cytotoxic T cell epitopes, the main focus has been on the highly specific MHC class I binding event. Methods have also been developed for predicting the antigen-processing steps preceding MHC class I binding, including proteasomal cleavage and transporter associated with antigen processing (TAP) transport efficiency. Here, we use a dataset obtained from the SYFPEITHI database to show that a method integrating predictions of MHC class I binding affinity, TAP transport efficiency, and C-terminal proteasomal cleavage outperforms any of the individual methods. Using an independent evaluation dataset of HIV epitopes from the Los Alamos database, the validity of the integrated method is confirmed. The performance of the integrated method is found to be significantly higher than that of the two publicly available prediction methods BIMAS and SYFPEITHI. To identify 85% of the epitopes in the HIV dataset, 9% and 10% of all possible nonamers in the HIV proteins must be tested when using the BIMAS and SYFPEITHI methods, respectively, for the selection of candidate epitopes. This number is reduced to 7% when using the integrated method. In practical terms, this means that the experimental effort needed to identify an epitope in a hypothetical protein with 85% probability is reduced by 20-30% when using the integrated method. The method is available at ...
Abbreviations: ANN: artificial neural network; AUC: area under the ROC curve; TAP: transporter associated with antigen processing.
Eur. J. Immunol. 2005. 35: 2295-2303.
a) The 12 supertypes and the alleles classified as belonging to each supertype [20]. b) The number of unique nonamers in each group for which it was possible to locate a source protein in the SwissProt database. c) The number of nonamers not included in training of NetMHC and NetChop C-term 2.0/3.0 predictors. The last three columns summarize which NetMHC, BIMAS [22] and SYFPEITHI [23] methods are used to represent each supertype.
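The "20-30%" reduction in experimental effort stated in the abstract above follows directly from the quoted percentages of nonamers that must be tested (9% for BIMAS, 10% for SYFPEITHI, 7% for the integrated method). A short check of that arithmetic, with `relative_reduction` as a helper name introduced here:

```python
def relative_reduction(baseline_pct, integrated_pct):
    """Fraction of the testing effort saved relative to the baseline method."""
    return (baseline_pct - integrated_pct) / baseline_pct


# Percentages of nonamers to test, as quoted in the abstract.
for name, pct in [("BIMAS", 9.0), ("SYFPEITHI", 10.0)]:
    print(f"{name}: {relative_reduction(pct, 7.0):.1%}")
# BIMAS: 22.2%
# SYFPEITHI: 30.0%
```

Both values fall within the stated 20-30% range.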