Ole Winther | Technical University of Denmark (DTU)

Papers by Ole Winther

Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm

Bioinformatics/Computer Applications in the Biosciences, 2006

Motivation: Hierarchical and relocation clustering (e.g. K-means and self-organising maps) have been successful tools in the display and analysis of whole genome DNA microarray expression data. However, the results of hierarchical clustering are sensitive to outliers, and most relocation methods give results which are dependent on the initialisation of the algorithm. Therefore, it is difficult to assess the significance of the results. We have developed a consensus clustering algorithm, where the final result is averaged over multiple clustering runs, giving a robust and reproducible clustering, capable of capturing small signal variations. The algorithm preserves valuable properties of hierarchical clustering, which is useful for visualisation and interpretation of the results.
Results: We show for the first time that one can take advantage of multiple clustering runs in DNA microarray analysis by collecting re-occurring clustering patterns in a co-occurrence matrix. The results show that consensus clustering obtained from clustering multiple times with Variational Bayes Mixtures of Gaussians or K-means significantly reduces the classification error rate for a simulated dataset. The method is flexible and it is possible to find consensus clusters from different clustering algorithms. Thus, the algorithm can be used as a framework to test in a quantitative manner the homogeneity of different clustering algorithms. We compare the method with a number of state-of-the-art clustering methods. It is shown that the method is robust and gives low classification error rates for a realistic, simulated dataset. The algorithm is also demonstrated for real datasets. It is shown that more biologically meaningful transcriptional patterns can be found without conservative statistical or fold-change exclusion of data.
Availability: Matlab source code for the clustering algorithm ClusterLustre, and the simulated dataset for testing are available upon request from T.G.
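
The co-occurrence matrix mentioned in the abstract above can be illustrated with a small sketch. The abstract does not spell out the construction, so this is one plausible reading rather than the ClusterLustre implementation: entry (i, j) is the fraction of clustering runs in which items i and j received the same label. The function name `co_occurrence` and the three hand-written label runs are hypothetical and stand in for the output of repeated K-means or mixture-model runs.

```python
def co_occurrence(label_runs):
    """Fraction of runs in which each pair of items shares a cluster label.

    label_runs: list of runs, each a list of cluster labels (one per item).
    Only label equality within a run is used, so the arbitrary numbering
    of clusters in each run does not matter.
    """
    n_items = len(label_runs[0])
    n_runs = len(label_runs)
    counts = [[0] * n_items for _ in range(n_items)]
    for labels in label_runs:
        for i in range(n_items):
            for j in range(n_items):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_runs for c in row] for row in counts]


# Hypothetical label vectors from three runs over three items.
# Runs 1 and 2 describe the same partition with swapped label names.
runs = [
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]
C = co_occurrence(runs)
for row in C:
    print(row)
```

Because the matrix depends only on which items are grouped together, runs with different initialisations (or different algorithms) can be averaged directly; a hierarchical clustering could then be run on the averaged matrix, treating 1 minus the co-occurrence as a distance.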

Tractable approximations for probabilistic models: The adaptive Thouless-Anderson-Palmer mean field approach

Physical Review Letters, 2001

We develop an advanced mean field method for approximating averages in probabilistic data models that is based on the TAP approach of disorder physics. In contrast to conventional TAP, where the knowledge of the distribution of couplings between the random variables is required, our method adapts to the concrete couplings. We demonstrate the validity of our approach, which is so far restricted to models with non-glassy behaviour, by replica calculations for a wide class of models as well as by simulations for a real data set.

Gaussian processes and SVM: Mean field and leave-one-out

ADVANCES IN NEURAL INFORMATION …, 2000

In this chapter, we elaborate on the well-known relationship between Gaussian processes (GP) and Support Vector Machines (SVM). Secondly, we present approximate solutions for two computational problems arising in GP and SVM. The first one is the calculation of the posterior mean for GP classifiers using a 'naive' mean field approach. The second one is a leave-one-out estimator for the generalization error of SVM based on a linear response method. Simulation results on a benchmark dataset show similar performances for the GP mean field algorithm and the SVM algorithm. The approximate leave-one-out estimator is found to be in very good agreement with the exact leave-one-out error.

A Quantitative Study Of Pruning By Optimal Brain Damage

International Journal of Neural Systems, 1993

Independent component analysis for understanding multimedia content

Neural Networks for …, 2002

Growth-rate regulated genes have profound impact on interpretation of transcriptome profiling in Saccharomyces cerevisiae

Genome Biology, 2006

Genome Biology 2006, 7:R107

Mean-field approaches to independent component analysis

The transcriptional network that controls growth arrest and differentiation in a human myeloid leukemia cell line

Nature Genetics, 2009


Gaussian processes for classification: Mean-field algorithms

Neural Computation, 2000

We derive a mean field algorithm for binary classification with Gaussian processes which is based on the TAP approach originally proposed in Statistical Physics of disordered systems. The theory also yields an approximate leave-one-out estimator for the generalization error which is computed with no extra computational cost. We show that from the TAP approach, it is possible to derive both a simpler 'naive' mean field theory and support vector machines (SVM) as limiting cases. For both mean field algorithms and support vector machines, simulation results for three small benchmark data sets are presented. They show (1) that one may get state of the art performance by using the leave-one-out estimator for model selection and (2) that the built-in leave-one-out estimators are extremely precise when compared to the exact leave-one-out estimate. The latter result is taken as strong support for the internal consistency of the mean field approach.

JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update

Nucleic Acids Research, 2007

JASPAR is a popular open-access database for matrix models describing DNA-binding preferences for transcription factors and other DNA patterns. With its third major release, JASPAR has been expanded and equipped with additional functions aimed at both casual and power users. The heart of the JASPAR database, the JASPAR CORE sub-database, has increased by 12% in size, and three new specialized sub-databases have been added. New functions include clustering of matrix models by similarity, generation of random matrices by sampling from selected sets of existing models and a language-independent Web Service applications programming interface for matrix retrieval. JASPAR is available at http://jaspar.genereg.net.

TAP Gibbs free energy, belief propagation and sparsity

The adaptive TAP Gibbs free energy for a general densely connected probabilistic model with quadratic interactions and arbitrary single site constraints is derived. We show how a specific sequential minimization of the free energy leads to a generalization of Minka's expectation propagation. Lastly, we derive a sparse representation version of the sequential algorithm.

Sparse linear identifiable multivariate modeling

In this paper we consider sparse and identifiable linear latent variable (factor) and linear Bayesian network models for parsimonious analysis of multivariate data. We propose a computationally efficient method for joint parameter and model inference, and model comparison. It consists of a fully Bayesian hierarchy for sparse models using slab and spike priors (two-component delta-function and continuous mixtures), non-Gaussian latent factors and a stochastic search over the ordering of the variables.

Discovery of regulatory elements is improved by a discriminatory approach

A major goal in post-genome biology is the complete mapping of the gene regulatory networks for every organism. Identification of regulatory elements is a prerequisite for realizing this ambitious goal. A common problem is finding regulatory patterns in promoters of a group of co-expressed genes, but contemporary methods are challenged by the size and diversity of regulatory regions in higher metazoans.

Tractable approximations for probabilistic models: The adaptive Thouless-Anderson-Palmer mean field approach

We develop an advanced mean field method for approximating averages in probabilistic data models that is based on the Thouless-Anderson-Palmer (TAP) approach of disorder physics. In contrast to conventional TAP, where the knowledge of the distribution of couplings between the random variables is required, our method adapts to the concrete couplings. We demonstrate the validity of our approach, which is so far restricted to models with nonglassy behavior, ...

JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update

JASPAR is a popular open-access database for matrix models describing DNA-binding preferences for transcription factors and other DNA patterns. With its third major release, JASPAR has been expanded and equipped with additional functions aimed at both casual and power users. The heart of the JASPAR database, the JASPAR CORE sub-database, has increased by 12% in size, and three new specialized sub-databases have been added.

Growth-rate regulated genes have profound impact on interpretation of transcriptome profiling in Saccharomyces cerevisiae

Background: Growth rate is central to the development of cells in all organisms. However, little is known about the impact of changing growth rates. We used continuous cultures to control growth rate and studied the transcriptional program of the model eukaryote Saccharomyces cerevisiae, with generation times varying between 2 and 35 hours.
Results: A total of 5930 transcripts were identified at the different growth rates studied.

Teaching computers to fold proteins

A new general algorithm for optimization of potential functions for protein folding is introduced. It is based upon gradient optimization of the thermodynamic stability of native folds of a training set of proteins with known structure. The iterative update rule contains two thermodynamic averages which are estimated by (generalized ensemble) Monte Carlo. We test the learning algorithm on a Lennard-Jones (LJ) force field with a torsional angle degrees-of-freedom and a single-atom side-chain.

Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm

Motivation: Hierarchical and relocation clustering (e.g. K-means and self-organizing maps) have been successful tools in the display and analysis of whole genome DNA microarray expression data. However, the results of hierarchical clustering are sensitive to outliers, and most relocation methods give results which are dependent on the initialization of the algorithm. Therefore, it is difficult to assess the significance of the results.

Multivariate Hawkes process models of the occurrence of regulatory elements

Background: A central question in molecular biology is how transcriptional regulatory elements (TREs) act in combination. Recent high-throughput data provide us with the location of multiple regulatory regions for multiple regulators, and thus with the possibility of analyzing the multivariate distribution of the occurrences of these TREs along the genome.
Results: We present a model of TRE occurrences known as the Hawkes process. We illustrate the use of this model by analyzing two different publicly available data sets.

Molecular signatures of thyroid follicular neoplasia

The molecular pathways leading to thyroid follicular neoplasia are incompletely understood, and the diagnosis of follicular tumors is a clinical challenge. To provide leads to the pathogenesis and diagnosis of the tumors, we examined the global transcriptome signatures of follicular thyroid carcinoma (FC) and normofollicular adenoma (FA) as well as fetal/microFA (fetal adenoma).
