Andrew Teschendorff - Academia.edu
Papers by Andrew Teschendorff
Genome Biology, 2020
Cell type heterogeneity presents a challenge to the interpretation of epigenome data, compounded by the difficulty in generating reliable single-cell DNA methylomes for large numbers of cells and samples. We present EPISCORE, a computational algorithm that performs virtual microdissection of bulk tissue DNA methylation data at single cell-type resolution for any solid tissue. EPISCORE applies a probabilistic epigenetic model of gene regulation to a single-cell RNA-seq tissue atlas to generate a tissue-specific DNA methylation reference matrix, allowing quantification of cell-type proportions and cell-type-specific differential methylation signals in bulk tissue data. We validate EPISCORE in multiple epigenome studies and tissue types.
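To make the idea of quantifying cell-type proportions from a reference matrix concrete, here is a minimal Python sketch for the two-cell-type case. It is not the EPISCORE algorithm: the reference profiles, the bulk values and the function name `estimate_fraction` are hypothetical, and a plain least-squares projection is swapped in for whatever estimator the paper actually uses.

```python
from typing import Sequence


def estimate_fraction(bulk: Sequence[float],
                      ref_a: Sequence[float],
                      ref_b: Sequence[float]) -> float:
    """Least-squares estimate of the fraction of cell type A in a bulk sample.

    Assumes bulk[i] ~= f * ref_a[i] + (1 - f) * ref_b[i] at every CpG i,
    which is an illustrative mixing model, not the published method.
    """
    if not (len(bulk) == len(ref_a) == len(ref_b)):
        raise ValueError("all profiles must cover the same CpGs")
    diff = [a - b for a, b in zip(ref_a, ref_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("reference profiles are identical")
    numer = sum((x - b) * d for x, b, d in zip(bulk, ref_b, diff))
    # Clip to [0, 1] so the result is a valid proportion.
    return min(1.0, max(0.0, numer / denom))


# Hypothetical reference methylation values (beta values in [0, 1]) at four CpGs.
ref_epithelial = [0.9, 0.1, 0.8, 0.2]
ref_immune = [0.1, 0.9, 0.2, 0.7]

# Simulate a bulk sample that is 30% epithelial and 70% immune.
bulk_sample = [0.3 * a + 0.7 * b for a, b in zip(ref_epithelial, ref_immune)]
fraction_epithelial = estimate_fraction(bulk_sample, ref_epithelial, ref_immune)
```

With more than two cell types the same mixing model would need a constrained regression over all columns of the reference matrix; the closed form above only holds for two.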
BMC Cancer, 2010
Background: Elucidating the activation pattern of molecular pathways across a given tumour type is a key challenge necessary for understanding the heterogeneity in clinical response and for developing novel, more effective therapies. Gene expression signatures of molecular pathway activation derived from perturbation experiments in model systems as well as structural models of molecular interactions ("model signatures") constitute an important resource for estimating corresponding activation levels in tumours. However, relatively few strategies for estimating pathway activity from such model signatures exist and only a few studies have used activation patterns of pathways to refine molecular classifications of cancer.
BMC Biotechnology, 2008
Background: Human papilloma virus (HPV) load and physical status are considered useful parameters for clinical evaluation of cervical squamous cell neoplasia. However, the errors implicit in HPV gene quantification by PCR are not well documented. We have undertaken the first rigorous evaluation of the errors that can be expected when using SYBR green qPCR for quantification of HPV type 16 gene copy numbers. We assessed a modified method, in which external calibration curves were generated from a single construct containing HPV16 E2, HPV16 E6 and the host gene hydroxymethylbilane synthase in a 1:1:1 ratio.
BMC Bioinformatics, 2011
Background: Inferring molecular pathway activity is an important step towards reducing the complexity of genomic data, understanding the heterogeneity in clinical outcome, and obtaining molecular correlates of cancer imaging traits. Increasingly, approaches towards pathway activity inference combine molecular profiles (e.g. gene or protein expression) with independent and highly curated structural interaction data (e.g. protein interaction networks) or more generally with prior knowledge pathway databases. However, it is unclear how best to use the pathway knowledge information in the context of molecular profiles of any given study. Results: We present an algorithm called DART (Denoising Algorithm based on Relevance network Topology) which filters out noise before estimating pathway activity. Using simulated and real multidimensional cancer genomic data and by comparing DART to other algorithms which do not assess the relevance of the prior pathway information, we here demonstrate that substantial improvement in pathway activity predictions can be made if prior pathway information is denoised before predictions are made. We also show that genes encoding hubs in expression correlation networks represent more reliable markers of pathway activity. Using the Netpath resource of signalling pathways in the context of breast cancer gene expression data we further demonstrate that DART leads to more robust inferences about pathway activity correlations. Finally, we show that DART identifies a hypothesized association between oestrogen signalling and mammographic density in ER+ breast cancer.
BMC Bioinformatics, 2012
Background: The 27k Illumina Infinium Methylation Beadchip is a popular high-throughput technology that allows the methylation state of over 27,000 CpGs to be assayed. While feature selection and classification methods have been comprehensively explored in the context of gene expression data, relatively little is known as to how best to perform feature selection or classification in the context of Illumina Infinium methylation data. Given the rising importance of epigenomics in cancer and other complex genetic diseases, and in view of the upcoming epigenome-wide association studies, it is critical to identify the statistical methods that offer improved inference in this novel context.
Bioinformatics, 2012
The standard paradigm in omic disciplines has been to identify biologically relevant biomarkers using statistics that reflect differences in mean levels of a molecular quantity such as mRNA expression or DNA methylation. Recently, however, it has been proposed that differential epigenetic variability may mark genes that contribute to the risk of complex genetic diseases like cancer and that identification of risk and early detection markers may therefore benefit from statistics based on differential variability. Results: Using four genome-wide DNA methylation datasets totalling 311 epithelial samples and encompassing all stages of cervical carcinogenesis, we here formally demonstrate that differential variability, as a criterion for selecting DNA methylation features, can identify cancer risk markers more reliably than statistics based on differences in mean methylation. We show that differential variability selects features with heterogeneous outlier methylation profiles and that these play a key role in the early stages of carcinogenesis. Moreover, differentially variable features identified in precursor non-invasive lesions exhibit significantly increased enrichment for developmental genes compared with differentially methylated sites. Conversely, differential variability does not add predictive value in cancer studies profiling invasive tumours or whole blood tissue. Finally, we incorporate the differential variability feature selection step into a novel adaptive index prediction algorithm called EVORA (epigenetic variable outliers for risk prediction analysis), and demonstrate that EVORA compares favourably to powerful prediction algorithms based on differential methylation statistics.
Conclusions: Statistics based on differential variability improve the detection of cancer risk markers in the context of DNA methylation studies profiling epithelial preinvasive neoplasias. We present a novel algorithm (EVORA) which could be used for prediction and diagnosis of precursor epithelial cancer lesions. Availability: R-scripts implementing EVORA are available from CRAN (www.r-project.org).
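The contrast between a mean-based and a variability-based statistic can be illustrated with a small Python sketch. This is not EVORA and not the test used in the paper: the methylation values are made up, and a simple sample-variance ratio is swapped in as the variability statistic purely for illustration.

```python
from statistics import mean, variance
from typing import Sequence


def mean_difference(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Difference in mean methylation between the two groups."""
    return mean(cases) - mean(controls)


def variance_ratio(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Ratio of sample variances (cases over controls)."""
    return variance(cases) / variance(controls)


# Hypothetical beta values at one CpG. Most cases look like controls,
# but a single case sample is a strongly methylated outlier.
controls = [0.10, 0.12, 0.09, 0.11, 0.10, 0.08, 0.11, 0.09]
cases = [0.10, 0.11, 0.09, 0.10, 0.12, 0.08, 0.09, 0.55]

delta_mean = mean_difference(cases, controls)
var_ratio = variance_ratio(cases, controls)
print(f"mean difference: {delta_mean:.3f}, variance ratio: {var_ratio:.1f}")
```

In this toy feature the group means differ only slightly, while the variance in cases is far larger than in controls, which is the kind of heterogeneous outlier profile the abstract describes.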
Bioinformatics, 2006
Motivation: Elucidating the molecular taxonomy of cancers and finding biological and clinical markers from microarray experiments is problematic due to the large number of variables being measured. Feature selection methods that can identify relevant classifiers or that can remove likely false positives prior to supervised analysis are therefore desirable. Results: We present a novel feature selection procedure based on a mixture model and a non-gaussianity measure of a gene's expression profile. The method can be used to find genes that define either small outlier subgroups or major subdivisions, depending on the sign of kurtosis. The method can also be used as a filtering step, prior to supervised analysis, in order to reduce the false discovery rate. We validate our methodology using six independent datasets by rediscovering major classifiers in ER negative and ER positive breast cancer and in prostate cancer. Furthermore, our method finds two novel subtypes within the basal subgroup of ER negative breast tumours, associated with apoptotic and immune response functions respectively, and with statistically different clinical outcome. Availability: An R-function pack that implements the methods used here has been added to vabayelMix, available from www.cran.r-project.org. Contact: aet21@cam.ac.uk Supplementary information: Supplementary information is available at Bioinformatics online.
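How the sign of kurtosis separates the two situations can be seen in a short Python sketch. It covers only the kurtosis part, not the mixture model, and it is not the vabayelMix implementation: the function name `excess_kurtosis` and both expression profiles are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def excess_kurtosis(values: Sequence[float]) -> float:
    """Population excess kurtosis: m4 / m2**2 - 3 (zero for a Gaussian)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    mu = fmean(values)
    m2 = fmean([(v - mu) ** 2 for v in values])
    m4 = fmean([(v - mu) ** 4 for v in values])
    if m2 == 0:
        raise ValueError("profile has zero variance")
    return m4 / m2 ** 2 - 3.0


# Hypothetical gene whose expression is high in only 2 of 20 samples.
outlier_profile = [0.0] * 18 + [10.0] * 2

# Hypothetical gene that splits the 20 samples into two equal groups.
balanced_profile = [-1.0] * 10 + [1.0] * 10

k_outlier = excess_kurtosis(outlier_profile)
k_balanced = excess_kurtosis(balanced_profile)
print(f"outlier subgroup: {k_outlier:.2f}, major subdivision: {k_balanced:.2f}")
```

In these toy profiles the small outlier subgroup gives a heavy-tailed, positive excess kurtosis, whereas the even two-group split gives a negative one.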
Bioinformatics, 2011
Single-molecule force spectroscopy has facilitated the experimental investigation of biomolecular force-coupled kinetics, from which the kinetics at zero force can be extrapolated via explicit theoretical models. The atomic force microscope (AFM) in particular is routinely used to study protein unfolding kinetics, but only rarely protein folding kinetics. The discrepancy arises because mechanical protein refolding studies are more technically challenging. Results: We developed software that can drive and analyse mechanical refolding experiments when used with the commercial AFM setup 'Picoforce AFM', Bruker (previously Digital Instruments). We expect the software to be easily adaptable to other AFM setups. We also developed an improved method for the statistical characterization of protein folding kinetics, and implemented it into an AFM-independent software module. Availability: Software and documentation are available at
Bioinformatics, 2005
Motivation: Accurate subcategorization of tumour types through gene-expression profiling requires analytical techniques that estimate the number of categories or clusters rigorously and reliably. Parametric mixture modelling provides a natural setting to address this problem. Results: We compare a criterion for model selection that is derived from a variational Bayesian framework with a popular alternative based on the Bayesian information criterion. Using simulated data, we show that the variational Bayesian method is more accurate in finding the true number of clusters in situations that are relevant to current and future microarray studies. We also compare the two criteria using freely available tumour microarray datasets and show that the variational Bayesian method is more sensitive to capturing biologically relevant structure. Availability: We have developed an R-package vabayelMix, available from www.cran.r-project.org, that implements the algorithm described in this paper. Contact: aet21@cam.ac.uk Supplementary information: http://bioinformatics.oxfordjournals.org
Background: Transcriptional networks in cancer are deeply misregulated. Retrotransposons and other repetitive elements occupy more than 40% of mammalian genomes, and their epigenetically silenced status has recently been reported to be lost in several cancer stages. In addition, their sequences are potential targets for transcription factor (TF) regulation. In spite of this, little is known about how transcriptional networks are modulated by the presence of repetitive elements.
Integration of genetics and epigenetics has emerged as a powerful approach to study cellular differentiation and tumourigenesis. The study of DNA methylation is of particular importance in cancer as causal involvement has been demonstrated and it is the most stable of all ...
MJ and AET contributed equally to this work.