Sebastian Klie - Academia.edu

Papers by Sebastian Klie

Aclonifen targets solanesyl diphosphate synthase, representing a novel mode of action for herbicides

FamNet: A framework to identify multiplied modules driving pathway diversification in plants

Plant Physiology, 2016

Gene duplications generate new genes that can acquire similar but often diversified functions. Recent studies of gene coexpression networks have indicated that not only genes but also pathways can be multiplied and diversified to perform related functions in different parts of an organism. Identification of such diversified pathways, or modules, is needed to expand our knowledge of biological processes in plants and to understand how biological functions evolve. However, systematic explorations of modules remain scarce, and no user-friendly platform to identify them exists. We have established a statistical framework to identify modules and show that approximately one-third of the genes of a plant's genome participate in hundreds of multiplied modules. Using this framework as a basis, we implemented a platform that can explore and visualize multiplied modules in coexpression networks of eight plant species. To validate the usefulness of the platform, we identified and functionally characterized pollen- and root-specific cell wall modules that multiplied to confer tip growth in pollen tubes and root hairs, respectively. Furthermore, we identified multiplied modules involved in secondary metabolite synthesis and corroborated them by metabolite profiling of tobacco (Nicotiana tabacum) tissues. The interactive platform, referred to as FamNet, is available at http://www.gene2function.de/famnet.html.

Glucocorticoid (dexamethasone)-induced metabolome changes in healthy males suggest prediction of response and side effects

Scientific Reports, 2015

Glucocorticoids are indispensable anti-inflammatory and decongestant drugs with a high prevalence of use at ~0.9% of the adult population. Better holistic insights into glucocorticoid-induced changes are crucial for effective use as concurrent medication and management of adverse effects. The profiles of 214 metabolites from plasma of 20 healthy male volunteers were recorded prior to and after ingestion of a single dose of 4 mg dexamethasone (+20 mg pantoprazole). Samples were drawn at three predefined time points per day: seven untreated (day 1 midday to day 3 midday) and four treated (day 3 evening to day 4 evening) per volunteer. Statistical analysis revealed a tremendous impact of dexamethasone on the metabolome, with 150 of 214 metabolites being significantly deregulated at one or more time points after treatment (ANOVA, Benjamini-Hochberg corrected, q < 0.05). Inter-person variability was high and remained uninfluenced by treatment. The clearly visible circadian rhythm prior to treatment was almost completely suppressed and deregulated by dexamethasone. The results draw a holistic picture of the severe metabolic deregulation induced by single-dose, short-term glucocorticoid application. The observed metabolic changes suggest a potential for early detection of severe side effects, raising hope for personalized early countermeasures increasing quality of life and reducing health care costs.
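
The Benjamini-Hochberg correction named in the abstract above can be illustrated with a minimal sketch. This is not the study's analysis pipeline: the function name `benjamini_hochberg` and the example p-values are hypothetical, and only NumPy is assumed.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices that sort p ascending
    # Scale each sorted p-value by m / rank (rank starts at 1)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: a q-value may not exceed that of any larger p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(ranked, 0.0, 1.0)       # restore the original ordering
    return q

# Hypothetical p-values for four metabolites (illustration only)
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = q_values < 0.05
```

A metabolite would count as significantly changed under a threshold such as q < 0.05 when its entry in `significant` is true.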

Temporal kinetics of the transcriptional response to carbon depletion and sucrose readdition in Arabidopsis seedlings

Plant, Cell & Environment, Jan 19, 2015

To investigate whether the transcriptional response to carbon (C) depletion and sucrose resupply depends on the duration and severity of the C-depletion, Arabidopsis seedlings were grown in liquid culture and harvested 3, 6, 12, 24, 48 and 72 hours after removing sucrose from the medium, and 30 minutes after resupplying sucrose at each time. Expression profiling revealed early transcriptional inhibition of cell wall synthesis and remodelling of signalling, followed by induction of C-recycling and photosynthesis, and general inhibition of growth. The temporal sequence differed from the published response to progressive exhaustion of C during a night and extended night in vegetatively-growing plants. The response to sucrose readdition was conserved across the C-depletion time course. Intriguingly, the vast majority of rapidly-responding transcripts decreased rather than increased. The majority of transcripts that respond rapidly to sucrose and many transcripts that respond during C-de...

Storage and Processing of Mass Spectrometry Data

17th International Conference on Database and Expert Systems Applications (DEXA'06), 2006

... 5 Acknowledgements Nigel Hardy, Helen Jenkins, Chris Taylor, Kai Runte and many others for the ArMet and MzData models. ... Beilstein Institute, Logos-Verlag, 2006 (in Press). [7] S. Orchard, H. Hermjakob, P. Binz, C. Hoogland, C. Taylor, W. Zhu, RJ Julian, and R. Apweiler. ...

Differential metabolic and coexpression networks of plant metabolism

Trends in Plant Science, Jan 16, 2015

Recent analyses have demonstrated that plant metabolic networks do not differ in their structural properties and that genes involved in basic metabolic processes show smaller coexpression than genes involved in specialized metabolism. By contrast, our analysis reveals differences in the structure of plant metabolic networks and patterns of coexpression for genes in (non)specialized metabolism. Here we caution that conclusions concerning the organization of plant metabolism based on network-driven analyses strongly depend on the computational approaches used.

Analysis of the compartmentalized metabolome - a validation of the non-aqueous fractionation technique

Frontiers in Plant Science, 2011

With the development of high-throughput metabolic technologies, a plethora of primary and secondary compounds have been detected in the plant cell. However, there are still major gaps in our understanding of the plant metabolome. This is especially true with regard to the compartmental localization of these identified metabolites. Non-aqueous fractionation (NAF) is a powerful technique for the determination of subcellular metabolite distributions in eukaryotic cells, and it has become the method of choice to analyze the distribution of a large number of metabolites concurrently. However, the NAF technique produces a continuous gradient of metabolite distributions, not discrete assignments. Resolution of these distributions requires computational analyses based on marker molecules to resolve compartmental localizations. In this article we focus on expanding the computational analysis of data derived from NAF. Along with an experimental workflow, we describe the critical steps in NAF...

Co-ordination and divergence of cell-specific transcription and translation of genes in Arabidopsis root cells

Annals of Botany, 2014

Background and Aims: A key challenge in biology is to systematically investigate and integrate the different levels of information available at the global and single-cell level. Recent studies have elucidated spatiotemporal expression patterns of root cell types in Arabidopsis thaliana, and genome-wide quantification of polysome-associated mRNA levels, i.e. the translatome, has also been obtained for corresponding cell types. Translational control has been increasingly recognized as an important regulatory step in protein synthesis. The aim of this study was to investigate coupled transcription and translation by use of publicly available root datasets. Methods: Using cell-type-specific datasets of the root transcriptome and translatome of Arabidopsis, a systematic assessment was made of the degree of coordination and divergence between these two levels of cellular organization. The computational analysis considered correlation and variation of expression across cell types at both system levels, and also provided insights into the degree of co-regulatory relationships that are preserved between the two processes. Key Results: The overall correlation of expression and translation levels of genes resembles an almost bimodal distribution (mean/median value of 0.08/0.12), with a second, less strongly pronounced 'mode' for negative Pearson's correlation coefficient values. The analysis conducted also confirms that previously identified key transcriptional activators of secondary cell wall development display highly conserved patterns of transcription and translation across the investigated cell types. Moreover, the biological processes that display conserved and divergent patterns based on the cell-type-specific expression and translation levels were identified.
Conclusions: In agreement with previous studies in animal cells, a large degree of uncoupling was found between the transcriptome and translatome. However, components and processes were also identified that are under coordinated transcriptional and translational control in plant root cells.

Principal Components Analysis

Methods in Molecular Biology, 2012

Principal components analysis (PCA) is a standard tool in multivariate data analysis to reduce the number of dimensions, while retaining as much as possible of the data's variation. Instead of investigating thousands of original variables, the first few components containing the majority of the data's variation are explored. The visualization and statistical analysis of these new variables, the principal components, can help to find similarities and differences between samples. Important original variables that are the major contributors to the first few components can be discovered as well. This chapter seeks to deliver a conceptual understanding of PCA as well as a mathematical description. We describe how PCA can be used to analyze different datasets, and we include practical code examples. Possible shortcomings of the methodology and ways to overcome these problems are also discussed.
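
The chapter's own code examples are not reproduced on this page. As a rough illustration of the idea described in the abstract, the sketch below computes principal components through a singular value decomposition of the centered data. The function name `pca`, the simulated data, and the choice of two components are assumptions made for illustration; only NumPy is required.

```python
import numpy as np

def pca(data, n_components=2):
    """PCA of a samples-by-variables matrix via SVD of the centered data."""
    centered = data - data.mean(axis=0)        # center each variable
    # Rows of vt are the principal axes; s holds the singular values
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # sample coordinates
    explained = (s ** 2) / np.sum(s ** 2)              # variance fraction per component
    return scores, vt[:n_components], explained[:n_components]

# Simulated data (hypothetical): 20 samples, 1000 variables, two groups of samples
rng = np.random.default_rng(0)
samples = rng.normal(size=(20, 1000))
samples[:10, :50] += 3.0                       # shift 50 variables in the first group

scores, loadings, explained = pca(samples)
# Variables with the largest absolute loading on the first component
top_variables = np.argsort(np.abs(loadings[0]))[::-1][:10]
```

Plotting the two columns of `scores` against each other would show how samples group, and `top_variables` lists the original variables contributing most to the first component.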

Microsoft Word-Supplemental Material31 July2013. docx-226142Supplemental Material. pdf

An integrated genomic and metabolomic framework for cell wall biology in rice

BMC Genomics, 2014

Background: Plant cell walls are complex structures that fulfill many diverse functions during plant growth and development. It is therefore not surprising that thousands of gene products are involved in cell wall synthesis and maintenance. However, functional association for the majority of these gene products remains obscure. One useful approach to infer biological associations is via transcriptional coordination, or co-expression of genes. This approach has proved useful for several biological processes. Nevertheless, combining co-expression with other large-scale measurements may improve the biological inferences. Results: In this study, we used a combined approach of co-expression and cell wall metabolomics to obtain new insight into cell wall synthesis in rice. We initially created a weighted gene co-expression network from publicly available datasets, and then established a comprehensive cell wall dataset by determining cell wall compositions from 29 tissues that almost cover the whole life cycle of rice. We subsequently combined the datasets through the conversion of co-expressed gene modules into eigenvectors, representing expression profiles for the genes in the modules, and performed comparative analyses against the cell wall contents. Here, we made three major discoveries. First, we confirmed our approach by finding primary and secondary wall cellulose biosynthesis modules, respectively. Second, we found co-expressed modules that strongly correlated with reorganization of the secondary cell walls and with modifications and degradation of hemicellulosic structures. Third, we inferred that at least one module is likely to play a regulatory role in the production of G-rich lignification.
Conclusions: Here, we integrated transcriptomic associations and cell wall metabolism and found that certain co-expressed gene modules are positively correlated with distinct cell wall characteristics. We propose that combining multiple data types, such as coordinated transcription and cell wall analyses, may be a useful approach to glean new insight into biological processes. The combination of multiple datasets, as illustrated here, can further improve the functional inferences that typically are generated via a single type of dataset. In addition, our data extend the typical co-expression approach to allow deeper insight into cell wall biology in rice.

Data Integration through Proximity-Based Networks Provides Biological Principles of Organization across Scales

The Plant Cell, 2013

Plant behaviors across levels of cellular organization, from biochemical components to tissues and organs, relate and reflect growth habitats. Quantification of the relationship between behaviors captured in various phenotypic characteristics and growth habitats can help reveal molecular mechanisms of plant adaptation. The aim of this article is to introduce the power of using statistics originally developed in the field of geographic variability analysis together with prominent network models in elucidating principles of biological organization. We provide a critical systematic review of the existing statistical and network-based approaches that can be employed to determine patterns of covariation from both uni- and multivariate phenotypic characteristics in plants. We demonstrate that parameter-independent network-based approaches result in robust insights about phenotypic covariation. These insights can be quantified and tested by applying well-established statistics combining th...

The Choice between MapMan and Gene Ontology for Automated Gene Function Prediction in Plant Science

Frontiers in Genetics, 2012

Since the introduction of the Gene Ontology (GO), the analysis of high-throughput data has become tightly coupled with the use of ontologies to establish associations between knowledge and data in an automated fashion. Ontologies provide a systematic description of knowledge by a controlled vocabulary of defined structure in which ontological concepts are connected by pre-defined relationships. In plant science, MapMan and GO offer two alternatives for ontology-driven analyses. Unlike GO, initially developed to characterize microbial systems, MapMan was specifically designed to cover plant-specific pathways and processes. While the dependencies between concepts in MapMan are modeled as a tree, in GO these are captured in a directed acyclic graph. Therefore, the difference in ontologies may cause discrepancies in data reduction, visualization, and hypothesis generation. Here we provide the first systematic comparative analysis of GO and MapMan for the case of the model plant species Arabidopsis thaliana (Arabidopsis) with respect to their structural properties and difference in distributions of information content. In addition, we investigate the effect of the two ontologies on the specificity and sensitivity of automated gene function prediction via the coupling of co-expression networks and the guilt-by-association principle. Automated gene function prediction is particularly needed for the model plant Arabidopsis in which only half of genes have been functionally annotated based on sequence similarity to known genes. The results highlight the need for structured representation of species-specific biological knowledge, and warrant caution in the design principles employed in future ontologies.

Compromise of Multiple Time-Resolved Transcriptomics Experiments Identifies Tightly Regulated Functions

Frontiers in Plant Science, 2012

With the advent of high-throughput technologies for data acquisition from different components (i.e., genes, proteins, and metabolites) of a given biological system, generation of hypotheses and biological interpretations based on multivariate data sets become increasingly important. These technologies allow for simultaneous gathering of data from the same biological components under different perturbations, including genotypic variation and/or changes in conditions, resulting in so-called multiple data tables. Moreover, these data tables are obtained over a well-chosen time domain to capture the dynamics of the response of the biological system to the perturbation. The computational problem we address in this study is twofold: (1) derive a single data table, referred to as a compromise, which captures information common to the investigated set of multiple tables and (2) identify biological components which contribute most to the determined compromise. Here we argue that recent extensions to principal component analysis called STATIS and dual-STATIS can be used to determine the compromise on which classical techniques for data analysis, such as clustering and term over-enrichment, can be subsequently applied. In addition, we illustrate that STATIS and dual-STATIS facilitate interpretations of a publicly available transcriptomics data set capturing the time-resolved response of Arabidopsis thaliana to changing light and/or temperature conditions. We demonstrate that STATIS and dual-STATIS can be used not only to identify the components of a biological system whose behavior is similarly affected due to the perturbation (e.g., in time or condition), but also to specify the extent to which each dimension of the data tables reflects the perturbation. These findings ultimately provide insights into the components and pathways which could be under tight control in plant systems.

Decreased nucleotide and expression diversity and modified coexpression patterns characterize domestication in the common bean

Using RNA sequencing technology and de novo transcriptome assembly, we compared representative sets of wild and domesticated accessions of common bean (Phaseolus vulgaris) from Mesoamerica. RNA was extracted at the first true-leaf stage, and de novo assembly was used to develop a reference transcriptome; the final data set consists of ~190,000 single nucleotide polymorphisms from 27,243 contigs in expressed genomic regions. A drastic reduction in nucleotide diversity (~60%) is evident for the domesticated form, compared with the wild form, and almost 50% of the contigs that are polymorphic were brought to fixation by domestication. In parallel, the effects of domestication decreased the diversity of gene expression (18%). While the coexpression networks for the wild and domesticated accessions demonstrate similar seminal network properties, they show distinct community structures that are enriched for different molecular functions. After simulating the demographic dynamics during domestication, we found that 9% of the genes were actively selected during domestication. We also show that selection induced a further reduction in the diversity of gene expression (26%) and was associated with 5-fold enrichment of differentially expressed genes. While there is substantial evidence of positive selection associated with domestication, in a few cases, this selection has increased the nucleotide diversity in the domesticated pool at target loci associated with abiotic stress responses, flowering time, and morphology.

Concurrent Conditional Clustering of Multiple Networks: COCONETS

PLoS ONE, 2014

The accumulation of high-throughput data from different experiments has facilitated the extraction of condition-specific networks over the same set of biological entities. Comparing and contrasting such multiple biological networks is at the center of differential network biology, aiming at determining general and condition-specific responses captured in the network structure (i.e., included associations between the network components). We provide a novel way for comparison of multiple networks based on determining network clustering (i.e., partition into communities) which is optimal across the set of networks with respect to a given cluster quality measure. To this end, we formulate the optimization-based problem of concurrent conditional clustering of multiple networks, termed COCONETS, based on the modularity. The solution to this problem is a clustering which depends on all considered networks and pinpoints their preserved substructures. We present theoretical results for special classes of networks to demonstrate the implications of conditionality captured by the COCONETS formulation. As the problem can be shown to be intractable, we extend an existing efficient greedy heuristic and apply it to determine concurrent conditional clusters on coexpression networks extracted from publicly available time-resolved transcriptomics data of Escherichia coli under five stresses as well as on metabolite correlation networks from a metabolomics data set from Arabidopsis thaliana exposed to eight environmental conditions. We demonstrate that the investigation of the differences between the clustering based on all networks and that obtained from a subset of networks can be used to quantify the specificity of biological responses. While a comparison of the Escherichia coli coexpression networks based on seminal properties does not pinpoint biologically relevant differences, the common network substructures extracted by COCONETS are supported by existing experimental evidence. Therefore, the comparison of multiple networks based on concurrent conditional clustering offers a novel avenue for detection and investigation of preserved network substructures.

Principal Components Analysis

Everitt/Applied Multivariate Data Analysis, 2001

Additional role of O-acetylserine as a sulfur status-independent regulator during plant growth

The Plant Journal, 2012

O-acetylserine (OAS) is one of the most prominent metabolites whose levels are altered upon sulfur starvation. However, its putative role as a signaling molecule in higher plants is controversial. This paper provides further evidence that OAS is a signaling molecule, based on computational analysis of time-series experiments and on studies of transgenic plants conditionally displaying increased OAS levels. Transcripts whose levels correlated with the transient and specific increase in OAS levels observed in leaves of Arabidopsis thaliana plants 5-10 min after transfer to darkness and with diurnal oscillation of the OAS content, showing a characteristic peak during the night, were identified. Induction of a serine-O-acetyltransferase gene (SERAT) in transgenic A. thaliana plants expressing the genes under the control of an inducible promoter resulted in a specific time-dependent increase in OAS levels. Monitoring the transcriptome response at time points at which no changes in sulfur-related metabolites except OAS were observed and correlating this with the light/dark transition and diurnal experiments resulted in identification of six genes whose expression was highly correlated with that of OAS (adenosine-5'-phosphosulfate reductase 3, sulfur-deficiency-induced 1, sulfur-deficiency-induced 2, low-sulfur-induced 1, serine hydroxymethyltransferase 7 and ChaC-like protein). These data suggest that OAS displays a signalling function leading to changes in transcript levels of a specific gene set irrespective of the sulfur status of the plant. Additionally, a role for OAS in a specific part of the sulfate response can be deduced.

High-density kinetic analysis of the metabolomic and transcriptomic response of Arabidopsis to eight environmental conditions

The Plant Journal, 2011

The time-resolved response of Arabidopsis thaliana towards changing light and/or temperature at the transcriptome and metabolome level is presented. Plants grown at 21°C with a light intensity of 150 μE m⁻² sec⁻¹ were either kept at this condition or transferred into seven different environments (4°C, darkness; 21°C, darkness; 32°C, darkness; 4°C, 85 μE m⁻² sec⁻¹; 21°C, 75 μE m⁻² sec⁻¹; 21°C, 300 μE m⁻² sec⁻¹; 32°C, 150 μE m⁻² sec⁻¹). Samples were taken before (0 min) and at 22 time points after transfer resulting in (8×) 22 time points covering both a linear and a logarithmic time series totaling 177 states. Hierarchical cluster analysis shows that individual conditions (defined by temperature and light) diverge into distinct trajectories at condition-dependent times and that the metabolome follows different kinetics from the transcriptome. The metabolic responses are initially relatively faster when compared with the transcriptional responses. Gene Ontology over-representation analysis identifies a common response for all changed conditions at the transcriptome level during the early response phase (5-60 min). Metabolic networks reconstructed via metabolite-metabolite correlations reveal extensive environment-specific rewiring. Detailed analysis identifies conditional connections between amino acids and intermediates of the tricarboxylic acid cycle. Parallel analysis of transcriptional changes strongly supports a model where in the absence of photosynthesis at normal/high temperatures protein degradation occurs rapidly and subsequent amino acid catabolism serves as the main cellular energy supply. These results thus demonstrate the engagement of the electron transfer flavoprotein system under short-term environmental perturbations.

Research paper thumbnail of Analysis of the compartmentalized metabolome–a validation of the non-aqueous fractionation technique

Research paper thumbnail of Aclonifen targets solanesyl diphosphate synthase, representing a novel mode of action for herbicides

Research paper thumbnail of FamNet: A framework to identify multiplied modules driving pathway diversification in plants

Plant Physiology, 2016

Gene duplications generate new genes that can acquire similar but often diversified functions. Re... more Gene duplications generate new genes that can acquire similar but often diversified functions. Recent studies of gene coexpression networks have indicated that, not only genes, but also pathways can be multiplied and diversified to perform related functions in different parts of an organism. Identification of such diversified pathways, or modules, is needed to expand our knowledge of biological processes in plants and to understand how biological functions evolve. However, systematic explorations of modules remain scarce, and no user-friendly platform to identify them exists. We have established a statistical framework to identify modules and show that approximately one-third of the genes of a plant's genome participate in hundreds of multiplied modules. Using this framework as a basis, we implemented a platform that can explore and visualize multiplied modules in coexpression networks of eight plant species. To validate the usefulness of the platform, we identified and functionally characterized pollen-and root-specific cell wall modules that multiplied to confer tip growth in pollen tubes and root hairs, respectively. Furthermore, we identified multiplied modules involved in secondary metabolite synthesis and corroborated them by metabolite profiling of tobacco (Nicotiana tabacum) tissues. The interactive platform, referred to as FamNet, is available at http://www.gene2function.de/famnet.html.

Research paper thumbnail of Glucocorticoid (dexamethasone)-induced metabolome changes in healthy males suggest prediction of response and side effects

Scientific Reports, 2015

Glucocorticoids are indispensable anti-inflammatory and decongestant drugs with high prevalence o... more Glucocorticoids are indispensable anti-inflammatory and decongestant drugs with high prevalence of use at ~0 .9% of the adult population. Better holistic insights into glucocorticoid-induced changes are crucial for effective use as concurrent medication and management of adverse effects. The profiles of 214 metabolites from plasma of 20 male healthy volunteers were recorded prior to and after ingestion of a single dose of 4 mg dexamethasone (+20 mg pantoprazole). Samples were drawn at three predefined time points per day: seven untreated (day 1 midday-day 3 midday) and four treated (day 3 evening-day 4 evening) per volunteer. Statistical analysis revealed tremendous impact of dexamethasone on the metabolome with 150 of 214 metabolites being significantly deregulated on at least one time point after treatment (ANOVA, Benjamini-Hochberg corrected, q < 0.05). Inter-person variability was high and remained uninfluenced by treatment. The clearly visible circadian rhythm prior to treatment was almost completely suppressed and deregulated by dexamethasone. The results draw a holistic picture of the severe metabolic deregulation induced by single-dose, short-term glucocorticoid application. The observed metabolic changes suggest a potential for early detection of severe side effects, raising hope for personalized early countermeasures increasing quality of life and reducing health care costs.

Research paper thumbnail of Temporal kinetics of the transcriptional response to carbon depletion and sucrose readdition in Arabidopsis seedlings

Plant, cell & environment, Jan 19, 2015

To investigate whether the transcriptional response to carbon (C) depletion and sucrose resupply ... more To investigate whether the transcriptional response to carbon (C) depletion and sucrose resupply depends on the duration and severity of the C-depletion, Arabidopsis seedlings were grown in liquid culture and harvested 3, 6, 12, 24, 48 and 72 hours after removing sucrose from the medium, and 30 minutes after resupplying sucrose at each time. Expression profiling revealed early transcriptional inhibition of cell wall synthesis and remodelling of signalling, followed by induction of C-recycling and photosynthesis, and general inhibition of growth. The temporal sequence differed from the published response to progressive exhaustion of C during a night and extended night in vegetatively-growing plants. The response to sucrose readdition was conserved across the C-depletion time course. Intriguingly, the vast majority of rapidly-responding transcripts decreased rather than increased. The majority of transcripts that respond rapidly to sucrose and many transcripts that respond during C-de...

Research paper thumbnail of Storage and Processing of Mass Spectrometry Data

17th International Conference on Database and Expert Systems Applications (DEXA'06), 2006

... 5 Acknowledgements Nigel Hardy, Helen Jenkins, Chris Taylor, Kai Runte and many others for the ArMet and MzData models. ... Beilstein Institute, Logos-Verlag, 2006 (in Press). [7] S. Orchard, H. Hermjakob, P. Binz, C. Hoogland, C. Taylor, W. Zhu, RJ Julian, and R. Apweiler. ...

Research paper thumbnail of Differential metabolic and coexpression networks of plant metabolism

Trends in Plant Science, Jan 16, 2015

Recent analyses have demonstrated that plant metabolic networks do not differ in their structural properties and that genes involved in basic metabolic processes show smaller coexpression than genes involved in specialized metabolism. By contrast, our analysis reveals differences in the structure of plant metabolic networks and patterns of coexpression for genes in (non)specialized metabolism. Here we caution that conclusions concerning the organization of plant metabolism based on network-driven analyses strongly depend on the computational approaches used.

Research paper thumbnail of Analysis of the compartmentalized metabolome - a validation of the non-aqueous fractionation technique

Frontiers in Plant Science, 2011

With the development of high-throughput metabolic technologies, a plethora of primary and secondary compounds have been detected in the plant cell. However, there are still major gaps in our understanding of the plant metabolome. This is especially true with regard to the compartmental localization of these identified metabolites. Non-aqueous fractionation (NAF) is a powerful technique for the determination of subcellular metabolite distributions in eukaryotic cells, and it has become the method of choice to analyze the distribution of a large number of metabolites concurrently. However, the NAF technique produces a continuous gradient of metabolite distributions, not discrete assignments. Resolution of these distributions requires computational analyses based on marker molecules to resolve compartmental localizations. In this article we focus on expanding the computational analysis of data derived from NAF. Along with an experimental workflow, we describe the critical steps in NAF...

Research paper thumbnail of Co-ordination and divergence of cell-specific transcription and translation of genes in arabidopsis root cells

Annals of Botany, 2014

Background and Aims: A key challenge in biology is to systematically investigate and integrate the different levels of information available at the global and single-cell level. Recent studies have elucidated spatiotemporal expression patterns of root cell types in Arabidopsis thaliana, and genome-wide quantification of polysome-associated mRNA levels, i.e. the translatome, has also been obtained for corresponding cell types. Translational control has been increasingly recognized as an important regulatory step in protein synthesis. The aim of this study was to investigate coupled transcription and translation by use of publicly available root datasets.
Methods: Using cell-type-specific datasets of the root transcriptome and translatome of arabidopsis, a systematic assessment was made of the degree of coordination and divergence between these two levels of cellular organization. The computational analysis considered correlation and variation of expression across cell types at both system levels, and also provided insights into the degree of co-regulatory relationships that are preserved between the two processes.
Key Results: The overall correlation of expression and translation levels of genes resembles an almost bimodal distribution (mean/median value of 0.08/0.12), with a second, less strongly pronounced 'mode' for negative Pearson's correlation coefficient values. The analysis conducted also confirms that previously identified key transcriptional activators of secondary cell wall development display highly conserved patterns of transcription and translation across the investigated cell types. Moreover, the biological processes that display conserved and divergent patterns based on the cell-type-specific expression and translation levels were identified.
Conclusions: In agreement with previous studies in animal cells, a large degree of uncoupling was found between the transcriptome and translatome. However, components and processes were also identified that are under coordinated transcriptional and translational control in plant root cells.

Research paper thumbnail of Principal Components Analysis

Methods in Molecular Biology, 2012

Principal components analysis (PCA) is a standard tool in multivariate data analysis to reduce the number of dimensions, while retaining as much as possible of the data's variation. Instead of investigating thousands of original variables, the first few components containing the majority of the data's variation are explored. The visualization and statistical analysis of these new variables, the principal components, can help to find similarities and differences between samples. Important original variables that are the major contributors to the first few components can be discovered as well. This chapter seeks to deliver a conceptual understanding of PCA as well as a mathematical description. We describe how PCA can be used to analyze different datasets, and we include practical code examples. Possible shortcomings of the methodology and ways to overcome these problems are also discussed.
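
To make the description of PCA above concrete, here is a minimal sketch using a singular value decomposition of the column-centered data. It is not taken from the chapter; the function name `pca`, the use of NumPy, and the synthetic data (50 samples, 3 variables driven by one hidden factor) are assumptions for illustration only.

```python
import numpy as np


def pca(data, n_components=2):
    """PCA via SVD. Rows of `data` are samples, columns are variables."""
    centered = data - data.mean(axis=0)            # center each variable
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = vt[:n_components].T                    # variable contributions
    explained = (s ** 2) / np.sum(s ** 2)             # variance fraction per component
    return scores, loadings, explained[:n_components]


# Synthetic data: three variables driven by one hidden factor plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 1))
data = latent @ np.array([[2.0, -1.0, 0.5]]) + 0.05 * rng.normal(size=(50, 3))

scores, loadings, explained = pca(data)
```

Here `scores` holds the new variables (principal components) for each sample, and the rows of `loadings` with the largest absolute values point to the original variables contributing most to each component.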

Research paper thumbnail of Microsoft Word-Supplemental Material31 July2013. docx-226142Supplemental Material. pdf

Research paper thumbnail of An integrated genomic and metabolomic framework for cell wall biology in rice

BMC Genomics, 2014

Background: Plant cell walls are complex structures that fulfill many diverse functions during plant growth and development. It is therefore not surprising that thousands of gene products are involved in cell wall synthesis and maintenance. However, functional association for the majority of these gene products remains obscure. One useful approach to infer biological associations is via transcriptional coordination, or co-expression of genes. This approach has proved useful for several biological processes. Nevertheless, combining co-expression with other large-scale measurements may improve the biological inferences. Results: In this study, we used a combined approach of co-expression and cell wall metabolomics to obtain new insight into cell wall synthesis in rice. We initially created a weighted gene co-expression network from publicly available datasets, and then established a comprehensive cell wall dataset by determining cell wall compositions from 29 tissues that almost cover the whole life cycle of rice. We subsequently combined the datasets through the conversion of co-expressed gene modules into eigenvectors, representing expression profiles for the genes in the modules, and performed comparative analyses against the cell wall contents. Here, we made three major discoveries. First, we confirmed our approach by finding primary and secondary wall cellulose biosynthesis modules, respectively. Second, we found co-expressed modules that strongly correlated with reorganization of the secondary cell walls and with modifications and degradation of hemicellulosic structures. Third, we inferred that at least one module is likely to play a regulatory role in the production of G-rich lignification.
Conclusions: Here, we integrated transcriptomic associations and cell wall metabolism and found that certain co-expressed gene modules are positively correlated with distinct cell wall characteristics. We propose that combining multiple data types, such as coordinated transcription and cell wall analyses, may be a useful approach to glean new insight into biological processes. The combination of multiple datasets, as illustrated here, can further improve the functional inferences that typically are generated via a single type of dataset. In addition, our data extend the typical co-expression approach to allow deeper insight into cell wall biology in rice.

Research paper thumbnail of Data Integration through Proximity-Based Networks Provides Biological Principles of Organization across Scales

The Plant Cell, 2013

Plant behaviors across levels of cellular organization, from biochemical components to tissues and organs, relate and reflect growth habitats. Quantification of the relationship between behaviors captured in various phenotypic characteristics and growth habitats can help reveal molecular mechanisms of plant adaptation. The aim of this article is to introduce the power of using statistics originally developed in the field of geographic variability analysis together with prominent network models in elucidating principles of biological organization. We provide a critical systematic review of the existing statistical and network-based approaches that can be employed to determine patterns of covariation from both uni- and multivariate phenotypic characteristics in plants. We demonstrate that parameter-independent network-based approaches result in robust insights about phenotypic covariation. These insights can be quantified and tested by applying well-established statistics combining th...

Research paper thumbnail of The Choice between MapMan and Gene Ontology for Automated Gene Function Prediction in Plant Science

Frontiers in Genetics, 2012

Since the introduction of the Gene Ontology (GO), the analysis of high-throughput data has become tightly coupled with the use of ontologies to establish associations between knowledge and data in an automated fashion. Ontologies provide a systematic description of knowledge by a controlled vocabulary of defined structure in which ontological concepts are connected by pre-defined relationships. In plant science, MapMan and GO offer two alternatives for ontology-driven analyses. Unlike GO, initially developed to characterize microbial systems, MapMan was specifically designed to cover plant-specific pathways and processes. While the dependencies between concepts in MapMan are modeled as a tree, in GO these are captured in a directed acyclic graph. Therefore, the difference in ontologies may cause discrepancies in data reduction, visualization, and hypothesis generation. Here we provide the first systematic comparative analysis of GO and MapMan for the case of the model plant species Arabidopsis thaliana (Arabidopsis) with respect to their structural properties and difference in distributions of information content. In addition, we investigate the effect of the two ontologies on the specificity and sensitivity of automated gene function prediction via the coupling of co-expression networks and the guilt-by-association principle. Automated gene function prediction is particularly needed for the model plant Arabidopsis in which only half of genes have been functionally annotated based on sequence similarity to known genes. The results highlight the need for structured representation of species-specific biological knowledge, and warrant caution in the design principles employed in future ontologies.

Research paper thumbnail of Compromise of Multiple Time-Resolved Transcriptomics Experiments Identifies Tightly Regulated Functions

Frontiers in Plant Science, 2012

With the advent of high-throughput technologies for data acquisition from different components (i.e., genes, proteins, and metabolites) of a given biological system, generation of hypotheses and biological interpretations based on multivariate data sets becomes increasingly important. These technologies allow for simultaneous gathering of data from the same biological components under different perturbations, including genotypic variation and/or changes in conditions, resulting in so-called multiple data tables. Moreover, these data tables are obtained over a well-chosen time domain to capture the dynamics of the response of the biological system to the perturbation. The computational problem we address in this study is twofold: (1) derive a single data table, referred to as a compromise, which captures information common to the investigated set of multiple tables and (2) identify biological components which contribute most to the determined compromise. Here we argue that recent extensions to principal component analysis called STATIS and dual-STATIS can be used to determine the compromise on which classical techniques for data analysis, such as clustering and term over-enrichment, can be subsequently applied. In addition, we illustrate that STATIS and dual-STATIS facilitate interpretations of a publicly available transcriptomics data set capturing the time-resolved response of Arabidopsis thaliana to changing light and/or temperature conditions. We demonstrate that STATIS and dual-STATIS can be used not only to identify the components of a biological system whose behavior is similarly affected due to the perturbation (e.g., in time or condition), but also to specify the extent to which each dimension of the data tables reflects the perturbation. These findings ultimately provide insights into the components and pathways which could be under tight control in plant systems.
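
The notion of a compromise derived from multiple data tables can be sketched as follows. This follows the textbook STATIS recipe (cross-product matrix per table, RV coefficients between tables, weights from the leading eigenvector) rather than the authors' actual code, which is not shown in the abstract. The function name `statis_compromise`, the Frobenius normalization, and the three synthetic tables are assumptions for illustration.

```python
import numpy as np


def statis_compromise(tables):
    """Compromise of several tables that share the same rows (observations)."""
    cross_products = []
    for x in tables:
        centered = x - x.mean(axis=0)
        s = centered @ centered.T                      # observations x observations
        cross_products.append(s / np.linalg.norm(s))   # Frobenius-normalize
    k = len(cross_products)
    # With normalized matrices, the matrix inner product is the RV coefficient.
    rv = np.array([[np.sum(cross_products[i] * cross_products[j])
                    for j in range(k)] for i in range(k)])
    eigvals, eigvecs = np.linalg.eigh(rv)              # eigenvalues in ascending order
    first = np.abs(eigvecs[:, -1])                     # leading eigenvector, sign fixed
    weights = first / first.sum()                      # table weights summing to one
    compromise = sum(w * s for w, s in zip(weights, cross_products))
    return compromise, weights, rv


# Synthetic example: tables A and B share hidden structure, table C is pure noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(20, 2))
table_a = latent @ np.array([[1.0, 0, 1, 0, 1], [0, 1.0, 0, 1, 0]]) + 0.1 * rng.normal(size=(20, 5))
table_b = latent @ np.array([[1.0, 1, 0, 0], [0, 0, 1.0, 1]]) + 0.1 * rng.normal(size=(20, 4))
table_c = rng.normal(size=(20, 6))

compromise, weights, rv = statis_compromise([table_a, table_b, table_c])
```

Tables that agree with the others receive larger weights, so the noise table contributes least to the compromise. An eigendecomposition of `compromise` would then give the common factor scores on which clustering or enrichment analysis could be run.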

Research paper thumbnail of Decreased nucleotide and expression diversity and modified coexpression patterns characterize domestication in the common bean

Using RNA sequencing technology and de novo transcriptome assembly, we compared representative sets of wild and domesticated accessions of common bean (Phaseolus vulgaris) from Mesoamerica. RNA was extracted at the first true-leaf stage, and de novo assembly was used to develop a reference transcriptome; the final data set consists of ~190,000 single nucleotide polymorphisms from 27,243 contigs in expressed genomic regions. A drastic reduction in nucleotide diversity (~60%) is evident for the domesticated form, compared with the wild form, and almost 50% of the contigs that are polymorphic were brought to fixation by domestication. In parallel, the effects of domestication decreased the diversity of gene expression (18%). While the coexpression networks for the wild and domesticated accessions demonstrate similar seminal network properties, they show distinct community structures that are enriched for different molecular functions. After simulating the demographic dynamics during domestication, we found that 9% of the genes were actively selected during domestication. We also show that selection induced a further reduction in the diversity of gene expression (26%) and was associated with 5-fold enrichment of differentially expressed genes. While there is substantial evidence of positive selection associated with domestication, in a few cases, this selection has increased the nucleotide diversity in the domesticated pool at target loci associated with abiotic stress responses, flowering time, and morphology.

Research paper thumbnail of Concurrent Conditional Clustering of Multiple Networks: COCONETS

PLoS ONE, 2014

The accumulation of high-throughput data from different experiments has facilitated the extraction of condition-specific networks over the same set of biological entities. Comparing and contrasting of such multiple biological networks is at the center of differential network biology, aiming at determining general and condition-specific responses captured in the network structure (i.e., included associations between the network components). We provide a novel way for comparison of multiple networks based on determining network clustering (i.e., partition into communities) which is optimal across the set of networks with respect to a given cluster quality measure. To this end, we formulate the optimization-based problem of concurrent conditional clustering of multiple networks, termed COCONETS, based on the modularity. The solution to this problem is a clustering which depends on all considered networks and pinpoints their preserved substructures. We present theoretical results for special classes of networks to demonstrate the implications of conditionality captured by the COCONETS formulation. As the problem can be shown to be intractable, we extend an existing efficient greedy heuristic and apply it to determine concurrent conditional clusters on coexpression networks extracted from publicly available time-resolved transcriptomics data of Escherichia coli under five stresses as well as on metabolite correlation networks from a metabolomics data set from Arabidopsis thaliana exposed to eight environmental conditions. We demonstrate that the investigation of the differences between the clustering based on all networks and that obtained from a subset of networks can be used to quantify the specificity of biological responses. While a comparison of the Escherichia coli coexpression networks based on seminal properties does not pinpoint biologically relevant differences, the common network substructures extracted by COCONETS are supported by existing experimental evidence. Therefore, the comparison of multiple networks based on concurrent conditional clustering offers a novel avenue for detection and investigation of preserved network substructures.
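
The idea of judging one clustering against several networks at once can be illustrated with a small sketch. It computes the standard Newman modularity of a shared partition on each network and, as an assumption made only for this example, aggregates by taking the worst case (minimum) over networks. The abstract does not spell out the exact COCONETS objective, so `concurrent_score` and the two toy networks are hypothetical.

```python
def modularity(edges, community_of):
    """Newman modularity of a partition on an undirected, unweighted graph."""
    m = len(edges)
    intra = {}        # number of edges inside each community
    degree_sum = {}   # summed node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


def concurrent_score(networks, community_of):
    """Worst-case modularity of one shared partition (assumed aggregation)."""
    return min(modularity(edges, community_of) for edges in networks)


# Two toy networks on the same six nodes: two triangles, linked differently.
triangles = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f")]
network_1 = triangles + [("c", "d")]
network_2 = triangles + [("a", "f"), ("b", "e")]

good = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
bad = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}

good_score = concurrent_score([network_1, network_2], good)
bad_score = concurrent_score([network_1, network_2], bad)
```

The partition that follows the two triangles scores well on both networks, whereas the alternating partition scores poorly, which mirrors the notion of a clustering that captures substructure preserved across networks. A search heuristic, such as the greedy approach mentioned above, would sit on top of a scoring function of this kind.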

Research paper thumbnail of Principal Components Analysis

Everitt/Applied Multivariate Data Analysis, 2001

Research paper thumbnail of Additional role of O-acetylserine as a sulfur status-independent regulator during plant growth

The Plant Journal, 2012

O-acetylserine (OAS) is one of the most prominent metabolites whose levels are altered upon sulfur starvation. However, its putative role as a signaling molecule in higher plants is controversial. This paper provides further evidence that OAS is a signaling molecule, based on computational analysis of time-series experiments and on studies of transgenic plants conditionally displaying increased OAS levels. Transcripts whose levels correlated with the transient and specific increase in OAS levels observed in leaves of Arabidopsis thaliana plants 5-10 min after transfer to darkness and with diurnal oscillation of the OAS content, showing a characteristic peak during the night, were identified. Induction of a serine-O-acetyltransferase gene (SERAT) in transgenic A. thaliana plants expressing the genes under the control of an inducible promoter resulted in a specific time-dependent increase in OAS levels. Monitoring the transcriptome response at time points at which no changes in sulfur-related metabolites except OAS were observed and correlating this with the light/dark transition and diurnal experiments resulted in identification of six genes whose expression was highly correlated with that of OAS (adenosine-5'-phosphosulfate reductase 3, sulfur-deficiency-induced 1, sulfur-deficiency-induced 2, low-sulfur-induced 1, serine hydroxymethyltransferase 7 and ChaC-like protein). These data suggest that OAS displays a signalling function leading to changes in transcript levels of a specific gene set irrespective of the sulfur status of the plant. Additionally, a role for OAS in a specific part of the sulfate response can be deduced.

Research paper thumbnail of High-density kinetic analysis of the metabolomic and transcriptomic response of Arabidopsis to eight environmental conditions

The Plant Journal, 2011

The time-resolved response of Arabidopsis thaliana towards changing light and/or temperature at the transcriptome and metabolome level is presented. Plants grown at 21°C with a light intensity of 150 μE m⁻² sec⁻¹ were either kept at this condition or transferred into seven different environments (4°C, darkness; 21°C, darkness; 32°C, darkness; 4°C, 85 μE m⁻² sec⁻¹; 21°C, 75 μE m⁻² sec⁻¹; 21°C, 300 μE m⁻² sec⁻¹; 32°C, 150 μE m⁻² sec⁻¹). Samples were taken before (0 min) and at 22 time points after transfer resulting in (8×) 22 time points covering both a linear and a logarithmic time series totaling 177 states. Hierarchical cluster analysis shows that individual conditions (defined by temperature and light) diverge into distinct trajectories at condition-dependent times and that the metabolome follows different kinetics from the transcriptome. The metabolic responses are initially relatively faster when compared with the transcriptional responses. Gene Ontology over-representation analysis identifies a common response for all changed conditions at the transcriptome level during the early response phase (5-60 min). Metabolic networks reconstructed via metabolite-metabolite correlations reveal extensive environment-specific rewiring. Detailed analysis identifies conditional connections between amino acids and intermediates of the tricarboxylic acid cycle. Parallel analysis of transcriptional changes strongly supports a model where in the absence of photosynthesis at normal/high temperatures protein degradation occurs rapidly and subsequent amino acid catabolism serves as the main cellular energy supply. These results thus demonstrate the engagement of the electron transfer flavoprotein system under short-term environmental perturbations.
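
Reconstructing networks from metabolite-metabolite correlations and comparing them between conditions can be sketched as below. The threshold of |r| >= 0.8, the function names, and the tiny profiles for three metabolites are assumptions invented for illustration; they are not the study's data or its exact procedure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_network(profiles, threshold=0.8):
    """Edges between metabolites whose |r| over time meets the threshold."""
    return {
        (m1, m2)
        for m1, m2 in combinations(sorted(profiles), 2)
        if abs(pearson(profiles[m1], profiles[m2])) >= threshold
    }


# Hypothetical time courses (arbitrary units) under two conditions.
condition_a = {"glutamate": [1, 2, 3, 4, 5],
               "citrate": [2, 4, 6, 8, 10],
               "sucrose": [3, 5, 1, 4, 2]}
condition_b = {"glutamate": [1, 2, 3, 4, 5],
               "citrate": [3, 5, 1, 4, 2],
               "sucrose": [5, 4, 3, 2, 1]}

edges_a = correlation_network(condition_a)
edges_b = correlation_network(condition_b)
lost = edges_a - edges_b     # connections present only under condition A
gained = edges_b - edges_a   # connections present only under condition B
```

The `lost` and `gained` sets give a simple picture of condition-specific rewiring: the same metabolites are connected differently depending on the environment.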