John Prince - Academia.edu

Papers by John Prince

Insulin increases ceramide synthesis in skeletal muscle

Journal of Diabetes Research, 2014

The purpose of this study was to determine the effect of insulin on ceramide metabolism in skeletal muscle. Skeletal muscle cells were treated with insulin with or without palmitate for various time periods. Lipids (ceramides and TAG) were isolated and gene expression of multiple biosynthetic enzymes was quantified. Additionally, adult male mice received daily insulin injections for 14 days, followed by muscle ceramide analysis. In muscle cells, insulin elicited an increase in ceramides comparable to palmitate alone. This is likely partly due to an insulin-induced increase in expression of multiple enzymes, particularly SPT2, which, when knocked down, prevented the increase in ceramides. In mice, 14 days of insulin injection resulted in increased soleus ceramides, but not TAG. However, insulin injections did significantly increase hepatic TAG compared with vehicle-injected animals. This study suggests that insulin elicits an anabolic effect on sphingolipid metabolism in skeletal mu...

A Comprehensive Protein-protein Interactome for Yeast PAS Kinase 1 Reveals Direct Inhibition of Respiration Through the Phosphorylation of Cbf1

Per-Arnt-Sim (PAS) kinase is a sensory protein kinase required for glucose homeostasis in yeast, mice, and humans, yet little is known about the molecular mechanisms of its function. Using both yeast two-hybrid and copurification approaches, we identified the protein-protein interactome for yeast PAS kinase 1 (Psk1), revealing 93 novel putative protein binding partners. Several of the Psk1 binding partners expand the role of PAS kinase in glucose homeostasis, including new pathways involved in mitochondrial metabolism. In addition, the interactome suggests novel roles for PAS kinase in cell growth (gene/protein expression, replication/cell division, and protein modification and degradation), vacuole function, and stress tolerance. In vitro kinase studies using a subset of 25 of these binding partners identified Mot3, Zds1, Utr1, and Cbf1 as substrates. Further evidence is provided for the in vivo phosphorylation of Cbf1 at T211/T212 and for the subsequent inhibition of respiration. This respiratory role of PAS kinase is consistent with the reported hypermetabolism of PAS kinase-deficient mice, identifying a possible molecular mechanism and solidifying the evolutionary importance of PAS kinase in the regulation of glucose homeostasis.

Metriculator: quality assessment for mass spectrometry-based proteomics

Bioinformatics, Sep 2, 2013

Quality control in mass spectrometry-based proteomics remains subjective, labor-intensive and inconsistent between laboratories. We introduce Metriculator, software designed to facilitate long-term storage of extensive performance metrics as introduced by NIST in 2010. Metriculator features a web interface that generates interactive comparison plots for contextual understanding of metric values and an automated metric generation toolkit. The comparison plots are designed for at-a-glance determination of outliers and trends in the datasets, together with relevant statistical comparisons. Easy-to-use quantitative comparisons and a framework for integration plugins will encourage a culture of quality assurance within the proteomics community.

Rubabel: wrapping Open Babel with Ruby

Journal of Cheminformatics, 2013

Background: The number and diversity of wrappers for chemoinformatic toolkits suggest the diverse needs of the chemoinformatic community. While existing chemoinformatics libraries provide a broad range of utilities, many chemoinformaticians find compiled language libraries intimidating, time-consuming, arcane, and verbose. Although high-level language wrappers have been implemented, more can be done to leverage the intuitiveness of object-orientation, the paradigms of high-level languages, and the extensibility of languages such as Ruby. We introduce Rubabel, an intuitive, object-oriented suite of functionality that substantially increases the accessibility of the tools in the Open Babel chemoinformatics library. Results: Rubabel requires fewer lines of code than any other actively developed wrapper, providing better object organization and navigation, and more intuitive object behavior than extant solutions. Moreover, Rubabel provides a convenient interface to the many extensions currently available in Ruby, greatly streamlining otherwise onerous tasks such as creating web applications that serve up Rubabel functionality. Conclusions: Rubabel is powerful, intuitive, concise, freely available, cross-platform, and easy to install. We expect it to be a platform of choice for new users, Ruby users, and some users of current solutions.

Annotated Chromatographic Isotope Features from a Highly Complex Mass Spectrometry Proteomic Dataset (MOUSE) for Feature Detection Algorithm Evaluation

Annotated chromatographic isotope features from a highly complex mass spectrometry proteomic dataset (MOUSE) for feature detection algorithm evaluation

A coherent mathematical characterization of isotope trace extraction, isotopic envelope extraction, and LC-MS correspondence

BMC Bioinformatics, Jan 23, 2015

Liquid chromatography-mass spectrometry is a popular technique for high-throughput protein, lipid, and metabolite comparative analysis. Such statistical comparison of millions of data points requires the generation of an inter-run correspondence. Though many techniques for generating this correspondence exist, few, if any, address certain well-known run-to-run LC-MS behaviors such as elution order swaps, unbounded retention time swaps, missing data, and significant differences in abundance. Moreover, not all extant correspondence methods leverage the rich discriminating information offered by isotope envelope extraction informed by isotope trace extraction. To date, no attempt has been made to create a formal generalization of extant algorithms for these problems. By enumerating extant objective functions for these problems, we elucidate discrepancies between known LC-MS data behavior and extant approaches. We propose novel objective functions that more closely model known LC-MS beha...

Current controlled vocabularies are insufficient to uniquely map molecular entities to mass spectrometry signal

BMC Bioinformatics, Jan 23, 2015

The comparison of analyte mass spectrometry precursor (MS1) signal is central to many proteomic (and other -omic) workflows. Standard vocabularies for mass spectrometry exist and provide good coverage for most experimental applications, yet are insufficient for concise and unambiguous description of data concepts spanning the range of signal provenance from a molecular perspective (e.g. from charged peptides down to fine isotopes). Without a standard unambiguous nomenclature, literature searches, algorithm reproducibility and algorithm evaluation for MS-omics data processing are nearly impossible. We show how terms from current official ontologies are too vague or ambiguous to explicitly map molecular entities to MS signals and we illustrate the inconsistency and ambiguity of current colloquially used terms. We also propose a set of terms for MS1 signal that uniquely, succinctly and intuitively describe data concepts spanning the range of signal provenance from full molecule down to...

Structures of the Gβ-CCT and PhLP1-Gβ-CCT complexes reveal a mechanism for G-protein β-subunit folding and Gβγ dimer assembly

Proceedings of the National Academy of Sciences of the United States of America, Jan 24, 2015

G-protein signaling depends on the ability of the individual subunits of the G-protein heterotrimer to assemble into a functional complex. Formation of the G-protein βγ (Gβγ) dimer is particularly challenging because it is an obligate dimer in which the individual subunits are unstable on their own. Recent studies have revealed an intricate chaperone system that brings Gβ and Gγ together. This system includes cytosolic chaperonin containing TCP-1 (CCT; also called TRiC) and its cochaperone phosducin-like protein 1 (PhLP1). Two key intermediates in the Gβγ assembly process, the Gβ-CCT and the PhLP1-Gβ-CCT complexes, were isolated and analyzed by a hybrid structural approach using cryo-electron microscopy, chemical cross-linking coupled with mass spectrometry, and unnatural amino acid cross-linking. The structures show that Gβ interacts with CCT in a near-native state through interactions of the Gγ-binding region of Gβ with the CCTγ subunit. PhLP1 binding stabilizes the Gβ fold, disru...

Elucidation and Improvement of Algorithms for Mass Spectrometry Isotope Trace Detection

Mass spectrometry facilitates cutting-edge advancements in many fields. Although instrumentation has advanced dramatically in the last 100 years, data processing algorithms have not kept pace. Without sensitive and accurate signal segmentation algorithms, the utility of mass spectrometry is limited. In this dissertation, we provide an overview and analysis of mass spectrometry data processing. A tutorial to ease the learning curve for those outside the field is provided. We draw attention to the lack of critical evaluation in the field and describe the resulting effects, including a glut of algorithm contributions of questionable novelty. To facilitate increased critical evaluation, we show the importance of a modular paradigm for mass spectrometry data processing by highlighting the impact of data processing algorithm choice upon experimental results. Our novel controlled vocabulary is presented with the aim of facilitating literature reviews for comparisons. We propose a novel nomenclature and mathematical characterization of mass spectrometry data. We present several novel algorithms for mass spectrometry data segmentation that outperform existing standard approaches. We end with an overview of future research that will continue to advance the state of the art in mass spectrometry data processing.

The Genomes, Proteomes, and Structures of Three Novel Phages That Infect the Bacillus cereus Group and Carry Putative Virulence Factors

Journal of Virology, 2014

This article reports the results of studying three novel bacteriophages, JL, Shanette, and Basilisk, which infect the pathogen Bacillus cereus and carry genes that may contribute to its pathogenesis. We analyzed host range and superinfection ability, mapped their genomes, and characterized phage structure by mass spectrometry and transmission electron microscopy (TEM). The JL and Shanette genomes were 96% similar and contained 217 open reading frames (ORFs) and 220 ORFs, respectively, while Basilisk has an unrelated genome containing 138 ORFs. Mass spectrometry revealed 23 phage particle proteins for JL and 15 for Basilisk, while only 11 and 4, respectively, were predicted to be present by sequence analysis. Structural protein homology to well-characterized phages suggested that JL and Shanette were members of the family Myoviridae, which was confirmed by TEM. The third phage, Basilisk, was similar only to uncharacterized phages and is an unrelated siphovirus. Cryogenic electron mi...

Automated structural classification of lipids by machine learning

Bioinformatics (Oxford, England), Jan 29, 2014

Modern lipidomics is largely dependent upon structural ontologies because of the great diversity exhibited in the lipidome, but no automated lipid classification exists to facilitate this partitioning. The size of the putative lipidome far exceeds the number currently classified, despite a decade of work. Automated classification would benefit ongoing classification efforts by decreasing the time needed and increasing the accuracy of classification while providing classifications for mass spectral identification algorithms. We introduce a tool that automates classification into the LIPID MAPS ontology of known lipids with >95% accuracy and novel lipids with 63% accuracy. The classification is based upon simple chemical characteristics and modern machine learning algorithms. The decision trees produced are intelligible and can be used to clarify implicit assumptions about the current LIPID MAPS classification scheme. These characteristics and decision trees are made available to f...

Proteomics, lipidomics, metabolomics: a mass spectrometry tutorial from a computer scientist's point of view

BMC Bioinformatics, 2014

For decades, mass spectrometry data has been analyzed to investigate a wide array of research interests, including disease diagnostics, biological and chemical theory, genomics, and drug development. Progress towards solving any of these disparate problems depends upon overcoming the common challenge of interpreting the large data sets generated. Despite interim successes, many data interpretation problems in mass spectrometry are still challenging. Further, though these challenges are inherently interdisciplinary in nature, the significant domain-specific knowledge gap between disciplines makes interdisciplinary contributions difficult. This paper provides an introduction to the burgeoning field of computational mass spectrometry. We illustrate key concepts, vocabulary, and open problems in MS-omics, as well as provide invaluable resources such as open data sets and key search terms and references. This paper will facilitate contributions from mathematicians, computer scientists, a...

Massifquant: open-source Kalman filter-based XC-MS isotope trace feature detection

Bioinformatics (Oxford, England), Jan 15, 2014

Isotope trace (IT) detection is a fundamental step for liquid or gas chromatography mass spectrometry (XC-MS) data analysis that faces a multitude of technical challenges on complex samples. The Kalman filter (KF) application to IT detection addresses some of these challenges; it discriminates closely eluting ITs in the m/z dimension, flexibly handles heteroscedastic m/z variances and does not bin the m/z axis. Yet, the behavior of this KF application has not been fully characterized, as no cost-free open-source implementation exists and incomplete evaluation standards for IT detection persist. Massifquant is an open-source solution for KF IT detection that has been subjected to novel and rigorous methods of performance evaluation. The presented evaluation with accompanying annotations and optimization guide sets a new standard for comparative IT detection. Compared with centWave, matchedFilter and MZMine2 (alternative IT detection engines), Massifquant detected more true ITs in a real...

The need for a public proteomics repository

Nature Biotechnology, 2004

The Case of the Disappearing Drug Target

Programmed Cell Death Protein 5 Interacts with the Cytosolic Chaperonin Containing Tailless Complex Polypeptide 1 (CCT) to Regulate β-Tubulin Folding

Journal of Biological Chemistry, 2013

Background: Programmed cell death protein 5 (PDCD5) has been proposed to act as a pro-apoptotic factor with tumor suppressor capabilities. Results: PDCD5 forms a complex with the cytosolic chaperonin CCT and inhibits β-tubulin folding. Conclusion: PDCD5 functions as a modulator of CCT to regulate β-tubulin folding. Significance: PDCD5 may exert its pro-apoptotic function by blocking β-tubulin folding.

Programmed cell death protein 5 (PDCD5) has been proposed to act as a pro-apoptotic factor and tumor suppressor. However, the mechanisms underlying its apoptotic function are largely unknown. A proteomics search for binding partners of phosducin-like protein, a co-chaperone for the cytosolic chaperonin containing tailless complex polypeptide 1 (CCT), revealed a robust interaction between PDCD5 and CCT. PDCD5 formed a complex with CCT and β-tubulin, a key CCT-folding substrate, and specifically inhibited β-tubulin folding. Cryo-electron microscopy studies of the PDCD5·CCT complex suggested a possible mechanism of inhibition of β-tubulin folding. PDCD5 bound the apical domain of the CCTβ subunit, projecting above the folding cavity without entering it. Like PDCD5, β-tubulin also interacts with the CCTβ apical domain, but a second site is found at the sensor loop deep within the folding cavity. These orientations of PDCD5 and β-tubulin suggest that PDCD5 sterically interferes with β-tubulin binding to the CCTβ apical domain and inhibits β-tubulin folding. Given the importance of tubulins in cell division and proliferation, PDCD5 might exert its apoptotic function at least in part through inhibition of β-tubulin folding.

A fundamental question in biology is how proteins, which are synthesized by the ribosome as a linear sequence of amino acids, fold into their native functional state. It is now clear that many proteins require the assistance of molecular chaperones to maneuver through the folding process. Molecular chaperones are themselves proteins that protect newly synthesized or unfolded proteins from aggregation and help them reach their native state in the very concentrated protein environment of the cell (1). One important class of molecular chaperones is the chaperonins, which are large multisubunit complexes that form stacked double-ring structures with a central cavity in each ring. These cavities provide an isolated environment for client proteins to bind and fold (2, 3). Each subunit consists of three domains as follows: an equatorial domain that binds and hydrolyzes ATP, an apical domain that traps substrates, and an intermediate domain that connects the two other domains and facilitates interdomain communication (3). There are two types of chaperonins. Group I chaperonins are found in bacteria (i.e. GroEL from Escherichia coli), mitochondria, and chloroplasts (Hsp60). Their ring structures are composed of seven identical subunits that bind and hydrolyze ATP. Binding of ATP is coordinated with encapsulation of substrates within the folding cavity by a co-chaperone called GroES in E. coli and Hsp10 in eukaryotes (1, 2). The group II chaperonins are found in archaebacteria (named thermosomes) and in the eukaryotic cytosol (CCT, cytosolic chaperonin containing tailless complex polypeptide 1, also called TRiC). CCT is the most complex of all the chaperonins with each of the two rings composed of eight paralogous subunits that orchestrate the folding of many proteins, with the most abundant substrates being actins and tubulins (3). CCT substrates tend to have complex domain topologies and range up to ~70 kDa in size (4, 5). Nascent polypeptides or denatured proteins bind inside the folding cavity to regions of both the equatorial and apical domains of the CCT subunits (6). The process of ATP binding and hydrolysis induces dramatic conformational changes in the apical domains that result in closure of the folding cavity by finger-like

Mass spectrometry of the M. smegmatis proteome: Protein expression levels correlate with function, operons, and codon bias

Genome Research, 2005

The fast-growing bacterium Mycobacterium smegmatis is a model mycobacterial system, a nonpathogenic soil bacterium that nonetheless shares many features with the pathogenic Mycobacterium tuberculosis, the causative agent of tuberculosis. The study of M. smegmatis is expected to shed light on mechanisms of mycobacterial growth and complex lipid metabolism, and provides a tractable system for antimycobacterial drug development. Although the M. smegmatis genome sequence is not yet completed, we used multidimensional chromatography and tandem mass spectrometry, in combination with the partially completed genome sequence, to detect and identify a total of 901 distinct proteins from M. smegmatis over the course of 25 growth conditions, providing experimental annotation for many predicted genes with an ∼5% false-positive identification rate. We observed numerous proteins involved in energy production (9.8% of expressed proteins), protein translation (8.7%), and lipid biosynthesis (5.4%); 3...

AICAR inhibits ceramide biosynthesis in skeletal muscle

Diabetology & Metabolic Syndrome, 2012

Background: The worldwide prevalence of obesity has led to increased efforts to find therapies to treat obesity-related pathologies. Ceramide is a well-established mediator of several health problems that arise from adipose tissue expansion. The purpose of this study was to determine whether AICAR, an AMPK-activating drug, selectively reduces skeletal muscle ceramide synthesis. Methods: Murine myotubes and rats were challenged with palmitate and high-fat diet, respectively, to induce ceramide accrual, in the absence or presence of AICAR. Transcript levels of the rate-limiting enzyme in ceramide biosynthesis, serine palmitoyltransferase 2 (SPT2), were measured, in addition to lipid analysis. Student's t-test and ANOVA were used to assess the association between outcomes and groups. Results: Palmitate alone induced an increase in serine palmitoyltransferase 2 (SPT2) expression and an elevation of ceramide levels in myotubes. Co-incubation with palmitate and AICAR prevented both effects. ...

LC-MS alignment in theory and practice: a comprehensive algorithmic review

Briefings in Bioinformatics, 2013

Liquid chromatography-mass spectrometry is widely used for comparative replicate sample analysis in proteomics, lipidomics and metabolomics. Before statistical comparison, registration must be established to match corresponding analytes from run to run. Alignment, the most popular correspondence approach, consists of constructing a function that warps the content of runs to most closely match a given reference sample. To date, dozens of correspondence algorithms have been proposed, creating a daunting challenge for practitioners in algorithm selection. Yet, existing reviews have highlighted only a few approaches. In this review, we describe 50 correspondence algorithms to facilitate practical algorithm selection. We elucidate the motivation for correspondence and analyze the limitations of current approaches, which include prohibitive runtimes, numerous user parameters, model limitations and the need for reference samples. We suggest and describe a paradigm shift for overcoming current correspondence limitations by building on known liquid chromatography-mass spectrometry behavior.

Controlling for confounding variables in MS-omics protocol: why modularity matters

Briefings in Bioinformatics, 2013

As the field of bioinformatics research continues to grow, more and more novel techniques are proposed to meet new challenges and to improve upon solutions to long-standing problems. These include data processing techniques and wet lab protocol techniques. Although the literature is consistently thorough in experimental detail and variable-controlling rigor for wet lab protocol techniques, bioinformatics techniques tend to be less thoroughly described and less controlled. As the validation or rejection of hypotheses rests on the experiment's ability to isolate and measure a variable of interest, we stress the importance of reducing confounding variables in bioinformatics techniques during mass spectrometry experimentation.

Research paper thumbnail of Insulin increases ceramide synthesis in skeletal muscle

Journal of diabetes research, 2014

The purpose of this study was to determine the effect of insulin on ceramide metabolism in skelet... more The purpose of this study was to determine the effect of insulin on ceramide metabolism in skeletal muscle. Skeletal muscle cells were treated with insulin with or without palmitate for various time periods. Lipids (ceramides and TAG) were isolated and gene expression of multiple biosynthetic enzymes were quantified. Additionally, adult male mice received daily insulin injections for 14 days, followed by muscle ceramide analysis. In muscle cells, insulin elicited an increase in ceramides comparable to palmitate alone. This is likely partly due to an insulin-induced increase in expression of multiple enzymes, particularly SPT2, which, when knocked down, prevented the increase in ceramides. In mice, 14 days of insulin injection resulted in increased soleus ceramides, but not TAG. However, insulin injections did significantly increase hepatic TAG compared with vehicle-injected animals. This study suggests that insulin elicits an anabolic effect on sphingolipid metabolism in skeletal mu...

Research paper thumbnail of A Comprehensive Protein-protein Interactome for Yeast PAS Kinase 1 Reveals Direct Inhibition of Respiration Through the Phosphorylation of Cbf1

Per-Arnt-Sim (PAS) kinase is a sensory protein kinase required for glucose homeostasis in yeast, ... more Per-Arnt-Sim (PAS) kinase is a sensory protein kinase required for glucose homeostasis in yeast, mice, and humans, yet little is known about the molecular mechanisms of its function. Using both yeast two-hybrid and copurification approaches, we identified the protein-protein interactome for yeast PAS kinase 1 (Psk1), revealing 93 novel putative protein binding partners. Several of the Psk1 binding partners expand the role of PAS kinase in glucose homeostasis, including new pathways involved in mitochondrial metabolism. In addition, the interactome suggests novel roles for PAS kinase in cell growth (gene/protein expression, replication/cell division, and protein modification and degradation), vacuole function, and stress tolerance. In vitro kinase studies using a subset of 25 of these binding partners identified Mot3, Zds1, Utr1, and Cbf1 as substrates. Further evidence is provided for the in vivo phosphorylation of Cbf1 at T211/T212 and for the subsequent inhibition of respiration. This respiratory role of PAS kinase is consistent with the reported hypermetabolism of PAS kinase-deficient mice, identifying a possible molecular mechanism and solidifying the evolutionary importance of PAS kinase in the regulation of glucose homeostasis.

Research paper thumbnail of Metriculator: quality assessment for mass spectrometry-based proteomics

Bioinformatics, Sep 2, 2013

Quality control in mass spectrometry-based proteomics remains subjective, labor-intensive and inc... more Quality control in mass spectrometry-based proteomics remains subjective, labor-intensive and inconsistent between laboratories. We introduce Metriculator, a software designed to facilitate long-term storage of extensive performance metrics as introduced by NIST in 2010. Metriculator features a web interface that generates interactive comparison plots for contextual understanding of metric values and an automated metric generation toolkit. The comparison plots are designed for at-a-glance determination of outliers and trends in the datasets, together with relevant statistical comparisons. Easyto-use quantitative comparisons and a framework for integration plugins will encourage a culture of quality assurance within the proteomics community.

Research paper thumbnail of Rubabel: wrapping open Babel with Ruby

Journal of Cheminformatics, 2013

Background: The number and diversity of wrappers for chemoinformatic toolkits suggests the divers... more Background: The number and diversity of wrappers for chemoinformatic toolkits suggests the diverse needs of the chemoinformatic community. While existing chemoinformatics libraries provide a broad range of utilities, many chemoinformaticians find compiled language libraries intimidating, time-consuming, arcane, and verbose. Although high-level language wrappers have been implemented, more can be done to leverage the intuitiveness of object-orientation, the paradigms of high-level languages, and the extensibility of languages such as Ruby. We introduce Rubabel, an intuitive, object-oriented suite of functionality that substantially increases the accessibily of the tools in the Open Babel chemoinformatics library. Results: Rubabel requires fewer lines of code than any other actively developed wrapper, providing better object organization and navigation, and more intuitive object behavior than extant solutions. Moreover, Rubabel provides a convenient interface to the many extensions currently available in Ruby, greatly streamlining otherwise onerous tasks such as creating web applications that serve up Rubabel functionality. Conclusions: Rubabel is powerful, intuitive, concise, freely available, cross-platform, and easy to install. We expect it to be a platform of choice for new users, Ruby users, and some users of current solutions.

Research paper thumbnail of Annotated Chromatographic Isotope Features from a Highly Complex Mass Spectrometry Proteomic Dataset (MOUSE) for Feature Detection Algorithm Evaluation

Annotated chromatographic isotope features from a highly complex mass spectrometry proteomic data... more Annotated chromatographic isotope features from a highly complex mass spectrometry proteomic dataset (MOUSE) for feature detection algorithm evaluation

Research paper thumbnail of A coherent mathematical characterization of isotope trace extraction, isotopic envelope extraction, and LC-MS correspondence

BMC bioinformatics, Jan 23, 2015

Liquid chromatography-mass spectrometry is a popular technique for high-throughput protein, lipid... more Liquid chromatography-mass spectrometry is a popular technique for high-throughput protein, lipid, and metabolite comparative analysis. Such statistical comparison of millions of data points requires the generation of an inter-run correspondence. Though many techniques for generating this correspondence exist, few if any, address certain well-known run-to-run LC-MS behaviors such as elution order swaps, unbounded retention time swaps, missing data, and significant differences in abundance. Moreover, not all extant correspondence methods leverage the rich discriminating information offered by isotope envelope extraction informed by isotope trace extraction. To date, no attempt has been made to create a formal generalization of extant algorithms for these problems. By enumerating extant objective functions for these problems, we elucidate discrepancies between known LC-MS data behavior and extant approaches. We propose novel objective functions that more closely model known LC-MS beha...

Current controlled vocabularies are insufficient to uniquely map molecular entities to mass spectrometry signal

BMC bioinformatics, Jan 23, 2015

The comparison of analyte mass spectrometry precursor (MS1) signal is central to many proteomic (and other -omic) workflows. Standard vocabularies for mass spectrometry exist and provide good coverage for most experimental applications, yet they are insufficient for concise and unambiguous description of data concepts spanning the range of signal provenance from a molecular perspective (e.g. from charged peptides down to fine isotopes). Without a standard unambiguous nomenclature, literature searches, algorithm reproducibility and algorithm evaluation for MS-omics data processing are nearly impossible. We show how terms from current official ontologies are too vague or ambiguous to explicitly map molecular entities to MS signals and we illustrate the inconsistency and ambiguity of current colloquially used terms. We also propose a set of terms for MS1 signal that uniquely, succinctly and intuitively describe data concepts spanning the range of signal provenance from full molecule down to...

Structures of the Gβ-CCT and PhLP1-Gβ-CCT complexes reveal a mechanism for G-protein β-subunit folding and Gβγ dimer assembly

Proceedings of the National Academy of Sciences of the United States of America, Jan 24, 2015

G-protein signaling depends on the ability of the individual subunits of the G-protein heterotrimer to assemble into a functional complex. Formation of the G-protein βγ (Gβγ) dimer is particularly challenging because it is an obligate dimer in which the individual subunits are unstable on their own. Recent studies have revealed an intricate chaperone system that brings Gβ and Gγ together. This system includes cytosolic chaperonin containing TCP-1 (CCT; also called TRiC) and its cochaperone phosducin-like protein 1 (PhLP1). Two key intermediates in the Gβγ assembly process, the Gβ-CCT and the PhLP1-Gβ-CCT complexes, were isolated and analyzed by a hybrid structural approach using cryo-electron microscopy, chemical cross-linking coupled with mass spectrometry, and unnatural amino acid cross-linking. The structures show that Gβ interacts with CCT in a near-native state through interactions of the Gγ-binding region of Gβ with the CCTγ subunit. PhLP1 binding stabilizes the Gβ fold, disru...

Elucidation and Improvement of Algorithms for Mass Spectrometry Isotope Trace Detection

Mass spectrometry facilitates cutting-edge advancements in many fields. Although instrumentation has advanced dramatically in the last 100 years, data processing algorithms have not kept pace. Without sensitive and accurate signal segmentation algorithms, the utility of mass spectrometry is limited. In this dissertation, we provide an overview and analysis of mass spectrometry data processing. A tutorial to ease the learning curve for those outside the field is provided. We draw attention to the lack of critical evaluation in the field and describe the resulting effects, including a glut of algorithm contributions of questionable novelty. To facilitate increased critical evaluation, we show the importance of a modular paradigm for mass spectrometry data processing by highlighting the impact of data processing algorithm choice upon experimental results. Our novel controlled vocabulary is presented with the aim of facilitating literature reviews for comparisons. We propose a novel nomenclature and mathematical characterization of mass spectrometry data. We present several novel algorithms for mass spectrometry data segmentation that outperform existing standard approaches. We end with an overview of future research that will continue to advance the state of the art in mass spectrometry data processing.

The Genomes, Proteomes, and Structures of Three Novel Phages That Infect the Bacillus cereus Group and Carry Putative Virulence Factors

Journal of Virology, 2014

This article reports the results of studying three novel bacteriophages, JL, Shanette, and Basilisk, which infect the pathogen Bacillus cereus and carry genes that may contribute to its pathogenesis. We analyzed host range and superinfection ability, mapped their genomes, and characterized phage structure by mass spectrometry and transmission electron microscopy (TEM). The JL and Shanette genomes were 96% similar and contained 217 open reading frames (ORFs) and 220 ORFs, respectively, while Basilisk has an unrelated genome containing 138 ORFs. Mass spectrometry revealed 23 phage particle proteins for JL and 15 for Basilisk, while only 11 and 4, respectively, were predicted to be present by sequence analysis. Structural protein homology to well-characterized phages suggested that JL and Shanette were members of the family Myoviridae, which was confirmed by TEM. The third phage, Basilisk, was similar only to uncharacterized phages and is an unrelated siphovirus. Cryogenic electron mi...

Automated structural classification of lipids by machine learning

Bioinformatics (Oxford, England), Jan 29, 2014

Modern lipidomics is largely dependent upon structural ontologies because of the great diversity exhibited in the lipidome, but no automated lipid classification exists to facilitate this partitioning. The size of the putative lipidome far exceeds the number currently classified, despite a decade of work. Automated classification would benefit ongoing classification efforts by decreasing the time needed and increasing the accuracy of classification while providing classifications for mass spectral identification algorithms. We introduce a tool that automates classification into the LIPID MAPS ontology of known lipids with >95% accuracy and novel lipids with 63% accuracy. The classification is based upon simple chemical characteristics and modern machine learning algorithms. The decision trees produced are intelligible and can be used to clarify implicit assumptions about the current LIPID MAPS classification scheme. These characteristics and decision trees are made available to f...

Proteomics, lipidomics, metabolomics: a mass spectrometry tutorial from a computer scientist's point of view

BMC bioinformatics, 2014

For decades, mass spectrometry data has been analyzed to investigate a wide array of research interests, including disease diagnostics, biological and chemical theory, genomics, and drug development. Progress towards solving any of these disparate problems depends upon overcoming the common challenge of interpreting the large data sets generated. Despite interim successes, many data interpretation problems in mass spectrometry are still challenging. Further, though these challenges are inherently interdisciplinary in nature, the significant domain-specific knowledge gap between disciplines makes interdisciplinary contributions difficult. This paper provides an introduction to the burgeoning field of computational mass spectrometry. We illustrate key concepts, vocabulary, and open problems in MS-omics, as well as provide invaluable resources such as open data sets and key search terms and references. This paper will facilitate contributions from mathematicians, computer scientists, a...

Massifquant: open-source Kalman filter-based XC-MS isotope trace feature detection

Bioinformatics (Oxford, England), Jan 15, 2014

Isotope trace (IT) detection is a fundamental step for liquid or gas chromatography mass spectrometry (XC-MS) data analysis that faces a multitude of technical challenges on complex samples. The Kalman filter (KF) application to IT detection addresses some of these challenges; it discriminates closely eluting ITs in the m/z dimension, flexibly handles heteroscedastic m/z variances and does not bin the m/z axis. Yet, the behavior of this KF application has not been fully characterized, as no cost-free open-source implementation exists and incomplete evaluation standards for IT detection persist. Massifquant is an open-source solution for KF IT detection that has been subjected to novel and rigorous methods of performance evaluation. The presented evaluation with accompanying annotations and optimization guide sets a new standard for comparative IT detection. Compared with centWave, matchedFilter and MZMine2 (alternative IT detection engines), Massifquant detected more true ITs in a real...
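As a rough illustration of the idea in the abstract above, the following sketch tracks a single m/z trace across scans with a scalar Kalman filter. It is not Massifquant's implementation: the constant-m/z state model, the function name `kalman_track`, the process noise `q`, and the gating threshold `gate` are all assumptions chosen for this example. Per-point measurement variances stand in for the heteroscedastic m/z variances the abstract mentions, and no m/z binning is involved.

```python
def kalman_track(mz_values, variances, q=1e-8, gate=3.0):
    """Track one m/z trace across scans with a scalar Kalman filter.

    mz_values: centroid m/z observed in successive scans (hypothetical input).
    variances: measurement variance of each centroid (may differ per point).
    q:         assumed process noise added at every scan.
    gate:      a point is rejected if it lies more than `gate` standard
               deviations of the innovation away from the prediction.
    Returns the final m/z estimate, its variance, and the accepted indices.
    """
    x, p = mz_values[0], variances[0]
    accepted = [0]
    for k in range(1, len(mz_values)):
        p_pred = p + q                       # predict: m/z assumed constant
        s = p_pred + variances[k]            # innovation variance for this point
        innov = mz_values[k] - x
        if innov * innov > gate * gate * s:  # outside the gate: not this trace
            p = p_pred
            continue
        gain = p_pred / s                    # Kalman gain
        x = x + gain * innov                 # update the m/z estimate
        p = (1.0 - gain) * p_pred            # update its variance
        accepted.append(k)
    return x, p, accepted


# Four centroids; the third belongs to a neighbouring trace about 0.01 m/z away.
mz = [500.2501, 500.2503, 500.2600, 500.2499]
var = [4e-8, 4e-8, 4e-8, 4e-8]
estimate, variance, kept = kalman_track(mz, var)
```

With these made-up values the third centroid falls outside the gate and is left out of the trace, which is the sense in which a filter of this kind can separate traces that are close in m/z.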

The need for a public proteomics repository

Nature Biotechnology, 2004

The Case of the Disappearing Drug Target

Programmed Cell Death Protein 5 Interacts with the Cytosolic Chaperonin Containing Tailless Complex Polypeptide 1 (CCT) to Regulate β-Tubulin Folding

Journal of Biological Chemistry, 2013

Background: Programmed cell death protein 5 (PDCD5) has been proposed to act as a pro-apoptotic factor with tumor suppressor capabilities. Results: PDCD5 forms a complex with the cytosolic chaperonin CCT and inhibits β-tubulin folding. Conclusion: PDCD5 functions as a modulator of CCT to regulate β-tubulin folding. Significance: PDCD5 may exert its pro-apoptotic function by blocking β-tubulin folding. Programmed cell death protein 5 (PDCD5) has been proposed to act as a pro-apoptotic factor and tumor suppressor. However, the mechanisms underlying its apoptotic function are largely unknown. A proteomics search for binding partners of phosducin-like protein, a co-chaperone for the cytosolic chaperonin containing tailless complex polypeptide 1 (CCT), revealed a robust interaction between PDCD5 and CCT. PDCD5 formed a complex with CCT and β-tubulin, a key CCT-folding substrate, and specifically inhibited β-tubulin folding. Cryo-electron microscopy studies of the PDCD5·CCT complex suggested a possible mechanism of inhibition of β-tubulin folding. PDCD5 bound the apical domain of the CCTβ subunit, projecting above the folding cavity without entering it. Like PDCD5, β-tubulin also interacts with the CCTβ apical domain, but a second site is found at the sensor loop deep within the folding cavity. These orientations of PDCD5 and β-tubulin suggest that PDCD5 sterically interferes with β-tubulin binding to the CCTβ apical domain and inhibits β-tubulin folding. Given the importance of tubulins in cell division and proliferation, PDCD5 might exert its apoptotic function at least in part through inhibition of β-tubulin folding.
A fundamental question in biology is how proteins, which are synthesized by the ribosome as a linear sequence of amino acids, fold into their native functional state. It is now clear that many proteins require the assistance of molecular chaperones to maneuver through the folding process. Molecular chaperones are themselves proteins that protect newly synthesized or unfolded proteins from aggregation and help them reach their native state in the very concentrated protein environment of the cell (1). One important class of molecular chaperones is the chaperonins, which are large multisubunit complexes that form stacked double-ring structures with a central cavity in each ring. These cavities provide an isolated environment for client proteins to bind and fold (2, 3). Each subunit consists of three domains as follows: an equatorial domain that binds and hydrolyzes ATP, an apical domain that traps substrates, and an intermediate domain that connects the two other domains and facilitates interdomain communication (3). There are two types of chaperonins. Group I chaperonins are found in bacteria (e.g. GroEL from Escherichia coli), mitochondria, and chloroplasts (Hsp60). Their ring structures are composed of seven identical subunits that bind and hydrolyze ATP. Binding of ATP is coordinated with encapsulation of substrates within the folding cavity by a co-chaperone called GroES in E. coli and Hsp10 in eukaryotes (1, 2). The group II chaperonins are found in archaebacteria (named thermosomes) and in the eukaryotic cytosol (CCT, cytosolic chaperonin containing tailless complex polypeptide 1, also called TRiC). CCT is the most complex of all the chaperonins with each of the two rings composed of eight paralogous subunits that orchestrate the folding of many proteins, with the most abundant substrates being actins and tubulins (3). CCT substrates tend to have complex domain topologies and range up to ~70 kDa in size (4, 5). Nascent polypeptides or denatured proteins bind inside the folding cavity to regions of both the equatorial and apical domains of the CCT subunits (6). The process of ATP binding and hydrolysis induces dramatic conformational changes in the apical domains that result in closure of the folding cavity by finger-like

Mass spectrometry of the M. smegmatis proteome: Protein expression levels correlate with function, operons, and codon bias

Genome Research, 2005

The fast-growing bacterium Mycobacterium smegmatis is a model mycobacterial system, a nonpathogenic soil bacterium that nonetheless shares many features with the pathogenic Mycobacterium tuberculosis, the causative agent of tuberculosis. The study of M. smegmatis is expected to shed light on mechanisms of mycobacterial growth and complex lipid metabolism, and provides a tractable system for antimycobacterial drug development. Although the M. smegmatis genome sequence is not yet completed, we used multidimensional chromatography and tandem mass spectrometry, in combination with the partially completed genome sequence, to detect and identify a total of 901 distinct proteins from M. smegmatis over the course of 25 growth conditions, providing experimental annotation for many predicted genes with an ∼5% false-positive identification rate. We observed numerous proteins involved in energy production (9.8% of expressed proteins), protein translation (8.7%), and lipid biosynthesis (5.4%); 3...

AICAR inhibits ceramide biosynthesis in skeletal muscle

Diabetology & Metabolic Syndrome, 2012

Background: The worldwide prevalence of obesity has led to increased efforts to find therapies to treat obesity-related pathologies. Ceramide is a well-established mediator of several health problems that arise from adipose tissue expansion. The purpose of this study was to determine whether AICAR, an AMPK-activating drug, selectively reduces skeletal muscle ceramide synthesis. Methods: Murine myotubes and rats were challenged with palmitate and high-fat diet, respectively, to induce ceramide accrual, in the absence or presence of AICAR. Transcript levels of the rate-limiting enzyme in ceramide biosynthesis, serine palmitoyltransferase 2 (SPT2), were measured, in addition to lipid analysis. Student’s t-test and ANOVA were used to assess the association between outcomes and groups. Results: Palmitate alone induced an increase in serine palmitoyltransferase 2 (SPT2) expression and an elevation of ceramide levels in myotubes. Co-incubation with palmitate and AICAR prevented both effects. ...

LC-MS alignment in theory and practice: a comprehensive algorithmic review

Briefings in Bioinformatics, 2013

Liquid chromatography-mass spectrometry is widely used for comparative replicate sample analysis in proteomics, lipidomics and metabolomics. Before statistical comparison, registration must be established to match corresponding analytes from run to run. Alignment, the most popular correspondence approach, consists of constructing a function that warps the content of runs to most closely match a given reference sample. To date, dozens of correspondence algorithms have been proposed, creating a daunting challenge for practitioners in algorithm selection. Yet, existing reviews have highlighted only a few approaches. In this review, we describe 50 correspondence algorithms to facilitate practical algorithm selection. We elucidate the motivation for correspondence and analyze the limitations of current approaches, which include prohibitive runtimes, numerous user parameters, model limitations and the need for reference samples. We suggest and describe a paradigm shift for overcoming current correspondence limitations by building on known liquid chromatography-mass spectrometry behavior.
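To make the notion of a warping function in the abstract above concrete, here is a minimal sketch of one simple form it can take: piecewise-linear interpolation between anchor pairs of retention times. It is not one of the reviewed algorithms; the helper name `make_warp`, the anchor values, and the assumption that anchor retention times in the run are distinct are all made up for illustration.

```python
from bisect import bisect_right


def make_warp(anchors):
    """Build a piecewise-linear warp from (run_rt, reference_rt) anchor pairs.

    Assumes at least two anchors with distinct run retention times.
    """
    pts = sorted(anchors)
    if len(pts) < 2:
        raise ValueError("need at least two anchor pairs")
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]

    def warp(rt):
        # Find the segment containing rt; outside the anchors, extend the
        # first or last segment linearly.
        i = min(max(bisect_right(xs, rt) - 1, 0), len(xs) - 2)
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        return ys[i] + slope * (rt - xs[i])

    return warp


# Hypothetical anchors: analytes matched between a run and the reference (minutes).
warp = make_warp([(10.0, 12.0), (20.0, 21.0), (40.0, 43.0)])
aligned = [warp(rt) for rt in (10.0, 15.0, 30.0)]
```

Because a function of this kind maps each retention time to exactly one reference time and the example anchors are increasing, it preserves elution order, which illustrates why alignment-style methods can struggle with elution order swaps between runs.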

Controlling for confounding variables in MS-omics protocol: why modularity matters

Briefings in Bioinformatics, 2013

As the field of bioinformatics research continues to grow, more and more novel techniques are proposed to meet new challenges and to improve upon solutions to long-standing problems. These include data processing techniques and wet lab protocol techniques. Although the literature is consistently thorough in experimental detail and variable-controlling rigor for wet lab protocol techniques, bioinformatics techniques tend to be less described and less controlled. As the validation or rejection of hypotheses rests on the experiment's ability to isolate and measure a variable of interest, we stress the importance of reducing confounding variables in bioinformatics techniques during mass spectrometry experimentation.