Shirley Pepke - Academia.edu
Papers by Shirley Pepke
Journal of Visualized Experiments
Differential gene expression analysis is an important technique for understanding disease states. The machine learning algorithm CorEx has shown utility in analyzing differential expression of groups of genes in tumor RNA-seq in a way that may be helpful for advancing precision oncology. However, CorEx produces many factors that can be challenging to analyze and connect to existing understanding. To facilitate such connections, we have built a website, CorExplorer, that allows users to interactively explore the data and answer common questions related to its analysis. We trained CorEx on RNA-seq gene expression data for four tumor types: ovarian, lung, melanoma, and colorectal. We then incorporated corresponding survival, protein-protein interactions, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichments, and heatmaps into the website for association with the factor graph visualization. Here we employ example protocols to illustrate use of the database for comprehending the significance of the learned tumor factors in the context of this external data.
SSRN Electronic Journal
We consider a portfolio-based approach to financing ovarian cancer therapeutics in which multiple candidates are funded within a single structure. Twenty-five potential early-stage drug development projects were identified for inclusion in a hypothetical portfolio through interviews with gynecological oncologists and leading experts, a review of ovarian cancer-related trials registered in the ClinicalTrials.gov database, and an extensive literature review. The annualized returns of this portfolio were simulated under a purely private sector structure both with and without partial funding from philanthropic grants, and a public-private partnership that included government guarantees. We find that public-private structures of this type can increase expected returns and reduce tail risk, allowing greater amounts of private sector capital to fund early-stage research and development.
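The kind of simulation this abstract describes can be illustrated with a small Monte Carlo sketch. It is not the authors' model: apart from the 25 projects, every number here (success probability, cost, payoff, horizon, and the `guarantee_floor` fraction) is a hypothetical placeholder, projects are assumed independent, and the cost of providing the guarantee is ignored.

```python
import random
import statistics


def simulate_portfolio(n_projects=25, p_success=0.10, cost=1.0, payoff=15.0,
                       years=10, guarantee_floor=0.0, n_trials=2000, seed=0):
    """Return simulated annualized returns, one per trial.

    All parameter values are illustrative assumptions, not figures from the paper.
    """
    rng = random.Random(seed)
    invested = n_projects * cost
    returns = []
    for _ in range(n_trials):
        # Each project succeeds independently with probability p_success.
        successes = sum(rng.random() < p_success for _ in range(n_projects))
        proceeds = successes * payoff
        # Hypothetical guarantee: investors recover at least this fraction
        # of the invested capital.
        proceeds = max(proceeds, guarantee_floor * invested)
        # Annualize the total return over the holding period.
        returns.append((proceeds / invested) ** (1.0 / years) - 1.0)
    return returns


def percentile_5(values):
    """Crude 5th-percentile estimate, used here as a tail-risk measure."""
    ordered = sorted(values)
    return ordered[int(0.05 * len(ordered))]


base = simulate_portfolio(guarantee_floor=0.0)
guaranteed = simulate_portfolio(guarantee_floor=0.5)
print("mean annualized return:", statistics.mean(base), statistics.mean(guaranteed))
print("5th percentile:", percentile_5(base), percentile_5(guaranteed))
```

Because both runs share a seed, they see the same project outcomes, so any difference in the printed statistics comes only from the assumed guarantee.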
Novel Therapeutics for Ovarian Cancer
BMC Medical Genomics
Background: De novo inference of clinically relevant gene function relationships from tumor RNA-seq remains a challenging task. Current methods typically either partition patient samples into a few subtypes or rely upon analysis of pairwise gene correlations that will miss some groups in noisy data. Leveraging higher dimensional information can be expected to increase the power to discern targetable pathways, but this is commonly thought to be an intractable computational problem. Methods: In this work we adapt a recently developed machine learning algorithm for sensitive detection of complex gene relationships. The algorithm, CorEx, efficiently optimizes over multivariate mutual information and can be iteratively applied to generate a hierarchy of relatively independent latent factors. The learned latent factors are used to stratify patients for survival analysis with respect to both single factors and combinations. These analyses are performed and interpreted in the context of biological function annotations and protein network interactions that might be utilized to match patients to multiple therapies. Results: Analysis of ovarian tumor RNA-seq samples demonstrates the algorithm's power to infer well over one hundred biologically interpretable gene cohorts, several times more than standard methods such as hierarchical clustering and k-means. The CorEx factor hierarchy is also informative, with related but distinct gene clusters grouped by upper nodes. Some latent factors correlate with patient survival, including one for a pathway connected with the epithelial-mesenchymal transition in breast cancer that is regulated by a microRNA that modulates epigenetics. Further, combinations of factors lead to a synergistic survival advantage in some cases.
Conclusions: In contrast to studies that attempt to partition patients into a small number of subtypes (typically 4 or fewer) for treatment purposes, our approach utilizes subgroup information for combinatoric transcriptional phenotyping. Considering only the 66 gene expression groups that both have significant Gene Ontology enrichment and are small enough to indicate specific drug targets implies a computational phenotype for ovarian cancer that allows for 3^66 possible patient profiles, enabling truly personalized treatment. The findings here demonstrate a new technique that sheds light on the complexity of gene expression dependencies in tumors and could eventually enable the use of patient RNA-seq profiles for selection of personalized and effective cancer treatments.
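The size of the profile space quoted above follows from simple counting. The sketch below assumes, as the figure of 3^66 suggests, that each of the 66 groups takes one of three discrete states per patient; the abstract does not say what the three states are, so the function and constant names are illustrative only.

```python
# Assumption: each of the 66 gene expression groups is in one of three
# discrete states per patient, and the groups are counted independently.
N_GROUPS = 66
N_STATES = 3
MAX_SUBTYPES = 4  # "typically 4 or fewer" subtypes in conventional partitioning


def profile_count(n_groups: int = N_GROUPS, n_states: int = N_STATES) -> int:
    """Number of distinct profiles when each group independently takes one state."""
    return n_states ** n_groups


count = profile_count()
print(f"combinatoric profiles: {count:.2e}")
print(f"ratio to a {MAX_SUBTYPES}-subtype partition: {count / MAX_SUBTYPES:.2e}")
```

The count is on the order of 10^31, which is why the abstract contrasts it with partitioning patients into a handful of subtypes.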
Physical Review E Statistical Physics Plasmas Fluids and Related Interdisciplinary Topics, 1994
We study the predictability of large events in self-organizing systems. We focus on a set of models which have been studied as analogs of earthquake faults and fault systems, and apply methods based on techniques which are of current interest in seismology. In all cases we find detectable correlations between precursory smaller events and the large events we aim to forecast. We compare predictions based on different patterns of precursory events and find that for all of the models a new precursor based on the spatial distribution of activity outperforms more traditional measures based on temporal variations in the local activity.
Ca2+/calmodulin dependent protein kinase II (CaMKII) is a dodecameric serine/threonine protein kinase that is an essential component of the molecular mechanisms underlying learning and memory. Mice lacking both copies of the gene for the alpha subunit of CaMKII cannot perform spatial learning tasks; heterozygotes have behavioral phenotypes that resemble schizophrenia in humans. CaMKII is activated upon binding of Ca2+/calmodulin (CaM), which is itself a Ca2+-activated protein that binds four Ca2+ ions, two on its carboxyl (C) and two on its amino terminus (N). The major source of Ca2+ for activation of CaMKII at synapses is Ca2+ influx through the NMDA-type glutamate receptor in the postsynaptic dendritic spine. Strong activation of CaMKII by NMDA receptors initiates a series of molecular modifications in the spine that enhance the strength of the synapse. Here we present two kinetic models of activation of monomeric catalytic subunits of CaMKII (mCaMKII) that include binding of Ca2...
Changes in the strength of synaptic connections in the brain underlie our ability to form memories and to learn. One type of experimentally induced change in synaptic strength, long term potentiation (LTP), is dependent upon the activation of Ca2+/calmodulin dependent protein kinase II (CaMKII). CaMKII is a serine/threonine protein kinase that constitutes 1-2% of all brain protein by weight. It is activated by binding of Ca2+/calmodulin (Ca2+/CaM), which binds up to four Ca2+ ions upon Ca2+ flux through the NMDA receptor in postsynaptic dendritic spines. CaM removes the inhibitory domain of the kinase from the catalytic site and allows CaMKII to autophosphorylate itself and phosphorylate its substrates. CaMKII is activated within milliseconds of Ca2+ influx and can remain active for tens of minutes afterward. It is an early and essential molecular component of the complex signal transduction processes that underlie LTP. Here we present a thermodynamically complete model of activ...
Physical review. B, Condensed matter, Jan 15, 1990
We observe a sharp feature in the ultra-low-temperature magnetoconductivity of degenerately doped Ge:Sb at H ~ 25 kOe, which is robust up to at least three times the critical density for the insulator-metal transition. This field corresponds to a low-energy scale characteristic of the special nature of antimony donors in germanium. Its presence and sensitivity to uniaxial stress confirm the notion of metallic impurity bands in doped germanium.
Highlights
• Hundreds of lncRNAs are dynamically expressed during reprogramming
• Early reprogramming events include activation of Ras signaling pathways
• lncRNAs activated during reprogramming can repress lineage-specific genes
• lncRNAs activated in multiple reprogramming cell types regulate metabolism
Journal of Molecular Evolution, 2007
We examine the impact of likelihood surface characteristics on phylogenetic inference. Amino acid data sets simulated from topologies with branch length features chosen to represent varying degrees of difficulty for likelihood maximization are analyzed. We present situations where the tree found to achieve the global maximum in likelihood is often not equal to the true tree. We use the program covSEARCH to demonstrate how the use of adaptively sized pools of candidate trees that are updated using confidence tests results in solution sets that are highly likely to contain the true tree. This approach requires more computation than traditional maximum likelihood methods, hence covSEARCH is best suited to small to medium-sized alignments or large alignments with some constrained nodes. The majority rule consensus tree computed from the confidence sets also proves to be different from the generating topology. Although low phylogenetic signal in the input alignment can result in large confidence sets of trees, some biological information can still be obtained based on nodes that exhibit high support within the confidence set. Two real data examples are analyzed: mammal mitochondrial proteins and a small tubulin alignment. We conclude that the technique of confidence set optimization can significantly improve the robustness of phylogenetic inference at a reasonable computational cost. Additionally, when either very short internal branches or very long terminal branches are present, confident resolution of specific bipartitions or subtrees, rather than whole-tree phylogenies, may be the most realistic goal for phylogenetic methods.
Journal of Geophysical Research, 1994
We present results for long term and intermediate term prediction algorithms applied to a simple mechanical model of a fault. We use long term prediction methods based, for example, on the distribution of repeat times between large events to establish a benchmark for predictability in the model. In comparison, intermediate term prediction techniques, analogous to the pattern recognition algorithms CN and M8 introduced and studied by Keilis-Borok et al., are more effective at predicting coming large events. We consider the implications of several different quality functions Q which can be used to optimize the algorithms with respect to features such as space, time, and magnitude windows, and find that our results are not overly sensitive to variations in these algorithm parameters. We also study the intrinsic uncertainties which are associated with seismicity catalogs of restricted lengths.
Genome Research, 2013
We tested whether self-organizing maps (SOMs) could be used to effectively integrate, visualize, and mine diverse genomics data types, including complex chromatin signatures. A fine-grained SOM was trained on 72 ChIP-seq histone modifications and DNase-seq data sets from six biologically diverse cell lines studied by The ENCODE Project Consortium. We mined the resulting SOM to identify chromatin signatures related to sequence-specific transcription factor occupancy, sequence motif enrichment, and biological functions. To highlight clusters enriched for specific functions such as transcriptional promoters or enhancers, we overlaid onto the map additional data sets not used during training, such as ChIP-seq, RNA-seq, CAGE, and information on cis-acting regulatory modules from the literature. We used the SOM to parse known transcriptional enhancers according to the cell-type-specific chromatin signature, and we further corroborated this pattern on the map by EP300 (also known as p300) ...
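For readers unfamiliar with self-organizing maps, the following is a minimal textbook-style training loop, not the software used in the study. The grid size, learning-rate and neighborhood schedules, and the toy two-dimensional data are all assumptions chosen for illustration; in the setting described above each input vector would instead hold one genomic segment's signal across the 72 data sets.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest to x (squared Euclidean)."""
    return min(weights,
               key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])   # pull node toward the input
            step += 1
    return weights


# Toy data: two loose clusters in the unit square (hypothetical, for illustration).
_rng = random.Random(1)
toy = ([[_rng.uniform(0.0, 0.2), _rng.uniform(0.0, 0.2)] for _ in range(30)]
       + [[_rng.uniform(0.8, 1.0), _rng.uniform(0.8, 1.0)] for _ in range(30)])
som = train_som(toy)
print(best_matching_unit(som, [0.1, 0.1]), best_matching_unit(som, [0.9, 0.9]))
```

Overlaying data not used in training, as the abstract describes, would amount to looking up the best matching unit for each new item and annotating that node.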
Developmental Biology, 2010
Ca2+/calmodulin-dependent protein kinase II (CaMKII) is important in LTP induction and memory formation. Ca2+ entering through NMDA receptors activates CaMKII through calmodulin. CaMKII is necessary for normal synaptic plasticity and activates many downstream pathways. CaMKII function is fine-tuned through interaction with other proteins, autophosphorylation, and inter-subunit regulation. We combine computational modelling and simulations with biochemical experiments in order to understand CaMKII regulation. Modelling synaptic proteins poses three kinds of problems: small molecule numbers, large numbers of possible states, and complex geometries. Small molecule numbers ...
Cis-regulatory modules (CRMs) function by binding sequence specific transcription factors, but the relationship between in vivo physical binding and the regulatory capacity of factor-bound DNA elements remains uncertain. We investigate this relationship for the well-studied Twist factor in Drosophila melanogaster embryos by analyzing genome-wide factor occupancy and testing the functional significance of Twist occupied regions and motifs within regions. Twist ChIP-seq data efficiently identified previously studied Twist-dependent CRMs and robustly predicted new CRM activity in transgenesis, with newly identified Twist-occupied regions supporting diverse spatiotemporal patterns (>74% positive, n = 31). Some, but not all, candidate CRMs require Twist for proper expression in the embryo. The Twist motifs most favored in genome ChIP data (in vivo) differed from those most favored by Systematic Evolution of Ligands by EXponential enrichment (SELEX) (in vitro). Furthermore, the majority of ChIP-seq signals could be parsimoniously explained by a CABVTG motif located within 50 bp of the ChIP summit and, of these, CACATG was most prevalent. Mutagenesis experiments demonstrated that different Twist E-box motif types are not fully interchangeable, suggesting that the ChIP-derived consensus (CABVTG) includes sites having distinct regulatory outputs. Further analysis of position, frequency of occurrence, and sequence conservation revealed significant enrichment and conservation of CABVTG E-box motifs near Twist ChIP-seq signal summits, preferential conservation of ±150 bp surrounding Twist occupied summits, and enrichment of GA- and CA-repeat sequences near Twist occupied summits. Our results show that high resolution in vivo occupancy data can be used to drive efficient discovery and dissection of global and local cis-regulatory logic.
[Supplemental material is available for this article. The microarray data from this study have been submitted to the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) under accession no. GSE26285, and the sequence data from this study have been submitted to the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi) under accession no. SRA027330.]
During the acquisition of memories, influx of Ca2+ into the postsynaptic spine through the pores of activated N-methyl-D-aspartate-type glutamate receptors triggers processes that change the strength of excitatory synapses. The pattern of Ca2+ influx during the first few seconds of activity is interpreted within the Ca2+-dependent signaling network such that synaptic strength is eventually either potentiated or depressed. Many of the critical signaling enzymes that control synaptic plasticity, including Ca2+/calmodulin-dependent protein kinase II (CaMKII), are regulated by calmodulin, a small protein that can bind up to 4 Ca2+ ions. As a first step toward clarifying how the Ca2+-signaling network decides between potentiation or depression, we have created a kinetic model of the interactions of Ca2+, calmodulin, and CaMKII that represents our best understanding of the dynamics of these interactions under conditions that resemble those in a postsynaptic spine. We constrained parameters of the model from data in the literature, or from our own measurements, and then predicted time courses of activation and autophosphorylation of CaMKII under a variety of conditions. Simulations showed that species of calmodulin with fewer than four bound Ca2+ play a significant role in activation of CaMKII in the physiological regime, supporting the notion that processing of Ca2+ signals in a spine involves competition among target enzymes for binding to unsaturated species of CaM in an environment in which the concentration of Ca2+ is fluctuating rapidly. Indeed, we showed that dependence of activation on the frequency of Ca2+ transients arises from the kinetics of interaction of fluctuating Ca2+ with calmodulin/CaMKII complexes.
We used parameter sensitivity analysis to identify which parameters will be most beneficial to measure more carefully to improve the accuracy of predictions. This model provides a quantitative base from which to build more complex dynamic models of postsynaptic signal transduction during learning.
Genome-wide measurements of protein-DNA interactions and transcriptomes are increasingly done by deep DNA sequencing methods (ChIP-seq and RNA-seq). The power and richness of these counting-based measurements comes at the cost of routinely handling tens to hundreds of millions of reads. While early adopters necessarily developed their own custom computer code to analyze the first ChIP-seq and RNA-seq datasets, a new generation of more sophisticated algorithms and software tools is emerging to assist in the analysis phase of these projects. This review describes the multilayered analyses of ChIP-seq and RNA-seq datasets, discusses the software packages currently available to perform tasks at each layer, and describes some upcoming challenges and features for future analysis tools. We also discuss how software choices and uses are affected by specific aspects of the underlying biology and data structure, including genome size, positional clustering of transcription factor binding sites, transcript discovery, and expression quantification.
Journal of Visualized Experiments
Differential gene expression analysis is an important technique for understanding disease states.... more Differential gene expression analysis is an important technique for understanding disease states. The machine learning algorithm CorEx has shown utility in analyzing differential expression of groups of genes in tumor RNA-seq in a way that may be helpful for advancing precision oncology. However, CorEx produces many factors that can be challenging to analyze and connect to existing understanding. To facilitate such connections, we have built a website, CorExplorer, that allows users to interactively explore the data and answer common questions related to its analysis. We trained CorEx on RNA-seq gene expression data for four tumor types: ovarian, lung, melanoma, and colorectal. We then incorporated corresponding survival, protein-protein interactions, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichments, and heatmaps into the website for association with the factor graph visualization. Here we employ example protocols to illustrate use of the database for comprehending the significance of the learned tumor factors in the context of this external data.
SSRN Electronic Journal
We consider a portfolio-based approach to financing ovarian cancer therapeutics in which multiple... more We consider a portfolio-based approach to financing ovarian cancer therapeutics in which multiple candidates are funded within a single structure. Twenty-five potential early-stage drug development projects were identified for inclusion in a hypothetical portfolio through interviews with gynecological oncologists and leading experts, a review of ovarian cancer-related trials registered in the ClinicalTrials.gov database, and an extensive literature review. The annualized returns of this portfolio were simulated under a purely private sector structure both with and without partial funding from philanthropic grants, and a public-private partnership that included government guarantees. We find that public-private structures of this type can increase expected returns and reduce tail risk, allowing greater amounts of private sector capital to fund early-stage research and development.
Novel Therapeutics for Ovarian Cancer
BMC Medical Genomics
Background: De novo inference of clinically relevant gene function relationships from tumor RNA-s... more Background: De novo inference of clinically relevant gene function relationships from tumor RNA-seq remains a challenging task. Current methods typically either partition patient samples into a few subtypes or rely upon analysis of pairwise gene correlations that will miss some groups in noisy data. Leveraging higher dimensional information can be expected to increase the power to discern targetable pathways, but this is commonly thought to be an intractable computational problem. Methods: In this work we adapt a recently developed machine learning algorithm for sensitive detection of complex gene relationships. The algorithm, CorEx, efficiently optimizes over multivariate mutual information and can be iteratively applied to generate a hierarchy of relatively independent latent factors. The learned latent factors are used to stratify patients for survival analysis with respect to both single factors and combinations. These analyses are performed and interpreted in the context of biological function annotations and protein network interactions that might be utilized to match patients to multiple therapies. Results: Analysis of ovarian tumor RNA-seq samples demonstrates the algorithm's power to infer well over one hundred biologically interpretable gene cohorts, several times more than standard methods such as hierarchical clustering and k-means. The CorEx factor hierarchy is also informative, with related but distinct gene clusters grouped by upper nodes. Some latent factors correlate with patient survival, including one for a pathway connected with the epithelial-mesenchymal transition in breast cancer that is regulated by a microRNA that modulates epigenetics. Further, combinations of factors lead to a synergistic survival advantage in some cases. 
Conclusions: In contrast to studies that attempt to partition patients into a small number of subtypes (typically 4 or fewer) for treatment purposes, our approach utilizes subgroup information for combinatoric transcriptional phenotyping. Considering only the 66 gene expression groups that are found to both have significant Gene Ontology enrichment and are small enough to indicate specific drug targets implies a computational phenotype for ovarian cancer that allows for 3 66 possible patient profiles, enabling truly personalized treatment. The findings here demonstrate a new technique that sheds light on the complexity of gene expression dependencies in tumors and could eventually enable the use of patient RNA-seq profiles for selection of personalized and effective cancer treatments.
Physical Review E Statistical Physics Plasmas Fluids and Related Interdisciplinary Topics, 1994
We study the predictability of large events in self-organizing systems. We focus on a set of mode... more We study the predictability of large events in self-organizing systems. We focus on a set of models which have been studied as analogs of earthquake faults and fault systems, and apply methods based on techniques which are of current interest in seismology. In all cases we find detectable correlations between precursory smaller events and the large events we aim to forecast. We compare predictions based on different patterns of precursory events and find that for all of the models a new precursor based on the spatial distribution of activity outperforms more traditional measures based on temporal variations in the local activity.
Ca2+/calmodulin dependent protein kinase II (CaMKII) is a dodecameric serine/threonine protein ki... more Ca2+/calmodulin dependent protein kinase II (CaMKII) is a dodecameric serine/threonine protein kinase that is an essential component of the molecular mechanisms underlying learning and memory. Mice lacking both copies of the gene for the alpha subunit of CaMKII cannot perform spatial learning tasks; heterozygotes have behavioral phenotypes that resemble schizophrenia in humans. CaMKII is activated upon binding of Ca2+/calmodulin (CaM), which is itself a Ca2+-activated protein that binds four Ca2+ ions, two on its carboxyl (C) and two on its amino terminus (N). The major source of Ca2+ for activation of CaMKII at synapses is Ca2+ influx through the NMDA-type glutamate receptor in the postsynaptic dendritic spine. Strong activation of CaMKII by NMDA receptors initiates a series of molecular modifications in the spine that enhance the strength of the synapse. Here we present two kinetic models of activation of monomeric catalytic subunits of CaMKII (mCaMKII) that include binding of Ca2...
Changes in the strength of synaptic connections in the brain underlie our ability to form memorie... more Changes in the strength of synaptic connections in the brain underlie our ability to form memories and to learn. One type of experimentally induced change in synaptic strength, long term potentiation (LTP), is dependent upon the activation of Ca2+/calmodulin dependent protein kinase II (CaMKII). CaMKII is a serine/threonine protein kinase that constitutes 1-2% of all brain protein by weight. It is activated by binding of the Ca2+/calmodulin (Ca2+/CaM), which binds up to four Ca2+ ions upon Ca2+ flux through the NMDA receptor in postsynaptic dendritic spines. CaM removes the inhibitory domain of the kinase from the catalytic site and allows CaMKII to autophosphorylate itself and phosphorylate its substrates. CaMKII is activated within milliseconds of Ca2+ influx and can remain active for tens of minutes afterward. It is an early and essential molecular component of the complex signal transduction processes that underlie LTP. Here we present a thermodynamically complete model of activ...
Physical review. B, Condensed matter, Jan 15, 1990
We observe a sharp feature in the ultra-low-temperature magnetoconductivity of degenerately doped... more We observe a sharp feature in the ultra-low-temperature magnetoconductivity of degenerately doped Ge:Sb at H-25 kOe, which is robust up to at least three times the critical density for the insulator-metal transition. This field corresponds to a low-energy scale characteristic of the special nature of antimony donors in germanium. Its presence and sensitivity to uniaxial stress confirm the notion of metallic impurity bands in doped germanium.
Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics, 1994
We study the predictability of large events in self-organizing systems. We focus on a set of mode... more We study the predictability of large events in self-organizing systems. We focus on a set of models which have been studied as analogs of earthquake faults and fault systems, and apply methods based on techniques which are of current interest in seismology. In all cases we find detectable correlations between precursory smaller events and the large events we aim to forecast. We compare predictions based on different patterns of precursory events and find that for all of the models a new precursor based on the spatial distribution of activity outperforms more traditional measures based on temporal variations in the local activity.
Highlights d Hundreds of lncRNAs are dynamically expressed during reprogramming d Early reprogram... more Highlights d Hundreds of lncRNAs are dynamically expressed during reprogramming d Early reprogramming events include activation of Ras signaling pathways d lncRNAs activated during reprogramming can repress lineage-specific genes d lncRNAs activated in multiple reprogramming cell types regulate metabolism
Journal of Molecular Evolution, 2007
We examine the impact of likelihood surface characteristics on phylogenetic inference. Amino acid... more We examine the impact of likelihood surface characteristics on phylogenetic inference. Amino acid data sets simulated from topologies with branch length features chosen to represent varying degrees of difficulty for likelihood maximization are analyzed. We present situations where the tree found to achieve the global maximum in likelihood is often not equal to the true tree. We use the program cov-SEARCH to demonstrate how the use of adaptively sized pools of candidate trees that are updated using confidence tests results in solution sets that are highly likely to contain the true tree. This approach requires more computation than traditional maximum likelihood methods, hence covSEARCH is best suited to small to medium-sized alignments or large alignments with some constrained nodes. The majority rule consensus tree computed from the confidence sets also proves to be different from the generating topology. Although low phylogenetic signal in the input alignment can result in large confidence sets of trees, some biological information can still be obtained based on nodes that exhibit high support within the confidence set. Two real data examples are analyzed: mammal mitochondrial proteins and a small tubulin alignment. We conclude that the technique of confidence set optimization can significantly improve the robustness of phylogenetic inference at a reasonable computational cost. Additionally, when either very short internal branches or very long terminal branches are present, confident resolution of specific bipartitions or subtrees, rather than wholetree phylogenies, may be the most realistic goal for phylogenetic methods.
Journal of Geophysical Research, 1994
We present results for long-term and intermediate-term prediction algorithms applied to a simple mechanical model of a fault. We use long-term prediction methods based, for example, on the distribution of repeat times between large events to establish a benchmark for predictability in the model. In comparison, intermediate-term prediction techniques, analogous to the pattern recognition algorithms CN and M8 introduced and studied by Keilis-Borok et al., are more effective at predicting coming large events. We consider the implications of several different quality functions Q which can be used to optimize the algorithms with respect to features such as space, time, and magnitude windows, and find that our results are not overly sensitive to variations in these algorithm parameters. We also study the intrinsic uncertainties which are associated with seismicity catalogs of restricted lengths.
Genome Research, 2013
We tested whether self-organizing maps (SOMs) could be used to effectively integrate, visualize, and mine diverse genomics data types, including complex chromatin signatures. A fine-grained SOM was trained on 72 ChIP-seq histone modifications and DNase-seq data sets from six biologically diverse cell lines studied by The ENCODE Project Consortium. We mined the resulting SOM to identify chromatin signatures related to sequence-specific transcription factor occupancy, sequence motif enrichment, and biological functions. To highlight clusters enriched for specific functions such as transcriptional promoters or enhancers, we overlaid onto the map additional data sets not used during training, such as ChIP-seq, RNA-seq, CAGE, and information on cis-acting regulatory modules from the literature. We used the SOM to parse known transcriptional enhancers according to the cell-type-specific chromatin signature, and we further corroborated this pattern on the map by EP300 (also known as p300) ...
Developmental Biology, 2010
Ca2+/calmodulin-dependent protein kinase II (CaMKII) is important in LTP induction and memory formation. Ca2+ entering through NMDA receptors activates CaMKII through calmodulin. CaMKII is necessary for normal synaptic plasticity and activates many downstream pathways. CaMKII function is fine-tuned through interaction with other proteins, autophosphorylation, and inter-subunit regulation. We combine computational modelling and simulations with biochemical experiments in order to understand CaMKII regulation. Modelling synaptic proteins poses three kinds of problems: small molecule numbers, large numbers of possible states, and complex geometries. Small molecule numbers
Cis-regulatory modules (CRMs) function by binding sequence-specific transcription factors, but the relationship between in vivo physical binding and the regulatory capacity of factor-bound DNA elements remains uncertain. We investigate this relationship for the well-studied Twist factor in Drosophila melanogaster embryos by analyzing genome-wide factor occupancy and testing the functional significance of Twist-occupied regions and motifs within regions. Twist ChIP-seq data efficiently identified previously studied Twist-dependent CRMs and robustly predicted new CRM activity in transgenesis, with newly identified Twist-occupied regions supporting diverse spatiotemporal patterns (>74% positive, n = 31). Some, but not all, candidate CRMs require Twist for proper expression in the embryo. The Twist motifs most favored in genome ChIP data (in vivo) differed from those most favored by Systematic Evolution of Ligands by EXponential enrichment (SELEX) (in vitro). Furthermore, the majority of ChIP-seq signals could be parsimoniously explained by a CABVTG motif located within 50 bp of the ChIP summit and, of these, CACATG was most prevalent. Mutagenesis experiments demonstrated that different Twist E-box motif types are not fully interchangeable, suggesting that the ChIP-derived consensus (CABVTG) includes sites having distinct regulatory outputs. Further analysis of position, frequency of occurrence, and sequence conservation revealed significant enrichment and conservation of CABVTG E-box motifs near Twist ChIP-seq signal summits, preferential conservation of ±150 bp surrounding Twist-occupied summits, and enrichment of GA- and CA-repeat sequences near Twist-occupied summits. Our results show that high-resolution in vivo occupancy data can be used to drive efficient discovery and dissection of global and local cis-regulatory logic.
[Supplemental material is available for this article. The microarray data from this study have been submitted to the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) under accession no. GSE26285, and the sequence data from this study have been submitted to the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi) under accession no. SRA027330.]
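The motif criterion in the abstract above (a CABVTG E-box within 50 bp of a ChIP summit) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `motifs_near_summit`, the 0-based coordinates, and the choice to measure distance from the motif midpoint are assumptions made only for illustration. The IUPAC codes themselves are standard (B = C, G, or T; V = A, C, or G).

```python
import re

# Standard IUPAC nucleotide codes needed for CABVTG.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "B": "[CGT]", "V": "[ACG]"}


def motif_regex(motif):
    """Compile a degenerate motif into a regex; the lookahead keeps overlapping hits."""
    return re.compile("(?=(" + "".join(IUPAC[ch] for ch in motif) + "))")


def motifs_near_summit(seq, summit, motif="CABVTG", window=50):
    """Return (start, matched_sequence) for motif hits near a summit.

    Assumptions for this sketch: `summit` and `start` are 0-based positions
    in `seq`, and a hit counts as "within `window` bp" when the motif
    midpoint is at most `window` bases from the summit.
    CABVTG is its own reverse complement as a degenerate pattern,
    so scanning one strand finds sites on both strands.
    """
    hits = []
    for m in motif_regex(motif).finditer(seq.upper()):
        start = m.start(1)
        center = start + len(motif) // 2
        if abs(center - summit) <= window:
            hits.append((start, m.group(1)))
    return hits
```

A call such as `motifs_near_summit(sequence, summit_position)` returns the qualifying hits in left-to-right order; tallying the matched sequences across many summits would show which E-box variant is most frequent in a given data set.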
During the acquisition of memories, influx of Ca2+ into the postsynaptic spine through the pores of activated N-methyl-D-aspartate-type glutamate receptors triggers processes that change the strength of excitatory synapses. The pattern of Ca2+ influx during the first few seconds of activity is interpreted within the Ca2+-dependent signaling network such that synaptic strength is eventually either potentiated or depressed. Many of the critical signaling enzymes that control synaptic plasticity, including Ca2+/calmodulin-dependent protein kinase II (CaMKII), are regulated by calmodulin, a small protein that can bind up to 4 Ca2+ ions. As a first step toward clarifying how the Ca2+-signaling network decides between potentiation or depression, we have created a kinetic model of the interactions of Ca2+, calmodulin, and CaMKII that represents our best understanding of the dynamics of these interactions under conditions that resemble those in a postsynaptic spine. We constrained parameters of the model from data in the literature, or from our own measurements, and then predicted time courses of activation and autophosphorylation of CaMKII under a variety of conditions. Simulations showed that species of calmodulin with fewer than four bound Ca2+ play a significant role in activation of CaMKII in the physiological regime, supporting the notion that processing of Ca2+ signals in a spine involves competition among target enzymes for binding to unsaturated species of CaM in an environment in which the concentration of Ca2+ is fluctuating rapidly. Indeed, we showed that dependence of activation on the frequency of Ca2+ transients arises from the kinetics of interaction of fluctuating Ca2+ with calmodulin/CaMKII complexes.
We used parameter sensitivity analysis to identify which parameters will be most beneficial to measure more carefully to improve the accuracy of predictions. This model provides a quantitative base from which to build more complex dynamic models of postsynaptic signal transduction during learning.
Genome-wide measurements of protein-DNA interactions and transcriptomes are increasingly done by deep DNA sequencing methods (ChIP-seq and RNA-seq). The power and richness of these counting-based measurements come at the cost of routinely handling tens to hundreds of millions of reads. While early adopters necessarily developed their own custom computer code to analyze the first ChIP-seq and RNA-seq datasets, a new generation of more sophisticated algorithms and software tools is emerging to assist in the analysis phase of these projects. This review describes the multilayered analyses of ChIP-seq and RNA-seq datasets, discusses the software packages currently available to perform tasks at each layer, and describes some upcoming challenges and features for future analysis tools. We also discuss how software choices and uses are affected by specific aspects of the underlying biology and data structure, including genome size, positional clustering of transcription factor binding sites, transcript discovery, and expression quantification.