Aidan Daly | University of Oxford
Papers by Aidan Daly
arXiv (Cornell University), Apr 1, 2021
Non-negative Matrix Factorization (NMF) methods offer an appealing unsupervised learning method for real-time analysis of streaming spectral data in time-sensitive data collection, such as in situ characterization of materials. However, canonical NMF methods are optimized to reconstruct a full dataset as closely as possible, with no underlying requirement that the reconstruction produces components or weights representative of the true physical processes. In this work, we demonstrate how constraining NMF weights or components, provided as known or assumed priors, can provide significant improvement in revealing true underlying phenomena. We present a PyTorch-based method for efficiently applying constrained NMF and demonstrate this on several synthetic examples. When applied to streaming experimentally measured spectral data, an expert researcher-in-the-loop can provide and dynamically adjust the constraints. This set of interactive priors to the NMF model can, for example, contain known or identified independent components, as well as functional expectations about the mixing of components. We demonstrate this application on measured X-ray diffraction and pair distribution function data from in situ beamline experiments. Details of the method are described, and general guidance is provided for employing constrained NMF to extract critical information and insights during in situ and high-throughput experiments.
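The constrained-NMF idea can be sketched with plain NumPy multiplicative updates; the paper's actual implementation is PyTorch-based, so this is only an illustrative stand-in. Here one component row is pinned to a known prior spectrum while the other is learned freely; the two-Gaussian synthetic dataset and all names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "spectra": two Gaussian components mixed across 50 frames.
x = np.linspace(0, 1, 200)
comp_true = np.stack([np.exp(-(x - 0.3) ** 2 / 0.002),
                      np.exp(-(x - 0.7) ** 2 / 0.005)])
w_true = np.stack([np.linspace(1, 0, 50), np.linspace(0, 1, 50)]).T
V = w_true @ comp_true + 0.01 * rng.random((50, 200))

def constrained_nmf(V, k, known=None, n_iter=500, eps=1e-9):
    """Multiplicative-update NMF in which rows of H listed in `known`
    are frozen to prior spectra instead of being updated."""
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    known = known or {}
    for i, h in known.items():
        H[i] = h
    for _ in range(n_iter):
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        H_new = H * (W.T @ V) / (W.T @ W @ H + eps)
        for i in range(k):
            if i not in known:  # constraint: leave prior rows untouched
                H[i] = H_new[i]
    return W, H

# Constrain component 0 to the known spectrum; only component 1 is learned.
W, H = constrained_nmf(V, k=2, known={0: comp_true[0]})
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Freezing a row of H is the simplest form of the priors described above; the interactive workflow would amount to updating the `known` dictionary between fits as new components are identified.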
Mathematical modeling has been instrumental to the development of natural sciences over the last half-century. Through iterated interactions between modeling and real-world experimentation, these models have furthered our understanding of the processes in biology and chemistry that they seek to represent. In certain application domains, such as the field of cardiac biology, communities of modelers with common interests have emerged, leading to the development of many models that attempt to explain the same or similar phenomena. As these communities have developed, however, reporting standards for modeling studies have been inconsistent, often focusing on the final parameterized result, and obscuring the assumptions and data used during their creation. These practices make it difficult for researchers to adapt existing models to new systems or newly available data, and also to assess the identifiability of said models — the degree to which their optimal parameters are constrained...
Bioinformatics, 2021
Motivation: Registration of histology images from multiple sources is a pressing problem in large-scale studies of spatial -omics data. Researchers often perform ‘common coordinate registration’, akin to segmentation, in which samples are partitioned based on tissue type to allow for quantitative comparison of similar regions across samples. Accuracy in such registration requires both high image resolution and global awareness, which mark a difficult balancing act for contemporary deep learning architectures. Results: We present a novel convolutional neural network (CNN) architecture that combines (i) a local classification CNN that extracts features from image patches sampled sparsely across the tissue surface and (ii) a global segmentation CNN that operates on these extracted features. This hybrid network can be trained in an end-to-end manner, and we demonstrate its relative merits over competing approaches on a reference histology dataset as well as two published spatial transcriptomics...
Registration of histology images from multiple sources is a pressing problem in large-scale studies of spatial -omics data. Researchers often perform “common coordinate registration,” akin to segmentation, in which samples are partitioned based on tissue type to allow for quantitative comparison of similar regions across samples. Accuracy in such registration requires both high image resolution and global awareness, which mark a difficult balancing act for contemporary deep learning architectures. We present a novel convolutional neural network (CNN) architecture that combines (1) a local classification CNN that extracts features from image patches sampled sparsely across the tissue surface, and (2) a global segmentation CNN that operates on these extracted features. This hybrid network can be trained in an end-to-end manner, and we demonstrate its relative merits over competing approaches on a reference histology dataset as well as two published spatial transcriptomics datasets. We...
Journal of the Royal Society, Interface, 2018
As systems approaches to the development of biological models become more mature, attention is increasingly focusing on the problem of inferring parameter values within those models from experimental data. However, particularly for nonlinear models, it is not obvious, either from inspection of the model or from the experimental data, that the inverse problem of parameter fitting will have a unique solution, or even a non-unique solution that constrains the parameters to lie within a plausible physiological range. Where parameters cannot be constrained they are termed 'unidentifiable'. We focus on gaining insight into the causes of unidentifiability using inference-based methods, and compare a recently developed measure-theoretic approach to inverse sensitivity analysis to the popular Markov chain Monte Carlo and approximate Bayesian computation techniques for Bayesian inference. All three approaches map the uncertainty in quantities of interest in the output space to the pro...
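A minimal illustration of what 'unidentifiable' means in practice: a toy model (my own, not one from the paper) in which two parameters enter only through their product, so no amount of data from this experiment can distinguish parameter pairs with the same product.

```python
import numpy as np

# Toy model in which parameters a and b enter only through their product:
# y(x) = a * b * x.  Any (a, b) pair with the same product fits the data
# equally well, so a and b are individually unidentifiable.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
y_obs = 6.0 * x + 0.01 * rng.standard_normal(20)  # true product a*b = 6

def sse(a, b):
    """Sum of squared errors of the fit for a given parameter pair."""
    return np.sum((a * b * x - y_obs) ** 2)

# Two very different parameter pairs give identical goodness of fit,
# so the inverse problem has no unique solution.
fit1, fit2 = sse(2.0, 3.0), sse(0.5, 12.0)
```

An inference method applied to this model would return a ridge in (a, b) space rather than a point, which is exactly the signature the inference-based diagnostics above are designed to reveal.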
Progress in biophysics and molecular biology, Jan 26, 2018
The modelling of the electrophysiology of cardiac cells is one of the most mature areas of systems biology. This extended concentration of research effort brings with it new challenges, foremost among which is that of choosing which of these models is most suitable for addressing a particular scientific question. In a previous paper, we presented our initial work in developing an online resource for the characterisation and comparison of electrophysiological cell models in a wide range of experimental scenarios. In that work, we described how we had developed a novel protocol language that allowed us to separate the details of the mathematical model (the majority of cardiac cell models take the form of ordinary differential equations) from the experimental protocol being simulated. We developed a fully-open online repository (which we termed the Cardiac Electrophysiology Web Lab) which allows users to store and compare the results of applying the same experimental protocol to competing...
ChemElectroChem
We describe the use of Bayesian inference for quantitative comparison of voltammetric methods for investigating electrode kinetics. We illustrate the utility of the approach by comparing the information content in both DC and AC voltammetry at a planar electrode for the case of a quasi-reversible one-electron reaction mechanism. Using synthetic data (i.e. simulated data based on Butler-Volmer electrode kinetics for which the true parameter values are known and to which realistic levels of simulated experimental noise have been added), we are able to show that AC voltammetry is less affected by experimental noise (so that in effect it has a greater information content than the corresponding DC measurement) and hence yields more accurate estimates of the experimental parameters for a given level of noise. Significantly, the AC approach is shown to be able to distinguish higher values of the rate constant. The results of using synthetic data are then confirmed for an illustrative case of experimental data for the [Fe(CN)₆]³⁻/⁴⁻ process.
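The synthetic-data workflow described here (simulate from known parameters, add noise, infer, compare precision across noise levels) can be sketched with a simple grid posterior. The exponential-decay model and all parameter values below are illustrative stand-ins for the Butler-Volmer simulations, not the paper's actual models.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate data from a model with a known "rate constant", add Gaussian
# noise, and compute a posterior over the rate on a grid (flat prior).
t = np.linspace(0, 2, 100)
k_true = 1.5

def posterior_std(sigma):
    """Posterior standard deviation of the rate for a given noise level."""
    y = np.exp(-k_true * t) + sigma * rng.standard_normal(t.size)
    ks = np.linspace(0.5, 3.0, 400)
    loglik = np.array([-0.5 * np.sum((y - np.exp(-k * t)) ** 2) / sigma**2
                       for k in ks])
    p = np.exp(loglik - loglik.max())
    p /= p.sum()
    mean = np.sum(ks * p)
    return np.sqrt(np.sum((ks - mean) ** 2 * p))

# A noisier "measurement" yields a wider posterior, i.e. less precise
# parameter estimates -- the same logic behind the DC-vs-AC comparison.
low, high = posterior_std(0.02), posterior_std(0.2)
```

Comparing posterior widths across measurement modalities, rather than just point estimates, is what makes the Bayesian comparison quantitative.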
Journal of The Royal Society Interface
Bayesian methods are advantageous for biological modelling studies due to their ability to quantify and characterize posterior variability in model parameters. When Bayesian methods cannot be applied, due either to non-determinism in the model or limitations on system observability, approximate Bayesian computation (ABC) methods can be used to similar effect, despite producing inflated estimates of the true posterior variance. Owing to generally differing application domains, there are few studies comparing Bayesian and ABC methods, and thus there is little understanding of the properties and magnitude of this uncertainty inflation. To address this problem, we present two popular strategies for ABC sampling that we have adapted to perform exact Bayesian inference, and compare them on several model problems. We find that one sampler was impractical for exact inference due to its sensitivity to a key normalizing constant, and additionally highlight sensitivities of both samplers to va...
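The variance inflation that ABC introduces relative to exact Bayesian inference can be demonstrated on a conjugate Gaussian toy problem where the exact posterior is known in closed form. This is a hedged sketch using plain rejection ABC; the samplers adapted in the paper are more sophisticated.

```python
import numpy as np

rng = np.random.default_rng(3)

# Observed data from N(mu_true, 1); with a flat prior, the exact posterior
# for mu is N(mean(y), 1/n), so its standard deviation is 1/sqrt(n).
n, mu_true = 50, 0.0
y = rng.standard_normal(n) + mu_true
exact_std = 1.0 / np.sqrt(n)

def abc_posterior_std(eps, n_draws=200_000):
    """Rejection ABC: draw mu from a wide prior, simulate a dataset of
    size n, accept if the simulated mean is within eps of the observed
    mean, and return the spread of the accepted draws."""
    mu = rng.uniform(-3, 3, n_draws)
    sim_means = mu + rng.standard_normal(n_draws) / np.sqrt(n)
    accepted = mu[np.abs(sim_means - y.mean()) < eps]
    return accepted.std()

# A loose tolerance inflates the posterior spread relative to the exact
# Bayesian answer; tightening eps shrinks the inflation toward exact_std.
loose, tight = abc_posterior_std(1.0), abc_posterior_std(0.1)
```

In this conjugate setting the inflation is analytically predictable (roughly an extra eps²/3 of variance from the uniform acceptance window), which is what makes toy problems like this useful for calibrating expectations about ABC output.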
Organic photovoltaic devices have emerged as competitors to silicon-based solar cells, currently reaching efficiencies of over 9% and offering desirable properties for manufacturing and installation. We study conjugated donor polymers for high-efficiency bulk-heterojunction photovoltaic devices with a molecular library motivated by experimental feasibility. We use quantum mechanics and a distributed computing approach to explore this vast molecular space. We will detail the screening approach starting from the generation of the molecular library, which can be ...
Royal Society Open Science, 2015
As cardiac cell models become increasingly complex, a correspondingly complex ‘genealogy’ of inherited parameter values has also emerged. The result has been the loss of a direct link between model parameters and experimental data, limiting both reproducibility and the ability to re-fit to new data. We examine the ability of approximate Bayesian computation (ABC) to infer parameter distributions in the seminal action potential model of Hodgkin and Huxley, for which an immediate and documented connection to experimental results exists. The ability of ABC to produce tight posteriors around the reported values for the gating rates of sodium and potassium ion channels validates the precision of this early work, while the highly variable posteriors around certain voltage dependency parameters suggest that voltage clamp experiments alone are insufficient to constrain the full model. Despite this, Hodgkin and Huxley's estimates are shown to be competitive with those produced by ABC, a...
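Rejection ABC of the kind applied here can be sketched on a first-order gating curve, a much-simplified stand-in for the Hodgkin-Huxley voltage-clamp fits; the model, prior, and tolerance below are all illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

# A first-order gating variable relaxes as n(t) = n_inf * (1 - exp(-t/tau)).
# We use rejection ABC to infer tau from one noisy synthetic trace.
t = np.linspace(0, 10, 60)
tau_true, n_inf = 2.0, 0.8
obs = n_inf * (1 - np.exp(-t / tau_true)) + 0.01 * rng.standard_normal(t.size)

# Draw tau from a broad uniform prior, simulate every trace at once, and
# accept draws whose RMS distance to the observation is below a tolerance.
taus = rng.uniform(0.1, 8.0, 50_000)
sims = n_inf * (1 - np.exp(-t[None, :] / taus[:, None]))
dist = np.sqrt(np.mean((sims - obs) ** 2, axis=1))
posterior = taus[dist < 0.015]
```

A well-constrained parameter shows up exactly as in the abstract: the accepted draws form a tight posterior around the true time constant, whereas an unconstrained parameter would leave the accepted draws spread across the prior.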
Bulletin of the American Physical Society, Feb 27, 2012
We present the Harvard Clean Energy Project (CEP), which is concerned with the computational screening and design of new organic photovoltaic materials. CEP has established an automated, high-throughput, in silico framework to study millions of potential candidate structures. This presentation discusses the CEP branch which employs first-principles computational quantum chemistry for the characterization of molecular motifs and the assessment of their quality with respect to applications as electronic materials. In addition to finding specific ...