Soumya Banerjee - Academia.edu
PhD theses by Soumya Banerjee
For half a century, artificial intelligence research has attempted to reproduce the human qualities of abstraction and reasoning: creating computer systems that can learn new concepts from a minimal set of examples, in settings where humans find this easy. While specific neural networks are able to solve an impressive range of problems, broad generalisation to situations outside their training data has proved elusive. In this work, we look at several novel approaches for solving the Abstraction & Reasoning Corpus (ARC). This is a dataset of abstract visual reasoning tasks introduced to test algorithms on broad generalisation. Despite three international competitions with $100,000 in prizes, the best algorithms still fail to solve a majority of ARC tasks. The best solvers today rely on complex hand-crafted rules, without using machine learning at all. We revisit whether recent advances in neural networks allow progress on this task, or whether an entirely different class of models is required. First, we adapt the DreamCoder neurosymbolic reasoning solver to ARC. DreamCoder automatically writes programs in a bespoke domain-specific language to perform reasoning, using a neural network to mimic human intuition. We present the Perceptual Abstraction and Reasoning Language (PeARL), which allows DreamCoder to solve ARC tasks, and propose a new recognition model that allows us to significantly improve on the previous best implementation. We also propose a new encoding and augmentation scheme that allows large language models (LLMs) to solve ARC tasks, and find that the largest models can solve some ARC tasks. LLMs are able to solve a different group of problems to state-of-the-art solvers, and provide an interesting way to complement other approaches. We perform an ensemble analysis, combining models to achieve better results than any system alone.
Finally, we publish the arckit Python library to make future research on ARC easier. Teaser: We develop machine learning techniques for abstraction and reasoning.
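As a rough illustration of what a text encoding and augmentation scheme for ARC grids could look like, here is a minimal Python sketch. The abstract does not describe the actual scheme, so the digit-per-cell serialisation, the rotation-based augmentation, and the names `grid_to_text`, `rotate90` and `augment` are all assumptions made for illustration; none of this is the thesis's method or the arckit API.

```python
from typing import List

# An ARC grid is treated here as a small 2-D list of colour indices (0-9).
Grid = List[List[int]]


def grid_to_text(grid: Grid) -> str:
    """Serialise a grid as one line of digits per row (hypothetical encoding)."""
    return "\n".join("".join(str(cell) for cell in row) for row in grid)


def rotate90(grid: Grid) -> Grid:
    """Rotate a grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def augment(grid: Grid) -> List[Grid]:
    """Return the four rotations of a grid, starting with the original."""
    rotations = [grid]
    for _ in range(3):
        rotations.append(rotate90(rotations[-1]))
    return rotations


# Toy 2x2 grid, invented for this example.
example = [[0, 1],
           [2, 3]]

for variant in augment(example):
    print(grid_to_text(variant))
    print("--")
```

Each augmented grid could then be placed in a prompt alongside its expected output; whether rotations preserve the rule of a given task would have to be checked per task.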
Articles by Soumya Banerjee
Objective: Survival models are used extensively in biomedical sciences, where they allow the investigation of the effect of exposures on health outcomes. It is desirable to use diverse data sets in survival analyses, because this offers increased statistical power and generalisability of results. However, there are often challenges with bringing data together in one location or following an analysis plan and sharing results. DataSHIELD is an analysis platform that helps users to overcome these ethical, governance and process difficulties. It allows users to analyse data remotely, using functions that are built to restrict access to the detailed data items (federated analysis). Previous works have provided survival modelling functionality in DataSHIELD (dsSurvival package), but there is a requirement to provide functions that offer privacy enhancing survival curves that retain useful information. Results: We introduce an enhanced version of the dsSurvival package which offers privacy enhancing survival curves for DataSHIELD. Different methods for enhancing privacy were evaluated for their effectiveness in enhancing privacy while maintaining utility. We demonstrated how our selected method could enhance privacy in different scenarios using real survival data. The details of how DataSHIELD can be used to generate survival curves can be found in the associated tutorial.
Platforms such as DataSHIELD allow users to analyse sensitive data remotely, without having full access to the detailed data items (federated analysis). While this feature helps to overcome difficulties with data sharing, it can make it challenging to write code without full visibility of the data. One solution is to generate realistic, non-disclosive synthetic data that can be transferred to the analyst so they can perfect their code without the access limitation. When this process is complete, they can run the code on the real data. We have created a package in DataSHIELD (dsSynthetic) which allows generation of realistic synthetic data, building on existing packages. In our paper and accompanying tutorial we demonstrate how the use of synthetic data generated with our package can help DataSHIELD users with tasks such as writing analysis scripts and harmonising data to common scales and measures.
Summary: Artificial intelligence (AI) is increasingly taking on a greater role in healthcare. However, hype and negative news reports about AI abound. Integrating patient and public involvement (PPI) in healthcare AI projects may help in adoption and acceptance of these technologies. We argue that AI algorithms should also be co-designed with patients and healthcare workers. We specifically suggest (1) including patients with lived experience of the disease, and (2) creating a research advisory group (RAG) and using these group meetings to walk patients through the process of AI model building, starting with simple (e.g., linear) models. We present a framework, case studies, best practices, and tools for applying participative data science to healthcare, enabling data scientists, clinicians, and patients to work together. The strategy of co-designing with patients can help set more realistic expectations for all stakeholders, since conventional narratives of AI revolve around dystopia or limitless optimism.
Achieving sufficient statistical power in a survival analysis usually requires large amounts of data from different sites. The sensitivity of individual-level data, together with ethical and practical considerations regarding data sharing across institutions, can make this added power hard to achieve. Hence we implemented a federated meta-analysis approach of survival models in DataSHIELD, where only anonymous aggregated data are shared across institutions, while simultaneously allowing for exploratory, interactive modelling. In this case, meta-analysis techniques to combine analysis results from each site are a solution, but a manual analysis workflow hinders exploration. Thus, the aim is to provide a framework for performing meta-analysis of Cox regression models across institutions without manual analysis steps for the data providers. We introduce a package (dsSurvival) which allows privacy preserving meta-analysis of survival models, including the calculation of hazard ratios. Our tool can be of great use in biomedical research where there is a need for building survival models and there are privacy concerns about sharing data. A tutorial in bookdown format with code, diagnostics, plots and synthetic data is available here: https://neelsoumya.github.io/dsSurvivalbookdown/ All code is available from the following repositories: https://github.com/neelsoumya/dsSurvivalClient/ https://github.com/neelsoumya/dsSurvival/ Competing Interest Statement: The authors have declared no competing interest.
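To make the idea of combining per-site Cox regression results concrete, the following is a minimal Python sketch of fixed-effect inverse-variance pooling of log hazard ratios. It is an assumption that this particular pooling rule matches what is used in practice; the function name `pool_log_hazard_ratios` and the per-site numbers are invented for illustration, and this is not the dsSurvival API (which is an R package).

```python
import math
from typing import List, Tuple


def pool_log_hazard_ratios(
    estimates: List[Tuple[float, float]]
) -> Tuple[float, float]:
    """Fixed-effect inverse-variance pooling.

    estimates: one (log hazard ratio, standard error) pair per site.
    Returns the pooled log hazard ratio and its standard error.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical aggregated results returned by three sites (not real data).
site_estimates = [(0.40, 0.10), (0.20, 0.20), (0.30, 0.10)]

pooled, pooled_se = pool_log_hazard_ratios(site_estimates)
hazard_ratio = math.exp(pooled)
ci_low = math.exp(pooled - 1.96 * pooled_se)
ci_high = math.exp(pooled + 1.96 * pooled_se)
print(f"pooled HR = {hazard_ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Only the coefficient and its standard error leave each site in this sketch, which is the sense in which the analysis shares aggregated rather than individual-level data.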
Machine learning (ML), one aspect of artificial intelligence (AI), involves computer algorithms that train themselves. They have been widely applied in the healthcare domain. However, many trained ML algorithms operate as ‘black boxes’, producing a prediction from input data without a clear explanation of their workings. Non-transparent predictions are of limited utility in many clinical domains, where decisions must be justifiable. Here, we apply class-contrastive counterfactual reasoning to ML to demonstrate how specific changes in inputs lead to different predictions of mortality in people with severe mental illness (SMI), a major public health challenge. We produce predictions accompanied by visual and textual explanations as to how the prediction would have differed given specific changes to the input. We apply it to routinely collected data from a mental health secondary care provider in patients with schizophrenia. Using a data structuring framework informed by clinical knowledge, we captured information on physical health, mental health, and social predisposing factors. We then trained an ML algorithm and other statistical learning techniques to predict the risk of death. The ML algorithm predicted mortality with an area under receiver operating characteristic curve (AUROC) of 0.80 (95% confidence interval [0.78, 0.82]). We used class-contrastive analysis to produce explanations for the model predictions. We outline the scenarios in which class-contrastive analysis is likely to be successful in producing explanations for model predictions. Our aim is not to advocate for a particular model but to show an application of the class-contrastive analysis technique to electronic healthcare record data for a disease of public health significance.
In patients with schizophrenia, our work suggests that use or prescription of medications like antidepressants was associated with lower risk of death. Abuse of alcohol/drugs and a diagnosis of delirium were associated with higher risk of death. Our ML models highlight the role of co-morbidities in determining mortality in patients with schizophrenia and the need to manage co-morbidities in these patients. We hope that some of these bio-social factors can be targeted therapeutically by either patient-level or service-level interventions. Our approach combines clinical knowledge, health data, and statistical learning, to make predictions interpretable to clinicians using class-contrastive reasoning. This is a step towards interpretable AI in the management of patients with schizophrenia and potentially other diseases.
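The class-contrastive idea described above (asking how a prediction would change if one input were different) can be sketched in a few lines of Python. This is only a toy under stated assumptions: the model is a hand-written logistic function, and the feature names, weights and bias are invented for illustration and are not taken from the paper or its data.

```python
import math
from typing import Dict


def predict_risk(features: Dict[str, int], weights: Dict[str, float], bias: float) -> float:
    """Predicted probability from a toy logistic model (stand-in for the trained ML model)."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def class_contrastive_change(
    features: Dict[str, int], weights: Dict[str, float], bias: float, name: str
) -> float:
    """Change in predicted risk when one binary feature is flipped, all else held fixed."""
    flipped = dict(features)
    flipped[name] = 1 - flipped[name]
    return predict_risk(flipped, weights, bias) - predict_risk(features, weights, bias)


# Hypothetical coefficients and patient record, invented for this example.
toy_weights = {"antidepressant": -0.5, "delirium": 1.0}
toy_bias = -2.0
patient = {"antidepressant": 0, "delirium": 1}

for feature in patient:
    delta = class_contrastive_change(patient, toy_weights, toy_bias, feature)
    print(f"flipping {feature}: predicted risk changes by {delta:+.3f}")
```

The per-feature changes are the raw material for a textual explanation of the form "had this patient been prescribed X, the predicted risk would have been lower by Y"; such a statement describes the model, not a causal effect.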
Background. Machine learning (ML), one aspect of artificial intelligence (AI), involves computer algorithms that train themselves. They have been widely applied in the healthcare domain. However, many trained ML algorithms operate as black boxes, producing a prediction from input data without a clear explanation of their workings. Non-transparent predictions are of limited utility in many clinical domains, where decisions must be justifiable. Methods. Here, we apply class-contrastive counterfactual reasoning to ML to demonstrate how specific changes in inputs lead to different predictions of mortality in people with severe mental illness (SMI), a major public health challenge. We produce predictions accompanied by visual and textual explanations as to how the prediction would have differed given specific changes to the input. We apply it to routinely collected data from a mental health secondary care provider in patients with schizophrenia. Using a data structuring framework informed by clinical knowledge, we captured information on physical health, mental health, and social predisposing factors. We then trained an ML algorithm to predict the risk of death. Results. The ML algorithm predicted mortality with an area under receiver operating characteristic curve (AUROC) of 0.8 (compared to an AUROC of 0.67 from a logistic regression model), and produced class-contrastive explanations for its predictions. Conclusions. In patients with schizophrenia, our work suggests that use of medications like second generation antipsychotics and antidepressants was associated with lower risk of death. Abuse of alcohol/drugs and a diagnosis of delirium were associated with higher risk of death. Our ML models highlight the role of co-morbidities in determining mortality in patients with SMI and the need to manage them.
We hope that some of these bio-social factors can be targeted therapeutically by either patient-level or service-level interventions. This approach combines clinical knowledge, health data, and statistical learning, to make predictions interpretable to clinicians using class-contrastive reasoning. This is a step towards interpretable AI in the management of patients with SMI and potentially other diseases.
Competing Interest Statement: RNC consults for Campden Instruments Ltd and receives royalties from Cambridge University Press, Cambridge Enterprise, and Routledge. SB, PL and PJ declare they have no conflicts of interest to disclose.
Funding Statement: This work was funded by an MRC Mental Health Data Pathfinder grant (MC PC 17213). PBJ is supported by the NIHR Applied Research Collaboration East of England. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. This research was supported in part by the NIHR Cambridge Biomedical Research Centre. The views expressed are those of the authors and not necessarily those of the MRC, the NHS, the NIHR, or the Department of Health and Social Care.
Author Declarations: All relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. The CPFT Research Database operates under UK NHS Research Ethics approvals (REC references 12/EE/0407, 17/EE/0442; IRAS project ID 237953). All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Any clinical trial or other prospective interventional study reported in the manuscript has been registered with an ICMJE-approved registry and the trial registration ID is provided. All appropriate research reporting guidelines have been followed, and the relevant EQUATOR Network research reporting checklist(s) and other pertinent material have been uploaded as supplementary files, if applicable. This study reports on human clinical data which cannot be published directly due to reasonable privacy concerns, as per NHS research ethics approvals and information governance rules.
Objectives: Face-to-face healthcare, including psychiatric provision, must continue despite reduced interpersonal contact during the COVID-19 (SARS-CoV-2 coronavirus) pandemic. Community-based services might use domiciliary visits, consultations in healthcare settings, or remote consultations. Services might also alter direct contact between clinicians. We examined the effects of appointment types and clinician–clinician encounters upon infection rates.
Summary: Local cell contraction pulses play important roles in tissue and cell morphogenesis. Here, we improve a chemo-optogenetic approach and apply it to investigate the signal network that generates these pulses. We use these measurements to derive and parameterize a system of ordinary differential equations describing temporal signal network dynamics. Bifurcation analysis and numerical simulations predict a strong dependence of oscillatory system dynamics on the concentration of GEF-H1, an Lbc-type RhoGEF, which mediates the positive feedback amplification of Rho activity. This prediction is confirmed experimentally via optogenetic tuning of the effective GEF-H1 concentration in individual living cells. Numerical simulations show that pulse amplitude is most sensitive to external inputs into the myosin component at low GEF-H1 concentrations and that the spatial pulse width is dependent on GEF-H1 diffusion. Our study offers a theoretical framework to explain the emergence of local cell contraction pulses and their modulation by biochemical and mechanical signals.
Objective: Dysregulated immune responses are the cause of inflammatory bowel diseases (IBDs). Studies in mice and humans suggest a central role of interleukin (IL)-23-producing mononuclear phagocytes in disease pathogenesis. Mechanistic insights into the regulation of IL-23 are prerequisite for selective IL-23 targeting therapies as part of personalised medicine. Design: We performed transcriptomic analysis to investigate IL-23 expression in human mononuclear phagocytes and peripheral blood mononuclear cells. We investigated the regulation of IL-23 expression and used single-cell RNA sequencing to derive a transcriptomic signature of hyperinflammatory monocytes. Using gene network correlation analysis, we deconvolved this signature into components associated with homeostasis and inflammation in patient biopsy samples. Results: We characterised monocyte subsets of healthy individuals and patients with IBD that express IL-23. We identified autosensing and paracrine sensing of IL-1α/IL-1β and IL-10 as key cytokines that control IL-23-producing monocytes. Whereas Mendelian genetic defects in IL-10 receptor signalling induced IL-23 secretion after lipopolysaccharide stimulation, whole bacteria exposure induced IL-23 production in controls via acquired IL-10 signalling resistance. We found a transcriptional signature of IL-23-producing inflammatory monocytes that predicted both disease and resistance to anti-tumour necrosis factor (TNF) therapy and differentiated that from an IL-23-associated lymphocyte differentiation signature that was present in homeostasis and in disease. Conclusion: Our work identifies IL-10 and IL-1 as critical regulators of monocyte IL-23 production. We differentiate homeostatic IL-23 production from hyperinflammation-associated IL-23 production in patients with severe ulcerating active Crohn's disease and anti-TNF treatment non-responsiveness.
Altogether, we identify subgroups of patients with IBD that might benefit from IL-23p19 and/or IL-1α/IL-1β-targeting therapies upstream of IL-23.
For half a century, artificial intelligence research has attempted to reproduce the human qualiti... more For half a century, artificial intelligence research has attempted to reproduce the human qualities of abstraction and reasoning-creating computer systems that can learn new concepts from a minimal set of examples, in settings where humans find this easy. While specific neural networks are able to solve an impressive range of problems, broad generalisation to situations outside their training data has proved elusive. In this work, we look at several novel approaches for solving the Abstraction & Reasoning Corpus (ARC). This is a dataset of abstract visual reasoning tasks introduced to test algorithms on broad generalization. Despite three international competitions with $100,000 in prizes, the best algorithms still fail to solve a majority of ARC tasks. The best solvers today rely on complex hand-crafted rules, without using machine learning at all. We revisit whether recent advances in neural networks allow progress on this task, or whether an entirely different class of models are required. First, we adapt the DreamCoder Neurosymbolic reasoning solver to ARC. DreamCoder automatically writes programs in a bespoke domain-specific language to perform reasoning, using a neural network to mimic human intuition. We present the Perceptual Abstraction and Reasoning Language (PeARL) language, which allow DreamCoder to solve ARC tasks, and propose a new recognition model that allows us to significantly improve on the previous best implementation. We also propose a new encoding and augmentation scheme that allows large language models (LLMs) to solve ARC tasks, and find that the largest models can solve some ARC tasks. LLMs are able to solve a different group of problems to state-of-the-art solvers, and provide an interesting way to complement other approaches. We perform an ensemble analysis, combining models to achieve better results than any system alone. 
Finally, we publish the arckit Python library to make future research on ARC easier. Teaser We develop machine learning techniques for abstraction and reasoning.
Objective Survival models are used extensively in biomedical sciences, where they allow the inves... more Objective Survival models are used extensively in biomedical sciences, where they allow the investigation of the effect of exposures on health outcomes. It is desirable to use diverse data sets in survival analyses, because this offers increased statistical power and generalisability of results. However, there are often challenges with bringing data together in one location or following an analysis plan and sharing results. DataSHIELD is an analysis platform that helps users to overcome these ethical, governance and process difficulties. It allows users to analyse data remotely, using functions that are built to restrict access to the detailed data items (federated analysis). Previous works have provided survival modelling functionality in DataSHIELD (dsSurvival package), but there is a requirement to provide functions that offer privacy enhancing survival curves that retain useful information. Results We introduce an enhanced version of the dsSurvival package which offers privacy enhancing survival curves for DataSHIELD. Different methods for enhancing privacy were evaluated for their effectiveness in enhancing privacy while maintaining utility. We demonstrated how our selected method could enhance privacy in different scenarios using real survival data. The details of how DataSHIELD can be used to generate survival curves can be found in the associated tutorial.
Platforms such as DataSHIELD allow users to analyse sensitive data remotely, without having full ... more Platforms such as DataSHIELD allow users to analyse sensitive data remotely, without having full access to the detailed data items (federated analysis). While this feature helps to overcome difficulties with data sharing, it can make it challenging to write code without full visibility of the data. One solution is to generate realistic, non-disclosive synthetic data that can be transferred to the analyst so they can perfect their code without the access limitation. When this process is complete, they can run the code on the real data. We have created a package in DataSHIELD (dsSynthetic) which allows generation of realistic synthetic data, building on existing packages. In our paper and accompanying tutorial we demonstrate how the use of synthetic data generated with our package can help DataSHIELD users with tasks such as writing analysis scripts and harmonising data to common scales and measures.
Summary Artificial intelligence (AI) is increasingly taking on a greater role in healthcare. Howe... more Summary Artificial intelligence (AI) is increasingly taking on a greater role in healthcare. However, hype and negative news reports about AI abound. Integrating patient and public involvement (PPI) in healthcare AI projects may help in adoption and acceptance of these technologies. We argue that AI algorithms should also be co-designed with patients and healthcare workers. We specifically suggest (1) including patients with lived experience of the disease, and (2) creating a research advisory group (RAG) and using these group meetings to walk patients through the process of AI model building, starting with simple (e.g., linear) models. We present a framework, case studies, best practices, and tools for applying participative data science to healthcare, enabling data scientists, clinicians, and patients to work together. The strategy of co-designing with patients can help set more realistic expectations for all stakeholders, since conventional narratives of AI revolve around dystopia or limitless optimism.
Achieving sufficient statistical power in a survival analysis usually requires large amounts of d... more Achieving sufficient statistical power in a survival analysis usually requires large amounts of data from different sites. Sensitivity of individual-level data, ethical and practical considerations regarding data sharing across institutions could be a potential challenge for achieving this added power. Hence we implemented a federated meta-analysis approach of survival models in DataSHIELD, where only anonymous aggregated data are shared across institutions, while simultaneously allowing for exploratory, interactive modelling. In this case, meta-analysis techniques to combine analysis results from each site are a solution, but a manual analysis workflow hinders exploration. Thus, the aim is to provide a framework for performing meta-analysis of Cox regression models across institutions without manual analysis steps for the data providers. We introduce a package (dsSurvival) which allows privacy preserving meta-analysis of survival models, including the calculation of hazard ratios. Our tool can be of great use in biomedical research where there is a need for building survival models and there are privacy concerns about sharing data. A tutorial in bookdown format with code, diagnostics, plots and synthetic data is available here: https://neelsoumya.github.io/dsSurvivalbookdown/ All code is available from the following repositories: https://github.com/neelsoumya/dsSurvivalClient/ https://github.com/neelsoumya/dsSurvival/ {\#}{\#}{\#} Competing Interest Statement The authors have declared no competing interest.
Machine learning (ML), one aspect of artificial intelligence (AI), involves computer algorithms t... more Machine learning (ML), one aspect of artificial intelligence (AI), involves computer algorithms that train themselves. They have been widely applied in the healthcare domain. However, many trained ML algorithms operate as ‘black boxes', producing a prediction from input data without a clear explanation of their workings. Non-transparent predictions are of limited utility in many clinical domains, where decisions must be justifiable. Here, we apply class-contrastive counterfactual reasoning to ML to demonstrate how specific changes in inputs lead to different predictions of mortality in people with severe mental illness (SMI), a major public health challenge. We produce predictions accompanied by visual and textual explanations as to how the prediction would have differed given specific changes to the input. We apply it to routinely collected data from a mental health secondary care provider in patients with schizophrenia. Using a data structuring framework informed by clinical knowledge, we captured information on physical health, mental health, and social predisposing factors. We then trained an ML algorithm and other statistical learning techniques to predict the risk of death. The ML algorithm predicted mortality with an area under receiver operating characteristic curve (AUROC) of 0.80 (95{\%} confidence intervals [0.78, 0.82]). We used class-contrastive analysis to produce explanations for the model predictions. We outline the scenarios in which class-contrastive analysis is likely to be successful in producing explanations for model predictions. Our aim is not to advocate for a particular model but show an application of the class-contrastive analysis technique to electronic healthcare record data for a disease of public health significance. 
In patients with schizophrenia, our work suggests that use or prescription of medications like antidepressants was associated with lower risk of death. Abuse of alcohol/drugs and a diagnosis of delirium were associated with higher risk of death. Our ML models highlight the role of co-morbidities in determining mortality in patients with schizophrenia and the need to manage co-morbidities in these patients. We hope that some of these bio-social factors can be targeted therapeutically by either patient-level or service-level interventions. Our approach combines clinical knowledge, health data, and statistical learning, to make predictions interpretable to clinicians using class-contrastive reasoning. This is a step towards interpretable AI in the management of patients with schizophrenia and potentially other diseases.
Background. Machine learning (ML), one aspect of artificial intelligence (AI), involves computer ... more Background. Machine learning (ML), one aspect of artificial intelligence (AI), involves computer algorithms that train themselves. They have been widely applied in the healthcare domain. However, many trained ML algorithms operate as black boxes, producing a prediction from input data without a clear explanation of their workings. Non-transparent predictions are of limited utility in many clinical domains, where decisions must be justifiable. Methods. Here, we apply class-contrastive counterfactual reasoning to ML to demonstrate how specific changes in inputs lead to different predictions of mortality in people with severe mental illness (SMI), a major public health challenge. We produce predictions accompanied by visual and textual explanations as to how the prediction would have differed given specific changes to the input. We apply it to routinely collected data from a mental health secondary care provider in patients with schizophrenia. Using a data structuring framework informed by clinical knowledge, we captured information on physical health, mental health, and social predisposing factors. We then trained an ML algorithm to predict the risk of death. Results. The ML algorithm predicted mortality with an area under receiver operating characteristic curve (AUROC) of 0.8 (compared to an AUROC of 0.67 from a logistic regression model), and produced class-contrastive explanations for its predictions. Conclusions. In patients with schizophrenia, our work suggests that use of medications like second generation antipsychotics and antidepressants was associated with lower risk of death. Abuse of alcohol/drugs and a diagnosis of delirium were associated with higher risk of death. Our ML models highlight the role of co-morbidities in determining mortality in patients with SMI and the need to manage them. 
We hope that some of these bio-social factors can be targeted therapeutically by either patient-level or service-level interventions. This approach combines clinical knowledge, health data, and statistical learning, to make predictions interpretable to clinicians using class-contrastive reasoning. This is a step towards interpretable AI in the management of patients with SMI and potentially other diseases. {\#}{\#}{\#} Competing Interest Statement RNC consults for Campden Instruments Ltd and receives royalties from Cambridge University Press, Cambridge Enterprise, and Routledge. SB, PL and PJ declare they have no conflicts of interest to disclose. {\#}{\#}{\#} Funding Statement This work was funded by an MRC Mental Health Data Pathfinder grant (MC PC 17213). PBJ is supported by the NIHR Applied Research Collaboration East of England. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. This research was supported in part by the NIHR Cambridge Biomedical Research Centre. The views expressed are those of the authors and not necessarily those of the MRC, the NHS, the NIHR, or the Department of Health and Social Care. {\#}{\#}{\#} Author Declarations I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The CPFT Research Database operates under UK NHS Research Ethics approvals (REC references 12/EE/0407, 17/EE/0442; IRAS project ID 237953). All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. 
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance): Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable: Yes. This study reports on human clinical data which cannot be published directly due to reasonable privacy concerns, as per NHS research ethics approvals and information governance rules.
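The class-contrastive reasoning described in the abstract above can be illustrated with a toy example: change one input at a time and report how the predicted risk would have differed. The sketch below is only an illustration under stated assumptions. The logistic stand-in model, its weights, the feature names and the example patient are all hypothetical and are not the authors' model or data.

```python
import math

# Hypothetical stand-in model: the weights and bias are illustrative only
# and were not fitted to any data.
WEIGHTS = {"antidepressant": -0.5, "alcohol_drug_abuse": 0.8, "delirium": 1.2}
BIAS = -2.0


def predict_risk(features):
    """Return the predicted probability for a dict of 0/1 features."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def class_contrastive(features):
    """Flip each binary feature in turn and record the change in predicted risk."""
    baseline = predict_risk(features)
    changes = {}
    for name in features:
        flipped = dict(features)
        flipped[name] = 1 - flipped[name]  # counterfactual: toggle this feature only
        changes[name] = predict_risk(flipped) - baseline
    return baseline, changes


# Hypothetical patient record (1 = feature present, 0 = absent).
patient = {"antidepressant": 0, "alcohol_drug_abuse": 1, "delirium": 0}
baseline, changes = class_contrastive(patient)
print(f"baseline risk: {baseline:.3f}")
for name, delta in changes.items():
    print(f"if {name} were flipped, predicted risk would change by {delta:+.3f}")
```

Each printed line is a textual explanation of the form "had this input been different, the prediction would have moved by this much", which is the contrastive statement the abstract refers to.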
Objectives: Face-to-face healthcare, including psychiatric provision, must continue despite reduced interpersonal contact during the COVID-19 (SARS-CoV-2 coronavirus) pandemic. Community-based services might use domiciliary visits, consultations in healthcare settings, or remote consultations. Services might also alter direct contact between clinicians. We examined the effects of appointment types and clinician–clinician encounters upon infection rates.
Summary. Local cell contraction pulses play important roles in tissue and cell morphogenesis. Here, we improve a chemo-optogenetic approach and apply it to investigate the signal network that generates these pulses. We use these measurements to derive and parameterize a system of ordinary differential equations describing temporal signal network dynamics. Bifurcation analysis and numerical simulations predict a strong dependence of oscillatory system dynamics on the concentration of GEF-H1, an Lbc-type RhoGEF, which mediates the positive feedback amplification of Rho activity. This prediction is confirmed experimentally via optogenetic tuning of the effective GEF-H1 concentration in individual living cells. Numerical simulations show that pulse amplitude is most sensitive to external inputs into the myosin component at low GEF-H1 concentrations and that the spatial pulse width is dependent on GEF-H1 diffusion. Our study offers a theoretical framework to explain the emergence of local cell contraction pulses and their modulation by biochemical and mechanical signals.
Objective. Dysregulated immune responses are the cause of IBDs. Studies in mice and humans suggest a central role of interleukin (IL)-23-producing mononuclear phagocytes in disease pathogenesis. Mechanistic insights into the regulation of IL-23 are prerequisite for selective IL-23 targeting therapies as part of personalised medicine. Design. We performed transcriptomic analysis to investigate IL-23 expression in human mononuclear phagocytes and peripheral blood mononuclear cells. We investigated the regulation of IL-23 expression and used single-cell RNA sequencing to derive a transcriptomic signature of hyperinflammatory monocytes. Using gene network correlation analysis, we deconvolved this signature into components associated with homeostasis and inflammation in patient biopsy samples. Results. We characterised monocyte subsets of healthy individuals and patients with IBD that express IL-23. We identified autosensing and paracrine sensing of IL-1α/IL-1β and IL-10 as key cytokines that control IL-23-producing monocytes. Whereas Mendelian genetic defects in IL-10 receptor signalling induced IL-23 secretion after lipopolysaccharide stimulation, whole bacteria exposure induced IL-23 production in controls via acquired IL-10 signalling resistance. We found a transcriptional signature of IL-23-producing inflammatory monocytes that predicted both disease and resistance to antitumour necrosis factor (TNF) therapy and differentiated that from an IL-23-associated lymphocyte differentiation signature that was present in homeostasis and in disease. Conclusion. Our work identifies IL-10 and IL-1 as critical regulators of monocyte IL-23 production. We differentiate homeostatic IL-23 production from hyperinflammation-associated IL-23 production in patients with severe ulcerating active Crohn's disease and anti-TNF treatment non-responsiveness. Altogether, we identify subgroups of patients with IBD that might benefit from IL-23p19 and/or IL-1α/IL-1β-targeting therapies upstream of IL-23.
Dysregulated intestinal immune responses are the cause of inflammatory bowel diseases (IBD). Using single-cell and bulk transcriptomic approaches we investigate the responses of monocytes and peripheral blood mononuclear cells to multiple stimuli and relate those to transcriptional responses in the inflamed intestine. We identify auto- and paracrine sensing of IL-1α/β and IL-10 regulation as key signals that control the development of inflammatory IL-23-producing monocytes. Uptake of whole bacteria induces IL-10 resistance and favours IL-23 secretion. IL-1α/β+CD14+ monocyte signatures are enriched in patients with ulcerating intestinal inflammation and resistance to anti-TNF therapy. In contrast, IL-23 and tumour necrosis factor expression in the absence of this inflammatory monocyte signature was associated with homeostatic lymphocyte differentiation explaining why IL-23 and TNF expression alone are poor predictors for IBD activity. Gene co-expression analysis assists the identification of IBD patient subgroups that might benefit from IL-23p19 and/or IL-1α/IL-1β-targeting therapies.
Microbial community metabolomics, particularly in the human gut, are beginning to provide a new route to identify functions and ecology disrupted in disease. However, these data can be costly and difficult to obtain at scale, while amplicon or shotgun metagenomic sequencing data are readily available for populations of many thousands. Here, we describe a computational approach to predict potentially unobserved metabolites in new microbial communities, given a model trained on paired metabolomes and metagenomes from the environment of interest. Focusing on two independent human gut microbiome datasets, we demonstrate that our framework successfully recovers community metabolic trends for more than 50% of associated metabolites. Similar accuracy is maintained using amplicon profiles of coral-associated, murine gut, and human vaginal microbiomes. We also provide an expected performance score to guide application of the model in new samples. Our results thus demonstrate that this 'predictive metabolomic' approach can aid in experimental design and provide useful insights into the thousands of community profiles for which only metagenomes are currently available.
When stakeholders commit to building infrastructure as part of strategic, long-term planning, the final facilities are not normally amenable to modification after completion. A consequence of this is that users are forced to operate within the original specifications for, at least, as long as it takes to carry out major refurbishments or retrofitting, and even then, the constraints imposed by the original layout may be inescapable. On the one hand, the original infrastructure plans enhance (or limit) the users' ability to operate efficiently for years to come. As time passes and the payback period approaches, changing operating conditions and unforeseen bottlenecks in the original blueprint can, at best, affect the economic returns and, at worst, defeat the purpose of the whole project (see, for example, Castellon airport in Spain, which was built but is grossly underutilised), producing unanticipated economic, social and political repercussions. On the other hand, managers and operators (that is, those living with the consequences of the strategic planning) have some leeway to compensate for miscalculations by means of their tactical and operational planning. In this chapter, we explore the use of quantitative techniques to, first, amend bottlenecks and uncertain market and operating conditions that affect the performance of infrastructure investments (the tactical and operational levels), and second, validate the effectiveness of the original infrastructure design (the strategic level) under these changing conditions. More specifically, we present a rail scheduling case study where we combine demand forecasting using Machine Learning techniques and formal Operations Research methods to assess and maximise the value of already-existing infrastructure.
Rail scheduling is a typical optimisation problem popular in the literature, but its potential value is bounded not only by its technical properties and specifications (how good the algorithm is) but also by the accuracy of data feeding the algorithm. Such data is critical in specifying the demand that a facility will experience in the future, and the costs that will be incurred to operate it. The use of intensive data analytics and appropriate Machine Learning techniques can resolve this and provide a substantial competitive edge for investors and operators of rail inter-modal terminals. We anticipate that Machine Learning algorithms that predict future demand, coupled with optimisation techniques that streamline operations of facilities, can be integrated to create tools that help policy makers and terminal operators maximise the value of their current infrastructure, while meeting ever-changing demand.
Information plays a critical role in complex biological systems. This article proposes a role for information processing in questions around the origin of life and suggests how computational simulations may yield insights into questions related to the origin of life. Such a computational model of the origin of life would unify thermodynamics with information processing and we would gain an appreciation of why proteins and nucleotides evolved as the substrate of computation and information processing in living systems that we see on Earth. Answers to questions like these may give us insights into non-carbon based forms of life that we could search for outside Earth. I hypothesize that carbon-based life forms are only one amongst a continuum of life-like systems in the universe. Investigations into the role of computational substrates that allow information processing are important and could yield insights into: 1) novel non-carbon based computational substrates that may have “life-like” pro...
Stochasticity and spatial distribution of the pathogen play a critical role in determining the outcome of an infection. One in a million immune system cells are specific to a particular pathogen. The serendipitous encounter of such a rare immune system cell with its fated antigen can determine the mortality of the infected animal. Moreover, pathogens may remain initially localized in a small volume of tissue. Hence stochastic and spatial aspects play an important role in pathogenesis, especially early on in the infection. Current efforts at investigating the effect of stochasticity and space in modeling of host immune response and pathogens use agent-based models (ABMs). However, these are computationally expensive. Population-level approaches like ordinary differential equations (ODEs) are computationally tractable. However, they make simplifying assumptions that are unlikely to be true early on in the infection. We proposed a stage-structured hybrid model that aims to strike a balance...
Objective. Platforms such as DataSHIELD allow users to analyse sensitive data remotely, without having full access to the detailed data items (federated analysis). While this feature helps to overcome difficulties with data sharing, it can make it challenging to write code without full visibility of the data. One solution is to generate realistic, non-disclosive synthetic data that can be transferred to the analyst so they can perfect their code without the access limitation. When this process is complete, they can run the code on the real data. Results. We have created a package in DataSHIELD (dsSynthetic) which allows generation of realistic synthetic data, building on existing packages. In our paper and accompanying tutorial we demonstrate how the use of synthetic data generated with our package can help DataSHIELD users with tasks such as writing analysis scripts and harmonising data to common scales and measures.
Interdisciplinary Description of Complex Systems, 2021
The immune system is a distributed decentralized system that functions without any centralized control. The immune system has millions of cells that function somewhat independently and can detect and respond to pathogens with considerable speed and efficiency. Lymph nodes are physical anatomical structures that allow the immune system to rapidly detect pathogens and mobilize cells to respond to them. Lymph nodes function as: 1) information processing centres, and 2) a distributed detection and response network. We introduce biologically inspired computing that uses lymph nodes as inspiration. We outline applications to diverse domains like mobile robots, distributed computing clusters, peer-to-peer networks and online social networks. We argue that lymph node inspired computing systems provide powerful metaphors for distributed computing and complement existing artificial immune systems. We view our work as a first step towards holistic simulations of the immune system that would capture all the complexities and the power of a complex adaptive system like the immune system. Ultimately this would lead to immune system inspired computing that captures all the complexities and power of the immune system in human-engineered complex systems.
Objective. Achieving sufficient statistical power in a survival analysis usually requires large amounts of data from different sites. Sensitivity of individual-level data, ethical and practical considerations regarding data sharing across institutions could be a potential challenge for achieving this added power. Hence we implemented a federated meta-analysis approach of survival models in DataSHIELD, where only anonymous aggregated data are shared across institutions, while simultaneously allowing for exploratory, interactive modelling. In this case, meta-analysis techniques to combine analysis results from each site are a solution, but an analytic workflow involving local analysis undertaken at individual studies hinders exploration. Thus, the aim is to provide a framework for performing meta-analysis of Cox regression models across institutions without manual analysis steps for the data providers. Results. We introduce a package (dsSurvival) which allows privacy preserving meta-analysi...
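The idea of combining per-site Cox regression results using only aggregated quantities can be sketched with inverse-variance (fixed-effect) pooling of log hazard ratios. This is a minimal illustration and not the dsSurvival or DataSHIELD API. The pooling rule is one common choice assumed here, and the per-site estimates and standard errors are hypothetical.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling of per-site coefficients.

    Only the coefficient and its standard error leave each site,
    so no individual-level records are shared.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical log hazard ratios and standard errors reported by three sites.
site_log_hr = [0.30, 0.10, 0.20]
site_se = [0.10, 0.20, 0.10]

pooled, pooled_se = pool_fixed_effect(site_log_hr, site_se)
lower = math.exp(pooled - 1.96 * pooled_se)
upper = math.exp(pooled + 1.96 * pooled_se)
print(f"pooled hazard ratio: {math.exp(pooled):.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Sites with smaller standard errors receive larger weights, so the pooled estimate is pulled towards the more precise studies.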
Cell Reports, 2020
Leif (2020) Optogenetic tuning reveals Rho amplification-dependent dynamics of a cell contraction signal network. Cell Rep, 33 (9). a108467 1-14.
Journal of Psychiatric Research, 2020
BACKGROUND: COVID-19 has affected social interaction and healthcare worldwide. METHODS: We examined changes in presentations and referrals to the primary provider of mental health and community health services in Cambridgeshire and Peterborough, UK (population ~0.86 million), plus service activity and deaths. We conducted interrupted time series analyses with respect to the time of UK "lockdown", which was shortly before the peak of COVID-19 infections in this area. We examined changes in standardized mortality ratio for those with and without severe mental illness (SMI). RESULTS: Referrals and presentations to nearly all mental and physical health services dropped at lockdown, with evidence for changes in both supply (service provision) and demand (help-seeking). This was followed by an increase in demand for some services. This pattern was seen for all major forms of presentation to liaison psychiatry services, except for eating disorders, for which there was no evidence of change. Inpatient numbers fell, but new detentions under the Mental Health Act were unchanged. Many services shifted from face-to-face to remote contacts. Excess mortality was primarily in the over-70s. There was a much greater increase in mortality for patients with SMI, which was not explained by ethnicity. CONCLUSIONS: COVID-19 has been associated with a system-wide drop in the use of mental health services, with some subsequent return in activity. "Supply" changes may have reduced access to mental health services for some. "Demand" changes may reflect a genuine reduction of need or a lack of help-seeking with pent-up demand. There has been a disproportionate increase in death among those with SMI during the pandemic.
Cambridgeshire & Peterborough NHS Foundation Trust (CPFT) provides community PH services, psychological therapy services, and all secondary care MH services (including some embedded within primary care) to C&P, which has a population of ~0.86 million [15]. It provides MH inpatient facilities in the cities of Cambridge and Peterborough, PH inpatient rehabilitation facilities in Cambridge, Peterborough, Ely, and Wisbech, and Minor Injury Units (MIUs) in Ely, Wisbech, and Doddington. See Supplementary Methods for more details of population demographics, geography, and services.
Data sources. De-identified data was extracted from CPFT clinical records by CPFT's Information & Performance team and via the CPFT Research Database (NHS research ethics 17/EE/0442; see Supplementary Methods). We obtained data from four clinical records systems (RiO, SystmOne, PCMIS, and Epic; see Supplementary Methods), representing all clinical records systems for CPFT plus one for services provided by CPFT within another Trust (Epic). Data from each system were analysed separately.
Variables. For MH services, we extracted the following variables, per day:
• Referrals to CPFT teams embedded in primary care. We counted referrals to CPFT's embedded primary care mental health service, and to its Improving Access to Psychological Therapies (IAPT) service, including self-referrals.
• Calls to 111 for MH crises. CPFT provides the NHS 111 MH crisis telephone service. We counted calls and triage psychiatric assessments.
• Referrals to secondary care CPFT teams. We classified teams as (a) child and adolescent mental health (CAMH) teams; (b) community MH teams (CMHTs) for adults; (c) crisis resolution/home treatment teams (CRHTs); (d) adult liaison psychiatry (LP) teams; (e) early intervention in psychosis (EIP) teams; (f) eating disorder teams; (g) other specialist services. See Supplementary Methods for detail.
• Liaison Psychiatry referrals and presenting problems. For CPFT's LP service at Cambridge University Hospitals (CUH), we counted referrals, split as (a) from the Emergency Department (ED) with its associated Clinical Decision Unit (CDU), and (b) from other wards. When responding to a referral, LP clinicians record the primary reason(s) for referral. We counted presenting problems of (a) alcohol and/or drug use; (b) anxiety; (c) confusion, cognitive problems, requests for assistance with mental capacity assessment, and behavioural disturbance; (d) eating disorders; (e) low mood and suicidal ideation; (f) overdose and other forms of self-harm; (g) psychosis and mood elevation (hypomania/mania).
• Admissions. We counted admissions, discharges, inter-ward transfers, and inpatients per day across MH wards (excluding day-care facilities). We classified admission days as "voluntary" or "detained"
Background. Face-to-face healthcare, including psychiatric provision, must continue despite reduced interpersonal contact during the COVID-19 (SARS-CoV-2 coronavirus) pandemic. Community-based services might use domiciliary visits, consultations in healthcare settings, or remote consultations. Services might also alter direct contact between clinicians. Aims. We examined the effects of appointment types and clinician-clinician encounters upon infection rates. Methods. We modelled a COVID-19-like disease in a hypothetical community healthcare team, their patients, and patients' household contacts (family). In one condition, clinicians met patients and briefly met family (e.g. home visit or collateral history). In another, patients attended alone (e.g. clinic visit), segregated from each other. In another, face-to-face contact was eliminated (e.g. videoconferencing). We also varied clinician-clinician contact; baseline and ongoing "external" infection rates; whether over...
We outline an automated computational and machine learning framework that predicts disease severity and stratifies patients. We apply our framework to available clinical data. Our algorithm automatically generates insights and predicts disease severity with minimal operator intervention. The computational framework presented here can be used to stratify patients, predict disease severity and propose novel biomarkers for disease. Insights from machine learning algorithms coupled with clinical data may help guide therapy, personalize treatment and help clinicians understand the change in disease over time. Computational techniques like these can be used in translational medicine in close collaboration with clinicians and healthcare providers. Our models are also interpretable, allowing clinicians with minimal machine learning experience to engage in model building. This work is a step towards automated machine learning in the clinic.
Journal of The Royal Society Interface, 2018
The thymus is the primary organ for the generation of naive T cells, a key component of the immune system. Tolerance of T cells to self is achieved primarily in the thymic medulla, where immature T cells (thymocytes) sample self-peptides presented by medullary thymic epithelial cells (mTECs). A sufficiently strong interaction activates the thymocytes leading to negative selection. A key question of current interest is whether there is any structure in the manner in which mTECs present peptides: can any mTEC present any peptide at any time, or are there particular patterns of correlated peptide presentation? We investigate this question using a mathematical model of negative selection. We find that correlated patterns of peptide presentation may be advantageous in negatively selecting low-degeneracy thymocytes (that is, those thymocytes which respond to relatively few peptides). We also quantify the probability that an autoreactive thymocyte exits the thymus before it encounters a cognate antigen. The results suggest that heterogeneity of gene co-expression in mTECs has an effect on the probability of escape of autoreactive thymocytes.
Palgrave Communications, 2015
While cities have been the engine for innovation and growth for many millennia, larger cities have also endured disproportionately more crime than smaller cities. Similarly to other urban sociological quantities, such as income, gross domestic product (GDP) and number of granted patents, it has been observed that crime scales super-linearly with city size. The default assumption is that super-linear scaling of crime, like other urban attributes, derives from agglomerative effects (that is, increasing returns from potentially more productive connections among criminals). However, crime initiation appears to be generated linearly with the population of a city, and the number of law enforcement officials scales sublinearly with city population. We hypothesize that the observed scaling exponent for net crime in a city is the result of competing dynamics between criminals and law enforcement, each with different scaling exponents, and where criminals win in the numbers game. We propose a simple d...
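Super-linear scaling means that a quantity Y grows with city population N as Y = a·N^β with β > 1. A common way to estimate β is ordinary least squares on log-transformed values. The sketch below assumes that approach; the helper name is hypothetical and the data are synthetic, generated from a known exponent rather than taken from the paper.

```python
import math


def fit_scaling_exponent(populations, counts):
    """Fit counts = a * populations**beta by least squares on log-log values."""
    xs = [math.log(n) for n in populations]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope in log-log space = scaling exponent
    a = math.exp(mean_y - beta * mean_x)  # intercept in log-log space = log(a)
    return beta, a


# Synthetic cities generated from a known law, Y = 0.01 * N**1.2 (not real data).
populations = [1e4, 1e5, 1e6, 1e7]
counts = [0.01 * n ** 1.2 for n in populations]

beta, a = fit_scaling_exponent(populations, counts)
regime = "super-linear" if beta > 1 else "linear or sublinear"
print(f"estimated exponent beta = {beta:.3f} ({regime}), prefactor a = {a:.4f}")
```

An exponent above 1 corresponds to the super-linear regime described for crime, an exponent of 1 to linear generation, and an exponent below 1 to the sublinear scaling reported for law enforcement numbers.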
Springer Proceedings in Complexity, Dec 26, 2016
Scientific collaboration networks are an important component of scientific output and contribute significantly to expanding our knowledge and to the economy and gross domestic product of nations. Here we examine a dataset from the Mendeley scientific collaboration network. We analyze this data using a combination of machine learning techniques and dynamical models. We find interesting clusters of countries with different characteristics of collaboration. Some of these clusters are dominated by developed countries that have a higher number of self connections compared with connections to other countries. Another cluster is dominated by impoverished nations that have mostly connections and collaborations with other countries but fewer self connections. We also propose a complex systems dynamical model that explains these characteristics. Our model explains how the scientific collaboration networks of impoverished and developing nations change over time. We also find interesting patterns in the behaviour of countries that may reflect past foreign policies and contemporary geopolitics. Our model and analysis give insights and guidelines into how scientific development of developing countries can be guided. This is intimately related to fostering economic development of impoverished nations and creating a richer and more prosperous society.
Gut, 2020
Objective. Dysregulated immune responses are the cause of IBDs. Studies in mice and humans suggest a central role of interleukin (IL)-23-producing mononuclear phagocytes in disease pathogenesis. Mechanistic insights into the regulation of IL-23 are prerequisite for selective IL-23 targeting therapies as part of personalised medicine. Design. We performed transcriptomic analysis to investigate IL-23 expression in human mononuclear phagocytes and peripheral blood mononuclear cells. We investigated the regulation of IL-23 expression and used single-cell RNA sequencing to derive a transcriptomic signature of hyperinflammatory monocytes. Using gene network correlation analysis, we deconvolved this signature into components associated with homeostasis and inflammation in patient biopsy samples. Results. We characterised monocyte subsets of healthy individuals and patients with IBD that express IL-23. We identified autosensing and paracrine sensing of IL-1α/IL-1β and IL-10 as key cytokines that co...
ABSTRACT. BACKGROUND & AIMS: Dysregulated immune responses are the cause of inflammatory bowel diseases. Studies in both mice and humans suggest a central role of IL-23 producing mononuclear phagocytes in disease pathogenesis. Mechanistic insights into the regulation of IL-23 are prerequisite for select IL-23 targeting therapies as part of personalized medicine. METHODS: We performed transcriptomic analysis to investigate IL-23 expression in human mononuclear phagocytes and peripheral blood mononuclear cells. We investigated the regulation of IL-23 expression and used single-cell RNA-sequencing to derive a transcriptomic signature of hyper-inflammatory monocytes. Using gene network correlation analysis, we deconvolve this signature into components associated with homeostasis and inflammation in patient biopsy samples. RESULTS: We characterized monocyte subsets of healthy individuals and patients with inflammatory bowel disease that express IL-23. We identified auto- and paracrine sensing of I...
Journal of the Royal Society, Interface / the Royal Society, 2016
West Nile virus (WNV) is an emerging pathogen that has decimated bird populations and caused severe outbreaks of viral encephalitis in humans. Currently, little is known about the within-host viral kinetics of WNV during infection. We developed mathematical models to describe viral replication, spread and host immune response in wild-type and immunocompromised mice. Our approach fits a target cell-limited model to viremia data from immunocompromised knockout mice and an adaptive immune response model to data from wild-type mice. Using this approach, we first estimate parameters governing viral production and viral spread in the host using simple models without immune responses. We then use these parameters in a more complex immune response model to characterize the dynamics of the humoral immune response. Despite substantial uncertainty in input parameters, our analysis generates relatively precise estimates of important viral characteristics that are composed of nonlinear combinati...
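A target cell-limited model is usually written as three coupled equations for uninfected target cells T, infected cells I and free virus V: dT/dt = -βTV, dI/dt = βTV - δI, dV/dt = pI - cV. The sketch below integrates this textbook form with a simple forward Euler scheme. It assumes the standard formulation without an eclipse phase, which may differ from the paper's exact model, and every parameter value is hypothetical rather than an estimate from the paper.

```python
def simulate_tcl(beta, delta, p, c, T0, I0, V0, days, dt=0.001):
    """Forward-Euler integration of the standard target cell-limited model.

    dT/dt = -beta*T*V          (target cells become infected)
    dI/dt = beta*T*V - delta*I (infected cells die at rate delta)
    dV/dt = p*I - c*V          (virus is produced at rate p, cleared at rate c)
    """
    T, I, V = T0, I0, V0
    trajectory = [(0.0, T, I, V)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        infection = beta * T * V
        dT = -infection
        dI = infection - delta * I
        dV = p * I - c * V
        T, I, V = T + dT * dt, I + dI * dt, V + dV * dt
        trajectory.append((step * dt, T, I, V))
    return trajectory


# Hypothetical parameters (per day), chosen only so the infection can take off:
# basic reproductive number = beta*T0*p / (delta*c) = 2.
trajectory = simulate_tcl(beta=1e-6, delta=1.0, p=10.0, c=5.0,
                          T0=1e6, I0=0.0, V0=1.0, days=60)

peak_time, _, _, peak_V = max(trajectory, key=lambda row: row[3])
final_time, final_T, _, final_V = trajectory[-1]
print(f"peak viral load {peak_V:.3g} at day {peak_time:.1f}")
print(f"target cells remaining at day {final_time:.0f}: {final_T:.3g}")
```

In this model the virus rises until enough target cells are depleted and then declines without any immune response, which is why it suits viremia data from immunocompromised animals. Forward Euler is used only to keep the example dependency-free; a dedicated ODE solver would be the usual choice for fitting.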
The Journal of Immunology
The vast majority of biological rates and times decrease systematically with increasing animal body size. For example, circulation times, a range of developmental times, and life span are systematically slower in larger animals. We use empirical data and ordinary differential equation (ODE) models of disease progression to show that immune response is independent of host body size, but that viral pathogen replication rates in vivo are systematically slower for larger animals. The models focus on West Nile Virus, but we suggest that this pattern holds more generally. We examine how the sizes and numbers of lymph nodes and the size of the lymphocyte repertoire enable immune responses that are similar across animals with vast differences in body size. We discuss how body size influences pathogenesis, epidemic spread of multi-host pathogens and the metabolic cost of immunity. Comparative studies of immune response and pathogen replication have implications for human health and animal di...
arXiv (Cornell University), Aug 16, 2010
The immune system can detect and respond against pathogens in time that does not vary with the size of the host animal. We suggest that this is due to the architecture of lymph nodes. Lymph nodes are anatomical structures that facilitate the otherwise serendipitous encounter of immune system cells with pathogens. We develop two complementary mathematical approaches to derive the optimal distribution of lymph nodes that enable a rapid immune response. Our work gives insights into the optimal design and architecture of the immune system and provides valuable inspiration for designing efficient computing systems.
This document outlines the basics and derivations of a Bayesian linear regression model
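As a minimal illustration of what such a derivation yields, consider the simplest conjugate case: a single slope w with no intercept, y = w·x + noise, Gaussian noise of known variance σ², and a zero-mean Gaussian prior on w with variance τ². The posterior is then Gaussian with precision 1/τ² + Σx²/σ² and mean (Σxy/σ²) divided by that precision. This one-parameter setup, the function name and the data are assumptions made for the sketch and are not taken from the document itself.

```python
def posterior_slope(xs, ys, noise_var, prior_var):
    """Posterior mean and variance of w in y = w*x + eps.

    Assumes eps ~ N(0, noise_var) with known variance and prior w ~ N(0, prior_var).
    """
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    precision = 1.0 / prior_var + sum_xx / noise_var  # prior precision + data precision
    mean = (sum_xy / noise_var) / precision
    return mean, 1.0 / precision


# Hypothetical data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

mean, var = posterior_slope(xs, ys, noise_var=1.0, prior_var=10.0)
print(f"posterior mean of slope: {mean:.4f}, posterior variance: {var:.4f}")
```

The posterior mean is slightly below the least-squares slope because the zero-mean prior shrinks the estimate towards zero, and the shrinkage vanishes as the prior variance grows.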