A Highly Accurate Ensemble Classifier for the Molecular Diagnosis of ASD at Ages 1 to 4 Years

A predictive ensemble classifier for the gene expression diagnosis of ASD at ages 1 to 4 years

Molecular Psychiatry

Autism Spectrum Disorder (ASD) diagnosis remains behavior-based and the median age of diagnosis is ~52 months, nearly 5 years after its first-trimester origin. Accurate and clinically-translatable early-age diagnostics do not exist due to ASD genetic and clinical heterogeneity. Here we collected clinical, diagnostic, and leukocyte RNA data from 240 ASD and typically developing (TD) toddlers (175 toddlers for training and 65 for test). To identify gene expression ASD diagnostic classifiers, we developed 42,840 models composed of 3570 gene expression feature selection sets and 12 classification methods. We found that 742 models had AUC-ROC ≥ 0.8 on both Training and Test sets. Weighted Bayesian model averaging of these 742 models yielded an ensemble classifier model with accurate performance in Training and Test gene expression datasets with ASD diagnostic classification AUC-ROC scores of 85–89% and AUC-PR scores of 84–92%. ASD toddlers with ensemble scores above and below the overall...
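The selection-then-averaging idea in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the model names, AUC values, scores, and weights below are made-up toy values, and the weights are simply passed in by the caller rather than derived from a Bayesian posterior as the paper's weighted Bayesian model averaging would require. Only the counts (3570 feature sets, 12 classifiers) and the 0.8 threshold come from the abstract.

```python
from typing import Dict, List, Tuple

# Counts stated in the abstract: every feature set is paired with every classifier.
N_FEATURE_SETS = 3570
N_CLASSIFIERS = 12
N_MODELS = N_FEATURE_SETS * N_CLASSIFIERS  # 42,840 candidate models


def select_models(auc_by_model: Dict[str, Tuple[float, float]],
                  threshold: float = 0.8) -> List[str]:
    """Keep models whose AUC-ROC reaches the threshold on BOTH training and test sets."""
    return [name for name, (train_auc, test_auc) in auc_by_model.items()
            if train_auc >= threshold and test_auc >= threshold]


def weighted_average_scores(scores_by_model: Dict[str, List[float]],
                            weight_by_model: Dict[str, float]) -> List[float]:
    """Combine per-subject scores from several models into one weighted mean score."""
    total_weight = sum(weight_by_model[m] for m in scores_by_model)
    n_subjects = len(next(iter(scores_by_model.values())))
    return [
        sum(weight_by_model[m] * scores_by_model[m][i] for m in scores_by_model) / total_weight
        for i in range(n_subjects)
    ]


# Toy inputs (hypothetical): (training AUC, test AUC) per model.
toy_auc = {"m1": (0.90, 0.85), "m2": (0.95, 0.70), "m3": (0.82, 0.80)}
kept = select_models(toy_auc)  # m2 fails on the test set

# Toy per-subject scores and caller-supplied weights for the kept models.
toy_scores = {"m1": [0.9, 0.2], "m3": [0.7, 0.4]}
toy_weights = {"m1": 3.0, "m3": 1.0}
ensemble = weighted_average_scores(toy_scores, toy_weights)
```

Requiring the threshold on both sets, as the abstract describes, filters out models that fit the training data well but do not generalize.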

Characteristics and Predictive Value of Blood Transcriptome Signature in Males with Autism Spectrum Disorders

Autism Spectrum Disorders (ASD) are a spectrum of highly heritable neurodevelopmental disorders in which known mutations contribute to disease risk in 20% of cases. Here, we report the results of the largest blood transcriptome study to date that aims to identify differences in 170 ASD cases and 115 age/sex-matched controls and to evaluate the utility of gene expression profiling as a tool to aid in the diagnosis of ASD. The differentially expressed genes were enriched for the neurotrophin signaling, long-term potentiation/depression, and notch signaling pathways. We developed a 55-gene prediction model, using a cross-validation strategy, on a sample cohort of 66 male ASD cases and 33 age-matched male controls (P1). Subsequently, 104 ASD cases and 82 controls were recruited and used as a validation set (P2). This 55-gene expression signature achieved 68% classification accuracy with the validation cohort (area under the receiver operating characteristic curve (AUC): 0.70 [95% confidence interval [CI]: 0.62-0.77]). Not surprisingly, our prediction model that was built and trained with male samples performed well for males (AUC 0.73, 95% CI 0.65-0.82), but not for female samples (AUC 0.51, 95% CI 0.36-0.67). The 55-gene signature also performed robustly when the prediction model was trained with P2 male samples to classify P1 samples (AUC 0.69, 95% CI 0.58-0.80). Our results suggest that the use of blood expression profiling for ASD detection may be feasible. Further study is required to determine the age at which such a test should be deployed, and what genetic characteristics of ASD can be identified.

Prediction of Autism by Translation and Immune/ Inflammation Coexpressed Genes in Toddlers From Pediatric Community Practices

The identification of genomic signatures that aid early identification of individuals at risk for autism spectrum disorder (ASD) in the toddler period remains a major challenge because of the genetic and phenotypic heterogeneity of the disorder. Generally, ASD is not diagnosed before the fourth to fifth birthday. Objective: To apply a functional genomic approach to identify a biologically relevant signature with promising performance in the diagnostic classification of infants and toddlers with ASD.

A Machine Learning Framework for Early-Stage Detection of Autism Spectrum Disorders

IEEE Access

Autism Spectrum Disorder (ASD) is a type of neurodevelopmental disorder that affects the everyday life of affected patients. Though it is considered hard to completely eradicate this disease, disease severity can be mitigated by taking early interventions. In this paper, we propose an effective framework for the evaluation of various Machine Learning (ML) techniques for the early detection of ASD. The proposed framework employs four different Feature Scaling (FS) strategies, i.e., Quantile Transformer (QT), Power Transformer (PT), Normalizer, and Max Abs Scaler (MAS). Then, the feature-scaled datasets are classified through eight simple but effective ML algorithms: Ada Boost (AB), Random Forest (RF), Decision Tree (DT), K-Nearest Neighbors (KNN), Gaussian Naïve Bayes (GNB), Logistic Regression (LR), Support Vector Machine (SVM), and Linear Discriminant Analysis (LDA). Our experiments are performed on four standard ASD datasets (Toddlers, Adolescents, Children, and Adults). Comparing the classification outcomes using various statistical evaluation measures (Accuracy, Receiver Operating Characteristic: ROC curve, F1-score, Precision, Recall, Matthews Correlation Coefficient: MCC, Kappa score, and Log loss), the best-performing classification methods and the best FS techniques for each ASD dataset are identified. After analyzing the experimental outcomes of different classifiers on feature-scaled ASD datasets, it is found that AB predicted ASD with the highest accuracy of 99.25% and 97.95% for Toddlers and Children, respectively, and LDA predicted ASD with the highest accuracy of 97.12% and 99.03% for Adolescents and Adults datasets, respectively. These highest accuracies are achieved while scaling Toddlers and Children with the Normalizer FS method and Adolescents and Adults with the QT FS method.
Afterward, the ASD risk factors are calculated, and the most important attributes are ranked according to their importance values using four different Feature Selection Techniques (FSTs), i.e., Info Gain Attribute Evaluator (IGAE), Gain Ratio Attribute Evaluator (GRAE), Relief F Attribute Evaluator (RFAE), and Correlation Attribute Evaluator (CAE). These detailed experimental evaluations indicate that proper fine-tuning of the ML methods can play an essential role in predicting ASD in people of different ages. We argue that the detailed feature importance analysis in this paper will guide the decision-making of healthcare practitioners while screening ASD cases. The proposed framework has achieved promising results compared to existing approaches for the early detection of ASD.
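Several of the evaluation measures named in this abstract (accuracy, precision, recall, F1-score, MCC, and Kappa score) can be computed from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions with made-up counts. It is not the paper's evaluation code, and returning 0.0 when a denominator vanishes is a convention assumed here.

```python
import math
from typing import Dict


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute common binary-classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    # Matthews correlation coefficient: correlation between predicted and true labels.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0

    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    expected_agreement = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = ((accuracy - expected_agreement) / (1 - expected_agreement)
             if expected_agreement != 1 else 0.0)

    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "mcc": mcc, "kappa": kappa}


# Toy counts (hypothetical, balanced classes): 45 TP, 5 FP, 5 FN, 45 TN.
toy_metrics = confusion_metrics(tp=45, fp=5, fn=5, tn=45)
```

On balanced toy data like this, accuracy and F1 agree at 0.9 while MCC and kappa are lower, because both discount the agreement expected by chance.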

Machine learning analysis of pregnancy data enables early identification of a subpopulation of newborns with ASD

Scientific Reports

To identify newborns at risk of developing ASD and to detect ASD biomarkers early after birth, we compared retrospectively ultrasound and biological measurements of babies diagnosed later with ASD or neurotypical (NT) that are collected routinely during pregnancy and birth. We used a supervised machine learning algorithm with a cross-validation technique to classify NT and ASD babies and performed various statistical tests. With a minimization of the false positive rate, 96% of NT and 41% of ASD babies were identified with a positive predictive value of 77%. We identified the following biomarkers related to ASD: sex, maternal familial history of auto-immune diseases, maternal immunization to CMV, IgG CMV level, timing of fetal rotation on head, femur length in the 3rd trimester, white blood cell count in the 3rd trimester, fetal heart rate during labor, newborn feeding and temperature difference between birth and one day after. Furthermore, statistical models revealed that a subpopu...
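A positive predictive value such as the one reported above depends on how common the condition is in the cohort being screened, not only on the classifier. The sketch below assumes that "96% of NT" can be read as specificity and "41% of ASD" as sensitivity. That reading, the function name, and the prevalence values are illustrative assumptions, not figures from the study.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """PPV = true positives / all positives, expressed via rates and prevalence."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Same assumed sensitivity/specificity, two hypothetical prevalence settings.
ppv_balanced = positive_predictive_value(0.41, 0.96, prevalence=0.50)
ppv_rare = positive_predictive_value(0.41, 0.96, prevalence=0.01)
```

With identical sensitivity and specificity, the PPV falls sharply as prevalence drops, which is why a PPV measured in an enriched cohort does not transfer directly to general-population screening.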

Early diagnosis of ASD traits in children by using logistic regression and machine learning

AGRIVOLTAICS2021 CONFERENCE: Connecting Agrivoltaics Worldwide

Machine learning plays a vital role in health care, which requires processes that reduce cost and time. The main aim of this paper is to implement a machine learning algorithm to predict autism. Autism is a neural and progressive disorder that begins in childhood and persists throughout a person's lifespan. The present study proposes the logistic regression algorithm of machine learning to classify autism spectrum disorder. A number of features, such as age, gender, country, jaundice affected, etc., are extracted from the dataset using the algorithm and statistically analyzed. The statistical results showed sensitivity = 96%, specificity = 0.9%, accuracy = 52%, precision = 67%, and F1 = 67%. The results were calculated using RStudio, which is generally used for statistical calculations.

Developing a Predictive Gene Classifier for Autism Spectrum Disorders Based upon Differential Gene Expression Profiles of Phenotypic Subgroups

North American journal of medicine & science, 2013

Autism spectrum disorders (ASD) are neurodevelopmental disorders which are currently diagnosed solely on the basis of abnormal stereotyped behavior as well as observable deficits in communication and social functioning. Although a variety of candidate genes have been identified on the basis of genetic analyses and up to 20% of ASD cases can be collectively associated with a genetic abnormality, no single gene or genetic variant is applicable to more than 1-2 percent of the general ASD population. In this report, we apply class prediction algorithms to gene expression profiles of lymphoblastoid cell lines (LCL) from several phenotypic subgroups of idiopathic autism defined by cluster analyses of behavioral severity scores on the Autism Diagnostic Interview-Revised diagnostic instrument for ASD. We further demonstrate that individuals from these ASD subgroups can be distinguished from nonautistic controls on the basis of limited sets of differentially expressed genes with a predicted ...

Classification of autism spectrum disorder from blood metabolites: Robustness to the presence of co-occurring conditions

Research in Autism Spectrum Disorders, 2020

Background: Previous studies have found plasma measurements of metabolites from the folate-dependent one-carbon metabolism (FOCM) and transsulfuration (TS) pathways to be useful for differentiating individuals with autism spectrum disorder (ASD) from their typically developing peers. However, ASD is heterogeneous due to wide variation in the presence of co-occurring behavioral and medical conditions, and it is unknown how these conditions influence the ability to identify ASD based on FOCM/TS metabolites. Method: This study employs a previously developed multivariate model that makes use of five FOCM/TS measurements (S-adenosylmethionine/S-adenosylhomocysteine, glutamylcysteine, glutathione disulfide, free cystine/free cysteine, and percent oxidized glutathione) to distinguish children with ASD from typically developing children. The model is used here to evaluate an independent cohort of individuals having ASD with diagnosed co-occurring conditions (age range 2-17 years old) and assess classifier performance in the presence/absence of these conditions. The four categories of co-occurring conditions considered were allergic disorders, gastrointestinal disorders, immune/metabolic disorders, and neurological disorders. All data were collected and retrospectively analyzed from previous clinical studies. Results: The model was able to identify 124 of 131 participants with ASD (94.7%) correctly regardless of co-occurring condition status. Model performance was generally not sensitive to the absence or presence of most co-occurring conditions, with the exceptions of ever/never having allergies or gastrointestinal symptoms, or currently (not) having any condition, all of which had minor impacts on model prediction accuracy. Conclusion: The results of this exploratory study suggest that a FOCM/TS-based classifier for diagnosing ASD may potentially be robust to variations in co-occurring conditions and potentially applicable across ASD subtypes.
Larger, more comprehensive follow-up studies with typically developing and/or developmentally delayed control groups are required to provide a more conclusive assessment of classifier robustness to co-occurring conditions.

Early Predictors of ASD in Young Children Using a Nationally Representative Data Set

Journal of Early Intervention, 2013

Current clinical diagnosis of Autism Spectrum Disorders (ASD) occurs between 3 and 4 years of age, but increasing evidence indicates that intervention begun earlier may improve outcomes. Using secondary analysis of the Early Childhood Longitudinal Study–Birth Cohort data set, the current study identifies early predictors prior to the diagnosis of ASD at 4 years for approximately 100 children. Children with ASD were compared with children with other disabilities and children who were typically developing. Multinomial logistic regression analyses identified limited unique characteristics (e.g., self-regulation and sleep patterns) at the 9-month time point. A majority of the differences in communication and language, mental/cognitive function, motor function, social interaction, and self-regulation were found at the 2-year time point. Implications for research and practice are presented.

Data-driven Autism Biomarkers Selection by using Signal Processing and Machine Learning Techniques

Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies, 2019

To analyze microarray gene expression data from a homogeneous group of children diagnosed with classic autism, a synergy of signal processing and machine learning techniques is proposed. The main focus of the paper is the gene expression preprocessing, which relies on the Fractional Fourier Transformation, and the obtained data is further used for biomarker selection using an entropy-based method. This is a crucial step needed to obtain knowledge of the most informative genes (biomarkers) in terms of their discriminative power between the autistic and the control (healthy) group. The relevance of the selected biomarkers is tested using discriminative and generative machine learning classification algorithms. Furthermore, a data-driven approach is used to evaluate the performance of the classifiers by using a set of two performance measures (sensitivity and specificity). The evaluation showed that the model learned by Naive Bayes provides the best results. Finally, a reliable biomarker set is obtained and each gene is analyzed in terms of its chromosomal location and accordingly compared to the critical chromosomes published in the literature.
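The abstract does not specify which entropy-based method is used for biomarker selection, so the sketch below substitutes plain information gain on a thresholded expression value as a stand-in. It omits the Fractional Fourier preprocessing entirely, and the gene names, expression values, labels, and threshold are hypothetical toy data.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


def information_gain(expression: Sequence[float], labels: Sequence[int],
                     threshold: float) -> float:
    """Reduction in label entropy after splitting samples at an expression threshold."""
    high = [label for value, label in zip(expression, labels) if value > threshold]
    low = [label for value, label in zip(expression, labels) if value <= threshold]
    n = len(labels)
    # Weighted entropy of the two groups; empty groups contribute nothing.
    conditional = sum(len(group) / n * entropy(group) for group in (high, low) if group)
    return entropy(labels) - conditional


def rank_genes(expression_by_gene: Dict[str, List[float]], labels: Sequence[int],
               threshold: float) -> List[str]:
    """Order genes from most to least informative under the chosen threshold."""
    return sorted(expression_by_gene,
                  key=lambda gene: information_gain(expression_by_gene[gene], labels, threshold),
                  reverse=True)


# Toy data (hypothetical): 1 = autism group, 0 = control group.
toy_labels = [1, 1, 0, 0]
toy_expression = {
    "geneA": [5.0, 1.0, 6.0, 2.0],  # split at 3.0 mixes both classes
    "geneB": [5.0, 6.0, 1.0, 2.0],  # split at 3.0 separates the classes
}
gain_a = information_gain(toy_expression["geneA"], toy_labels, threshold=3.0)
gain_b = information_gain(toy_expression["geneB"], toy_labels, threshold=3.0)
ranking = rank_genes(toy_expression, toy_labels, threshold=3.0)
```

A gene whose thresholded expression separates the two groups perfectly reaches the maximum gain of 1 bit for balanced classes, while a gene that is uninformative at that threshold scores 0.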