Consensus Clustering of temporal profiles for the identification of metabolic markers of pre-diabetes in childhood (EarlyBird 73)

Laboratory data clustering in defining population cohorts: Case study on metabolic indicators

JSCS, 2022

Knowledge of general population health is important for creating public policies and organizing medical services. However, personal data are often limited, and mathematical models are employed to obtain a general overview. Cluster analysis was used in this study to assess general trends in population health based on laboratory data. Metabolic indicators were chosen to test the model and define population cohorts. Blood analysis data from 33,049 persons, namely the concentrations of glucose, total cholesterol and triglycerides, were collected in a public health laboratory and used to define metabolic cohorts by computational data clustering (CLARA method). The population was shown to be distributed in 3 clusters: persons with hypercholesterolemia with or without changes in the concentration of triglycerides or glucose, persons with reference or close-to-reference concentrations of all three analytes, and persons with predominantly elevated values of all three parameters. Clustering of biochemical data is thus a useful statistical tool for defining population groups with respect to a given health aspect.
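The CLARA idea named in this abstract can be sketched as follows: run a k-medoids search on several small random subsamples, then keep the medoid set that gives the lowest total distance over the full dataset. This is a minimal illustration only, not the study's implementation. The function names, the greedy swap search, the sample sizes and the synthetic three-feature data (arbitrary units, not clinical values) are all assumptions made for the example.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam(sample, k, rng):
    """Greedy swap-based k-medoids on a small sample (simplified PAM)."""
    medoids = rng.sample(sample, k)
    best = total_cost(sample, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in sample:
                if candidate in medoids:
                    continue
                # Try replacing medoid i with the candidate point.
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(sample, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids


def clara(points, k, n_samples=5, sample_size=40, seed=0):
    """Keep the medoids (found on subsamples) that best fit ALL points."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, math.inf
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam(sample, k, rng)
        cost = total_cost(points, medoids)  # evaluated on the full data
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    labels = [
        min(range(k), key=lambda j: math.dist(p, best_medoids[j]))
        for p in points
    ]
    return best_medoids, labels


def make_demo_data(seed=1):
    """Synthetic data: three well-separated groups of 50 points each."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0), (20.0, 0.0, 10.0)]
    return [
        tuple(rng.gauss(c, 1.0) for c in centre)
        for centre in centres
        for _ in range(50)
    ]


data = make_demo_data()
medoids, labels = clara(data, k=3)
print(medoids)
```

Because the medoid search only ever sees a subsample, the cost grows slowly with the number of records, which is the usual reason for choosing CLARA for datasets of tens of thousands of rows.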

A comparison of methods for clustering longitudinal data with slowly changing trends

Communications in Statistics - Simulation and Computation, 2021

Longitudinal clustering provides a detailed yet comprehensible description of time profiles among subjects. With several approaches commonly used for this purpose, it remains unclear under which conditions one method is preferred over another. We investigated the performance of five methods using Monte Carlo simulations on synthetic datasets, representing various scenarios involving polynomial time profiles. The performance was evaluated on two aspects: the agreement of the group assignment with the simulated reference, as measured by the split-join distance, and the trend estimation error, as measured by a weighted minimum of the mean squared error (WMMSE). Growth mixture modeling (GMM) was found to achieve the best overall performance, followed closely by a two-step approach using growth curve modeling and k-means (GCKM). Considering the model similarities between GMM and GCKM, the latter is preferred for large datasets for its computational efficiency. Longitudinal k-means (KML) and group-based trajectory modeling were found to have practically identical solutions when the group trajectory model of the latter method is correctly specified. Both methods performed worse than GMM and GCKM in most settings.
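The split-join distance used above to compare a group assignment with the reference can be sketched in a few lines. The version below follows the standard definition (twice the number of items, minus the best overlaps counted from each partition's point of view). The abstract does not say whether the paper normalizes the value, so the unnormalized form and the function name are assumptions.

```python
from collections import Counter


def split_join_distance(labels_a, labels_b):
    """Split-join distance between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    # Contingency counts: how many items fall in cluster a AND cluster b.
    overlap = Counter(zip(labels_a, labels_b))
    best_for_a = {}
    best_for_b = {}
    for (a, b), count in overlap.items():
        best_for_a[a] = max(best_for_a.get(a, 0), count)
        best_for_b[b] = max(best_for_b.get(b, 0), count)
    return 2 * n - sum(best_for_a.values()) - sum(best_for_b.values())


reference = [0, 0, 0, 1, 1, 1]
estimated = [1, 1, 0, 0, 0, 0]
print(split_join_distance(reference, estimated))  # prints 2
```

The distance is 0 when the two partitions are identical up to relabeling, so it does not depend on how the cluster labels are numbered.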

Model-based clustering for longitudinal data

Computational Statistics & Data Analysis, 2008

A model-based clustering method is proposed for clustering individuals on the basis of measurements taken over time. Data variability is taken into account through non-linear hierarchical models, leading to a mixture of hierarchical models. We study both frequentist and Bayesian estimation procedures. From a classical viewpoint, we discuss maximum likelihood estimation of this family of models through the EM algorithm. From a Bayesian standpoint, we develop appropriate Markov chain Monte Carlo (MCMC) sampling schemes for the exploration of the target posterior distribution of the parameters. The methods are illustrated with the identification of hormone trajectories that are likely to lead to adverse pregnancy outcomes in a group of pregnant women.

Model-based clustering of longitudinal data

2010

A new family of mixture models for the model-based clustering of longitudinal data is introduced. The covariance structures of eight members of this new family of models are given and the associated maximum likelihood estimates for the parameters are derived via expectation-maximization (EM) algorithms. The Bayesian information criterion is used for model selection and a convergence criterion based on Aitken's acceleration is used to determine convergence of these EM algorithms. This new family of models is applied to yeast sporulation time course data, where the models give good clustering performance. Further constraints are then imposed on the decomposition to allow a deeper investigation of correlation structure of the yeast data. These constraints greatly extend this new family of models, with the addition of many parsimonious models.
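A convergence check based on Aitken's acceleration, as mentioned above, estimates the limiting log-likelihood from the last three EM iterations and stops when the current value is close to that estimate. The sketch below shows one common form of the rule. The tolerance, the function names and the exact stopping comparison are assumptions, since the abstract does not state which variant the authors use.

```python
def aitken_limit(l_prev, l_curr, l_next):
    """Estimate the asymptotic log-likelihood from three successive values."""
    a = (l_next - l_curr) / (l_curr - l_prev)  # Aitken acceleration factor
    return l_curr + (l_next - l_curr) / (1.0 - a)


def aitken_converged(loglik, tol=1e-6):
    """Return True when the estimated limit is within tol of the last value."""
    if len(loglik) < 3:
        return False  # need three iterations to form the estimate
    l_prev, l_curr, l_next = loglik[-3:]
    if l_curr == l_prev:
        # No change on the previous step: fall back to a plain difference.
        return abs(l_next - l_curr) < tol
    a = (l_next - l_curr) / (l_curr - l_prev)
    if a >= 1.0:
        return False  # increments are not shrinking yet
    return abs(aitken_limit(l_prev, l_curr, l_next) - l_next) < tol


# Log-likelihoods approaching -100 geometrically: -100 - 10 * 0.5**k
history = [-110.0, -105.0, -102.5]
print(aitken_limit(*history))     # prints -100.0
print(aitken_converged(history))  # prints False
```

Compared with stopping on a small change between successive log-likelihoods, this rule is less likely to stop early when EM is converging slowly.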

Kml: A package to cluster longitudinal data

Computer Methods and Programs in Biomedicine, 2011

Cohort studies are becoming essential tools in epidemiological research. In these studies, measurements are not restricted to single variables but can be seen as trajectories. Thus, an important question concerns the existence of homogeneous patient trajectories. KmL is an R package providing an implementation of k-means designed to work specifically on longitudinal data. It provides several different techniques for dealing with missing values in trajectories (classical ones like linear interpolation or LOCF but also new ones like copyMean). It can run k-means with distances specifically designed for longitudinal data (like Fréchet distance or any user-defined distance). Its graphical interface helps the user to choose the appropriate number of clusters when classic criteria are not efficient. It also provides an easy way to export graphical representations of the mean trajectories resulting from the clustering. Finally, it runs the algorithm several times, using various kinds of starting conditions and/or numbers of clusters to be sought, thus sparing the user a lot of manual re-sampling.
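The two classical imputation techniques mentioned above, LOCF and linear interpolation, can be illustrated on a single trajectory. This is a Python sketch of the general techniques, not the KmL package's R code. Representing missing values as `None` and filling gaps at the start or end with the nearest observed value are assumptions made for the example.

```python
def locf(trajectory):
    """Last observation carried forward.

    Leading gaps are filled with the first observed value (assumption).
    """
    observed = [v for v in trajectory if v is not None]
    if not observed:
        raise ValueError("trajectory has no observed values")
    filled = []
    last = observed[0]
    for value in trajectory:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def interpolate_linear(trajectory):
    """Fill interior gaps on a straight line between the neighbouring values."""
    idx = [i for i, v in enumerate(trajectory) if v is not None]
    if not idx:
        raise ValueError("trajectory has no observed values")
    filled = list(trajectory)
    # Gaps at either end: copy the nearest observed value (assumption).
    for i in range(idx[0]):
        filled[i] = trajectory[idx[0]]
    for i in range(idx[-1] + 1, len(trajectory)):
        filled[i] = trajectory[idx[-1]]
    # Interior gaps: interpolate between consecutive observed points.
    for left, right in zip(idx, idx[1:]):
        step = (trajectory[right] - trajectory[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = trajectory[left] + step * (i - left)
    return filled


trajectory = [None, 2.0, None, None, 8.0, None]
print(locf(trajectory))                # prints [2.0, 2.0, 2.0, 2.0, 8.0, 8.0]
print(interpolate_linear(trajectory))  # prints [2.0, 2.0, 4.0, 6.0, 8.0, 8.0]
```

Once every trajectory is complete, each one can be treated as a fixed-length vector and passed to k-means.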

Longitudinal omics modeling and integration in clinical metabonomics research: challenges in childhood metabolic health research

Frontiers in Molecular Biosciences, 2015

Systems biology is an important approach for deciphering the complex processes in health maintenance and the etiology of metabolic diseases. Such integrative methodologies will help better understand the molecular mechanisms involved in growth and development throughout childhood, and consequently will result in new insights about metabolic and nutritional requirements of infants, children and adults. To achieve this, a better understanding of the physiological processes at anthropometric, cellular and molecular level for any given individual is needed. In this respect, novel omics technologies in combination with sophisticated data modeling techniques are key. Due to the highly complex network of influential factors determining individual trajectories, it becomes imperative to develop proper tools and solutions that will comprehensively model biological information related to growth and maturation of our body functions. The aim of this review and perspective is to evaluate, succinctly, promising data analysis approaches to enable data integration for clinical research, with an emphasis on the longitudinal component. Approaches based on empirical and mechanistic modeling of omics data are essential to leverage findings from high dimensional omics datasets and enable biological interpretation and clinical translation. On the one hand, empirical methods, which provide quantitative descriptions of patterns in the data, are mostly used for exploring and mining datasets. On the other hand, mechanistic models are based on an understanding of the behavior of a system's components and condense information about the known functions, allowing robust and reliable analyses to be performed by bioinformatics pipelines and similar tools. Herein, we will illustrate current examples, challenges and perspectives in the applications of empirical and mechanistic modeling in the context of childhood metabolic health research.

Cluster Analysis Based on Fasting and Postprandial Plasma Glucose and Insulin Concentrations

Plasma glucose and insulin concentrations are clinical markers used in the diagnosis of metabolic diseases, particularly prediabetes and diabetes. In this paper, we carried out a cluster analysis using fasting and two-hour postprandial plasma glucose and insulin data. Different clustering experiments were performed by changing the number of attributes, from one (fasting glucose) to four (fasting and postprandial glucose and insulin), given as input to a k-means clustering algorithm. Based on the elbow and silhouette methods, three clusters were chosen to carry out the clustering experiments. The Pearson correlation coefficient was used to assess the dependence between the glucose and insulin levels for each cluster created. Results show that one cluster contained prediabetics, another cluster contained diabetics, and subjects without prediabetes and diabetes were assigned to another cluster. Although age was not used as an attribute, we have found that subjects in the three clusters have a...
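The silhouette method mentioned above scores a candidate clustering by comparing, for each point, its mean distance to its own cluster with its mean distance to the nearest other cluster. The sketch below computes the mean silhouette for a given labeling. It is an illustration of the general criterion, not the paper's code, and the one-dimensional values are invented for the example rather than taken from any dataset.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points; higher means better-separated clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself is in the sum, not the count).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Invented one-dimensional values forming two obvious groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
print(good > bad)  # prints True
```

To choose the number of clusters, one would run k-means for several values of k and compare the resulting scores, usually alongside an elbow plot of the within-cluster sum of squares.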

Clustering of adult-onset diabetes into novel subgroups guides therapy and improves prediction of outcome

Background: Diabetes is presently classified into two main forms, type 1 (T1D) and type 2 diabetes (T2D), but T2D in particular is highly heterogeneous. A refined classification could provide a powerful tool to individualize treatment regimes and identify individuals with increased risk of complications already at diagnosis. Methods: We applied data-driven cluster analysis (k-means and hierarchical clustering) in newly diagnosed diabetic patients (N=8,980) from the Swedish ANDIS (All New Diabetics in Scania) cohort, using five variables (GAD-antibodies, BMI, HbA1c, HOMA2-B and HOMA2-IR), and related the clusters to prospective data on development of complications and prescription of medication from patient records. Replication was performed in three independent cohorts: the Scania Diabetes Registry (SDR, N=1466), ANDIU (All New Diabetics in Uppsala, N=844) and DIREVA (Diabetes Registry Vaasa, N=3485). Cox regression and logistic regression were used to compare time to medication, time to reaching the t...

Are All Breast-fed Infants Equal? Clustering Metabolomics Data to Identify Predictive Risk Clusters for Childhood Obesity

Journal of Pediatric Gastroenterology & Nutrition, 2018

Objectives: Fetal and early life represent a period of developmental plasticity during which metabolic pathways are modified by environmental and nutritional cues. Little is known about the pathways underlying this multifactorial complex. We explored whether 6-month-old breast-fed infants could be clustered into metabolically similar groups and whether those metabotypes could be used to predict later obesity risk. Methods: Plasma samples were obtained from 183 breast-fed infants aged 6 months participating in the European multicenter Childhood Obesity Project study. We measured amino acids along with polar lipid concentrations (acylcarnitines, lysophosphatidylcholines, phosphatidylcholines, sphingomyelins). We determined the metabotypes using a Bayesian agglomerative clustering method and investigated the properties of these clusters with respect to clinical, programming, and metabolic factors up to 6 years of age. Results: We identified 20 metabolite clusters comprising 1 to 39 children. Phosphatidylcholines predominantly influenced the clustering process. In the largest clusters (n ≥ 14), large differences existed for birth length (unadjusted P < 0.0001) and length and weight at 6 months (unadjusted P < 0.0001 and P = 0.012, respectively). Infants tended to cluster together by country (unadjusted P < 0.001). The body mass index (BMI) z score at 6 years of age tended to differ (unadjusted P = 0.07). Conclusions: Our exploratory study provided evidence that breast-fed infants are not metabolically homogeneous and that variation in metabolic profiles among infants may provide insight into later development and health. This work highlights the potential of metabotypes for identifying inter-individual differences that may form the basis for developing personalized early preventive strategies.

Cluster analysis of an insulin-dependent diabetic cohort towards the definition of clinical subtypes

Journal of Clinical Epidemiology, 1990

Clinical and biochemical data on 111 consecutive insulin-dependent diabetic children enrolled in a longitudinal prospective study were analyzed to determine if more than one clinical expression of Type I diabetes exists. Use of multivariate statistical methods, including Correspondence Analysis, k-means clustering and RECPAM (RECursive Partition and AMalgamation), shows that there are two well-differentiated clinical expressions of IDDM, each characterized by a cluster. One is characterized by later age, less severe onset, longer symptom duration, less β-cell disappearance after 12 months, and more females; the other by earlier age, more sudden and severe onset, DR 3/4, earlier disappearance of β-cell function, and more males. RECPAM analysis provides further insight into the structure of the two clusters. Another RECPAM tree identifies low-, medium- and high-risk groups for disappearance of β-cell function at 12 months after diagnosis.

Keywords: Clinical expression of IDDM; β-cell function; Correspondence Analysis; k-means clustering; RECPAM