Evaluate the performance of the K-means method when increasing the sample size

Diabetes Detection Using the Fuzzy C-Means Clustering and K-Means Clustering Methods

Computatio : Journal of Computer Science and Information Systems

Diabetes is a disease that occurs when the glucose content of the blood is high. A high-accuracy glucose test must be performed several times to detect diabetes in the body. Several indicators in the body can serve as a starting point for detecting diabetes. However, the limited capacity of medical personnel to screen very large amounts of data manually is an obstacle. One solution to this gap is to use a computer to carry out the mathematical calculations of the K-Means and Fuzzy C-Means clustering methods. The clustering consists of a diabetes group and a non-diabetes group. Testing of each method was carried out on 9 data. The best test result for the K-Means method was 73.438%, and for the Fuzzy C-Means method it was 82.812%.

APPLICATION OF K-MEANS ALGORITHM IN DATA MINING

http://www.euroasiapub.org (An open access scholarly, peer-reviewed, interdisciplinary, monthly, and fully refereed journal.)

ABSTRACT It is an algorithm that classifies, or groups, objects into K clusters based on the similarity of their attributes (features), where K is a positive integer. The types of diabetes disorder symptoms are grouped by category of diabetes. The grouping is done by minimizing the sum of squared distances between each data point and its corresponding cluster centroid. Thus, the purpose of K-means clustering is to classify the data. Here the attributes are significant factors causing diabetes, such as body mass index, diabetes pedigree function, plasma glucose concentration in saliva, and age. These factors must be grouped according to whether a type of diabetes is acquired or not. With the chosen k, the algorithm partitions the data into classes with high intra-class similarity and low inter-class similarity. The algorithm starts with a random solution and iteratively makes small changes to it, each time improving it a little. When the algorithm can no longer find any improvement, it terminates. Ideally, at that point the current solution is close to the optimal solution. The k-means algorithm is a simple iterative method to partition a given dataset into a user-specified number of clusters, k. Here it is tested with a small cluster of symptoms and types of diabetes disorder. One needs to find a suitable stopping criterion for large datasets in medical diagnosis. Here it places type-I and type-II diabetes, which are commonly present in most of the population, in one group, and MODY and gestational diabetes, which occur in selective groups, in another cluster. K-means remains the most widely used partition clustering algorithm in practice. The algorithm is simple, easily understandable and reasonably scalable.
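
The iterative procedure this abstract describes (start from a random solution, reassign points to the nearest centroid, recompute centroids, stop when nothing changes) can be sketched in a few lines of Python. This is a minimal illustration of the standard Lloyd-style k-means loop, not the authors' implementation; the function names `squared_distance` and `kmeans`, the `max_iter` and `seed` parameters, and the stopping rule (labels unchanged) are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of numeric tuples (illustrative sketch).

    Returns (labels, centroids, sse), where sse is the sum of squared
    distances between each point and its assigned centroid.
    """
    rng = random.Random(seed)
    # Random initial solution: pick k distinct data points as centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Assumed stopping criterion: no point changed cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse
```

Calling `kmeans(points, 2)` on a list of feature tuples returns one label per point; the returned `sse` is the quantity that the grouping minimizes. For large datasets, a cap such as `max_iter` is one possible stopping criterion in addition to the no-change test.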

An Approach to Data Mining in Healthcare: Improved K-means Algorithm

Journal of Industrial and Intelligent Information, 2013

Nowadays, applying data mining in the healthcare industry is necessary. Data mining brings a set of tools and techniques that can be applied to discover hidden patterns, giving healthcare professionals an additional source of knowledge for making decisions. In particular, clustering patients who have the same status helps discover new diseases, but the suitable number of clusters is often not obvious. This paper first reviews existing methods for selecting the number of clusters for the algorithm. Then, an improved algorithm is presented for learning k while clustering. Finally, we evaluate the algorithm and apply it to a dataset of patients; the results show its efficiency.

K-means cluster analysis to support diabetic patient care

This research work seeks to improve the output of a data mining algorithm that supports diabetic patients' care. The model that is currently in operation uses three variables that are obtained from a general medical appointment database. This work aims to find other characteristics in the database to add them to those already considered to better describe patients to provide more accurate information. The article shows the process followed to improve the results of a k-means grouping algorithm for the follow-up process of diabetic patients. We present the process of defining the considered characteristics that were not part of the model, to analyze and eventually add them. A qualitative comparison between the algorithms is shown and the findings are explained during the analysis of the studied variables, in relation to sex and age of the patients.

Clustering of Patient Disease Data by Using K-Means Clustering

Clustering is a method of grouping records in a database based on certain criteria. One clustering method is K-Means Clustering, which divides data into multiple groups and can accept data inputs without class labels. This research applies the K-Means Clustering method to patient disease data at Haji Adam Malik Hospital in Medan. The results of this study illustrate the tendencies of patient diseases at Haji Adam Malik Hospital. This research is expected to serve as a reference for anticipating priority services for patients, especially patients who use Social Security and Healthcare Security.

Statistical considerations on the k-means algorithm

Annals of the University of Craiova - Mathematics and Computer Science Series, 2015

Cluster analysis, from a data mining point of view, is an important method for knowledge discovery in large databases. Clustering has a wide range of applications in the life sciences and over the years it has been used in many areas. The k-means algorithm is one of the most popular and simplest clustering algorithms, and it keeps the data in main memory. It is well known for its efficiency in clustering large data sets and converges to acceptable results in different areas. The k-means algorithm with a large number of variables may be computationally faster than other algorithms. The current work describes two algorithmic procedures involved in the implementation of the k-means algorithm and also presents a statistical study on a known data set. The statistics obtained from the experiment, analyzed in detail, can be used to improve future implementation techniques for the k-means algorithm and to optimize data mining algorithms that are based on a model.

Application of K-Mean Algorithm for Medicine Data Clustering in Puskesmas Rumbai

2017

Through the government's health insurance program, efforts are made to ensure the health of the community through Puskesmas, or community clinics. One of the most important components of health care is the availability of medicines. Medicine stocks should be well managed to ensure that the medicines needed by the community are always available in sufficient quantities. Clustering, a data mining technique, can be used to analyze past medicine use at a Puskesmas, and the analysis can serve as one of the considerations when the Puskesmas submits its medicine requests for the coming period. The results of this study are expected to classify the level of medicine use in the pharmacy of the Puskesmas in Rumbai Bukit, Pekanbaru.

Analyze K-Value Selected Method of K-Means Clustering Algorithm to Clustering Province Based on Disease Case

International Journal of Innovative Technology and Exploring Engineering, 2020

Disease cases throughout Indonesia have increased, as seen from the Indeks Pembangunan Masyarakat (IPKM). Globalization has increased human mobility across provinces, accelerating the spread of epidemics that could pose a threat to Indonesia. Speedy action from the government is needed to reduce the level of disease outbreaks. For this reason, the government needs accurate information to solve this problem. The data were taken from disease case data for 2015, covering 34 provinces in Indonesia, based on the Central Statistics Agency of Indonesia. In K-Means clustering, the K-value must be determined because it affects the convergence results. To solve this problem, this research analyzes three methods for determining the K-value, namely the Silhouette, Elbow, and Gap Statistics methods. Testing the three methods of determining the K-value gave execution times of 13.09 s for Silhouette, 14.76 s for Elbow, and 20.28 s for Gap Statistics. So, choosing Silhouet...
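
Of the three K-value selection methods named above, the Silhouette method is the easiest to show compactly. The following is a minimal sketch of the standard mean silhouette coefficient, assuming Euclidean distance and points stored as numeric tuples; it is not the paper's code, and the names `euclidean` and `silhouette_score` are chosen here for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient of a clustering (illustrative sketch).

    For each point, a is the mean distance to the other members of its own
    cluster and b is the smallest mean distance to any other cluster; the
    point's score is (b - a) / max(a, b).
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # common convention: a singleton cluster scores 0
        # Distance from p to itself is 0, so divide by len(own) - 1.
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(euclidean(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:  # guard against all-identical points
            total += (b - a) / max(a, b)
    return total / len(points)
```

To select K with this score, one would cluster the data for each candidate K, compute `silhouette_score` on the resulting labels, and keep the K with the highest value. Scores lie between -1 and 1, with higher values indicating tighter, better-separated clusters.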

Modified K-Means Clustering Algorithm for Disease Prediction

This work presents an outline of the K-means clustering algorithm and an enhanced technique applied to it. K-means clustering is the basic algorithm for finding groups of data, or clusters, in a dataset. To find similar groups of data, initial centroids are selected and the Euclidean distance from each centroid to all other data points is calculated; each data point is then assigned to the centroid with the smallest Euclidean distance. The selection of initial points affects the results of the algorithm, both in the number of clusters found and in their centroids. Methods to enhance the k-means clustering algorithm are discussed. With the help of these methods, efficiency, accuracy, performance and computational time are improved. To improve clustering performance, normalization is used as a pre-processing stage to enhance the Euclidean distance calculation by yielding nearer centers, which reduces the number of iterations and therefore the computational time compared to standard k-means clustering. By applying this enhanced technique, one can build a new proposed algorithm that is more efficient, supports faster data retrieval from databases, makes the data suitable for analysis and prediction, and is more accurate and less time consuming than previous work.
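
The abstract does not say which normalization is used. As one common possibility, the sketch below assumes min-max scaling, which maps every feature to the range [0, 1] so that no single feature dominates the Euclidean distance. The function name `min_max_normalize` and the example features (age and glucose) are hypothetical.

```python
def min_max_normalize(rows):
    """Scale each column of a list of numeric tuples to [0, 1].

    Assumption: min-max scaling; the source only says "normalization".
    A constant column is mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        )
        for row in rows
    ]


# Hypothetical records: (age in years, glucose level).
records = [(20, 100.0), (40, 150.0), (60, 200.0)]
scaled = min_max_normalize(records)
```

After scaling, both features contribute on the same scale when distances to centroids are computed, which is the kind of pre-processing the abstract credits with fewer iterations.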

Combinatorial K-Means Clustering as a Machine Learning Tool Applied to Diabetes Mellitus Type 2

International Journal of Environmental Research and Public Health

A new original procedure based on k-means clustering is designed to find the most appropriate clinical variables for efficiently separating similar patients into groups, where the patients are diagnosed with diabetes mellitus type 2 (DMT2) and underlying diseases (arterial hypertonia (AH), ischemic heart disease (CHD), diabetic polyneuropathy (DPNP), and diabetic microangiopathy (DMA)). Clustering is a machine learning tool for discovering structures in datasets. Clustering has been proven to be efficient for pattern recognition based on clinical records. The considered combinatorial k-means procedure explores all possible k-means clusterings with a determined number of descriptors and groups. The predetermined conditions for the partitioning were as follows: every single group of patients included patients with DMT2 and one of the underlying diseases; each subgroup formed in such a way was subject to partitioning into three patterns (good health status, medium health status, and degenerated health status...