SillyPutty: Improved clustering by optimizing the silhouette width

New Results

doi: https://doi.org/10.1101/2023.11.07.566055

Abstract

Unsupervised clustering is an important task in biomedical science. We developed a new unsupervised clustering method called SillyPutty. As test data, we generated a series of datasets using the Umpire R package. Using these datasets, we compared SillyPutty to several existing algorithms on multiple metrics (Silhouette Width, Adjusted Rand Index, Entropy, Normalized Within-group Sum of Square errors, and Perfect Classification Count). We found that SillyPutty is a valid standalone clustering method, comparable in accuracy to the best existing methods. We also found that hierarchical clustering followed by SillyPutty has the best overall performance in terms of both accuracy and speed.
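
The abstract does not describe the algorithm itself, so the following is only a rough sketch of what "optimizing the silhouette width" could look like. It is not the SillyPutty package's code or API. It assumes Euclidean distance and a simple greedy rule: repeatedly move the point with the most negative silhouette width into its nearest neighboring cluster, and stop when no width is negative. The function names `silhouette_widths` and `greedy_silhouette_refine` are hypothetical and chosen for illustration.

```python
from math import dist


def silhouette_widths(points, labels):
    """Return (widths, nearest): each point's silhouette width and the
    label of its closest other cluster (by mean Euclidean distance)."""
    clusters = sorted(set(labels))
    widths, nearest = [], []
    for i, p in enumerate(points):
        # Mean distance from point i to every cluster, excluding itself.
        mean_dist = {}
        for c in clusters:
            members = [q for j, q in enumerate(points)
                       if labels[j] == c and j != i]
            if members:
                mean_dist[c] = sum(dist(p, q) for q in members) / len(members)
        own = labels[i]
        others = {c: d for c, d in mean_dist.items() if c != own}
        # Singleton clusters (or a single cluster overall) get width 0.
        if own not in mean_dist or not others:
            widths.append(0.0)
            nearest.append(own)
            continue
        best = min(others, key=others.get)
        a, b = mean_dist[own], others[best]
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
        nearest.append(best)
    return widths, nearest


def greedy_silhouette_refine(points, labels, max_iter=100):
    """Assumed greedy rule: reassign the worst point while any width is negative."""
    labels = list(labels)
    for _ in range(max_iter):
        widths, nearest = silhouette_widths(points, labels)
        worst = min(range(len(points)), key=widths.__getitem__)
        if widths[worst] >= 0:
            break  # no point sits closer to another cluster than to its own
        labels[worst] = nearest[worst]
    return labels


# Two well-separated groups; the third point starts in the wrong cluster.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
start = [0, 0, 1, 1, 1, 1]
refined = greedy_silhouette_refine(points, start)
```

Because a refinement of this kind accepts any starting assignment, it could in principle be seeded with the output of another method, which is one way to read the "hierarchical clustering followed by SillyPutty" combination mentioned in the abstract.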

Availability: The SillyPutty R package has been submitted to the Comprehensive R Archive Network (CRAN). Code to perform and analyze the simulations described here can be found in a Git project hosted at https://gitlab.com/krcoombes/sillyputty.

Competing Interest Statement

The authors have declared no competing interest.

Copyright

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.