Clustering infrastructure
This document defines a clustering infrastructure, modelled on the supervised framework currently available.
Use cases
- Simple/direct (naive) clustering for data exploration and QC. Should work out of the box on an MSnSet and then be plotted with plot2D.
- Optimisation infrastructure.
Algorithms of interest
- kmeans, as a baseline clustering method
- spectral clustering (kernlab::specc)
- Gaussian mixture models (mclust)
- other
Interface
Taking kmeans as an example, and using the supervised framework as a template.
library("pRoloc")
library("pRolocdata")
data(dunkley2006)
Simple clustering
res <- kmeansClustering(dunkley2006, centers = 9)
head(fData(res)$kmeans)
plot2D(res, fcol = "kmeans")
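As a rough illustration of what such a wrapper might do internally, the sketch below runs base R kmeans on a numeric matrix and stores the assignment in a feature-level data frame. It is only a sketch under stated assumptions: the matrix x and data frame fd are synthetic stand-ins for exprs() and fData() of an MSnSet, and simpleKmeans is a hypothetical helper, not the package's implementation.

```r
# Stand-ins (assumptions): `x` plays the role of exprs(object) and
# `fd` the role of fData(object); neither is a real MSnSet.
set.seed(1)
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 3),
           matrix(rnorm(60, mean = 3, sd = 0.3), ncol = 3))
rownames(x) <- paste0("prot", seq_len(nrow(x)))
fd <- data.frame(row.names = rownames(x))

# Hypothetical helper: cluster the rows of `x` and record the
# assignment in a new feature column named "kmeans".
simpleKmeans <- function(x, fd, centers, ...) {
  cl <- kmeans(x, centers = centers, ...)
  fd$kmeans <- cl$cluster[rownames(fd)]
  fd
}

fd <- simpleKmeans(x, fd, centers = 2, nstart = 10)
table(fd$kmeans)
```

Writing the result into a named feature column mirrors the fData(res)$kmeans access used above, so the same column name can be handed to a plotting function.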
Optimising k
(param <- kmeansOptimisation(dunkley2006))
## Object of class "ClustRegRes"
## Algorithm: kmeans
## Criteria: BIC AIC
## Parameters:
## k : 1 2 ... 19 20
fvarLabels(res2 <- kmeansClustering(dunkley2006, param))
## [1] "markers" "assigned" "evidence" "method" "new"
## [6] "pd.2013" "pd.markers" "kmeans"
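One way the BIC and AIC criteria could be evaluated over a range of k is sketched below. The penalised within-cluster sum of squares used here (D + log(n) * m * k for BIC, D + 2 * m * k for AIC) is a common heuristic and an assumption of this sketch; the source does not specify the formula. kmeansCriteria is a hypothetical name and the data are synthetic.

```r
# Synthetic data with three well-separated groups (assumption).
set.seed(1)
x <- rbind(matrix(rnorm(90, mean = 0, sd = 0.3), ncol = 3),
           matrix(rnorm(90, mean = 3, sd = 0.3), ncol = 3),
           matrix(rnorm(90, mean = 6, sd = 0.3), ncol = 3))

# Hypothetical helper: for each k, penalise the total within-cluster
# sum of squares D by the number of fitted centre coordinates (m * k).
kmeansCriteria <- function(x, ks = 1:10, ...) {
  n <- nrow(x)
  m <- ncol(x)
  res <- t(sapply(ks, function(k) {
    D <- kmeans(x, centers = k, ...)$tot.withinss
    c(k = k, BIC = D + log(n) * m * k, AIC = D + 2 * m * k)
  }))
  as.data.frame(res)
}

crit <- kmeansCriteria(x, ks = 1:6, nstart = 10)
bestK <- crit$k[which.min(crit$BIC)]  # k with the lowest BIC
```

The selected k could then be passed on to the clustering step, in the same way that param is passed to kmeansClustering above.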
Optimise to ground truth
kmeansOptimisation(object, fcol), where fcol represents a feature data column with test cluster definitions; the function would optimise kmeans and its parameter to match the priors. See the clue package for criteria.
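To make the idea of optimising against a ground truth concrete, the sketch below scores each candidate k by its agreement with known labels and keeps the best one. The Rand index is used here as one possible criterion and is computed by hand in base R; the choice of criterion, the helper randIndex, and the synthetic truth vector (a stand-in for the fcol feature column) are all assumptions of this example.

```r
# Hypothetical helper: Rand index between two label vectors, i.e. the
# fraction of item pairs on which the two partitions agree.
randIndex <- function(a, b) {
  stopifnot(length(a) == length(b))
  sameA <- outer(a, a, "==")
  sameB <- outer(b, b, "==")
  pairs <- upper.tri(sameA)
  mean(sameA[pairs] == sameB[pairs])
}

# Synthetic data and labels (assumption): three separated groups.
set.seed(1)
x <- rbind(matrix(rnorm(90, mean = 0, sd = 0.3), ncol = 3),
           matrix(rnorm(90, mean = 3, sd = 0.3), ncol = 3),
           matrix(rnorm(90, mean = 6, sd = 0.3), ncol = 3))
truth <- rep(c("A", "B", "C"), each = 30)  # stand-in for the fcol column

# Score each k by its agreement with the known labels.
ks <- 2:6
scores <- sapply(ks, function(k)
  randIndex(kmeans(x, centers = k, nstart = 10)$cluster, truth))
bestK <- ks[which.max(scores)]
```

Because the Rand index compares pairs of items rather than label values, it does not require the cluster numbers to match the ground truth labels.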
Compare clustering results
table(fData(res)$kmeans, fData(res)$specc)
- possibly requires a renumbering of clusters.
- something like plotClust(res, fcol = c("kmeans", "specc")) or even plot2D.
- more than 2 clusters?
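The renumbering mentioned above could be approached as sketched here: build the contingency table and greedily pair the clusters with the largest overlap. matchClusters is a hypothetical helper, and the greedy pairing is a simplification that assumes both results have the same number of clusters; an optimal assignment (for instance with solve_LSAP from the clue package) would be more robust.

```r
# Hypothetical helper: relabel `b` so that its cluster labels line up
# with those of `a`, greedily pairing clusters by largest overlap.
matchClusters <- function(a, b) {
  tab <- table(a, b)
  stopifnot(nrow(tab) == ncol(tab))  # sketch assumes equal cluster counts
  map <- setNames(rep(NA_character_, ncol(tab)), colnames(tab))
  for (i in seq_len(ncol(tab))) {
    # First cell holding the current maximum overlap.
    idx <- which(tab == max(tab), arr.ind = TRUE)[1, ]
    map[colnames(tab)[idx[[2]]]] <- rownames(tab)[idx[[1]]]
    # Mask the paired row and column so they are not reused.
    tab[idx[[1]], ] <- -1L
    tab[, idx[[2]]] <- -1L
  }
  unname(map[as.character(b)])  # labels are returned as character
}

a <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
b <- c(2, 2, 2, 3, 3, 1, 1, 1, 1)
b2 <- as.integer(matchClusters(a, b))
table(a, b2)
```

After relabelling, the largest counts sit on the diagonal of the table, which makes disagreements between the two clusterings easier to read.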
References
- The clue package - tools to compute metrics to validate the quality of a clustering, as well as tools to deal with the comparison of a clustering with a known ground truth.
- Quick-R Cluster Analysis page.