Clustering infrastructure

This page defines a clustering infrastructure, designed along the lines of the supervised framework currently available.

Use cases

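Simple, direct (naive) clustering for data exploration and quality control (QC). It should work out of the box on an MSnSet, and its results should be plottable with plot2D.
An optimisation infrastructure for tuning clustering parameters, such as the number of clusters k.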

Algorithms of interest

kmeans, as a baseline clustering method
spectral clustering (kernlab::specc)
Gaussian mixture models (mclust)
other

Interface

Taking kmeans as an example and using the supervised framework as a template.

library("pRoloc")
library("pRolocdata")
data(dunkley2006)

Simple clustering

## cluster into 9 groups; memberships are stored in the "kmeans" feature variable
res <- kmeansClustering(dunkley2006, centers = 9)
head(fData(res)$kmeans)

plot2D(res, fcol = "kmeans")
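A minimal base R sketch of what a naive kmeansClustering() might do internally: cluster the quantitative data with stats::kmeans() and store the memberships as a new feature variable, ready to be passed to plot2D() via fcol. The helper kmeansClusteringSketch() is hypothetical, and a plain matrix and data.frame stand in for the exprs() and fData() of an MSnSet so that the example runs on simulated data without pRoloc.

```r
## Hypothetical sketch (not part of pRoloc): naive k-means clustering of a
## feature-by-fraction matrix `x`, with `fd` standing in for fData(object).
kmeansClusteringSketch <- function(x, fd, centers, fcol = "kmeans", ...) {
  stopifnot(is.matrix(x), nrow(x) == nrow(fd))
  km <- stats::kmeans(x, centers = centers, ...)
  ## store memberships as a factor so they are treated as categories
  fd[[fcol]] <- factor(km$cluster)
  fd
}

## Simulated data: 3 well-separated groups of 20 features over 4 fractions
set.seed(1)
x <- rbind(matrix(rnorm(80, mean = 0), ncol = 4),
           matrix(rnorm(80, mean = 5), ncol = 4),
           matrix(rnorm(80, mean = 10), ncol = 4))
rownames(x) <- paste0("prot", seq_len(nrow(x)))
fd <- data.frame(row.names = rownames(x),
                 markers = rep(c("A", "B", "C"), each = 20))

fd <- kmeansClusteringSketch(x, fd, centers = 3, nstart = 10)
table(fd$markers, fd$kmeans)
```

On an MSnSet, the same idea would read exprs(object) and write the new column into fData(object); several random starts (nstart) make the result less dependent on the initial centres.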


Optimising `k`

(param <- kmeansOptimisation(dunkley2006))

## Object of class "ClustRegRes"
##  Algorithm: kmeans 
##  Criteria: BIC AIC 
##  Parameters:
##   k : 1 2 ... 19 20


fvarLabels(res2 <- kmeansClustering(dunkley2006, param))

## [1] "markers"    "assigned"   "evidence"   "method"     "new"       
## [6] "pd.2013"    "pd.markers" "kmeans"
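How the BIC and AIC criteria are computed is not specified here. As an illustration only, the sketch below scores a range of k with a frequently used heuristic for k-means that treats it as a spherical Gaussian model with unit variance: AIC = W + 2mk and BIC = W + log(n)mk, where W is the total within-cluster sum of squares, n the number of features, m the number of fractions and mk the number of estimated centre coordinates. The function kmeansCriteria() is hypothetical, and kmeansOptimisation() would not necessarily use these formulas.

```r
## Hypothetical sketch of scoring k for k-means with penalised criteria.
## Assumption: unit-variance spherical Gaussian heuristic, so that
## AIC = W + 2 * m * k and BIC = W + log(n) * m * k.
kmeansCriteria <- function(x, ks = 1:20, nstart = 10) {
  n <- nrow(x)  # number of features
  m <- ncol(x)  # number of fractions (dimensions)
  crit <- t(vapply(ks, function(k) {
    w <- stats::kmeans(x, centers = k, nstart = nstart)$tot.withinss
    c(k = k, AIC = w + 2 * m * k, BIC = w + log(n) * m * k)
  }, numeric(3)))
  as.data.frame(crit)
}

## Simulated data: 3 well-separated groups of 20 features over 4 fractions
set.seed(1)
x <- rbind(matrix(rnorm(80, mean = 0), ncol = 4),
           matrix(rnorm(80, mean = 5), ncol = 4),
           matrix(rnorm(80, mean = 10), ncol = 4))

crit <- kmeansCriteria(x, ks = 1:8)
crit$k[which.min(crit$BIC)]  # candidate k under the BIC heuristic
```

Because the unit-variance assumption rarely holds for real data, the minimum of either criterion is best treated as a candidate k to inspect, for example alongside a plot of the criteria against k.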

Optimise to ground truth

kmeansOptimisation(object, fcol), where fcol names a feature data column holding reference cluster definitions; the function would optimise kmeans and its parameter to best match these priors. See the clue package for suitable agreement criteria.
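A sketch of this ground-truth variant, assuming the agreement criterion is the adjusted Rand index (one of several measures available in clue, for example through cl_agreement()). To stay self-contained, the index is computed in base R from the contingency table. adjustedRand() and optimiseKToReference() are hypothetical names, and features labelled "unknown" (the pRoloc convention for unlabelled proteins) are clustered but excluded from the score.

```r
## Hypothetical helper: adjusted Rand index (Hubert and Arabie) of two partitions.
adjustedRand <- function(x, y) {
  tab <- table(x, y)
  n <- sum(tab)
  sumPairs <- function(v) sum(choose(v, 2))
  index <- sumPairs(tab)            # pairs together in both partitions
  a <- sumPairs(rowSums(tab))       # pairs together in x
  b <- sumPairs(colSums(tab))       # pairs together in y
  expected <- a * b / choose(n, 2)  # expected index under random labelling
  maxIndex <- (a + b) / 2
  (index - expected) / (maxIndex - expected)
}

## Hypothetical sketch of kmeansOptimisation(object, fcol): choose the k whose
## clustering agrees best with the reference labels of the labelled features.
optimiseKToReference <- function(x, reference, ks = 2:10, nstart = 10,
                                 unknown = "unknown") {
  labelled <- reference != unknown
  scores <- vapply(ks, function(k) {
    cl <- stats::kmeans(x, centers = k, nstart = nstart)$cluster
    adjustedRand(cl[labelled], reference[labelled])
  }, numeric(1))
  names(scores) <- ks
  list(k = ks[which.max(scores)], scores = scores)
}

## Simulated data with a few unlabelled features
set.seed(1)
x <- rbind(matrix(rnorm(80, mean = 0), ncol = 4),
           matrix(rnorm(80, mean = 5), ncol = 4),
           matrix(rnorm(80, mean = 10), ncol = 4))
reference <- rep(c("A", "B", "C"), each = 20)
reference[c(1, 21, 41)] <- "unknown"

opt <- optimiseKToReference(x, reference, ks = 2:6)
opt$k
```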

Compare clustering results

table(fData(res)$kmeans, fData(res)$specc); this possibly requires renumbering the clusters, because cluster labels are arbitrary and differ between methods.
Something like plotClust(res, fcol = c("kmeans", "specc")), or even plot2D.
How should more than two clustering results be compared?
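Cluster numbers are arbitrary, so the same partition can come back as 1/2/3 from one method and 3/1/2 from another. A simple way to renumber before cross-tabulating is to pair labels greedily on the largest overlaps of the contingency table. matchClusterLabels() is a hypothetical helper and greedy matching is only a heuristic; an optimal one-to-one matching can be obtained with the Hungarian algorithm, for instance clue::solve_LSAP(tab, maximum = TRUE) on a table with no more rows than columns.

```r
## Hypothetical helper: renumber the clusters in `y` so that they match the
## labels of `x` as closely as possible, by greedily pairing the largest
## overlaps in the contingency table (a heuristic, not an optimal matching).
matchClusterLabels <- function(x, y) {
  tab <- table(y, x)
  mapping <- setNames(rep(NA_character_, nrow(tab)), rownames(tab))
  while (length(tab) > 0 && max(tab) > 0) {
    ## position (row = label in y, col = label in x) of the largest overlap
    idx <- which(tab == max(tab), arr.ind = TRUE)[1, ]
    mapping[rownames(tab)[idx[1]]] <- colnames(tab)[idx[2]]
    ## remove the paired labels so each is used at most once
    tab <- tab[-idx[1], -idx[2], drop = FALSE]
  }
  ## clusters of y without a partner in x get a distinct label
  unmatched <- is.na(mapping)
  mapping[unmatched] <- paste0("unmatched_", names(mapping)[unmatched])
  unname(mapping[as.character(y)])
}

## Two labellings of the same partition, numbered differently
km <- c(1, 1, 2, 2, 3, 3)
sc <- c(3, 3, 1, 1, 2, 2)
table(km, matchClusterLabels(km, sc))
```

After renumbering, agreeing features fall on the diagonal of the table, which makes disagreements between methods easy to spot.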

References

The clue package: tools to compute metrics that validate the quality of a clustering, as well as tools to compare a clustering with a known ground truth.
Quick-R Cluster Analysis page.
