DECODE: a new method for discovering clusters of different densities in spatial data

Abstract

When clusters with different densities and noise lie in a spatial point set, the major obstacle to classifying these data is the determination of the thresholds for classification, which may form a series of bins for allocating each point to different clusters. Much of the previous work has adopted a model-based approach, but is either incapable of estimating the thresholds automatically, or limited to only two point processes, i.e. noise and clusters with the same density. In this paper, we present a new density-based clustering method (DECODE), in which a spatial data set is presumed to consist of different point processes, and clusters with different densities belong to different point processes. DECODE is based upon a reversible jump Markov Chain Monte Carlo (MCMC) strategy and is divided into three steps. The first step is to map each point in the data to its _m_th nearest distance, defined as the distance between a point and its _m_th nearest neighbor. In the second step, classification thresholds are determined via a reversible jump MCMC strategy. In the third step, clusters are formed by spatially connecting the points whose _m_th nearest distances fall into a particular bin defined by the thresholds. Four experiments, including two simulated data sets and two seismic data sets, are used to evaluate the algorithm. Results on simulated data show that our approach is capable of discovering the clusters automatically. Results on seismic data suggest that the clustered earthquakes identified by DECODE either imply the epicenters of forthcoming strong earthquakes or indicate the areas with the most intensive seismicity, which is consistent with the tectonic states and estimated stress distribution in the associated areas. The comparison between DECODE and other state-of-the-art methods, such as DBSCAN, OPTICS and Wavelet Cluster, illustrates the contribution of our approach: although DECODE can be computationally expensive, it is capable of identifying the number of point processes and simultaneously estimating the classification thresholds with little prior knowledge.
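
The first and third steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the reversible jump MCMC step is not reproduced, so the thresholds are simply passed in as if they had already been estimated. The function names and the `link_radius` connection rule are hypothetical, introduced only for illustration, since the abstract does not specify how points are spatially connected.

```python
import math
from bisect import bisect_right


def mth_nearest_distances(points, m):
    """Step 1: distance from each point to its m-th nearest neighbour (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[m - 1])
    return result


def assign_bins(distances, thresholds):
    """Map each m-th nearest distance to a bin index; bin 0 holds the densest points.

    `thresholds` must be sorted in ascending order. In DECODE they are
    estimated by reversible jump MCMC, which is NOT implemented here.
    """
    return [bisect_right(thresholds, d) for d in distances]


def connect_within_bin(points, bins, target_bin, link_radius):
    """Step 3 (simplified assumption): connected components among the points
    of one bin, linking two points when they lie within `link_radius`."""
    unvisited = {i for i, b in enumerate(bins) if b == target_bin}
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, stack = [seed], [seed]
        while stack:
            cur = stack.pop()
            near = [j for j in unvisited
                    if math.dist(points[cur], points[j]) <= link_radius]
            for j in near:
                unvisited.remove(j)
                component.append(j)
                stack.append(j)
        clusters.append(sorted(component))
    return sorted(clusters)


# Toy data: two tight groups of four points plus two isolated noise points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
          (10.0, 0.0), (0.0, 10.0)]
distances = mth_nearest_distances(points, m=2)
bins = assign_bins(distances, thresholds=[1.0])  # threshold chosen by hand
clusters = connect_within_bin(points, bins, target_bin=0, link_radius=0.2)
```

With a single hand-picked threshold, the eight tightly packed points fall into bin 0 and the two isolated points into bin 1; connecting the bin 0 points then separates the two groups. The brute-force neighbour search is quadratic in the number of points, so a spatial index would be needed for data sets of realistic size.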

Author information

Authors and Affiliations

  1. Institute of Geographical Sciences and Natural Resources Research, 11A, Datun Road Anwai, Beijing, 100101, China
    Tao Pei, A.-Xing Zhu & Chenghu Zhou
  2. Institute for Mathematical Sciences, Imperial College, London, SW7 2PG, UK
    Tao Pei
  3. Department of Mathematics, Imperial College, London, UK
    Ajay Jasra
  4. Department of Mathematics and Institute for Mathematical Sciences, Imperial College, London, UK
    David J. Hand
  5. Department of Geography, University of Wisconsin Madison, 550N, Park Street, Madison, WI, 53706-1491, USA
    A.-Xing Zhu

Authors

  1. Tao Pei
  2. Ajay Jasra
  3. David J. Hand
  4. A.-Xing Zhu
  5. Chenghu Zhou

Corresponding author

Correspondence to Chenghu Zhou.

Additional information

Responsible editor: Charu Aggarwal.

About this article

Cite this article

Pei, T., Jasra, A., Hand, D.J. et al. DECODE: a new method for discovering clusters of different densities in spatial data. Data Min Knowl Disc 18, 337–369 (2009). https://doi.org/10.1007/s10618-008-0120-3
