Measures in Data Mining: Categorization and Computation

Last Updated : 8 Dec, 2025

In data mining, measures are quantitative techniques used to summarize, describe, and analyze large datasets. They help transform raw data into meaningful statistics by capturing properties such as central tendency, dispersion, and distribution. These measures are essential for data-driven decision-making, pattern discovery, and predictive analysis.

Data mining measures fall into three categories, holistic, distributive, and algebraic, according to how their aggregate functions behave when the data is partitioned.

1. Holistic Measures

A measure is holistic if it cannot be computed from fixed-size summaries of partitions and requires access to the full dataset.

Key properties:

Examples: median(), mode(), rank().
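A minimal sketch of why median() is holistic, assuming a small made-up dataset split into two partitions. The per-partition medians are fixed-size summaries, and combining them does not recover the true median.

```python
from statistics import median

# Hypothetical data, split into two partitions for illustration
partitions = [[1, 2, 9], [3, 4, 5, 6]]

# The true median needs every value at once
all_values = [x for part in partitions for x in part]
true_median = median(all_values)

# Summarizing each partition by its median and combining the summaries
partition_medians = [median(part) for part in partitions]
median_of_medians = median(partition_medians)

print(true_median)        # 4
print(median_of_medians)  # 3.25, not the true median
```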

2. Distributive Measures

A measure is distributive if it can be computed on data partitions and the partial results can be combined to obtain the final result.

Key properties:

Examples: sum(), count(), min(), max().
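The combining step can be sketched as follows, assuming three made-up partitions. Each partition is reduced to a single partial result, and the partial results alone give the final answer. Note that partial counts are combined by adding them.

```python
# Hypothetical data, split into three partitions for illustration
partitions = [[4, 8, 15], [16, 23], [42]]

# Step 1: compute each measure independently on every partition
partial_sums = [sum(p) for p in partitions]
partial_counts = [len(p) for p in partitions]
partial_mins = [min(p) for p in partitions]
partial_maxs = [max(p) for p in partitions]

# Step 2: combine the partial results without touching the raw data again
total_sum = sum(partial_sums)
total_count = sum(partial_counts)  # counts combine by summation
overall_min = min(partial_mins)
overall_max = max(partial_maxs)

print(total_sum, total_count, overall_min, overall_max)  # 108 6 4 42
```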

3. Algebraic Measures

A measure is algebraic if it can be computed by a function of a fixed number of arguments, each of which is obtained from a distributive measure.

Key properties:

Examples: avg() (uses sum() and count()), MinN(), MaxN(), centerOfMass().
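As an illustration of avg(), the sketch below keeps a (sum, count) pair per partition and derives the average from the combined pair. The function name combine_avg and the data are assumptions made for this example. It also shows that averaging the per-partition averages gives a different, incorrect result when partitions differ in size.

```python
def combine_avg(summaries):
    """Compute the overall average from per-partition (sum, count) pairs."""
    total_sum = sum(s for s, _ in summaries)
    total_count = sum(c for _, c in summaries)
    return total_sum / total_count

# Hypothetical data, split into two partitions of unequal size
partitions = [[10, 20, 30], [40, 50]]

# Two distributive measures per partition: sum() and count()
summaries = [(sum(p), len(p)) for p in partitions]
overall_avg = combine_avg(summaries)

# Naive alternative: average of the partition averages
naive_avg = sum(sum(p) / len(p) for p in partitions) / len(partitions)

print(overall_avg)  # 30.0
print(naive_avg)    # 32.5, biased toward the smaller partition
```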

Computation of Measures

Computing a measure means applying mathematical operations and aggregation logic to structured data. The process follows five steps:

1. Data Collection and Preprocessing

2. Measure Selection

3. Formula Application

4. Aggregation

5. Interpretation and Reporting
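The five steps above can be sketched end to end on a toy dataset. The records, the field meanings (region, sales), and the choice of average sales per region as the measure are all assumptions made for illustration, not part of any specific tool.

```python
# Hypothetical raw records: (region, sales); None marks a missing value
raw = [
    ("north", 120.0),
    ("south", None),
    ("north", 80.0),
    ("south", 90.0),
]

# 1. Data collection and preprocessing: drop records with missing sales
clean = [(region, sales) for region, sales in raw if sales is not None]

# 2. Measure selection: average sales per region (algebraic: sum and count)
# 3. Formula application and 4. aggregation: accumulate sum and count per region
totals = {}
for region, sales in clean:
    running_sum, running_count = totals.get(region, (0.0, 0))
    totals[region] = (running_sum + sales, running_count + 1)

# 5. Interpretation and reporting: derive the averages and print them
report = {region: s / c for region, (s, c) in totals.items()}
for region, avg in sorted(report.items()):
    print(f"{region}: average sales = {avg:.1f}")
```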