Abstract: This book presents the most popular clustering techniques (K-Means for partitioning, Ward's method for hierarchical clustering, similarity clustering, and consensus partitions) in the framework of the data recovery approach. This approach leads to the Pythagorean decomposition of the data scatter into parts explained and unexplained by the found cluster structure.
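As a numerical illustration of this decomposition, here is a minimal sketch that is not taken from the book. It assumes the data are centred at the grand mean, the scatter is the sum of squared entries, and each cluster is represented by its mean. The function name `scatter_decomposition` and the toy data are hypothetical.

```python
def scatter_decomposition(rows, labels):
    """Split the data scatter into parts explained and unexplained by a partition.

    Assumptions (illustrative, not the book's code): data are centred at the
    grand mean, and each cluster is represented by its centroid (mean).
    """
    n = len(rows)
    dim = len(rows[0])

    # Centre the data at the grand mean of each feature.
    grand = [sum(r[v] for r in rows) / n for v in range(dim)]
    centred = [[r[v] - grand[v] for v in range(dim)] for r in rows]

    # Data scatter: sum of all squared centred entries.
    total = sum(x * x for r in centred for x in r)

    # Group the centred rows by cluster label.
    clusters = {}
    for r, lab in zip(centred, labels):
        clusters.setdefault(lab, []).append(r)

    explained = 0.0
    unexplained = 0.0
    for members in clusters.values():
        size = len(members)
        centroid = [sum(m[v] for m in members) / size for v in range(dim)]
        # Explained part: cluster size times squared norm of its centroid.
        explained += size * sum(c * c for c in centroid)
        # Unexplained part: squared distances from members to their centroid.
        unexplained += sum(
            (m[v] - centroid[v]) ** 2 for m in members for v in range(dim)
        )
    return total, explained, unexplained


# Toy data and an arbitrary three-cluster partition (both hypothetical).
data = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0], [9.0, 1.0]]
labels = [0, 0, 1, 1, 2]
total, explained, unexplained = scatter_decomposition(data, labels)
print(total, explained + unexplained)  # the two values agree up to rounding
```

The identity holds for any partition as long as the cluster representatives are the within-cluster means. Under these assumptions, K-Means can be read as searching for the partition that makes the explained part as large as possible.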
The decomposition has led to a number of observations that amount to a theoretical framework for clustering.
The framework appears to be well suited to extending the methods to different data types, such as mixed-scale data including continuous, nominal, and binary features. In addition, a number of both conventional and original interpretation aids have been derived for both partitioning and hierarchical clustering, based on contributions of features and categories to clusters and splits.
One more strain of clustering techniques, one-by-one clustering, which is becoming increasingly popular, emerges naturally within the framework and gives rise to intelligent versions of K-Means, mitigating the need for the user to set the number of clusters and their hypothetical prototypes. Moreover, the framework leads to a set of mathematically proven properties relating classical clustering to other clustering techniques, such as conceptual clustering, spectral clustering, consensus clustering, and graph-theoretic clustering, as well as to other data analysis concepts, such as decision trees and association in contingency data tables.