Probabilistic Two-way Clustering Approaches with Emphasis on the Maximum Interaction Criterion

Bock, Hans-Hermann

Abstract: We consider the problem of simultaneously and optimally clustering the rows and columns of a real-valued I × J data matrix X = (x_ij) by corresponding row and column partitions A = (A_1, ..., A_m) and B = (B_1, ..., B_n), with given m and n. We emphasize the need to base the clustering method on a probabilistic model for the data and then to use standard methods from statistics (e.g., maximum likelihood, divergence) to characterize optimum two-way classifications. We survey some clustering criteria and algorithms proposed in the literature for various data types. Special emphasis is given to the maximum interaction clustering criterion proposed by the author in 1980. It can be shown to arise as the maximum likelihood clustering method under a two-way ANOVA model (with individual main effects, but cluster-specific interactions). After a simple data transformation (double-centering), well-known two-way SSQ clustering algorithms can be used directly to maximize it.
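
The last step of the abstract (double-centering followed by a two-way SSQ algorithm) can be illustrated with a minimal Python sketch. It is not taken from the paper. It assumes that the interaction criterion has the form of the sum over all blocks A_k × B_l of |A_k|·|B_l| times the squared block mean of the double-centered data, and it uses a plain alternating k-means-style exchange as a stand-in for the "well-known two-way SSQ clustering algorithms". All function names (double_center, block_stats, interaction, two_way_ssq, two_way_kmeans) are hypothetical.

```python
import random


def double_center(x):
    """Return y[i][j] = x[i][j] - row_mean[i] - col_mean[j] + grand_mean."""
    n_rows, n_cols = len(x), len(x[0])
    row_means = [sum(row) / n_cols for row in x]
    col_means = [sum(row[j] for row in x) / n_rows for j in range(n_cols)]
    grand = sum(row_means) / n_rows
    return [[x[i][j] - row_means[i] - col_means[j] + grand
             for j in range(n_cols)] for i in range(n_rows)]


def block_stats(y, rows, cols, m, n):
    """Block sizes |A_k|*|B_l| and block means of y (empty blocks get 0.0)."""
    sums = [[0.0] * n for _ in range(m)]
    sizes = [[0] * n for _ in range(m)]
    for i, k in enumerate(rows):
        for j, l in enumerate(cols):
            sums[k][l] += y[i][j]
            sizes[k][l] += 1
    means = [[sums[k][l] / sizes[k][l] if sizes[k][l] else 0.0
              for l in range(n)] for k in range(m)]
    return sizes, means


def interaction(y, rows, cols, m, n):
    """Assumed criterion: sum of block size times squared block mean."""
    sizes, means = block_stats(y, rows, cols, m, n)
    return sum(sizes[k][l] * means[k][l] ** 2
               for k in range(m) for l in range(n))


def two_way_ssq(y, rows, cols, m, n):
    """Within-block sum of squares around the block means."""
    _, means = block_stats(y, rows, cols, m, n)
    return sum((y[i][j] - means[k][l]) ** 2
               for i, k in enumerate(rows) for j, l in enumerate(cols))


def two_way_kmeans(y, m, n, max_iter=100, seed=0):
    """Alternate row and column reassignment; returns a local optimum only."""
    rng = random.Random(seed)
    rows = [rng.randrange(m) for _ in y]
    cols = [rng.randrange(n) for _ in y[0]]
    for _ in range(max_iter):
        # Reassign each row to the row cluster with the closest block means.
        _, mu = block_stats(y, rows, cols, m, n)
        new_rows = [min(range(m), key=lambda k: sum(
            (y[i][j] - mu[k][l]) ** 2 for j, l in enumerate(cols)))
            for i in range(len(y))]
        # Recompute block means, then reassign each column likewise.
        _, mu = block_stats(y, new_rows, cols, m, n)
        new_cols = [min(range(n), key=lambda l: sum(
            (y[i][j] - mu[k][l]) ** 2 for i, k in enumerate(new_rows)))
            for j in range(len(y[0]))]
        if new_rows == rows and new_cols == cols:
            break
        rows, cols = new_rows, new_cols
    return rows, cols


if __name__ == "__main__":
    x = [[5.0, 5.0, 1.0, 1.0], [5.0, 5.0, 1.0, 1.0],
         [1.0, 1.0, 5.0, 5.0], [1.0, 1.0, 5.0, 5.0]]
    y = double_center(x)
    rows, cols = two_way_kmeans(y, m=2, n=2)
    print(rows, cols, interaction(y, rows, cols, 2, 2))
```

For any fixed pair of partitions, the total sum of squares of the double-centered matrix splits into the within-block SSQ plus the interaction term above. Under the assumed form of the criterion, minimizing the two-way SSQ on the double-centered data is therefore the same as maximizing the interaction. The exchange scheme depends on its random start, so in practice several seeds would be tried.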

Affiliated institution(s) at KIT Institut für Informationswirtschaft und Marketing (IISM)
Publication type Journal article
Year 2016
Language English
Identifier DOI: 10.5445/KSP/1000058747/01
ISSN: 2363-9881
URN: urn:nbn:de:swb:90-677594
KITopen ID: 1000067759
Published in Archives of Data Science, Series A
Volume 1
Issue 1
Pages 3-20
License CC BY-SA 3.0 DE: Creative Commons Attribution-ShareAlike 3.0 Germany
