We consider the problem of simultaneously and optimally clustering the rows and columns of a real-valued I x J data matrix X = (xi j) by corresponding row and columns partitions A = (A1; :::;Am) and B = (B1; :::;Bn), with given m and n. We emphasize the need to base the clustering method on a probabilistic model for the data and then to use standard methods from statistics (e.g., maximum likelihood, divergence) to characterize optimum two-way classifications. We survey some clustering criteria and algorithms proposed in the literature for various data types. Special emphasis is given to the maximum interaction clustering criterion proposed by the author in 1980. It can be shown that it results as the maximum likelihood clustering method under a two-way ANOVA model (with individual main effects, but cluster-specific interactions). After a simple data transformation (double-centering) well-known two-way SSQ clustering algorithms can directly be used for maximization.