Multivariate permutation tests for the k-sample problem with clustered data

Jörg Rahnenführer

Publication: Working/Discussion Paper, WU Working Paper



The present paper deals with the choice of clustering algorithms before treating a k-sample problem. We investigate multivariate data sets that are quantized by algorithms that define partitions by maximal support planes (MSP) of a convex function. These algorithms belong to a wide class containing as special cases both the well-known k-means algorithm and the Kohonen (1985) algorithm and have been profoundly investigated by Pötzelberger and Strasser (1999). For computing the test statistics for the k-sample problem we replace the data points by their conditional expectations with respect to the MSP partition. We present Monte Carlo simulations of power functions of different tests for the k-sample problem, where the tests are carried out as multivariate permutation tests to ensure that they hold the level. The results presented show that there seems to be a vital and decisive connection between the optimal choice of the clustering algorithm and the tails of the probability distribution of the data. Especially for distributions with heavy tails like the exponential distribution the performance of tests based on a quadratic convex function with k-means type partitions totally breaks down. (author's abstract)
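The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it covers only the k-means special case of the MSP class, and the between-group test statistic, the function names (kmeans_quantize, between_group_stat, permutation_pvalue) and all parameter values are assumptions chosen for illustration. The pooled data are quantized without using the sample labels, each point is replaced by the mean of its cell (the empirical conditional expectation given the partition), and the sample labels are then permuted to obtain a p-value.

```python
import numpy as np


def kmeans_quantize(x, n_cells, n_iter=50, seed=0):
    """Replace each point by the mean of its k-means cell (Lloyd's algorithm)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centers with randomly chosen data points.
    centers = x[rng.choice(len(x), size=n_cells, replace=False)].copy()
    for _ in range(max(1, n_iter)):
        # Assign every point to its nearest center (squared Euclidean distance).
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        cell = dist.argmin(axis=1)
        # Move each non-empty cell's center to the mean of its points.
        for j in range(n_cells):
            members = cell == j
            if members.any():
                centers[j] = x[members].mean(axis=0)
    # centers[cell] is the mean of the cell each point was last assigned to.
    return centers[cell]


def between_group_stat(z, labels):
    """Assumed statistic: weighted squared distances of group means to the pooled mean."""
    overall = z.mean(axis=0)
    stat = 0.0
    for g in np.unique(labels):
        zg = z[labels == g]
        stat += len(zg) * np.sum((zg.mean(axis=0) - overall) ** 2)
    return stat


def permutation_pvalue(x, labels, n_cells=4, n_perm=999, seed=0):
    """Permutation p-value for the k-sample problem on quantized data."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    # Quantize the pooled sample once; the partition ignores the labels,
    # so permuting the labels afterwards is a valid reference distribution.
    z = kmeans_quantize(x, n_cells, seed=seed)
    observed = between_group_stat(z, labels)
    exceed = 0
    for _ in range(n_perm):
        if between_group_stat(z, rng.permutation(labels)) >= observed:
            exceed += 1
    # Counting the observed labelling itself keeps the test at its nominal level.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Three bivariate samples with shifted means (illustrative data only).
    x = np.vstack([rng.normal(loc=m, size=(30, 2)) for m in (0.0, 5.0, 10.0)])
    labels = np.repeat([0, 1, 2], 30)
    print(permutation_pvalue(x, labels))
```

Swapping kmeans_quantize for a different partitioning rule is the natural place to compare clustering algorithms in the sense of the abstract, since the permutation step is unaffected by how the partition is built.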


Series: Report Series SFB "Adaptive Information Systems and Modelling in Economics and Management Science"

WU Working Paper Series

  • Report Series SFB "Adaptive Information Systems and Modelling in Economics and Management Science"