Detection of block-exchangeable structure in large-scale correlation matrices

Samuel Perreault*, Thierry Duchesne, Johanna G. Nešlehová

*Corresponding author for this work

Publication: Scientific journal › Original article in journal › Peer-reviewed

Abstract

Correlation matrices are omnipresent in multivariate data analysis. When the number d of variables is large, sample estimates of correlation matrices are typically noisy and conceal underlying dependence patterns. We consider the case in which the variables can be grouped into K clusters with exchangeable dependence; this assumption is often made in applications, e.g., in finance and econometrics. Under this partial exchangeability condition, the corresponding correlation matrix has a block structure, and the number of unknown parameters is reduced from d(d−1)/2 to at most K(K+1)/2. We propose a robust algorithm based on Kendall's rank correlation to identify the clusters without assuming knowledge of K a priori, or anything about the margins except continuity. The corresponding block-structured estimator performs considerably better than the sample Kendall rank correlation matrix when K < d. The new estimator can also be much more efficient in finite samples even in the unstructured case K = d, although there is no gain asymptotically. When the distribution of the data is elliptical, the results extend to linear correlation matrices and their inverses. The procedure is illustrated on financial stock returns.
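Two computational ingredients mentioned in the abstract are easy to sketch once a partition of the variables into clusters is available: averaging the sample Kendall rank correlations within each block of the matrix, and, for elliptical data, mapping Kendall's τ to linear (Pearson) correlation through the standard relation ρ = sin(πτ/2). The Python below is a minimal illustration under stated assumptions, not the authors' procedure. The cluster labels are supplied by hand (the paper's cluster-detection algorithm is not reproduced), the data are assumed continuous so ties are ignored, and the function names `kendall_tau`, `kendall_matrix`, `block_average` and `tau_to_pearson` are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Sample Kendall tau (tau-a). Assumes continuous data, i.e. no ties."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        # +1 for a concordant pair, -1 for a discordant pair
        s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))


def kendall_matrix(columns):
    """d x d sample Kendall matrix; columns[k] holds the n observations of variable k."""
    d = len(columns)
    T = [[1.0] * d for _ in range(d)]
    for i, j in combinations(range(d), 2):
        T[i][j] = T[j][i] = kendall_tau(columns[i], columns[j])
    return T


def block_average(T, labels):
    """Replace each off-diagonal entry of T by the mean of its block.

    labels: hypothetical cluster index (an integer) for each variable,
    assumed known here rather than estimated from the data.
    """
    d = len(T)
    sums, counts = {}, {}
    for i, j in combinations(range(d), 2):
        key = tuple(sorted((labels[i], labels[j])))  # unordered block (k, l)
        sums[key] = sums.get(key, 0.0) + T[i][j]
        counts[key] = counts.get(key, 0) + 1
    B = [[1.0] * d for _ in range(d)]  # diagonal stays equal to 1
    for i, j in combinations(range(d), 2):
        key = tuple(sorted((labels[i], labels[j])))
        B[i][j] = B[j][i] = sums[key] / counts[key]
    return B


def tau_to_pearson(tau):
    """Linear correlation implied by Kendall's tau under ellipticity."""
    return math.sin(math.pi * tau / 2)
```

For example, with the three columns `[1, 2, 3, 4]`, `[2, 1, 3, 4]` and `[1, 3, 2, 4]`, `kendall_matrix` gives off-diagonal entries 2/3, 2/3 and 1/3; `block_average` with labels `[0, 0, 0]` replaces all three by their mean 5/9, while labels `[0, 0, 1]` keep 2/3 for the pair in cluster 0 and use 1/2 for both cross-cluster pairs. The pairwise loop in `kendall_tau` costs O(n²) per pair of variables, which is fine for illustration but not for large samples.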

Original language: English
Pages (from–to): 400–422
Number of pages: 23
Journal: Journal of Multivariate Analysis
Volume: 169
Publication status: Published, Jan. 2019
Externally published: Yes

Bibliographical note

Funding Information:
We would like to thank the Acting Editor Richard A. Lockhart, the Associate Editor and two reviewers for their careful reading and insightful comments. Special thanks are due to the Editor-in-Chief, Christian Genest, for stimulating conversations, encouragement, and help with copy editing. This research was funded by individual operating grants from the Natural Sciences and Engineering Research Council of Canada to TD (RGPIN-2016-05883) and JGN (RGPIN-2015-06801), a team grant from the Fonds de recherche du Québec – Nature et technologies to TD and JGN (2015–PR–183236), a team grant from the Canadian Statistical Sciences Institute to JGN, and a graduate scholarship from the Fonds de recherche du Québec – Nature et technologies to SP.

Publisher Copyright:
© 2018 Elsevier Inc.
