AUEB STATS SEMINARS 28/3/2019: Clustering ranking data via copulas by Marta Nai Ruscone
Tue 26 Mar 2019 - 14:11
STATISTICS SEMINAR SERIES, MARCH 2019
Marta Nai Ruscone
Università Carlo Cattaneo (LIUC)
Clustering ranking data via copulas
THURSDAY 28/3/2019
12:00
ROOM T103, 1st FLOOR
NEW AUEB BUILDING (TROIAS 2)
ABSTRACT
Clustering of ranking data aims to identify groups of subjects with homogeneous, common preference behavior. Ranking data arise when a number of subjects are asked to rank a list of objects according to their personal order of preference. The input to cluster analysis is a distance matrix whose elements measure the distances between the rankings of two subjects. The choice of distance dramatically affects the final result, so computing an appropriate distance matrix is a key issue. Several distance measures have been proposed for ranking data (Alvo & Yu, 2014). The most important are the Kendall's τ, Spearman's ρ and Cayley distances (Critchlow et al., 1991; Mallows, 1957; Spearman, 1904). When the aim is to emphasize top ranks, weighted distances for ranking data should be used (Tarsitano, 2005). We propose a generalization of these kinds of distances using copulas. These generalizations provide a more flexible instrument for modeling different types of data dependence structures and for considering different situations in the classification process. Simulated and real data are used to illustrate the pertinence and the importance of our proposal.
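As a rough illustration of the distance matrix described in the abstract, here is a minimal Python sketch of the three classical distances it names (Kendall, Spearman, Cayley). It is not the copula-based generalization proposed in the talk, which the abstract does not specify. The function names, the encoding of a ranking as a list where position i holds the rank given to object i, and the choice of the sum of squared rank differences for the Spearman distance are all assumptions made for this example.

```python
from itertools import combinations


def kendall_distance(a, b):
    """Number of object pairs that the two rankings order differently."""
    return sum(
        1
        for i, j in combinations(range(len(a)), 2)
        if (a[i] - a[j]) * (b[i] - b[j]) < 0
    )


def spearman_distance(a, b):
    """Sum of squared rank differences (one common convention; assumed here)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cayley_distance(a, b):
    """Minimum number of transpositions needed to turn ranking a into b.

    Equals n minus the number of cycles of the permutation that maps
    each rank in a to the rank the same object receives in b.
    """
    perm = {ra: rb for ra, rb in zip(a, b)}
    seen = set()
    cycles = 0
    for start in perm:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:
            seen.add(x)
            x = perm[x]
    return len(a) - cycles


def distance_matrix(rankings, dist):
    """Symmetric matrix of pairwise distances between subjects' rankings."""
    n = len(rankings)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = dist(rankings[i], rankings[j])
    return matrix


# Hypothetical data: three subjects each ranking four objects (1 = most preferred).
rankings = [
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [4, 3, 2, 1],
]
kendall_matrix = distance_matrix(rankings, kendall_distance)
```

A matrix like `kendall_matrix` could then be passed to any distance-based clustering routine; swapping in `spearman_distance` or `cayley_distance` changes the matrix and, as the abstract stresses, may change the resulting clusters.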