Let n respondents rank order d items, and suppose that d << n. Our main task is to
uncover and display the structure of the observed rank data by an exploratory
riffle shuffling procedure which sequentially decomposes the n voters into a
finite number of coherent groups plus a noisy group, where the noisy group
represents the outlier voters and each coherent group is composed of a finite
number of coherent clusters. We consider exploratory riffle shuffling of a set
of items to be equivalent to optimal two-block seriation of the items, with
some scores crossing between the two blocks. A riffle shuffled coherent
cluster of voters within its coherent group is essentially characterized by the
following facts: 1) Voters have identical first TCA factor scores, where TCA
designates taxicab correspondence analysis, an L1 variant of correspondence
analysis; 2) Any preference is easily interpreted as a riffle shuffling of its
items; 3) The nature of the different riffle shufflings of items can be seen in
the structure of the contingency table of the first-order marginals constructed
from the Borda scorings of the voters; 4) The first TCA factor scores of the items of a
coherent cluster are interpreted as the Borda scale of the items. We also introduce
a crossing index, which measures the extent to which voters' scores cross
between the two blocks of the seriation of the items. The novel approach is illustrated
on the benchmark SUSHI data set, where we show that this data set has a very
simple structure, which can also be communicated in tabular form.
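
To make the Borda scoring and the contingency table of first-order marginals concrete, the following is a minimal sketch in plain Python. It assumes the common Borda convention in which the most preferred of d items scores d - 1 and the least preferred scores 0; the function names `borda_scores` and `first_order_marginals` and the three toy rankings are hypothetical and are not taken from the paper or the SUSHI data.

```python
def borda_scores(ranking, d):
    """ranking[k] is the item placed at position k (0 = most preferred).

    Returns a list s with s[item] = d - 1 - position (assumed convention).
    """
    scores = [0] * d
    for position, item in enumerate(ranking):
        scores[item] = d - 1 - position
    return scores


def first_order_marginals(rankings, d):
    """table[item][score] = number of voters who gave `item` that Borda score."""
    table = [[0] * d for _ in range(d)]
    for ranking in rankings:
        for item, score in enumerate(borda_scores(ranking, d)):
            table[item][score] += 1
    return table


# Toy data (hypothetical): three voters ranking four items labelled 0..3.
rankings = [
    [0, 1, 2, 3],
    [1, 0, 2, 3],
    [0, 1, 3, 2],
]
d = 4
table = first_order_marginals(rankings, d)

# Total Borda score of each item, read off the marginals table.
borda_totals = [
    sum(score * count for score, count in enumerate(row)) for row in table
]

for item, row in enumerate(table):
    print(item, row, borda_totals[item])
```

In this toy example, items 0 and 1 only receive the scores 2 and 3 while items 2 and 3 only receive the scores 0 and 1, so the table splits into two blocks with no score crossing between them.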