Open Journal of Statistics 2021

Uncovering and Displaying the Coherent Groups of Rank Data by Exploratory Riffle Shuffling

DOI: 10.4236/ojs.2021.111010, PP. 178-212

Vartan Choulakian, Jacques Allard

Keywords: Borda Score and Scale, Exploratory Riffle Shuffle, Coherent Group, Coherent Cluster, Crossing Index, Taxicab Correspondence Analysis


Abstract:

Let n respondents rank order d items, and suppose that $d \ll n$. Our main task is to uncover and display the structure of the observed rank data by an exploratory riffle shuffling procedure, which sequentially decomposes the n voters into a finite number of coherent groups plus a noisy group, where the noisy group represents the outlier voters and each coherent group is composed of a finite number of coherent clusters. We consider exploratory riffle shuffling of a set of items to be equivalent to optimal two-block seriation of the items with crossing of some scores between the two blocks. A riffle-shuffled coherent cluster of voters within its coherent group is essentially characterized by the following facts: 1) Voters have an identical first TCA factor score, where TCA designates taxicab correspondence analysis, an L₁ variant of correspondence analysis; 2) Any preference is easily interpreted as a riffle shuffling of its items; 3) The nature of the different riffle shufflings of items can be seen in the structure of the contingency table of the first-order marginals constructed from the Borda scorings of the voters; 4) The first TCA factor scores of the items of a coherent cluster are interpreted as the Borda scale of the items. We also introduce a crossing index, which measures the extent of crossing of scores of voters between the two blocks of the seriation of the items. The novel approach is illustrated on the benchmark SUSHI data set, where we show that this data set has a very simple structure, which can also be communicated in tabular form.
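To make the Borda scoring and the contingency table of first-order marginals mentioned in the abstract more concrete, here is a minimal Python sketch on invented toy data. It is not the authors' code. It assumes the common Borda convention in which the most preferred of d items scores d-1 and the least preferred scores 0, and the function names, item labels and rankings are hypothetical. It does not implement TCA or the crossing index, whose definitions are not given in the abstract.

```python
def borda_scores(ranking):
    """Borda scores for one voter.

    `ranking` lists the items from most to least preferred.
    Assumed convention: top item scores d-1, last item scores 0.
    """
    d = len(ranking)
    return {item: d - 1 - pos for pos, item in enumerate(ranking)}


def first_order_marginals(rankings, items):
    """d x d table M where M[i][s] counts voters giving items[i] the score s."""
    d = len(items)
    row = {item: i for i, item in enumerate(items)}
    table = [[0] * d for _ in range(d)]
    for ranking in rankings:
        for item, score in borda_scores(ranking).items():
            table[row[item]][score] += 1
    return table


def borda_totals(rankings, items):
    """Sum of Borda scores per item over all voters."""
    totals = {item: 0 for item in items}
    for ranking in rankings:
        for item, score in borda_scores(ranking).items():
            totals[item] += score
    return totals


# Hypothetical toy data: three voters ranking four items.
items = ["A", "B", "C", "D"]
rankings = [
    ["A", "B", "C", "D"],
    ["B", "A", "D", "C"],
    ["A", "B", "D", "C"],
]

table = first_order_marginals(rankings, items)
totals = borda_totals(rankings, items)

for item, counts in zip(items, table):
    print(item, counts)   # columns are Borda scores 0, 1, ..., d-1
print(totals)
```

In this toy data, items A and B receive only the scores 2 and 3 while C and D receive only 0 and 1, so the table splits into two blocks with no score crossing between them. Real rank data would generally show some crossing, which is what the crossing index in the abstract is meant to quantify.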
