PLOS ONE  2014 

Exploring Empirical Rank-Frequency Distributions Longitudinally through a Simple Stochastic Process

DOI: 10.1371/journal.pone.0094920

Abstract:

The frequent appearance of empirical rank-frequency laws, such as Zipf’s law, in a wide range of domains reinforces the importance of understanding and modeling these laws and rank-frequency distributions in general. In this spirit, we use a simple stochastic cascade process to simulate several empirical rank-frequency distributions longitudinally. We focus especially on limiting the process’s complexity to increase accessibility for non-experts in mathematics. The process provides a good fit for many empirical distributions because its stochastic multiplicative nature leads to an often-observed concave rank-frequency distribution (on a log-log scale) and the finiteness of the cascade replicates real-world finite-size effects. Furthermore, we show that repeated trials of the process can roughly simulate the longitudinal variation of empirical ranks. However, we find that the empirical variation is often less than the average simulated process variation, likely due to longitudinal dependencies in the empirical datasets. Finally, we discuss the process’s limitations and practical applications.
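
The abstract does not spell out the cascade itself, so the following is only a minimal sketch of one possible finite multiplicative cascade, not the paper's stated procedure. It assumes a binary split at every step, split proportions drawn uniformly at random, and a fixed depth; the names `cascade` and `rank_frequency` and the chosen depth and trial count are hypothetical. Each final piece is a product of random factors (the multiplicative part), the fixed depth caps the number of pieces (the finite-size part), and repeating the run gives a rough picture of trial-to-trial variation in the ranked sizes.

```python
import random


def cascade(depth, rng):
    """Split a unit mass `depth` times and return the 2**depth piece sizes.

    Assumption: every piece breaks into two parts with a proportion drawn
    uniformly at random; the source does not specify the splitting rule.
    """
    pieces = [1.0]
    for _ in range(depth):
        next_pieces = []
        for size in pieces:
            p = rng.random()  # assumed uniform split proportion
            next_pieces.append(size * p)
            next_pieces.append(size * (1.0 - p))
        pieces = next_pieces
    return pieces


def rank_frequency(sizes):
    """Sort sizes in descending order so that index 0 holds rank 1."""
    return sorted(sizes, reverse=True)


# Repeated trials of the same process (depth and trial count are arbitrary).
rng = random.Random(0)
trials = [rank_frequency(cascade(8, rng)) for _ in range(20)]

# Share held by the rank-1 item in each trial; its spread is one crude
# indicator of how much the simulated ranks vary between runs.
top_shares = [trial[0] for trial in trials]
```

Plotting `trials[0]` against rank on log-log axes would show the shape this particular sketch produces; whether it matches a given empirical dataset depends on the assumed splitting rule and depth.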

