This study explored how three news outlets, China Daily (CD), Cable News Network (CNN), and Daily Mail (DM), reported the COVID-19 pandemic. Mainstream media are a credible channel for directing public attention to COVID-19, and computational text analysis helps to understand media activity during the pandemic and to support health information communication. Word frequency statistics and lexical diversity showed how pandemic reporting changed during the early outbreak. A cluster analysis illustrated the frequencies of, and the semantic relationships among, the most frequent words in the CD, CNN, and DM reports. Sentiment analysis, based on natural language processing, measured the sentiment of whole headlines and of the individual words within them. The study also discussed similarities and differences in the three outlets' coverage at various stages of the outbreak. All three outlets covered the pandemic comprehensively. Because they are based in different countries, their focus and their number of reports differed from stage to stage. The richness of vocabulary and the degree of emotion were related to each outlet's media attributes. These results can help health departments exchange information, guide accurate public awareness, and dispel public fears rooted in misconceptions about the pandemic.
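The word frequency and lexical diversity measures mentioned in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the study: the tokenizer, the helper names (`tokenize`, `word_frequencies`, `type_token_ratio`), and the two sample headlines are all hypothetical, and the type-token ratio (TTR) stands in for whichever diversity measure the authors actually used.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return word tokens, keeping hyphenated forms such as 'covid-19' whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def word_frequencies(headlines):
    """Count how often each token occurs across a list of headlines."""
    counts = Counter()
    for headline in headlines:
        counts.update(tokenize(headline))
    return counts


def type_token_ratio(headlines):
    """Return distinct tokens divided by total tokens (0.0 for empty input)."""
    tokens = [tok for headline in headlines for tok in tokenize(headline)]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Hypothetical headlines, invented for illustration only (not data from the study).
sample = [
    "Covid-19 cases rise as city enters lockdown",
    "Lockdown extended as cases rise again",
]

freqs = word_frequencies(sample)
ttr = type_token_ratio(sample)

print(freqs.most_common(4))
print(f"TTR = {ttr:.3f}")
```

TTR falls as a text grows longer, so under this sketch comparisons between outlets are only fair when the headline samples are of similar size.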