OALib Journal
ISSN: 2333-9721

Open Journal of Statistics 2021

Machine Learning Approaches for Classifying the Distribution of Covid-19 Sentiments

DOI: 10.4236/ojs.2021.115037, pp. 620-632

M. Kuyo, S. Mwalili, E. Okang’o

Keywords: Machine Learning, Sentiment Analysis, Natural Language Processing, Covid-19, Naive Bayes, N-Gram

Abstract:

Previously, rapid disease detection and prevention were difficult, because disease modeling and prediction depended on manually collected datasets such as surveys. With the increased use of social media platforms such as Twitter, Facebook, and Instagram, data mining and sentiment analysis can help prevent disease. Sentiment analysis is a powerful tool for analyzing people's perceptions, emotions, value assessments, attitudes, and feelings as expressed in text. The purpose of this research is to use machine learning to classify Covid-19 Twitter sentiments as positive or negative and to predict their spatial distribution.

The data were geo-tagged tweets about Covid-19, live-streamed with the streamR package using the key terms Corona, Covid-19, sanitizer, virus, lockdown, quarantine, and social distance. Classification used Naive Bayes algorithms with n-gram features. An n-gram model is a probabilistic language model that predicts the next item in a sequence and takes the form of an (n − 1)th-order Markov model. It relies on the Markov assumption: the probability of a word depends only on the preceding n − 1 words (for a bigram model, only the previous word) rather than on the full history. The steps followed in this research were: cleaning and preprocessing the data; tokenizing the text into 1-grams, 2-grams, and 3-grams; and converting the tweets into a matrix of numeric vectors weighted by Term Frequency-Inverse Document Frequency (TF-IDF). The data were split 80:20 into training and test sets. A confusion matrix was used to evaluate the accuracy, precision, and recall of the various algorithms tested, and prediction was done with the best-performing Naive Bayes algorithm.
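
Written out, the Markov assumption behind an n-gram model is the standard textbook approximation below, together with the maximum-likelihood estimate of a bigram probability from corpus counts C(·). This is general background on n-gram models, not an equation reproduced from the paper; histories for the first n − 1 words are assumed to be padded with start-of-sentence symbols.

```latex
P(w_1, \dots, w_m) \approx \prod_{i=1}^{m} P\left(w_i \mid w_{i-n+1}, \dots, w_{i-1}\right)

\hat{P}\left(w_i \mid w_{i-1}\right) = \frac{C\left(w_{i-1}\, w_i\right)}{C\left(w_{i-1}\right)} \qquad (n = 2)
```
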
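
The n-gram tokenization and TF-IDF weighting steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code (the study streamed its data with the R package streamR). Cleaning is reduced to lowercasing and keeping runs of letters and digits, and the weighting uses one common TF-IDF variant, raw count × log(N/df). Libraries such as scikit-learn's TfidfVectorizer smooth the idf and L2-normalize rows by default, so their numbers will differ. The helper names tokenize, ngrams, tfidf, and to_matrix, and the three example tweets, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Simplified cleaning (assumption): lowercase and keep runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Contiguous n-grams joined by spaces; n = 1, 2, 3 give uni-, bi-, trigrams."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n):
    """Weight each document's n-grams by tf * idf (one common variant).

    tf  = raw count of the n-gram in the document
    idf = log(N / df), N = number of documents, df = documents containing it
    """
    counts = [Counter(ngrams(tokenize(d), n)) for d in docs]
    df = Counter(g for c in counts for g in c)  # Counter keys are unique per document
    N = len(docs)
    return [{g: tf * math.log(N / df[g]) for g, tf in c.items()} for c in counts]


def to_matrix(weights):
    """Turn per-document weight dicts into a dense document-term matrix."""
    vocab = sorted({g for w in weights for g in w})
    return vocab, [[w.get(g, 0.0) for g in vocab] for w in weights]


# Hypothetical example tweets, not the study's data.
tweets = ["Stay home, stay safe", "Lockdown again, stay home", "Vaccine gives hope"]
print(ngrams(tokenize(tweets[0]), 2))  # ['stay home', 'home stay', 'stay safe']
vocab, X = to_matrix(tfidf(tweets, 1))
for row in X:
    print([round(v, 3) for v in row])
```

Under this variant an n-gram that appears in every tweet gets idf log(1) = 0, so it contributes nothing to any row, while rare n-grams are weighted up.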
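
The classification and evaluation steps (80:20 split, multinomial Naive Bayes, confusion-matrix metrics) can be sketched from scratch as follows. This is a hedged illustration, not the authors' implementation. It assumes raw n-gram counts as features (the study weighted features with TF-IDF; multinomial Naive Bayes can consume such weights as fractional counts), add-one smoothing, "positive" as the positive class for precision and recall, and each n-gram order used on its own, since the abstract does not say whether the bigram and trigram models also include lower-order terms. The names features, MultinomialNaiveBayes, split_train_test, and evaluate are hypothetical (this is not scikit-learn's MultinomialNB), and the ten toy tweets are invented.

```python
import math
import random
from collections import Counter


def features(text, n):
    """Bag of n-grams for one tweet (simplified cleaning: lowercase + whitespace split)."""
    tokens = text.lower().split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing.

    Predicts argmax_c [ log P(c) + sum_g count(g) * log P(g | c) ].
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        vocab = {g for x in X for g in x}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            docs = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(docs) / len(X))
            totals = Counter()
            for x in docs:
                totals.update(x)  # sum n-gram counts over the class
            denom = sum(totals.values()) + self.alpha * len(vocab)
            self.log_lik[c] = {g: math.log((totals[g] + self.alpha) / denom) for g in vocab}
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # n-grams never seen in training are ignored
            scores = {
                c: self.log_prior[c]
                + sum(k * self.log_lik[c][g] for g, k in x.items() if g in self.log_lik[c])
                for c in self.classes
            }
            preds.append(max(scores, key=scores.get))
        return preds


def split_train_test(X, y, test_size=0.2, seed=42):
    """Shuffle indices and split them 80:20 (by default) into train and test sets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = round(len(idx) * (1 - test_size))
    tr, te = idx[:cut], idx[cut:]
    return [X[i] for i in tr], [X[i] for i in te], [y[i] for i in tr], [y[i] for i in te]


def evaluate(y_true, y_pred, positive="positive"):
    """Binary confusion matrix [[TN, FP], [FN, TP]] plus accuracy, precision, recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "confusion": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical hand-labelled toy tweets, not the study's data.
texts = [
    "stay safe and stay home", "grateful for our health workers",
    "the vaccine brings hope", "recovered and feeling thankful",
    "hope the lockdown ends soon", "lockdown is killing my business",
    "so scared of this virus", "no sanitizer left in the stores",
    "quarantine is so lonely", "virus cases rising again",
]
labels = ["positive"] * 5 + ["negative"] * 5

for n in (1, 2, 3):  # unigram, bigram, trigram models
    X = [features(t, n) for t in texts]
    X_tr, X_te, y_tr, y_te = split_train_test(X, labels)
    model = MultinomialNaiveBayes().fit(X_tr, y_tr)
    print(n, evaluate(y_te, model.predict(X_te)))
```

With only ten toy tweets the test split holds two examples, so the printed scores say nothing about real performance; the sketch only shows how the split, fit, predict, and confusion-matrix steps fit together.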
The results of this research showed that under Multinomial Naive Bayes, unigram accuracy was 92.02%, bigram accuracy was 97.37%, and
