
OALib Journal
ISSN: 2333-9721


Sentiment Analysis on Social Media for Albanian Language

DOI: 10.4236/oalib.1107514, PP. 1-31

Subject Areas: Artificial Intelligence

Keywords: Sentiment Analysis, Sentiment Lexicon, Text Mining, Machine Learning, Twitter


Abstract

Recent advances in technology, and in particular the rising prominence of social media platforms, have made it possible to express emotions through electronic means, which has led to the creation of large collections of unstructured textual documents. These collections can be stored and studied with modern technologies such as Text Mining, Machine Learning, and Natural Language Processing to obtain new knowledge from them. Sentiment Analysis is a field of Natural Language Processing that focuses on extracting sentiment from text. As a Text Mining technique, it makes it possible to track the subjective opinion expressed in a text produced by an entity. The purpose of this paper is to test and review different approaches to Sentiment Analysis for messages in the Albanian language found on Twitter. We also compare the results of the different methods, note the challenges that arise, and suggest directions for further research. The research proceeded as follows: the data was pre-processed and then converted from text to a vector representation using a range of feature extraction techniques such as Bag-of-Words, TF-IDF, Word2Vec, and GloVe. We study the performance of sentiment classification techniques from three main approaches: traditional machine learning, lexicon-based, and deep learning. For model evaluation, since the models were trained on imbalanced data, we used not only classical evaluation criteria such as Accuracy, Specificity, Precision, and Recall but also more appropriate criteria such as F-measure, Balanced Accuracy, and the Matthews Correlation Coefficient (MCC). According to all these criteria, our experiments revealed that an LSTM-based RNN with GloVe as the feature extraction technique provides the best results, with F-score = 87.8%, followed by Logistic Regression.
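To illustrate why Accuracy alone can mislead on imbalanced data, the following minimal sketch computes the evaluation criteria named in the abstract from a binary confusion matrix. The function name `evaluation_criteria` and the example counts are hypothetical and chosen only for illustration; they are not the paper's code or results.

```python
import math


def evaluation_criteria(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute binary classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; assumes every denominator is non-zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    balanced_accuracy = (recall + specificity) / 2
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
    }


# Made-up imbalanced example: 10 positive and 90 negative messages.
scores = evaluation_criteria(tp=5, fp=5, tn=85, fn=5)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, Accuracy is 0.90 even though only half of the positive messages are found; F-measure (0.50), Balanced Accuracy (about 0.72), and MCC (about 0.44) expose the weaker performance on the minority class.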

Cite this paper

Vasili, R. , Xhina, E. , Ninka, I. and Terpo, D. (2021). Sentiment Analysis on Social Media for Albanian Language. Open Access Library Journal, 8, e7514. doi: http://dx.doi.org/10.4236/oalib.1107514.

