In the race to harness the power of Artificial Intelligence (AI) in virtually every field, researchers and practitioners face an ever-increasing supply of novel tools that have not undergone domain-specific testing. This paper informs the methodological choices of researchers in economics and finance by comparing the performance of three Natural Language Processing (NLP) methods on an important task: using text analysis for portfolio diversification. Portfolio management can benefit from analysing text data in the form of company descriptions, because the returns of companies with similar descriptions tend to be correlated; consequently, portfolios of dissimilar companies should have lower risk. In this paper, three NLP methods are used to construct so-called minimum semantic concentration portfolios, which are designed to exploit the semantic diversity of the business descriptions of their constituent companies in order to reduce portfolio volatility. Two widely used large language models (BERT and GPT) and an alternative AI solution inspired by neuroscience, called semantic fingerprinting, are tested on their ability to meaningfully compare the business descriptions of the constituents of the S&P 500 and of the Europe 600, in order to derive actionable investment insights. The results show that all three NLP methods are able to extract relevant information from company descriptions: the minimum semantic concentration portfolios have significantly lower volatility than portfolios constructed with randomly chosen weights. While no NLP method can claim absolute superiority over its peers, semantic fingerprinting appears to be the most consistent and robust performer: BERT and GPT demonstrate not only their potential but also a caveat, as their performance is volatile even across very similar tasks.
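The abstract does not spell out the optimisation behind a minimum semantic concentration portfolio. By analogy with a minimum-variance portfolio, a minimal sketch might treat the pairwise cosine similarity matrix S of the company description vectors (produced by BERT, GPT or semantic fingerprinting, here simply assumed as given inputs) as a stand-in for the covariance matrix, and choose long-only, fully invested weights w that minimise w'Sw. The concentration measure, the long-only constraint, the projected-gradient solver and all function names below are illustrative assumptions, not the paper's actual method.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity of description vectors (one row per company)."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return unit @ unit.T


def semantic_concentration(weights: np.ndarray, similarity: np.ndarray) -> float:
    """Assumed measure: w' S w, a similarity-weighted analogue of portfolio variance."""
    return float(weights @ similarity @ weights)


def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (long-only, fully invested)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def min_concentration_weights(similarity: np.ndarray, steps: int = 2000) -> np.ndarray:
    """Hypothetical solver: projected gradient descent on w' S w from equal weights."""
    n = similarity.shape[0]
    # Step size 1/L, where L = 2 * (largest eigenvalue of S) is the gradient's Lipschitz constant.
    lr = 0.5 / np.linalg.eigvalsh(similarity).max()
    w = np.full(n, 1.0 / n)
    for _ in range(steps):
        grad = 2.0 * similarity @ w
        w = project_to_simplex(w - lr * grad)
    return w


# Toy example with hypothetical vectors: companies A and B have identical descriptions.
embeddings = np.array([
    [1.0, 0.0, 0.0],  # A
    [1.0, 0.0, 0.0],  # B, same business as A
    [0.0, 1.0, 0.0],  # C
    [0.0, 0.0, 1.0],  # D
])
S = cosine_similarity_matrix(embeddings)
w = min_concentration_weights(S)
equal = np.full(4, 0.25)
print(np.round(w, 4))
print(semantic_concentration(equal, S), semantic_concentration(w, S))
```

In this toy case the two identical descriptions end up sharing one third of the portfolio between them (about 1/6 each, with 1/3 each for C and D), and the concentration falls from 0.375 under equal weights to 1/3. A study like the one described would then compare the realised return volatility of such weights with that of randomly weighted portfolios.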