Sentiment Analysis with LLMs for Predicting Trends in Bitcoin
DOI:
https://doi.org/10.54097/6t14fr82
Keywords:
Large Language Model, Data Science, Sentiment Analysis, FinBERT, GDELT, Bitcoin, Financial Analysis
Abstract
This project uses LLMs to perform sentiment analysis on financial news headlines in order to predict Bitcoin price trends. We first replicated FinBERT's performance and retrained it on a subset of GDELT, improving its accuracy from 64.8% to 73.8%. Next, we extracted three sentiment scores from the GDELT news dataset using the retrained FinBERT model and aggregated the results into multiple sentiment signals. We then calculated Bitcoin returns from a Bitcoin price dataset and constructed multiple return signals. Using the Pearson correlation coefficient, we found that the continuous sum sigmoid sentiment signal has the strongest correlation with Bitcoin returns. Based on this finding, we developed several trading strategies. Quantitative analysis shows that the second sentiment-based strategy returns, on average, 20 percentage points more than the buy-and-hold strategy most of the time. Moreover, this strategy still generates positive returns despite the overall downward trend and high volatility of the Bitcoin price. This work contributes to both academic research and practical applications by demonstrating the effectiveness of Large Language Models in enhancing financial market analysis through sentiment-based methods.
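The signal-and-correlation step described in the abstract can be illustrated with a minimal sketch. The abstract does not define the "continuous sum sigmoid" signal, so the code below assumes one plausible reading: sum the net (positive minus negative) sentiment of a day's headlines and squash the sum with a sigmoid. All function names and the input layout are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def daily_signal(headline_scores):
    """Aggregate one day's headlines into a single sentiment signal.

    headline_scores: list of (positive, negative) probabilities, one pair
    per headline. ASSUMPTION: "sum sigmoid" is read here as
    sigmoid(sum of net sentiment); the paper may define it differently.
    """
    net = sum(pos - neg for pos, neg in headline_scores)
    return sigmoid(net)


def simple_returns(prices):
    """Simple period-over-period returns from a list of closing prices."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

In use, one would compute `daily_signal` for each day, pair the signal for day t with the return from day t to day t+1, and pass the two aligned lists to `pearson`. Whether the paper lags the signal this way is also an assumption.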
License
Copyright (c) 2026 Journal of Computer Science and Artificial Intelligence

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.








