Predicting daily stock market direction : NLP-driven approach integrating sentiment analysis and topic modelling : case: Amazon Inc.
Shrestha, Prashant (2025)
Master's thesis
School of Business and Management, Business Administration
All rights reserved.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe20251228125300
Abstract
This thesis investigates the predictive power of financial news for the daily stock direction of Amazon Inc. (AMZN). While traditional approaches often rely on daily aggregated sentiment, this study evaluates the efficacy of context-aware features, namely topic-specific and trading-window-specific sentiment features. The study uses Reuters news and daily stock data for Amazon Inc. from 9 January 2023 to 31 July 2024.
The study employs a hybrid approach combining Latent Dirichlet Allocation (LDA) and sentence-transformer embeddings (all-MiniLM-L6-v2) for topic modelling, alongside FinBERT for sentiment analysis. It compares two Random Forest models: a baseline model based on daily aggregated sentiment and a proposed model incorporating statistically significant lagged topic and window-specific features, which are identified through Granger Causality-based feature screening and further refined using the SelectKBest method.
The results show that the context-aware model significantly outperforms the daily aggregated baseline. The proposed model achieves an accuracy of 62.8% and an AUC of 0.61, compared with 52.6% accuracy and an AUC of 0.51 for the baseline model. These findings indicate that integrating thematic context and publication timing provides a more informative predictive signal than broad sentiment aggregation. Overall, the study contributes to the financial forecasting literature by providing empirical evidence that contextual sentiment features enhance stock trend prediction.
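As a rough illustration of what lagged topic-specific and window-specific sentiment features might look like, the following minimal sketch averages per-article sentiment scores by topic and by trading window for each day, then shifts them by one trading day. It is not the thesis's implementation: the window boundaries, the topic labels, the sample scores, and the function names (`trading_window`, `daily_features`, `lag_features`) are all assumptions made for this example. In the study itself the scores would come from FinBERT and the topics from the LDA and sentence-embedding step.

```python
from collections import defaultdict
from datetime import date, datetime, time

# Hypothetical window boundaries (US market hours); the abstract does not
# state how the thesis defines its trading windows.
MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)


def trading_window(ts):
    """Label a publication timestamp as pre-market, intraday, or after-hours."""
    t = ts.time()
    if t < MARKET_OPEN:
        return "pre_market"
    if t < MARKET_CLOSE:
        return "intraday"
    return "after_hours"


def daily_features(articles):
    """Average sentiment per day for each topic, each window, and overall.

    `articles` is an iterable of (timestamp, topic, sentiment_score) tuples.
    """
    scores = defaultdict(lambda: defaultdict(list))
    for ts, topic, score in articles:
        day = ts.date()
        scores[day]["topic_" + topic].append(score)
        scores[day]["window_" + trading_window(ts)].append(score)
        scores[day]["daily_mean"].append(score)  # baseline-style aggregate
    return {
        day: {name: sum(vals) / len(vals) for name, vals in feats.items()}
        for day, feats in scores.items()
    }


def lag_features(features, days, lag=1):
    """Attach to each day the features observed `lag` trading days earlier.

    `days` is an ordered list of trading days; the first `lag` days are
    dropped because they have no earlier observation.
    """
    lagged = {}
    for i in range(lag, len(days)):
        source = features.get(days[i - lag], {})
        lagged[days[i]] = {f"{name}_lag{lag}": val for name, val in source.items()}
    return lagged


# Made-up sample articles: (publication time, topic label, sentiment score).
articles = [
    (datetime(2023, 1, 9, 8, 0), "earnings", 0.5),
    (datetime(2023, 1, 9, 11, 0), "earnings", 0.25),
    (datetime(2023, 1, 9, 17, 0), "regulation", -0.5),
    (datetime(2023, 1, 10, 10, 0), "regulation", -0.25),
]
days = [date(2023, 1, 9), date(2023, 1, 10)]

feats = daily_features(articles)
lagged = lag_features(feats, days)
```

In a fuller pipeline, the lagged columns would be candidates for Granger-causality screening and SelectKBest before being passed to the Random Forest classifier; those steps are omitted here.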
