Michelle Michalowski, a student in the Master in Big Data & Business Analytics, explains how she and her teammates created an application that can give sound financial trading advice using sentiment analysis of a single Reddit forum.
GameStop, a retail chain that sells computer games, has struggled for years as the gaming industry shifted online and brick-and-mortar game stores did worse and worse financially. GameStop’s stock had fallen consistently, especially since the start of the Covid-19 pandemic.
But in early 2021, something very unexpected happened. The price of the stock exploded while hedge funds were betting against it, and all this happened because of a single subreddit (Reddit forum) called r/Wallstreetbets. Private investors and Reddit users alike pumped money into GameStop shares, driving the price up sharply.
This is just one example of how the sentiments of a single community can dictate a stock’s performance. Inspired by this event and other similar stories, I had the idea to start a project with some of my classmates in our Natural Language Processing (NLP) class. Together, we spent three weeks working on a tool that is able to analyze the sentiments for different stocks, as well as provide a concrete trading strategy.
Sentiment analysis: how it works
So how does it work? In the first step, we ingested data from r/Wallstreetbets using a Reddit API. After that, we extracted the so-called “cashtags” from the text: ticker symbols prefixed with a dollar sign, such as $GME, which show which stock a certain comment is referring to. After some basic data cleaning, we were ready to analyze the sentiment on each stock discussed on the forum.
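To illustrate the cashtag step, here is a minimal sketch in Python. It is not the project's actual code: the function name `extract_cashtags`, the regular expression, and the assumption that a ticker is one to five uppercase letters after a dollar sign are all illustrative choices.

```python
import re

# Assumed cashtag format: a dollar sign followed by 1-5 uppercase letters.
# Dollar amounts such as "$100" do not match because digits are excluded.
CASHTAG_RE = re.compile(r"\$([A-Z]{1,5})\b")


def extract_cashtags(text):
    """Return the distinct tickers mentioned as cashtags, in order of first appearance."""
    seen = []
    for ticker in CASHTAG_RE.findall(text):
        if ticker not in seen:
            seen.append(ticker)
    return seen


comment = "Holding $GME and $AMC, not selling $GME until $100!"
print(extract_cashtags(comment))  # ['GME', 'AMC']
```

A pattern this strict will miss lowercase mentions like "$gme", so a real pipeline would probably normalize case or check matches against a list of known tickers.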
We used two different approaches: one based on deep learning and one based on a lexicon. The latter is the oldest and simplest method. It relies on a lexicon that assigns each word a polarity score, representing whether it has a positive, neutral, or negative connotation. For this approach, we used an open-source, rule-based sentiment analysis tool specifically tuned to sentiments expressed on social media.
Of course, the language used on r/Wallstreetbets doesn’t correspond to “proper” financial language; it’s mostly slang. To account for that, we enhanced the default lexicon by incorporating a list of 300 slang words and emojis, which significantly improved our performance.
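The idea of a lexicon, and of extending it with forum slang, can be sketched with a toy example. Everything here is an assumption made for illustration: the words, the emojis, and the polarity scores are invented and are not the tool or the 300-entry list used in the project.

```python
# Toy base lexicon: word -> polarity score (negative below zero, positive above).
BASE_LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}

# Hypothetical slang and emoji entries in the spirit of the forum's vocabulary.
SLANG_LEXICON = {"moon": 2.0, "tendies": 1.5, "bagholder": -1.5, "🚀": 2.0, "💎": 1.5}

# The extended lexicon is the base lexicon plus the slang entries.
EXTENDED_LEXICON = {**BASE_LEXICON, **SLANG_LEXICON}


def score(text, lexicon):
    """Sum the polarity of every known token; unknown tokens count as 0."""
    tokens = text.lower().split()
    return sum(lexicon.get(token.strip(".,!?"), 0.0) for token in tokens)


def label(total):
    """Map a summed polarity score to a sentiment label."""
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


comment = "GME to the moon 🚀"
print(label(score(comment, BASE_LEXICON)))      # neutral
print(label(score(comment, EXTENDED_LEXICON)))  # positive
```

With the base lexicon alone the comment looks neutral because none of its words are known; once the slang entries are added, the same comment scores as clearly positive.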
After running our sentiment analysis with both approaches, we gathered a data set of around 40,000 entries identifying different sentiments for the stocks. From here, we wanted to validate how well our approach worked. We did so by implementing a backtesting strategy, which buys a stock if the number of positive sentiments increased compared to the previous day, and sells it if the number of positive sentiments decreased.
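The buy/sell rule described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's backtest: the all-in/all-out position sizing, trading at the same day's price, the absence of fees, and the sample numbers are all made up for the example.

```python
def signals(positive_counts):
    """One signal per day from the second day on: 'buy', 'sell', or 'hold'."""
    out = []
    for prev, curr in zip(positive_counts, positive_counts[1:]):
        if curr > prev:
            out.append("buy")
        elif curr < prev:
            out.append("sell")
        else:
            out.append("hold")
    return out


def backtest(positive_counts, prices, cash=1000.0):
    """Apply the signals all-in/all-out and return the final portfolio value."""
    shares = 0.0
    for day, signal in enumerate(signals(positive_counts), start=1):
        price = prices[day]
        if signal == "buy" and cash > 0:
            shares, cash = cash / price, 0.0   # spend all cash on the stock
        elif signal == "sell" and shares > 0:
            cash, shares = shares * price, 0.0  # sell the whole position
    # Value any remaining shares at the last known price.
    return cash + shares * prices[-1]


# Invented daily counts of positive comments and closing prices for one stock.
counts = [10, 15, 15, 8, 12]
prices = [10.0, 20.0, 25.0, 40.0, 50.0]

print(signals(counts))           # ['buy', 'hold', 'sell', 'buy']
print(backtest(counts, prices))  # 2000.0
```

Comparing the final value against simply holding the stock over the same period is one way to judge whether the sentiment signal adds anything.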
This is what we got:
This project has been a great learning experience, and we felt very supported by our NLP professor every step of the way. If you have any further questions about our project, feel free to reach out!
Born and raised in western Germany but with Romanian roots, Michelle Michalowski studied Economics with a major in Statistics at the University of Mannheim before coming to Madrid to study the Master in Big Data & Business Analytics. Apart from being interested in tech and data, she is also a fitness junkie and a travel lover. Connect with her on LinkedIn.