Sentiment Analysis in NFL Fanbases

Vicente Riquelme
8 min read · Nov 30, 2020

“Well that was a uh… game? I guess. Offense took a fat dump on the field and my heart.”

“VICTORY THREAD, EVERYONE GET IN HERE!!!”

“It’s just not fun to watch this team. The offense is garbage, especially now that teams have film on Allen. The defense has no motivation to play since anything they do just gets vomited up by our shitty offense.”

“BRONCOS WIN!!! MY DOG HAS NO IDEA WHY I’M SO HAPPY BUT HE’S HAPPY TOO”

These are some of the comments you can find in the subreddit devoted to the Denver Broncos football team. After each game of the season, a Reddit post is made to discuss the match. After a victory, commenters are frequently ecstatic, joyously expounding on the glory of their team and the match. After a loss, however, the comments turn sour, with calls to fire coaches and quarterbacks and claims that everything is doomed. As you can see from the comments, people are passionate. But how passionate?

Every one of the NFL's 32 teams has its own subreddit with similar post-game threads. While each is filled with colorful language and melodramatic claims, the question arises: who's really the most passionate? Which fans care the most? Who gets the angriest? The happiest? And how much does winning or losing affect a team's fans? These are the questions we set out to answer two weeks ago.

We began our search for answers on Reddit itself. More specifically, we used the Reddit API to search the Denver Broncos subreddit for all posts with the text "Post Game Thread" in the title. Luckily, the results were sorted chronologically, which meant that we could take the last ones in the list to get the team's most recent games.

Finding and parsing Post Game Threads with the help of Reddit’s search engine
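A minimal sketch of the thread-selection step, assuming the search results have already been fetched (for example with PRAW's `Subreddit.search`, whose submissions expose `title` and `created_utc`) and flattened into plain dicts. The dict shape, the function name, and the title pattern are hypothetical illustrations, not the project's actual code.

```python
import re

# Hypothetical title pattern: accepts "Post Game Thread", "Post-Game Thread" and
# "Postgame Thread" in any letter case. Real subreddits may use other conventions.
POST_GAME_RE = re.compile(r"post[\s-]?game\s+thread", re.IGNORECASE)


def recent_post_game_threads(posts, n=32):
    """Return the n most recent posts whose titles look like post-game threads.

    posts: iterable of dicts with "title" and "created_utc" (Unix seconds) keys,
    an assumed shape for search results that were fetched elsewhere.
    """
    hits = [p for p in posts if POST_GAME_RE.search(p["title"])]
    hits.sort(key=lambda p: p["created_utc"])  # oldest first
    return hits[-n:] if n > 0 else []  # keep only the newest n
```

Sorting explicitly by `created_utc` avoids relying on whatever order the search endpoint happens to return results in.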

With Reddit posts collected, we then needed more information about the games themselves. Who won? And when? By scraping the official National Football League (NFL) website with some nifty HTML parsing software, we collected each game's date and whether the team had won or lost. This step turned out to be easier than we feared. The real difficulty was the next step: matching the Reddit post data with the records from the NFL's website. Reddit's API let us access the time each post was created, but post-game threads were not created at precise, consistent times the way the NFL's game records were. With some dicey date and time juggling, we matched each NFL game's score to any post-game thread created within 48 hours of the game.
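The 48-hour matching rule can be sketched as follows, assuming each scraped game carries a timezone-aware kickoff `datetime` and each thread a `created_utc` Unix timestamp as returned by the Reddit API. The record shapes and function name are hypothetical, and taking the earliest qualifying thread is an assumption about how multiple candidates were handled.

```python
from datetime import datetime, timedelta, timezone

MATCH_WINDOW = timedelta(hours=48)


def match_threads_to_games(games, threads, window=MATCH_WINDOW):
    """Pair each game with the earliest thread created within `window` after it.

    games: list of dicts with "kickoff" (timezone-aware datetime) and "won" (bool).
    threads: list of dicts with "id" and "created_utc" (Unix seconds, UTC).
    Games with no thread in the window are skipped. Both shapes are assumptions.
    """
    ordered = sorted(threads, key=lambda t: t["created_utc"])  # earliest first
    matches = []
    for game in games:
        start = game["kickoff"]
        for thread in ordered:
            created = datetime.fromtimestamp(thread["created_utc"], tz=timezone.utc)
            if start <= created <= start + window:
                matches.append((game, thread))
                break  # take the first qualifying thread only
    return matches
```

Converting everything to timezone-aware UTC keeps the date and time juggling manageable: Python refuses to compare naive and aware datetimes, so a mixed-up timestamp raises an error instead of silently shifting a game by several hours.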

Finally, we had to analyze the comments in each thread for the opinions being expressed, a task generally known as sentiment analysis. A sentiment analyzer works through the words in a sentence and assigns a value representing how "positive", "negative", or "neutral" the sentence is. There are two main approaches. The first assigns some words a positive or negative value and compares how many positive, negative, and unrated words a sentence contains; this is called polarity-based sentiment analysis. The second, valence-based sentiment analysis, also accounts for the intensity of the words being used. On top of this, analysis algorithms vary widely in complexity.
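To make the distinction concrete, here is a toy comparison of the two approaches. Both lexicons and their scores are made up for illustration (real tools ship thousands of rated entries); the point is that polarity scoring treats "good game" and "great game" identically, while valence scoring does not.

```python
import re

# Hypothetical toy lexicons. Polarity only records a word's direction;
# valence also records how strongly it points that way.
POLARITY = {"good": 1, "great": 1, "win": 1, "awful": -1, "garbage": -1, "dead": -1}
VALENCE = {"good": 1.0, "great": 3.0, "win": 2.0, "awful": -3.0, "garbage": -2.5, "dead": -3.0}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def polarity_score(sentence):
    """Polarity-based: count positive, negative and unrated words."""
    words = tokenize(sentence)
    pos = sum(1 for w in words if POLARITY.get(w, 0) > 0)
    neg = sum(1 for w in words if POLARITY.get(w, 0) < 0)
    return {"positive": pos, "negative": neg, "unrated": len(words) - pos - neg}


def valence_score(sentence):
    """Valence-based: sum each word's signed intensity (unknown words count 0)."""
    return sum(VALENCE.get(w, 0.0) for w in tokenize(sentence))
```

Both functions flag "This team is garbage" as negative, but only the valence version can say how negative.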

In our first stab at sentiment analysis, we used the popular VADER analyzer that ships with NLTK (nltk.sentiment.vader), a rule-based tool whose word ratings come from human raters and which was tuned for social media text such as tweets. Although it worked, it was not very accurate. Reddit comments frequently use metaphors and other idioms, which the analyzer could not pick up on. For example, the comments "Please help, my qb [quarterback] is dead" and "This team is garbage" were rated as positive and neutral, respectively.
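For reference, NLTK's VADER is used through `SentimentIntensityAnalyzer.polarity_scores`, which returns `neg`, `neu`, `pos` and a normalized `compound` score; VADER's documentation suggests labeling a text positive when `compound >= 0.05`, negative when `compound <= -0.05`, and neutral otherwise. The helper names below are hypothetical, and the analyzer is passed in so the labeling logic can be checked without downloading the `vader_lexicon` resource.

```python
def vader_label(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def label_comments(comments, analyzer=None):
    """Label each comment with VADER (or any object exposing polarity_scores).

    With no analyzer given, NLTK's VADER is constructed; it needs the lexicon
    fetched once via nltk.download("vader_lexicon").
    """
    if analyzer is None:
        from nltk.sentiment.vader import SentimentIntensityAnalyzer

        analyzer = SentimentIntensityAnalyzer()
    return [vader_label(analyzer.polarity_scores(c)["compound"]) for c in comments]
```

Because VADER scores text from a fixed word lexicon plus a handful of rules, an idiom like "my qb is dead" is judged word by word rather than as a football complaint.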

Fortunately, sentiment analysis is a rapidly growing field, and other free options are available. We next turned to TextBlob, another library similar to VADER. However, it relies on a similar word-scoring technique and did not fare much better. By default, both VADER and TextBlob score text with hand-built word lexicons and simple rules (TextBlob also offers an optional naive Bayes classifier), approaches far less sophisticated than the neural models now available. This is why, when we eventually found it, we turned to Flair.
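For readers unfamiliar with naive Bayes (TextBlob exposes it as an optional `NaiveBayesAnalyzer`), here is a from-scratch sketch of a multinomial naive Bayes classifier with add-one smoothing; it is an illustration of the technique, not TextBlob's implementation, and the training sentences in any usage are hypothetical. Because each word is scored independently of its neighbors, the model cannot read an idiom as a whole, and a sentence made only of unseen words falls back on the class priors.

```python
import math
from collections import Counter


def train_nb(labeled):
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    doc_counts = Counter(label for _, label in labeled)
    word_counts = {label: Counter() for label in doc_counts}
    for text, label in labeled:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return doc_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log posterior for `text`."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_lp = None, -math.inf
    for label, n_docs in doc_counts.items():
        lp = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        for w in text.lower().split():
            if w in vocab:  # words never seen in training carry no evidence
                lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best_label, best_lp = label, lp
    return best_label
```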

Flair is an NLP library whose sentiment model was trained using LSTM neural networks on IMDb's huge database of movie reviews. Because a neural network can take word order and context into account, it picks up on far more nuance in language than either VADER or TextBlob, and when we applied it to Reddit comments, the improvement was immediately apparent. "Please help, my qb is dead" went from positive to negative, and "This team is garbage" went from neutral to very negative.
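A sketch of scoring comments with Flair, assuming its pretrained `en-sentiment` classifier (downloaded on first use) and the `Sentence`/`TextClassifier` API. The post does not spell out how its "positive sentiment" value was derived; converting Flair's binary label and confidence into a probability of being positive, as `to_positive_probability` does, is an assumption, and both helper names are hypothetical.

```python
def to_positive_probability(label_value, confidence):
    """Turn a binary POSITIVE/NEGATIVE prediction into P(positive) in [0, 1]."""
    if label_value == "POSITIVE":
        return confidence
    if label_value == "NEGATIVE":
        return 1.0 - confidence
    raise ValueError(f"unexpected label: {label_value!r}")


def flair_positive_scores(texts):
    """Score each text with Flair's pretrained English sentiment model."""
    from flair.data import Sentence
    from flair.models import TextClassifier

    classifier = TextClassifier.load("en-sentiment")
    sentences = [Sentence(t) for t in texts]
    classifier.predict(sentences)  # attaches a label to every Sentence in place
    return [to_positive_probability(s.labels[0].value, s.labels[0].score) for s in sentences]
```

Loading the classifier once and passing a whole list to `predict` keeps the expensive model load out of the per-comment loop, which matters at the scale of 100,000+ comments.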

With a powerful sentiment analyzer at our side, we set to work analyzing Reddit posts. After a mountain of debugging and waiting for data to be collected (it took over three hours to collect and analyze all of the data!), we got some pretty interesting results.

First, we simply plotted every team's positive sentiment over time to get a sense of the data. All teams show a wide range of sentiment over time, but the minimum and maximum values are fairly similar across teams. There is too much data on this plot to draw conclusions from it directly.

Second, we visualized a single team's positive sentiment in post-game threads to see how it changes over time. Here is the Denver Broncos' positive sentiment over the past 32 games (data points from the 2018, 2019, and 2020 seasons):

Predictably, when the Broncos win, their fans rejoice and the positive sentiment values spike. After a loss, on the other hand (especially to division rivals, which produced the lowest scores on the graph), fans are rather bitter and positive values drop. To figure out whether certain teams are more positive or negative than others, we graphed, side by side over the past 32 games, five NFL teams' overall average positive sentiment (red), average positive sentiment after a win (orange), average positive sentiment after a loss (green), and the average positive sentiment of their post-game threads (blue). For this visual, we intentionally chose five teams with varying levels of success. Over the past three seasons, the Chiefs have been one of the best teams and the Jets one of the worst, with the Browns, Patriots, and Broncos performing in the middle of the pack.
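The per-team averages behind this graph can be computed along these lines, assuming one record per post-game thread holding the team, whether it won, and the thread's positive sentiment score. The record shape and function name are hypothetical, and each game is weighted equally.

```python
from statistics import mean


def sentiment_summary(games):
    """Per-team overall, after-win and after-loss mean positive sentiment.

    games: list of dicts with "team", "won" (bool) and "positive" (thread score).
    A team with no wins (or no losses) gets None for that average.
    """
    by_team = {}
    for g in games:
        by_team.setdefault(g["team"], []).append(g)
    summary = {}
    for team, rows in by_team.items():
        wins = [r["positive"] for r in rows if r["won"]]
        losses = [r["positive"] for r in rows if not r["won"]]
        summary[team] = {
            "overall": mean(r["positive"] for r in rows),
            "after_win": mean(wins) if wins else None,
            "after_loss": mean(losses) if losses else None,
        }
    return summary
```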

The overall positive sentiment averages make sense: the Chiefs, with the highest, were Super Bowl champions last year, while the Jets, with the lowest, have been truly terrible the past few years and are currently winless this season. The average positive sentiments after a win or a loss are more interesting. For example, the Broncos have the highest positive sentiment after a win and the lowest positive sentiment after a loss. It is fun to speculate why: are they fair-weather fans? A little bit knee-jerk? Too emotionally invested? Below is the same information, displayed as a bar graph.

The final visualization we made is a box-and-whisker plot of the positive sentiment values of each thread's comments. We did this to see the spread of sentiment in comments: do they all converge around the thread's average, or do they vary widely? Below, we graphed the positive sentiment of the comments in the New England Patriots' post-game threads for their past five games. The entire distribution of sentiment shifts depending on whether they won or lost.
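The numbers a box-and-whisker plot draws can also be computed directly. This sketch uses linear-interpolation quartiles (`statistics.quantiles` with `method="inclusive"`) and whiskers reaching the most extreme comments within 1.5 × IQR of the box, the same conventions matplotlib's `boxplot` uses by default; the function name and its input (one thread's per-comment positive scores) are hypothetical.

```python
from statistics import quantiles


def box_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers for one thread's comment scores.

    values: two or more per-comment positive sentiment scores.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],  # smallest value within the fences
        "whisker_high": inside[-1],  # largest value within the fences
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }
```

Comparing the `q1`/`q3` box of a win thread with that of a loss thread is a quick numeric check on whether the whole distribution shifted or just a few outliers moved.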

Now is a good time to point out a few observations from this graph and the others. First, all teams respond predictably to wins and losses: when a team loses, its fans are much more negative, and when it wins, its fans are more positive. Fans are most negative after losses in important games (late season, with playoff implications) and most positive when they win those games (the highest observed positive value was the Chiefs' post-Super Bowl thread). Second, most NFL teams seem to have similar average positive sentiment after wins and after losses; what changes a team's overall average is how often it wins. A team like the Chiefs wins more games and thus has a higher positive sentiment value across all games. The opposite occurs for the Jets, whose averages after wins and losses are similar to the Chiefs' but who lose much more often, so their overall average gets dragged down. Finally, as the box-and-whisker plots show, fans always have something to complain about. Win or lose, Patriots fans consistently had threads with lots of negative comments.
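The second observation is essentially arithmetic: if every game counts equally, a team's overall average is its win rate times its after-win average plus its loss rate times its after-loss average. The sketch below uses hypothetical numbers, not measured values, to show two fan bases with identical reactions to wins and losses separating purely by win rate.

```python
def overall_average(win_rate, avg_after_win, avg_after_loss):
    """Overall mean sentiment when every game is weighted equally."""
    return win_rate * avg_after_win + (1 - win_rate) * avg_after_loss


# Hypothetical: both fan bases average 0.6 after wins and 0.3 after losses.
contender = overall_average(0.75, 0.6, 0.3)  # wins 3 of every 4 games
cellar_dweller = overall_average(0.25, 0.6, 0.3)  # wins 1 of every 4 games
print(f"contender: {contender:.3f}, cellar dweller: {cellar_dweller:.3f}")
```

If the overall average is taken over comments rather than games, busier threads carry more weight, so this relationship holds only approximately.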

We were happy with the outcome of this project, but there are a few things we would have liked to look at with more time. One would be a more complete dataset covering all 32 NFL teams. Sentiment analysis and web parsing are not fast, and even with just five teams we were looking at 100,000+ Reddit comments, so our JSON files took a very long time to generate. With more time and data, I would love to run the analysis on all 32 teams and get definitive results: the most negative team, the most positive team, and so on. I also think it would be interesting to examine games with particularly low or high sentiment scores and figure out whether common factors drive the great zeal or intense negativity seen in a few threads. Right now we can only speculate about a few factors, but actual statistical analysis, backed by more data, could answer these questions.
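One simple starting point for that analysis would be to flag threads whose sentiment sits unusually far from a team's norm, then look at what those games have in common. The sketch below flags threads at least two standard deviations from the mean; the cutoff, the data shape, and the function name are illustrative assumptions.

```python
from statistics import mean, stdev


def extreme_threads(scores, z_cut=2.0):
    """Return ids of threads whose score is at least z_cut standard deviations
    from the mean. scores: {thread_id: positive sentiment}, two or more entries."""
    values = list(scores.values())
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return []  # every thread scored the same, so nothing stands out
    return sorted(tid for tid, s in scores.items() if abs(s - mu) / sd >= z_cut)
```

With few games the largest attainable z-score is capped (with five threads it cannot reach 2.0), so a fixed cutoff works best on a full season or more of threads.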

This project was completed for our Computational Analysis of Big Data Class. The link to our GitHub code can be found here.
