Dissecting Trump’s Most Rabid Online Following

By Trevor Martin | Art by yesyesno

Published Mar. 23, 2017

Editor’s note: The story below contains two slurs that appear in the names of subreddits. Links to Reddit may also contain offensive material.

President Donald Trump’s administration, in its turbulent first months, has drawn fire from both the left and the right, including the ACLU, government ethics accountability groups and former Bush administration officials. But one group has shown nothing but unbridled enthusiasm for the president’s actions thus far: the over 380,000 members of r/The_Donald, one of the thousands of comment boards on Reddit, the fifth-most-popular website in the U.S.

The subreddit, where posters refer to President Trump as the “God Emperor” and “daddy,” is arguably the epicenter of Trump fervor on the internet. Its membership has grown steadily since the 2016 presidential election, but its members were especially active during the campaign. They mobilized to comb through the hacked Democratic National Committee emails published on WikiLeaks and played a large role in spreading information and theories about those emails. More broadly, they waged the “Great Meme War”: an effort to get Trump elected by bombarding the internet with social-media-ready content promoting Trump or bashing Democratic candidate Hillary Clinton. Some of those memes played on Clinton’s campaign gaffes, such as her use of the phrase “basket of deplorables,” while others involved an emerging pro-Trump iconography centered around images of Pepe the Frog — a cartoon character with a convoluted history that gained particular prominence after it was co-opted by white nationalists as a sort of unofficial mascot. Members of r/The_Donald like to say they “shitposted” Donald Trump into office; regardless of whether the flood of memes swung the election, it did overwhelm the front page of Reddit to such an extent that the site’s CEO rushed to deploy a change in Reddit’s algorithm that limits the influence of any single subreddit. [1]

What can we say about the animating force behind r/The_Donald? For one, it’s not universal among Trump supporters; almost 63 million Americans voted for Trump, and the 382,000 members of r/The_Donald represent less than 1 percent of that. But in the subreddit’s vocal and dedicated membership, you can find an influential strain of Trump boosterism. According to former staffers, the Trump campaign team monitored the subreddit for messages that resonated, and Trump himself participated in an “Ask Me Anything” on r/The_Donald in July. Since the election, the subreddit has continued to serve as a conduit through which fringe conspiracy theories — often started on sites like 4chan.org, a freewheeling image-based message board best known for creating memes, posting stolen celebrity nudes and birthing the hacker collective Anonymous — enter the larger online discourse. The most striking example has been “Pizzagate,” the false idea that a pizza parlor in Washington, D.C., is the center of a child-trafficking ring involving Clinton campaign chairman John Podesta, which prompted a man from North Carolina to “self-investigate” the shop, where he fired a rifle several times and threatened an employee.

r/The_Donald has repeatedly been accused of offering a safe harbor where racists and white nationalists can congregate and express their views, much the same way that Trump’s campaign is said to have mobilized and emboldened those same groups. And indeed, r/The_Donald is home to some pretty vile comment threads. The subreddit’s moderators declined to talk to us about their community and accused FiveThirtyEight of being “fake news.” Regardless, we think there’s a way to get at the nature of r/The_Donald that is more rigorous than doing a quick scan of its comments (and certainly more objective than simply soliciting the opinions of the group’s fans and detractors).

We’ve adapted a technique that’s used in machine learning research — called latent semantic analysis — to characterize 50,323 active subreddits [2] based on 1.4 billion comments posted from Jan. 1, 2015, to Dec. 31, 2016, in a way that allows us to quantify how similar in essence one subreddit is to another. At its heart, the analysis is based on commenter overlap: Two subreddits are deemed more similar if many commenters have posted often to both. This also makes it possible to do what we call “subreddit algebra”: adding one subreddit to another and seeing if the result resembles some third subreddit, or subtracting out a component of one subreddit’s character and seeing what’s left. (There’s a detailed explanation of how this analysis works at the bottom of the article.)

Here’s a simple example: Using our technique, you can add the primary subreddit for talking about the NBA (r/nba) to the main subreddit for the state of Minnesota (r/minnesota) and the closest result is r/timberwolves, the subreddit dedicated to Minnesota’s pro basketball team. Similarly, you can take r/nba and subtract r/sports, and the result is r/Sneakers, a subreddit dedicated to the sneaker culture that is a prominent non-sport component of NBA fandom.

This may all seem pretty abstract, but that same algebra can be applied to r/The_Donald. What happens when you break r/The_Donald up into subgroups using subreddit subtraction? What happens when you add unrelated subreddits to r/The_Donald? Before we get into those questions, let’s take a look at the subreddits that are most similar to r/The_Donald, according to our analysis: [3]

1. r/Conservative 0.741 Discussion of conservative philosophy

3. r/HillaryForPrison 0.675 Extreme anti-Clinton commentary

4. r/uncensorednews 0.661 News with a focus on far-right-wing views

5. r/AskThe_Donald 0.634 Q&A subreddit run by r/The_Donald moderators

r/Conservative and r/AskTrumpSupporters top the list, followed by r/HillaryForPrison, a subreddit that refers to Hillary Clinton by the pronoun “it” and notes in bold on the sidebar that “Putting It behind bars is joy!” After that it’s r/uncensorednews, a subreddit started by white nationalist moderators who found the existing, extremely popular r/news subreddit to be too liberal.

So does this mean that users who comment on r/The_Donald comment on r/Conservative more than any other subreddit? No. Eight percent of r/The_Donald’s users have also commented on r/Conservative, which is about one-fifth the size of r/The_Donald, and conversely, 51 percent of commenters on r/Conservative have commented on r/The_Donald. But the raw number of shared commenters isn’t very informative on its own because, for example, almost every subreddit will have a lot of overlap with big, extremely popular subreddits such as r/AskReddit, which has over 16 million members. Our analysis is a bit more subtle: We weight the overlaps in commenters according to, in essence, how surprising those overlaps are — that is, how much more two subreddits’ user bases overlap than we would expect them to based on chance alone. Since essentially every subreddit overlaps heavily with super popular groups like r/AskReddit, that result is no longer surprising and gets a lower weight. What rises to the top, then, are the more unlikely results that are characteristic of a specific subreddit rather than those that are common to Reddit as a whole. And by looking at these weighted commenter overlap rankings across thousands of subreddits, we built a profile for each subreddit that helps capture what defines the average commenter on each specific subreddit.
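To make the "how surprising is this overlap" weighting concrete, here is a minimal Python sketch of positive pointwise mutual information, the weighting named in the methodology section at the bottom of the article. It is an illustration only, not the authors' R code: the function name `ppmi`, the two unnamed subreddits and all of the counts are made up for the example.

```python
import math

def ppmi(counts):
    """Convert shared-commenter counts into positive PMI weights.

    counts[i][j] is the number of commenters shared by row subreddit i
    and column subreddit j. All numbers used below are hypothetical.
    """
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    weights = []
    for i, row in enumerate(counts):
        weighted_row = []
        for j, observed in enumerate(row):
            if observed == 0:
                weighted_row.append(0.0)
                continue
            # Overlap we would expect if commenters were spread by chance.
            expected = row_sums[i] * col_sums[j] / total
            # Keep only overlaps that exceed chance; clip the rest to zero.
            weighted_row.append(max(0.0, math.log(observed / expected)))
        weights.append(weighted_row)
    return weights

# Columns: a huge general-interest subreddit, then a small niche one.
counts = [
    [900, 100],  # hypothetical subreddit A
    [900, 5],    # hypothetical subreddit B
]
weights = ppmi(counts)
```

Both rows share 900 commenters with the huge subreddit, yet that overlap ends up with a weight at or near zero because it is roughly what chance predicts. Subreddit A's 100 commenters in the niche subreddit, a much smaller raw number, receive the largest weight.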

There’s nothing too revealing in that list above — all of those subreddits are explicitly pro-Trump, anti-Clinton or politically conservative. So let’s use subreddit algebra to dissect r/The_Donald into its constituent parts. What happens when you filter out commenters’ general interest in politics? To figure that out, we can subtract r/politics from r/The_Donald. The result most closely matches r/fatpeoplehate, a now-banned subreddit that was dedicated to ridiculing and bullying overweight people.

1. r/fatpeoplehate 0.275 For sharing insults aimed at overweight people (now banned)

2. r/TheRedPill 0.274 Virulently misogynistic subreddit, nominally devoted to “sexual strategy”

3. r/Mr_Trump 0.266 Now-dormant subreddit formed during a moderator schism at r/The_Donald

4. r/coontown 0.266 Open and enthusiastic racism against black people (now banned)

5. r/4chan 0.253 Screenshots of 4chan.org posts

Subreddit algebra isn’t quite as simple as A – B = C. It’s more like A – B is closer to C than anything else, but it’s also pretty similar to D and not far off from E. So when you subtract r/politics from r/The_Donald, you actually get a list of every subreddit in our analysis, ranked in order of their similarity to the result of that subtraction. We’re showing just the top five.

And that top five isn’t exactly pretty, but it does support the theory that at least a subset of Trump’s supporters are motivated by racism. The presence of r/fatpeoplehate at the top of the list echoes some of President Trump’s own behavior, including his referring to 1996 Miss Universe winner Alicia Machado as “Miss Piggy” and insulting Rosie O’Donnell about her weight. The second-closest result, r/TheRedPill, describes itself in its sidebar as a place for “discussion of sexual strategy in a culture increasingly lacking a positive identity for men”; named after a scene from “The Matrix,” the group believes that women run the world and men are an oppressed class, and from that belief springs an ideology that has been described as “the heart of modern misogyny.” r/Mr_Trump self-describes as “the #1 Alt-Right, most uncucked subreddit” — referring to a populist white-nationalist movement and an increasingly all-purpose insult meant to denigrate others’ masculinity — and the appallingly named r/coontown is the now-banned but previously central home to unrepentant racism on Reddit. Finally, coming in at No. 5 is r/4chan, a subreddit dedicated to posting screenshots of threads found on 4chan, where many users supported Trump for president and where the /pol/ board in particular has a strongly racist bent.

We dissected r/The_Donald in a bunch of other ways using subreddit algebra. Here are some of the more interesting results:

1. r/CFB 0.269 For college football discussion

2. r/nfl 0.255 For NFL discussion

3. r/TrumpMinnesota 0.244 Small subreddit for Trump supporters in Minnesota

1. r/european 0.781 Now-private subreddit that hosted racist and anti-Semitic commentary on European affairs

2. r/worldnews 0.768 Main subreddit for discussion of world affairs

3. r/syriancivilwar 0.688 For discussion of the conflict in Syria

1. r/KotakuInAction 0.676 Main hub of Gamergate discussion on Reddit

2. r/gaming 0.619 Largest general gaming subreddit

3. r/Cynicalbrit 0.586 Unofficial fanpage for the internet personality TotalBiscuit

So even adding innocuous subreddits, such as r/europe and r/Games, to r/The_Donald can result in something ugly or hate-based — r/european frequently hosts anti-Semitism and racism, while r/KotakuInAction is Reddit’s main home for the misogynistic Gamergate movement. Which raises a question: Are these hateful communities linked specifically to Trump’s supporters on Reddit, or are they common to politically active Reddit users in general? To get at that question, let’s try subtracting r/politics from r/Conservative:

1. r/Mary 0.265 Subreddit for devotees of the biblical Mary

2. r/RCIA 0.264 For those considering converting to Catholicism (RCIA means “rite of Christian initiation for adults”)

3. r/ak47 0.241 For discussing the AK-47 rifle

4. r/TelaIgne 0.240 A space where Catholic redditors pray for other redditors (the name is Latin for “web on fire”)

5. r/ChristianJewishRoots 0.240 For discussion of the relationship between Christian and Jewish theology

When we do this, we find that the top result is a subreddit dedicated to the glorification of the biblical Mary, and the other related subreddits are similarly focused on Christianity, except for r/ak47, which is dedicated to the famous rifle.

So what about the other 2016 presidential candidates? How does Trump’s Reddit following compare to that of Hillary Clinton or Democratic primary candidate Bernie Sanders (whose r/SandersForPresident subreddit still has over 215,000 members)? This analysis lets us take any subreddit and say how “Trump-ish” it is vs. how “Clinton-ish” or “Sanders-ish” it is. Here’s a selection of subreddits plotted on a three-way spectrum from r/The_Donald to r/SandersForPresident to r/hillaryclinton.

Subreddits dedicated to politics and news are smack in the middle. r/Feminism is on the Sanders/Clinton side of the spectrum, though slightly closer to Clinton, as is r/TheBluePill, a feminist parody of r/TheRedPill; r/BasicIncome (a subreddit advocating for a universal basic income) is also on the liberal side, but slightly closer to Sanders.

And all of those hate-based subreddits? They’re decidedly in r/The_Donald’s corner.

How does this work?

Latent semantic analysis (LSA) — the technique from natural language processing that we’ve adapted for this analysis — is often used to determine how related one book, article or speech is to another. The basic idea is that documents using similar words with similar frequency are most likely closely related. But what about the words themselves? LSA also lets you assess how similar words are by looking at the other words that show up around them. So, for example, two words that might rarely show up together (say “dog” and “cat”) but often have the same words nearby (such as “pet” and “vet”) are deemed closely related. The way this works is that every word in, say, a book is assigned a value based on its co-occurrence with every other word in that book, and the result is a set of vectors — one for each word — that can be compared numerically. On a very technical level, the way you determine how similar two words like “dog” and “cat” are is by looking at the angle between their two vectors (there’s a visual guide to understanding these concepts below).

Vectors are interesting because they can be enormous, multidimensional things that contain a large amount of information — but you can still use them to do grade-school arithmetic. When machine-learning researchers at Google tried adding word vectors together or subtracting one from another, they discovered semantically meaningful relationships. [4] For example, if you take the vector for “king,” subtract the vector for “man” and add the vector for “woman,” the closest result is the vector for “queen.” Slightly more subtle relationships were also revealed: e.g. “Rome” plus “Germany” equals “Berlin.” It turned out to be a very powerful way of analyzing language. Here, we are also using co-occurrence to try to uncover the nature of different subreddits and their relationships to one another.

The idea of co-occurrence is clear when we’re talking about words, but what does it mean for subreddits? We found relationships by looking at how many commenters various subreddits have in common — that’s our measure of co-occurrence. Here’s a simplified example of how this works:

Let’s say we want to see how subreddits in the world of health and exercise are related to one another. To do that, we can plot every subreddit in terms of two key subreddits — r/nutrition and r/Outdoors.

Let’s start with r/running. That subreddit has, let’s say, one commenter who has also commented in r/nutrition and three who have also commented in r/Outdoors. So we give it a vector of [1, 3].

Now let’s add two more subreddits: r/weightlifting and r/Fitness. r/weightlifting has three commenters in common with r/nutrition and one with r/Outdoors, and r/Fitness has four and three, respectively.

Now we can do some addition by combining the vectors. If we add r/weightlifting to r/running, we get a third vector that looks similar to r/Fitness. The angle between the two gives us a measure of just how similar.

So instead of (King – Man) + Woman = Queen, you get Running + Weightlifting = Fitness.
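The simplified example can be checked in a few lines of Python. This is only a sketch of the arithmetic using the made-up commenter counts from the example; the variable and function names are our own, chosen for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Each vector is [commenters shared with r/nutrition,
#                 commenters shared with r/Outdoors].
running = [1, 3]
weightlifting = [3, 1]
fitness = [4, 3]

# Subreddit addition is plain element-wise vector addition.
combined = [a + b for a, b in zip(running, weightlifting)]

similarity = cosine_similarity(combined, fitness)
angle_degrees = math.degrees(math.acos(similarity))
```

Here `combined` is [4, 4] and its cosine similarity to r/Fitness is about 0.99, an angle of roughly 8 degrees. Taken alone, r/running scores about 0.82 against r/Fitness and r/weightlifting about 0.95, so the sum is a better match than either part.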

For over 50,000 subreddits that span a massive range of topics, it gets a bit more complicated. Instead of characterizing all of them in terms of just two subreddits — like r/Outdoors and r/nutrition above — we ranked all of the subreddits by the number of unique commenters and then pulled out the 2,133 subreddits whose unique commenter rank was between 200 and 2,201 (there are some ties). We used this subset of subreddits to characterize all active subreddits. [5] We then combined all the resulting subreddit vectors into a big matrix with 50,323 rows and 2,133 columns and converted the raw co-occurrences to positive pointwise mutual information values. [6] Similarity between subreddits is based on the cosine similarity of their vectors — a measure of the angle between them. To perform subreddit algebra, subreddit vectors are added and subtracted using standard linear algebra, and then the cosine similarities are calculated to rank subreddits by their similarity to the combination.
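The last two steps, combining vectors and ranking every subreddit by cosine similarity to the result, might look roughly like the Python sketch below. It is not the authors' R code. The three-dimensional vectors are invented stand-ins for the 2,133-column weighted vectors, the helper names (`combine`, `rank_by_similarity`) are hypothetical, and leaving the input subreddits out of the ranking is our assumption rather than something the article states.

```python
import math

def normalize(vector):
    """Scale a vector to length 1 so only its direction matters."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]

def cosine_similarity(u, v):
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))

def combine(vectors, add=(), subtract=()):
    """Add and subtract unit-length subreddit vectors element by element."""
    dims = len(next(iter(vectors.values())))
    result = [0.0] * dims
    for sign, names in ((1.0, add), (-1.0, subtract)):
        for name in names:
            unit = normalize(vectors[name])
            result = [r + sign * x for r, x in zip(result, unit)]
    return result

def rank_by_similarity(vectors, target, exclude=()):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [
        (name, cosine_similarity(vector, target))
        for name, vector in vectors.items()
        if name not in exclude  # assumption: inputs are left out
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Invented toy vectors, NOT values from the article's data.
vectors = {
    "r/nba": [5, 0, 1],
    "r/minnesota": [0, 5, 1],
    "r/timberwolves": [4, 4, 0],
    "r/nfl": [1, 0, 5],
}
inputs = ("r/nba", "r/minnesota")
target = combine(vectors, add=inputs)
ranking = rank_by_similarity(vectors, target, exclude=inputs)
```

With these toy numbers, `ranking` puts r/timberwolves first (about 0.96) and r/nfl second (about 0.40). The output is a full ranked list rather than a single answer, which matches the article's point that A – B is "closer to C than anything else" rather than exactly equal to C.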

Are we sure this is meaningful?

To test our analysis, we looked at some cases of subreddit algebra where the results should be obvious — like the example above where adding r/nba to r/minnesota should (and does) yield r/timberwolves as the best fit. Other combinations of a sport and a location similarly result in location-specific discussions of that sport.

We also looked at a test case involving a harder-to-see relationship. If you take the subreddit for managing money and investing, r/personalfinance, and subtract the subreddit for frugality, r/Frugal, the resulting most similar subreddit is r/wallstreetbets, a subreddit about taking extreme risks in the stock market.

The data and code behind this analysis

The Reddit comments data is from a collection hosted on Google’s BigQuery of 1.4 billion comments from January 2015 to December 2016. [7] The analysis itself was done in R. You can find the code here.

Development by Justin McCraw

Footnotes

1. The CEO of Reddit also became embroiled in controversy after “trolling the trolls” by taking negative comments posted about himself and swapping his username out for the usernames of r/The_Donald’s moderators.

2. We define an active subreddit as one where at least one non-bot user has commented at least 10 times.

3. As determined by their similarity scores, which range from 0 (totally dissimilar) to 1 (exactly the same). The scores are a measure of how close together subreddit vectors are in vector space, which is calculated by measuring the angle between them (the cosine similarity). Higher similarity scores mean vectors are closer together and therefore more similar. For example, the similarity score between r/gaming and r/Games, two very similar subreddits, is 0.79.

4. Originally these word vectors were generated using a recently developed neural-network-based context model called word2vec (also see algorithms like GloVe), but research has shown that even ordinary co-occurrence models also encode semantic relationships.

5. We could have performed this characterization using all 50,323 subreddits, but in order to save time and storage space, we excluded the largest and smallest subreddits as they likely provide the least amount of relevant information.

6. The subreddit vectors are a unique fingerprint of commenter co-occurrence across thousands of subreddits. Also, each subreddit vector is normalized to have a length of 1 because we are most interested in their directions, not their lengths.
