Note: This article is the first in a series on recent research conducted by Walton College students.
Social media has ingrained itself in our society. Politicians and voters are now able
to engage with each other like never before on websites like Twitter and Facebook.
Historically, voters have had to take politicians at their word and rely on journalists
to report on their legislative actions. Now, the public has a near-complete record
of what politicians say on the internet to their followers and how they act legislatively.
Furthermore, we can analyze the similarities and differences between their words and their
political actions, potentially holding policy makers to an even higher level of accountability.
University of Arkansas seniors Sai Elagandula and Griffin Fulton noticed how politicians were using Twitter to campaign and communicate with the American public throughout recent election cycles. Elagandula and Fulton wanted to analyze how senators’ tweets stacked up against their legislative decisions. To do so, they trained a classification model to track the deviation between political ideology predicted from tweets and ideology based on legislative actions.
They used a political ideology tracker published by GovTrack, a website that enables users to track bills and members of Congress, as a basis for comparison. From Jan. 3, 2019, through Jan. 3, 2020, GovTrack tracked 100 U.S. senators and assigned each of them a score from most politically right to least according to their legislative behavior. Topping GovTrack’s list for most politically right was Marsha Blackburn (R – TN) with a score of 1.00; at the other end of the spectrum was Bernie Sanders (I – VT) with a score of 0.0.
Building the Right Data Set
To identify the deviations between online discourse and legislative action, Elagandula
and Fulton trained their model to categorize senators’ tweets as Democratic or Republican,
using tweets from the official Twitter accounts of the Democratic and Republican parties,
as well as those of the chairs of the DNC and GOP, as a standard for partisan affiliation. They
employed snscrape, an open-source social network scraper, to collect tweets from official party accounts
for the data set.
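The labeling idea described here, in which a tweet inherits its party label from the account that posted it, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the account handles, the `label_tweets` helper and the sample tweets are hypothetical placeholders, not the accounts or code the students used.

```python
# Hypothetical handles; the article does not name the exact accounts used.
PARTY_BY_ACCOUNT = {
    "dem_party_account": "Democrat",
    "dem_chair_account": "Democrat",
    "rep_party_account": "Republican",
    "rep_chair_account": "Republican",
}


def label_tweets(scraped):
    """Attach a party label to each (account, text) pair based on its source account."""
    labeled = []
    for account, text in scraped:
        party = PARTY_BY_ACCOUNT.get(account)
        if party is None:
            continue  # skip tweets from accounts outside the reference set
        labeled.append({"text": text, "label": party})
    return labeled


# Placeholder tweets, not real data.
sample = [
    ("dem_party_account", "placeholder tweet A"),
    ("rep_chair_account", "placeholder tweet B"),
    ("unrelated_account", "placeholder tweet C"),
]
training_rows = label_tweets(sample)
```

Because the label comes from the source account rather than from a human reading each tweet, the resulting classifier learns what the official party voices sound like, which is the standard of comparison the researchers describe.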
The DNC data set contained 6,752 tweets and the GOP set contained 6,166 tweets. Elagandula and Fulton assigned each tweet the classification of Democrat or Republican accordingly. They then used BERT (Bidirectional Encoder Representations from Transformers), a transformer-based machine learning technique for language processing pre-training developed by Google, to turn tweets into encoded values. A percentage of words in the tweets were replaced with a masked token, and BERT attempted to predict these words. Elagandula and Fulton then sought to minimize the loss function by fine-tuning weights and biases. In the end, they were able to achieve an accuracy rate of 93%.
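The masking step mentioned above can be illustrated with a simplified Python sketch. It rests on several assumptions: tokens are produced by splitting on whitespace (BERT itself uses subword tokenization), the 15% default is the rate from the original BERT pre-training setup rather than a figure reported for this project, and `mask_tokens` is a hypothetical helper written for this example.

```python
import random

MASK_TOKEN = "[MASK]"  # the placeholder token BERT uses for hidden words


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace roughly mask_rate of the tokens with MASK_TOKEN.

    Returns the masked sequence and a dict mapping each masked position
    to the original token a model would be asked to predict.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Always hide at least one token, and never more than the sequence holds.
    n_to_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = rng.sample(range(len(tokens)), n_to_mask)
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        masked[pos] = MASK_TOKEN
    return masked, targets


# Placeholder sentence, split on whitespace for simplicity.
tokens = "senators use social media to reach voters".split()
masked, targets = mask_tokens(tokens)
```

A model trained this way is scored on how well it recovers the hidden words in `targets`, and that prediction error is what the loss function measures.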
They applied the model to tweets the senators posted during the 117th Congress, from the start of the legislative session to April 20, 2022, and tracked how many were categorized as belonging to the opposite party. Only 86 senators are ranked, rather than the full 100, because tweets were collected from senators in the current Congress (117th) while the GovTrack rankings were made at the end of the previous Congress (116th). Senators who did not appear on both lists were thus omitted. Using these statistics, they then calculated a match rate to derive their own ranked list of predicted senator political ideology based on tweets. Elagandula and Fulton then ran a regression analysis to see how Twitter characteristics, margin of victory and years in office explain senators' match rates.
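One plausible reading of the match rate is the share of a senator's tweets that the model assigns to the senator's own party. The Python sketch below follows that reading and then orders senators by the share of their tweets classified as Republican. The exact formula and ranking rule are assumptions, since the article does not spell them out, and the senator names and labels are placeholders rather than real results.

```python
def match_rate(own_party, predicted_labels):
    """Fraction of a senator's tweets classified as the senator's own party."""
    if not predicted_labels:
        raise ValueError("no tweets to score")
    matches = sum(1 for label in predicted_labels if label == own_party)
    return matches / len(predicted_labels)


def republican_share(predicted_labels):
    """Share of tweets classified Republican, used here as a left-right score."""
    return match_rate("Republican", predicted_labels)


def rank_most_right_first(predictions_by_senator):
    """Order senators from highest to lowest share of Republican-classified tweets."""
    return sorted(
        predictions_by_senator,
        key=lambda name: republican_share(predictions_by_senator[name]),
        reverse=True,
    )


# Placeholder predictions, not actual model output.
predictions = {
    "Senator A": ["Republican"] * 9 + ["Democrat"],
    "Senator B": ["Democrat"] * 8 + ["Republican"] * 2,
    "Senator C": ["Republican"] * 5 + ["Democrat"] * 5,
}
ranking = rank_most_right_first(predictions)
```

Under this reading, a Republican's match rate equals the Republican share of tweets, while a Democrat's match rate is one minus that share, so a single score can order every senator on the same scale.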
Fulton was especially intrigued by their regression analysis. The political ideology match rate almost always correlated with their established explanatory variables. They considered each politician’s average number of tweets per week, the number of followers each politician had on their account, their margin of victory in the most recent election and the amount of time they had spent in office. They were fascinated by the factors they discovered contributing to the perceived discrepancy between what politicians tweet and the legislative actions they take.
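For readers unfamiliar with regression, the sketch below fits a straight line relating a single explanatory variable to match rate using ordinary least squares. It is a deliberate simplification: the analysis described above considered several variables together, `fit_line` is a hypothetical helper, and the numbers are invented solely to exercise the function. They say nothing about the actual findings.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept for one predictor."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y move together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the predictor must vary across observations")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented values for illustration only (not the study's data).
years_in_office = [1, 2, 3, 4]
match_rates = [0.9, 0.8, 0.7, 0.6]
slope, intercept = fit_line(years_in_office, match_rates)
```

The slope estimates how much the match rate changes for each one-unit change in the explanatory variable. A multiple regression does the same for each variable while holding the others fixed.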
When Elagandula and Fulton compared the list they created with GovTrack’s rankings, they found themselves surprised at some of the senators’ placements. Some politicians were right on the money. For example, Senator Blackburn’s tweets were almost 100% Republican, and Ted Cruz, who ranked No. 4 for most-right legislative behavior, rose to No. 3 for his predicted ideology based on tweets. What is remarkable, however, is how Senator Sanders' ranking climbed from most left-leaning, No. 86 for political ideology based on legislation, to No. 61 based on his tweets. This is largely due to his discourse on Twitter about domestic policy issues, which are typically discussed by Republican senators and traditionally resonate with the right, according to Elagandula.
Trust the Tweets
This research may be encouraging for younger people. Roughly 37% of those who identify
as Democrat on Twitter are between the ages of 18 and 29, and 22% of Republican users
also fall in that demographic, according to a 2020 Pew Research Center study. As the world moves online, these numbers are growing, and political campaigning
has largely transitioned to targeted ads on social media. Twitter is currently the
main platform policy makers are using to communicate with voters, Elagandula and Fulton
said, which is why the public should be wary of what they read online when it pertains
By tracking and recording what politicians say on Twitter, the public can better hold policy makers accountable. The discrepancy between what politicians say and do can now be explained, or at least monitored, by research using models and machine learning. Elagandula and Fulton’s project shows how accurate a text classification model can be, especially when compared with GovTrack’s political ideology chart. However, the range of political affiliation in the U.S. did create limiting factors.
Although Elagandula and Fulton had a short time frame and a narrow scope, they encourage others to further train their model to identify ideological subgroups. Politicians and their tweets could be considered alt-left or alt-right, but their current model, which used the official party accounts as a foundation for categorizing tweets as right or left, would not be able to identify them as such. The open-source online community has proven to be useful, and many tools for machine learning are free on the internet, according to Elagandula and Fulton. All of the tools they used when creating their model, such as snscrape, were found for free on websites like GitHub. The seniors’ project and methods can be viewed on their own GitHub portfolio. The accessibility of tools such as these speaks to the promise of continuing their research or creating new projects.
If researchers monitor the words and actions of politicians, transparency, accountability and trust between the government and the public will all increase naturally. Elagandula and Fulton encourage Twitter users to take what they see on the platform with a grain of salt, and they believe their project can be a helpful tool for voters deciding whom to trust and how to cast their ballots.