Research

My research is centered on the online information ecosystem. My work spans Twitter disinformation campaigns by the Internet Research Agency, the YouTube recommendation algorithm, and the online conversation on presidential debates.

Articles

SARS-CoV-2 titers in wastewater foreshadow dynamics and clinical presentation of new COVID-19 cases

With Fuqing Wu, Amy Xiao, Jianbo Zhang, Katya Moniz, Noriko Endo, Frederica Armas, Richard Bonneau, Mary Bushman, Peter R. Chai, Claire Duvallet, Timothy B. Erickson, Katelyn Foppe, Newsha Ghaeli, Xiaoqiong Gu, William P. Hanage, Katherine H. Huang, Wei Lin Lee, Mariana Matus, Kyle A. MacElroy, Jonathan Nagler, Steven T. Rhode, Mauricio Santillana, Joshua A. Tucker, Stefan Wuertz, Shijie Zhao, Janelle Thompson, and Eric J. Alm

June 23, 2020

Current estimates of COVID-19 prevalence are largely based on symptomatic, clinically diagnosed cases. The existence of a large number of undiagnosed infections hampers population-wide investigation of viral circulation. Here, we use longitudinal wastewater analysis to track SARS-CoV-2 dynamics in wastewater at a major urban wastewater treatment facility in Massachusetts, between early January and May 2020. SARS-CoV-2 was first detected in wastewater on March 3. Viral titers in wastewater increased exponentially from mid-March to mid-April, after which they began to decline. Viral titers in wastewater correlated with clinically diagnosed new COVID-19 cases, with the trends appearing 4-10 days earlier in wastewater than in clinical data. We inferred viral shedding dynamics by modeling wastewater viral titers as a convolution of back-dated new clinical cases with the viral shedding function of an individual. The inferred viral shedding function showed an early peak, likely before symptom onset and clinical diagnosis, consistent with emerging clinical and experimental evidence. Finally, we found that wastewater viral titers at the neighborhood level correlate better with demographic variables than with population size. This work suggests that longitudinal wastewater analysis can be used to identify trends in disease transmission in advance of clinical case reporting, and may shed light on infection characteristics that are difficult to capture in clinical investigations, such as early viral shedding dynamics.

Full Article | The New York Times
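The convolution idea in the abstract above can be illustrated with a toy sketch. This is not the paper's model or code: the function name `expected_titer`, the case counts, and the shedding kernel values are all hypothetical, chosen only to show how back-dated daily cases combined with an individual shedding curve give a wastewater signal.

```python
def expected_titer(new_cases, shedding_kernel):
    """Discrete convolution: titer[t] = sum over k of new_cases[t - k] * shedding_kernel[k]."""
    titers = []
    for t in range(len(new_cases)):
        total = 0.0
        for k, shed in enumerate(shedding_kernel):
            if t - k >= 0:  # ignore days before the start of the series
                total += new_cases[t - k] * shed
        titers.append(total)
    return titers


# Hypothetical inputs for illustration only (not data from the paper).
new_cases = [0, 1, 2, 4, 3, 1]     # back-dated new cases per day
shedding_kernel = [0.5, 0.3, 0.2]  # relative shedding on days 0, 1, 2; peaks early

titers = expected_titer(new_cases, shedding_kernel)
```

In the paper the shedding function is inferred from the data; here it is fixed by hand so that the forward direction of the model is easy to follow.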

Cross-Platform State Propaganda: Russian Trolls on Twitter and YouTube during the 2016 U.S. Presidential Election

With Yevgeniy Golovchenko, Cody Buntain, Gregory Eady, and Joshua A. Tucker

April 19, 2020

This paper investigates online propaganda strategies of the Internet Research Agency (IRA)—Russian “trolls”—during the 2016 U.S. presidential election. We assess claims that the IRA sought either to (1) support Donald Trump or (2) sow discord among the U.S. public by analyzing hyperlinks contained in 108,781 IRA tweets. Our results show that although IRA accounts promoted links to both sides of the ideological spectrum, “conservative” trolls were more active than “liberal” ones. The IRA also shared content across social media platforms, particularly YouTube—the second-most linked destination among IRA tweets. Although overall news content shared by trolls leaned moderate to conservative, we find troll accounts on both sides of the ideological spectrum, and these accounts maintain their political alignment. Links to YouTube videos were decidedly conservative, however. While mixed, this evidence is consistent with the IRA’s supporting the Republican campaign, but the IRA’s strategy was multifaceted, with an ideological division of labor among accounts. We contextualize these results as consistent with a pre-propaganda strategy. This work demonstrates the need to view political communication in the context of the broader media ecology, as governments exploit the interconnected information ecosystem to pursue covert propaganda strategies.

Full Article | Techstream | Medium

Data Reports

Debate Twitter: Mapping User Reactions to the 2020 Democratic Presidential Primary Debates

With Zhanna Terechshenko, Niklas Loynes, Tom Paskhalis, and Jonathan Nagler

March 3, 2020

We analyzed 11,286,346 tweets collected over the course of the first nine debates, which spanned 11 nights from June 26, 2019 to February 19, 2020. We found that civil rights and healthcare were particularly popular policy issues amongst tweeters. Conservatives were more likely to tweet about immigration and the economy, while liberals were more likely to tweet about civil rights, education, and the environment. Read the full report below.

Full Report | The Washington Post

Public Writing

Twitter put warning labels on hundreds of thousands of tweets. Our research examined which worked best.

With Zeve Sanderson, Jonathan Nagler, Richard Bonneau, and Joshua Tucker | December 9, 2020

Methods Supplement | Dataset

How Trump impacts harmful Twitter speech: A case study in three tweets

With Zeve Sanderson | October 22, 2020

Biden and Sanders are debating tonight. What got Twitter users buzzing during past Democratic debates?

With Zhanna Terechshenko, Niklas Loynes, Tom Paskhalis, and Jonathan Nagler | March 15, 2020

Open Source

Check out (or contribute to!) open source projects for collecting, analyzing, and modelling information about the online environment.

Author

twitter_elections_public_interest

a dataset of public interest exception tweets by politicians during the 2020 election period

This dataset contains the public interest exception labels for tweets by various politicians and political organizations during the 2020 election period. Tweets were labelled for whether they contained a "soft intervention," a "hard intervention," or "no intervention." For tweets that received an intervention, we report the intervention type, text, and URL.

GitHub | Analysis | Methods Supplement

youtube-data-api

a wrapper for the YouTube Data API

With Leon Yin

As the largest social media platform amongst American adults, YouTube is vital to understanding the online media ecosystem. This software package makes accessing YouTube data easier and faster with just a few lines of code.

PyPI | GitHub | Jupyter Notebook

Contributor

smaberta

a wrapper for huggingface transformer libraries

By Vishakh Padmakumar and Zhanna Terechshenko

Smaberta is a Python wrapper for interacting with huggingface transformer models. Smaberta makes it easier to train, evaluate, predict, and fine-tune cutting-edge language models based on transformers.

PyPI | GitHub

urlexpander

a url expansion toolkit

By Leon Yin

urlExpander is intended for social media researchers who want to analyze links. Aside from collecting in-depth user engagement data, link-shortening services obfuscate the destination of the shortened URLs. urlExpander was created to address this challenge in a scalable and robust manner. It does so by providing utility functions to convert Tweets into link datasets, filter for known link-shortening services (like bit.ly), resolve shortened links, and parse the title and meta description from webpages. urlExpander also offers multithreaded URL expansion, which was created to overcome the bottleneck of mass link expansion through parallelization, minimizing HTTP requests, caching results, and chunking the input into smaller pieces.

PyPI | GitHub
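The strategy described for urlExpander (filter for known shorteners, deduplicate to minimize requests, chunk the input, resolve in parallel, cache results) can be sketched with the Python standard library alone. This is a rough illustration, not urlExpander's actual API: `KNOWN_SHORTENERS`, `is_short`, `chunked`, and `expand_all` are hypothetical names, and the resolver is supplied by the caller (here a stub, so the example makes no network requests).

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

# Hypothetical, partial list of link-shortening domains, for illustration only.
KNOWN_SHORTENERS = {"bit.ly", "t.co", "tinyurl.com"}


def is_short(url):
    """Return True if the URL's host is a known link shortener."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host in KNOWN_SHORTENERS


def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def expand_all(urls, resolve, chunk_size=100, n_workers=8):
    """Resolve shortened URLs in parallel; other URLs pass through unchanged."""
    cache = {}
    # Deduplicate so each shortened link is resolved only once.
    to_resolve = sorted({u for u in urls if is_short(u)})
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for chunk in chunked(to_resolve, chunk_size):
            for short, resolved in zip(chunk, pool.map(resolve, chunk)):
                cache[short] = resolved
    return [cache.get(u, u) for u in urls]


# Stub resolver standing in for a real HTTP redirect lookup (assumption).
STUB_DESTINATIONS = {"https://bit.ly/a": "https://example.com/a"}


def stub_resolve(url):
    return STUB_DESTINATIONS.get(url, url)


expanded = expand_all(
    ["https://bit.ly/a", "https://example.com/b", "https://bit.ly/a"],
    stub_resolve,
    chunk_size=1,
)
```

In real use, `resolve` would be a function that follows redirects for a single URL; passing it in as an argument keeps the batching logic testable offline.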

Tutorials

Text Classification Using a Transformer-Based Model

With Zhanna Terechshenko and Vishakh Padmakumar | December 8, 2020