12 : Analyzing the Impact of Soft Moderation Interventions on Twitter

 

Background/Context

In March 2020, Twitter introduced new warning labels that provide additional context and information on disputed or misleading information, a form of soft moderation intervention. These labels were initially added to address content that goes directly against guidance on COVID-19 from authoritative global and local public health sources. The range of labels and warning messages has since been expanded to provide additional explanation on tweets that might mislead or confuse people. One such example:


Twitter classifies tweets into three categories:


  1. Misleading Information - statements or assertions that have been confirmed to be false or misleading by subject-matter experts.

  2. Disputed claims - statements or assertions in which the accuracy, truthfulness, or credibility of the claim is contested or unknown.

  3. Unverified claims - information (which could be true or false) that is unconfirmed at the time it is shared.

Twitter responds to each class of claim as follows:

 

Class of Tweet         | Action visible to users | Action visible to the Tweeter | Description of class
---------------------- | ----------------------- | ----------------------------- | ------------------------------------------------
Misleading Information | Label                   | Removal                       | Confirmed false or misleading by experts
Disputed Claim         | Label                   | Warning                       | Accuracy of the claim is contested
Unverified Claim       | No action               | No action                     | Reported by users but pending review by Twitter

(In Twitter's policy, the action taken also scales with the propensity for harm: Moderate or Severe.)


Note: the classes are used internally by Twitter; they are not made explicitly visible to users reading the tweet.

Related Work


There has been recent interest in investigating the effects of soft intervention methods on social media. Ling et al., 2021 (https://arxiv.org/abs/2201.07726) studied the effect of warning labels on TikTok videos in the context of COVID-19 misinformation.

Zannettou, 2021 (https://arxiv.org/abs/2101.07183) explored the effect of warning labels on Twitter users once they receive the label. The study focuses on the political leaning and content engagement of users interacting with tweets that Twitter labeled with a warning message. Interestingly, it finds that tweets with warning labels are statistically more likely to receive more engagement than tweets without them.


Since warning labels were introduced only recently by major social media companies, extensive research investigating their effects, especially in India, is scarce. Most previous work has focused on the US elections, Donald Trump, or COVID-19 tweets. This motivates us to investigate the effects of warning labels on Indian users.


Problem Statement


Given that there are over 290 million active Twitter users worldwide, understanding the effect of these labels on users’ trust and perception of the tweet and the tweeter is imperative.


Our project tries to investigate the effect of warning labels on perceived user trust by exploring the following research questions:

  1. User (Tweeter) Behaviour:

    1. Does the online behaviour of users whose tweets were labelled with a warning label change from before to after the label was received?

  2. User Perception

    1. Does the presence of a Warning label after a tweet reduce users’ trust in that tweet?

    2. Does hiding the tweeter’s identity for tweets with warning labels impact users’ trust in that tweet?

Data

We first identified tweets with warning labels originating from Indian politicians, influencers and personalities. We searched for news articles between 2020 and 2022 mentioning Twitter warning labels and India. We then curated a list of Indian personalities who received warning labels over the last two years. An example of such a news article is below:



The following table shows the list of Twitter users from India who received a warning label in one of their tweets.



We then scraped the full timeline of each user on the list, including tweets, timestamps and metadata such as likes and retweets. Since previous work investigating warning labels has largely overlooked their effects in India, this data resource is a novel contribution of our project that could help future work investigating such phenomena in India.


In total, we were able to collect 581,689 tweets for all 10 users mentioned above. 

Methodology

RQ1

To answer RQ1, we identified the tweets in each user’s timeline that were labeled by Twitter. We then calculated various metrics capturing the engagement and activity of the user. Since the data is naturally temporal, we split each user’s timeline into two parts: before and after the user received a warning label. This setting is naturally suited to a quasi-experimental Interrupted Time Series Analysis, which helps us make causal inferences about the effect of warning labels on users’ online behaviour.
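Concretely, such an interrupted time series can be fit as a segmented regression with a level-change and a slope-change term at the interruption; the level-change coefficient then estimates the immediate effect of the label. A minimal sketch with NumPy, using a synthetic engagement series rather than our actual data:

```python
import numpy as np

def segmented_regression(y, interruption):
    """Fit y_t = b0 + b1*t + b2*post_t + b3*(t - interruption)*post_t.

    b0/b1 give the pre-interruption level and trend; b2 estimates the
    immediate level change at the interruption and b3 the change in trend.
    """
    t = np.arange(len(y), dtype=float)
    post = (t >= interruption).astype(float)
    X = np.column_stack([np.ones_like(t), t, post, (t - interruption) * post])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic daily metric: flat at 10 for 20 days, then drops to 6.
y = np.concatenate([np.full(20, 10.0), np.full(20, 6.0)])
b0, b1, b2, b3 = segmented_regression(y, interruption=20)
# b2 recovers the immediate drop of -4 at the interruption.
```

Extrapolating the pre-interruption terms (b0, b1) past the interruption gives the counterfactual trend used in the observations below.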


RQ2

To answer RQ2, we relied on the traditional social-science methods of interviews and surveys. We first conducted a pilot study of interviews to characterize people’s general perception of, activity on, and trust in Twitter. We talked to 5 people (3 male, 2 female) with a mean age of 22.6 years and a variance of 2.5. The following image shows the broad questions we asked in the interview.

 

Two broad themes emerged from the pilot study:

  • Some people think the Twitter labels are themselves biased and follow a certain agenda.

  • When people encounter a tweet with a warning label, they are more likely to fact-check the tweet.


After the pilot interviews, we had a better understanding of the Twitter landscape in India and its perception among users. To quantitatively study the effect of warning labels on user perception and make causal claims, we conducted a Randomized Controlled Trial (RCT) as our survey.


The trial split participants randomly into a Control Group (CG) and a Treatment/Experiment Group (EG). Participants are shown a variety of tweets that actually received warning labels from Twitter. However, CG users see the tweets without the warning label, while EG users see them with the label. Here, the warning label acts as the treatment.
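Random assignment of this kind can be sketched in a few lines; the participant IDs and seed below are placeholders, not our actual respondents:

```python
import random

def assign_groups(participant_ids, seed=0):
    """Shuffle participants and split them into Control (CG) and Treatment (EG)."""
    rng = random.Random(seed)  # fixed seed only so the example is reproducible
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"CG": ids[:half], "EG": ids[half:]}

# 33 hypothetical participants, matching our total number of survey responses.
groups = assign_groups([f"p{i}" for i in range(33)])
```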

Subsequently, for each tweet, participants are asked to rate on a Likert scale how trustworthy they find the content of the tweet. Additionally, we asked users for basic demographic information such as age and gender, and to complete the Divergent Association Task (DAT). The DAT (https://www.datcreativity.com/) is a recently proposed task in which users submit 10 words that are as different from each other as possible; it leverages NLP models such as GloVe and has been shown to correlate with creativity. We use three variables (age, gender and DAT score) to match users in the CG and EG for stronger claims and more robust analysis.
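The DAT score itself is the average pairwise cosine distance between the embeddings of the submitted words, scaled by 100. A sketch of that computation, using tiny hand-made vectors in place of real GloVe embeddings:

```python
import numpy as np

def dat_score(vectors):
    """Mean pairwise cosine distance between word vectors, scaled by 100.

    The real task embeds the valid submitted words with GloVe; the 3-d
    vectors used below are toy stand-ins, purely for illustration.
    """
    dists = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            a, b = vectors[i], vectors[j]
            cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
            dists.append(1.0 - cosine)  # cosine distance
    return 100.0 * float(np.mean(dists))

# Mutually orthogonal vectors: maximally "different" words under this metric.
score = dat_score([np.array([1.0, 0.0, 0.0]),
                   np.array([0.0, 1.0, 0.0]),
                   np.array([0.0, 0.0, 1.0])])  # -> 100.0
```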


The image below shows a high level schema of the conducted survey.

We also had to ensure that the tweets chosen for the survey were representative of real tweets labeled by Twitter. We found that tweets which received soft moderation from Twitter mainly fall into the following categories:

  • US Elections victory related

  • US Elections ballot count related

  • COVID vaccine/blame-game related

  • Indian politics

  • Unrelated to any of the categories above


In order to select tweets to be used in the survey:

  • We wanted to ensure that the tweets used in the survey had varied properties:

    • Soft-moderated or not (some tweets in our survey did not have a warning label attached by Twitter)

    • Extent of information provided by Twitter on the topic of the tweet

      • Corrective labels: had explicit information about credibility, including a corrective statement attached to them

      • Contextual labels: provided more information on a topic for a user to form their own opinion

    • Belonged to different categories

  • We manually went through the tweets in each of the categories and handpicked tweets from each to ensure that they had varied properties


The image below shows examples of Corrective and Contextual labels added by Twitter.

Note: We sampled non-India specific tweets from the recent ICWSM work by Zannettou, 2021 (https://arxiv.org/abs/2101.07183).



Specifically, we chose the following tweets, covering a wide variety of domains and labels, for our survey.


 Illustrative samples of the tweets shown to CG and EG are below:

Note - We removed the identity of the user who posted the tweet to avoid any confounding bias in user perception.


The following graphs show the demographic data for both surveys:

[Charts: Age (in years) and Gender distributions. Treatment group (with label, without identity): 16 responses. Control group (without label, without identity): 17 responses.]
After conducting the surveys, we interviewed a smaller subset of people to gain a deeper understanding of the reasoning behind the trust scores. We measure trust using the Likert scale and probe the reasoning behind it in 3 different settings:
  1. Form 1: Without warning label and without tweeter’s identity
  2. Form 2: With a warning label and without tweeter’s identity
  3. Form 3: With a warning label and with tweeter’s identity
For each setting we conducted 6 interviews with a disjoint set of people to prevent any experimental bias. The tweets used and the demographic information gathered in this iteration of experiments are the same as in the survey.

Observations

RQ1

To investigate the effect of labels on the Twitter user, we relied on Interrupted Time Series Analysis. Here, we focus on a case study highlighting how warning labels affect users of different backgrounds differently. We focus on:

  • Sambit Patra - BJP politician with 6.1M followers on Twitter

  • Shefali Vaidya - Right wing influencer with 660k followers on Twitter


We split each user’s timeline in two: tweets before and after receiving the warning label. We limit our analysis to the short-term effects of warning labels in the scope of this study. We then calculated the sentiment (using VADER) of tweets posted by each user in the 20 days before and after one of their tweets received the warning label.
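The before/after comparison reduces to averaging per-tweet sentiment inside two 20-day windows around the label date. A minimal sketch (the compound scores would come from VADER; the toy timeline and label date below are illustrative only):

```python
from datetime import date, timedelta
from statistics import mean

def sentiment_window_means(tweets, label_date, window_days=20):
    """Mean sentiment in the windows just before and just after label_date.

    `tweets` is a list of (date, compound) pairs, where `compound` is the
    VADER compound score in [-1, 1] precomputed for each tweet's text.
    """
    lo = label_date - timedelta(days=window_days)
    hi = label_date + timedelta(days=window_days)
    before = [s for d, s in tweets if lo <= d < label_date]
    after = [s for d, s in tweets if label_date <= d <= hi]
    return mean(before), mean(after)

# Toy timeline around a hypothetical label received on 2021-05-20.
toy = [(date(2021, 5, 10), 0.6), (date(2021, 5, 15), 0.4),
       (date(2021, 5, 22), -0.2), (date(2021, 5, 25), -0.4)]
pre, post = sentiment_window_means(toy, date(2021, 5, 20))  # 0.5, -0.3
```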


We notice that both users tend to post less positive content in their tweets after receiving a warning label. There is a sharp drop in sentiment immediately after receiving the label for both users. The counterfactual shows the expected trend of sentiment had the warning label not been received.



Next, we analyze the posting frequency and the engagement with the user after receiving the label.

We notice that popular users like Sambit Patra experience a drop in both engagement (likes) and average posting frequency (over 20 days).


On the other hand, relatively smaller users like Shefali Vaidya experience an immediate rise in impressions/engagement, and they subsequently increase their posting frequency as well with their new-found fame.

To sum up our investigation of RQ1, we make causal observations that receiving a warning label significantly alters a user’s behaviour and interactions on Twitter. We connect this to Labeling Theory (Becker, 1963), which elucidates the ways in which the stigmatized (users who received the label) change their behaviour (posting frequency) once labeled by an outside entity (Twitter). This shows how social-science theories remain relevant in studying modern social media behaviour.


RQ2

In order to study the effects of warning labels on user perception and trust, we conducted randomized controlled trials in the form of surveys.

For each of the 6 tweets (refer to the table in Methodology for the specific tweets) shown to every participant, we asked them to rate their trust in the content of the tweet on a Likert scale. This gave us a distribution of perceived user trust across the 6 tweets in both the CG and EG.

The following image shows the boxplot of the trust scores given by users to each tweet in both CG and EG.

Some inferences based on the above plot are:

  • Generally, there is large variation of user trust in both the groups.

  • The Treatment group consistently has lower scores than the Control group, highlighting the importance of warning labels in informing users and making them question the content they consume on social media. This has implications for preventing echo chambers and fake-news propagation.
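A group difference like this is typically checked with a rank-based test such as the Mann-Whitney U. A minimal pure-Python sketch of the U statistic (the Likert scores below are illustrative, not our survey data):

```python
def mann_whitney_u(x, y):
    """U statistic for sample x vs y: count of pairs where x wins (ties = 0.5).

    U ranges from 0 to len(x) * len(y); values near either extreme mean the
    two distributions are well separated.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Illustrative 5-point Likert trust ratings for one tweet.
control = [4, 5, 4, 3, 5]    # no warning label shown
treatment = [2, 3, 2, 1, 3]  # warning label shown
u = mann_whitney_u(control, treatment)  # 24.0 out of a maximum of 25
```

In practice one would convert U into a p-value (e.g. with scipy.stats.mannwhitneyu) before claiming significance.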


In order to disentangle the effect of the tweeter’s identity from the effect of warning labels, we conducted unstructured interviews with a small set of people, as described in the Methodology section. Some observations we made during the interviews are:
  • Adding a link to a tweet increases user trust.
  • A blue-tick profile increases trust, but the warning label tends to nullify that increase.
  • If the tweeter is already known, people are more trusting or skeptical of the tweet content depending on preconceived notions.
  • Quirky elements and exclamation marks might suggest fake or sensational content.

Conclusion

  • In this project, we aimed to analyze the impact of soft moderation interventions like warning labels on Twitter in the Indian context using a two-pronged approach, investigating both the users consuming the tweet and the users posting it. We observed significant behavioural changes in users after their tweets receive a label; additionally, these changes are influenced by the users’ social status and popularity. We then connect this finding with Becker’s Labeling Theory to bring a social-scientific lens to our empirical findings.

  • We then conducted surveys and interviews to gauge user perception and the effect of warning labels on the trust of users who consume the tweet. We notice that adding a label does significantly change user perception of, and trust in, the contents of the tweet, although this may be biased by the user’s own preferences and leaning.

  • The findings of this project have implications for social media companies and how they regulate content on their platforms. Warning labels can make users more aware of the content they consume online and help prevent the creation of echo chambers and the propagation of fake news.

Scope for future work

We focused on investigating the effects of warning labels on Twitter. This leaves various avenues for future work:

  • Utilizing Latin Square Design to verify if the order in which the images/stimuli are shown to participants in the survey itself biases the results.

  • Scaling up the study to investigate the effect of language of the tweet in determining the user perception.


Team Members

- Anmol Goel, 2021701045
- Anmol Agarwal, 2019101068
- Aditya Kadam, 2020121009
- Arvindh A, 2019111010
- Pratyush Priyadarshi, 2019101118
- Shrey Gupta, 2019101058
- Triansh Sharma, 2019101006
- Gokul Vamsi, 2019111009
