6 : Analysis of Polarisation during Election

 


 


Problem statement description


Over the past six to seven years, social media has become increasingly popular in our day-to-day lives. It gives every age group and peer group a forum where people can communicate regardless of the distance that separates them. Platforms such as Twitter are widely used by a large fraction of society to express and share views on various topics. The connectivity that social media (Twitter, Facebook) provides helps us understand different perspectives and keeps us aware of what is happening around the globe. For these reasons, people use Twitter to express their views on topics that affect them directly or indirectly. Twitter plays a major role in Indian politics because it gives political leaders and media houses a large audience for their views, and parties and political leaders frequently use it to express them.

In our project we analyse the impact of polarisation on Indian politics, especially during elections. In our context, polarisation is when something causes a split that divides people into two groups. We use Twitter data to find polarising topics and perform various analyses to see how they affect the political landscape of India. We also analyse how activity around normal topics differs from activity around polarising topics.

Related work


We started by understanding polarization in social media networks [1] and the effect of polarization on Twitter user behavior [2]. We collected tweets over a period of time for some specific politicians and looked for polarized topics, that is, topics that require the user to side exclusively with one position, and then determined whether a user is polarised with respect to that topic [3]. After finding polarized tweets, we used an LSTM to determine whether a user's tweet is negative or positive with respect to the polarized topic [4].

References:


[1] Pedro H. Calais Guerra, Wagner Meira Jr., Claire Cardie and Robert Kleinberg, “A Measure of Polarization on Social Media Networks Based on Community Boundaries”, https://www.cs.cornell.edu/home/cardie/papers/ICWSM13-Polarization.pdf

[2] Yichen Wang, Richard Han, Tamara Lehman, Qin Lv and Shivakant Mishra, “Analysing Behavioural Changes of Twitter Users After Exposure to Misinformation”, https://arxiv.org/pdf/2111.00700.pdf

[3] Mauro Coletto, Claudio Lucchese, Salvatore Orlando and Raffaele Perego, “Polarized User and Topic Tracking in Twitter”, https://www.researchgate.net/publication/309484467_Polarized_User_and_Topic_Tracking_in_Twitter

[4] Shri Varsheni R, “Sentiment Analysis on tweets with LSTM”, https://www.analyticsvidhya.com/blog/2021/12/sentiment-analysis-on-tweets-with-lstm-for-beginners/

Research Questions

 
Our paper analyses the following research questions.


RQ1: The trends of analysis.


RQ2: What topics did people generally talk about?

RQ3: Spread of negative & positive hashtags.

RQ4: Correlation between users in different layers.

RQ5: Which topics were most talked about in what time period?

Methodology pipeline

  • To analyse the Indian political landscape on Twitter, we first divided Twitter users into three classes. 
    • The first class consisted of 27 users with more than 50 lakh followers. 
    • The second class consisted of retweeters of first-class users' tweets who had more than 10 lakh followers.
    • The third class consisted of retweeters of first- and second-class users' tweets who had between 75,000 and 1 lakh followers.
  • We devised a method to tell whether a tweet is for the government or for the opposition. 
  • We took user tweets on 10 topics, chosen so that they were balanced between pro-government and pro-opposition. 
  • We then analysed topics such as vaccines, farm laws and petrol prices using our methodology and determined the extent of polarization of such topics.


Frequently Used Terms throughout the Document (Assumptions):

Class A

List of classified topics leaning towards Pro Ruling Party

Class B

List of classified topics leaning towards Anti Ruling Party

Timeline 1

Time period from January 2021 to June 2021.

Timeline 2

Time period from October 2021 to March 2022.

Layer 1

Set of relevant political Twitter users with more than 5M followers.

Layer 2

Retweeters of Layer 1 users’ tweets who have more than 1M followers.

Layer 3

Retweeters of Layer 2 users’ tweets who have between 75K and 100K followers.
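The layer definitions above can be sketched as a small Python function. This is only an illustration under stated assumptions: the function name `assign_layer` and the `retweets_layer` argument (the layer of the account whose tweets a user retweeted) are hypothetical, and the Layer 1 accounts were hand-picked as relevant political users, which a follower threshold alone does not capture.

```python
from typing import Optional

# Follower thresholds taken from the layer definitions in this post.
LAYER1_MIN = 5_000_000            # Layer 1: more than 5M followers
LAYER2_MIN = 1_000_000            # Layer 2: more than 1M followers
LAYER3_RANGE = (75_000, 100_000)  # Layer 3: between 75K and 100K followers


def assign_layer(followers: int, retweets_layer: Optional[int] = None) -> Optional[int]:
    """Return 1, 2 or 3 for a user, or None if the user fits no layer.

    `retweets_layer` is a hypothetical field: the layer of the account whose
    tweets this user retweeted (None for seed accounts).
    """
    if retweets_layer is None:
        # Seed accounts only need to pass the Layer 1 follower threshold here.
        return 1 if followers > LAYER1_MIN else None
    if retweets_layer == 1 and followers > LAYER2_MIN:
        return 2
    low, high = LAYER3_RANGE
    if retweets_layer == 2 and low <= followers <= high:
        return 3
    return None


print(assign_layer(6_000_000))      # seed account
print(assign_layer(2_000_000, 1))   # retweeter of a Layer 1 account
print(assign_layer(80_000, 2))      # retweeter of a Layer 2 account
```

Users that fall between the thresholds (for example, 500K followers) are left unassigned in this sketch.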

 

Experimental design and results


  • Polarising topic v/s Non-polarising topic

       

                    Fig 1: Polarising topic timeline(Vaccine)

   

                    Result: 4705 positive tweets out of 9273


                     Fig 2: Non-polarising timeline(Petrol)


                      Result : 179 positive tweets out of 615


When we compare the results for the two topics, we see that the polarising topic has a roughly equal number of positive and negative tweets (4705 of 9273 positive), whereas the non-polarising topic has mostly negative tweets (179 of 615 positive). This indicates that the majority of people were unhappy with rising petrol prices, which is why they tweeted negatively. On vaccines, however, people were polarised, as different seeders attempted to influence users in the other layers with their own ideology.

We can also observe that the polarising topic reaches a maximum tweet frequency of 400, whereas the other topic peaks at 50, indicating that polarised topics circulate more widely than non-polarised topics.
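To make the comparison concrete, the positive share of each topic can be computed directly from the counts reported under Fig 1 and Fig 2. The `balance_score` below is an illustrative measure we assume for this sketch (1.0 means an even positive/negative split, 0.0 means one-sided); it is not the metric used in the project.

```python
def positive_share(positive: int, total: int) -> float:
    """Fraction of tweets on a topic that were classified as positive."""
    return positive / total


def balance_score(positive: int, total: int) -> float:
    """Illustrative measure: 1.0 for an even split, 0.0 for a one-sided topic."""
    p = positive_share(positive, total)
    return 1.0 - abs(2.0 * p - 1.0)


# Counts reported under Fig 1 (vaccine) and Fig 2 (petrol).
topics = {"vaccine": (4705, 9273), "petrol": (179, 615)}

for name, (pos, total) in topics.items():
    share = positive_share(pos, total)
    balance = balance_score(pos, total)
    print(f"{name}: positive share {share:.3f}, balance {balance:.3f}")
```

With these counts the vaccine topic has a positive share of about 0.507 and the petrol topic about 0.291, which matches the reading that the first is split evenly and the second is mostly negative.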

  • Response to Positive & Negative hashtag


            Fig 3: Retweets trend of a negative sentiment

  

     

Fig 4 : Likes trend of a negative sentiment


   

        Fig 5 : Likes distribution for Negative Sentiment (Farm Protest)



    Fig 6 : Retweets distribution for Negative Sentiment(Farm Protest)


Fig 4 and Fig 5 show likes, and Fig 3 and Fig 6 show retweets, for negative-sentiment topics such as vaccine shortage and FarmProtest. The graphs show that the negative sentiment trends for the whole year, with likes peaking at about 50,000 and retweets at about 14,000.

 

   

           Fig 7 : Retweets for positive sentiment(One year vaccine)


            Fig 8 : like distribution for positive sentiment(One year Vaccine)

   

        Fig 9 : Retweets for positive sentiment(Farm Law Repealed)

   

       

            Fig 10: Likes for positive sentiment(Farm Law Repealed)


Fig 8 and Fig 10 show the likes distribution, and Fig 7 and Fig 9 the retweets distribution, for the positive hashtags #OneYearVaccine and #FarmLawsRepealed. The likes and retweets of the negative hashtags are spread over a period of about 4 months, whereas a positive hashtag is generally tagged and tweeted about for only about one month and is inactive on the platform for the rest of the time. This suggests that, no matter how important or unimportant a topic is, people keep talking about it if the sentiment is negative. The positive topics receive many more likes and retweets than the negative topics, but because the negative topics are spread across many months, the short-lived response to positive topics counts for comparatively little.
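The "lifespan" of a hashtag discussed above can be approximated by counting the distinct months in which it appears. The sketch below assumes the tweet dates for a hashtag are already available as `datetime.date` values; the sample dates are made up for illustration and are not from our dataset.

```python
from datetime import date


def active_months(tweet_dates):
    """Return the sorted list of (year, month) pairs with at least one tweet."""
    return sorted({(d.year, d.month) for d in tweet_dates})


# Made-up dates for illustration only.
negative_tag_dates = [date(2021, 4, 3), date(2021, 5, 10),
                      date(2021, 6, 1), date(2021, 7, 22)]
positive_tag_dates = [date(2022, 1, 16), date(2022, 1, 17)]

print(len(active_months(negative_tag_dates)))  # months in which the tag was active
print(len(active_months(positive_tag_dates)))
```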


  Fig 11: Tweet Frequency for various Media Houses


Fig 11 shows the distribution of tweeting frequency for 6 media houses. Over the first 6 months Aajtak posted the most tweets and Times of India the fewest. Over the last 6 months TimesNow posted the most tweets and Times of India again the fewest. The spikes in the graph correspond to specific events, which helped us select the relevant topics. For example, the increase in the number of tweets from April to May reflects multiple events, such as the fire that broke out at Bharuch, Gujarat, and the spike in October reflects the Lakhimpur case.
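A frequency plot like Fig 11 starts from per-month tweet counts for each media house. The following is a minimal sketch of that counting step, assuming the tweets are available as `(media_house, date)` pairs; the function names and the sample rows are hypothetical.

```python
from collections import Counter
from datetime import date


def monthly_counts(tweets):
    """Count tweets per (media house, 'YYYY-MM') from (house, date) pairs."""
    return Counter((house, d.strftime("%Y-%m")) for house, d in tweets)


def busiest_month(counts, house):
    """Return the 'YYYY-MM' key with the most tweets for one media house."""
    per_month = {month: n for (h, month), n in counts.items() if h == house}
    return max(per_month, key=per_month.get)


# Made-up rows for illustration only.
sample = [
    ("Aajtak", date(2021, 4, 28)),
    ("Aajtak", date(2021, 5, 1)),
    ("Aajtak", date(2021, 5, 2)),
    ("TimesNow", date(2021, 10, 4)),
]

counts = monthly_counts(sample)
print(busiest_month(counts, "Aajtak"))
```

Months whose count stands well above a house's usual level are the candidates for event-driven spikes.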



  Fig 12: Venn diagram of Tweet topics


Fig 12 shows a Venn diagram of the topics in tweets from the various media houses over a period of 12 months. There are 442 topics on which all media houses tweeted, and 10744 topics on which only one media house tweeted. We considered these topics further in our analysis. We removed topics that were not relevant, such as Russia-Ukraine, cricket-related topics, Tesla and the Winter Olympics, taking care to remove only topics that were not at all relevant to our analysis.
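The two regions of the Venn diagram mentioned above (topics shared by all media houses and topics covered by only one) can be computed with set operations. This is a sketch under assumptions: `topics_by_house` maps a media house to its set of topics, and the house names, topics and `IRRELEVANT` set below are made up for illustration.

```python
from collections import Counter


def topic_overlap(topics_by_house):
    """Return (topics covered by all houses, topics covered by exactly one house)."""
    houses = list(topics_by_house.values())
    shared = set.intersection(*houses) if houses else set()
    # Count how many houses covered each topic.
    coverage = Counter(topic for topics in houses for topic in topics)
    exclusive = {topic for topic, n in coverage.items() if n == 1}
    return shared, exclusive


# Made-up topic sets for illustration only.
topics_by_house = {
    "HouseA": {"vaccine", "farmlaws", "cricket"},
    "HouseB": {"vaccine", "farmlaws"},
    "HouseC": {"vaccine", "petrol"},
}
IRRELEVANT = {"cricket"}  # hypothetical filter list

shared, exclusive = topic_overlap(topics_by_house)
print(sorted(shared))
print(sorted(exclusive - IRRELEVANT))
```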





    Fig 13 - layer2_ClassA_first : 13                           Fig. 14 - layer_2_class_A_sec: 7


In Class A we consider topics that are pro-ruling party, such as modi, yogi and amitshah, and in Class B we consider topics such as ‘rahulgandhi’ and ‘priyankagandhi’. In short, we divided our topics into pro-ruling and pro-opposition topics and named them Class A and Class B.
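As a rough illustration of the Class A / Class B split, a tweet can be tagged by matching its tokens against the example keywords listed above. The keyword sets below contain only those examples, and the token-matching rule is an assumption for this sketch rather than the exact procedure we followed.

```python
import re

# Example keywords from the text; the full topic lists are longer.
CLASS_A_KEYWORDS = {"modi", "yogi", "amitshah"}
CLASS_B_KEYWORDS = {"rahulgandhi", "priyankagandhi"}


def topic_class(tweet_text):
    """Return 'A', 'B', 'both' or None based on keyword matches in the tweet."""
    # Lower-case and split into word tokens so '#Modi' matches 'modi'
    # but 'commodity' does not.
    tokens = set(re.findall(r"[a-z0-9_]+", tweet_text.lower()))
    hits_a = bool(tokens & CLASS_A_KEYWORDS)
    hits_b = bool(tokens & CLASS_B_KEYWORDS)
    if hits_a and hits_b:
        return "both"
    if hits_a:
        return "A"
    if hits_b:
        return "B"
    return None


print(topic_class("PM #Modi addresses a rally"))
print(topic_class("#RahulGandhi speaks in parliament"))
```

The class only says which side a tweet is about; whether the tweet is positive or negative towards that side still comes from the sentiment step.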

Fig 13 shows the nature of all tweets towards Class A topics by users with more than 1 million followers. A red tweet signifies that the tweet favours those topics. Fig 13 covers the initial 6 months, whereas Fig 14 shows the same for the last 6 months, close to the election dates. The comparison shows how the nature of user tweets changes when elections are near.


 

       Fig 15 - layer_2_classB_first: 8                Fig 16 - layer_2_class_B_sec: 8

 

Fig 15 shows the nature of tweets on Class B topics: red marks tweets in favour of the ruling party by users retweeting on pro-opposition topics during the first 6 months. Fig 16 shows the same for the last 6 months, close to the election. The comparison shows how the nature of user tweets changes when elections are near.


       

                   

           Fig 17 - layer_3_class_A_first: 10                 Fig 18 - layer_3_class_A_sec: 16


Fig 17 shows the nature of tweets by Layer 3 users on Class A topics. Layer 3 consists of users with between 75K and 1L followers, most of whom are retweeters of Layer 2 users' tweets. The figures show how the nature of Layer 3 tweets changes during election time compared with normal time: in Fig 17, red shows tweets of a positive nature by pro-ruling-party users in the first 6 months, and Fig 18 shows the same for the last 6 months, just before the election.

     

    Fig 19 - layer_3_class_B_first: 21            Fig 20 - layer_3_class_B_sec: 30


Fig 19 shows the nature of tweets by users on Class B topics, which mainly consist of topics favouring the opposition. Red shows tweets of a positive nature in the first 6 months, whereas Fig 20 shows the same for the last 6 months, close to the election. The comparison shows how the nature of tweets by a particular layer of users changes close to the election compared with normal time.
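The counts in the captions of Fig 13 to Fig 20 can be put side by side to see the direction of the shift for each layer and class. The sketch below only does that arithmetic; the dictionary layout is our own choice for illustration.

```python
# (first-half count, second-half count) taken from the captions of Fig 13 to Fig 20.
counts = {
    ("Layer 2", "Class A"): (13, 7),
    ("Layer 2", "Class B"): (8, 8),
    ("Layer 3", "Class A"): (10, 16),
    ("Layer 3", "Class B"): (21, 30),
}

# Change from the first half to the second half for each (layer, class) pair.
changes = {key: second - first for key, (first, second) in counts.items()}

for (layer, cls), (first, second) in counts.items():
    print(f"{layer} {cls}: {first} -> {second} ({changes[(layer, cls)]:+d})")
```

Read this way, the Layer 2 count for Class A falls by 6, the Layer 2 count for Class B stays the same, and both Layer 3 counts rise (by 6 for Class A and by 9 for Class B).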


            Fig 21 - Representation of Polarized Account on Level 2 and 3


This figure shows the result of the analysis we performed starting from level 1 (marked in yellow boxes). Polarized accounts are displayed at level 2 and level 3. Level 1 contains the accounts with more than 5M followers, which we consider the top seeders and mainly use for our analysis. At level 2 we identified one polarized account, CNBC-TV18. In Timeline 1 the tweets from this account were positive towards Class A, but in Timeline 2 they were not.
For the level 3 accounts (marked in blue boxes), the tweets were neutral with respect to Class B topics in Timeline 1. In Timeline 2 the tweets from these accounts were highly positive towards Class B topics.

Media houses show 70% alignment towards Class A in the first half of the timeline, but in the second half that alignment fell to 30%. Layer 3 is positive or neutral (around 63% and 65%) towards Class A in both timelines, whereas towards Class B it is neutral/positive (45%) in the first half and highly positive (93%) in the second half.

Ablations - what we tried initially and what didn't work


Initially there was a lot of confusion about the project topic, so we read up on it and also took help from the TA. We then started thinking about which approach would be best for detecting a change in polarisation. These are the methods we tried that did not work:
  • LDA: LDA is a topic-modelling tool. It takes the words from the given documents and groups them into topics, where each word has some probability of belonging to each topic. The number of topics is provided by us, and LDA tries to produce that many. We wanted to extract the topics most important to our experiment and filter out the irrelevant ones. This method failed because we had to decide in advance how many topics were required, we were not able to map the words back to the documents they came from, and the results were not good enough.
  • N-gram: Since LDA was not giving the results we expected, we tried n-grams. An n-gram groups N adjacent words; for example, in "chilled water" the two words are related and may occur together many times in our text data, so they are counted as one unit. This also failed because it did not give the accuracy we expected, and as we increased N the space required grew very quickly. With around 1.3 M tweets, it was very hard to compute without a high-powered CPU.
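For reference, n-gram counting of the kind described above can be sketched in a few lines of Python. The whitespace tokenisation and the sample sentences are simplifying assumptions for illustration; real tweets would need cleaning of URLs, mentions and hashtags first.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n adjacent tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(texts, n, k=3):
    """Count n-grams over all texts and return the k most common."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text.lower().split(), n))
    return counts.most_common(k)


# Made-up sentences for illustration only.
sample_texts = ["chilled water please", "more chilled water"]
print(top_ngrams(sample_texts, n=2, k=1))
```

The counter holds one entry per distinct n-gram, and the number of distinct n-grams usually grows as N increases, which is the memory problem we ran into on roughly 1.3 M tweets.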

As both ways of finding popular topics failed, we decided to do it manually. We manually picked topics such as the farm bill, corruption and vaccines, and checked whether these topics were discussed throughout the year, because such topics are more likely to be polarising. We wanted to record people's behaviour towards these topics, which helps in measuring how much polarisation occurs. We divided this behaviour into two halves, one close to the election and one on normal days, to see whether users shift on a topic or not.


Deductions/Discussion 


Our first discussion was about topic selection for this project. Looking at the effect of polarisation on the 2016 US election, we decided to study the same thing in the Indian context. The second challenge was how to do this. After a lot of discussion we settled on an approach, and the most important next step was collecting the data needed for the desired results. For that we used Twitter as the platform and decided to scrape the tweets of users at the different levels. We collected tweets over a period of time for some specific users. One challenge we faced was selecting the topics for polarisation, which we finally settled through the hashtags of the tweets. We then looked for polarized topics that require the user to side exclusively with one position and determined whether a user is polarised with respect to that topic. After finding polarized tweets, we used an LSTM to determine whether a user's tweet is negative or positive with respect to the polarized topic. An LSTM is a special kind of recurrent neural network that is capable of learning long-term dependencies in data. This is possible because the repeating module of the model combines four layers that interact with each other.
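To illustrate the "four interacting layers" of an LSTM module, here is a minimal scalar version of a single LSTM step in plain Python: forget gate, input gate, candidate value and output gate. It is a teaching sketch with hypothetical parameter names, not the model we trained; a real sentiment classifier would use vector-valued gates from a deep-learning library.

```python
import math


def sigmoid(z):
    """Logistic function, squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell.

    x is the current input, h_prev and c_prev are the previous hidden and
    cell states, and p is a dict of weights (hypothetical names).
    """
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    c = f * c_prev + i * g   # keep part of the old memory, add new information
    h = o * math.tanh(c)     # expose part of the memory as the new hidden state
    return h, c


# All-zero parameters make every gate 0.5, which is easy to check by hand.
zero_params = {name: 0.0 for name in
               ["wf", "uf", "bf", "wi", "ui", "bi",
                "wg", "ug", "bg", "wo", "uo", "bo"]}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=1.0, p=zero_params)
print(h, c)
```

The cell state `c` is what carries information across many time steps, which is why the network can learn long-term dependencies.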


Conclusion

  • Fig 1 - 
    • Purpose - Analysis of polarising topics, if any, between Jan 2021 and March 2022
    • Analysis - 
      • Prevalent throughout the year. Has high peaks but shows some decline during the later part of the year
  • Fig 2 -
    • Purpose - Analysis of non-polarising topics between Jan 2021 and March 2022
    • Analysis -
      • Prevalent throughout the year, has high peaks and remains in discussion all year
  • Fig 1 v/s Fig 2 -
    • The number of tweets involving a polarising topic like the Covid vaccine is significantly higher than the number involving a non-polarising topic like petrol price hikes.
  • Fig 3 & Fig 4 -
    • Purpose - analyse the Likes and Retweet Trends of a hashtag with a negative sentiment
    • Analysis -
      • There is activity on these tweets over a period of 4 months
      • The number of likes and retweets for tweets containing only one hashtag, #VaccineShortage, is high, reaching up to 14,000 retweets and 50,000 likes in one month.
  • Fig 5 & Fig 6 -
    • Purpose - Analyse Lifespan of Tweets containing hashtags with some negative sentiment
    • Analysis -
      • Prevalent all over the year from Jan 2021 to Dec 2021
      • Multiple peaks during the year followed by sudden troughs, showing that such tweets tend to be effective for short intervals with maximum impact 
  • Fig 7 & Fig 8 -
    • Purpose - Analyse retweets and likes for tweets with positive sentiments
    • Analysis -
      • Prevalent only for one or two months
      • Only one peak during Jan 2022
  • Fig 9 & Fig 10 -
    • Purpose - Analyse lifespan of tweets with hashtag #FarmLawRepealed
    • Analysis -
      • Prevalent only for two months in a year
      • The number of likes and retweets is higher than for the tweets recorded in Fig 5 & Fig 6.
  • Fig 11 -
    • Purpose - Tracking the activity of media houses
    • Analysis -
      • There were periods when media houses tweeted in huge numbers, such as March to August 2021
      • Analysis showed that most of these tweets were about vaccination drives and farm law protests
      • The activity slowed drastically from August to November, probably due to news of the farm laws being repealed
      • The activity rose again when the UP election dates were announced
      • This analysis helped us in selecting the topics for analysing polarisation.
  • Fig 13 & Fig 14 -
    • Purpose - Analyse Layer2 user behaviour before and during UP elections towards ruling party
    • Analysis -
      • Analysing the first half (Jan 2021 to October 2021) of tweets from Layer2 users, we found that 13 users had a positive sentiment towards Class A.
      • This declined to 7 users in the second half (October 2021 to March 2022).
  • Fig 15 & Fig 16 -
    • Purpose - Analyse Layer2 user behaviour before and during UP elections towards opposition party
    • Analysis -
      • Analysing the first half (Jan 2021 to October 2021) of tweets from Layer2 users, we found that 8 users had a positive sentiment towards opposition parties.
      • The number remained the same in the second half (October 2021 to March 2022), but individual users still shifted. For example, User no 22, who was somewhat positive towards opposition parties in the first half, appears to be more against them in the second half.
  • Fig 17 & Fig 18 -
    • Purpose - Analyse Layer3 user behaviour before and during UP elections towards ruling party
    • Analysis -
      • Analysing the first half (Jan 2021 to October 2021) of tweets from Layer3 users, we found that 10 users had a positive sentiment towards the ruling party.
      • The number increased to 16 in the second half (October 2021 to March 2022), which means some users from Layer3 did shift their viewpoint.
  • Fig 19 & Fig 20 -
    • Purpose - Analyse Layer3 user behaviour before and during UP elections towards opposition party
    • Analysis -
      • Analysing the first half (Jan 2021 to October 2021) of tweets from Layer3 users, we found that 21 users had a positive sentiment towards Class B.
      • The number increased to 30 in the second half (October 2021 to March 2022), which means some users from Layer3 did shift positively towards opposition parties during election time.

Future work


Previous research on opinion polarisation in social networks has focused on themes that are already known to cause polarisation; as a result, the structural features of polarised social networks are still unknown.

We compare polarised and non-polarized networks in this paper, and propose a new metric for determining the degree of polarisation.

Rather than choosing the issues ourselves, we try to uncover polarising themes from scraped tweets, which gives us a new perspective to consider.

This permits us to experiment with fresh ideas rather than relying on old ones.

We intend to do analyses on areas other than politics in the future.

When numerous topics overlap in the timeline, we plan to undertake polarisation degree analysis as well.

Team Members:

  • Vishal Pandey
  • Mayank Mukundam
  • Karan Negi
  • Vineet Agrawal
  • Bhupendra Sharma
  • Shravan Sharma
  • Rahul Mishra
  • Yash Vardhan Sharma
  • Mainak Dhara
  • Aditya Sharma 

Visual Presentation:

Video : https://youtu.be/LY6En6SmvTQ

