6 : Analysis of Polarisation during Election
Problem statement description
Over the past six to seven years, social media has become increasingly popular in our day-to-day lives. It gives every age group and peer group a forum in which people can communicate regardless of the distance that separates them. Platforms such as Twitter are widely used by a large fraction of our society to express and share views on various topics. The connectivity provided by social media (Twitter, Facebook) helps us understand different perspectives and keeps us aware of what is happening around the globe. For these reasons, people use Twitter to express their views on topics that affect them directly or indirectly. Twitter plays a major role in Indian politics because it gives political leaders, parties and media houses a large audience for their views, and that reach gives the platform a large impact on Indian politics.
In our project we analyse the impact of polarisation on Indian politics, especially during elections. To be clear about what polarisation means in our context: polarisation occurs when an issue causes a split and divides people into two groups. We use Twitter data to find polarising topics and perform various analyses to see how they affect the political landscape of India. We also analyse how activity around normal topics differs from activity around polarising topics.
Related work that they used
We started by studying how polarisation in social media networks can be measured [1] and how it affects the behaviour of Twitter users [2]. We then collected tweets over a period of time for specific politicians and looked for polarised topics, that is, topics that require a user to side exclusively with one position, and checked whether a user is polarised with respect to each topic [3]. After finding polarised tweets, we used an LSTM to determine whether a user's tweet is negative or positive with respect to the polarised topic [4].
References:
[1] Pedro H. Calais Guerra, Wagner Meira Jr., Claire Cardie and Robert Kleinberg, “A Measure of Polarization on Social Media Networks Based on Community Boundaries”, https://www.cs.cornell.edu/home/cardie/papers/ICWSM13-Polarization.pdf
[2] Yichen Wang, Richard Han, Tamara Lehman, Qin Lv and Shivakant Mishra, “Analysing Behavioural Changes of Twitter Users After Exposure to Misinformation”, https://arxiv.org/pdf/2111.00700.pdf
[3] Mauro Coletto, Claudio Lucchese, Salvatore Orlando and Raffaele Perego, “Polarized User and Topic Tracking in Twitter”, https://www.researchgate.net/publication/309484467_Polarized_User_and_Topic_Tracking_in_Twitter
[4] Shri Varsheni R, “Sentiment Analysis on tweets with LSTM”, https://www.analyticsvidhya.com/blog/2021/12/sentiment-analysis-on-tweets-with-lstm-for-beginners/
Research Questions
The following are the research questions that our analysis addresses.
RQ1: The trends of analysis.
RQ2: Which topics did people generally talk about?
RQ3: Spread of negative & positive hashtags.
RQ4: Correlation between users in different layers.
RQ5: Which topics were most talked about in what time period?
Methodology pipeline
- To analyse the Indian political landscape on Twitter, we first divided Twitter users into three classes.
- The first class consisted of 27 users with more than 50 lakh followers.
- The second class consisted of retweeters of first-class users' tweets who had more than 10 lakh followers.
- The third class consisted of retweeters of first- and second-class users' tweets who had between 75,000 and 1 lakh followers.
- We devised a method to tell whether a tweet favours the government or the opposition.
- We collected user tweets on 10 topics, chosen so that they were balanced between pro-government and pro-opposition.
- We then analysed topics such as vaccines, farm laws and petrol prices using our methodology and determined the extent of polarisation of each topic.
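The three user classes described above can be sketched as a small rule, as a rough illustration only. The function name `assign_layer` and the `retweeted_layers` argument are hypothetical and not part of our original scripts; only the follower thresholds come from the list above.

```python
from typing import Optional, Set

LAKH = 100_000  # 1 lakh = 100,000


def assign_layer(followers: int, retweeted_layers: Set[int]) -> Optional[int]:
    """Return the layer (1, 2 or 3) for a user, or None if no rule matches.

    followers: the user's follower count.
    retweeted_layers: layers of the accounts whose tweets this user retweeted
    (hypothetical input, assumed to be known from the scraped data).
    """
    # Layer 1: more than 50 lakh followers.
    if followers > 50 * LAKH:
        return 1
    # Layer 2: more than 10 lakh followers and retweets layer-1 users.
    if followers > 10 * LAKH and 1 in retweeted_layers:
        return 2
    # Layer 3: 75,000 to 1 lakh followers and retweets layer-1 or layer-2 users.
    if 75_000 <= followers <= 1 * LAKH and retweeted_layers & {1, 2}:
        return 3
    return None


print(assign_layer(6_000_000, set()))  # layer 1
print(assign_layer(2_000_000, {1}))    # layer 2
print(assign_layer(80_000, {2}))       # layer 3
```

Users who fall between the thresholds, for example those with 5 lakh followers, are not assigned to any layer in this sketch.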
Frequently Used Terms throughout Document(Assumptions):
Experimental design and results
Polarising topic v/s Non-polarising topic
Fig 1: Polarising topic timeline(Vaccine)
Result: 4705 positive tweets out of 9273
Fig 2: Non-polarising timeline(Petrol)
Result : 179 positive tweets out of 615
When we compare the results for the two topics above, we can see that the polarising topic has roughly equal numbers of positive and negative tweets, whereas the non-polarising topic has more negative tweets. This indicates that the majority of people were unhappy with rising petrol prices and tweeted negatively about them. On the topic of vaccines, however, people were polarised, as different seeders tried to sway users in the other layers towards their own ideology.
We can also observe that the polarising topic reaches a maximum tweet frequency of about 400, whereas the other topic peaks at about 50, indicating that polarised topics are circulated more widely than non-polarised topics.
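To make the comparison concrete, the share of positive tweets can be computed directly from the two results reported above. The `balance` score below is only an illustrative assumption, not the measure used in this project: it is 1 when positive and negative tweets are evenly split and 0 when all tweets share one sentiment.

```python
def positive_share(positive: int, total: int) -> float:
    """Fraction of tweets labelled positive."""
    if total <= 0:
        raise ValueError("total must be positive")
    return positive / total


def balance(positive: int, total: int) -> float:
    """Illustrative score (assumption): 1.0 for an even split, 0.0 for one-sided."""
    share = positive_share(positive, total)
    return 1 - abs(2 * share - 1)


# Counts taken from the results for Fig 1 (vaccine) and Fig 2 (petrol).
vaccine = (4705, 9273)
petrol = (179, 615)

print(round(positive_share(*vaccine), 3))  # 0.507
print(round(positive_share(*petrol), 3))   # 0.291
print(balance(*vaccine) > balance(*petrol))  # True
```

About half of the vaccine tweets are positive, against under a third of the petrol tweets, which matches the reading of the two timelines.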
Response to Positive & Negative hashtag
Fig 3: Retweets trend of a negative sentiment
Fig 4 : Likes trend of a negative sentiment
Fig 5 : Likes distribution for Negative Sentiment (Farm Protest)
Fig 6 : Retweets distribution for Negative Sentiment(Farm Protest)
Figs 4 and 5 show likes and Figs 3 and 6 show retweets for negative-sentiment topics such as the vaccine shortage and the farm protest. The graphs show that negative sentiment trends throughout the year, with likes peaking at about 50,000 and retweets at about 14,000.
Fig 7 : Retweets for positive sentiment(One year vaccine)
Fig 8 : like distribution for positive sentiment(One year Vaccine)
Fig 9 : Retweets for positive sentiment(Farm Law Repealed)
Fig 10: Likes for positive sentiment(Farm Law Repealed)
Figs 8 and 10 show the likes distribution and Figs 7 and 9 show the retweets distribution for the positive hashtags #OneYearVaccine and #FarmLawsRepealed. Likes and retweets for the negative hashtags are spread over a period of about 4 months, whereas the positive hashtags are generally tagged and tweeted about for only about one month and are inactive on the platform for the rest of the year. This suggests that, however important or unimportant a topic is, people keep talking about it when the sentiment is negative. Although the positive topics draw more likes and retweets than the negative topics, the negative topics stay active over many months, so the short-lived response to positive topics adds little by comparison.
Fig 11: Tweet Frequency for various Media Houses
Fig 11 shows the distribution of tweeting frequency for 6 media houses. Over the first 6 months Aajtak tweeted the most and Times of India the least. Over the last 6 months TimesNow tweeted the most and Times of India again the least. Spikes in the graph correspond to events, which helped us select relevant topics. For example, the increase in tweets from April to May reflects multiple events, such as the fire at Bharuch, Gujarat, and the spike in October reflects the Lakhimpur case.
Fig 12: Venn diagram of Tweet topics
Fig 12 shows a Venn diagram of the topics tweeted by the various media houses over a period of 12 months. There are 442 topics on which all media houses tweeted and 10,744 topics on which only one media house tweeted. We considered these topics further for our analysis. We removed topics that were not relevant, such as Russia-Ukraine, cricket, Tesla and the Winter Olympics, taking care to remove only topics that had no relevance at all to our analysis.
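The regions of such a Venn diagram can be computed with plain set operations. The following is a minimal sketch on made-up data: the house names, topic names and the `irrelevant` set are placeholders, not our real topic lists.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def shared_and_exclusive(topics_by_house: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (topics covered by every house, topics covered by exactly one house)."""
    # Each house contributes a topic at most once because its topics are a set.
    counts = Counter(topic for topics in topics_by_house.values() for topic in topics)
    n_houses = len(topics_by_house)
    shared = {topic for topic, c in counts.items() if c == n_houses}
    exclusive = {topic for topic, c in counts.items() if c == 1}
    return shared, exclusive


# Hypothetical toy data standing in for the scraped topics.
topics_by_house = {
    "house_a": {"vaccine", "farmlaws", "cricket"},
    "house_b": {"vaccine", "farmlaws"},
    "house_c": {"vaccine", "petrol"},
}
irrelevant = {"cricket"}  # topics dropped by hand, as described above

shared, exclusive = shared_and_exclusive(topics_by_house)
print(shared)                  # {'vaccine'}
print(exclusive - irrelevant)  # {'petrol'}
```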
Fig 13 - layer2_ClassA_first : 13 Fig. 14 - layer_2_class_A_sec: 7
In Class A we consider topics that are pro-ruling party, such as modi, yogi and amitshah, and in Class B we consider topics such as ‘rahulgandhi’ and ‘priyankagandhi’. In short, we divided our topics into pro-ruling and pro-opposition topics and named them Class A and Class B.
Fig 13 shows the nature of all tweets towards Class A topics by users with more than 1 million followers. A red tweet signifies that the tweet favours those topics. Fig 13 covers the initial 6 months, whereas Fig 14 shows the same for the last 6 months, close to the election dates. The comparison shows how the nature of user tweets changes when elections are near.
Fig 15 - layer_2_classB_first: 8 Fig 16 - layer_2_class_B_sec: 8
Fig 15 shows the nature of tweets on Class B topics. Red marks tweets made in favour of the ruling party by users retweeting on pro-opposition topics during the first 6 months, and Fig 16 shows the same for the last 6 months, close to the election. The comparison shows how the nature of user tweets changes when elections are near.
Fig 17 - layer_3_class_A_first: 10 Fig 18 - layer_3_class_A_sec: 16
Fig 17 shows the nature of tweets by layer-3 users on Class A topics. Layer 3 consists of users with between 75,000 and 1 lakh followers, most of whom are retweeters of layer-2 users' tweets. The figures show how the nature of layer-3 users' tweets changes at election time compared with normal time: in Fig 17, red marks the positive tweets made by pro-ruling-party users in the first 6 months, and Fig 18 shows the same for the last 6 months, just before the election.
Fig 19 - layer_3_class_B_first: 21 Fig 20 - layer_3_class_B_sec: 30
Fig 19 shows the nature of tweets by users on Class B topics, which mainly consist of topics favouring the opposition. Red marks tweets of a positive nature in the first 6 months, whereas Fig 20 shows the same for the last 6 months, close to the election. The comparison shows how the nature of tweets by a particular layer of users changes close to the election compared with normal time.
Fig 21 - Representation of Polarized Account on Level 2 and 3
This figure represents the result of the analysis we performed, starting from level 1 (marked in yellow boxes). Polarised accounts are displayed at level 2 and level 3. Level 1 contains accounts with more than 5M followers; we treat these as top seeders and use them as the main basis of our analysis. At level 2 we identified one polarised account, CNBC-TV18: in Timeline 1 its tweets on Class A were positive, but in Timeline 2 they were not.
For level 3 accounts (marked in blue boxes), tweets were neutral with respect to Class B topics in Timeline 1 and highly positive towards Class B topics in Timeline 2.
Media houses show 70% alignment towards Class A in the first half of the timeline, but in the second half that alignment reduced to 30%. Layer 3 is positive or neutral towards Class A in both timelines (around 63% and 65%), whereas towards Class B it is neutral or positive in the first half (45%) and highly positive in the second half (93%).
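One simple way to spot accounts whose stance changes between the two timelines is to compare, per user, the share of positive tweets in each half. The sketch below is an assumption about how this could be done rather than our exact procedure: the 0.5 threshold, the 0/1 labels and the user names are all placeholders.

```python
from typing import Dict, List, Set


def positive_users(labels_by_user: Dict[str, List[int]], threshold: float = 0.5) -> Set[str]:
    """Users whose share of positive tweets (label 1) exceeds the threshold."""
    result = set()
    for user, labels in labels_by_user.items():
        if labels and sum(labels) / len(labels) > threshold:
            result.add(user)
    return result


# Hypothetical sentiment labels (1 = positive, 0 = not positive) towards one class.
first_half = {"user_a": [1, 1, 0], "user_b": [0, 0, 1], "user_c": [1, 1]}
second_half = {"user_a": [0, 0, 1], "user_b": [1, 1, 1], "user_c": [1, 0, 1]}

before = positive_users(first_half)
after = positive_users(second_half)

gained = after - before  # users who became positive in the second half
lost = before - after    # users who stopped being positive

print(len(before), len(after))  # 2 2
print(gained, lost)             # {'user_b'} {'user_a'}
```

In this toy data the count of positive users stays the same across the two halves even though individual users switch sides, which is why the per-user comparison is worth doing in addition to the totals.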
Ablations - what we tried initially and what didn't work
Initially there was a lot of confusion about the project topic, so we read about it and also took help from the TA. We then started thinking about which approach would be best for measuring change in polarisation. These are the methods that we tried and that did not work:
- LDA: LDA is a topic modelling tool. It takes the words in the given documents and groups them into topics, with each word having some probability of belonging to each topic. We supply the number of topics and LDA tries to produce that many. We wanted to extract the topics most important to our experiment and filter out irrelevant ones. This method failed for us because we had to decide the number of topics in advance, we were not able to map words back to the documents they came from, and the results were not good enough.
- N-gram: Since LDA was not giving the accuracy we expected, we tried N-grams. An N-gram groups N adjacent words; for example, in "chilled water" the two words are related and may occur together many times in our text data, so the N-gram approach groups them. This also failed, because it did not give the accuracy we expected, and as we increased N the memory required grew very quickly. With around 1.3M tweets it was very hard to compute, as we did not have a powerful CPU.
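For reference, counting N-grams needs nothing beyond the standard library. This is a minimal sketch with a naive whitespace tokeniser, which is an assumption; real tweets would need more cleaning (hashtags, mentions, URLs).

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All runs of n adjacent tokens; empty if there are fewer than n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(texts: Iterable[str], n: int) -> Counter:
    """Count n-grams over all texts, lower-casing and splitting on whitespace."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(ngrams(text.lower().split(), n))
    return counts


sample = ["chilled water please", "Chilled water again"]
bigrams = count_ngrams(sample, 2)
print(bigrams[("chilled", "water")])  # 2
```

The counter holds one entry per distinct N-gram, so its size grows quickly with N on a large corpus, which is the memory problem described above.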
As both ways of finding popular topics failed, we decided to do it manually. We picked topics such as the farm bill, corruption and vaccines by hand and checked whether they were discussed throughout the year, because such topics are more likely to be polarising and we wanted to record how people behave towards them, which helps in finding how much polarisation happens. We divided this behaviour into two halves, one close to the election and one on normal days, to see whether users shift on a topic or not.
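The manual check described above can be approximated with a keyword search over dated tweets. The sketch below uses made-up tweets, and the cutoff date of 1 October 2021 is an assumption for illustration only.

```python
from datetime import date
from typing import List, Set, Tuple

Tweet = Tuple[date, str]  # (date posted, tweet text)


def months_active(tweets: List[Tweet], keyword: str) -> Set[Tuple[int, int]]:
    """(year, month) pairs in which at least one tweet mentions the keyword."""
    kw = keyword.lower()
    return {(d.year, d.month) for d, text in tweets if kw in text.lower()}


def split_by_cutoff(tweets: List[Tweet], cutoff: date) -> Tuple[List[Tweet], List[Tweet]]:
    """Split tweets into those before the cutoff and those on or after it."""
    before = [t for t in tweets if t[0] < cutoff]
    after = [t for t in tweets if t[0] >= cutoff]
    return before, after


# Hypothetical tweets, for illustration only.
tweets = [
    (date(2021, 1, 5), "Vaccine drive begins"),
    (date(2021, 6, 2), "vaccine shortage reported"),
    (date(2021, 11, 20), "Farm laws repealed"),
    (date(2022, 1, 16), "One year of vaccine rollout"),
]

print(len(months_active(tweets, "vaccine")))  # 3
normal, near_election = split_by_cutoff(tweets, date(2021, 10, 1))
print(len(normal), len(near_election))  # 2 2
```

A topic that appears in most months of the year is a candidate for the polarisation analysis; one that appears in only a month or two is not.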
Deductions/Discussion
Our first discussion was about topic selection for this project. Looking at the effect of polarisation on the 2016 US election, we decided to study the same thing in the Indian context. The second challenge was how to do this. After a lot of discussion we settled on an approach, and the most important step was then collecting the data needed for the desired results. We used Twitter as the platform and scraped tweets of users at the different levels, collecting tweets over a period of time for specific users. One challenge was selecting topics for polarisation, which we finally did through the hashtags of tweets. We looked for polarised topics that require the user to side exclusively with one position and checked whether a user is polarised with respect to each topic. After finding polarised tweets, we used an LSTM to determine whether a user's tweet is negative or positive with respect to the polarised topic. An LSTM is a special kind of recurrent neural network that is capable of learning long-term dependencies in data. This is possible because the repeating module of the model combines four layers that interact with each other.
Conclusion
- Fig 1 -
- Purpose - Analysis of polarising topics, if any, between Jan 2021 and March 2022
- Analysis -
- Prevalent throughout the year, with high peaks but some decline during the later part of the year
- Fig 2 -
- Purpose - Analysis of non-polarising topics between Jan 2021 and March 2022
- Analysis -
- Prevalent throughout the year, with high peaks, and continues to be discussed all year
- Fig 1 v/s Fig 2 -
- The number of tweets involving a polarising topic such as the Covid vaccine is significantly higher than the number involving a non-polarising topic such as petrol price hikes.
- Fig 3 & Fig 4 -
- Purpose - Analyse the likes and retweet trends of a hashtag with a negative sentiment
- Analysis -
- There is activity on these tweets over a period of 4 months
- The numbers of likes and retweets for tweets containing just one hashtag, #VaccineShortage, are high, reaching up to 14,000 retweets and 50,000 likes in one month.
- Fig 5 & Fig 6 -
- Purpose - Analyse the lifespan of tweets containing hashtags with a negative sentiment
- Analysis -
- Prevalent throughout the year, from Jan 2021 to Dec 2021
- Multiple peaks during the year, each followed by a sudden trough, indicating that such tweets are effective for short intervals with maximum impact
- Fig 7 & Fig 8 -
- Purpose - Analyse retweets and likes for tweets with positive sentiments
- Analysis -
- Prevalent only for one or two months
- Only one peak during Jan 2022
- Fig 9 & Fig 10 -
- Purpose - Analyse the lifespan of tweets with the hashtag #FarmLawsRepealed
- Analysis -
- Prevalent for only two months in the year
- The number of likes and retweets is higher than for the tweets in Fig 5 & Fig 6.
- Fig 11 -
- Purpose - Tracking the activity of media houses
- Analysis -
- At times the media houses tweeted in very large numbers, for example from March to August 2021
- Most of these tweets were about vaccination drives and the farm law protests
- Activity slowed drastically from August to November, probably due to news of the farm laws being repealed
- Activity rose again when the UP election dates were announced
- This analysis helped us select the topics for analysing polarisation.
- Fig 13 & Fig 14 -
- Purpose - Analyse Layer 2 user behaviour towards the ruling party before and during the UP elections
- Analysis -
- In the first half (Jan 2021 to October 2021), 13 Layer 2 users had a positive sentiment towards Class A.
- This declined to 7 users in the second half (October 2021 to March 2022).
- Fig 15 & Fig 16 -
- Purpose - Analyse Layer 2 user behaviour towards opposition parties before and during the UP elections
- Analysis -
- In the first half (Jan 2021 to October 2021), 8 Layer 2 users had a positive sentiment towards opposition parties.
- The number remained the same in the second half (October 2021 to March 2022), but individual users shifted. For example, user no. 22, who was somewhat positive towards opposition parties in the first half, appears to be more against them in the second half.
- Fig 17 & 18 -
- Purpose - Analyse Layer 3 user behaviour towards the ruling party before and during the UP elections
- Analysis -
- In the first half (Jan 2021 to October 2021), 10 Layer 3 users had a positive sentiment towards the ruling party.
- The number increased to 16 in the second half (October 2021 to March 2022), which means some Layer 3 users shifted their viewpoint.
- Fig 19 & Fig 20 -
- Purpose - Analyse Layer 3 user behaviour towards opposition parties before and during the UP elections
- Analysis -
- In the first half (Jan 2021 to October 2021), 21 Layer 3 users had a positive sentiment towards Class B.
- The number increased to 30 in the second half (October 2021 to March 2022), which means some Layer 3 users shifted positively towards opposition parties during election time.
Future work
Previous research on opinion polarisation in social networks has focused on themes that are already known to cause polarisation; as a result, the structural features of polarised social networks are not yet well understood.
We compare polarised and non-polarised networks in this paper, and propose a new metric for determining the degree of polarisation.
Rather than choosing issues ourselves, we try to uncover polarising themes from scraped tweets, which gives us a new perspective to consider.
This permits us to experiment with fresh ideas rather than relying on old ones.
In the future we intend to analyse areas other than politics.
We also plan to analyse the degree of polarisation when numerous topics overlap in the timeline.
Team Members:
- Vishal Pandey
- Mayank Mukundam
- Karan Negi
- Vineet Agrawal
- Bhupendra Sharma
- Shravan Sharma
- Rahul Mishra
- Yash Vardhan Sharma
- Mainak Dhara
- Aditya Sharma
