02: USING TWITTER ANALYTICS TO PREDICT VIOLENT PUBLIC PROTESTS

OVERVIEW & PURPOSE

Citizens participate in mass demonstrations to express themselves and exercise their democratic rights. By means of protests, people express their interests, needs, approval or disapproval of a particular situation, and try to bring a better future to their society. Even though a majority of protests have been reported to be peaceful, because of the large number of participants in demonstrations, protests may lead to violence and destruction, causing financial damages and/or psychological effects on the society, and hence can be costly in many dimensions.

Nowadays people think in a more self-focused ("to me") way, which leads them to express themselves on online media platforms such as Twitter and Facebook. This creates a huge, publicly available dataset from which meaningful results can be derived. This course has taught us to take this kind of common, easily available dataset and apply computational methods to extract deeper insights from it.

We have applied those principles to try to predict these protest events before they occur, borrowing an analogy from the anomaly detection systems that Cisco and other companies use to detect anomalies in devices.

Many studies have reported that the impact of such an event on these platforms is quite unusual, which makes it something that could be predicted from similar patterns of network behavior.

Our project builds its basic idea on this observation: predicting these types of events before they happen.

RELATED WORK

The following works, referred to in the overview, each contributed something valuable to the conclusions we reached.

Predicting Democratic Protests Paradigm from Twitter Using Deep-learning BiLSTM Model: This paper is based on the hypothesis that the unrestricted data available on Twitter has the potential to predict the day and location of major protests when deep learning and data mining are applied to its unstructured data. The prediction model used in the study is a Bidirectional Long Short-Term Memory (BiLSTM) model. The results of the experiment show that the model performs better than existing baseline models, achieving an accuracy of 82.02% on testing data and 95.28% on training data. Overall, the aim of the paper is to study the efficacy of data mining techniques on publicly available Twitter data in predicting mass protests and demonstrations.

Sentiment analysis and classification of Indian farmers’ protest using twitter data: In this research, data concerning the farmers’ protest is gathered from the microblogging website Twitter to understand the sentiments that the public shared at an international level. The authors used models to categorize and analyze the sentiments in a collection of around 20,000 tweets on the protest. They conducted the analysis using Bag of Words and TF-IDF and found that Bag of Words performed better than TF-IDF. They also used Naive Bayes, Decision Trees, Random Forests, and Support Vector Machines, and found that Random Forest had the highest classification accuracy.

Capitol (Pat)riots: A comparative study of Twitter and Parler: The paper contrasts the trending content on Parler and Twitter around the time of the riots. The data was collected from both platforms based on trending hashtags, and comparisons were drawn on which topics were being discussed, who was active on the platforms, and how organic the content generated on the two platforms was. While the content trending on Twitter expressed strong resentment towards the event and called for action against rioters and inciters, Parler content had a strong conservative narrative echoing the ideas of voter fraud, similar to the attacking mob. There was also disproportionately high manipulation of traffic on Parler compared to Twitter.

Twitter Reveals: Using Twitter Analytics to Predict Public Protests: The aim of this paper is to predict protests by means of machine learning algorithms. In particular, the authors consider the protests against the then-president-elect Mr. Trump after the results of the presidential election were announced in November 2016. They first identify the hashtags calling for demonstrations from Trending Topics on Twitter and download the corresponding tweets. They then apply four machine learning algorithms to make predictions. The findings indicate that Twitter can be used as a powerful tool for predicting future protests, with an average prediction accuracy of over 75 percent (up to 100 percent). They further validate the model by predicting the protests held in U.S. airports after President Trump's executive order banning citizens of seven Muslim-majority countries from entering the U.S. An important contribution of the study is the inclusion of event-specific features for prediction, which helps to achieve high levels of accuracy.

RESEARCH QUESTIONS

Is it possible to predict a protest before it happens using data that contains bias?

Can such predictions be accurate at all?

Is it possible to predict the geolocation of a protest using data without geotags?

METHODOLOGY PIPELINE

Data collection and Preprocessing

We have collected over 4M tweets for training and verification of the methodology.
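
As a rough illustration of what a preprocessing step of this kind might look like, the sketch below cleans raw tweet text and removes duplicates. The cleaning rules (dropping URLs and @mentions, keeping hashtag words, lowercasing) and the names `clean_tweet` and `deduplicate` are assumptions made for this example, not the exact pipeline used in the project.

```python
import re

# Hypothetical cleaning rules; the project's actual rules are not specified.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^\w\s]")


def clean_tweet(text: str) -> str:
    """Normalize a raw tweet into lowercase, whitespace-separated tokens."""
    text = URL_RE.sub(" ", text)       # drop links
    text = MENTION_RE.sub(" ", text)   # drop @user mentions
    text = text.replace("#", " ")      # keep the hashtag word, drop the symbol
    text = NON_WORD_RE.sub(" ", text)  # drop remaining punctuation
    return " ".join(text.lower().split())


def deduplicate(tweets):
    """Remove tweets that are identical after cleaning, preserving order."""
    seen = set()
    unique = []
    for raw in tweets:
        cleaned = clean_tweet(raw)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    return unique
```

Deduplicating after cleaning means that retweets differing only in links or mentions collapse into a single entry, which may or may not be desirable depending on whether retweet volume is itself used as a signal.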

Generating Embeddings & Topic Modeling on tweets

We used context- and perspective-based topic modeling on the tweets.

Spike pattern recognition and anomaly detection.

This step closely resembles anomaly detection for electronic devices, which looks for unusual patterns in device logs.
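
One common way to detect spikes in a daily series is a rolling z-score, sketched below under stated assumptions. The input is assumed to be a list of tweet counts per day, and the window length, the threshold, and the function name `detect_spikes` are illustrative choices rather than the values used in the project.

```python
from statistics import mean, pstdev


def detect_spikes(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose count is anomalously high.

    A day is flagged when its count exceeds the mean of the previous
    `window` days by more than `threshold` standard deviations.
    """
    spikes = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Flat history: treat any increase as a spike.
            if daily_counts[i] > mu:
                spikes.append(i)
            continue
        z = (daily_counts[i] - mu) / sigma
        if z > threshold:
            spikes.append(i)
    return spikes
```

Because the baseline uses only the preceding days, a day is judged against what was normal before it, which is what a forward-looking prediction needs. The same function could be applied to a daily sentiment score instead of a raw count.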

BiLSTM modeling of our data for critical probability.

This is the main machine learning algorithm we use to predict the time span in which the event could occur.
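
A sequence model such as a BiLSTM is typically trained on fixed-length windows of past observations. The sketch below shows one way per-day feature vectors could be arranged into such windows, each labeled by whether a protest falls within a short horizon after the window. The window length, the horizon, the labeling rule, and the name `make_windows` are all assumptions for illustration; the model itself is not shown.

```python
def make_windows(daily_features, event_days, window=14, horizon=3):
    """Build (sequence, label) pairs for a sequence classifier.

    daily_features: list with one feature vector (list of floats) per day.
    event_days: set of day indices on which a protest occurred.
    The label is 1 if an event falls within `horizon` days after the window.
    """
    samples = []
    last_start = len(daily_features) - window - horizon
    for start in range(last_start + 1):
        end = start + window  # index of the first day after the window
        sequence = daily_features[start:end]
        upcoming = range(end, end + horizon)
        label = int(any(day in event_days for day in upcoming))
        samples.append((sequence, label))
    return samples
```

With this framing, the probability the classifier outputs for the most recent window can be read as the chance of an event within the next `horizon` days.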

Final prediction from spike prediction and critical probability.

The final results were encouraging: we observed patterns that can predict protest events within a valid time limit.

ANALYSIS

We obtained results for the following events:

1. Delhi Protest (CAA/NRC)

2. Farmer Protest

3. Capitol Attack

The results show a high correlation between the final prediction and the per-day sentiment plot of the tweets.
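
A correlation of this kind can be quantified with the Pearson coefficient. The sketch below is a minimal implementation, assuming the prediction and the daily sentiment are available as two equal-length lists of numbers; the function name `pearson` is chosen for this example and no project data is reproduced here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

A value close to 1 means the two daily series rise and fall together, while a value close to 0 means there is no linear relationship between them.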

EXPERIMENTAL DESIGN AND RESULTS

ABLATIONS: We aimed to predict the locations of the protests as precisely as possible, but the type of data we had made this difficult.

GEOTAGS: Because our tweet audience was the Indian population, the tweet datasets we collected face a major issue: the tweets are not geotagged.

The tweets are not geotagged because Twitter cannot always access the user's location to update it, so an accurate description of the location is missing.
We address this by running a network analysis over the users involved in the tweet dataset and estimating their geolocations through spatial inspection.
This yields an approximate, probabilistic estimate of the cities where the protest could happen.

SIZE ESTIMATION: When we started, the end goal was to predict both the size and the location of the event. We were able to predict the day and the location, but estimating the size would require much more data than expected.
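
The network-based location estimate described under GEOTAGS can be sketched as a simple neighbor vote. This is an illustration under assumptions, not the project's actual method: it assumes a mapping from each user to the users they interact with (`neighbors`), a mapping from some users to a known, already normalized city (`known_city`), and a majority-vote rule. All names are hypothetical.

```python
from collections import Counter


def estimate_user_city(user, neighbors, known_city):
    """Guess a user's city from the known cities of connected users."""
    if user in known_city:
        return known_city[user]
    votes = Counter(
        known_city[n] for n in neighbors.get(user, ()) if n in known_city
    )
    if not votes:
        return None  # no located neighbors, so no estimate
    return votes.most_common(1)[0][0]


def city_distribution(users, neighbors, known_city):
    """Return {city: probability} over users whose city can be estimated."""
    counts = Counter()
    for user in users:
        city = estimate_user_city(user, neighbors, known_city)
        if city is not None:
            counts[city] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {city: n / total for city, n in counts.items()}
```

The output is a probability distribution over cities rather than a single answer, which matches the probabilistic nature of the estimate: users without any located neighbors are simply left out instead of being forced into a guess.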

CONCLUSION

Network analysis has shown a high correlation between sentiment anomalies in the tweets posted per day and protest events; the analysis section shows the two side by side.

Geolocation optimization also performed well on average, spread-out tweet data, e.g. tweets collected for a city where fewer than 5% of the tweets are geotagged.

The locations of events also correlate with the cities found in the user tweets we extracted, centering on the city of the event. This was determined using a network simulation over the normalized city information in the tweets and the city of the event.

FUTURE WORK

The tweet data modeling has given results that were better than expected, and it could be extended to analyze or predict the size of an event.

We can also work on post-protest analysis, such as a recoil or some larger event that could break the prediction.

Video and PPT Link


CS9.435.S22