CSS Team 19 : Fighting an Infodemic

Problem statement :

Social media is now used by almost everyone to share information and follow news about ongoing policies and protests in India. Information spreads faster through images and text shared on social media than through news channels. Images are the most readily perceived form of non-verbal visual information and often say more about a scenario than verbal or textual information. Fake news on social media has been widely studied, but images have an equal impact on how a news event or protest is portrayed. In this project we analyse how protests are depicted on social media through images. We have seen in the past, for example with the farmers' protest and the hijab protest, how videos and images shape viewers' perspective on a protest. Images of a protest can be manipulated, and a manipulated image can take a protest far away from what it really is. That is why our project aims to classify protest images into categories and, in the future, to classify images as manipulated or not by verifying them against ground reality.

Related work we used :

  • Sandra Gonzalez-Bailon, Javier Borge-Holthoefer, and Yamir Moreno. 2013. Broadcasters and Hidden Influentials in Online Protest Diffusion. American Behavioral Scientist 57, 7 (mar 2013), 943–965. https://doi.org/10.1177/0002764213479371

  • Dana R. Fisher. 2014. Studying Large-Scale Protest: Understanding Mobilization and Participation at the People’s Climate March. (2014).

  • Georgios Petkos, Symeon Papadopoulos, Emmanouil Schinas, and Yiannis Kompatsiaris. 2014. Graph-based multimodal clustering for social event detection in large collections of images. In International Conference on Multimedia Modeling.


Research Questions :

  • Given an image of a protest, can we compute visual attributes that describe aspects of the protest, such as violence and the protesters' demographics?

  • How is a protest being portrayed on the Internet through images?

Methodology pipeline :

  • We trained a CNN model that takes a whole image (visual data) as input and outputs a series of prediction scores, including the binary image category (i.e., protest or non-protest), visual attributes, and perceived violence.

  • We scraped images from the Internet for different protests using their keywords, gathering data randomly from social media to avoid bias toward any particular company or social media agency. Our team of 6 members annotated approximately 6,200 images over a timespan of one month.
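The multi-task model described above can be sketched roughly as follows. This is a minimal illustration, not our exact architecture: the tiny backbone, layer sizes, and the number of visual attributes are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class ProtestNet(nn.Module):
    """Illustrative multi-task CNN: one shared backbone, three output heads."""
    def __init__(self, n_attributes=10):
        super().__init__()
        # Tiny stand-in backbone; a real model would use something like a ResNet.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.protest = nn.Linear(16, 1)               # binary: protest vs non-protest
        self.violence = nn.Linear(16, 1)              # perceived violence score
        self.attributes = nn.Linear(16, n_attributes) # e.g. flags, signs, fire

    def forward(self, x):
        h = self.backbone(x)
        return {
            "protest": torch.sigmoid(self.protest(h)),
            "violence": torch.sigmoid(self.violence(h)),
            "attributes": torch.sigmoid(self.attributes(h)),
        }

model = ProtestNet()
out = model(torch.randn(2, 3, 224, 224))  # a batch of two RGB images
```

All three heads share one feature extractor, so a single forward pass yields the image category, the attribute scores, and the violence score together.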



Experimental design and results :

Repeated Data - Social media bias :

  • While annotating the data, we found that the dataset contained some repeated images.

  • We trained our model both on the original dataset and on a dataset with the replicated images removed.

  • The two experiments gave different results.

  • We found that the percentage of violence in the dataset increases when replicated images are kept, which suggests that violent images are shared more often than non-violent ones.
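One simple way to detect the kind of repeated images described above is perceptual hashing. The sketch below implements a tiny average-hash in plain Python on already-downsampled grayscale pixel grids; a real pipeline would first resize the actual images (e.g. with Pillow), and the sample "images" here are made up for illustration.

```python
def average_hash(pixels):
    """Average hash: one bit per pixel, set if the pixel is above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(p > mean for p in flat)

def deduplicate(images):
    """Keep one image per perceptual hash; near-duplicates collapse together."""
    seen, unique = set(), []
    for img in images:
        h = average_hash(img)
        if h not in seen:
            seen.add(h)
            unique.append(img)
    return unique

a = [[10, 200], [30, 180]]   # an image shared twice
b = [[12, 198], [28, 182]]   # the same image re-posted with slight noise
c = [[90, 91], [92, 93]]     # an unrelated image
print(len(deduplicate([a, b, c])))  # → 2 (a and b collapse to one entry)
```

Because the hash thresholds at the mean, small compression or brightness changes leave the bit pattern unchanged, so re-posts of the same image map to the same hash.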



Observation :

  • The raw scraped data contains 17% violent images; after removing repeated images, this percentage drops to 13.77%.

  • Replicated violent images: 5.4%; replicated non-violent images: 0.8% (though this can depend heavily on the dataset).
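The arithmetic behind these shares can be reproduced directly. The counts below are hypothetical values chosen only to be consistent with the reported percentages; they are not the actual annotation counts.

```python
def violent_share(n_violent, n_total):
    """Percentage of violent images in a set, rounded to two decimals."""
    return round(100 * n_violent / n_total, 2)

# Hypothetical counts matching the reported shares, for illustration only.
raw_total, raw_violent = 6200, 1054        # 17% violent in the raw scrape
dedup_total, dedup_violent = 5533, 762     # ~13.77% after removing repeats

print(violent_share(raw_violent, raw_total))      # → 17.0
print(violent_share(dedup_violent, dedup_total))  # → 13.77
```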



Pie-Chart for Protests with Flags

  • Protests in which flags are used are comparatively infrequent. This may be because such movements require a very large number of individuals to join.

  • Model output for flag confidence : 65%




Ablations :

  • We initially tried different classification methods such as SVM, Multinomial Naive Bayes, and K-Nearest Neighbors, but classification using neural networks gave us the best accuracy.

  • Classifying violence as a binary value gave us low precision, since every image shows a different level of violence and a binary label cannot represent images with moderate violence.

  • We initially included extra features such as happy and sad, which were not necessary for our task and only increased our model's complexity. We therefore kept only the features that correctly define our work.
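The classifier comparison from the first ablation can be sketched with scikit-learn. The digits dataset and the small MLP here are stand-ins (for our annotated image features and the CNN, respectively), so the accuracies are illustrative, not our reported results.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Non-negative pixel counts, so MultinomialNB is applicable.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

models = {
    "SVM": SVC(),
    "Multinomial NB": MultinomialNB(),
    "KNN": KNeighborsClassifier(),
    "Neural network": MLPClassifier(max_iter=500, random_state=0),
}
scores = {name: m.fit(X_train, y_train).score(X_test, y_test)
          for name, m in models.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")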

Deductions and Discussion :

  • Visual language is more universal than spoken language.

  • Negative, violent, and sad images of a protest spread faster over the Internet than non-violent images.

  • Our ML model can be run on a particular protest to reveal how that protest is being portrayed on social media through images.

Future Work :

  • Classifying comments/tweets about an incident with an ML model trained on a visual dataset of that incident could be explored; this would help in verifying news, i.e., so-called fake-news detection.

  • Making the system run in real time for industrial use.

  • Once that is done, coupling the correctness of verbal/textual and non-verbal/visual information could be explored in real life.

  • In addition, we may also look for bogus information about protests that has circulated on the Internet, but that is a task for the future.



 

 

