Team 20 : Election Candidate Profiling
Motivation
We wanted to build a system that helps voters know and understand candidates better, and compare them with their peers and with past candidates.
Why is knowing the candidate crucial?
We conducted a preliminary survey on Twitter in which 59.6% of respondents said they prefer to vote on the basis of the party rather than the candidate. When asked whether it is important to know the candidate before voting, 92.7% answered yes.
We observe that a majority of voters vote for a party irrespective of which candidate is running in their constituency. Yet the elected candidate represents the constituency and plays a very important role in voicing its needs and concerns. Elected candidates also play a crucial role in passing bills and in other major government decisions for half a decade.
Even though candidates declare their criminal and financial details, these details are often overshadowed by campaigns, media support, corruption, and so on.
Why is this a CSS problem?
- An election is a most crucial social event; it decides the course of progress for a nation, state, or region.
- Polarization of political views, mass manipulation, artificial hype about a candidate's capability by election campaigns, and media coverage and bias often overshadow voters' ability to pick the right candidate for their constituency.
- Building a system that helps voters do due diligence before voting gives power back to voters.
- Any form of positive nudge for voters to perform their duty will have a positive impact on our society's governance.
Challenges
- Affidavit data in the NIC archives consists of scanned copies of the actual handwritten forms.
- The PDFs contain data in regional languages, and there is no standard format; everyone writes out the affidavit in their own way.
- There is no way to cross-check errors in the affidavit data.
Ethics and Privacy
The data collected for this project is already available on public portals, and our analysis is based only on affidavit data and data from official sources.
Constituency statistics and other metadata, such as education and crime, are collected from government agencies.
All data given in affidavits is already publicly available at https://affidavit.eci.gov.in/ and https://affidavitarchive.eci.nic.in/.
Data Collection
The data is collected from various sources:
- Affidavit data (from MyNeta and from Lok Dhaba by Ashoka University)
- Criminal, financial, and educational information on candidates who have contested, and on winners of, elections to State Assemblies, the Parliament, and a few local bodies
- Political Party Watch (PPW), started in 2008, whose data includes income-expenditure statements of various political parties (national, regional, and unrecognized)
- Donations above Rs 20,000 received by political parties (national and regional)
- National Crime Records Bureau (NCRB)
Details on crime rates in districts, constituencies, and major cities are collected from the website of the NCRB, the Indian government agency responsible for collecting and analyzing crime data as defined by the Indian Penal Code (IPC) and Special and Local Laws (SLL).
Details on the different crimes committed are collected state-wise and then district-wise, with fields such as STATE/UT, DISTRICT, YEAR, MURDER, ATTEMPT TO MURDER, and TOTAL IPC CRIMES. Since there was ambiguity in how crime rates and ranks are calculated, the total number of IPC cases is considered for each crime.
- Census India
- Details on literacy rates are collected state-wise and district-wise from the Census India website. The data contains the following parameters:
- Schools present in a district by category (government-aided, private, having sanitation facilities, having mid-day meal facilities, etc.)
- Students enrolled in the schools by category (male, female)
- Teachers enrolled in the schools by category (primary, upper primary, secondary)
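As an illustration of how the NCRB district records described above could be turned into a ranking, here is a minimal sketch. It assumes that districts are ranked within each state by their TOTAL IPC CRIMES count for one year; the rows are made-up placeholders, and the function name `rank_districts` is hypothetical, not part of the project's code.

```python
from collections import defaultdict

# Hypothetical rows using the NCRB column names listed above.
# All values are invented for illustration only.
rows = [
    {"STATE/UT": "State A", "DISTRICT": "District 1", "YEAR": 2019, "TOTAL IPC CRIMES": 1200},
    {"STATE/UT": "State A", "DISTRICT": "District 2", "YEAR": 2019, "TOTAL IPC CRIMES": 3400},
    {"STATE/UT": "State B", "DISTRICT": "District 3", "YEAR": 2019, "TOTAL IPC CRIMES": 800},
    {"STATE/UT": "State A", "DISTRICT": "District 1", "YEAR": 2018, "TOTAL IPC CRIMES": 1100},
]


def rank_districts(rows, year):
    """Rank districts within each state by total IPC cases (rank 1 = most cases)."""
    by_state = defaultdict(list)
    for row in rows:
        if row["YEAR"] == year:
            by_state[row["STATE/UT"]].append(row)

    ranked = {}
    for state, state_rows in by_state.items():
        ordered = sorted(state_rows, key=lambda r: r["TOTAL IPC CRIMES"], reverse=True)
        ranked[state] = [
            (position + 1, r["DISTRICT"], r["TOTAL IPC CRIMES"])
            for position, r in enumerate(ordered)
        ]
    return ranked


ranking = rank_districts(rows, 2019)
```

Ranking on the raw case count avoids the ambiguity around per-capita rates, at the cost of favouring less populous districts.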
Our approach
- Instead of building OCR pipelines to read the PDF files, we focused on collecting data through myneta.info and the Lok Dhaba portal. We cleaned and curated the data into a single dataset.
- From the primary data, we created derived dimensions that help interpret specific qualities of candidates.
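To make the idea of derived dimensions concrete, the sketch below computes a few from affidavit-style fields. The field names (`total_assets`, `total_liabilities`, `criminal_cases`) and the derived dimensions themselves (`net_worth`, `debt_ratio`, `has_criminal_cases`) are assumptions chosen for illustration; the actual dimensions used in the project may differ.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical primary fields, as they might appear after cleaning.
    name: str
    total_assets: int       # declared assets, in rupees
    total_liabilities: int  # declared liabilities, in rupees
    criminal_cases: int     # number of declared criminal cases


def derive(candidate):
    """Compute illustrative derived dimensions from the primary fields."""
    net_worth = candidate.total_assets - candidate.total_liabilities
    # Avoid division by zero when no assets are declared.
    if candidate.total_assets:
        debt_ratio = candidate.total_liabilities / candidate.total_assets
    else:
        debt_ratio = None
    return {
        "name": candidate.name,
        "net_worth": net_worth,
        "debt_ratio": debt_ratio,
        "has_criminal_cases": candidate.criminal_cases > 0,
    }


profile = derive(Candidate("Candidate A", 1_000_000, 250_000, 0))
```

Keeping the derived values separate from the declared ones makes it easy to trace any figure back to the affidavit it came from.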
Outcomes
Figure: State-wise trend data across the nation
- As part of this project, we also tried to create election data as a service.
- Through this data service, we can democratize election data by providing an easy interface.
- We have built an easy REST API using AWS Lambda and the Serverless Framework.
- This allows future researchers to consume the normalized election data and build their applications and research on top of it.
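A Lambda-backed endpoint of the kind described above could look roughly like the following. This is only a sketch, assuming the API Gateway proxy integration event shape (`queryStringParameters` in, `statusCode` and `body` out); the `state` query parameter, the `handler` name, and the in-memory `CANDIDATES` list are hypothetical stand-ins for the real data store and routes.

```python
import json

# Hypothetical stand-in for the normalized dataset; values are invented.
CANDIDATES = [
    {"name": "Candidate A", "state": "State A", "year": 2019, "criminal_cases": 0},
    {"name": "Candidate B", "state": "State B", "year": 2019, "criminal_cases": 2},
]


def handler(event, context):
    """Return candidates as JSON, optionally filtered by a `state` query parameter."""
    # API Gateway sends None when no query string is present.
    params = event.get("queryStringParameters") or {}
    state = params.get("state")

    results = [c for c in CANDIDATES if state is None or c["state"] == state]

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"count": len(results), "candidates": results}),
    }


# Local usage example: call the handler directly with a fake event.
response = handler({"queryStringParameters": {"state": "State A"}}, None)
```

Because the handler is a plain function, it can be exercised locally with a hand-built event before being deployed.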
Possible future work:
- Localization
- Combining more growth- and impact-related metrics: welfare, infrastructure, and other growth indicators
- Disaster and pandemic recovery statistics
- Pivot information for Party
- Combining social data on candidates: flagged or soft-moderated posts, extreme speech content ratio, and mentions in viral posts or topics
Summary
- Combining multiple data sources gives simpler ways to interpret the candidature of those contesting an election.
- Curating basic candidate and constituency data gives voters the ability to choose a better representative.
- District-wise and state-wise trends show the quality of politicians in a region.
- e.g. the North East region of India shows better candidate profiles.
Team Members:
- Issac Balaji
- Karthikeyan Arumugam
- Arun Kumar Padakanti
- Dustakar Prasanth Rao
- Haridasu Yaswanth
- Siddhant Chandra Kulshrestha
- Sourav Kumar Singh
- Vambaravelli Tharun Sai
References:
[1] - https://cmsindia.org/sites/default/files/2019-05/Poll-Expenditure-the-2019-elections-cms-report.pdf
[2] - https://affidavit.eci.gov.in/ - Election Commission official website for affidavits for current elections
[3] - https://affidavitarchive.eci.nic.in/ - Election Commission official website for affidavits for all elections
[4] - https://www.myneta.info/ - Portal built by https://adrindia.org/ which contains translated/parsed affidavit data
[5] - https://lokdhaba.ashoka.edu.in/ - Lok Dhaba is a repository of Indian election results built by Ashoka University
[6] - https://ncrb.gov.in/en - National Crime Records Bureau of India


