Motivation
Gender inequality is the social phenomenon in which men and women are not treated equally. The unequal treatment may arise from distinctions in biology, psychology, or the cultural norms prevalent in a society. Some of these distinctions are empirically grounded, while others appear to be social constructs. Gender inequality is prevalent in many fields and takes many forms, some of which are widely known: the gender pay gap, the positions held in workplaces, gender roles at home and in society, and so on. Similarly, there is substantial evidence of sexism in academia. In this project, we examine whether and how gender bias is present in research academia, and the ways in which we can draw meaningful conclusions about it.
Introduction
In this project, we work on ‘Gender Bias in Research Academia’. We look at several aspects, such as the publications authored by women, the research and academic positions that women hold, and an analysis of collaborative networks built from the data of the papers we collected.
As discussed earlier, gender bias is a social problem. We analyze and try to understand it using computational tools, which is why we consider this a computational social science (CSS) problem.
We use parameters such as the representation of women in tenured academic positions, the representation of women in publications, a comparison of male and female h-indices, citation counts, and a comparison of the citations of similar, controlled papers written by male and female authors to demonstrate this bias and report findings about it.
Related Works
Related work also indicates that there is a gender bias in different areas of academia. Teele and Thelen [2] discuss the gender gap in publications in the field of political science, and how women are generally underrepresented in the top journals of many disciplines. As one can observe in Fig. 1, another study shows that women in academia have a lower chance of making it to the top of the hierarchy [7]. An article by Hermant and Selvaratnam [8] discusses how women have consistently been underrepresented and how the number of faculty in various fields of science is also skewed towards men. Many such studies have been done to understand gender bias in academia.
Research Questions
In this project, we try to answer the following questions:
- How well are women represented among professors and researchers?
- In how many of their cited papers are authors listed first or last?
- Do researchers of the same gender tend to collaborate more?
- Do female-authored papers get into higher-ranked journals and conferences?
- How influential are women-authored papers?
- Are male authors cited more for similar papers in similar conferences?
Methodology Pipeline
Data Extraction
The data we needed for the project can be broadly classified into two categories:
Researchers
This contains data on researchers from various esteemed universities and institutions across India. This data has been extracted from individual websites of universities and colleges which are publicly available and accessible.
Research Papers
Once we had data on researchers across the country, we mapped them to their publications. For this, we used Google Scholar, which is also open to the public.
We performed similar data extraction for 5 different prestigious research-oriented institutes across the country.
BeautifulSoup, Selenium, and curl were used for both tasks. One issue with extracting data from different websites is that it is not consistent. For example, the gender of the researcher was not available for the researchers of many universities.
Gender API
To solve the above-mentioned issue with the extracted data, we used an API that predicts the gender of a person with 95% accuracy, using both rule-based and neural-network-based models.
API: https://gender-api.com/en/api-docs
After using the API to get the gender of all of those researchers whose data was missing, we updated the data and proceeded with the decided analyses.
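The fill-in step can be sketched as follows. This is a minimal illustration, not the project's code: the record layout (`name`, `gender`) and the `predict_gender` callable are assumptions, and `toy_predictor` is a hypothetical stand-in for the API call rather than the Gender API client.

```python
from typing import Callable, Optional


def fill_missing_gender(
    researchers: list[dict],
    predict_gender: Callable[[str], Optional[str]],
) -> list[dict]:
    """Return copies of the records with a missing 'gender' filled by the predictor."""
    filled = []
    for record in researchers:
        updated = dict(record)  # do not mutate the scraped data
        if not updated.get("gender"):
            first_name = updated["name"].split()[0]
            updated["gender"] = predict_gender(first_name) or "unknown"
            updated["gender_source"] = "predicted"
        else:
            updated["gender_source"] = "website"
        filled.append(updated)
    return filled


# Hypothetical lookup table standing in for the API response (toy data).
_TOY_LOOKUP = {"priya": "female", "rahul": "male"}


def toy_predictor(first_name: str) -> Optional[str]:
    return _TOY_LOOKUP.get(first_name.lower())


records = [
    {"name": "Priya Sharma", "gender": None},
    {"name": "Rahul Verma", "gender": "male"},
    {"name": "Sam Iyer"},
]
filled_records = fill_missing_gender(records, toy_predictor)
```

Keeping a `gender_source` field makes it possible to check later whether a result depends on predicted labels. Names that start with a title such as "Dr." would need to be cleaned before the first-name split.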
Analysis
Based on our research questions, we came up with 3 different types of analyses that can answer them properly.
- Gender Representation Statistics
- Citation Analysis (On similar research papers)
- Collaborative Network
Gender Representation Statistics
This analysis helped us answer questions such as:
- Do women have better representation as tenured professors?
- Does gender have an effect on the citations of a research paper?
- What does the distribution of genders across research ranks look like?
We can see what the gender representation looks like in different colleges across the country.
Similar Research paper Citation Analysis
Simply comparing citation counts across researchers does not give a reliable picture of the bias that exists in the community, so we need to compare papers that are similar in terms of topic and the level of the conference in which the paper is published.
The data required for this task should therefore contain the conference level along with basic information such as the author(s), topic, and year of publication. We chose to extract this data separately, as mapping the numerous variants of conference names was extremely difficult.
Therefore, we chose a conference (IEEE, in our case) and extracted research papers from various years.
After the data extraction, we used an encoder model to measure similarity, creating a similarity matrix based on the paper abstracts. The questions that we were able to answer:
- Do female researchers have less representation in prestigious conferences?
- Do similar papers from different genders get cited differently?
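A minimal sketch of the matching idea is shown below. It is not the encoder model used in the project: `embed` swaps in a simple bag-of-words count vector so that the example has no dependencies, and the 0.5 threshold, the record fields, and the papers themselves are assumptions made up for illustration.

```python
import math
import re
from collections import Counter


def embed(abstract: str) -> Counter:
    """Stand-in for the encoder: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z]+", abstract.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(abstracts):
    vectors = [embed(text) for text in abstracts]
    return [[cosine(u, v) for v in vectors] for u in vectors]


def matched_pairs(papers, threshold=0.5):
    """Return (female_paper, male_paper) pairs whose abstracts are similar."""
    sim = similarity_matrix([p["abstract"] for p in papers])
    pairs = []
    for i, p in enumerate(papers):
        for j, q in enumerate(papers):
            if p["gender"] == "female" and q["gender"] == "male" and sim[i][j] >= threshold:
                pairs.append((p, q))
    return pairs


papers = [  # toy data, not from the study
    {"abstract": "graph neural networks for traffic prediction", "gender": "female", "citations": 12},
    {"abstract": "traffic prediction with graph neural networks", "gender": "male", "citations": 9},
    {"abstract": "low power circuit design for sensors", "gender": "male", "citations": 30},
]
sim = similarity_matrix([p["abstract"] for p in papers])
pairs = matched_pairs(papers)
female_cited_more = sum(f["citations"] > m["citations"] for f, m in pairs)
```

Only the first two papers are matched here, so the citation comparison ignores the unrelated circuit-design paper. Replacing `embed` with a sentence encoder changes the vectors but not the rest of the pipeline.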
Results
Female authors were not well represented in these conferences.
16% of the papers in these conferences were headed by female authors.
However, on average, papers supervised by female authors were cited just as much as male authors’ work: 45% of female authors’ papers were cited more.
Collaborative Networks
To answer questions such as ‘Do female researchers tend to collaborate more with female researchers?’ and to visually represent topics of interest via graphs, we analysed the data using collaborative networks.
We used the networkx library to create a graph representation of the collaborative network, in which authors who co-author papers are connected by an edge. We analysed various measures such as degree, betweenness centrality, eigenvector centrality, and closeness centrality. We also used Louvain communities to analyze clustering and community formation in the network. What follows are samples of the results from analyzing the papers of IISc Bangalore, one of the institutes on which we did our analysis.
Degree Analysis
We find that the maximum male degree is 10398 while the maximum female degree is 500. The average male degree was 7.774381095273818 while the average female degree was 6.220233139050791. Thus, males tend to collaborate with more people than females do.
Here are two graphs comparing the male and female degrees in the collaborative network. (For better visualization, we show degrees only up to 100.)
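The degree comparison can be sketched without any library, assuming each paper is available as a list of author names and each author has a gender label. The sketch counts distinct co-authors; whether the original analysis counted repeated collaborations separately is not stated, and the names and labels below are toy data.

```python
from itertools import combinations
from statistics import mean


def build_coauthor_graph(papers):
    """Map each author to the set of distinct co-authors."""
    neighbours = {}
    for authors in papers:
        for a in authors:
            neighbours.setdefault(a, set())  # solo authors still appear
        for a, b in combinations(sorted(set(authors)), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def degree_summary(neighbours, gender):
    """Maximum and mean degree per gender."""
    degrees = {}
    for author, coauthors in neighbours.items():
        degrees.setdefault(gender[author], []).append(len(coauthors))
    return {g: {"max": max(d), "mean": mean(d)} for g, d in degrees.items()}


papers = [["A", "B", "C"], ["A", "D"], ["E"]]  # toy author lists
gender = {"A": "male", "B": "female", "C": "male", "D": "female", "E": "female"}

graph = build_coauthor_graph(papers)
summary = degree_summary(graph, gender)
```

The same adjacency structure (a dict of neighbour sets) is enough to compute the other measures discussed in this section.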
Betweenness Centrality
Betweenness centrality indicates the fraction of shortest paths between all pairs of nodes that pass through a node. The betweenness of a node indicates the node’s ability to funnel the flow in the network. In this network, an author with high betweenness has a large influence on transferring information from one part of the network to another.
The average male betweenness centrality was 8.613493831837764e-05 while the average female betweenness centrality was 3.875705053835589e-05. However, these averages are skewed by one male author with a very high outlier value. From the histogram distribution, we can see that females have a relatively higher tendency to form junctions between different researchers.
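For reference, the measure can be computed with Brandes' algorithm. The sketch below is a dependency-free illustration for an unweighted, undirected graph stored as a dict of neighbour sets (an assumed layout); networkx provides the same measure as `betweenness_centrality`. The four-node path is toy data.

```python
from collections import deque


def betweenness_centrality(graph):
    """Normalized betweenness for an unweighted, undirected graph (Brandes)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and recording the predecessors on those paths.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the nodes farthest from s.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(graph)
    # Every unordered pair is visited from both ends, so dividing by
    # (n - 1)(n - 2) gives the fraction of pairs routed through each node.
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: c * scale for v, c in score.items()}


path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = betweenness_centrality(path)
```

On the path A-B-C-D, the two inner nodes each lie on two of the three pairs that do not involve them, and the end nodes lie on none. When a single outlier dominates the mean, as reported above, the per-gender median is a more robust summary to report alongside it.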
Closeness Centrality
Closeness centrality indicates how close a scholar is to others. Mathematically, it is the inverse of the sum of the shortest-path distances from a node to all other nodes. If the shortest-path distance between nodes u and v is d(u, v) and the total number of nodes in the graph is denoted by N, the closeness centrality of node u is defined as follows:
$$C(u) = \frac{N - 1}{\sum_{v \neq u} d(u, v)}$$
where N − 1 in the numerator normalizes the measure so that it becomes size independent. Scholars with high closeness centrality are on average closer to other nodes in the network.
The average male closeness centrality was 0.3187652385823074 while the average female closeness centrality was 0.3145980279219301. Thus, males are, on average, slightly closer to other researchers and have slightly better reach within the network.
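The definition, N − 1 divided by the sum of the shortest-path distances d(u, v) from u to every other node, can be sketched with a breadth-first search. The sketch assumes an unweighted, connected graph stored as a dict of neighbour sets; libraries such as networkx apply their own conventions to disconnected graphs. The three-node path is toy data.

```python
from collections import deque


def closeness_centrality(graph, u):
    """(N - 1) / sum of shortest-path distances from u to all other nodes."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    if len(dist) < len(graph):
        raise ValueError("graph is not connected; the formula assumes it is")
    total = sum(dist.values())
    return (len(graph) - 1) / total if total else 0.0


path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
closeness = {u: closeness_centrality(path, u) for u in path}
```

The middle node B is one step from both others and scores 1.0, while the end nodes score 2/3.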
Eigenvector Centrality
Eigenvector centrality measures the transitive influence of nodes. Relationships originating from high-scoring nodes contribute more to the score of a node than connections from low-scoring nodes. A high eigenvector score means that a node is connected to many nodes that themselves have high scores.
The average male eigenvector centrality was 0.003374652539742752 while the average female eigenvector centrality was 0.0028032596918257293. Thus, males have more influential connections, and therefore greater influence and importance within the network.
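One common way to compute these scores is power iteration: repeatedly replace each node's score with the sum of its neighbours' scores and renormalize. The sketch below is an illustration under assumptions (a non-empty, connected, undirected graph as a dict of neighbour sets, and a toy star graph), not the project's code. It adds each node's own score at every step, which leaves the ranking unchanged but avoids oscillation on bipartite graphs.

```python
import math


def eigenvector_centrality(graph, iterations=200, tol=1e-10):
    """Power iteration on (A + I); scores are scaled to unit Euclidean length."""
    score = {v: 1.0 for v in graph}
    for _ in range(iterations):
        # Each node collects its own score plus the scores of its neighbours.
        new = {v: score[v] + sum(score[w] for w in graph[v]) for v in graph}
        norm = math.sqrt(sum(x * x for x in new.values()))
        new = {v: x / norm for v, x in new.items()}
        if sum(abs(new[v] - score[v]) for v in graph) < tol:
            return new
        score = new
    return score


star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
centrality = eigenvector_centrality(star)
```

In the star graph, the hub ends up with a higher score than any leaf because every leaf's only connection is to the hub.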
Community Analysis
The Louvain method is an algorithm to detect communities in large networks. It maximizes a modularity score for each community, where the modularity quantifies the quality of an assignment of nodes to communities. This means evaluating how much more densely connected the nodes within a community are, compared to how connected they would be in a random network.
It is a hierarchical clustering algorithm that recursively merges communities into single nodes and runs the modularity clustering on the condensed graphs.
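The quantity that Louvain tries to increase can be made concrete with a short sketch of the standard modularity formula for an unweighted, undirected graph: for each community, the fraction of edges that fall inside it minus the fraction expected from its degrees alone. The graph layout (a dict of neighbour sets) and the two-triangle example are assumptions for illustration; this computes the score of a given partition and does not perform the Louvain search itself.

```python
def modularity(graph, communities):
    """Q = sum over communities of  L_c / m - (d_c / (2m))^2."""
    m = sum(len(nbrs) for nbrs in graph.values()) / 2  # number of edges
    q = 0.0
    for community in communities:
        members = set(community)
        # Each internal edge is seen from both endpoints, hence the division by 2.
        internal = sum(1 for v in members for w in graph[v] if w in members) / 2
        degree_sum = sum(len(graph[v]) for v in members)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


# Two triangles joined by the single edge c-d (toy data).
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
q_split = modularity(graph, [{"a", "b", "c"}, {"d", "e", "f"}])
q_single = modularity(graph, [set(graph)])
```

Splitting the graph into its two triangles scores higher than putting every node into one community, which scores zero.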
We examined the individual Louvain communities to analyze community formation in the graph. In the graph below, each color represents the nodes belonging to a single community.
The most important communities can be seen below. In most of the communities that are formed, males severely outnumber females.
In total, 230 communities were formed. For better visualization, we show 20 communities per graph, for a total of around 12 graphs.
It is also apparent that most of the communities in the graph contain only male researchers. This could be partly because there are fewer female nodes, but it still suggests that researchers of the same gender tend to cluster together.
Along with male-only communities, we also observe some female-only communities and some communities with a higher percentage of female nodes. However, such communities had very few researchers.
To get better observations and visualizations, we used percentages and normalizations:
1. Percentage per Community
This graph depicts the percentage of males and females in each community. This helps us visualize the results better, as the total number of members varied hugely between communities. Hence, the percentage per community gives a better idea of the overall representation in each community.
2. Normalized Values
Here, we can visualize the data better because the values are normalized. These normalizations were done by dividing the counts in each community by the total counts of females and males, which accounts for the lower overall representation of females relative to males. With this normalization, we can clearly see a good number of female-friendly communities that have comparable levels of normalized representation.
3. Percentage of male-females overall
Here, the values are the percentages of males and females relative to their overall totals, rather than within each community as done earlier.
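The views above come down to two ratios per community, sketched below. This is one reading of the descriptions, not the original code: `pct_in_community` divides by the community size (view 1), and `pct_of_gender` divides by that gender's total count across all communities (the normalization described for view 2, which view 3 appears to express as a percentage). The community lists and gender labels are toy data.

```python
from collections import Counter


def community_composition(communities, gender):
    """Per-community gender shares, within the community and within each gender."""
    total = Counter(gender[a] for members in communities for a in members)
    rows = []
    for members in communities:
        counts = Counter(gender[a] for a in members)
        size = len(members)
        rows.append({
            "size": size,
            # Share of each gender among the members of this community.
            "pct_in_community": {g: 100 * counts[g] / size for g in total},
            # Share of each gender's overall population found in this community.
            "pct_of_gender": {g: 100 * counts[g] / total[g] for g in total},
        })
    return rows


communities = [["A", "B", "C"], ["D", "E"]]  # toy partition
gender = {"A": "male", "B": "male", "C": "female", "D": "female", "E": "male"}
rows = community_composition(communities, gender)
```

In the toy data, the first community is two-thirds male by headcount, yet it holds the same share of all women (50%) as the second community, which is the kind of difference the normalized view brings out.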
Failed Approaches
We tried scraping using different APIs and websites. However, most of them had a cap on the number of requests, so we ended up scraping the websites manually.
Earlier, we used different APIs to find the genders of researchers. However, they gave erroneous results for Indian names, so we used the Indian Gender API (Python module ‘guess-indian-gender’) and the Gender API discussed above.
The Python modules for Louvain analysis were not working, so we had to use modules from other sources.
Assortativity values, which measure the similarity of connections in the graph, could not be calculated properly for the given data of males and females.
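For reference, the standard attribute assortativity coefficient can be sketched as below. This is not the computation that failed in the project; it is a dependency-free illustration assuming an undirected graph stored as a dict of neighbour sets and one gender label per node, on toy data. The coefficient is 1 when every edge joins two nodes of the same gender and negative when edges mostly join different genders.

```python
from collections import Counter


def gender_assortativity(graph, gender):
    """r = (share of same-gender edge ends - expected share) / (1 - expected share)."""
    # Count every edge in both directions so the mixing table is symmetric.
    mixing = Counter((gender[v], gender[w]) for v in graph for w in graph[v])
    total = sum(mixing.values())
    labels = {g for pair in mixing for g in pair}
    same = sum(mixing[(g, g)] for g in labels) / total
    expected = sum(
        (sum(mixing[(g, h)] for h in labels) / total) ** 2 for g in labels
    )
    # Undefined (division by zero) when all edge ends share one gender.
    return (same - expected) / (1 - expected)


gender = {"m1": "male", "m2": "male", "f1": "female", "f2": "female"}
segregated = {"m1": {"m2"}, "m2": {"m1"}, "f1": {"f2"}, "f2": {"f1"}}
mixed = {"m1": {"f1"}, "f1": {"m1"}, "m2": {"f2"}, "f2": {"m2"}}

r_segregated = gender_assortativity(segregated, gender)
r_mixed = gender_assortativity(mixed, gender)
```

Because the expected term depends on how many edge ends each gender has, the coefficient already accounts for unequal group sizes, which matters when one gender has far fewer nodes.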
Conclusion
Thus, we can see that gender bias is highly prevalent in research academia at the national level. We conclude this from the strong skew towards males in representation as tenured professors and as researchers in institutes, and in the distribution and influence in citation networks. However, when looking at the data at the international level, namely the difference in citations of similar research papers published in similar conferences, we see no bias in the results of the study.
This leads us to believe that India still has more bias in research academia compared to international standards, and thus we need to improve our understanding of gender representation and make policies to combat this bias.
References
[1] Ghiasi, Gita; Larivière, Vincent; Sugimoto, Cassidy R. (2015-12-30). "On the Compliance of Women Engineers with a Gendered Scientific System". PLOS ONE. 10 (12): e0145931. doi:10.1371/journal.pone.0145931. ISSN 1932-6203. PMC 4696668. PMID 26716831.
[2] Teele, Dawn Langan; Thelen, Kathleen (April 2017). "Gender in the Journals: Publication Patterns in Political Science". PS: Political Science & Politics. 50 (2): 433–447. doi:10.1017/S1049096516002985. ISSN 1049-0965.
[3] Jin, Shi & Peng, Yufang & Fantinato, Marcelo & Chen, Jing. (2017). A study on the author collaboration network in big data. Information Systems Frontiers. 19. 10.1007/s10796-017-9771-1.
[4] Fariba Karimi, Philipp Mayr and Fakhri Momeni: Analyzing the network structure and gender differences among the members of the Networked Knowledge Organization Systems (NKOS) community https://arxiv.org/pdf/1803.04225.pdf
[5] Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment 2008(10), P10008 (2008)
[6] Freeman, L.C.: Centrality in social networks conceptual clarification. Social Networks 1(3), 215–239 (1978)
[7] Marley Doyle, M.D. et al. Academic Psychiatry, June 2018; Association of American Medical Colleges
[8] Hermant, N., & Selvaratnam, N. (2018, March 11). Women in academia take aim at sexism in university research fields. ABC News. Retrieved May 4, 2022, from https://www.abc.net.au/news/2018-03-11/women-in-academia-take-aim-at-sexism-university-research-fields/9522500