Vrinda Malhotra - Academia.edu

Papers by Vrinda Malhotra

Analyze, Detect and Remove Gender Stereotyping from Bollywood Movies

The presence of gender stereotypes in many aspects of society is a well-known phenomenon. In this paper, we focus on studying such stereotypes and bias in the Hindi movie industry (Bollywood) and propose an algorithm to remove these stereotypes from text. We analyze movie plots and posters for all movies released since 1970. Gender bias is detected by semantic modeling of plots at the sentence and intra-sentence level. Different features like occupation, introductions, associated actions and descriptions are captured to show the pervasiveness of gender bias and stereotype in movies. Using the derived semantic graph, we compute the centrality of each character and observe similar bias there. We also show that such bias is not applicable to movie posters, where females get equal importance even though their character has little or no impact on the movie plot. The silver lining is that our system was able to identify 30 movies over the last 3 years where such stereotypes were broken. The next step is to generate debiased stories. The proposed debiasing algorithm extracts gender-biased graphs from unstructured text in movie stories and de-biases these graphs to generate plausible unbiased stories.

Graph Neural Networks for Malware Classification

Bollywood Movie Corpus for Text, Images and Videos

arXiv (Cornell University), Oct 11, 2017

In the past few years, several data-sets have been released for text and images. We present an approach to create a data-set for use in detecting and removing gender bias from text. We also describe a set of challenges we faced while creating this corpus. In this work, we have worked with movie data from Wikipedia plots and movie trailers from YouTube. Our Bollywood Movie corpus contains 4000 movies extracted from Wikipedia and 880 trailers extracted from YouTube, released from 1970 to 2017. The corpus contains CSV files with the following data about each movie: Wikipedia title of movie, cast, plot text, co-referenced plot text, soundtrack information, link to movie poster, caption of movie poster, number of males in poster, number of females in poster. In addition, the following data is available for each cast member: cast name, cast gender, cast verbs, cast adjectives, cast relations, cast centrality, cast mentions. We present some preliminary results on the task of bias removal which suggest that the data-set is quite useful for performing such tasks.

Analyzing Gender Stereotyping in Bollywood Movies

arXiv (Cornell University), Oct 11, 2017

The presence of gender stereotypes in many aspects of society is a well-known phenomenon. In this paper, we focus on studying such stereotypes and bias in the Hindi movie industry (Bollywood). We analyze movie plots and posters for all movies released since 1970. Gender bias is detected by semantic modeling of plots at the inter-sentence and intra-sentence level. Different features like occupation, introduction of cast in text, associated actions and descriptions are captured to show the pervasiveness of gender bias and stereotype in movies. We derive a semantic graph, compute the centrality of each character, and observe similar bias there. We also show that such bias is not applicable to movie posters, where females get equal importance even though their character has little or no impact on the movie plot. Furthermore, we explore the movie trailers to estimate on-screen time for males and females and also study the portrayal of emotions by gender in them. The silver lining is that our system was able to identify 30 movies over the last 3 years where such stereotypes were broken.
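
As a rough illustration of the character-centrality idea in the abstract above, the following sketch computes degree centrality on a small undirected character graph. The abstract does not say which centrality measure the authors use, so degree centrality is an assumption here, and the edge list and character labels ("A" to "D") are hypothetical toy data rather than anything from the corpus.

```python
def degree_centrality(edges):
    """Return degree centrality for each node of an undirected graph.

    `edges` is an iterable of (node, node) pairs. Centrality is the number
    of distinct neighbors divided by (number of nodes - 1).
    Assumption: degree centrality stands in for the unspecified measure.
    """
    nodes = set()
    for a, b in edges:
        nodes.update((a, b))
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    denom = max(len(nodes) - 1, 1)  # avoid division by zero for 1-node graphs
    return {n: len(neighbors[n]) / denom for n in nodes}


# Hypothetical toy graph: an edge means two characters interact in the plot.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
scores = degree_centrality(toy_edges)

# Rank characters from most to least central.
for name, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{name}: {score:.2f}")
```

In this toy graph, character "A" interacts with every other character and so receives the highest score; comparing such scores between male and female characters is one simple way to look for the kind of imbalance the abstract describes.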

A Comparison of Graph Neural Networks for Malware Classification

arXiv (Cornell University), Mar 21, 2023

Managing the threat posed by malware requires accurate detection and classification techniques. Traditional detection strategies, such as signature scanning, rely on manual analysis of malware to extract relevant features, which is labor-intensive and requires expert knowledge. Function call graphs consist of a set of program functions and their inter-procedural calls, providing a rich source of information that can be leveraged to classify malware without the labor-intensive feature extraction step of traditional techniques. In this research, we treat malware classification as a graph classification problem. Based on Local Degree Profile features, we train a wide range of Graph Neural Network (GNN) architectures to generate embeddings which we then classify. We find that our best GNN models outperform previous comparable research involving the well-known MalNet-Tiny Android malware dataset. In addition, our GNN models do not suffer from the overfitting issues that commonly afflict non-GNN techniques, although GNN models require longer training times.
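
For readers unfamiliar with Local Degree Profile (LDP) features, the sketch below shows one common formulation: each node is described by its own degree plus the minimum, maximum, mean, and standard deviation of its neighbors' degrees. This is a minimal illustration, not the paper's pipeline; it assumes the call graph is treated as undirected, uses the population standard deviation, and the function names in the toy graph are hypothetical.

```python
from statistics import mean, pstdev


def local_degree_profile(adj):
    """Compute a 5-dimensional LDP feature vector for every node.

    `adj` maps each node to an iterable of its neighbors (undirected graph,
    every node present as a key). The feature vector is
    [deg(v), min, max, mean, std] where the last four are taken over the
    degrees of v's neighbors. Isolated nodes get all zeros (an assumption).
    """
    deg = {v: len(set(nbrs)) for v, nbrs in adj.items()}
    features = {}
    for v, nbrs in adj.items():
        nbr_degs = [deg[u] for u in set(nbrs)]
        if nbr_degs:
            features[v] = [
                deg[v],
                min(nbr_degs),
                max(nbr_degs),
                mean(nbr_degs),
                pstdev(nbr_degs),
            ]
        else:
            features[v] = [0, 0, 0, 0.0, 0.0]
    return features


# Hypothetical toy function call graph, stored symmetrically.
toy_call_graph = {
    "main": ["parse", "run"],
    "parse": ["main"],
    "run": ["main", "log"],
    "log": ["run"],
}
ldp = local_degree_profile(toy_call_graph)
for fn, feats in ldp.items():
    print(fn, feats)
```

Because these features depend only on graph structure, they can be computed for any function call graph without manual feature engineering, which is the property the abstract emphasizes.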

Bollywood Movie Corpus for Text, Images and Videos

ArXiv, 2017

In the past few years, several data-sets have been released for text and images. We present an approach to create a data-set for use in detecting and removing gender bias from text. We also describe a set of challenges we faced while creating this corpus. In this work, we have worked with movie data from Wikipedia plots and movie trailers from YouTube. Our Bollywood Movie corpus contains 4000 movies extracted from Wikipedia and 880 trailers extracted from YouTube, released from 1970 to 2017. The corpus contains CSV files with the following data about each movie: Wikipedia title of movie, cast, plot text, co-referenced plot text, soundtrack information, link to movie poster, caption of movie poster, number of males in poster, number of females in poster. In addition, the following data is available for each cast member: cast name, cast gender, cast verbs, cast adjectives, cast relations, cast centrality, cast mentions. We present some preliminary resu...

Analyze, Detect and Remove Gender Stereotyping from Bollywood Movies

The presence of gender stereotypes in many aspects of society is a well-known phenomenon. In this paper, we focus on studying such stereotypes and bias in the Hindi movie industry (Bollywood) and propose an algorithm to remove these stereotypes from text. We analyze movie plots and posters for all movies released since 1970. Gender bias is detected by semantic modeling of plots at the sentence and intra-sentence level. Different features like occupation, introductions, associated actions and descriptions are captured to show the pervasiveness of gender bias and stereotype in movies. Using the derived semantic graph, we compute the centrality of each character and observe similar bias there. We also show that such bias is not applicable to movie posters, where females get equal importance even though their character has little or no impact on the movie plot. The silver lining is that our system was able to identify 30 movies over the last 3 years where such stereotypes were broken. The next step...

Analyzing Gender Stereotyping in Bollywood Movies

ArXiv, 2017

The presence of gender stereotypes in many aspects of society is a well-known phenomenon. In this paper, we focus on studying such stereotypes and bias in the Hindi movie industry (Bollywood). We analyze movie plots and posters for all movies released since 1970. Gender bias is detected by semantic modeling of plots at the inter-sentence and intra-sentence level. Different features like occupation, introduction of cast in text, associated actions and descriptions are captured to show the pervasiveness of gender bias and stereotype in movies. We derive a semantic graph, compute the centrality of each character, and observe similar bias there. We also show that such bias is not applicable to movie posters, where females get equal importance even though their character has little or no impact on the movie plot. Furthermore, we explore the movie trailers to estimate on-screen time for males and females and also study the portrayal of emotions by gender in them. The silver lining is that our...

A comparison of graph neural networks for malware classification

Journal of Computer Virology and Hacking Techniques

Managing the threat posed by malware requires accurate detection and classification techniques. Traditional detection strategies, such as signature scanning, rely on manual analysis of malware to extract relevant features, which is labor-intensive and requires expert knowledge. Function call graphs consist of a set of program functions and their inter-procedural calls, providing a rich source of information that can be leveraged to classify malware without the labor-intensive feature extraction step of traditional techniques. In this research, we treat malware classification as a graph classification problem. Based on Local Degree Profile features, we train a wide range of Graph Neural Network (GNN) architectures to generate embeddings which we then classify. We find that our best GNN models outperform previous comparable research involving the well-known MalNet-Tiny Android malware dataset. In addition, our GNN models do not suffer from the overfitting issues that commonly afflict non-GNN techniques, although GNN models require longer training times.
