Huilian Sophie Qiu | Brown University
Papers by Huilian Sophie Qiu
Proceedings of the National Academy of Sciences of the United States of America, Jun 11, 2024
Interpersonal conflict in code review, such as toxic language or an unnecessary pushback, is associated with negative outcomes such as stress and turnover. Automatic detection is one approach to prevent and mitigate interpersonal conflict. Two recent automatic detection approaches were developed in different settings: a toxicity detector using text analytics for open source issue discussions and a pushback detector using logs-based metrics for corporate code reviews. This paper tests how the toxicity detector and the pushback detector can be generalized beyond their respective contexts and discussion types, and how the combination of the two can help improve interpersonal conflict detection. The results reveal connections between the two concepts.
2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS)
Proceedings of the 15th International Conference on Cooperative and Human Aspects of Software Engineering
The problem of low gender diversity in open-source software (OSS) has been reported and studied in recent years. However, prior studies found that gender bias theories from the social sciences cannot help us effectively identify gender bias effects in OSS. Our study takes the first step toward finding new measures for gender bias in OSS. This paper attempts to employ linguistic theories to identify different collaboration patterns between genders. Our contributions are twofold: we review linguistic literature on diversity and online collaboration, then we apply linguistic theories from our literature review to a random sample of code review conversations on GitHub.
1 INTRODUCTION
The low gender diversity in the open-source software (OSS) community is a well-known phenomenon: among the GitHub users whose genders can be inferred, less than 10% are women [1, 6, 15, 30]. The low gender diversity is problematic as it can threaten OSS sustainability as a whole. Firstly, low gender diversity is suboptimal for project success: studies found that higher gender diversity is associated with fewer community smells [7, 38] and higher team performance [26, 34, 40]. Moreover, the highly imbalanced gender representation and the unwelcoming culture in some open-source projects [23] may discourage underrepresented groups from initial participation, which limits opportunities both for those individuals and for employers that use OSS as a talent pool [32, 33]. One of the reasons for women's low participation is gender bias [19, 23, 39]. Based on interviews with OSS developers, Nafus [23] pointed out that, in OSS, "sexist behavior is [...] as constant as it is extreme." A quantitative study by Terrell et al. [39] reports that female contributors face unfair treatment when making code contributions.
This piece of work builds upon a prior attempt at investigating gender bias effects in OSS by Imtiaz et al. [19]. In their paper, Imtiaz et al. adapted a gender bias framework by Williams and Dempsey [42], which was developed for women in the workforce, to the context of OSS. The framework discusses four effects of gender bias women may face in the workforce. Prove-It-Again: women
2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR)
Prior studies have shown that, in open-source software (OSS), diversity is a positive indicator of productivity. Yet, code submissions from underrepresented groups are less successful. This mirrors the diversity-innovation paradox found in science: diverse groups produce more innovations, but historically underrepresented people have less successful careers in these groups. In this preliminary research, we investigate whether the diversity-innovation paradox is present in OSS. We define software innovation as a novel co-usage of two packages in the same project. Using World of Code, we identified innovations in JavaScript projects from late 2008 to early 2014. We intend to calculate diversity measures for the authors who produced the innovations and build models to test for the presence of the diversity-innovation paradox in OSS.
arXiv (Cornell University), Nov 16, 2021
The emergence of streaming data or “data in motion” has motivated the development of new “streaming” algorithms that provide up-to-date answers to continuous queries; that is, queries that are issued once and then run continuously as new data streams in. For example, in the context of network traffic management, continuous queries over streaming Netflow data may be used to detect anomalies in the network as they happen (e.g., performance degradation, onset of an attack). One of the most popular approaches for detecting unusual patterns in the network is frequent itemset mining (FIM). Answers produced by many FIM algorithms are often high-dimensional and packed with rich information. As the rate of data arrival may be rapid, interpreting the output in real time can be challenging. The main objective of this thesis is to introduce a new visualization method that can visualize the continuous stream of answers produced by existing streaming algorithms in an intuitive and meaningful mann...
Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems
Open-source software projects have become an integral part of our daily life, supporting virtually all the software we use today. Since open-source software forms digital infrastructure, maintaining it is of utmost importance. We present Climate Coach, a dashboard that helps open-source project maintainers monitor the health of their community in terms of team climate and inclusion. Through a literature review and an exploratory survey (N=18), we identified important signals that can reflect a project's health and displayed them on a dashboard. We evaluated and refined our dashboard through two rounds of think-aloud studies (N=19). We then conducted a two-week longitudinal diary study (N=10) to test the usefulness of our dashboard. We found that displaying signals related to a project's inclusion helps improve maintainers' management strategies.
Zenodo (CERN European Organization for Nuclear Research), Jan 25, 2023
Open-source software projects have become an integral part of our daily life, supporting virtually all the software we use today. Since open-source software forms digital infrastructure, maintaining it is of utmost importance. We present Climate Coach, a dashboard that helps open-source project maintainers monitor the health of their community in terms of team climate and inclusion. Through a literature review and an exploratory survey (N=18), we identified important signals that can reflect a project's health and displayed them on a dashboard. We evaluated and refined our dashboard through two rounds of think-aloud studies (N=19). We then conducted a two-week longitudinal diary study (N=10) to test the usefulness of our dashboard. We found that displaying signals related to a project's inclusion helps improve maintainers' management strategies.
Proceedings of the ACM on Human-Computer Interaction
Open source software represents an important form of digital infrastructure as well as a pathway to technical careers for many developers, but women are drastically underrepresented in this setting. Although there is a good body of literature on open source participation, there is very little understanding of the participation trajectories and contribution experiences of women developers, and how they compare to those of men developers, in open source software projects. To understand their joining and participation trajectories, we conducted interviews with 23 developers (11 men and 12 women) who became core members of an open source project. We identify differences in women's and men's motivations for initial contributions, their joining processes (e.g., women participating in projects that they have been invited to), and their sustained involvement in a project. We also describe unique negative experiences faced by women contributors in this setting at each stage of participation. Our res...
2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS)
Interpersonal conflict in code review, such as toxic language or an unnecessary pushback, is associated with negative outcomes such as stress and turnover. Automatic detection is one approach to prevent and mitigate interpersonal conflict. Two recent automatic detection approaches were developed in different settings: a toxicity detector using text analytics for open source issue discussions and a pushback detector using logs-based metrics for corporate code reviews. This paper tests how the toxicity detector and the pushback detector can be generalized beyond their respective contexts and discussion types, and how the combination of the two can help improve interpersonal conflict detection. The results reveal connections between the two concepts.
Sustained participation by contributors in open-source software is critical to the survival of open-source projects and can provide career advancement benefits to individual contributors. However, not all contributors reap the benefits of open-source participation fully, with prior work showing that women are particularly underrepresented and at higher risk of disengagement. While many barriers to participation in open-source have been documented in the literature, relatively little is known about how the social networks that open-source contributors form impact their chances of long-term engagement. In this paper we report on a mixed-methods empirical study of the role of social capital (i.e., the resources people can gain from their social connections) for sustained participation by women and men in open-source GitHub projects. After combining survival analysis on a large, longitudinal data set with insights derived from a user survey, we confirm that while social capital is benef...
I realised it might be more convenient if I put the compressed surv_data.csv in the data file instead of providing a g-drive link in the README.
2021 IEEE/ACM 13th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE), 2021
2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), 2019
Proceedings of the ACM on Human-Computer Interaction, 2019
While open-source software has become ubiquitous, its sustainability is in question: without a constant supply of contributor effort, open-source projects are at risk. While prior work has extensively studied the motivations of open-source contributors in general, relatively little is known about how people choose which project to contribute to, beyond personal interest. This question is especially relevant in transparent social coding environments like GitHub, where visible cues on personal profile and repository pages, known as signals, are known to impact impression formation and decision making. In this paper, we report on a mixed-methods empirical study of the signals that influence the contributors' decision to join a GitHub project. We first interviewed 15 GitHub contributors about their project evaluation processes and identified the important signals they used, including the structure of the README and the amount of recent activity. Then, we proceeded quantitatively to ...
AoIR Selected Papers of Internet Research, 2021
Mobile live streaming has recently emerged as a popular form of content creation on short-video apps like Douyin and TikTok, where streamers make money through virtual gifts from viewers. Virtual gifts are primarily used to reward streamers for their content or performance in a live stream, yet on Douyin social aspects seem to be more crucial for virtual gifting. We look at three types of live streams by young male hosts (= streamers) on Douyin to analyze the motivations of female viewers to watch and send virtual gifts to the hosts, and to explore the resulting power relations between hosts and viewers. We conducted 12 semi-structured interviews with female viewers of all three live streams who sent virtual gifts to hosts. We find that female viewers send virtual gifts, for example, because they want to make the host happy or to help them achieve an assigned virtual gift task. We did not find evidence of power imbalances regarding the monetary or social value of virtual gifts. While interviewees...
Proceedings of the ACM on Human-Computer Interaction
While open-source software has become ubiquitous, its sustainability is in question: without a constant supply of contributor effort, open-source projects are at risk. While prior work has extensively studied the motivations of open-source contributors in general, relatively little is known about how people choose which project to contribute to, beyond personal interest. This question is especially relevant in transparent social coding environments like GitHub (GH), where visible cues on personal profile and repository pages, known as signals, are known to impact impression formation and decision making. In this paper, we report on a mixed-methods empirical study of the signals that influence the contributors' decision to join a GH project. We first interviewed 15 GH contributors about their project evaluation processes and identified the important signals they used, including the structure of the README and the amount of recent activity. Then, we proceeded quantitatively to test the impact of each signal based on data from 9,977 GH projects. We reveal that many important pieces of information lack easily observable signals, and that some signals may be both attractive and unattractive. Our findings have direct implications for open-source maintainers and the design of social coding environments, e.g., features to be added to facilitate a better project searching experience. CCS Concepts: • Software and its engineering → Collaboration in software development; Open source model;