Bogdan Vasilescu | Carnegie Mellon University

Papers by Bogdan Vasilescu

Among the Machines

Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems - CHI EA '16, 2016

Social aspects of collaboration in online software communities

Quality and productivity outcomes relating to continuous integration in GitHub

Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2015, 2015

Developer onboarding in GitHub: the role of prior social links and language experience

Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2015, 2015

By no means

Proceeding of the 2nd international workshop on Emerging trends in software metrics - WETSoM '11, 2011

Fault prediction models usually employ software metrics that were previously shown to be strong predictors of defects, e.g., SLOC. However, metrics are usually defined at the micro-level (method, class, package), and should therefore be aggregated in order to provide insights into evolution at the macro-level (system). In addition to traditional aggregation techniques such as the mean, median, or sum, econometric aggregation techniques such as the Gini, Theil, and Hoover indices have recently been proposed. In this paper we wish to understand whether the aggregation technique influences the presence and strength of the relation between SLOC and defects. Our results indicate that the correlation is not strong, and that it is influenced by the aggregation technique.
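To make the aggregation step concrete, here is a minimal Python sketch of the three econometric indices named in the abstract, using their standard textbook definitions. It is an illustration only, not the paper's implementation: the function names and the sample SLOC values are hypothetical, and the code assumes non-negative metric values with a positive sum.

```python
import math


def gini(values):
    """Gini index: 0 for a perfectly even distribution, (n - 1) / n at most."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def theil(values):
    """Theil index: 0 for an even distribution, ln(n) at most."""
    mean = sum(values) / len(values)
    # Zero entries contribute nothing (the limit of x * ln(x) at 0 is 0).
    return sum((x / mean) * math.log(x / mean) for x in values if x > 0) / len(values)


def hoover(values):
    """Hoover index: the share of the total that would have to be
    redistributed to make all values equal."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / (2 * sum(values))


# Hypothetical per-class SLOC values for one system (assumed data).
sloc_per_class = [120, 35, 980, 60, 15]
for index in (gini, theil, hoover):
    print(index.__name__, round(index(sloc_per_class), 3))
```

Unlike the mean or sum, each index collapses the micro-level values into a single system-level number that describes how unevenly the metric is spread across classes.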

Perceptions of Diversity on GitHub: A User Survey

2015 IEEE/ACM 8th International Workshop on Cooperative and Human Aspects of Software Engineering, 2015

Developer initiation and social interactions in OSS: A case study of the Apache Software Foundation

Empirical Software Engineering, 2014

Maintaining a productive and collaborative team of developers is essential to Open Source Software (OSS) success, and hinges upon the trust inherent among the team. Whether a project participant is initiated as a committer is a function of both his technical contributions and also his social interactions with other project participants. One's online social footprint is arguably easier to ascertain and gather than one's technical contributions; e.g., gathering patch submission information requires mining multiple sources with different formats, and then merging the aliases from these sources. In contrast to prior work, where patch submission was found to be an essential ingredient to achieving committer status, here we investigate the extent to which the likelihood of achieving that status can be modeled solely as a social network phenomenon. For 6 different Apache Software Foundation OSS projects we compile and integrate a set of social measures of the communications network among OSS project participants and a set of technical measures, i.e., OSS developers' patch submission activities. We use these sets to predict whether a project participant will become a committer, and to characterize their socialization patterns around the time of becoming a committer. We find that the social network metrics, in particular the amount of two-way communication a person participates in, are more significant predictors of one's likelihood of becoming a committer. Further, we find that this is true to the extent that other predictors, e.g., patch submission info, need not be included in the models. In addition, we show that future committers are easy to identify with great fidelity when using the first three months of data of their social activities. Moreover, only the first month of their social links is a very useful predictor, coming within 10%
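The abstract does not say how "two-way communication" is measured. As one possible reading, the Python sketch below counts, for each participant, the distinct partners with whom messages flow in both directions. The function name, the (sender, recipient) input format, and the sample data are all assumptions made for illustration, not the paper's actual metric.

```python
from collections import defaultdict


def two_way_partners(messages):
    """Count, per sender, the distinct partners who also wrote back.

    `messages` is an iterable of (sender, recipient) pairs, an assumed
    simplification of a mailing-list communication network.
    """
    sent = defaultdict(set)
    for sender, recipient in messages:
        if sender != recipient:  # ignore self-addressed messages
            sent[sender].add(recipient)

    counts = {}
    for person, recipients in sent.items():
        # A link is two-way if the recipient has also written to this person.
        counts[person] = sum(
            1 for other in recipients if person in sent.get(other, ())
        )
    return counts


# Hypothetical message log (assumed data).
log = [("ana", "bob"), ("bob", "ana"), ("ana", "cy"), ("dee", "ana")]
print(two_way_partners(log))
```

Participants who only receive messages do not appear in the result; a fuller model would also carry the technical measures (patch submissions) as separate features.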

Men at work: the StackOverflow case

Visualizing the Complexity of Software Module Upgrades

Software developers are humans, too!

Human aspects, gamification, and social media in collaborative software engineering

Software engineering is inherently a collaborative venture. In open-source software (OSS) development, such collaborations almost always span geographies and cultures. Because of the decentralised and self-directed nature of OSS as well as the social diversity inherent to OSS communities, the success of an OSS project depends to a large extent on the social aspects of distributed collaboration and achieving coordination over distance. The goal of this dissertation research is to improve our understanding of how human aspects (e.g., gender or cultural diversity), gamification, and social media (e.g., participation in social environments such as Stack Overflow or GitHub) impact distributed collaboration in OSS.

Seeing the Forest for the Trees with New Econometric Aggregation Techniques

FLOSS 2013: A Survey Dataset about Free Software Contributors: Challenges for Curating, Sharing, and Combining

In this data paper we describe a data set obtained from an online survey of over 2,000 Free/Libre/Open Source Software (FLOSS) contributors. The survey includes questions related to personal characteristics (gender, age, civil status, nationality, etc.), education and level of English, professional status, dedication to FLOSS projects, reasons and motivations, involvement and goals. We also describe the possibilities and challenges of using private information from the survey when linked with other, publicly available data sources. In this regard, an example of data sharing will be presented and legal, ethical and technical issues will be discussed.

EnTagRec: An enhanced tag recommendation system for software information sites

Software engineers share experiences with modern technologies by means of software information sites, such as STACK OVERFLOW. These sites allow developers to label posted content, referred to as software objects, with short descriptions, known as tags. However, tags assigned to objects tend to be noisy and some objects are not well tagged.

Lean GHTorrent: GitHub data on demand

Proceedings of the 11th Working Conference on Mining Software Repositories - MSR 2014, 2014

In recent years, GITHUB has become the largest code host in the world, with more than 5M developers collaborating across 10M repositories. Numerous popular open source projects (such as Ruby on Rails, Homebrew, Bootstrap, Django or jQuery) have chosen GITHUB as their host and have migrated their code base to it. GITHUB offers tremendous research potential. For instance, it is a flagship for current open source development, a place for developers to showcase their expertise to peers or potential recruiters, and the platform where social coding features or pull requests emerged. However, GITHUB data is, to date, largely underexplored. To facilitate studies of GITHUB, we have created GHTorrent, a scalable, queriable, offline mirror of the data offered through the GITHUB REST API. In this paper we present a novel feature of GHTorrent designed to offer customisable data dumps on demand. The new GHTorrent data-on-demand service lets users request, via a web form, up-to-date GHTorrent data dumps for any collection of GITHUB repositories. We hope that by offering customisable GHTorrent data dumps we will not only lower the "barrier for entry" even further for researchers interested in mining GITHUB data (thus encouraging researchers to intensify their mining efforts), but also enhance the replicability of GITHUB studies (since a snapshot of the data on which the results were obtained can now easily accompany each study).

Formalizing correspondence rules for automotive architecture views

Architecture views have long been used in the software industry to systematically model complex systems by representing them from the perspective of related stakeholder concerns. However, consensus has not been reached on the architecture views between automotive architecture description languages and automotive architecture frameworks. Therefore, this paper presents the automotive architecture views based on an elaborate study of existing automotive architecture description techniques. Furthermore, we propose a method to formalize correspondence rules between architecture views to enforce consistency between them. The approach was implemented in a Java plugin for IBM Rational Rhapsody and evaluated in a case study based on the Adaptive Cruise Control system. The outcome of the evaluation is considered to be a useful approach for formalizing correspondences between different views and a useful tool for automotive architects.

Security and emotion: sentiment analysis of security discussions on GitHub

Proceedings of the 11th Working Conference on Mining Software Repositories - MSR 2014, 2014

Application security is becoming increasingly prevalent during software and especially web application development. Consequently, countermeasures are continuously being discussed and built into applications, with the goal of reducing the risk that unauthorized code will be able to access, steal, modify, or delete sensitive data. In this paper we gauged the presence and atmosphere surrounding security-related discussions on GitHub, as mined from discussions around commits and pull requests. First, we found that security-related discussions account for approximately 10% of all discussions on GitHub. Second, we found that more negative emotions are expressed in security-related discussions than in other discussions. These findings confirm the importance of properly training developers to address security concerns in their applications as well as the need to test applications thoroughly for security vulnerabilities in order to reduce frustration and improve overall project atmosphere.

Continuous Integration in a Social-Coding World: Empirical Evidence from GitHub

2014 IEEE International Conference on Software Maintenance and Evolution, 2014

Continuous integration is a software engineering practice of frequently merging all developer working copies with a shared main branch, e.g., several times a day.

Analysis of Advanced Aggregation Techniques for Software Metrics

A popular approach to assessing software maintainability and predicting its evolution involves collecting and analyzing code metrics. However, metrics are usually defined on a micro-level (e.g., method, class), and should therefore be aggregated in order ...

Gender and Tenure Diversity in GitHub Teams

CHI 2015

Software development is usually a collaborative venture. Open Source Software (OSS) projects are no exception; indeed, by design, the OSS approach can accommodate teams that are more open, geographically distributed, and dynamic than commercial teams. This, we find, leads to OSS teams that are quite diverse. Team diversity, predominantly in offline groups, is known to correlate with team output, mostly with positive effects. How about in OSS?
Using GITHUB, the largest publicly available collection of OSS projects, we studied how gender and tenure diversity relate to team productivity and turnover. Using regression modeling of GITHUB data and the results of a survey, we show that both gender and tenure diversity are positive and significant predictors of productivity, together explaining a sizable fraction of the data variability. These results can inform decision making on all levels, leading to better outcomes in recruiting and performance.
