Bogdan Vasilescu | Carnegie Mellon University
Papers by Bogdan Vasilescu
Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems - CHI EA '16, 2016
Additional members of the reading committee:
Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2015, 2015
Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2015, 2015
Proceeding of the 2nd international workshop on Emerging trends in software metrics - WETSoM '11, 2011
Fault prediction models usually employ software metrics that were previously shown to be strong predictors of defects, e.g., SLOC. However, metrics are usually defined on a micro-level (method, class, package), and should therefore be aggregated in order to provide insights into the evolution at the macro-level (system). In addition to traditional aggregation techniques such as the mean, median, or sum, econometric aggregation techniques such as the Gini, Theil, and Hoover indices have recently been proposed. In this paper we wish to understand whether the aggregation technique influences the presence and strength of the relation between SLOC and defects. Our results indicate that the correlation is not strong, and is influenced by the aggregation technique.
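To make the aggregation step concrete, here is a minimal sketch of the three econometric indices named in the abstract, using their standard textbook definitions. It is not the paper's implementation; the function names and the sample SLOC values are hypothetical, and the handling of empty or all-zero inputs (returning 0.0) is an assumption made for illustration.

```python
import math
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini index of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # assumption: treat degenerate input as "no inequality"
    # Rank-weighted sum over the ascending values, ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def theil(values: Sequence[float]) -> float:
    """Theil index: mean of (x/mean) * ln(x/mean), with 0 * ln(0) taken as 0."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    mean = total / n
    return sum((x / mean) * math.log(x / mean) for x in values if x > 0) / n


def hoover(values: Sequence[float]) -> float:
    """Hoover index: share of the total that would have to be redistributed."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    mean = total / n
    return 0.5 * sum(abs(x - mean) for x in values) / total


if __name__ == "__main__":
    # Hypothetical class-level SLOC values for one system snapshot.
    sloc_per_class = [120, 80, 45, 2300, 60]
    for name, index in (("Gini", gini), ("Theil", theil), ("Hoover", hoover)):
        print(f"{name}: {index(sloc_per_class):.3f}")
```

Each function collapses a list of micro-level measurements into a single system-level number, which is the role the mean, median, or sum plays in the traditional approach.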
2015 IEEE/ACM 8th International Workshop on Cooperative and Human Aspects of Software Engineering, 2015
Empirical Software Engineering, 2014
Maintaining a productive and collaborative team of developers is essential to Open Source Software (OSS) success, and hinges upon the trust inherent among the team. Whether a project participant is initiated as a committer is a function of both his technical contributions and his social interactions with other project participants. One's online social footprint is arguably easier to ascertain and gather than one's technical contributions; e.g., gathering patch submission information requires mining multiple sources with different formats, and then merging the aliases from these sources. In contrast to prior work, where patch submission was found to be an essential ingredient to achieving committer status, here we investigate the extent to which the likelihood of achieving that status can be modeled solely as a social network phenomenon. For 6 different Apache Software Foundation OSS projects we compile and integrate a set of social measures of the communications network among OSS project participants and a set of technical measures, i.e., OSS developers' patch submission activities. We use these sets to predict whether a project participant will become a committer, and to characterize their socialization patterns around the time of becoming a committer. We find that the social network metrics, in particular the amount of two-way communication a person participates in, are more significant predictors of one's likelihood of becoming a committer. Further, we find that this is true to the extent that other predictors, e.g., patch submission info, need not be included in the models. In addition, we show that future committers are easy to identify with great fidelity when using the first three months of data of their social activities. Moreover, only the first month of their social links is a very useful predictor, coming within 10%
Software engineering is inherently a collaborative venture. In open-source software (OSS) development, such collaborations almost always span geographies and cultures. Because of the decentralised and self-directed nature of OSS as well as the social diversity inherent to OSS communities, the success of an OSS project depends to a large extent on the social aspects of distributed collaboration and achieving coordination over distance. The goal of this dissertation research is to raise our understanding of how human aspects (e.g., gender or cultural diversity), gamification and social media (e.g., participation in social environments such as Stack Overflow or GitHub) impact distributed collaboration in OSS.
In this data paper we describe a data set obtained from an online survey of over 2,000 Free/Libre/Open Source Software (FLOSS) contributors. The survey includes questions related to personal characteristics (gender, age, civil status, nationality, etc.), education and level of English, professional status, dedication to FLOSS projects, reasons and motivations, involvement and goals. We also describe the possibilities and challenges of using private information from the survey when linked with other, publicly available data sources. In this regard, an example of data sharing will be presented and legal, ethical and technical issues will be discussed.
Software engineers share experiences with modern technologies by means of software information sites, such as STACK OVERFLOW. These sites allow developers to label posted content, referred to as software objects, with short descriptions, known as tags. However, tags assigned to objects tend to be noisy and some objects are not well tagged.
Proceedings of the 11th Working Conference on Mining Software Repositories - MSR 2014, 2014
In recent years, GITHUB has become the largest code host in the world, with more than 5M developers collaborating across 10M repositories. Numerous popular open source projects (such as Ruby on Rails, Homebrew, Bootstrap, Django or jQuery) have chosen GITHUB as their host and have migrated their code base to it. GITHUB offers tremendous research potential. For instance, it is a flagship for current open source development, a place for developers to showcase their expertise to peers or potential recruiters, and the platform where social coding features or pull requests emerged. However, GITHUB data is, to date, largely underexplored. To facilitate studies of GITHUB, we have created GHTorrent, a scalable, queriable, offline mirror of the data offered through the GITHUB REST API. In this paper we present a novel feature of GHTorrent designed to offer customisable data dumps on demand. The new GHTorrent data-on-demand service offers users the possibility to request via a web form up-to-date GHTorrent data dumps for any collection of GITHUB repositories. We hope that by offering customisable GHTorrent data dumps we will not only lower the "barrier for entry" even further for researchers interested in mining GITHUB data (thus encouraging researchers to intensify their mining efforts), but also enhance the replicability of GITHUB studies (since a snapshot of the data on which the results were obtained can now easily accompany each study).
Architecture views have long been used in the software industry to systematically model complex systems by representing them from the perspective of related stakeholder concerns. However, consensus has not been reached on the architecture views between automotive architecture description languages and automotive architecture frameworks. Therefore, this paper presents the automotive architecture views based on an elaborate study of existing automotive architecture description techniques. Furthermore, we propose a method to formalize correspondence rules between architecture views to enforce consistency between them. The approach was implemented in a Java plugin for IBM Rational Rhapsody and evaluated in a case study based on the Adaptive Cruise Control system. The outcome of the evaluation is considered to be a useful approach for formalizing correspondences between different views and a useful tool for automotive architects.
Proceedings of the 11th Working Conference on Mining Software Repositories - MSR 2014, 2014
Application security is becoming increasingly prevalent during software and especially web application development. Consequently, countermeasures are continuously being discussed and built into applications, with the goal of reducing the risk that unauthorized code will be able to access, steal, modify, or delete sensitive data. In this paper we gauged the presence and atmosphere surrounding security-related discussions on GitHub, as mined from discussions around commits and pull requests. First, we found that security-related discussions account for approximately 10% of all discussions on GitHub. Second, we found that more negative emotions are expressed in security-related discussions than in other discussions. These findings confirm the importance of properly training developers to address security concerns in their applications as well as the need to test applications thoroughly for security vulnerabilities in order to reduce frustration and improve overall project atmosphere.
2014 IEEE International Conference on Software Maintenance and Evolution, 2014
Continuous integration is a software engineering practice of frequently merging all developer working copies with a shared main branch, e.g., several times a day.
A popular approach to assessing software maintainability and predicting its evolution involves collecting and analyzing code metrics. However, metrics are usually defined on a micro-level (e.g., method, class), and should therefore be aggregated in order ...
CHI 2015
Software development is usually a collaborative venture. Open Source Software (OSS) projects are no exception; indeed, by design, the OSS approach can accommodate teams that are more open, geographically distributed, and dynamic than commercial teams. This, we find, leads to OSS teams that are quite diverse. Team diversity, predominantly in offline groups, is known to correlate with team output, mostly with positive effects. How about in OSS?
Using GITHUB, the largest publicly available collection of OSS projects, we studied how gender and tenure diversity relate to team productivity and turnover. Using regression modeling of GITHUB data and the results of a survey, we show that both gender and tenure diversity are positive and significant predictors of productivity, together explaining a sizable fraction of the data variability. These results can inform decision making on all levels, leading to better outcomes in recruiting and performance.