Hoan Nguyễn - Academia.edu
Papers by Hoan Nguyễn
Tạp chí Y học Việt Nam
Overview and objectives: Germ cell tumors are a group of malignancies that originate from germ cells during their development and migration. The tumors can arise in the gonads, such as the testis or ovary, or at extragonadal sites such as intracranial, mediastinal, sacrococcygeal, uterine, and vaginal locations, and they account for 3.5% of cancers in children under 15 years of age. Treatments for malignant germ cell tumors include surgery, chemotherapy, and radiotherapy; radiotherapy is used less and less because of the long-term consequences of radiation in children. This study aims to describe the prognosis after at least 2 years of treatment of extracranial malignant germ cell tumors and the related factors. Methods: A prospective descriptive case series in children. The study covers all pediatric patients diagnosed with extracranial malignant germ cell tumors at the Department of Oncology and Hematology, Children's Hospital 2 (Bệnh viện Nhi đồng 2), from 01/01/2011 to 31/07/2019. Data were entered with REDCap and analyzed with SPSS 20.0. Treatment efficacy was evaluated through event-free survival (EFS) and overall survival (OS): using the method...
arXiv (Cornell University), Jun 27, 2019
Modern software systems increasingly include machine learning (ML) as an integral component. However, we do not yet understand the difficulties faced by software developers when learning about ML libraries and using them within their systems. To that end, this work reports on a detailed (manual) examination of 3,243 highly-rated Q&A posts related to ten ML libraries, namely TensorFlow, Keras, scikit-learn, Weka, Caffe, Theano, MLlib, Torch, Mahout, and H2O, on Stack Overflow, a popular online technical Q&A forum. We classify these questions into seven typical stages of an ML pipeline to understand the correlation between the library and the stage. We then study the questions and perform statistical analysis to address four research objectives: finding the most difficult stage, understanding the nature of the problems, understanding the nature of the libraries, and studying whether the difficulties stayed consistent over time. Our findings reveal the urgent need for software engineering (SE) research in this area. Both static and dynamic analyses are mostly absent and badly needed to help developers find errors earlier. While there has been some early research on debugging, much more work is needed. API misuses are prevalent and API design improvements are sorely needed. Last and somewhat surprisingly, a tug of war between providing higher levels of abstraction and the need to understand the behavior of the trained model is prevalent.
The presence of distributed generators (DGs) poses challenges to maintaining the reliability of over-current protection relays (OCPRs) that protect the power distribution network. During operation to ensure power supply to the distribution network, the operating characteristics of DGs significantly change the fault current values, and this is the main cause of unreliable OCPR operation, such as loss of selectivity, reduced sensitivity, overreaching trips, or simultaneous operation. It is therefore necessary to coordinate the OCPRs of the protection system on the distribution network while taking the operating characteristics of DGs into account, so that the relays remain coordinated when they operate. In this study, the authors introduce an over-current protection coordination optimization (OCPCO) method for the protection system of a distribution network with integrated DGs. Specifically, this OCPCO method is developed using the results of...
2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), 2019
Proceedings of the 40th International Conference on Software Engineering, 2018
Software developers often use online forums such as StackOverflow (SO) to learn how to use software libraries and their APIs. However, the code snippets in such a forum often contain undeclared, ambiguous, or largely unqualified external references. Such declaration ambiguity and external reference ambiguity present challenges for developers in learning to correctly use the APIs. In this paper, we propose STATTYPE, a statistical approach to resolve the fully qualified names (FQNs) for the API elements in such code snippets. Unlike existing approaches that are based on heuristics, STATTYPE has two well-integrated factors. We first learn from a large training code corpus the FQNs that often co-occur. Then, to derive the FQN for an API name in a code snippet, we use that knowledge and also leverage the context consisting of neighboring API names. To realize those factors, we treat the problem as statistical machine translation from source code with partially qualified names to source code with FQNs of the APIs. Our empirical evaluation on real-world code and StackOverflow posts shows that STATTYPE achieves very high accuracy with 97.6% precision and 96.7% recall, a 16.5% relative improvement over the state-of-the-art approach. CCS Concepts: Software and its engineering → Software libraries and repositories; API languages.
2017 IEEE/ACM 39th International Conference on Software Engineering: New Ideas and Emerging Technologies Results Track (ICSE-NIER), 2017
Science & Technology Development Journal - Engineering and Technology, 2020
The penetration of distributed generators (DGs) into distribution networks (DNs) greatly improves the reliability of electricity supply and reduces power loss. However, the operation of these DGs can also make the protection of the distribution network more complex. This paper examines the effects of two DG types on the protection of the DN by analyzing the solution called Fault Location, Isolation and Service Restoration (FLISR). The FLISR approach considers DGs as auxiliary sources for the post-fault restoration plans in order to minimize the number of interrupted customers and the unserved energy. Moreover, the combination of the overcurrent relay setting values, the statuses of switching devices, and the loss-of-voltage warning signal is used to detect and identify the types of incidents in a distribution network with DGs. A two-constrained objective function will be solved to find possible plans for fault isolation and service restoration. There are six performance indices (PIs) selecte...
Science & Technology Development Journal - Engineering and Technology, 2020
Short-term load forecasting plays an extremely important role in the design, operation, and planning of a power system, especially for the power grid of Ho Chi Minh City (HCMC), an active city with the highest power demand in Vietnam. The data survey shows that the load power in the HCMC area changes suddenly, which causes disturbances in the load data. Accordingly, assessing the reliability of the load data is essential in the data-filtering stage that precedes the load forecasting models. This study introduces a novel statistical data-filtering method that takes into account the reliability of the input data source by analyzing many different confidence levels. Results of the proposed data-filtering method will be compared to previous data-filtering methods (such as Kalman, DBSCAN, Wavelet Transform, and SSA filtering methods). The data source used in this study was collected from more than 50 substations using the SCADA system in Ho Chi Minh City's distribution...
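The abstract above is truncated before it describes the filtering method itself, so the following is only a generic sketch of what filtering at a chosen confidence level can look like, not the authors' method. It assumes roughly normally distributed readings and rejects values outside a two-sided normal interval; the function name `filter_by_confidence` and the sample readings are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def filter_by_confidence(loads, confidence=0.95):
    """Split readings into (kept, rejected) using a two-sided normal interval.

    Assumption: readings are roughly normal, so values outside
    mean +/- z * stdev are treated as disturbances at this confidence level.
    """
    # z-score for the two-sided interval, e.g. about 1.96 for 95%
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    mu, sigma = mean(loads), stdev(loads)
    lower, upper = mu - z * sigma, mu + z * sigma
    kept = [x for x in loads if lower <= x <= upper]
    rejected = [x for x in loads if not lower <= x <= upper]
    return kept, rejected


# Hypothetical load readings (MW) with one sudden spike.
sample_loads = [100, 102, 98, 101, 99, 100, 103, 97, 100, 250]
kept_95, rejected_95 = filter_by_confidence(sample_loads, confidence=0.95)
print(rejected_95)
```

Raising the confidence level widens the interval, so fewer readings are rejected. Because the spike itself inflates the mean and standard deviation, a robust variant (for example one based on the median) may behave better on heavily disturbed data.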
MRS Proceedings, 1997
Improvements in the properties of Parylenes may enable their use in high-performance integrated circuits. Parylenes are a class of polymers formed by chemical vapor deposition that nearly meet the high standards of the low-k triumvirate, namely 1) adhesion, particularly to SiO2, 2) thermal stability above 400 °C, and 3) permittivity less than 2.7. Parylene-N has been incorporated into both aluminum-based [1] and copper-based [2] metallization schemes; however, improvements in adhesion and thermal stability are still needed to simplify the integration schemes and make them more robust. Additionally, a reduction in the permittivity would be beneficial from both device performance and extendibility points of view. We have synthesized various Parylene-N-based copolymers with improved adhesion, thermal stability, and permittivity. We discovered that a copolymer of tetravinyl-tetramethyl-cyclotetrasiloxane and Parylene-N has a permittivity close to 2.1 and both the adhesion to SiO...
Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2014
Proceedings of the 2013 companion publication for conference on Systems, programming, & applications: software for humanity, 2013
Mining source code has become a common task for researchers and has yielded significant benefits for the software engineering community. Mining source code, however, is a very difficult and time-consuming task. The Boa language and infrastructure was designed to ease mining of project and revision metadata. Recently Boa was extended to support mining source code and currently contains source code for over 23k Java projects, including full revision histories. In this demonstration we pose source code mining tasks and give solutions using Boa. We then execute these programs via our web-based infrastructure and show how to easily make the results available for future researchers.
Proceedings of the 3rd annual conference on Systems, programming, and applications: software for humanity, 2012
Analyzing the wealth of information contained in software repositories requires significant expertise in mining techniques as well as a large infrastructure. To make this information more accessible to non-experts, we present the Boa language and infrastructure. Using Boa, these mining tasks are much simpler to write because the details are abstracted away. Boa programs also run on a distributed cluster to automatically provide massive parallelization to users and return results in minutes instead of potentially days.
Lecture Notes in Computer Science, 2010
Existing version control systems are often based on text line-oriented models for change representation, which do not help software developers understand code evolution. Other advanced change representation models that encompass more program semantics and structures are still not quite practical due to their high computational complexity. This paper presents OperV, a novel operation-based version control model that is able to support both coarse and fine levels of granularity in program source code. In OperV, a software system is represented by a project tree whose nodes represent all program entities, such as packages, classes, methods, etc. The changes of the system are represented via edit operations on the tree. OperV also provides the algorithms to diff, store, and retrieve the versions of such entities. These algorithms are based on the mapping of the nodes between versions of the project tree. This mapping technique uses 1) a divide-and-conquer technique to map coarse- and fine-grained entities separately, 2) unchanged text regions to map unchanged leaf nodes, and 3) structure-based similarity of the sub-trees to map their root nodes bottom-up and then top-down. The empirical evaluation of OperV has shown that it is scalable, efficient, and could be useful in understanding program evolution.
Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1, 2010
Previous research confirms the existence of recurring bug fixes in software systems. Analyzing such fixes manually, we found that a large percentage of them occur in code peers, the classes/methods having similar roles in the systems, such as providing similar functions and/or participating in similar object interactions. Based on a graph-based representation of object usages, we have developed several techniques to identify code peers, recognize recurring bug fixes, and recommend changes for code units from the bug ...
2013 35th International Conference on Software Engineering (ICSE), 2013
PHP is a server-side language that is widely used for creating dynamic Web applications. However, as a dynamic language, PHP may induce certain programming errors that reveal themselves only at run time. A common type of error is the dangling reference, which occurs if the referred program entity has not been declared in the current program execution. To prevent the run-time errors caused by such dangling references, we introduce Dangling Reference Checker (DRC), a novel tool to statically detect those references in the source code of PHP-based Web applications. DRC first identifies the path constraints of the program executions in which a program entity appears and then matches the path constraints of the entity's declarations and references to detect dangling ones. DRC is able to detect dangling reference errors in several real-world PHP systems with high accuracy. The video demonstration for DRC is available at http://www.youtube.com/watch?v=y_AKZYhLlU4.
Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 2, 2010
New software security vulnerabilities are discovered on an almost daily basis, and it is vital to be able to identify and resolve them as early as possible. Fortunately, many software vulnerabilities are recurring or very similar; thus, one could effectively detect and fix a vulnerability in a system by consulting the similar vulnerabilities and fixes from other systems. In this paper, we propose SecureSync, an automatic approach to detect and provide suggested resolutions for recurring software vulnerabilities on multiple systems ...
Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, 2012
The links between the bug reports in an issue-tracking system and the corresponding fixing changes in a version repository are not often recorded by developers. Such linking information is crucial for research in mining software repositories that measures software defects and maintenance efforts. However, the state-of-the-art bug-to-fix link recovery approaches still rely heavily on textual matching between bug reports and commit/change logs and cannot handle well the cases where their contents are not textually similar. This paper introduces MLink, a multi-layered approach that takes into account not only textual features but also source code features of the changed code corresponding to the commit logs. It is also capable of learning the association relations between the terms in bug reports and the names of entities/components in the changed source code of the commits from the established bug-to-fix links, and uses them for link recovery between the reports and commits that do not share much similar text. Our empirical evaluation on real-world projects shows that MLink can improve the state-of-the-art bug-to-fix link recovery methods by 11-18%, 13-17%, and 8-17% in F-score, recall, and precision, respectively.
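The limitation of purely textual matching mentioned in the abstract above can be illustrated with a minimal sketch. This is not MLink and not any specific published baseline; it is a plain bag-of-words cosine similarity, and the function names and sample strings are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(report, commit_log):
    """Bag-of-words cosine similarity between a bug report and a commit log."""
    a, b = tokens(report), tokens(commit_log)
    # Only terms present in both texts contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Shared wording gives a high score.
print(cosine_similarity("crash on save", "fix crash on save"))
# A fix described in different words shares no terms and scores 0.0.
print(cosine_similarity("app crashes when saving", "handle null buffer in FileWriter"))
```

The second pair scores zero even though the commit could plausibly fix the reported bug, which is the kind of case the abstract says textual matching alone cannot handle and that source code features and learned term associations are meant to address.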
ACM SIGPLAN Notices, 2010
Reusing existing library components is essential for reducing the cost of software development and maintenance. When library components evolve to accommodate new feature requests, to fix bugs, or to meet new standards, the clients of software libraries often need to make corresponding changes to correctly use the updated libraries. Existing API usage adaptation techniques support simple adaptations such as replacing the target of calls to a deprecated API, but cannot handle complex adaptations such as creating a new object to be passed to a different API method, or adding exception handling logic that surrounds the updated API method calls. This paper presents LIBSYNC, which guides developers in adapting API usage code by learning complex API usage adaptation patterns from other clients that have already migrated to a new library version (and also from the API usages within the library's test code). LIBSYNC uses several graph-based techniques (1) to identify changes to API decla...
Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, 2012