Ee-Peng LIM | Singapore Management University
Papers by Ee-Peng LIM
Data & Knowledge Engineering, Sep 1, 2005
There are many software projects started daily; some are successful, while others are not. Successful projects get completed, are used by many people, and bring benefits to users. Failed projects do not bring similar benefits. In this work, we are interested in developing an effective machine learning solution that predicts project outcome (i.e., success or failure) from the developer socio-technical network. To do so, we investigate successful and failed projects to find factors that differentiate the two. We analyze the socio-technical aspect of the software development process by focusing on the people who contribute to these projects and the interactions among them. We first form a collaboration graph for each software project. We then create a training set consisting of two graph databases corresponding to successful and failed projects, respectively. A new data mining approach is then employed to extract discriminative rich patterns that appear frequently in the successful projects but rarely in the failed projects. We find that these automatically mined patterns are effective features for predicting project outcomes. We evaluate our solution on projects in SourceForge.Net, the largest open source software development portal, and show that under 10-fold cross validation, our approach could achieve an accuracy of more than 90% and an AUC score of 0.86. We also present and analyze some mined socio-technical patterns.
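As a rough illustration of the idea in this abstract (not the authors' mining algorithm), the Python sketch below treats each collaboration graph as a list of role-labelled edges, which is a hypothetical simplification of the subgraph patterns the abstract describes. It keeps the edge patterns that are frequent among successful projects but rare among failed ones and turns them into binary features. The function names, thresholds, and toy data are all assumptions made for this example.

```python
from collections import Counter


def edge_patterns(graph):
    """Return the set of undirected label pairs found in one collaboration graph.

    `graph` is a list of (role_a, role_b) edges; sorting each pair makes
    ("tester", "dev") and ("dev", "tester") the same pattern.
    """
    return {tuple(sorted(edge)) for edge in graph}


def support(pattern_sets):
    """Map each pattern to the fraction of graphs in which it occurs."""
    counts = Counter(p for patterns in pattern_sets for p in patterns)
    total = len(pattern_sets)
    return {p: c / total for p, c in counts.items()}


def discriminative_patterns(successful, failed, min_pos=0.5, max_neg=0.25):
    """Keep patterns frequent in successful graphs and rare in failed ones.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    pos = support([edge_patterns(g) for g in successful])
    neg = support([edge_patterns(g) for g in failed])
    return sorted(
        p for p, s in pos.items()
        if s >= min_pos and neg.get(p, 0.0) <= max_neg
    )


def featurize(graph, patterns):
    """Encode a graph as a 0/1 vector, one entry per selected pattern."""
    present = edge_patterns(graph)
    return [1 if p in present else 0 for p in patterns]


# Toy data (hypothetical): each project is a list of role-labelled edges.
successful = [
    [("dev", "tester"), ("dev", "dev")],
    [("dev", "tester"), ("admin", "dev")],
    [("dev", "tester")],
]
failed = [
    [("dev", "dev")],
    [("admin", "dev")],
    [("dev", "dev"), ("dev", "tester")],
    [("admin", "dev")],
]

patterns = discriminative_patterns(successful, failed)
features = featurize([("tester", "dev")], patterns)
```

The resulting feature vectors could then be passed to any standard classifier and scored under cross validation, as the abstract reports doing with the patterns it mines.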
7th International Conference on Parallel and Distributed Systems (ICPADS'00)
The recent boom of weblogs and social media has attached increasing importance to the identification of suspicious users with unusual behavior, such as spammers or fraudulent reviewers. A typical spamming strategy is to employ multiple dummy accounts to collectively promote a target, be it a URL or a product. Consequently, these suspicious accounts exhibit certain coherent anomalous behavior identifiable as a collection.
Dynamic Web service selection refers to determining a subset of component Web services to be invoked so as to orchestrate a composite Web service. Previous work in Web service selection usually assumes the invocations of Web service operations to be independent of one another. This assumption, however, does not hold in practice, as both the composite and component Web services often impose some ordering on the invocation of their operations.
Correction and analysis. Ee-Peng Lim. International Conference on Digital Libraries: Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries, 2003. Abstract not available. 80 Computer Applications (General) (CI).
Elastic-perfectly plastic solids (or structures) subjected to loads quasi-statically varying within a specified domain are addressed in the framework of large displacements and the additive strain decomposition rule.
In Service-Oriented Computing (SOC) environments, service clients interact with service providers to consume services. From the viewpoint of service clients, the trust level of a service or a service provider is a critical factor to consider in service selection, particularly when a client is looking for a service from a large set of services or service providers. However, an invoked service may be composed of other services.
Data & Knowledge Engineering. Volume 54, Issue 3, pp. 277-393 (September 2005). Fifth ACM International Workshop on Web Information and Data Management (WIDM 2003). Pages 277-278. Roger HL Chiang, Alberto HF Laender and Ee-Peng Lim. Special papers. Clustering Web pages based on their structure. Pages 279-299. Valter Crescenzi, Paolo Merialdo and Paolo Missier. Clustering documents into a web directory for bootstrapping a supervised classification. Pages 301-325.
G-Portal [1] is a Web-based digital library that collects metadata of geospatial and georeferenced resources on the Web and provides digital library services to access them. It adopts a map-based interface as its primary point of access to visualize and manipulate the distributed geospatial and georeferenced content. A classification-based interface is also provided to classify and visualize all resources. This interface is supported by a flexible classification language and the backend classification engine.
Traditionally, data integration research has focused primarily on understanding integration issues from the data instance and schema perspectives. However, when the integration of heterogeneous databases is performed without considering the semantics of local databases, an incorrectly integrated database may result. Moreover, most integration tasks must be performed manually.
Currently, fixed-price sale and online auction are the two major sale modes in applied electronic commerce systems. Bilateral negotiation does not yet perform satisfactorily in Internet-based transactions. In this paper, the time-independent feature of online negotiations is emphasized. Correspondingly, a formal mathematical model of online negotiation is established. We also present a flexible and feasible bilateral negotiation protocol, which is used in agent-based cooperative negotiation.