Xusheng Xiao | Arizona State University

Papers by Xusheng Xiao

arXiv (Cornell University), Feb 28, 2021

IEEE Transactions on Dependable and Secure Computing, 2021

Web3.0, often cited to drastically shape our lives, is ubiquitous. However, few studies have discussed the crucial differentiators that separate Web3.0 from the era we are currently living in. Via a thorough analysis of the recent blockchain infrastructure evolution, we capture a key invariant that characterizes this evolution, based on which we provide the first academic definition for Web3.0. Our definition is not the only way of understanding Web3.0, yet it captures the fundamental and defining trait of Web3.0, and it has two desirable properties. Under this definition, we articulate three key categories of infrastructural enablers for Web3.0: individual smart-contract capable blockchains, federated or centralized platforms capable of publishing verifiable states, and an interoperability platform to hyperconnect those state publishers to provide a unified and connected computing platform for Web3.0 applications. While innovations in all categories are necessary to fully enable Web3.0, in this paper, we present a design for the third enabler, i.e., the first interoperability platform, namely HyperService, that advances the state of the art by simultaneously delivering interoperability and programmability across heterogeneous blockchains and state publishers. HyperService is powered by two innovative designs: (i) a developer-facing programming framework that allows developers to build cross-chain applications in a unified programming model; and (ii) a secure blockchain-facing cryptographic protocol that provably realizes those applications on blockchains. We implement a prototype of HyperService in approximately 62,000 lines of code to demonstrate its practicality, usability

2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), 2021

Mobile apps have become an integral part of our daily lives. As these apps become more complex, it is critical to provide automated analysis techniques to ensure the correctness, security, and performance of these apps. A key component of these automated analysis techniques is the creation of a graphical user interface (GUI) model of an app, i.e., a window transition graph (WTG), that models windows and transitions among the windows. While existing work has provided both static and dynamic analysis to build the WTG for an app, the constructed WTG misses many transitions or contains many infeasible transitions due to the coverage issues of dynamic analysis and the over-approximation of static analysis. We propose ProMal, a "tribrid" analysis that synergistically combines static analysis, dynamic analysis, and machine learning to construct a precise WTG. Specifically, ProMal first applies static analysis to build a static WTG, and then applies dynamic analysis to verify the transiti...

2022 IEEE 49th Photovoltaics Specialists Conference (PVSC)

2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2019

As of 2017, Android smartphones held approximately 87% of the smartphone market. This vast market also promotes the development of Android malware. Nowadays, more than 38,000 malware samples targeting Android devices are found daily. With the rapid progress of mobile application programming and anti-reverse-engineering techniques, it is becoming harder to detect all kinds of malware. To address challenges in existing detection techniques, such as data obfuscation and limited code coverage, we propose a detection approach that learns malware features directly from Dalvik bytecode using a deep learning technique (CNN). The average detection time of our model is 0.22 seconds, which is much lower than that of other existing detection approaches. Meanwhile, the overall accuracy of our model exceeds 93%.

Proceedings of the Tenth Asia-Pacific Symposium on Internetware, 2018

Routing bug reports to potential fixers (i.e., bug triaging) is an integral step in software development and maintenance. However, manually inspecting and assigning bug reports is tedious and time-consuming, especially in software projects that have a large number of bug reports and developers. To make bug triaging more efficient, many machine learning and information retrieval based approaches have been proposed to automatically assign bug reports to suitable developers. However, these techniques typically ignore two important facts in bug fixing. First, for some bug reports, the bug reporter is one of the developers in the project and is likely to fix his or her own reported bugs in the future. Second, for some bug reports, there may be a tossing sequence that contains several developers, from the first potential fixer to the last actual fixer. Such tossing sequences encode valuable information for the bug triaging task, such as the dependency among developers. To make use of the above facts, we propose a sequence-to-sequence model named SeqTriage to automatically route a given bug report to its responsible fixer. Evaluation results on three different open-source projects show that the proposed approach significantly improves the accuracy of bug triaging compared with the state-of-the-art approaches (20% at best and 5% at least).

2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2018

When tests fail (e.g., throwing uncaught exceptions), automatically inferred preconditions can bring various debugging benefits to developers. If illegal inputs cause tests to fail, developers can directly insert the preconditions in the method under test to improve its robustness. If legal inputs cause tests to fail, developers can use the preconditions to infer failure-inducing conditions. To automatically infer preconditions for better support of debugging, in this paper, we propose PREINFER, a novel approach that aims to infer accurate and concise preconditions based on symbolic analysis. Specifically, PREINFER includes two novel techniques that prune irrelevant predicates in path conditions collected from failing tests, and that generalize predicates involving collection elements (i.e., array elements) to infer desirable quantified preconditions. Our evaluation on two benchmark suites and two real-world open-source projects shows PREINFER's high effectiveness on precondition inference and its superiority over related approaches.

2021 IEEE 37th International Conference on Data Engineering (ICDE), 2021

Log-based cyber threat hunting has emerged as an important solution to counter sophisticated attacks. However, existing approaches require non-trivial efforts of manual query construction and have overlooked the rich external threat knowledge provided by open-source Cyber Threat Intelligence (OSCTI). To bridge the gap, we propose THREATRAPTOR, a system that facilitates threat hunting in computer systems using OSCTI. Built upon system auditing frameworks, THREATRAPTOR provides (1) an unsupervised, lightweight, and accurate NLP pipeline that extracts structured threat behaviors from unstructured OSCTI text, (2) a concise and expressive domain-specific query language, TBQL, to hunt for malicious system activities, (3) a query synthesis mechanism that automatically synthesizes a TBQL query for hunting, and (4) an efficient query execution engine to search the big audit logging data. Evaluations on a broad set of attack cases demonstrate the accuracy and efficiency of THREATRAPTOR in practical threat hunting.

Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Companion Proceedings, 2020

With the rapid growth of Android devices, techniques that ensure high quality of mobile applications (i.e., apps) are receiving more and more attention. It is well accepted that mutation analysis is an effective approach to simulate and locate realistic faults in a program. However, there exist few practical mutation analysis tools for Android apps. Even worse, existing mutation analysis tools tend to generate a large number of mutants, many of them stillborn, which hinders broader adoption of mutation analysis. Additionally, mutation operators are usually pre-defined by such tools, leaving users little ability to define specific operators to meet their own needs. To address the aforementioned problems, we propose DROIDMUTATOR, a mutation analysis tool specifically for Android apps with configurability and extensibility. DROIDMUTATOR reduces the number of generated stillborn mutants through type checking, and the scope of mutation operators can be custo...
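The idea of user-defined mutation operators combined with filtering of stillborn mutants can be illustrated with a minimal sketch. This is not DROIDMUTATOR's implementation: the tool targets Android apps and uses type checking, whereas the sketch below mutates a small Python snippet and uses Python's built-in `compile` as a stand-in for that check. The names `generate_mutants`, `is_stillborn`, `SOURCE`, and `OPERATORS` are hypothetical.

```python
import re

def generate_mutants(source, operators):
    """Apply each (pattern, replacement) operator at every match site,
    producing one mutant per site (a single change per mutant)."""
    mutants = []
    for pattern, replacement in operators:
        for m in re.finditer(pattern, source):
            mutants.append(source[:m.start()] + replacement + source[m.end():])
    return mutants

def is_stillborn(mutant):
    """Treat a mutant as stillborn if it does not compile.
    Assumption: compilation stands in for the type check named in the abstract."""
    try:
        compile(mutant, "<mutant>", "exec")
        return False
    except SyntaxError:
        return True

# Hypothetical program under test and user-configured operators.
SOURCE = "def is_adult(age):\n    return age >= 18\n"
OPERATORS = [
    (r">=", "<"),   # relational operator replacement (yields a valid mutant)
    (r">=", "=>"),  # naive operator that yields an uncompilable mutant
]

mutants = generate_mutants(SOURCE, OPERATORS)
viable = [m for m in mutants if not is_stillborn(m)]
```

In this sketch, the operator list is plain data, so adding or restricting operators requires no change to the generation logic; the stillborn filter runs before any test execution, so invalid mutants never consume test time.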

Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, 2017

Fault localization is a well-received technique for helping developers to identify faulty statements of a program. Research has shown that the coverage of faulty statements and their predecessors in the program dependence graph is important for effective fault localization. However, app executions in Android are split into segments in different components, i.e., methods, threads, and processes, posing challenges for traditional program dependence computation, and in turn rendering fault localization less effective. We present RunDroid, a tool for recovering the dynamic call graphs of app executions in Android, assisting existing tools for more precise program dependence computation. For each execution, RunDroid captures and recovers method calls not only within the application layer, but also between applications and the Android framework. Moreover, to deal with the widely adopted multi-threaded communications in Android applications, RunDroid also captures method calls that are split among threads.

Journal of Computer Science and Technology, 2019

Bug triaging, which routes bug reports to potential fixers, is an integral step in software development and maintenance. To make bug triaging more efficient, many researchers propose to adopt machine learning and information retrieval techniques to identify suitable fixers for a given bug report. However, none of the existing proposals simultaneously takes into account the following three aspects that matter for the efficiency of bug triaging: 1) the textual content in the bug reports, 2) the metadata in the bug reports, and 3) the tossing sequence of the bug reports. To simultaneously make use of the above three aspects, we propose iTriage, which first adopts a sequence-to-sequence model to jointly learn the features of textual content and tossing sequence, and then uses a classification model to integrate the features from textual content, metadata, and tossing sequence. Evaluation results on three different open-source projects show that the proposed approach significantly improves the accuracy of bug triaging compared with the state-of-the-art approaches.

Proceedings of the 33rd International Conference on Software Engineering, 2011

Achieving high structural coverage is an important goal of software testing. To ease the task of manually producing test inputs that achieve high structural coverage, tools built on automated test-generation approaches can be employed to automatically generate such test inputs. Although such tools can easily generate high-covering test inputs for simple programs, when applied to complex programs in practice, these tools face various problems, such as dealing with method calls to external libraries, generating method-call sequences to produce desired object states, and exceeding defined boundaries of resources due to loops. Since these tools are currently not powerful enough to deal with these problems, we propose cooperative developer testing, where developers provide guidance to help tools achieve higher structural coverage. To reduce the effort developers spend in providing guidance to the tools, we propose a novel approach called Covana. Covana precisely identifies and reports problems that prevent the tools from achieving high structural coverage, primarily by determining whether branch statements containing not-covered branches have data dependencies on problem candidates.
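The data-dependency check described above can be sketched as a reachability query over a dependency map. This is a simplified illustration under stated assumptions, not Covana's implementation: the function `relevant_problems`, the dictionary-based dependency representation, and all variable and branch names are hypothetical.

```python
def relevant_problems(not_covered_branches, depends_on, problem_candidates):
    """Report, for each not-covered branch, the problem candidates that the
    branch condition transitively data-depends on.

    not_covered_branches: branch label -> variables used in its condition
    depends_on:           variable -> variables it is computed from
    problem_candidates:   set of values flagged as potential problems
    """
    def reachable(var):
        # Depth-first traversal of the data-dependency edges.
        seen, stack = set(), [var]
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            stack.extend(depends_on.get(v, ()))
        return seen

    report = {}
    for branch, used_vars in not_covered_branches.items():
        deps = set()
        for v in used_vars:
            deps |= reachable(v)
        hits = sorted(deps & problem_candidates)
        if hits:  # candidates with no dependent branch are pruned
            report[branch] = hits
    return report

# Hypothetical example: one branch depends on an external-library result.
depends_on = {
    "size": ["parsed"],
    "parsed": ["ext_lib_result"],
    "flag": ["user_input"],
}
problem_candidates = {"ext_lib_result", "unused_obj"}
not_covered = {"if size > 0": ["size"], "if flag": ["flag"]}

report = relevant_problems(not_covered, depends_on, problem_candidates)
```

Here `unused_obj` is a problem candidate that no not-covered branch depends on, so it does not appear in the report; this mirrors the goal of showing developers only the problems that actually block coverage.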

Proceedings of the 2014 International Symposium on Software Testing and Analysis, 2014

The ever-increasing reliance of today's society on software requires scalable and precise techniques for checking the correctness, reliability, and robustness of software. Object-oriented languages, including Java and C++, have been used extensively to build large-scale systems. While many scalable static analysis approaches for C and Java have been proposed, there has been comparatively little work on the static analysis of C++ programs. In this paper, we provide an abstract representation to model C++ objects, containers, references, raw pointers, and smart pointers. Further, we present a new analysis called lifetime dependency analysis, which allows us to precisely track the complex lifetime semantics of temporary objects in C++. Finally, we propose an implementation of our techniques and present promising results on a large variety of open-source software.

High structural coverage of the code under test is often used as an indicator of the thoroughness and the confidence level of testing. Dynamic symbolic execution is a testing technique that explores feasible paths of the program under test by executing it with different generated test inputs to achieve high structural coverage. It collects the symbolic constraints along the explored path and negates one of the constraints to obtain a new path. However, due to the difficulty of method-sequence generation, long-running loops, and testability issues, it may not be able to generate test inputs for every feasible path. These problems could be solved by involving developers to assist the generation of inputs for solving the constraints. To help developers figure out the problems, reporting every issue encountered is not enough, since browsing through a long list of reported issues and picking out the one most related to the problem is not an easy task either. In this paper, we propose an approach for carrying out the causal analysis of residual structural coverage in dynamic symbolic execution, which collects the reported issues and coverage information, filters out the unrelated ones, and reports the non-covered branches with the associated issues. We conducted the evaluation on a set of open-source projects, and the result shows that our approach reported ?% related issues (may be 100% without false negatives) and ?% fewer issues than the issues reported by Pex, an automated structural testing tool developed at Microsoft Research for .NET programs.
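The collect-and-negate loop of dynamic symbolic execution can be illustrated with a toy sketch. It makes several assumptions that are not from the abstract: the program under test has two integer inputs and two branches, a brute-force search over a small domain stands in for a real constraint solver, and the names `PREDICATES`, `run`, `solve`, and `explore` are hypothetical. It does not reflect how Pex is implemented.

```python
from itertools import product

# Branch predicates of a hypothetical program under test, keyed by branch id.
PREDICATES = {
    "b1": lambda x, y: x > 5,
    "b2": lambda x, y: y == x + 1,
}

def run(x, y):
    """Execute the toy program and return its path as [(branch_id, outcome), ...]."""
    path = [("b1", PREDICATES["b1"](x, y))]
    if path[0][1]:  # b2 is only reached when b1 is taken
        path.append(("b2", PREDICATES["b2"](x, y)))
    return path

def solve(constraints, domain=range(-10, 11)):
    """Brute-force stand-in for a constraint solver: find (x, y) in the
    domain that gives every listed branch its required outcome."""
    for x, y in product(domain, repeat=2):
        if all(PREDICATES[b](x, y) == want for b, want in constraints):
            return (x, y)
    return None

def explore(seed):
    """Run an input, collect its path constraints, negate each constraint
    in turn (keeping the prefix before it), and solve for new inputs."""
    seen_paths, worklist, inputs = set(), [seed], []
    while worklist:
        inp = worklist.pop()
        path = tuple(run(*inp))
        if path in seen_paths:
            continue
        seen_paths.add(path)
        inputs.append(inp)
        for i, (branch, outcome) in enumerate(path):
            flipped = list(path[:i]) + [(branch, not outcome)]
            new_input = solve(flipped)
            if new_input is not None:
                worklist.append(new_input)
    return seen_paths, inputs

paths, inputs = explore((0, 0))
```

Starting from the seed `(0, 0)`, which only exercises the false side of `b1`, the loop derives inputs for the remaining feasible paths. When `solve` returns `None` for a flipped constraint, the corresponding branch stays uncovered, which is the residual coverage the abstract refers to.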

Tool automation to reduce manual efforts is important in software testing and analysis for improving software quality. When dealing with complex software, cooperation that synergistically combines the strengths of users and tools is greatly needed and yet lacks support in state-of-the-art research and practice. This talk presents a methodology of cooperative testing and analysis, where users make informed decisions when cooperating with software testing and analysis tools to accomplish tasks more effectively. This talk also presents a program-analysis technique for precisely identifying and reporting the problems that prevent test-generation tools from achieving high structural coverage. This technique enables users to help the tools address only the relevant problems, reducing users' efforts in providing guidance. Finally, this talk presents another program-analysis technique for computing information flows and classifying them as safe or unsafe based on a tamper analysis. This flow information explains how applications use permissions, enabling users to make informed decisions on using private data or anonymized data. Such information also enables mobile platforms to provide default settings that expose private data only for safe flows, minimizing the decision making required from users.