Taming Strings in Dynamic Languages - An Abstract Interpretation-based Static Analysis Approach (original) (raw)

Static Program Analysis for String Manipulation Languages

Electronic Proceedings in Theoretical Computer Science

In recent years, dynamic languages, such as JavaScript or Python, have been increasingly used in a wide range of fields and applications. Their tricky and misunderstood behaviors pose a hard challenge for static analysis of these programming languages. A key aspect of any dynamic language program is the multiple usage of strings, since they can be implicitly converted to another type value, transformed by string-to-code primitives or used to access an object-property. Unfortunately, string analyses for dynamic languages still lack precision and do not take into account some important string features. Moreover, string obfuscation is very popular in the context of dynamic language malicious code, for example, to hide code information inside strings and then to dynamically transform strings into executable code. In this scenario, more precise string analyses become a necessity. This paper is placed in the context of static string analysis by abstract interpretation and proposes a new semantics for string analysis, placing a first step for handling dynamic languages string features.

Static Analysis for ECMAScript String Manipulation Programs

Applied Sciences

In recent years, dynamic languages, such as JavaScript or Python, have been increasingly used in a wide range of fields and applications. Their tricky and misunderstood behaviors pose a great challenge for static analysis of these languages. A key aspect of any dynamic language program is the multiple usage of strings, since they can be implicitly converted to another type value, transformed by string-to-code primitives or used to access an object-property. Unfortunately, string analyses for dynamic languages still lack precision and do not take into account some important string features. In this scenario, more precise string analyses become a necessity. The goal of this paper is to place a first step for precisely handling dynamic language string features. In particular, we propose a new abstract domain approximating strings as finite state automata and an abstract interpretation-based static analysis for the most common string manipulating operations provided by the ECMAScript sp...

Abstract interpretation of domain-specific embedded languages

2002

A domain-specific embedded language (DSEL) is a domain-specific programming language with no concrete syntax of its own. Defined as a set of combinators encapsulated in a module, it borrows the syntax and tools (such as type-checkers and compilers) of its host language; hence it is economical to design, introduce, and maintain. Unfortunately, this economy is counterbalanced by a lack of room for growth. DSELs cannot match sophisticated domain-specific languages that offer tools for domainspecific error-checking and optimisation. These tools are usually based on syntactic analyses, so they do not work on DSELs. Abstract interpretation is a technique ideally suited to the analysis of DSELs, due to its semantic, rather than syntactic, approach. It is based upon the observation that analysing a program is equivalent to evaluating it over an abstract semantic domain. The mathematical properties of the abstract domain are such that evaluation reduces to solving a mutually recursive set of equations. This dissertation shows how abstract interpretation can be applied to a DSEL by replacing it with an abstract implementation of the same interface; evaluating a program with the abstract implementation yields an analysis result, rather than an executable. The abstract interpretation of DSELs provides a foundation upon which to build sophisticated error-checking and optimisation tools. This is illustrated with three examples: an alphabet analyser for CSP, an ambiguity test for parser combinators, and a definedness test for attribute grammars. Of these, the ambiguity test for parser combinators is probably the most important example, due to the prominence of parser combinators and their rather conspicuous lack of support for the well-known LL(k) test. In this dissertation, DSELs and their signatures are encoded using the polymorphic lambda calculus. This allows the correctness of the abstract interpretation of DSELs to be proved using the parametricity theorem: safety is derived for free from the polymorphic type of a program. Crucially, parametricity also solves a problem commonly encountered by other analysis methods: it ensures the correctness of the approach in the presence of higher-order functions.

Domain Specific Languages for Interactive Web Services

2003

Copyright c 2003, Claus Brabrand. BRICS, Department of Computer Science University of Aarhus. All rights reserved. ... Reproduction of all or part of this work is permitted for educational or research use on condition that this copyright notice is included in any copy.

Malware detection using attribute-automata to parse abstract behavioral descriptions

Most behavioral detectors of malware remain specific to a given language and platform, mostly PE executables for Windows. The objective of this paper is to define a generic approach for behavioral detection based on two layers respectively responsible for abstraction and detection. The first abstraction layer remains specific to a platform and a language. This first layer interprets the collected instructions, API calls and arguments and classifies these operations as well as the involved objects according to their purpose in the malware lifecycle. The second detection layer remains generic and is totally interoperable between the different abstraction components. This layer relies on parallel automata parsing attributegrammars where semantic rules are used for object typing (object classification) and object binding (data-flow). To feed detection and to experiment with our approach we have developed two different abstraction components: one processing system call traces from native code and one processing the VBScript interpreted language. The different experimentations have provided promising detection rates, in particular for script files (89%), with almost none false positives. In the case of process traces, the detection rate remains significant (51%) but could be increased by more sophisticated collection tools.

An Analytic Framework for JavaScript

2011

Abstract As the programming language of the web, JavaScript deserves a principled yet robust framework for static analysis. To achieve both aims simultaneously, we start from an established reduction semantics for JavaScript and systematically derive its intensional abstract interpretation. Our first step is to transform the semantics into an equivalent low-level abstract machine: the JavaScript Abstract Machine (JAM). We then derive the systematic abstraction of the entire low-level machine.

Cerberus: Query-driven Scalable Vulnerability Detection in OAuth Service Provider Implementations

Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 2022

OAuth protocols have been widely adopted to simplify user authentication and service authorization for third-party applications. However, little effort has been devoted to automatically checking the security of the libraries that service providers widely use. In this paper, we formalize the OAuth specifications and security best practices, and design Cerberus, an automated static analyzer, to find logical flaws and identify vulnerabilities in the implementation of OAuth service provider libraries. To efficiently detect security violations in a large codebase of service provider implementation, Cerberus employs a query-driven algorithm for answering queries about OAuth specifications. We demonstrate the effectiveness of Cerberus by evaluating it on datasets of popular OAuth libraries with millions of downloads. Among these high-profile libraries, Cerberus has identified 47 vulnerabilities from ten classes of logical flaws, 24 of which were previously unknown. We got acknowledged by the developers of eight libraries and had three accepted CVEs.

Synthesizing Design Models from Scenarios by Learning

2007

The elicitation of requirements is the main initial phase in the typical software engineering development cycle. A plethora of elicitation techniques for requirement engineering exist. Popular requirement engineering methods, such as the Inquiry Cycle and CREWS [NE00], exploit use cases and scenarios to specify the system's requirements. Sequence diagrams are also at the heart of the UML. A scenario is a partial fragment of the system's behavior, describing the system components, their message exchange and concurrency.