Vincenzo Arceri - Academia.edu

Papers by Vincenzo Arceri

Improving Dynamic Code Analysis by Code Abstraction

Electronic Proceedings in Theoretical Computer Science, Sep 5, 2021

In this paper, our aim is to propose a model for code abstraction, based on abstract interpretation, allowing us to improve the precision of a recently proposed static analysis by abstract interpretation of dynamic languages. The problem we tackle here is that the analysis may add some spurious code to the string-to-execute abstract value and this code may need some abstract representations in order to make it analyzable. This is precisely what we propose here, where we drive the code abstraction by the analysis we have to perform.

Relational String Abstract Domains

Lecture Notes in Computer Science, 2022

Abstract Domains for Type Juggling

Electronic Notes in Theoretical Computer Science, Mar 1, 2017

Web scripting languages, such as PHP and JavaScript, provide a wide range of dynamic features that make them both flexible and error-prone. In order to prevent bugs in web applications, there is a sore need for powerful static analysis tools. In this paper, we investigate how Abstract Interpretation may be leveraged to provide a precise value analysis providing rich typing information that can be a useful component for such tools. In particular, we define the formal semantics for a core of PHP that illustrates type juggling, the implicit type conversions typical of PHP, and investigate the design of abstract domains and operations that, while still scalable, are expressive enough to cope with type juggling. We believe that our approach can also be applied to other languages with implicit type conversions.

Lifting String Analysis Domains

Intelligent Systems Reference Library, 2023

An Abstract Domain for Objects in Dynamic Programming Languages

Lecture Notes in Computer Science, 2020

Dynamic languages, such as JavaScript, PHP, Python or Ruby, provide a memory model for object data structures allowing programmers to dynamically create, manipulate, and delete objects' properties. Moreover, in dynamic languages it is possible to access and update properties by using strings: this represents a hard challenge for static analysis. In this paper, we exploit the finite state automata abstract domain, approximating strings, in order to define a novel abstract domain for objects. We design an abstract interpreter useful to analyze objects in a toy language, inspired by real-world dynamic programming languages. We then show, by means of minimal yet expressive examples, the precision of the proposed abstract domain.
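
The string-keyed property access described above can be illustrated with a small sketch. It is not the automata-based domain of the paper: as a deliberately simplified stand-in, property names are abstracted by finite sets of strings. The names `AbstractObject`, `abstract_lookup`, and the `"undefined"` marker are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set

# Hypothetical abstract object: each known property name maps to the set
# of abstract values (here just type names) the property may hold.
AbstractObject = Dict[str, Set[str]]


def abstract_lookup(obj: AbstractObject, keys: Set[str]) -> Set[str]:
    """Look up a property whose name is only known to lie in `keys`.

    The result joins (unions) the values of every property the abstract
    key may denote. A name missing from the object contributes the
    assumed marker "undefined".
    """
    result: Set[str] = set()
    for key in keys:
        result |= obj.get(key, {"undefined"})
    return result


# Example: the analysis only knows the key is either "name" or "age".
person: AbstractObject = {"name": {"string"}, "age": {"number"}}
imprecise = abstract_lookup(person, {"name", "age"})
precise = abstract_lookup(person, {"name"})
```

In this sketch, the less the analysis knows about the key string, the more properties the lookup has to merge, which is why the precision of the string abstraction matters for objects.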

BIOCHAIN: towards a platform for securely sharing microbiological data

International Database Engineered Applications Symposium Conference

There is a need to persuade public and private entities to share their currently unexposed bio-data banks by preserving ownership and secrecy. The reason is to make available results that can be obtained by massively exploiting the content of such data by modern machine learning approaches. Digital catalogues of data collections are being provided. However, they are not developed to protect private content that may be shared according to privileges assigned by the owners. Here, we present BIOCHAIN, a data-sharing module which will be the basis for a computational platform aimed at performing federated data analysis. The platform is intended to be used by a consortium of private and public institutions in the field of microbiology. BIOCHAIN makes use of blockchain technology to guarantee fairness among entities of the consortium by allowing them to securely share their data. CCS CONCEPTS • Information systems → Data exchange; Data provenance; Distributed database transactions.

Decoupling the Ascending and Descending Phases in Abstract Interpretation

Lecture Notes in Computer Science, 2022

Abstract Interpretation approximates the semantics of a program by mimicking its concrete fixpoint computation on an abstract domain A. The abstract (post-) fixpoint computation is classically divided into two phases: the ascending phase, using widenings as extrapolation operators to enforce termination, is followed by a descending phase, using narrowings as interpolation operators, so as to mitigate the effect of the precision losses introduced by widenings. In this paper we propose a simple variation of this classical approach where, to more effectively recover precision, we decouple the two phases: in particular, before starting the descending phase, we replace the domain A with a more precise abstract domain D. The correctness of the approach is justified by casting it as an instance of the A²I framework. After demonstrating the new technique on a simple example, we summarize the results of a preliminary experimental evaluation, showing that it is able to obtain significant precision improvements for several choices of the domains A and D.
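
The classical two-phase scheme summarized above can be sketched on the interval domain. This is a minimal illustration of the standard ascending (widening) and descending (narrowing) iteration for the loop `i = 0; while i < 100: i += 1`. It does not implement the decoupled technique the paper proposes, and the names `Interval`, `loop_head_step`, and `analyze` are assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def join(self, other: "Interval") -> "Interval":
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def widen(self, other: "Interval") -> "Interval":
        # Standard interval widening: an unstable bound jumps to infinity.
        lo = self.lo if other.lo >= self.lo else -math.inf
        hi = self.hi if other.hi <= self.hi else math.inf
        return Interval(lo, hi)

    def narrow(self, other: "Interval") -> "Interval":
        # Standard interval narrowing: only infinite bounds are refined.
        lo = other.lo if self.lo == -math.inf else self.lo
        hi = other.hi if self.hi == math.inf else self.hi
        return Interval(lo, hi)


def loop_head_step(x: Interval, bound: int) -> Interval:
    """Abstract value at the loop head after one more iteration of
    `i = 0; while i < bound: i += 1`, given the current value x."""
    init = Interval(0, 0)
    guarded_hi = min(x.hi, bound - 1)  # guard i < bound on integers
    if x.lo > guarded_hi:              # guard is unsatisfiable
        return init
    body = Interval(x.lo + 1, guarded_hi + 1)  # effect of i += 1
    return init.join(body)


def analyze(bound: int = 100):
    x = Interval(0, 0)
    while True:  # ascending phase: widening enforces termination
        nxt = x.widen(loop_head_step(x, bound))
        if nxt == x:
            break
        x = nxt
    after_widening = x
    while True:  # descending phase: narrowing recovers precision
        nxt = x.narrow(loop_head_step(x, bound))
        if nxt == x:
            break
        x = nxt
    return after_widening, x
```

In this sketch the ascending phase stabilizes at `[0, +inf]` for the loop head, and the descending phase refines it to `[0, 100]`.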

Taming Strings in Dynamic Languages - An Abstract Interpretation-based Static Analysis Approach

Static analysis for dummies: experiencing LiSA

Proceedings of the 10th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis, 2021

Semantics-based static analysis requires a significant theoretical background before being able to design and implement a new analysis. Unfortunately, developing even a toy static analyzer from scratch requires implementing an infrastructure (parser, control flow graph representation, fixpoint algorithms, etc.) that is too demanding for bachelor and master students in computer science. This difficulty can hamper the acquisition of skills in software verification, which are of major importance for the design of secure systems. In this paper, we show how LiSA (Library for Static Analysis) can play a role in that respect. LiSA implements the basic infrastructure that allows a non-expert user to develop even simple analyses (e.g., dataflow and numerical non-relational domains) focusing only on the design of the appropriate representation of the property of interest and of the sound approximation of the program statements. CCS Concepts: • Software and its engineering → General programming languages; • Theory of computation → Program analysis.

Twinning Automata and Regular Expressions for String Static Analysis

Lecture Notes in Computer Science, 2021

In this paper we formalize Tarsis, a new abstract domain based on abstract interpretation theory that approximates string values through finite state automata. The main novelty of Tarsis is that it works over an alphabet of strings instead of single characters. On the one hand, such an approach requires a more complex and refined definition of the widening operator, and the abstract semantics of string operators. On the other hand, it is in a position to obtain strictly more precise results than state-of-the-art approaches. We implemented a prototype of Tarsis, and we applied it to some case studies taken from some of the most popular Java libraries manipulating string values. The experimental results confirm that Tarsis is in a position to obtain strictly more precise results than existing analyses.

Completeness of Abstract Domains for String Analysis of JavaScript Programs

Theoretical Aspects of Computing – ICTAC 2019, 2019

Completeness in abstract interpretation is a well-known property, which ensures that the abstract framework does not lose information during the abstraction process, with respect to the property of interest. Completeness has never been taken into account for existing string abstract domains, due to the fact that it is difficult to prove it formally. However, the effort is fully justified when dealing with string analysis, which is a key issue to guarantee security properties in many software systems, in particular for JavaScript programs where poorly managed string manipulating code often leads to significant security flaws. In this paper, we address completeness for the main JavaScript-specific string abstract domains, we provide suitable refinements of them, and we discuss the benefits of guaranteeing completeness in the context of abstract-interpretation-based string analysis of dynamic languages.

Completeness of string analysis for dynamic languages

Information and Computation, 2021

In Abstract Interpretation, completeness ensures that the analysis does not lose information with respect to the property of interest. In particular, for dynamic languages like JavaScript, completeness of string analysis is a key security issue, as poorly managed string manipulation code may easily lead to significant security flaws. In this paper, we provide a systematic and constructive approach for generating the completion of string domains for dynamic languages, and we apply it to the refinement of existing string abstractions. We also provide an effective procedure to measure the precision improvement obtained when lifting the analysis to complete domains.

Static Analysis for ECMAScript String Manipulation Programs

Applied Sciences, 2020

In recent years, dynamic languages, such as JavaScript or Python, have been increasingly used in a wide range of fields and applications. Their tricky and misunderstood behaviors pose a great challenge for static analysis of these languages. A key aspect of any dynamic language program is the multiple usage of strings, since they can be implicitly converted to another type value, transformed by string-to-code primitives or used to access an object-property. Unfortunately, string analyses for dynamic languages still lack precision and do not take into account some important string features. In this scenario, more precise string analyses become a necessity. The goal of this paper is to place a first step for precisely handling dynamic language string features. In particular, we propose a new abstract domain approximating strings as finite state automata and an abstract interpretation-based static analysis for the most common string manipulating operations provided by the ECMAScript sp...

Static Program Analysis for String Manipulation Languages

Electronic Proceedings in Theoretical Computer Science, 2019

In recent years, dynamic languages, such as JavaScript or Python, have been increasingly used in a wide range of fields and applications. Their tricky and misunderstood behaviors pose a hard challenge for static analysis of these programming languages. A key aspect of any dynamic language program is the multiple usage of strings, since they can be implicitly converted to another type value, transformed by string-to-code primitives or used to access an object-property. Unfortunately, string analyses for dynamic languages still lack precision and do not take into account some important string features. Moreover, string obfuscation is very popular in the context of dynamic language malicious code, for example, to hide code information inside strings and then to dynamically transform strings into executable code. In this scenario, more precise string analyses become a necessity. This paper is placed in the context of static string analysis by abstract interpretation and proposes a new semantics for string analysis, placing a first step for handling dynamic languages string features.

A sound abstract interpreter for dynamic code

Proceedings of the 35th Annual ACM Symposium on Applied Computing, 2020

Dynamic languages, such as JavaScript, employ string-to-code primitives to turn dynamically generated text into executable code at run-time. These features make standard static analysis extremely hard if not impossible because its essential data structures, i.e., the control-flow graph and the system of recursive equations associated with the program to analyze, are themselves dynamically mutating objects. Hence, the need to handle string-to-code statements approximating what they can execute, and therefore allowing the analysis to continue (even in presence of string-to-code statements) with an acceptable degree of precision. In order to reach this goal, we propose a static analysis allowing us to collect string values and allowing us to soundly over-approximate and analyze the code potentially executed by a string-to-code statement.
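
A small sketch may help make the string-to-code problem concrete. It uses Python's built-in `exec` as the string-to-code primitive (the abstract speaks of JavaScript, so this is an analogy), and a finite set of strings stands in for the collected string values. The functions `build_code`, `run`, and `possible_code` are hypothetical names for illustration, and nothing here reproduces the paper's analysis.

```python
from typing import Dict, Set


def build_code(flag: bool) -> str:
    # The program text is assembled at run-time from string fragments,
    # so it does not appear anywhere in the source as code.
    name = "x" if flag else "y"
    return name + " = " + "4" + "2"


def run(flag: bool) -> Dict[str, int]:
    env: Dict[str, int] = {}
    exec(build_code(flag), {}, env)  # string-to-code primitive
    return env


def possible_code() -> Set[str]:
    # Toy over-approximation: without knowing `flag`, collect every
    # string the exec call may receive and treat each one as code that
    # is potentially executed.
    return {build_code(flag) for flag in (True, False)}
```

Here `run(True)` defines `x`, `run(False)` defines `y`, and an analysis that does not know `flag` has to account for both programs returned by `possible_code()`.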
