Anne Etien - Academia.edu
Papers by Anne Etien
HAL (Le Centre pour la Communication Scientifique Directe), Mar 1, 2020
ArXiv, 2020
Advanced reverse engineering tools are required to cope with the complexity of software systems and the specific requirements of numerous different tasks (re-architecting, migration, evolution). Consequently, reverse engineering tools should adapt to a wide range of situations. Yet, because they require a large infrastructure investment, being able to reuse these tools is key. Moose is a reverse engineering environment that addresses these requirements. While Moose started as a research project 20 years ago, it is also used in industrial projects, which expose it to all these difficulties. In this paper we present ModMoose, the new version of Moose. ModMoose revolves around a new modular and extensible meta-model; a new set of generic tools (query module, visualization engine, ...); and an open architecture supporting the synchronization and interaction of tools per task. With ModMoose, tool developers can develop specific meta-models by reusing existing elementary concepts, and d...
IEEE Software, 2021
Berger-Levrault is an international company that has developed applications in GWT for more than 10 years. However, GWT is no longer actively maintained, with only one major update since 2015. To avoid being stuck with legacy technology, the company decided to migrate its applications to Angular. However, because of the size of the applications (more than 500 web pages per application), rewriting them from scratch is not desirable. To ease the migration, we designed a semi-automated approach that helps developers migrate the front-end of applications from GWT to Angular, together with a tool that performs the migration. In this paper, we present our approach and tool. We validated the approach on a concrete application migration and compared its benefits to redeveloping the application manually. We report that the semi-automated migration reduces effort compared with a manual migration. Finally, we present recommendations for future migration projects.
2021 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2021
In collaboration with Berger-Levrault, a major IT company, we are working on the migration of GWT applications to Angular. We focus on the GUI aspect of this migration, which requires both a framework switch (GWT to Angular) and a programming language switch (Java to TypeScript). Previous work identified that the GUI can be split into the UI structure and the GUI behavioral code, i.e., the code executed when the user interacts with the UI. Although the migration of the UI structure has already been studied, the migration of the GUI behavioral code has not. To help developers migrate their applications, we propose a generic four-step approach that uses a meta-model to represent the GUI behavioral code. This approach separates the GUI behavioral code into events (caller code) and the code executed when an event is fired (called code). We present the approach and its implementation on a real industrial case study: an application comprising 470 Java (GWT) classes that represent 56 web pages. We give examples of the migrated code. We evaluate the quality of the generated code with standard tools (SonarQube, codelizer) and compare it to another Java-to-TypeScript converter. The results show that our code has 53% fewer warnings and rule violations for SonarQube, and 99% fewer for codelizer.
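The separation between caller code and called code can be pictured with a toy meta-model. The sketch below is a hypothetical illustration, not the meta-model from the paper: the classes `Event` and `CalledCode` and the generator `to_angular` are assumptions, and the output only mimics Angular's `(event)="handler()"` template binding syntax plus a TypeScript method stub.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class CalledCode:
    """Called side: the code run when the event fires (hypothetical structure)."""
    name: str
    statements: List[str] = field(default_factory=list)

@dataclass
class Event:
    """Caller side: a widget event wired to some called code (hypothetical structure)."""
    widget: str  # identifies the element the binding would be attached to
    kind: str    # e.g. "click"
    handler: CalledCode

def to_angular(event: Event) -> Tuple[str, str]:
    """Return an Angular-style template binding and a TypeScript method stub."""
    binding = f'({event.kind})="{event.handler.name}()"'
    body = "\n".join(f"  {s}" for s in event.handler.statements)
    method = f"{event.handler.name}(): void {{\n{body}\n}}"
    return binding, method
```

Keeping the two sides as separate model elements means the caller side can be emitted into the template while the called side becomes a component method, each generated independently.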
INFORSID 2018 - 36th edition of INFormatique des ORganisations et Systèmes d'Information et de Décision, May 28, 2018
In a relational database, some tables serve to gather and provide complementary information for the rows of the tables that form the core of the database. These data are stored in tables that we call "nomenclature tables". Being able to distinguish them is valuable for the study, maintenance, and evolution of a database. We propose properties that characterize the nature of these tables. An experiment to validate the proposed properties is described and then applied to a case study. A classification model for nomenclature tables is built using a data mining algorithm. Its evaluation shows a precision of 88.6% and a recall of 88.7%.
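For reference, precision and recall for such a classifier follow the standard definitions, where TP counts tables correctly classified as nomenclature tables, FP counts tables wrongly classified as such, and FN counts nomenclature tables that were missed (the abstract does not give the underlying counts):

```latex
\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}
```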
It is widely acknowledged that the system functionality captured in a system model has to match the organisational requirements available in the business model. However, fitness measures are rarely integrated into design methodologies. The paper proposes a framework that eases the generation of fitness measures adapted to a given methodology, in order to quantify to what extent the business and the system fit. The framework comprises a generic level and a specific level. The former provides generic evaluation criteria and metrics expressed on the basis of business and system ontologies. The latter deals with a specific set of metrics adapted to specific business and system models. The paper presents the process for generating a specific set of measures from the generic set, illustrates it with two specific models, and shows how the generated metrics can help in making design decisions during the development of a hotel room booking system.
For more than three decades, reverse engineering has been a major issue for industries wanting to capitalise on legacy systems. Many companies have developed reverse engineering tools to help developers in their work. However, those tools have focused on traditional information systems. Working on a time-critical embedded system, we found that the available solutions focus either on structuring software behaviour or on extracting data from the system. None of them seems to clearly combine both approaches in a complementary way. In this paper, based on our industrial experiment, we list the requirements that such a tool should fulfil. We also present a short overview of existing reverse engineering tools and their features.
A BPT REVIEW Busines...
Analysing a software system requires two preliminary tasks: parsing the source code and resolving the names (identifiers) it contains. Parsing produces an Abstract Syntax Tree (AST) representing the source code. Name resolution maps every identifier found in the code to the software entity it refers to (variables, functions, classes, ...). Although solutions exist for some popular programming languages (e.g., JDT for the Java language), these two tasks can impose a significant burden on multi-language platforms (e.g., Cast, Eclipse, Rascal, Spoofax, Synectique), where a parser with name resolution must be implemented for each language analysed. For the parser, one may use a grammar of the language and a parser generator tool. For name resolution, solutions are ad hoc and must be developed by hand. We work with a company that had to create parsers and name resolvers for five languages in the past 18 months. As a solution, we describe in this paper an infrastructure tha...
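To make the two tasks concrete, the sketch below parses a small Python snippet into an AST with the standard `ast` module and then resolves each identifier use by walking a stack of scopes. It is a toy, hand-written resolver for illustration only, not the infrastructure described in the paper: the `Resolver` class is hypothetical, it handles only positional parameters and simple assignments, and it resolves names in textual order rather than with Python's full scoping rules.

```python
import ast

SOURCE = '''
x = 1
def f(y):
    z = x + y
    return z
'''

class Resolver(ast.NodeVisitor):
    """Toy resolver: map each identifier use to the scope that defines it."""

    def __init__(self):
        self.scopes = [("module", set())]  # stack of (scope name, defined names)
        self.bindings = []  # (identifier, line, defining scope or None if unresolved)

    def resolve(self, name):
        # Search from the innermost scope outwards.
        for scope_name, names in reversed(self.scopes):
            if name in names:
                return scope_name
        return None

    def visit_FunctionDef(self, node):
        self.scopes[-1][1].add(node.name)  # the function name belongs to the enclosing scope
        self.scopes.append((node.name, {a.arg for a in node.args.args}))
        for stmt in node.body:
            self.visit(stmt)
        self.scopes.pop()

    def visit_Assign(self, node):
        self.visit(node.value)  # resolve uses on the right-hand side first
        for target in node.targets:
            if isinstance(target, ast.Name):
                self.scopes[-1][1].add(target.id)

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load):
            self.bindings.append((node.id, node.lineno, self.resolve(node.id)))
```

Running `Resolver().visit(ast.parse(SOURCE))` records that `x` on line 4 resolves to the module scope while `y` and `z` resolve to `f`. Even this toy shows why resolvers are language-specific: the scoping rules live entirely in hand-written visitor code.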
Legacy software systems constitute the wealth of companies. They often exist for dozens of years and concentrate a large part of the company's knowledge, its business rules, and its savoir-faire. The requirements these systems answer have evolved over time, as have the technologies used, leading to modifications. Because these modifications occur after the software delivery, they are considered maintenance. They account for more than 80% of the software lifecycle and its cost. Maintaining a software system is a complex and useful activity that deserves to be anticipated from the design activity onward. Remodularisation phases may be useful to reduce the complexity amassed through successive evolutions and to provide a strong new basis for future evolutions. The work presented in this manuscript answers a single goal: designing systems of good quality that are easily maintainable, and managing their evolutions. Quality can be ensured and measured in different ways. In this document, I only focus on te...
2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2017
From time to time, developers perform sequences of code transformations in a systematic and repetitive way. This may happen, for example, when introducing a design pattern in a legacy system: similar classes have to be introduced, containing similar methods that are called in a similar way. Automating these sequences of transformations has been proposed in the literature to avoid errors due to their repetitive nature. However, developers still need support to identify all the relevant code locations that are candidates for transformation. Past research showed that such transformations can lag for years, with forgotten instances popping up from time to time as other evolutions bring them to light. In this paper, we evaluate three distinct code search approaches (a "structural" approach, an approach based on Information Retrieval, and an AST-based algorithm) to find code locations that would require similar transformations. We validate the candidate locations returned by these approaches on real cases previously identified in the literature. The results show that looking for code with similar roles, e.g., classes in the same hierarchy, provides interesting results, with an average recall of 87% and, in some cases, a precision of up to 70%.
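One simple reading of "code with similar roles" is to propose, as candidates, the classes that share a superclass with a class already transformed, and then score the candidates against a known list of locations. The sketch below illustrates only that idea; the hierarchy, the oracle set, and the helper names `structural_candidates` and `precision_recall` are hypothetical and not taken from the paper.

```python
# Hypothetical class hierarchy: class name -> superclass name (None for roots).
HIERARCHY = {
    "Shape": None,
    "Circle": "Shape",
    "Square": "Shape",
    "Triangle": "Shape",
    "Parser": None,
    "JsonParser": "Parser",
}

def structural_candidates(hierarchy, transformed):
    """Propose the siblings (same superclass) of classes already transformed."""
    parents = {hierarchy[c] for c in transformed if hierarchy.get(c)}
    return {c for c, p in hierarchy.items() if p in parents and c not in transformed}

def precision_recall(candidates, expected):
    """Score candidates against the locations that really needed the transformation."""
    true_pos = len(candidates & expected)
    precision = true_pos / len(candidates) if candidates else 0.0
    recall = true_pos / len(expected) if expected else 0.0
    return precision, recall
```

If `Circle` was transformed and only `Square` truly needs the same change, the siblings `Square` and `Triangle` are proposed, giving a recall of 1.0 but a precision of 0.5. This is the trade-off such a structural search tends to make: broad coverage at the cost of some false candidates.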
2019 IEEE International Conference on Software Maintenance and Evolution (ICSME)
A group of developers at Siemens Digital Industry Division approached our team to help them restructure a large legacy system. Several problems were identified, including the presence of God Classes (big classes with thousands of lines of code and hundreds of methods). They had tried different approaches based on the dependencies between classes, but none were satisfactory. Through interaction over the last three years with a lead software architect of the project, we designed a software visualization tool and an accompanying process that allow her to propose a decomposition of a God Class within one or two hours, even without prior knowledge of the class (although actually implementing the decomposition in the source code could take a week of work). In this paper, we present the process that was formalized to decompose God Classes and the tool that was designed. We give details on the system itself and on some of the classes that were decomposed. The presented process and visualisations have been used successfully for the last three years on a real industrial system at Siemens.
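As a rough intuition for what a God Class decomposition looks for, the sketch below groups methods that are connected through shared attribute accesses, using a small union-find. This is a common automatic heuristic shown only for illustration; it is not the visualization-driven process from the paper, and the function `decompose` and the example accesses are hypothetical.

```python
def decompose(method_accesses):
    """Group methods that are connected through shared attribute accesses."""
    parent = {m: m for m in method_accesses}

    def find(m):
        # Follow parent links to the group representative (with path halving).
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    owner = {}  # attribute -> first method seen accessing it
    for method, attrs in method_accesses.items():
        for attr in attrs:
            if attr in owner:
                parent[find(method)] = find(owner[attr])  # merge the two groups
            else:
                owner[attr] = method

    groups = {}
    for m in method_accesses:
        groups.setdefault(find(m), set()).add(m)
    return sorted(sorted(g) for g in groups.values())
```

For a class whose `open`/`close` methods touch `file` and whose `render`/`resize` methods touch `canvas`, the function returns the groups `["close", "open"]`, `["render", "resize"]`, plus a singleton for a method touching no attribute. Real God Classes are rarely this clean, which is one reason an interactive, architect-driven process can outperform a purely automatic split.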
Rotten green tests are passing tests that have at least one assertion that is not executed. They give developers false confidence. In this paper, we present RTj, a framework that analyzes test cases from Java projects with the goal of detecting and refactoring rotten test cases. RTj automatically discovered 427 rotten tests in 26 open-source Java projects hosted on GitHub. Using RTj, developers get automated recommendations of the tests that need to be modified to improve the quality of the applications under test.
A major benefit of Model Driven Engineering (MDE) lies in the automatic generation of artefacts from high-level models through intermediary levels using model transformations. In such a process, the input must be well designed and the model transformations should be trustworthy. Due to the specificities of models and transformations, classical software testing techniques have to be adapted. Among these techniques, mutation analysis has been ported and a set of mutation operators has been defined. However, mutation analysis currently requires considerable manual work and suffers from the test data set improvement activity. Testers see this activity as a difficult and time-consuming job, and it reduces the benefits of mutation analysis. This paper addresses the test data set improvement activity. Model transformation traceability, in conjunction with a model of mutation operators and a dedicated algorithm, makes it possible to automatically or semi-automatically produce test models that d...
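Mutation analysis judges a test set by how many deliberately faulty variants (mutants) it detects; surviving mutants point to where the test data needs improving. The sketch below shows that loop on a plain Python function rather than on a model transformation, and it is not the traceability-based algorithm of the paper: `transform`, the three mutants, and the helpers `surviving` and `mutation_score` are all hypothetical.

```python
def transform(x):
    """Hypothetical function standing in for the transformation under test."""
    return x * 2 if x > 0 else 0

# Each mutant applies one small syntactic change (a "mutation operator").
MUTANTS = {
    "gt_to_ge": lambda x: x * 2 if x >= 0 else 0,     # relational operator change
    "mul_to_add": lambda x: x + 2 if x > 0 else 0,    # arithmetic operator change
    "zero_to_one": lambda x: x * 2 if x > 0 else 1,   # constant change
}

def surviving(mutants, original, test_inputs):
    """Mutants whose output matches the original on every test input."""
    return sorted(
        name for name, mutant in mutants.items()
        if all(mutant(t) == original(t) for t in test_inputs)
    )

def mutation_score(mutants, original, test_inputs):
    """Fraction of mutants killed by the test inputs."""
    killed = len(mutants) - len(surviving(mutants, original, test_inputs))
    return killed / len(mutants)
```

With the single input `2`, all three mutants survive; switching to the inputs `[3, 0]` kills `mul_to_add` and `zero_to_one`. The remaining `gt_to_ge` mutant is equivalent (both versions return 0 when x is 0), so no test data can kill it. Telling such mutants apart from merely under-tested ones is part of what makes improving test data by hand laborious.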
Graphics Processing Units (GPUs) are known for offering high performance and power efficiency when processing algorithms that are well suited to their massively parallel architecture. Unfortunately, because parallel programming for this kind of architecture requires a complex distribution of tasks and data, developers find it difficult to implement their applications effectively. Although approaches based on source-to-source and model-to-source transformations aim to provide a low learning curve for parallel programming and to take advantage of architecture features to create optimized applications, programming remains difficult for neophytes. A Model Driven Engineering (MDE) approach for GPUs aims to hide the low-level details of GPU programming by automatically generating the application from high-level specifications. However, the application designer still has to make some adjustments to the source code to achieve better runtime performance. Directly modifying the ...
2020 IEEE/ACM 42nd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), 2020
Rotten green tests are passing tests that have at least one assertion that is not executed. They give developers a false sense of trust in the code. In this paper, we present RTj, a framework that analyzes test cases from Java projects with the goal of detecting and refactoring rotten test cases. RTj automatically discovered 418 rotten tests in 26 open-source Java projects hosted on GitHub. Using RTj, developers get automated recommendations of the tests that need to be modified to improve the quality of the applications under test. A video is available at: https://youtu.be/Uqxf-Wzp3Mg
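The definition of a rotten green test (passes, yet at least one assertion never runs) can be checked by combining a static step (find the assertions) with a dynamic step (record which lines execute). The sketch below does this for a Python test using the standard `ast` module and `sys.settrace`; it is a hypothetical analogue for illustration, not RTj, which targets Java. `TEST_SOURCE`, `assertion_lines`, `executed_lines`, and `is_rotten_green` are assumed names.

```python
import ast
import sys

# Hypothetical test: the assertion sits in a loop over an empty list,
# so the test "passes" without ever checking anything.
TEST_SOURCE = '''
def test_all_items_positive():
    items = []
    for item in items:
        assert item > 0
'''

def assertion_lines(source):
    """Static step: line numbers of every `assert` statement."""
    tree = ast.parse(source)
    return {node.lineno for node in ast.walk(tree) if isinstance(node, ast.Assert)}

def executed_lines(source, func_name, filename="<test>"):
    """Dynamic step: run the test and record which of its lines execute."""
    namespace = {}
    exec(compile(source, filename, "exec"), namespace)
    hit = set()

    def tracer(frame, event, arg):
        if frame.f_code.co_filename != filename:
            return None  # do not trace frames outside the test source
        if event == "line":
            hit.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        namespace[func_name]()
        passed = True
    except AssertionError:
        passed = False
    finally:
        sys.settrace(None)
    return passed, hit

def is_rotten_green(source, func_name):
    """A test is rotten green if it passes but some assertion never ran."""
    passed, hit = executed_lines(source, func_name)
    missed = assertion_lines(source) - hit
    return passed and bool(missed), missed
```

For `TEST_SOURCE`, the check reports the test as rotten with the assertion on line 5 never executed; filling the list with `[1, 2]` makes the assertion run and the test is no longer flagged.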
ArXiv, 2021
In several domains, it is crucial to store and manipulate data whose origin must be completely traceable in order to guarantee the consistency, trustworthiness, and reliability of the data itself, typically for ethical and legal reasons. It is also important to guarantee that these properties carry over when such data is composed and processed into new data. In this article, we present the main requirements and theoretical problems that arise in the design of a system supporting data with such capabilities. We present an architecture for implementing such a system, as well as a prototype developed in Pharo. ACM Reference Format: Ronie Salgado, Marcus Denker, Stéphane Ducasse, Anne Etien, and Vincent Aranega. 2020. Towards a Smart Data Processing and Storage Model: . In Proceedings of IWST ’20. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3139903.3139916
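The requirement that traceability "carries over" when data is composed can be illustrated by attaching a set of origin identifiers to every value and taking their union whenever values are combined. The sketch below is a minimal Python illustration under that assumption, not the Pharo prototype or architecture from the paper; `Traced`, `derive`, and the source identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet

@dataclass(frozen=True)
class Traced:
    """A value paired with the identifiers of the records it derives from."""
    value: Any
    sources: FrozenSet[str]

def derive(fn: Callable[..., Any], *inputs: Traced) -> Traced:
    """Apply fn to the raw values and carry the union of all input origins forward."""
    result = fn(*(i.value for i in inputs))
    sources = frozenset().union(*(i.sources for i in inputs))
    return Traced(result, sources)
```

Combining `Traced(10, frozenset({"sensor-1"}))` and `Traced(32, frozenset({"sensor-2"}))` with an addition yields the value 42 whose sources are both sensors. Making `Traced` immutable is a deliberate choice here: provenance cannot be silently edited after the fact, only extended by deriving new values.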