Júlia Couto | Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
Papers by Júlia Couto
Proceedings of the 23rd International Conference on Enterprise Information Systems, 2021
In the past few years, software engineering has increasingly automated several tasks, and machine learning tools and techniques are among the main strategies used to assist in this process. However, there are still challenges to be overcome so that software engineering projects can increasingly benefit from machine learning. In this paper, we seek to understand the main challenges faced by people who use machine learning to assist in their software engineering tasks. To identify these challenges, we conducted a systematic review in eight online search engines, looking for papers that present the challenges their authors faced when using machine learning techniques and tools to execute software engineering tasks. This research focuses on the classification and discussion of eight groups of challenges: data labeling, data inconsistency, data costs, data complexity, lack of data, non-transferable results, parameterization of the models, and quality of the models. Our results can be used by people who intend to start using machine learning in their software engineering projects to be aware of the main issues they can face.
Proceedings of the 23rd International Conference on Enterprise Information Systems, 2021
Although the human brain stores images more easily than text, most of the tools adopted for software project management are based on textual reports. The number of software projects that fail is huge, and the stakeholders' lack of understanding of the project complexity is among the reasons for project failure. Data visualization techniques and tools can help to identify project issues and reduce misunderstandings. In this paper, we investigate how project management can benefit from data visualization. To do so, we adopted a hybrid research approach composed of a systematic mapping study, a survey, and three focus group sessions. As a result, we identify a set of 16 visualization techniques and tools that can be used to support software project management, and we propose a PMBoK extension that provides a reference for practitioners who are planning to use data visualization to support software project management.
International Conferences on Software Engineering and Knowledge Engineering, 2019
Creating an optimal number of indexes, taking into account query performance and database size, remains a challenge. In theory, one can speed up query response by creating indexes on the most used columns, at the cost of slower data insertion and deletion and a much larger amount of memory for storing the indexing data; in practice, it is very important to balance this trade-off. This is a non-trivial task that often requires action from the Database Administrator. We address this problem by introducing GADIS, A Genetic Algorithm for Database Index Selection, designed to automatically select the best configuration of indexes and adaptable to any database schema. This method aims to find the fittest individuals for optimizing both query response time and the disk space required for the indexed data. We evaluate the effectiveness of GADIS through several experiments we developed based on a standard database benchmark, compare it to three baseline indexing strategies, and show that our approach consistently leads to a better resulting index configuration.
Proceedings of the 22nd International Conference on Enterprise Information Systems, 2020
Code companion to the paper: SMARTIX: A database indexing agent based on reinforcement learning
Proceedings of the 31st International Conference on Software Engineering and Knowledge Engineering, 2019
In the past few years, data lakes emerged as a trending topic in big data technologies. Although the literature presents different points of view on their functionalities, a data lake serves mainly to store a variety of data in a big data context. In this paper, we aim to identify and analyze data lake definitions and possible architectures. Our methodology was composed of a systematic literature mapping based on PRISMA, software engineering best practices for performing reviews, and the Kappa method to assess the quality of the results. We performed the search in eight different electronic databases to cover a wide variety of publishers in Computer Science. We first identified 662 papers matching our search criteria; after filtering, we selected 87 papers for review. We found that the term data lake was first defined by James Dixon in 2010. We also found that the term is often related to raw data repositories. From the identified definitions, we propose a new one as a means to better state what data lakes refer to and improve how the community uses them. Moreover, we found that Hadoop and its ecosystem compose the most used toolset to create data lakes, revealing that this is the mainstream in architectures for data lakes given the technologies available today.
Although the human brain stores images more easily than text, most of the tools used in project management are based on textual reports, such as Microsoft Office files. Many projects still fail today, for a variety of reasons, among them the stakeholders' lack of understanding of the project. In software development, the uncertainties, ambiguities, and complexities inherent in these projects can increase the chances of failure. Data visualization tools and techniques can help clarify the project's context and details for all stakeholders, reducing the risk of project failure and facilitating communication processes. The objective of this work is to identify which visual management practices are applied to project management, both in general and across projects of various types, and to verify, among...