Thiago Ferreira - Academia.edu

Papers by Thiago Ferreira

Ítaca

This text brings Derrida's propositions on a blindness within the experience of visibility together with the axiom “The true artist helps the world by revealing mystic truths”, which appears in Bruce Nauman's work Window or wall sign (1967), in order to outline a parallel between the interdictions inherent to language and artistic practice. The stance of announcement that Nauman's work assumes is used here as a bridge to a discussion of a mode of thought that renounces the link between visuality and certainty and that, unable to make visible what lies ahead, chooses instead to announce, to feel its way forward.

IEEE Transactions on Services Computing

arXiv (Cornell University), Apr 12, 2017

Referring expression generation (REG) models that use speaker-dependent information require a considerable amount of training data produced by every individual speaker, or may otherwise perform poorly. In this work we present a simple REG experiment that allows the use of larger training data sets by grouping speakers according to their overspecification preferences. Intrinsic evaluation shows that this method generally outperforms the personalised method found in previous work.

Software containers, such as Docker, have recently come to be considered the mainstream technology for providing reusable software artifacts. Developers can easily build and deploy their applications based on the large number of reusable Docker images that are publicly available. Thus, a current popular trend in industry is to move towards the containerization of applications. However, container-based projects comprise different components, including the Docker and Docker-compose files and several other dependencies on the source code, combining different containers and facilitating the interactions with them. Like any other complex system, container-based projects are prone to various quality and technical debt issues related to different artifacts: Docker and Docker-compose files, and regular source code ones. Unfortunately, there is a knowledge gap regarding how container-based projects actually evolve and are maintained. In this paper, we address this gap by studying refactorings, i.e., structural changes that preserve behavior, applied in open-source Docker projects, and the technical debt issues they alleviate. We analyzed 68 projects, consisting of 19.5 MLOC, along with 193 manually examined commits. The results indicate that developers refactor these Docker projects for a variety of reasons that are specific to the configuration, combination and execution of containers, leading to several new technical debt categories and refactoring types compared to existing refactoring domains. For instance, refactorings for reducing the image size of Dockerfiles, improving the extensibility of Docker-compose files, and regular source code refactorings are mainly associated with the evolution of Docker and Docker-compose files. We also introduced 24 new Docker-specific refactorings and technical debt categories, respectively, and defined different best practices. The implications of this study will assist practitioners, tool builders, and educators in improving the quality of Docker projects.

Revista ECO-Pós, 2020

This review discusses the book Under the Cover of Chaos: Trump and the Battle for the American Right, written by Lawrence Grossberg, a professor in the Department of Communication at the University of North Carolina. In it, the author carries out a conjunctural analysis of the United States starting from Trump's victory, highlighting the affects and disputes articulated with the various right-wing formations in that country. Finally, Grossberg proposes that we tell better stories, making explicit the existence of distinct temporalities surrounding these right-wing formations and the need for political action that takes the place of imagination into account.

Proceedings of the 28th International Conference on Computational Linguistics, 2020

This paper introduces the first corpus for Automatic Post-Editing of English and a low-resource language, Brazilian Portuguese. The source English texts were extracted from the WebNLG corpus and automatically translated into Portuguese using a state-of-the-art industrial neural machine translator. Post-edits were then obtained in an experiment with native speakers of Brazilian Portuguese. To assess the quality of the corpus, we performed error analysis and computed complexity indicators measuring how difficult the APE task would be. We report preliminary results of Phrase-Based and Neural Machine Translation Models on this new corpus. Data and code are publicly available in our repository.

In the geographical database context, the UML profile called GeoProfile is used in the conceptual modeling of geographical data with well-defined metamodel topology constraints through the use of Object Constraint Language (OCL). This paper describes the process of automatic transformation of GeoProfile constructors and their spatial constraints along the different levels of the MDA architecture. The process was tested in the Enterprise Architect CASE tool. The proposal includes extending the OCLtoSQL plugin to automatically create triggers that enforce the topology integrity constraints of geographical data in the Oracle Spatial DBMS. Resumo. The GeoProfile UML profile was proposed to assist in the design of geographical databases. GeoProfile is used during the conceptual modeling of geographical data, with the topological constraints well defined in its metamodel and specified in Object Constraint Language (OCL). This paper describes the process of automatic transformation of the...

Data-to-text Natural Language Generation (NLG) is the computational process of generating natural language in the form of text or voice from non-linguistic data. A core micro-planning task within NLG is referring expression generation (REG), which aims to automatically generate noun phrases to refer to entities mentioned as discourse unfolds. A limitation of novel REG models is not being able to generate referring expressions to entities not encountered during the training process. To solve this problem, we propose two extensions to NeuralREG, a state-of-the-art encoder-decoder REG model. The first is a copy mechanism, whereas the second consists of representing the gender and type of the referent as inputs to the model. Drawing on the results of automatic and human evaluation as well as an ablation study using the WebNLG corpus, we contend that our proposal contributes to the generation of more meaningful referring expressions to unseen entities than the original system and related...

This demo paper introduces DaMata, a robot-journalist covering deforestation in the Brazilian Amazon. The robot-journalist is based on a pipeline architecture of Natural Language Generation, which yields multilingual daily and monthly reports based on the public data provided by DETER, a real-time deforestation satellite monitor developed and maintained by the Brazilian National Institute for Space Research (INPE). DaMata automatically generates reports in Brazilian Portuguese and English and publishes them on the Twitter platform. Corpus and code are publicly available.

IEEE Software, 2021

Several Software Engineering problems are complex and encompass a great number of objectives to be handled. However, practitioners may face several challenges in adopting existing metaheuristic search for their problems, due to a lack of background or to difficult choices such as change operators and parameter tuning. Nautilus Framework allows practitioners to develop and experiment with several multi- and many-objective evolutionary algorithms, guided (or not) by human participation, in a few steps and with minimal background in coding and search-based algorithms. A case study illustrates its benefits, which can also be used to support the construction of AI solutions guided by human decisions.

Anais do Encontro Nacional de Inteligência Artificial e Computacional (ENIAC 2020), 2020

We introduce robot journalists that cover two pressing topics in Brazilian society: COVID-19 spread and Legal Amazon deforestation. Our approach is able to automatically analyze structured domain data, select relevant content, generate news texts and publish them on the Web. We provide a thorough description of our system architecture, report on the results of automatic evaluation, discuss some of the advantages of robot-journalism in society, and point out further steps in our work. Corpus and code are publicly available.

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018

Traditionally, Referring Expression Generation (REG) models first decide on the form and then on the content of references to discourse entities in text, typically relying on features such as salience and grammatical function. In this paper, we present a new approach (NeuralREG), relying on deep neural networks, which makes decisions about form and content in one go without explicit feature extraction. Using a delexicalized version of the WebNLG corpus, we show that the neural model substantially improves over two strong baselines. Data and models are publicly available.

Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, 2017

This study introduces a statistical model able to generate variations of a proper name by taking into account the person to be mentioned, the discourse context and variation. The model relies on the REGnames corpus, a dataset with 53,102 proper name references to 1,000 people in different discourse contexts. We evaluate the versions of our model from the perspective of how human writers produce proper names, and also how human readers process them. The corpus and the model are publicly available.

Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016

In this study, we introduce a nondeterministic method for referring expression generation. We describe two models that account for individual variation in the choice of referential form in automatically generated text: a Naive Bayes model and a Recurrent Neural Network. Both are evaluated using the VaREG corpus. Then we select the best performing model to generate referential forms in texts from the GREC-2.0 corpus and conduct an evaluation experiment in which humans judge the coherence and comprehensibility of the generated texts, comparing them both with the original references and those produced by a random baseline model.

Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 2015

This paper describes an experiment to elicit referring expressions from human subjects for research in natural language generation and related fields, and preliminary results of a computational model for the generation of these expressions. Unlike existing resources of this kind, the resulting data set (the Zoom corpus of natural language descriptions of map locations) takes into account a domain that is significantly closer to real-world applications than what has been considered in previous work, and addresses more complex situations of reference, including contexts with different levels of detail, and instances of singular and plural reference produced by speakers of Spanish and Portuguese.

Computational Intelligence in Electromyography Analysis - A Perspective on Current Applications and Future Challenges, 2012

We have decided to take a different look at how the EMG signal is treated and analyzed. For instance, in order to obtain a more trustworthy signal, the literature recommends filtering and smoothing the raw signal and also rectifying it, the last of which does not affect the signal power. However, the filtered root mean square (RMS) signal may not be the best way to pre-process the EMG signal. Another current concern in EMG signal pre-processing is the use of the whole signal versus the evaluation of only its burst-time segments. These concerns are explained and analyzed throughout this chapter. In epistemological terms, we take a more critical look at EMG signal processing. We hope the reader will take the same look, not only at the results and conclusions but also at the methods and reasoning, since the intention here is not to present an irrefutable truth but to discuss and point out valuable arguments, so that readers can think them through for themselves and apply them properly.

2014 9th Iberian Conference on Information Systems and Technologies (CISTI), 2014

The GeoProfile, which was proposed to standardize the conceptual modeling of geographical databases and contains different abstraction levels of the MDA architecture, is a UML profile that can be introduced in a variety of CASE tools already consolidated by the UML infrastructure. This article describes the use of GeoProfile in the Enterprise Architect CASE tool and its transformation language, which allows automatic transformations among the different levels of the MDA architecture.

Computational Linguistics and Intelligent Text Processing, 2014

This paper presents a study in the field of Natural Language Generation (NLG), focusing on the computational task of referring expression generation (REG). We describe a standard REG implementation based on the well-known Dale & Reiter Incremental algorithm, and a classification-based approach that combines the output of several support vector machines (SVMs) to generate definite descriptions from two publicly available corpora. Preliminary results suggest that the SVM approach generally outperforms incremental generation, which paves the way to further research on machine learning methods applied to the task.

Lecture Notes in Computer Science, 2014

We describe a classification-based approach to referring expression generation (REG) making use of standard context-related features, and an extension that adds speaker-related features. Results show that taking speakers' preferences into account outperforms the standard REG model in four test corpora of definite descriptions.

As mobile devices, access to geographical information, and the migration from web 1.0 to web 2.0 advance, users have started to play the role of consumers, producers, and communicators. As a result, several internet systems that collect Volunteered Geographical Information (VGI) have emerged. VGI collection systems often need to be developed within short timeframes. This paper presents a comparative analysis of two environments for VGI system development: Ushahidi Platform and ClickOnMap. The comparison employed a model based on the ISO 9126 system-quality standard. Its results may help VGI system developers choose the tool whose characteristics best suit the intended goal of the system.