Maria Luiza Machado Campos | Universidade Federal do Rio de Janeiro (UFRJ)
Papers by Maria Luiza Machado Campos
World Wide Web, 1998
Metadata, or information that makes data useful, have been considered by the database community basically as data in dictionaries used to control database management system operations. More recently, metadata have been used to describe digital resources ...
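As a concrete illustration of metadata describing a digital resource (not taken from the paper; the field values are invented), a minimal Dublin Core-style record can be written as a plain mapping:

    # A minimal sketch of a Dublin Core-style metadata record for a
    # digital resource; the element names follow the Dublin Core
    # standard, the values are hypothetical.
    resource_metadata = {
        "title": "Trypanosoma vivax EST collection",
        "creator": "Example Genomics Lab",
        "subject": ["genomics", "EST"],
        "date": "1998-05-01",
        "format": "text/fasta",
        "identifier": "urn:example:est-collection-001",
    }

    for element, value in resource_metadata.items():
        print(f"{element}: {value}")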
Journal of Decision Systems, 2004
In the context of corporate business intelligence, the use of OLAP (On-Line Analytical Processing) tools is a fundamental step toward improving decision making and creating new knowledge. To help systematize and capture the reasoning that happens during decision processes, this paper describes a computational environment that complements the functionality of OLAP tools, called AMPA (the initials of Analytical Processing Memory Environment, in Portuguese). AMPA proposes a structured representation of decision-making processes, supporting the capture and retention of decision instances in organizational memory and maintaining not only reports and associated conclusions, but also the context in which they were created. The environment includes a series of sub-process components that guide the execution of decision processes (represented as frameworks) and describe many of their common activities (represented as beans), focusing especially on typical OLAP operations. An application case is described to exemplify the use of the components.
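A minimal sketch of what retaining a decision instance together with its context might look like (the class and field names are hypothetical; the paper describes AMPA only at the architectural level):

    from dataclasses import dataclass, field
    from datetime import datetime

    # Hypothetical structure for a decision instance kept in
    # organizational memory: the report, the conclusion drawn from it,
    # and the context in which it was produced, as the AMPA
    # description suggests.
    @dataclass
    class DecisionInstance:
        report_id: str
        conclusion: str
        context: dict = field(default_factory=dict)
        recorded_at: datetime = field(default_factory=datetime.utcnow)

    memory: list[DecisionInstance] = []
    memory.append(DecisionInstance(
        report_id="sales-by-region-2004Q1",
        conclusion="Shift marketing budget to the southeast region.",
        context={"olap_operation": "drill-down", "dimension": "region"},
    ))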
... ACKNOWLEDGMENTS: Marcos R. S. Borges was partially supported by a grant from CNPq (Brazil) No. ... [12] Oliveira, A. C. de; Araujo, R. M. de; Borges, M. R. S. (2007). Telling Stories about System Use: Capturing Collective Tacit Knowledge for System Maintenance. ...
Reciis, 2009
Abstract: In recent years there has been a surge in the number of ontologies produced, reflecting the continuing maturation of vocabulary-development efforts, especially in Biomedicine, an inter- and multidisciplinary domain with complex subject matter. However, although there are methodological proposals and best practices on how to organize the terminological structure of ontologies and their relations, little is said about the methods adopted for surveying the terminology of a domain and delimiting its scope, especially where ontology reuse is concerned. The objective of this work is to present the foundations that Information Science and Computer Science offer for ontology reuse activities, as a methodological step in knowledge acquisition, in order to enable mechanisms for mapping and aligning terms across ontologies in the Trypanosomatid domain.
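The paper discusses term mapping and alignment conceptually; as a hedged illustration only, a naive lexical alignment between two term lists can be sketched with normalized string similarity (the term lists here are invented):

    from difflib import SequenceMatcher

    # Naive lexical alignment between two hypothetical ontology term
    # lists; real ontology alignment would also use synonyms,
    # definitions and structural relations, which this sketch ignores.
    def similarity(a: str, b: str) -> float:
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    source_terms = ["Trypanosoma", "flagellum", "kinetoplast"]
    target_terms = ["trypanosome", "flagella", "kinetoplast DNA"]

    for s in source_terms:
        best = max(target_terms, key=lambda t: similarity(s, t))
        print(f"{s} -> {best} (score {similarity(s, best):.2f})")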
The complexity of developing full-fledged groupware systems imposes a heavy demand on the development team, and it is certainly not a task to be accomplished in a short period of time. Many research projects opt to develop a specific tool, concentrating on a particular task of ...
... composition. For instance, suppose a tourism Web site is offering a car rental service. Moreover ... Customers of the Web site may frequently make use of the car rental service, or they may often leave the site without renting cars. If the ...
Bioinformatics/Computer Applications in the Biosciences, 2005
The growth of genome data and analysis possibilities has brought new levels of difficulty for scientists trying to understand, integrate and deal with this ever-increasing information. In this scenario, GARSA has been conceived to facilitate the tasks of integrating, analyzing and presenting genomic information from several bioinformatics tools and genomic databases in a flexible way. GARSA is a user-friendly web-based system designed to analyze genomic data in the context of a pipeline. EST and GSS data can be analyzed with the system, since it accepts (1) chromatograms, (2) sequences downloaded from GenBank, (3) Fasta files stored locally, or (4) a combination of all three. Quality evaluation of chromatograms, vector removal and clustering are easily performed as part of the pipeline. A number of local and customizable BLAST and CDD analyses can be performed, as well as InterPro, complemented with phylogeny analyses. GARSA is being used for the analyses of Trypanosoma vivax (GSS and EST), Trypanosoma rangeli (GSS, EST and ORESTES), Bothrops jararaca (EST), Piaractus mesopotamicus (EST) and Lutzomyia longipalpis (EST).
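As a rough sketch of a single pipeline stage of the kind GARSA chains together (the file names and length cutoff are invented; GARSA itself wraps dedicated analysis tools rather than this toy filter):

    # Toy example of one pipeline stage: read a FASTA file and keep
    # sequences above a minimum length before passing them downstream.
    # File names and the cutoff are hypothetical.
    def read_fasta(path):
        header, seq = None, []
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line.startswith(">"):
                    if header is not None:
                        yield header, "".join(seq)
                    header, seq = line[1:], []
                else:
                    seq.append(line)
        if header is not None:
            yield header, "".join(seq)

    MIN_LENGTH = 100
    with open("filtered.fasta", "w") as out:
        for header, seq in read_fasta("raw_reads.fasta"):
            if len(seq) >= MIN_LENGTH:
                out.write(f">{header}\n{seq}\n")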
In silico experiments encompass multiple combinations of program and data resources, which are complex to manage. Typically, script languages are used because of their ease of use, despite their specificity and difficulty of reuse. In contrast, Web service technology was specifically conceived to encapsulate and combine programs and data, addressing interoperability, scalability and flexibility issues. We have combined metadata support with Web services within a framework that supports scientific workflows. We have experimented with this framework on a real structural genomics workflow, showing its viability and evidencing its advantages.
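A hedged sketch of the general idea of invoking an encapsulated program through a Web service instead of a local script (the endpoint URL and payload fields are invented; the paper's actual framework interfaces are not reproduced here):

    import requests  # third-party HTTP client: pip install requests

    # Hypothetical invocation of a program wrapped as a Web service:
    # the workflow engine posts input references and parameters and
    # receives a reference to the produced output.
    payload = {
        "program": "blastp",
        "parameters": {"evalue": 1e-5},
        "input_uri": "http://example.org/data/sequences.fasta",
    }
    response = requests.post("http://example.org/services/run", json=payload)
    response.raise_for_status()
    print(response.json())  # e.g. {"output_uri": "...", "status": "done"}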
Scientific Workflow Management Systems (SWfMS) have been helping scientists to prototype and execute in silico experiments. They can systematically collect provenance information so that derived data products can later be queried. Despite the efforts to build a standard Open Provenance Model (OPM), provenance remains tightly coupled to each SWfMS; scientific workflow provenance concepts, representations and mechanisms are therefore very heterogeneous, difficult to integrate, and dependent on the SWfMS. To help compare, integrate and analyze scientific workflow provenance, this paper presents a taxonomy of provenance characteristics. Its classification enables computer scientists to distinguish between different perspectives on provenance and guides them toward a better understanding of provenance data in general. The analysis of existing approaches will assist us in managing provenance data from distributed, heterogeneous workflow executions.
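As an illustration only (the axis and value names below are generic ones common in the provenance literature, not the paper's exact taxonomy), such a classification can be made operational as a simple record:

    from enum import Enum

    # Generic provenance-classification axes; the paper's own taxonomy
    # may differ in both axes and terminology.
    class Kind(Enum):
        PROSPECTIVE = "prospective"      # what was specified to run
        RETROSPECTIVE = "retrospective"  # what actually ran

    class Granularity(Enum):
        COARSE = "workflow-level"
        FINE = "activity-level"

    def classify(system: str, kind: Kind, granularity: Granularity) -> dict:
        return {"system": system, "kind": kind.value,
                "granularity": granularity.value}

    print(classify("Kepler", Kind.RETROSPECTIVE, Granularity.FINE))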
Scientific experiments using workflows benefit from mechanisms to trace the generation of results. As workflows start to scale, it is fundamental to have access to their underlying processes, parameters and data. Particularly in molecular dynamics (MD) simulations, a study of the interatomic interactions in proteins must use distributed high-performance computing environments to produce timely results. Scientists' trust in experiments produced by gathering distributed partial results may be limited without provenance information. This paper presents a service architecture that captures and stores provenance data from distributed, autonomous, replicated and heterogeneous resources. Such provenance data can be used to trace the history of the distributed execution process, and the services can be coupled to workflow management systems. The Kepler system was used as a basis to manage a grid workflow application, and experimental results from cluster and grid MD simulations were evaluated using the provenance services architecture.
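A minimal sketch of the kind of event a distributed provenance-capture service might record per task execution (the service interface and field names are hypothetical):

    import json, time, uuid

    # Hypothetical provenance event emitted by a worker node for each
    # task execution; a capture service would aggregate these records
    # centrally for later querying.
    def provenance_event(task: str, node: str, params: dict, status: str) -> dict:
        return {
            "event_id": str(uuid.uuid4()),
            "task": task,
            "node": node,
            "parameters": params,
            "status": status,
            "timestamp": time.time(),
        }

    event = provenance_event("md_step_42", "cluster-node-07",
                             {"temperature_K": 300}, "completed")
    print(json.dumps(event, indent=2))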
There are many examples where cooperation among scientists takes place by exchanging scientific resources such as data, programs and mathematical models. This is particularly true for environmental applications. Finding the right resource to apply to an environmental problem is a difficult task; usually, this decision is based on previous experience, and scientists have to cooperate in order to solve such problems. To facilitate the exchange, reuse and dissemination of information, we propose an architecture for managing distributed scientific resources. Our proposal combines a mediation-based heterogeneous distributed database system with an enhanced metadata support system for effective management of distributed scientific models and data.
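To make the mediation idea concrete (a sketch under invented names; the paper describes its architecture only at a high level), a mediator can keep a registry of sources and route a resource query to those advertising the requested topic:

    # Toy mediator: each registered source advertises the topics it
    # serves, and a query is forwarded only to matching sources.
    # Source names, topics and the query API are hypothetical.
    class Mediator:
        def __init__(self):
            self.sources = {}  # name -> (topics, query_fn)

        def register(self, name, topics, query_fn):
            self.sources[name] = (set(topics), query_fn)

        def query(self, topic, term):
            results = []
            for name, (topics, query_fn) in self.sources.items():
                if topic in topics:
                    results.extend((name, r) for r in query_fn(term))
            return results

    m = Mediator()
    m.register("hydrology_db", ["water"], lambda t: [f"model:{t}-runoff"])
    m.register("climate_db", ["climate"], lambda t: [f"dataset:{t}-temps"])
    print(m.query("water", "basin"))  # [('hydrology_db', 'model:basin-runoff')]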
International Journal of Data Mining and Bioinformatics, 2010
Bioinformatics experiments are typically composed of programs arranged in pipelines that manipulate an enormous quantity of data. An interesting approach for managing those experiments is through workflow management systems (WfMS). In this work we discuss WfMS features that support genome homology workflows and present some relevant issues for typical genomic experiments. Our evaluation used the Kepler WfMS to manage a real genomic pipeline, named OrthoSearch, originally defined as a Perl script. We show a case study detecting distant homologies in trypanosomatid metabolic pathways. Our results reinforce the benefits of WfMS over script languages and point out challenges for WfMS in distributed environments.
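To illustrate the contrast the paper draws (a toy sketch, not OrthoSearch itself; step names are invented), the pipeline that a flat script runs line by line can instead be stated as explicit steps with dependencies, which is the structure a WfMS exploits for tracking and scheduling:

    # Toy pipeline declared as steps with explicit dependencies;
    # step names are hypothetical.
    steps = {
        "fetch_sequences": [],
        "run_hmm_search":  ["fetch_sequences"],
        "parse_hits":      ["run_hmm_search"],
        "report":          ["parse_hits"],
    }

    def topological_order(steps):
        done, order = set(), []
        while len(order) < len(steps):
            for step, deps in steps.items():
                if step not in done and all(d in done for d in deps):
                    done.add(step)
                    order.append(step)
        return order

    for step in topological_order(steps):
        print("running", step)  # a real WfMS would dispatch and log each step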
Current technologies are continuously evolving, and software companies need to adapt their processes to these changes. Such adaptation often requires new investments in training and development. To address this issue, the OMG defined a model-driven development (MDD) approach, which insulates business and application logic from technology evolution. Current MDD approaches fall short of fully deriving an implementation from models described at a high abstraction level. We propose a controlled natural language to complement UML models as an action specification language. In this article, we describe the language, its impact on systems development and the tools developed to support it. To demonstrate the language's usability, we present an application example.
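As a hedged sketch of the general mechanism (the sentence pattern and action mapping are invented, not the paper's actual language): a controlled natural language restricts sentences to fixed patterns so that each one can be translated mechanically into an action specification:

    import re

    # Toy controlled-natural-language rule: sentences of the form
    # "set <attribute> of <entity> to <value>" map to an assignment
    # action. The grammar of the paper's language is not reproduced here.
    PATTERN = re.compile(r"set (\w+) of (\w+) to (\w+)", re.IGNORECASE)

    def parse_action(sentence: str) -> dict:
        match = PATTERN.fullmatch(sentence.strip())
        if not match:
            raise ValueError(f"sentence not in controlled form: {sentence!r}")
        attribute, entity, value = match.groups()
        return {"action": "assign", "entity": entity,
                "attribute": attribute, "value": value}

    print(parse_action("set status of Order to approved"))
    # {'action': 'assign', 'entity': 'Order', 'attribute': 'status', 'value': 'approved'}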
In this work we propose the extension of an existing heterogeneous distributed database system (HDDS) called LeSelect [13], specifically developed to support environmental applications. Despite the many HDDS proposals, LeSelect is unique in its features for handling this kind of application, by allowing scientists to share their data ...
Managing bioinformatics experiments is challenging due to the orchestration and interoperation of tools with semantics. An effective approach for managing those experiments is through workflow management systems (WfMS). We present several WfMS features for supporting genome homology workflows and discuss relevant issues for typical genomic experiments. In our evaluation we used OrthoSearch, a real genomic pipeline originally defined as a Perl script. We modeled it as a scientific workflow and implemented it on the Kepler WfMS. We show a case study detecting distant homologies in trypanosomatid metabolic pathways. Our results reinforce the benefits of WfMS over script languages and point out challenges for WfMS in distributed environments.
Data & Knowledge Engineering, 2005
In silico scientific experiments encompass multiple combinations of program and data resources. Each resource combination in an execution flow is called a scientific workflow. In bioinformatics environments, program composition is a frequent operation requiring complex management. A scientist faces many challenges when building an experiment: finding the right program to use, tuning adequate parameters, managing input/output data, and building and reusing workflows. Typically, these workflows are implemented using script languages because of their simplicity, despite their specificity and difficulty of reuse. In contrast, Web service technology was specifically conceived to encapsulate and combine programs and data, providing interoperation between applications on different platforms. The Web services approach is superior to scripts with regard to interoperability, scalability and flexibility. We have combined metadata support with Web services within a framework that supports scientific workflows. While most work focuses on metadata issues for managing and integrating heterogeneous scientific data sources, here we concentrate on metadata support for program management within workflows. We have used this framework with a real structural genomics workflow, showing its viability and evidencing its advantages.
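A minimal sketch of program metadata of the kind such a framework could keep for each workflow step, plus a check that uses it to validate composition (descriptor fields and program names are hypothetical):

    # Hypothetical program descriptors: declared input/output types let
    # the framework check that two steps can be chained before running.
    PROGRAMS = {
        "blastp":    {"inputs": ["fasta"],        "outputs": ["blast_report"]},
        "parse_xml": {"inputs": ["blast_report"], "outputs": ["hit_table"]},
    }

    def can_chain(producer: str, consumer: str) -> bool:
        produced = set(PROGRAMS[producer]["outputs"])
        required = set(PROGRAMS[consumer]["inputs"])
        return bool(produced & required)

    print(can_chain("blastp", "parse_xml"))  # True
    print(can_chain("parse_xml", "blastp"))  # False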