Juan M. Orduña | Universitat de València
Papers by Juan M. Orduña
Motivation: DNA methylation analysis suffers from very long processing times, as the advent of Next-Generation Sequencers has shifted the bottleneck of genomic studies from the sequencers that obtain the DNA samples to the software that analyzes these samples. Existing software for methylation analysis does not seem to scale efficiently with either the size of the dataset or the length of the reads to be analyzed. Since sequencers are expected to provide increasingly long reads in the near future, efficient and scalable methylation software should be developed. Results: We present a new software tool, called HPG-Methyl, which efficiently maps bisulphite sequencing reads onto DNA, analyzing DNA methylation. The strategy used by this software consists of leveraging the speed of the Burrows–Wheeler Transform to map a large number of DNA fragments (reads) rapidly, as well as the accuracy of the Smith–Waterman algorithm, which is exclusively employed to...
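The Smith–Waterman stage mentioned above can be illustrated with a minimal local-alignment score computation. This is a generic textbook sketch, not HPG-Methyl's implementation; the scoring parameters and the score-only simplification are illustrative assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b (score only).

    Sketch of the dynamic-programming step that a tool like HPG-Methyl
    reserves for reads the fast BWT mapping stage cannot place exactly.
    Scoring values are illustrative, not those of the actual tool.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

The quadratic cost of this recurrence is exactly why it is applied only to the small fraction of reads that the Burrows–Wheeler stage cannot map directly.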
Networks of Workstations have become a cost-effective alternative to parallel computers. Characterizing these networks is quite difficult, since the traditional parameters used for regular topologies (node degree, diameter, average distance, etc.) do not provide information about the arrangement of the links. In this paper, we propose a new model of communication cost between network nodes. This model takes into account both the network topology and the routing algorithm, but it does not depend on the traffic pattern generated by the application running on the machine. The evaluation results show that our communication cost model is highly correlated with network performance. Since it provides a metric based on internode distance, our model can be used as the basis both for an efficient characterization of networks and for an efficient mapping of processes to processors.
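A distance-based metric of the kind described above can be sketched by computing all-pairs hop distances over the topology graph. This simplified version considers only shortest-path hop counts; the paper's actual model also accounts for the routing algorithm, which this sketch does not.

```python
from collections import deque

def distance_matrix(adj):
    """All-pairs hop distances by BFS over an adjacency-list topology."""
    n = len(adj)
    dist = [[None] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if dist[src][v] is None:
                    dist[src][v] = dist[src][u] + 1
                    q.append(v)
    return dist

def average_distance(adj):
    """Mean internode distance, a simple topology-level cost metric."""
    d = distance_matrix(adj)
    n = len(adj)
    return sum(d[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
```

For example, a 4-node ring `[[1, 3], [0, 2], [1, 3], [0, 2]]` has average internode distance 4/3. A process-to-processor mapping heuristic could then place heavily communicating processes on node pairs with small entries in the matrix.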
Modern scientific collaborations, like the ATLAS experiment at CERN, produce large amounts of data that need cataloging to meet multiple use cases and search criteria. Challenges arise in indexing and collecting billions of events, or particle collisions, from hundreds of grid sites worldwide. In addition, we face challenges in the organization of the data storage layer of the catalog, which should be capable of handling mixed OLTP (high-volume transaction processing updates) and OLAP (real-time analytical queries) use cases. In order to overcome the challenge of distributed data collection of events, we have designed and implemented a distributed producer/consumer architecture, based on an Object Store as shared storage and with dynamic data selection. Producers run at hundreds of grid sites worldwide, indexing millions of files that sum up to petabytes of data, and store a small quantity of metadata per event in an ObjectStore. Then a reference to the data is sent to a supervisor,...
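The pattern of storing bulk payloads in shared storage while passing only small references through the coordination channel can be sketched as follows. This is a generic single-process illustration with a dict standing in for the Object Store and a queue standing in for the supervisor; names and structure are assumptions, not the EventIndex implementation.

```python
import queue
import threading

object_store = {}           # key -> metadata blob (stand-in for an Object Store)
supervisor = queue.Queue()  # references flowing from producers to consumers
store_lock = threading.Lock()

def producer(site, events):
    """Index events at one site: payload goes to the store, reference to the queue."""
    for n, event in enumerate(events):
        key = f"{site}/{n}"
        with store_lock:
            object_store[key] = event  # bulk payload stays in shared storage
        supervisor.put(key)            # only the small reference is sent on

def consumer(results):
    """Drain references from the supervisor and fetch payloads on demand."""
    while True:
        key = supervisor.get()
        if key is None:                # sentinel: no more work
            break
        results.append(object_store[key])
```

Keeping the coordination channel to small references is what lets the metadata flow scale to producers at hundreds of sites while the heavy data stays in the store.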
The use of authoring tools has become a valuable trend for the fast development of Augmented Reality (AR) applications in industrial organizations. However, most current AR authoring tools are actually programming interfaces that are exclusively suitable for programmers, and they do not provide advanced visual effects. In this paper, we propose an easy-to-use AR authoring tool oriented to the development of AR applications for the execution of industrial sequential procedures. Unlike other recent easy-to-use AR authoring tools, this software framework allows non-programming users to develop low-cost AR applications, including occlusion capabilities, by means of a Kinect sensor. The evaluation results show that overlaying 3D instructions on the actual work pieces reduces the error rate for an assembly task by more than 75%, particularly diminishing the cumulative errors common in sequential procedures. Also, the results show that the time required by non-programming users...
Journal of Grid Computing
The Large Hadron Collider (LHC) is about to enter its third run at unprecedented energies. The experiments at the LHC face computational challenges with enormous data volumes that need to be analysed by thousands of physics users. The ATLAS EventIndex project, currently running in production, builds a complete catalogue of particle collisions, or events, for the ATLAS experiment at the LHC. The distributed nature of the experiment data model is exploited by running jobs at over one hundred Grid data centers worldwide. Millions of files with petabytes of data are indexed, extracting a small quantity of metadata per event, which is conveyed in real time by a data collection system to a central Hadoop instance at CERN. After a successful first implementation based on a messaging system, some issues suggested performance bottlenecks for the challenging higher rates expected in the next runs of the experiment. In this work we characterize the weaknesses of the previous messaging system, regarding co...
The International Journal of High Performance Computing Applications
DNA methylation (mC) and hydroxymethylation (hmC) can significantly affect normal human development, as well as health and disease status. hmC studies require not only specific treatment of DNA, but also software tools for their analysis. However, no software tools currently exist that are capable of analyzing DNA hmC. In this article, we propose HPG-HMapper, a parallel software tool for analyzing the DNA hmC data obtained by ten-eleven translocation-assisted bisulfite sequencing. This tool takes as input the output files of mC aligner tools, and it yields mC maps and counts of methylated and hydroxymethylated bases on each chromosome. The design of this tool considers different approaches, one based on binary trees and the other on bit arrays. The performance evaluation results show that a hybrid implementation is the most efficient option, allowing fast read and update operations while keeping a small memory footprint.
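The bit-array alternative weighed in the design above can be sketched as a per-chromosome bitmap: one bit per position gives O(1) marking and membership tests with a memory footprint of length/8 bytes. This is a generic illustration, not HPG-HMapper's data structure, which the abstract describes as a hybrid combining both approaches.

```python
class MethylationBitmap:
    """Bit array marking methylated positions along one chromosome.

    Sketch of the bit-array approach: constant-time mark/test per
    position, with roughly one byte of memory per eight positions.
    """

    def __init__(self, length):
        self.bits = bytearray((length + 7) // 8)

    def mark(self, pos):
        """Record position `pos` as methylated."""
        self.bits[pos >> 3] |= 1 << (pos & 7)

    def is_marked(self, pos):
        """Test whether position `pos` was recorded."""
        return bool(self.bits[pos >> 3] & (1 << (pos & 7)))

    def count(self):
        """Total number of marked positions on this chromosome."""
        return sum(bin(b).count("1") for b in self.bits)
```

Compared with a binary tree keyed by position, the bitmap trades flexibility (no per-position counters or sparse storage) for faster updates and a fixed, compact footprint, which is consistent with the hybrid design the abstract reports as most efficient.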
The Journal of Supercomputing, 2015
Interactive 3D terrain visualization plays an important role in multiple networked applications like virtual world visualization, multiplayer games or distributed simulators. Since the client/server architecture has obvious scalability limitations, different peer-to-peer schemes have been proposed as trade-off solutions that yield good robustness, availability and scalability for this kind of system. In this paper, we propose a new hybrid distributed architecture that significantly improves the scalability and performance of the existing proposals. The proposed scheme redesigns the relationships between the different elements of a hybrid architecture, modifies the information shared by each client with its neighboring peers and varies the messages exchanged among them, providing a larger number of users with a fluid navigation experience over a large virtual terrain.
Proceedings 2000 International Conference on Parallel Processing, 2000
The Journal of Supercomputing, 2014
Journal of Network and Computer Applications, 2009
Applied Soft Computing, 2010
The proposal to work by competencies, instead of the traditional approach based on objectives, has led to a proliferation of methodological proposals and alternatives to foster change in Higher Education. One of these methodologies is problem-based learning (PBL). PBL is a learning methodology in which the starting point is a problem constructed by the teacher that allows students to identify what they need in order to better understand that problem or situation, to identify the principles that underpin the knowledge, and to meet learning objectives related to each portion of the subject content. It is especially relevant in the case of information and communication technologies, since it allows the simultaneous development of theoretical knowledge and of strategies for solving practical problems in small groups, similar to those found in professional practice. This methodology has been implemented in the course on...
Communications in Computer and Information Science, 2013
Augmented Reality (AR) applications have emerged in recent years as a valuable tool for saving significant costs in maintenance and process control tasks in industry. This trend has been stimulated by the appearance of authoring tools that allow the fast and easy development of AR applications. However, most current AR authoring tools are actually programming interfaces that are exclusively suitable for programmers, and they do not provide advanced visual effects such as occlusion or object collision detection.
Computational Science and Its Applications, 2006
Abstract. In recent years, distributed virtual environments (DVEs) have become a major trend in distributed applications, mainly due to the enormous popularity of multiplayer online games in the entertainment industry. Although the workload generated by avatars in a DVE system has already been characterized, the special features of multiplayer online games make these applications require a particular workload characterization. This paper presents
Cluster Computing, 2000. …, 2000
Many research activities have focused on the problem of task scheduling in heterogeneous systems from the computational point of view. However, an ideal scheduling strategy would also take into account the communication requirements of the applications and the ...
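Communication-aware scheduling of the kind argued for above can be illustrated with a greedy earliest-finish-time heuristic that charges a transfer cost whenever dependent tasks land on different processors. This is a generic simplified sketch (uniform communication cost, tasks given in topological order), not the strategy proposed in the paper.

```python
def schedule(tasks, procs, comm_cost):
    """Greedy earliest-finish-time scheduling with communication costs.

    tasks:     list of (task_id, compute_time, [dependency ids]),
               assumed to be in topological order
    procs:     number of processors
    comm_cost: time to move one dependency between processors
    Returns {task_id: (finish_time, processor)}.
    """
    finish = {}                 # task_id -> (finish_time, processor)
    ready_at = [0.0] * procs    # when each processor becomes free
    for tid, work, deps in tasks:
        best = None
        for p in range(procs):
            start = ready_at[p]
            for d in deps:
                d_finish, d_proc = finish[d]
                # Dependency data arrives later if it lives on another processor
                arrival = d_finish + (comm_cost if d_proc != p else 0)
                start = max(start, arrival)
            end = start + work
            if best is None or end < best[0]:
                best = (end, p)
        finish[tid] = best
        ready_at[best[1]] = best[0]
    return finish
```

With a fork of two equal children, the heuristic keeps one child on the parent's processor (avoiding the transfer) and offloads the other only when the communication cost is outweighed by the waiting time, which is precisely the trade-off a purely computational scheduler ignores.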
Electronics
The study of Deoxyribonucleic Acid (DNA) methylation has allowed important advances in the understanding of genetic diseases related to abnormal cell behavior. DNA methylation analysis tools have become especially relevant in recent years. However, these tools have a high computational cost, and some of them require the configuration of specific hardware and software, extending the time needed for research and diagnosis. In previous works, we proposed some tools for DNA methylation analysis and a new tool, called HPG-DHunter, for the detection and visualization of Differentially Methylated Regions (DMRs). Even though this tool offers a user-friendly interface, its installation and maintenance require the information technology knowledge mentioned above. In this paper, we propose our tool as a web-based application, which allows biomedical researchers to use a powerful tool for methylation analysis, even those not specialized in the management of Graphics Processing Units (GPUs) and...
Computational and Mathematical Methods