permGPU: Using graphics processing units in RNA microarray association studies

Hierarchical Parallelization of Gene Differential Association Analysis

BMC Bioinformatics, 2011

Background: Microarray gene differential expression analysis is a widely used technique that deals with high dimensional data and is computationally intensive for permutation-based procedures. Microarray gene differential association analysis is even more computationally demanding and must take advantage of multicore computing technology, which is the driving force behind increasing compute power in recent years. In this paper, we present a two-layer hierarchical parallel implementation of gene differential association analysis. It takes advantage of both fine- and coarse-grain (with granularity defined by the frequency of communication) parallelism in order to effectively leverage the non-uniform nature of parallel processing available in the cutting-edge systems of today. Results: Our results show that this hierarchical strategy matches data sharing behavior to the properties of the underlying hardware, thereby reducing the memory and bandwidth needs of the application. The resulting improved efficiency reduces computation time and allows the gene differential association analysis code to scale its execution with the number of processors. The code and biological data used in this study are downloadable from http://www.urmc.rochester.edu/biostat/people/faculty/hu.cfm. Conclusions: The performance sweet spot occurs when using a number of threads per MPI process that allows the working sets of the corresponding MPI processes running on the multicore to fit within the machine cache. Hence, we suggest that practitioners follow this principle in selecting the appropriate number of MPI processes and threads within each MPI process for their cluster configurations. We believe that the principles of this hierarchical approach to parallelization can be utilized in the parallelization of other computationally demanding kernels.
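
The two-layer idea in the abstract above can be illustrated with a small sketch. It is not the authors' code: it assumes a simple difference-in-means statistic, uses Python threads as a stand-in for MPI processes, and every function name is hypothetical. The outer layer splits the permutations into blocks and communicates only once, when the counts are combined. The inner layer loops over genes within each block.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def mean_diff(values, labels):
    """Absolute difference in group means for one gene (assumed statistic)."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def count_exceedances(expr, labels, observed, n_perm, seed):
    """Inner layer: run n_perm permutations and count, per gene, how often
    the permuted statistic reaches the observed one."""
    rng = random.Random(seed)  # separate, reproducible stream per worker
    shuffled = list(labels)
    counts = [0] * len(expr)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        for g, values in enumerate(expr):
            if mean_diff(values, shuffled) >= observed[g]:
                counts[g] += 1
    return counts


def permutation_pvalues(expr, labels, n_perm=1000, n_workers=4, seed=0):
    """Outer layer: give each worker a block of permutations, then combine
    the per-gene counts once at the end."""
    observed = [mean_diff(values, labels) for values in expr]
    base, extra = divmod(n_perm, n_workers)
    blocks = [base + (1 if w < extra else 0) for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(count_exceedances, expr, labels, observed, b, seed + w)
            for w, b in enumerate(blocks)
        ]
        partial = [f.result() for f in futures]
    totals = [sum(col) for col in zip(*partial)]
    # Add-one correction keeps every p-value strictly positive.
    return [(c + 1) / (n_perm + 1) for c in totals]
```

In CPython the threads here will not run the arithmetic in parallel; they only show how the work is divided. A real deployment along the lines of the abstract would presumably map the outer blocks to MPI processes and the inner loop to threads sharing one cache-resident working set.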

Transcription network construction for large-scale microarray datasets using a high-performance computing approach

BMC Genomics, 2008

The advance in high-throughput genomic technologies including microarrays has demonstrated the potential of generating a tremendous amount of gene expression data for the entire genome. Deciphering transcriptional networks that convey information on intracluster correlations and intercluster connections of genes is a crucial analysis task in the post-sequence era. Most of the existing analysis methods for genome-wide gene expression profiles consist of several steps that often require human involvement based on experiential knowledge that is generally difficult to acquire and formalize. Moreover, large-scale datasets typically incur prohibitively expensive computation overhead and thus result in a long experiment-analysis research cycle.

DSIMBench: A Benchmark for Microarray Data Using R

Lecture Notes in Computer Science, 2014

Parallel computing in R has been widely used to analyse microarray data, and existing applications use a variety of data distribution and calculation approaches. Newer data storage systems, such as MySQL Cluster and HBase, have been proposed for R data storage, while parallel computation frameworks, including MPI and MapReduce, have been applied to R computation. It is therefore difficult to understand the whole analysis workflow and to judge which toolkits suit a specific environment. In this paper we propose DSIMBench, a benchmark containing two classic microarray analysis functions with eight different parallel R workflows, and evaluate the benchmark on the IC Cloud testbed platform.

affyPara-a Bioconductor Package for Parallelized Preprocessing Algorithms of Affymetrix Microarray Data

Bioinformatics and biology insights, 2009

Microarray data repositories and large clinical applications of gene expression make it possible to analyse several hundred microarrays at one time. Preprocessing such large numbers of microarrays is still a challenge because the algorithms are limited by the available computer hardware. For example, building classification or prognostic rules from large microarray sets is very time consuming. Here, preprocessing has to be part of the cross-validation and resampling strategy that is necessary to estimate the rule's prediction quality honestly. This paper proposes the new Bioconductor package affyPara for parallelized preprocessing of Affymetrix microarray data. The data can be partitioned by array, and parallelization of the algorithms follows as a straightforward consequence. Partitioning the data and distributing it to several nodes solves the main memory problems and accelerates preprocessing by up to a factor of 20 for 200 or more arrays. affyPara is a free and open source package, un...
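
A minimal sketch of partitioning by array, as described in the abstract above. This is not affyPara's API (affyPara is an R package). The function names and the per-array step are hypothetical and chosen only to show why array-wise steps parallelize easily: each chunk can be processed without reference to any other.

```python
def partition(items, n_parts):
    """Split items into n_parts contiguous chunks whose sizes differ by at most one."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    base, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def shift_to_zero(array):
    """Hypothetical per-array step: subtract the array's minimum intensity."""
    low = min(array)
    return [x - low for x in array]


def preprocess_partitioned(arrays, n_nodes):
    """Process each chunk independently, then concatenate in the original order."""
    results = []
    for chunk in partition(arrays, n_nodes):  # each chunk could go to its own node
        results.extend(shift_to_zero(a) for a in chunk)
    return results
```

For example, `partition(list(range(7)), 3)` yields chunks of sizes 3, 2 and 2. Steps that need information from all arrays at once, such as a shared reference distribution, would require an extra communication round that this sketch does not cover.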

Bioinformatics Tools Enabling U-Statistics for Microarrays

2006

It is rare that a single gene is sufficient to represent all aspects of genomic activity. Similarly, most common diseases cannot be explained by a mutation at a single locus. Since complex systems tend to be neither linear nor hierarchical in nature, but to have correlated components of unknown relative importance, the assumptions of traditional (parametric) multivariate statistical methods can rarely be justified on theoretical grounds. Empirical "validation" is not only problematic, but also time consuming. Here we demonstrate how bioinformatics tools, ranging from spreadsheets to grids, can enable u-statistics as a non-parametric alternative for scoring multivariate ordinal data. Applications are shown to improve assessment of genetic risk factors, quality control of microarrays and signal value estimation, and scoring of genomic profiles that best correlate with complex risk factors (cardiovascular diseases) and complex responses to an intervention (treatment of psoriasis).
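
One common way to score multivariate ordinal data without assuming weights is to count pairwise dominance under a partial ordering. The sketch below assumes that formulation. Whether it matches the paper's exact scoring is not stated in the abstract, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if profile a is at least as high as b in every variable
    and strictly higher in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def u_scores(profiles):
    """Score each profile as (number it dominates) minus (number dominating it).
    Pairs that cannot be ordered contribute nothing."""
    scores = []
    for a in profiles:
        below = sum(dominates(a, b) for b in profiles)
        above = sum(dominates(b, a) for b in profiles)
        scores.append(below - above)
    return scores
```

With the profiles `[(1, 1), (2, 2), (1, 3), (3, 3)]`, the pair `(2, 2)` and `(1, 3)` cannot be ordered, so both receive the same score. The all-pairs comparison is quadratic in the number of profiles, which fits the abstract's interest in grid computing for larger data sets.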

maigesPack: A Computational Environment for Microarray Data Analysis

Microarray technology is still an important way to assess gene expression in molecular biology. This is mainly because one can measure the expression profiles of thousands of genes at the same time, which makes this technology a good option for some studies focused on systems biology. One of the main problems is that the experimental procedure is complex and the data have several sources of variance, which makes the statistical modeling more difficult. So far, there is no standard protocol for the generation and evaluation of microarray data. To ease the analysis process, this paper presents an R package named maigesPack. The software helps with the data organization and also with the analysis process. In addition, it makes the analysis more robust, reliable and reproducible. The package aggregates several data analysis procedures reported in the literature, for instance: cluster analysis, differential expression, supervised classifiers, relevance networks and functional classification of gene groups or gene networks.

Analysis of Genetic Expression with Microarrays using GPU Implemented Algorithms

Computacion Y Sistemas, 2013

DNA microarrays are used to simultaneously analyze the expression level of thousands of genes under multiple conditions; however, a massive amount of data is generated, making its analysis a challenge and an ideal candidate for massively parallel processing. Among the available technologies, General Purpose computation on Graphics Processing Units (GPGPU) is an efficient, cost-effective alternative to a Central Processing Unit (CPU). This paper presents an implementation of algorithms using Compute Unified Device Architecture (CUDA) to determine statistical significance in the evaluation of gene expression levels for a microarray hybridization experiment designed and carried out at the Centro de Investigaciones Biológicas del Noroeste S.C. (CIBNOR). The obtained results are compared to traditional implementations.
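
The abstract above does not name the algorithms that were ported to CUDA. As an assumed illustration of why per-gene significance testing suits a GPU, the sketch below computes Welch's t statistic gene by gene. Each gene's result depends only on that gene's values, so on a GPU each one could be assigned to its own thread. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic for two independent samples.
    Raises ZeroDivisionError if both samples have zero variance."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def per_gene_t(treated, control):
    """One independent computation per gene (rows are genes, columns are samples)."""
    return [welch_t(t, c) for t, c in zip(treated, control)]
```

This CPU version runs the genes one after another. The point of the sketch is only the independence between genes, which is what a data-parallel implementation would exploit.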

Computational Modeling and Analysis of Microarray Data: New Horizons

Microarrays, 2016

High-throughput microarray technologies have long been a source of data for a wide range of biomedical investigations. Over the decades, variants have been developed and sophistication of measurements has improved, with generated data providing both valuable insight and considerable analytical challenge. The cost-effectiveness of microarrays, as well as their fundamental applicability, made them a first choice for much early genomic research and efforts to improve accessibility, quality and interpretation have continued unabated. In recent years, however, the emergence of new generations of sequencing methods and, importantly, reduction of costs, has seen a preferred shift in much genomic research to the use of sequence data, both less 'noisy' and, arguably, with species information more directly targeted and easily interpreted. Nevertheless, new microarray data are still being generated and, together with their considerable legacy, can offer a complementary perspective on biological systems and disease pathogenesis. The challenge now is to exploit novel methods for enhancing and combining these data with those generated by alternative high-throughput techniques, such as sequencing, to provide added value. Augmentation and integration of microarray data and the new horizons this opens up, provide the theme for the papers in this Special Issue.