Survey of using GPU CUDA programming model in medical image analysis

Abstract

With the technological development of the medical industry, the volume of data to be processed is expanding rapidly, and computation time is also increasing because of factors such as 3D and 4D treatment planning, the increasing sophistication of MRI pulse sequences and the growing complexity of algorithms. The graphics processing unit (GPU) addresses these problems through features such as high computation throughput, high memory bandwidth, support for floating-point arithmetic and low cost. Compute unified device architecture (CUDA) is a popular GPU programming model introduced by NVIDIA for parallel computing. This review paper briefly discusses the need for GPU CUDA computing in medical image analysis. The GPU performance of existing algorithms is analyzed and the computational gain is discussed. A few open issues, hardware configurations and optimization principles of existing methods are discussed. The survey concludes with a few optimization techniques for medical imaging algorithms on the GPU. Finally, the limitations and future scope of GPU programming are discussed.

1. Introduction

Computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET) and ultrasound are well-known medical imaging modalities that produce 2D, 3D and 4D images to guide diagnosis and treatment planning. Medical image processing and analysis become computationally expensive as the dimensionality of medical imaging data increases [1]. A conventional CPU with a limited number of cores is not sufficient to process such large data. The graphics processing unit (GPU) is a new technology capable of solving computational problems across the engineering and medical fields. In the medical industry, the GPU is well suited to processing higher-dimensional data. GPU computation offers a large advantage over the central processing unit (CPU) in computation speed.
The GPU is a highly parallel, multithreaded, many-core processor with high memory bandwidth, which makes it suitable for computationally demanding problems [2]. The main reason for the evolution of powerful GPUs is the constant demand for greater realism in computer games. During the past few decades, the computational performance of GPUs has increased much more quickly than that of conventional CPUs. Hence the GPU plays a major role in modern industrial research and development. GPU implementations have already achieved significant speedups (2x to 1000x) over CPU implementations in various fields [3] [4] [5]. The GPU is well suited to executing the same program on different data elements. This is called data parallelism: data elements are mapped to the parallel threads available on the GPU [6]. Data parallelism gives high gains when the processing of each data element is independent of the others. The prime application areas of data parallelism are 3D rendering, stereo vision, pattern recognition, image and video processing, and medical applications.

A large performance gap exists between the GPU and the general-purpose multi-core CPU. An architecture-level comparison of the CPU and GPU is given in Fig. 1. The design of a CPU is optimized for sequential programming. It makes use of sophisticated control logic to allow instructions from a single thread of execution to execute in parallel, or even out of their sequential order, while maintaining the appearance of sequential execution. Modern CPUs typically have four large processor cores designed to deliver strong sequential code performance, which is not enough to process very large data sets. A basic GPU, by contrast, has a large number of processor cores, ALUs, control units and various types of memory. In general, heterogeneous CPU and GPU computation is preferable to a standalone CPU or GPU implementation. Dependent processes are best run on the CPU, while independent processes can be accelerated by the GPU.
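The mapping of data elements to threads described above can be made concrete with a small sketch. The following is a plain-Python emulation that runs sequentially on the CPU; it is not GPU code and not taken from the paper. It only illustrates the index arithmetic that CUDA kernels commonly use (block index times block size plus thread index) for an independent per-pixel operation, here a simple threshold. The names `threshold_kernel`, `launch` and `BLOCK_DIM`, the toy pixel values and the threshold level are all assumptions made for illustration.

```python
# Plain-Python emulation of CUDA-style data parallelism (runs on the CPU).
# All names and values here are hypothetical and chosen for illustration.

BLOCK_DIM = 4  # threads per block (assumed value)


def threshold_kernel(block_idx, thread_idx, block_dim, src, dst, level):
    """Work done by ONE logical thread: process at most one pixel."""
    i = block_idx * block_dim + thread_idx  # global element index
    if i < len(src):  # guard: surplus threads in the last block do nothing
        dst[i] = 255 if src[i] >= level else 0


def launch(kernel, n_elements, block_dim, *args):
    """Emulate a 1D kernel launch by visiting every (block, thread) pair."""
    grid_dim = (n_elements + block_dim - 1) // block_dim  # ceiling division
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)
    return grid_dim


pixels = [12, 200, 90, 130, 255, 0, 128, 64, 190, 127]  # flattened toy image
result = [None] * len(pixels)
blocks_used = launch(threshold_kernel, len(pixels), BLOCK_DIM,
                     pixels, result, 128)
print(blocks_used, result)
```

Because each logical thread reads and writes only its own element, the (block, thread) pairs can be visited in any order without changing the result. That independence is what lets a GPU run them concurrently, whereas a step that depends on a neighbour's output would have to be serialized or restructured.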
A GPU with a large number of threads gives better performance. This paper reviews the implications of the GPU programming model in medical image analysis and illustrates some applications with examples. The general framework of the medical image analysis pipeline is given in Fig. 2. The computational complexities of all these fields are increasing

References (57)

Rodger JA. Discovery of medical big data analytics: improving the prediction of traumatic brain injury survival rates by data mining patient informatics processing software hybrid hadoop hive. Inf Med Unlocked 2015;1:17-26.
CUDA C programming guide. Technical report, Version 8.0. NVIDIA; 2017.
Ghorpade J, Parande J, Kulkarni M, Bawaskar A. GPGPU processing in CUDA architecture. Adv Comput Int J (ACIJ) 2012;3(1):105-20.
Farber Rob. CUDA application design and development. first ed. Elsevier; 2011. p. 1-336.
Kirk DB, Hwu WW. Programming massively parallel processors: a hands-on approach. second ed. Elsevier; 2012. p. 1-514.
Lippuner J, Elbakri IA. A GPU implementation of EGSnrc's Monte Carlo photon transport for imaging applications. Phys Med Biol 2011;56(22):7145-62.
Deserno TM, Handels H, Maier-Hein KH, Mersmann S, Palm C, Tolxdorff T, et al. Viewpoints on medical image processing: from science to application. Curr Med Imaging Rev 2013;9(2):79-88.
Ouahabi A. A review of wavelet denoising in medical imaging. In: Proceedings of the 8th international workshop on systems, signal processing and their applications. IEEE; 2013. p. 19-26.
Eklund A, Dufort P, Forsberg D, LaConte SM. Medical image processing on the GPU - past, present and future. Med Image Anal 2013;17(8):1073-94.
Li CY, Chang HH. CUDA-based acceleration of collateral filtering in brain MR images. In: Eighth international conference on graphic and image processing, 10225. International Society for Optics and Photonics; 2017.
Jaros M, Strakos P, Karasek T, Ríha L, Vasatova A, Jarosova M, et al. Implementation of K-means segmentation algorithm on Intel Xeon Phi and GPU: application in medical imaging. Adv Eng Softw 2017;103:21-8.
Keceli AS, Can AB, Kaya A. A GPU-based approach for automatic segmentation of white matter lesions. IETE J Res 2017;63(3):461-72.
Knutsson HE, Wilson R, Granlund GH. Anisotropic non-stationary image estimation and its applications-part I: restoration of noisy images. IEEE Trans Commun 1983; 31(3):388-97.
Apolinario JA, Netto SL. Introduction to adaptive filters. In: QRD-RLS adaptive filtering. Springer; 2009. p. 1-27 [Chapter 2].
Eklund A, Andersson M, Knutsson H. True 4D image denoising on the GPU. Int J Biomed Imaging 2011:1-16.
Perona P, Malik J. Scale-space and edge detection using anisotropic diffusion. IEEE Trans Pattern Anal Mach Intell 1990;12(7):629-39.
Wang N, Chen W, Feng Q. Angiogram images enhancement method based on GPU. World Congr Med Phys Biomed Eng 2012;39:868-71.
Attia MH, Elshehaby SA, Elmaghraby AS. Implementation of edge-enhancement nonlinear anisotropic diffusion filtering using different CUDA memory models. In: Proceedings of the international symposium on signal processing and information technology (ISSPIT). IEEE; 2016. p. 501-4.
Tomasi C, Manduchi R. Bilateral filtering for gray and colour images. In: Proceedings of the international conference on computer vision. IEEE; 1998. p. 839-46.
Staal LK. Bilateral filtering with CUDA. University of Aarhus; 2012.
Jiang F, Shi D, Liu DC. Fast adaptive ultrasound speckle reduction with bilateral filter on CUDA. In: Proceedings of the international conference on bioinformatics and biomedical engineering. IEEE; 2011.
Howison M. Comparing GPU implementations of bilateral and anisotropic diffusion filters for 3D biomedical datasets. In: SIAM conference on imaging science; 2010.
McConnell Brain Imaging Centre, http://www.bic.mni.mcgill.ca/brainweb, Last accessed 21st June 2017.
Bovik A. The essential guide to video processing. first ed. USA: Academic Press; 2009. p. 1-778.
Buades A, Coll B, Morel M. Image denoising methods. A new nonlocal principle. SIAM Rev 2010;52(1):113-47.
Cuomo S, Michele PD, Piccialli F. 3D data denoising via nonlocal means filter by using parallel GPU strategies. Comput Math Methods Med 2014:1-14.
Nguyen T, Nakib A, Nguyen H. Medical image denoising via optimal implementation of non-local means on hybrid parallel architecture. Comput Methods Programs Biomed 2016;129:29-39.
Hill DLG, Batchelor PG, Holden M, Hawkes DJ. Medical image registration. Phys Med Biol 2001;46(3):R1-45.
Fluck O, Vetter C, Wein W, Kamen A, Preim B, Westermann R. A survey of medical image registration on graphics hardware. Comput Methods Programs Biomed 2011;104(3):e45-57.
Coatelen J, Qin Y, Dowson N, Barra V, Caux J. Image registration on GPU. ISIMA - University of Blaise Pascal - CSIRO; 2011. p. 1-47. Technical report.
Massanes F, Cadennes M, Brankov JG. Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards. J Electron Imaging 2011;20(3):1-10.
Li M, Xiang Z, Xiao L, Castillo E, Castillo R, Guerrero T. GPU-accelerated block matching algorithm for deformable registration of lung CT images. In: Proceedings of the international conference on progress in informatics and computing; 2016. p. 292-5.
Tamaki T, Abe M, Raytchev B, Kaneda K. Softassign and EM-ICP on GPU. In: Proceedings of the international conference on networking and computing. IEEE; 2010. p. 179-83.
Olmedo E, Calleja J, Benitez A, Medina MA. Point to point processing of digital images using parallel computing. IJCSI Int J Comput Sci Issues 2012;9(3):1-10.
Pratt WK. Digital image processing. fourth ed. Los Altos, California: John Wiley & Sons, Inc.; 2007.
Park S, Lee J, Lee H, Shin J, Seo J, Lee KH, et al. Parallelized seeded region growing using CUDA. 2014. p. 1-10.
Westhoff AM. Hybrid parallelization of a seeded region growing segmentation of brain images for a GPU cluster. In: Proceedings of the international conference on architecture of computing systems; 2014.
Ravi S, Khan AM. Morphological operations for image processing: understanding and its applications. In: Proceedings of the national conference on VLSI, signal processing & communications; 2013. p. 17-9.
Serra J. Introduction to mathematical morphology. Comput Vis Graph Image Process 1986;35(3):283-305.
Kalaiselvi T, Sriramakrishnan P, Somasundaram K. Performance analysis of morphological operations in CPU and GPU for accelerating digital image applications. Int J Comput Sci Inf Technol 2016;4(1):15-27.
Koay JM, Chang YC, Tahir SM, Sreeramula S. Parallel implementation of morphological operations on binary images using CUDA. Adv Mach Learn Signal Process 2016;387:163-73.
Vincent L, Soille P. Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Trans Pattern Anal Mach Intell 1991;13(6):583-98.
Pan L, Gu L, Xu J. Implementation of medical image segmentation in CUDA. In: Proceedings of the international conference on technology and applications in biomedicine. IEEE; 2008. p. 82-5.
Vitor G, Ferreira J, Korbes A. Fast image segmentation by watershed transform on graphical hardware. In: Proceedings of 30th CILAMCE; 2009.
T. Kalaiselvi et al. Informatics in Medicine Unlocked 9 (2017) 133-144
Shi L, Liu W, Zhang H, Xie Y, Wang D. A survey of GPU-based medical image computing techniques. Quant Imaging Med Surg 2012;2(3):188-206.
Udupa JK, Hung H, Chuang K. Surface and volume rendering in three-dimensional imaging: a comparison. J Digital Imaging 1991;4(3):159-68.
Kalaiselvi T, Sriramakrishnan P, Nagaraja P. Brain tumor boundary detection by edge indication map using Bi-Modal fuzzy histogram thresholding technique from MRI T2-weighted scans. Int J Image, Graph Signal Process 2016;8(9):51-9.
Lorensen W, Cline H. Marching cubes: a high resolution 3D surface construction algorithm. Proc 14th Annu Conf Comput Graph Interact Tech 1987;21(4):163-9.
Smistad E, Elster AC, Lindseth F. Fast surface extraction and visualization of medical images using OpenCL and GPUs. In: Workshop on high performance and distributed computing for medical imaging; 2011.
Ling T, Zhi-Yu Q. An improved fast ray casting volume rendering algorithm of medical image. In: Proceedings of the international conference on biomedical engineering and informatics. IEEE; 2011. p. 109-12.
Weinlich A, Keck B, Scherl H, Kowarschik M, Hornegger J. Comparison of high-speed ray casting on GPU using CUDA and OpenGL. In: Proceedings of the international workshop on new frontiers in high-performance & hardware-aware computing; 2008. p. 25-30.
Zhang Q, Eagleson R, Peters TM. Dynamic real-time 4D cardiac MDCT image display using GPU-accelerated volume rendering. Comput Med Imaging Graph 2009;33(6):461-76.
BRATS 2012 database, http://www2.imm.dtu.dk/projects/BRATS2012/, Last accessed 21st June 2017.
Doctor, Software purchased under DST project sanction, Principal Investigator, Kalaiselvi T, Department of Computer Science and Applications, The Gandhigram Rural Institute.
Zhu L. Accelerating content-based image retrieval via GPU-adaptive index structure. Sci World J 2014:1-12.
Sinnott-Armstrong NA, Granizo-Mackenzie D, Moore JH. High performance parallel disease detection: an artificial immune system for graphics processing units. Computational Genetics Laboratory Dartmouth Medical School Lebanon; 2010.