High performance computing in biomedical informatics

Parallel Data Mining for Medical Informatics

2010

ABSTRACT As in many fields, the data deluge impacts all aspects of the life sciences, from chemistry data in PubChem and genetic sequence data through to health records. This data demands analysis and mining algorithms that are both high-performance and robust. Further, although some of the data can usefully be viewed as points in a vector space, for other data it is better to consider only the relationships defined by dissimilarities between points.
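The distinction between vector-space data and dissimilarity-only data can be illustrated with a minimal sketch. It is not taken from the paper: the helper names, the toy points, and the toy sequences are all hypothetical, and edit distance is only one assumed choice of dissimilarity for sequence data. The same matrix-building routine serves both cases, but only the first case has coordinates to work with.

```python
import math


def euclidean(p, q):
    """Distance between two points that live in a vector space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def edit_distance(s, t):
    """Levenshtein distance: a dissimilarity defined without any coordinates."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def dissimilarity_matrix(items, dissimilarity):
    """Pairwise dissimilarities; this is all a relational algorithm needs."""
    return [[dissimilarity(a, b) for b in items] for a in items]


# Hypothetical toy data, for illustration only.
points = [(0.0, 0.0), (3.0, 4.0)]
sequences = ["ACGT", "ACGA", "TTGT"]

vector_matrix = dissimilarity_matrix(points, euclidean)
sequence_matrix = dissimilarity_matrix(sequences, edit_distance)
```

An algorithm that consumes only `sequence_matrix` never needs a vector representation of the sequences, which is the situation the abstract describes for data such as gene sequences.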

Data Mining Algorithms And Medical Sciences

Extensive amounts of data stored in medical databases require the development of dedicated tools for accessing the data, analyzing it, discovering knowledge, and making effective use of stored knowledge and data. The widespread use of medical information systems and the explosive growth of medical databases require conventional manual data analysis to be coupled with methods for competent computer-assisted analysis. In this paper, I use data mining techniques for data analysis, data access and knowledge discovery to show experimentally and practically how consistent, capable and fast these techniques are for study in this field. A mathematical threshold (between 0 and 1) is set to analyze the data. The outcome will be tested by applying the approach to databases, data warehouses and other data stores of different sizes with different entry values. The results will range from small to the largest sets of tuples. These results can then be put to different uses, e.g. patient investigation or the frequency of different diseases.

General Framework for Biomedical Knowledge With Data Mining Techniques

2013

Data mining is the process that automates the extraction of predictive information and discovers interesting knowledge from large amounts of data stored in information repositories. Biomedical informatics (BMI) is the science underlying the acquisition, maintenance, retrieval, collection, manipulation, and analysis of biomedical knowledge and information to improve medical data analysis, problem solving, and decision making, inspired by efforts toward progress in the medical domain. In this research work, a comprehensive framework will be developed that comprises various data mining techniques and extracts meaningful information from biomedical data. Data mining will be applied to biomedical data to analyze its characteristics and identify patterns of interest for diagnosing and predicting patients' health. The proposed biomedical data mining framework is useful to scholars interested in research related to data mining and the medical domain. Data mining is a repl...

Elementary approach towards Biological Data Mining

International Journal of Trend in Scientific Research and Development

In this paper we provide an overview of interactive and integrative knowledge discovery and data mining. The most important challenges include the need to develop and apply novel methods, algorithms and tools for the integration, fusion, pre-processing, mapping, analysis and interpretation of complex biomedical data, with the aim of identifying testable hypotheses and building realistic models. The HCI-KDD approach, a synergistic combination of the methodologies and approaches of two areas, Human-Computer Interaction (HCI) and Knowledge Discovery & Data Mining (KDD), offers ideal conditions for solving these challenges, with the goal of supporting human intelligence with machine intelligence. There is an urgent need for integrative and interactive machine learning solutions, because no medical doctor or biomedical researcher can keep pace today with increasingly large and complex data sets ("Big Data"). The application of data mining in the domain of bioinformatics is explained. The paper also highlights some of the current challenges and opportunities of data mining in bioinformatics.

On the Impact of High Performance Computing in Big Data Analytics for Medicine

Applied Medical Informatics, 2020

For a long time, High Performance Computing (HPC) has been critical for running large-scale modeling and simulation using numerical models. The big data analytics (BDA) domain has developed rapidly in recent years to process the huge amounts of data now being generated in various domains. In general, however, data analytics software was not developed inside the scientific computing community, and BDA specialists adopted new approaches. Data-intensive applications are needed in various fields of medicine and healthcare, ranging from advanced research (such as genomics, proteomics, epidemiology and systems biology) to medical diagnosis and treatments, or to commercial initiatives to develop new drugs. BDA needs the infrastructure and the fundamentals of HPC in order to meet the associated computational challenges. There are important differences in the approaches of these two domains: those working in BDA focus on the 5Vs of big data, which are: volume, velocity, variety...

Parallelizing the Execution of Native Data Mining Algorithms for Computational Biology

Data mining is being increasingly used in biology. Biologists are adopting prototyping languages, like R and Matlab, to facilitate the application of data mining algorithms to their data. As a result, their scripts are becoming increasingly complex and also require frequent updates. Application to large datasets becomes impractical and the time-to-paper increases. Furthermore, even though there are various systems that can be used to process large datasets efficiently, for example using Cloud and High Performance Computing, they usually require procedures to be translated into specific languages or adapted to a certain computing platform. Such modifications can speed up the processing, but translation is not automatic, especially in complex cases, and can require a large amount of programming effort and accurate validation. In this paper, we propose an approach to parallelize data mining procedures in the form of compiled software or R scripts developed by biology communities of practice. Our approach requires minimal alteration of the original code. In many cases, there is no need for code modification. Furthermore, it allows for fast updating when a new version is ready. We clarify the constraints and the benefits of our method and report a practical use case to demonstrate such benefits compared with a standard execution. Our approach relies on a distributed network of web services and ultimately exposes the algorithms as-a-Service, to be invoked by remote thin clients.

Parallel and distributed computing for data mining

1999

Similar scenarios will occur in other areas: we will see large numbers of radiological images generated in hospitals and immense product and customer databases as the Internet and e-commerce continue to expand. Exploring useful information from such data will require efficient parallel algorithms running on high-performance computing systems with powerful parallel I/O capabilities. Without such systems and techniques, invaluable data and information will remain undiscovered.

Towards an environment for data mining based analysis processes in bioinformatics and personalized medicine

Medicine is undergoing a revolution that is transforming the very nature of health care from reactive to proactive. To serve these new and diverse needs, bioinformatics and data mining are teaming up to generate tools and procedures for the prediction of disease recurrence and progression and of response to treatment, as well as new insights into various oncogenic pathways, by taking into account user needs and their heterogeneity. The p-medicine consortium (http://www.p-medicine.eu) is creating a biomedical platform to facilitate the translation from current practice to a predictive, personalized, preventive, participatory and psycho-cognitive medicine by integrating VPH models, clinical practice, imaging and omics data. In this paper, we present the challenges for data mining based analysis in bio- and medical informatics and our approach towards a data mining environment addressing these requirements in the p-medicine platform.

Scaling up Data Mining Techniques to Large Datasets Using Parallel and Distributed Processing

Advanced Information and Knowledge Processing, 2013

Advances in hardware and software technology enable us to collect, store and distribute data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example, medical scientists can use patterns extracted from historic patient data to determine whether a new patient is likely to respond positively to a particular treatment; marketing analysts can use patterns extracted from customer data for future advertising campaigns; finance experts have an interest in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.
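One common way to scale pattern extraction is data parallelism: partition the records, compute partial counts on each partition independently, then merge the partial results. The sketch below illustrates that general idea only and is not the chapter's own method. The function names and the toy patient records are hypothetical, and a thread pool stands in for the worker processes or cluster nodes that a real CPU-bound workload would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_items(partition):
    """Local step: count in how many records of this partition each item occurs."""
    counts = Counter()
    for record in partition:
        counts.update(set(record))  # count each item at most once per record
    return counts


def split(records, n_parts):
    """Round-robin partitioning of the records into n_parts slices."""
    return [records[i::n_parts] for i in range(n_parts)]


def parallel_item_counts(records, n_workers=2):
    """Count on each partition concurrently, then merge the partial counts."""
    partitions = split(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_items, partitions))
    total = Counter()
    for counts in partial_counts:
        total.update(counts)  # merge step: add up the partial counts
    return total


# Hypothetical toy records, for illustration only.
records = [
    ["fever", "cough"],
    ["fever", "rash"],
    ["cough"],
    ["fever", "cough", "rash"],
]
totals = parallel_item_counts(records, n_workers=2)
```

Because the merge is a plain sum of per-partition counts, the result does not depend on how the records are partitioned, which is what makes this step straightforward to distribute.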