Hadoop MapReduce Based Distributed Phylogenetic Analysis

Phylogenetic analysis is central to scientific research on the evolution of life: it traces the evolutionary footprints shared between organisms, and it conventionally requires a multiple sequence alignment as input. Although algorithms such as the Needleman-Wunsch algorithm (NWA) and the Smith-Waterman algorithm (SWA) produce accurate alignments, they are not practical for long genome sequences, where their computational cost becomes prohibitive. The proposed approach bypasses multiple sequence alignment by using the complete composition vector (CCV), which represents each sequence as a vector derived from its k-mers, and then builds the tree with the Unweighted Pair Group Method with Arithmetic mean (UPGMA). The aim is to improve and optimize the performance of phylogenetic analysis on large sequence data using the MapReduce programming model.
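As a rough illustration of the pipeline described above, the following Python sketch counts k-mers per sequence in a map/reduce style, normalizes the counts into frequency vectors, and clusters the vectors with UPGMA. It is a single-process sketch under several assumptions, not the authors' implementation: plain k-mer frequencies for one value of k stand in for the full CCV, cosine distance is an assumed choice of metric, no Hadoop API is used, and the names `map_kmers`, `reduce_counts`, `cosine_distance`, and `upgma` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def map_kmers(seq_id, sequence, k):
    """Map step: emit ((seq_id, kmer), 1) for every k-mer in the sequence."""
    for i in range(len(sequence) - k + 1):
        yield (seq_id, sequence[i:i + k]), 1


def reduce_counts(pairs):
    """Reduce step: sum counts per key, then normalize to frequencies."""
    counts = defaultdict(lambda: defaultdict(int))
    for (seq_id, kmer), n in pairs:
        counts[seq_id][kmer] += n
    vectors = {}
    for seq_id, kmer_counts in counts.items():
        total = sum(kmer_counts.values())
        vectors[seq_id] = {kmer: c / total for kmer, c in kmer_counts.items()}
    return vectors


def cosine_distance(u, v):
    """Assumed metric: 1 - cosine similarity of two sparse vectors."""
    dot = sum(x * v.get(kmer, 0.0) for kmer, x in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return 1.0 - dot / (norm_u * norm_v)


def upgma(vectors):
    """Merge the two closest clusters until one tree (nested tuples) remains."""
    names = sorted(vectors)
    trees = {i: name for i, name in enumerate(names)}  # cluster id -> subtree
    sizes = {i: 1 for i in trees}                      # cluster id -> leaf count
    dist = {(i, j): cosine_distance(vectors[names[i]], vectors[names[j]])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(trees) > 1:
        a, b = min(dist, key=dist.get)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances (the UPGMA rule).
        for c in trees:
            if c in (a, b):
                continue
            d_ac = dist.pop((min(a, c), max(a, c)))
            d_bc = dist.pop((min(b, c), max(b, c)))
            dist[(c, next_id)] = ((sizes[a] * d_ac + sizes[b] * d_bc)
                                  / (sizes[a] + sizes[b]))
        del dist[(a, b)]
        trees[next_id] = (trees.pop(a), trees.pop(b))
        sizes[next_id] = sizes.pop(a) + sizes.pop(b)
        next_id += 1
    return next(iter(trees.values()))


# Toy input (made up for illustration): A and B share most of their 3-mers.
sequences = {"A": "ACGTACGTACGT", "B": "ACGTACGTACGA", "C": "TTTTGGGGCCCC"}
pairs = [p for sid, seq in sequences.items() for p in map_kmers(sid, seq, 3)]
vectors = reduce_counts(pairs)
tree = upgma(vectors)
print(tree)
```

In a real Hadoop job the map and reduce functions would run on separate nodes and the framework would handle the grouping by key. Here a list comprehension and a dictionary play that role, so the sketch shows only the data flow. In the toy input, A and B are merged first because they share most of their 3-mers, while C shares none with either.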