A Detailed Survey on Approaches of Phylogenetic Analysis

Hadoop Mapreduce Based Distributed Phylogenetic Analysis

International Journal of Computer Science Trends and Technology (IJCST), 2016

Phylogenetic analysis is central to the scientific study of the evolution of life: it measures the evolutionary footprints shared between organisms, and it requires a multiple sequence alignment as input. Although algorithms such as the Needleman-Wunsch algorithm (NWA) and the Smith-Waterman algorithm (SWA) produce accurate alignments, they are not practical for long genome sequences because of their computational complexity. The proposed approach uses the complete composition vector (CCV), which represents each sequence as a vector derived from its k-mers and so bypasses multiple sequence alignment, together with the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), which produces the tree. The aim is to improve and optimize the performance of phylogenetic analysis for large sequence data using the MapReduce programming model.
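
The alignment-free pipeline described in this abstract can be sketched roughly as follows. This is a minimal single-machine illustration, not the paper's implementation: it uses plain k-mer frequencies with a cosine distance as a simplified stand-in for the complete composition vector, and it omits the MapReduce distribution entirely. The function names `kmer_vector`, `cosine_distance`, and `upgma` are hypothetical.

```python
from collections import Counter
from itertools import product
import math


def kmer_vector(seq, k=3, alphabet="ACGT"):
    """Return the k-mer frequencies of seq, in a fixed order over the alphabet.

    Assumption: a simplified composition vector (raw frequencies only).
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)
    return [counts["".join(p)] / total for p in product(alphabet, repeat=k)]


def cosine_distance(u, v):
    """1 - cosine similarity; 0 for identical directions, 1 for orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def upgma(names, dist):
    """Cluster taxa with UPGMA and return a nested, Newick-like string.

    names: list of taxon labels.
    dist:  dict mapping every unordered pair (a, b) to a distance.
    """
    sizes = {name: 1 for name in names}          # cluster label -> leaf count
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        # Merge the closest pair of clusters.
        pair = min(d, key=lambda p: d[p])
        a, b = sorted(pair)
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        del d[pair]
        # Distance to the new cluster is the size-weighted average.
        for c in sizes:
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            d[frozenset((merged, c))] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))
```

In a distributed setting, the per-sequence `kmer_vector` step is the natural candidate for the map phase, since each sequence is processed independently; how the paper actually partitions the work is not stated in the abstract.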

Multiple Sequence Alignment Based Method for Construction of Phylogenetic Trees

IJCSMC, 2019

Due to the importance of DNA (genetic material) and protein sequences, their comparison has become a major part of biology. But the presence of large and complex datasets of biological information requires an efficient computational methodology to handle them. Sequence comparison facilitates the identification of genes and conserved sequence patterns to infer the evolutionary relationships among different species. This paper uses the Multiple Sequence Alignment (MSA) method, which aligns multiple sequences at a time to depict phylogeny. The p53 protein sequences of ten different species are loaded from the NCBI (National Center for Biotechnology Information) databank in FASTA format. Based on the evolutionary distances of these species, two phylogenetic trees are constructed for the two divided parts of this dataset. A single tree is generated by joining the two trees using a pruning method. To obtain an optimal alignment, each sequence in the pruned alignment is locally aligned with the consensus sequence. The minimum optimal alignment is obtained after performing left and right shift operations.

Phylogenetics Algorithms and Applications

Advances in Intelligent Systems and Computing

Phylogenetics is a powerful approach for tracing the evolution of present-day species. By studying phylogenetic trees, scientists gain a better understanding of how species have evolved and can explain the similarities and differences among species. Phylogenetic study can help in analysing the evolution of, and the similarities among, diseases and viruses, and can further help in prescribing vaccines against them. This paper explores computational solutions for building the phylogeny of species and highlights the benefits of alignment-free methods of phylogenetics. The paper also discusses the application of phylogenetic study in disease diagnosis and evolution.

Mathematical Understanding of Sequence Alignment and Phylogenetic Algorithms: A Comprehensive Review of Methods

2020

Context: Pairwise sequence alignment is one of the ways to arrange two biological sequences (proteins or nucleic acids) to identify regions of resemblance that may suggest a functional, structural, and/or evolutionary relationship between the sequences. There are two strategies in pairwise sequence alignment: local sequence alignment (Smith-Waterman algorithm) and global sequence alignment (Needleman-Wunsch algorithm). In local sequence alignment, two sequences that may or may not be related are aligned to find regions of local similarity in large sequences, whereas in global sequence alignment, two sequences of the same length are aligned to identify conserved regions. Similarities and divergence between biological sequences identified by sequence alignment also have to be rationalized and visualized in the form of phylogenetic trees. The phylogenetic tree construction methods are divided into distance-based and character-based methods. Evidence Acquisition: In this article, differe...
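
The difference between the two pairwise strategies named in this abstract can be illustrated with a short dynamic-programming sketch. It computes only the optimal score, not the alignment itself, and the scoring values (match +1, mismatch -1, linear gap -1) and the function name `alignment_score` are assumptions chosen for illustration rather than anything taken from the review.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Optimal pairwise alignment score with a linear gap penalty.

    local=False: global alignment (Needleman-Wunsch recurrence).
    local=True:  local alignment (Smith-Waterman recurrence).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment does not.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
            score[i][j] = cell
            best = max(best, cell)
    # Global: score of aligning both full sequences; local: best cell anywhere.
    return best if local else score[-1][-1]
```

For example, `alignment_score("TTGATTACATT", "CCGATTACACC", local=True)` scores only the shared `GATTACA` core, while the global score is pulled down by the mismatched flanks. Both variants fill a table of size len(a) x len(b), which is the quadratic cost that makes them expensive for genome-scale sequences.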

A Comparative study of Multiple Sequence Alignment Tools to construct Phylogenetic Trees

A phylogenetic tree is a branched structure which represents the evolutionary relationships among genes and organisms. Multiple sequence alignment is an initial step in constructing a phylogenetic tree. The most widely used tools for phylogenetic analysis, i.e. PHYLIP (Phylogeny Inference Package) and PAUP (Phylogenetic Analysis Using Parsimony), have so far been used for inferring phylogenies. However, these packages in turn had to rely on other tools for input. In this context, many open source MSA tools are available for generating both the multiple sequence alignment and the phylogenetic tree. The purpose of the present paper is to highlight various open source MSA tools for constructing phylogenetic trees using distance-based methods after generating the alignment. A comparative study of five MSA tools (Geneious, ClustalX, DNAMAN, STRAP and MUSCLE) is presented here with the aim of creating awareness among bioinformaticians about MSA tools that help in constructing phylogenetic trees.

A Review on Phylogenetic Analysis: A Journey through Modern Era

Phylogenetic analysis may be considered to be a highly reliable and important bioinformatics tool. The importance of phylogenetic analysis lies in its simple manifestation and easy handling of data. The simple tree representation of the evolution makes the phylogenetic analysis easier to comprehend and represent as well. The varied applications of phylogenetics in different fields of biology make this analysis an absolute necessity. The different aspects of phylogenetic analysis have been described in a comprehensive manner. This review may be useful to those who would like to have a firsthand knowledge of phylogenetics.

A framework for phylogenetic sequence alignment

Plant Systematics and Evolution, 2009

A phylogenetic alignment differs from other forms of multiple sequence alignment because it must align homologous features. Therefore, the goal of the alignment procedure should be to identify the events associated with the homologies, so that the aligned sequences accurately reflect those events. That is, an alignment is a set of hypotheses about historical events rather than about residues, and any alignment algorithm must be designed to identify and align such events. Some events (e.g., substitution) involve single residues, and our current algorithms can successfully align those events when sequence similarity is great enough. However, the other common events (such as duplication, translocation, deletion, insertion and inversion) can create complex sequence patterns that defeat such algorithms. There is therefore currently no computerized algorithm that can successfully align molecular sequences for phylogenetic analysis, except under restricted circumstances. Manual re-alignment of a preliminary alignment is thus the only feasible contemporary methodology, although it should be possible to automate such a procedure.

A weighting system and algorithm for aligning many phylogenetically related sequences

Bioinformatics, 1995

Most multiple sequence alignment programs explicitly or implicitly try to optimize some score associated with the resulting alignment. Although the sum-of-pairs score is currently most widely used, it is inappropriate when the phylogenetic relationships among the sequences to be aligned are not evenly distributed, since the contributions of densely populated groups dominate those of minor members. This paper proposes an iterative multiple sequence alignment method which optimizes a weighted sum-of-pairs score, in which the weights given to individual sequence pairs are adjusted to compensate for the biased contributions. A simple method that rapidly calculates such a set of weights for a given phylogenetic tree is presented. The multiple sequence alignment is refined through partitioning and realignment restricted to the edges of the tree. Under this restriction, profile-based fast and rigorous group-to-group alignment is achieved at each iteration, rendering the overall computational cost virtually identical to that using an unweighted score. Consistency of nearly 90% was attained between structural and sequence alignments of multiple divergent globins, confirming the effectiveness of this strategy in improving the quality of multiple sequence alignment.
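
The weighted sum-of-pairs objective mentioned in this abstract can be sketched as below. The residue scoring scheme (match +1, mismatch -1, residue-versus-gap -2, gap-versus-gap 0) and the names `pair_score` and `weighted_sp_score` are illustrative assumptions; the paper's tree-derived weighting scheme and its iterative refinement are not reproduced here, so the weights are simply passed in by the caller.

```python
from itertools import combinations


def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one aligned column position for one pair of sequences."""
    if x == "-" and y == "-":
        return 0          # gap against gap carries no information
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def weighted_sp_score(alignment, weights=None):
    """Weighted sum-of-pairs score of a multiple alignment.

    alignment: list of equal-length gapped strings.
    weights:   dict mapping (i, j) with i < j to a pair weight;
               None means every pair has weight 1 (the unweighted score).
    """
    total = 0.0
    for i, j in combinations(range(len(alignment)), 2):
        w = 1.0 if weights is None else weights[(i, j)]
        total += w * sum(pair_score(x, y) for x, y in zip(alignment[i], alignment[j]))
    return total
```

Down-weighting pairs drawn from a densely sampled group, as in `weights = {(0, 1): 0.5, (0, 2): 1.0, (1, 2): 1.0}`, reduces how much near-duplicate sequences dominate the total, which is the bias the abstract describes.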

A new sequence distance measure for phylogenetic tree construction

Bioinformatics, 2003

Motivation: Most existing approaches for phylogenetic inference use multiple alignment of sequences and assume some sort of an evolutionary model. The multiple alignment strategy does not work for all types of data, e.g. whole genome phylogeny, and the evolutionary models may not always be correct. We propose a new sequence distance measure based on the relative information between the sequences using Lempel-Ziv complexity. The distance matrix thus obtained can be used to construct phylogenetic trees. Results: The proposed approach does not require sequence alignment and is totally automatic. The algorithm has successfully constructed consistent phylogenies for real and simulated data sets.
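
A distance of this kind can be sketched as follows. The abstract does not give the formula, so two things here are assumptions: the complexity is computed by a straightforward (and slow) Lempel-Ziv 1976 style parsing, and the distance uses one common normalized form, max(c(SQ) - c(S), c(QS) - c(Q)) / max(c(S), c(Q)), where c is the number of parsed components. The names `lz_complexity` and `lz_distance` are hypothetical.

```python
def lz_complexity(s):
    """Number of components in a Lempel-Ziv (1976 style) parsing of s.

    Each component is the shortest substring starting at the current
    position that has not already appeared in the text preceding its
    final symbol; the last component may be an incomplete repeat.
    """
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def lz_distance(s, q):
    """Normalized LZ-based distance between two sequences (assumed form)."""
    cs, cq = lz_complexity(s), lz_complexity(q)
    # How many new components does each sequence add when appended to the other?
    extra = max(lz_complexity(s + q) - cs, lz_complexity(q + s) - cq)
    return extra / max(cs, cq)
```

Intuitively, if `q` is largely built from patterns already present in `s`, appending it adds few new components and the distance is small. The resulting pairwise values can be collected into a distance matrix and passed to any distance-based tree-building method.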