A comparative analysis of Smith-Waterman based partial alignment

Alignment results of Anchor-Flood algorithm for OAEI-2008

The 7th International Semantic Web Conference, 2008

Md. Hanif Seddiqui and Masaki Aono, Toyohashi University of Technology, Aichi, Japan (hanif@kde.ics.tut.ac.jp, aono@ics.tut.ac.jp). Abstract. Our proposed algorithm, called the Anchor-Flood algorithm, starts off ...

GEM-TN-92-150 Analysis of an Alignment Scheme for

An analysis package has been developed to investigate an alignment scheme defined for the GEM muon barrel. Individual chambers can be arbitrarily translated and/or rotated, and sagitta corrections are calculated from a system of alignment monitors. The sagitta error, correction, and residual are plotted (vs. θ and φ) for straight-line tracks arising from the IP. Using this package, one can develop intuition about the accuracy and nature of particular alignment corrections. Several effects have been found to limit the error correction to the percent or permil level for certain chamber deflections; these effects may become significant for large misalignments.

Extending alignments with k-mismatches and ℓ-gaps

Theoretical Computer Science, 2014

Recently, the problem of extending an alignment with k-mismatches and a single gap for pairwise sequence alignment was introduced (Flouri et al., 2011). The authors considered the problem of extending an alignment under the Hamming distance model by also allowing the insertion of a single gap, and presented a Θ(mβ)-time algorithm to solve it, where m is the length of the shortest sequence to be extended and β is the maximum allowed length of the single gap. Very recently, it was shown that this problem is strongly and directly motivated by the next-generation resequencing application: aligning tens of millions of short DNA sequences against a reference genome. In this article, we consider an extension of this problem, extending an alignment with k-mismatches and two gaps, and present a Θ(mβ)-time algorithm to solve it. This extension is proved to be fundamental in the next-generation resequencing application. In addition, we present a generalisation of our solution to solve the problem of extending an alignment with k-mismatches and ℓ-gaps in time Θ(mβℓ). The presented solutions work provided that all gaps in the alignment occur in one of the two sequences.
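To make the single-gap problem statement concrete, here is a minimal Python sketch of a simplified variant. It is not the authors' algorithm: it assumes the gap always skips characters of the longer sequence `t`, and the function name `best_single_gap_extension` and its return format are hypothetical. For each gap length up to β it combines prefix and suffix mismatch counts, so it performs on the order of mβ character comparisons.

```python
def best_single_gap_extension(x, t, beta):
    """Align x against a prefix of t with at most one gap of length <= beta.

    Simplifying assumption: the gap skips characters of t only.
    Returns (mismatches, gap_position, gap_length); gap_length 0 means no gap.
    """
    m = len(x)
    if len(t) < m + beta:
        raise ValueError("t must contain at least len(x) + beta characters")

    # prefix[i] = Hamming mismatches between x[:i] and t[:i]
    prefix = [0] * (m + 1)
    for i in range(m):
        prefix[i + 1] = prefix[i] + (x[i] != t[i])

    best = (prefix[m], 0, 0)  # baseline: ungapped alignment
    for g in range(1, beta + 1):
        # suffix = mismatches between x[i:] and t[i+g : m+g], built right to left
        suffix = 0
        for i in range(m, -1, -1):
            if i < m:
                suffix += x[i] != t[i + g]
            total = prefix[i] + suffix
            if total < best[0]:
                best = (total, i, g)
    return best


# The first four characters match, then two characters of t are skipped.
print(best_single_gap_extension("ACGTACGT", "ACGTTTACGTAA", 3))
```

A caller interested in the k-mismatch constraint would accept the result only when the returned mismatch count is at most k.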

Performance Analysis of Sequence Alignment Applications

2006

Advances in molecular biology have led to a continued growth in the biological information generated by the scientific community. Additionally, this area has become a multi-disciplinary field, including components of mathematics, biology, chemistry, and computer science, generating several challenges in the scientific community from different points of view. For this reason, bioinformatic applications represent an increasingly important workload. However, even though the importance of this field is clear, common bioinformatic applications and their implications for micro-architectural design have not received enough attention from the computer architecture community. This paper presents a micro-architecture performance analysis of recognized bioinformatic applications for the comparison and alignment of biological sequences, including BLAST, FASTA and some recognized parallel implementations of the Smith-Waterman algorithm that use the Altivec SIMD extension to speed up execution. We adopt a simulation-based methodology to perform a detailed workload characterization. We analyze architectural and micro-architectural aspects such as pipeline configurations, issue widths, functional unit mixes, and memory hierarchy, and their implications for performance behavior. We have found that the memory subsystem is the component with the greatest impact on the performance of the BLAST heuristic, the branch predictor is responsible for the major performance loss for FASTA and SSEARCH34, and long dependency chains are the limiting factor in the SIMD implementations of Smith-Waterman.
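For readers unfamiliar with the Smith-Waterman recurrence mentioned in this abstract, the following is a minimal scalar sketch in Python. It assumes a linear gap penalty and illustrative scoring values (`match=2`, `mismatch=-1`, `gap=-2`), none of which come from the paper, and it is unrelated to the Altivec implementations studied there. Each cell depends on its upper, left, and diagonal neighbours, which is the kind of data dependency the abstract refers to.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, (i, j)) for the best local alignment of a and b.

    Linear gap penalty; scoring values are illustrative assumptions.
    (i, j) is the 1-based cell where the best local alignment ends.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            H[i][j] = max(0, diag, up, left)  # 0 restarts the local alignment
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos


print(smith_waterman("XXACGTYY", "ACGT"))
```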

Accuracy analysis of multiple structure alignments

Protein Science, 2009

Protein structure alignment methods are essential for many different challenges in protein science, such as the determination of relations between proteins in the fold space or the analysis and prediction of their biological function. A number of different pairwise and multiple structure alignment (MStA) programs have been developed and provided to the community. Prior knowledge of the expected alignment accuracy is desirable for the user of such tools. To retrieve an estimate of the performance of current structure alignment methods, we compiled a test suite taken from literature and the SISYPHUS database consisting of proteins that are difficult to align. Subsequently, different MStA programs were evaluated regarding alignment correctness and general limitations. The analysis shows that there are large differences in success among the methods in terms of applicability and correctness. The latter ranges from 44 to 75% correct core positions. Taking only the best method result per test case, this number increases to 84%. We conclude that the methods available are applicable to difficult cases, but also that there is still room for improvement in both practicability and alignment correctness. An approach that combines the currently available methods supported by a proper score would be useful. Until then, a user should not rely on just a single program.

SE: an algorithm for deriving sequence alignment from a pair of superimposed structures

BMC Bioinformatics, 2009

Background: Generating sequence alignments from superimposed structures is an important part of many structure comparison programs. The accuracy of the alignment affects structure recognition, classification and possibly function prediction. Many programs use a dynamic programming algorithm to generate the sequence alignment from superimposed structures. However, this procedure requires using a gap penalty and, depending on the value of the penalty used, can introduce spurious gaps and misalignments. Here we present a new algorithm, Seed Extension (SE), for generating the sequence alignment from a pair of superimposed structures. The SE algorithm first finds "seeds", which are the pairs of residues, one from each structure, that meet certain stringent criteria for being structurally equivalent. Three consecutive seeds form a seed segment, which is extended along the diagonal of the alignment matrix in both directions. Distance and the amino acid type similarity between the residues are used to resolve conflicts that arise during extension of more than one diagonal. The manually curated alignments in the Conserved Domain Database were used as the standard to assess the quality of the sequence alignments.
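The seed-and-extend idea described in this abstract can be illustrated with a rough Python sketch. It is not the SE algorithm itself: the abstract does not state the seed criteria, so the sketch assumes a plain distance cutoff (`seed_cutoff`) for seeds and a looser one (`extend_cutoff`) for extension, both hypothetical values, and it omits the conflict resolution based on distance and amino acid similarity.

```python
import math


def seed_segments(coords_a, coords_b, seed_cutoff=1.5, extend_cutoff=3.0):
    """Find diagonal segments of structurally equivalent residues.

    coords_a, coords_b: lists of (x, y, z) for two superimposed structures.
    Both cutoffs are assumed values, not taken from the paper.
    Returns a list of (start_i, start_j, length) tuples.
    """
    na, nb = len(coords_a), len(coords_b)

    def dist(i, j):
        return math.dist(coords_a[i], coords_b[j])

    segments, covered = [], set()
    for i in range(na - 2):
        for j in range(nb - 2):
            if (i, j) in covered:
                continue
            # Three consecutive seeds on one diagonal form a seed segment.
            if not all(dist(i + k, j + k) <= seed_cutoff for k in range(3)):
                continue
            back = 0  # extend toward the start of the diagonal
            while (i - back - 1 >= 0 and j - back - 1 >= 0
                   and dist(i - back - 1, j - back - 1) <= extend_cutoff):
                back += 1
            fwd = 3  # extend toward the end of the diagonal
            while (i + fwd < na and j + fwd < nb
                   and dist(i + fwd, j + fwd) <= extend_cutoff):
                fwd += 1
            start_i, start_j, length = i - back, j - back, back + fwd
            segments.append((start_i, start_j, length))
            covered.update((start_i + k, start_j + k) for k in range(length))
    return segments


# Toy example: a straight chain compared with a copy missing its first residue.
chain = [(3.8 * k, 0.0, 0.0) for k in range(5)]
print(seed_segments(chain, chain[1:]))
```

Segments found on different diagonals may overlap; a fuller treatment would resolve such conflicts, which this sketch leaves out.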