Fast Hybrid String Matching Algorithm based on the Quick-Skip and Tuned Boyer-Moore Algorithms

A New Efficient Hybrid String Matching Algorithm to Solve the Exact String Matching Problem

String matching algorithms are among the most studied in computer science because of the fundamental role they play in many applications, such as information retrieval, editors, security applications, firewalls, and biological applications. This study introduces a new hybrid algorithm based on two well-known algorithms, namely the modified Horspool and SSABS hybrid algorithms. Two factors are used to analyze the proposed algorithm: the total number of character comparisons and the total number of attempts. The proposed hybrid algorithm, named ABSBMH, was tested on different types of standard data. ABSBMH performs fewer character comparisons than the other algorithms but shows almost no difference in the number of attempts. This is because its preprocessing phase is based on the SSABS algorithm, which uses the same preprocessing phase as the Quick Search algorithm, and because the tests use different pattern lengths selected randomly from the databases. The experimental results show that
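The two metrics named in this abstract, attempts and character comparisons, can be illustrated with a minimal sketch of the Quick Search algorithm, whose preprocessing phase the abstract says SSABS shares. This is not the ABSBMH algorithm; the function name `quick_search` and the returned tuple layout are assumptions made for illustration.

```python
def quick_search(text, pattern):
    """Quick Search sketch that also counts attempts and character comparisons.

    Returns (match_positions, attempts, comparisons). Illustrative only;
    this is not the ABSBMH algorithm from the abstract.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0, 0

    # Preprocessing: shift for the character just past the window.
    # Later (rightmost) occurrences overwrite earlier ones.
    shift = {}
    for i, ch in enumerate(pattern):
        shift[ch] = m - i

    matches, attempts, comparisons = [], 0, 0
    pos = 0
    while pos <= n - m:
        attempts += 1  # one attempt per window alignment
        j = 0
        while j < m:
            comparisons += 1  # one text/pattern character comparison
            if text[pos + j] != pattern[j]:
                break
            j += 1
        if j == m:
            matches.append(pos)
        if pos + m >= n:
            break  # no character past the window to shift on
        # Characters absent from the pattern allow the maximum shift m + 1.
        pos += shift.get(text[pos + m], m + 1)
    return matches, attempts, comparisons
```

Counting inside the loops keeps the two metrics independent of running time, which is how the abstract reports its comparison.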

EMPIRICAL PERFORMANCE EVALUATION OF KNUTH MORRIS PRATT AND BOYER MOORE STRING MATCHING ALGORITHMS

Many algorithms have been proposed for string matching in order to find a specific pattern in a given text. These algorithms have been used in many applications such as software editors, genetics, Internet search engines, natural language processing, etc. The aim of this paper is to evaluate the performance of two popular algorithms: Boyer Moore (BM) and Knuth Morris Pratt (KMP) in terms of execution time. The algorithms have been programmed using Java and Java Microbenchmark Harness to evaluate their execution time using a number of experimental test scenarios. Results show that the BM algorithm outperformed the KMP algorithm in all test scenarios.

A New Efficient Hybrid Exact String Matching Algorithm and Its Applications

String matching is one of the most challenging issues in computer science. In this study, a new efficient hybrid string matching algorithm called Atheer was developed. The proposed algorithm combines the strengths of three algorithms, namely the Karp-Rabin, Raita, and Smith algorithms. The Atheer algorithm performed efficiently in both the number of attempts and the number of character comparisons when compared with its original algorithms in the first step and with recent and standard algorithms (i.e., Horspool, Quick search, Two-way, Fast search, SSABS, TVSBS, AKRAM, and Maximum shift) in the second step. The proposed algorithm was tested on several data types, namely DNA sequences, Protein sequences, XML structures, Pitch characters, English texts, and Source codes. The Pitch database was the best match for Atheer in terms of the number of attempts for both long and short patterns; the DNA database was the worst match. In terms of character comparisons, the best database was the Source code database; the DNA sequence data type was again the worst match for both short and long patterns.
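Of the three component algorithms named above, Karp-Rabin is the one based on hashing. A minimal sketch of its rolling hash follows; it is not the Atheer algorithm, and the function name `karp_rabin` and the default `base` and `mod` values are assumptions chosen for illustration.

```python
def karp_rabin(text, pattern, base=256, mod=1_000_000_007):
    """Karp-Rabin sketch: compare window hashes, verify on a hash hit.

    The base and modulus are illustrative choices, not values from the paper.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    high = pow(base, m - 1, mod)  # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod

    matches = []
    for pos in range(n - m + 1):
        # Equal hashes can still be a collision, so verify the characters.
        if p_hash == t_hash and text[pos:pos + m] == pattern:
            matches.append(pos)
        if pos < n - m:
            # Roll the hash: drop text[pos], append text[pos + m].
            t_hash = ((t_hash - ord(text[pos]) * high) * base
                      + ord(text[pos + m])) % mod
    return matches
```

The explicit verification step is what keeps the search exact even when two different windows hash to the same value.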

A FAST STRING MATCHING ALGORITHM

Pattern matching is a well-known and important task in pattern discovery, used for finding nucleotide or amino acid sequence patterns in protein sequence databases. Although pattern matching is commonly used in computer science, its applications cover a wide range, including editors and information retrieval. In this paper we propose a new pattern matching algorithm with improved performance compared to the well-known algorithms in the literature. The proposed algorithm was developed after a comparative study of well-known algorithms such as Boyer-Moore, Horspool, and Raita. Its overall performance is improved by using the Horspool bad-character shift and by defining a fixed order of comparison. The proposed algorithm has been compared with other well-known algorithms.
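The two ingredients named in this abstract, the Horspool bad-character shift and a fixed order of comparison, can be sketched as follows. The abstract does not state the order the authors use, so the order here (last, first, middle, then the remaining characters, in the style of Raita) is an assumption, as is the function name `horspool_fixed_order`.

```python
def horspool_fixed_order(text, pattern):
    """Horspool shift combined with a fixed (Raita-style) comparison order.

    The order last, first, middle, rest is an assumed example, not
    necessarily the order used in the paper.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Horspool bad-character table: distance from the rightmost occurrence
    # of each character (excluding the final pattern position) to the end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    # Fixed comparison order; dict.fromkeys removes duplicates, keeps order.
    order = list(dict.fromkeys([m - 1, 0, m // 2] + list(range(1, m - 1))))

    matches = []
    pos = 0
    while pos <= n - m:
        # all() stops at the first mismatch in the fixed order.
        if all(text[pos + j] == pattern[j] for j in order):
            matches.append(pos)
        # Shift on the text character aligned with the last pattern position.
        pos += shift.get(text[pos + m - 1], m)
    return matches
```

Checking the last, first, and middle characters first tends to reject a window early on natural-language text, where neighbouring characters are correlated.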

Improved string searching

Software: Practice and Experience, 1989

We show that it is possible to improve the average time of the Boyer-Moore string matching algorithm using more space. This is accomplished by applying a transformation that virtually increases the size of the alphabet in use. The improvement is such that for long patterns it is possible to obtain an algorithm more than 50 per cent faster than the original one. We include experimental results on random and English text. Some improvements for searching on English text are also discussed.
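One way to read "a transformation that virtually increases the size of the alphabet" is to index the shift table by groups of q consecutive characters instead of single characters. The sketch below applies that idea to a Horspool-style search. It is an illustration under that assumption, not the paper's exact algorithm; the function name `horspool_qgram` and the default `q=2` are hypothetical.

```python
def horspool_qgram(text, pattern, q=2):
    """Horspool-style search whose shift table is indexed by q-grams.

    Treating each q-gram as one symbol of a larger virtual alphabet costs
    more table space but makes large shifts more likely.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    q = min(q, m)  # a pattern shorter than q falls back to smaller groups

    # For each q-gram ending at pattern index i (final position excluded),
    # store the distance to the pattern end; the rightmost occurrence wins.
    shift = {}
    for i in range(q - 1, m - 1):
        shift[pattern[i - q + 1:i + 1]] = m - 1 - i
    default = m - q + 1  # safe shift for q-grams absent from the table

    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Shift on the last q characters of the current window.
        pos += shift.get(text[pos + m - q:pos + m], default)
    return matches
```

With q = 1 this reduces to the ordinary Horspool shift; larger q trades table space for fewer window alignments, which matches the space-for-time trade described in the abstract.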

The exact string matching algorithms efficiency review

Exact string matching algorithms have been very significant in many applications in the last two decades, owing to advances in technology that produce large volumes of data. The main factors in string matching algorithms are the number of attempts, the number of character comparisons, and the running time. These factors are influenced by the type of algorithm, the type of data, the data size, and the pattern length. In this article, we review the advantages and disadvantages of exact string matching algorithms. We conclude that the suffix automata and hybrid algorithms are the faster algorithms, with the lowest number of attempts, and that the hashing approaches have a lower number of comparisons. The bit-parallelism algorithms have similar limitations.

String Matching FAAST Algorithm

String matching is a fundamental and challenging problem in computer science. Fast algorithms are highly desirable in applications such as text processing and DNA analysis. The invention of modern digital computers has made it possible to use pattern matching in many areas of life, which has motivated us to design faster algorithms to meet current needs. In this paper, we present a new algorithm that offers better performance than those reported in the literature to date. The new algorithm was developed by analyzing well-known earlier algorithms such as Quick search, Raita, Boyer Moore, and Horspool. It also aims at solving a popular variant of the approximate string matching problem, the k-mismatch problem, whose objective is to find all occurrences of a short pattern in a long text with at most k mismatched characters.
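The k-mismatch problem described above can be stated precisely with a brute-force reference sketch. This is not the FAAST algorithm, only a baseline that defines the expected output; the function name `k_mismatch` is an assumption.

```python
def k_mismatch(text, pattern, k):
    """Brute-force k-mismatch search.

    Returns every position where the pattern aligns with the text with at
    most k mismatching characters. A baseline, not the paper's algorithm.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    matches = []
    for pos in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if text[pos + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment already exceeds the budget
        if mismatches <= k:
            matches.append(pos)
    return matches
```

Setting k = 0 reduces the problem to exact matching, so the same routine can serve as a correctness check for faster exact algorithms.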

A NEW STRING MATCHING ALGORITHM

2000

In this paper a new exact string-matching algorithm with sub-linear average case complexity has been presented. Unlike other sub-linear string-matching algorithms it never performs more than n text character comparisons while working on a text of length n. It requires only O(m + s) extra pre-processing time and space, where m is the length of the pattern and s is the size

String Matching Algorithms

International Journal Of Engineering And Computer Science, 2018

To analyze the content of documents, pattern matching algorithms are used to find all occurrences of a limited set of patterns within an input text or document. To perform this task, this work uses four existing string matching algorithms: the Brute Force, Knuth-Morris-Pratt (KMP), Boyer Moore, and Rabin Karp algorithms. It also proposes three new string matching algorithms: the Enhanced Boyer Moore, Enhanced Rabin Karp, and Enhanced Knuth-Morris-Pratt algorithms. Findings: For experimentation, this work used two types of documents, i.e. .txt and .docx. The performance measures are search time, number of iterations, and accuracy. The experimental results show that the enhanced KMP algorithm gives better accuracy than the other string matching algorithms. Application/Improvements: Normally, these algorithms are used in the field of text mining, document cl...

Comparative Study between Various Pattern Matching Algorithms

IJCA, 2016

This paper describes a study of the work done to date in text searching, a subdivision of Natural Language Processing (NLP). The work includes the study and analysis of some of the algorithms devised in this area, identifying their faults or loopholes, and trying to increase their efficiency. Experiments were run on the Knuth-Morris-Pratt, Naïve Search, and Boyer-Moore algorithms by providing text inputs of various sizes and analyzing their behavior on these inputs. The results state that the Boyer-Moore algorithm worked more efficiently than the rest when dealing with larger data sets, while the Knuth-Morris-Pratt algorithm works quite well on larger alphabets. These algorithms do have drawbacks, as their efficiency depends on the alphabet and pattern size. The paper also describes a new pattern matching algorithm that uses a delimiter for shifting the pattern while matching.