A Two-Hashing Table Multiple String Pattern Matching Algorithm

Adaptive hashing based multiple variable length pattern search algorithm for large data sets

2014 International Conference on Data Science & Engineering (ICDSE), 2014

Searching for patterns in large data sets is essential for extracting knowledge from data warehouses. This paper presents a new hashing-based algorithm for fast search of multiple variable-length patterns in large data sets. It dispenses with the traditional generation of a shift table for each character present in the pattern. It can also accommodate patterns that arrive during the search, and thus works well for both predetermined and dynamic pattern sets. Furthermore, its speed improves as the minimum pattern length P increases: for a data set of length n, the search takes O(n/P) time. Experimental results are reported for the runtime behavior of the algorithm under varying parameters, such as the number of patterns to be searched and the length of the data set, extended up to (but not limited to) 200,000 characters.

Modified Suffix Search Algorithm for Multiple String Matching

2013

String matching is now a prominent field in computer science and has many real-world applications. A new algorithm for suffix search that uses chained hashing is proposed; it works well in both the matched and the mismatched case. A separate hash function is introduced in this paper. Hash functions can be defined in many ways; here, radix hashing is used, which removes the need for the shift table used in comparable algorithms. Every pattern matching algorithm consists of two main phases, the preprocessing phase and the matching phase, each with its own time and space complexity. The proposed method has very low time complexity in the average case.

A New Efficient Hybrid String Matching Algorithm to Solve the Exact String Matching Problem

String matching algorithms are among the most studied in computer science because of the fundamental role they play in many different applications, such as information retrieval, editors, security applications, firewalls, and biological applications. This study introduces a new hybrid algorithm based on two well-known algorithms, namely the modified Horspool and SSABS hybrid algorithms. Two factors are used to analyze the proposed algorithm: the total number of character comparisons and the total number of attempts. The proposed hybrid algorithm, named ABSBMH, was tested on different standard data types. ABSBMH performs fewer character comparisons than the other algorithms, while the number of attempts differs only slightly. This is because the preprocessing phase of the proposed hybrid is based on the SSABS algorithm, which uses the same preprocessing phase as the Quick Search algorithm, and because the tests use pattern lengths selected randomly from the databases. The experimental results show that

String Matching Algorithms

International Journal Of Engineering And Computer Science, 2018

To analyze the content of documents, pattern matching algorithms are used to find all occurrences of a limited set of patterns within an input text or document. To perform this task, this research work used four existing string matching algorithms: the Brute Force algorithm, the Knuth-Morris-Pratt (KMP) algorithm, the Boyer-Moore algorithm and the Rabin-Karp algorithm. This work also proposes three new string matching algorithms: the Enhanced Boyer-Moore algorithm, the Enhanced Rabin-Karp algorithm and the Enhanced Knuth-Morris-Pratt algorithm. Findings: For experimentation, this work used two types of documents, i.e. .txt and .docx. The performance measures used are search time, number of iterations and accuracy. The experimental results show that the enhanced KMP algorithm gives better accuracy than the other string matching algorithms. Application/Improvements: Normally, these algorithms are used in the field of text mining, document cl...
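Of the four baseline algorithms named above, Rabin-Karp is the one built on hashing. The following is a minimal Python sketch of the textbook rolling-hash version, not the enhanced variant proposed in the paper, whose details the abstract does not give. The function name `rabin_karp` and the `base` and `mod` parameters are illustrative assumptions.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return all start indices where pattern occurs in text (textbook Rabin-Karp)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Weight of the leading character of a window: base^(m-1) mod `mod`.
    high = pow(base, m - 1, mod)
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for s in range(n - m + 1):
        # Equal hashes only suggest a match; verify to rule out collisions.
        if p_hash == t_hash and text[s:s + m] == pattern:
            hits.append(s)
        if s < n - m:
            # Roll the window: drop text[s], append text[s + m].
            t_hash = ((t_hash - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return hits
```

The explicit comparison after a hash match is what keeps the result exact; the hash only filters out windows that cannot match.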

A FAST STRING MATCHING ALGORITHM

Pattern matching is a well-known and important task in the pattern discovery process, used today for finding nucleotide or amino acid sequence patterns in protein sequence databases. Although pattern matching is commonly used in computer science, its applications cover a wide range, including editors and information retrieval. In this paper we propose a new pattern matching algorithm with improved performance compared to the well-known algorithms in the literature so far. The proposed algorithm was developed after a comparative study of well-known algorithms such as Boyer-Moore, Horspool and Raita. Its overall performance is improved by using the shift provided by the Horspool bad-character rule and by defining a fixed order of comparison. The proposed algorithm has been compared with other well-known algorithms.
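The Horspool bad-character shift that the abstract builds on can be sketched as follows. This is the classical Horspool search, not the proposed algorithm: the abstract does not specify its fixed comparison order, so the sketch simply compares the whole window. The function name `horspool_search` is an illustrative assumption.

```python
def horspool_search(text, pattern):
    """Return all start indices of pattern in text using Horspool's shift rule."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Bad-character table: distance from the last occurrence of each character
    # (excluding the final pattern position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Shift by the table entry for the text character aligned with the
        # last pattern position; characters not in the table shift by m.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```

Because the shift depends only on the text character under the last pattern position, the order in which the window is compared can be changed freely, which is the degree of freedom that Raita-style variants exploit.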

Improved Double-Skip String Matching Algorithm (IDSA)

The string-matching problem is defined as follows: given a large text string and a pattern string, find all occurrences of the pattern in the text. A modification of one of the recent, fast string-matching algorithms is presented here. The modification was tested on English text. The results suggest a reduction of up to 36% in the average number of comparisons performed, compared to the original Double-Skip Algorithm (DSA).

Efficient Wu Manber String Matching Algorithm for Large Number of Patterns

International Journal of Computer Applications, 2015

String matching is one of the most important concepts in computer science, used in real-life applications such as intrusion detection systems, data mining and plagiarism detection systems. Many string matching algorithms exist for finding a pattern in a text; they are categorized into single-string and multiple-string matching. The Wu-Manber (WM) algorithm is a multiple-pattern algorithm and one of the finest string matching algorithms. The performance of WM depends on the tables built in the preprocessing phase: the prefix table, the shift table and the hash table. We introduce a new algorithm, the Efficient Wu-Manber (EWM) algorithm, which is an advanced version of the Wu-Manber algorithm with respect to time. EWM eliminates the prefix table, which is unused in most cases in Wu-Manber, constructs two shift tables instead of a single shift table, and in the hash table uses a nonlinear data structure (an AVL tree) instead of the linear data structure (a linked list) used in WM, which reduces the number of nodes traversed to find an exact match. The experimental results and analysis show that the EWM algorithm performs better than WM and its existing improved variants, and also better than various string matching
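To make the roles of the shift table and hash table concrete, here is a simplified Python sketch of the baseline Wu-Manber scheme. It is not EWM: it keeps a single shift table, omits the prefix table, and stores each hash bucket as a plain list rather than a linked list or AVL tree. The function name `wu_manber` and the block size parameter `B` are illustrative assumptions.

```python
from collections import defaultdict

def wu_manber(text, patterns, B=2):
    """Return (start, pattern) pairs for every occurrence of any pattern in text."""
    m = min(len(p) for p in patterns)      # only the first m chars of each pattern are indexed
    if m < B:
        raise ValueError("every pattern must have at least B characters")
    default = m - B + 1                    # safe shift for a block seen in no pattern
    shift = {}
    buckets = defaultdict(list)            # last block of the m-prefix -> candidate patterns
    for p in patterns:
        for j in range(B, m + 1):          # block occupying p[j-B:j]
            block = p[j - B:j]
            shift[block] = min(shift.get(block, default), m - j)
        buckets[p[m - B:m]].append(p)
    hits = []
    pos = m                                # exclusive end of the current length-m window
    while pos <= len(text):
        block = text[pos - B:pos]
        s = shift.get(block, default)
        if s == 0:
            # The block ends some pattern's m-prefix: verify the candidates.
            start = pos - m
            for p in buckets.get(block, ()):
                if text.startswith(p, start):
                    hits.append((start, p))
            pos += 1
        else:
            pos += s
    return hits
```

For example, `wu_manber("she sells sea shells", ["sells", "shell", "sea"])` returns `[(4, 'sells'), (10, 'sea'), (14, 'shell')]`. The candidate loop inside the `s == 0` branch is the part whose cost grows with bucket size, which is where a different bucket data structure would matter.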

Comparative Study between Various Pattern Matching Algorithms

IJCA, 2016

The present paper describes a study of the work done to date in the field of text searching, a subdivision of Natural Language Processing (NLP). The work in this project includes the study and analysis of some of the algorithms devised under this topic, finding their faults or loopholes, and trying to increase their efficiency, extending the range of work done on them. Experiments are performed on several text search algorithms, namely the Knuth-Morris-Pratt algorithm, the naïve search algorithm and the Boyer-Moore algorithm, by providing text input of various sizes and analyzing their behavior on these inputs. The results show that the Boyer-Moore algorithm worked more efficiently than the others when dealing with larger data sets. When working on larger alphabets, the Knuth-Morris-Pratt algorithm works quite well. These algorithms do have drawbacks, as their efficiency depends on the alphabet and pattern size. This paper also describes a new pattern matching algorithm that uses a delimiter for shifting the pattern while matching.

The exact string matching algorithms efficiency review

Exact string matching algorithms have been very significant in many applications over the last two decades, owing to advances in technology that produce large volumes of data. The main factors in string matching algorithms are the number of attempts, the number of character comparisons and the running time. These factors are influenced by the type of algorithm, the type of data, the data size and the length of the pattern used. In this article, we review the advantages and disadvantages of exact string matching algorithms. We conclude that the suffix automata and hybrid algorithms are the faster algorithms, with the lowest number of attempts, and that the hashing approaches have a lower number of comparisons. The bit-parallelism algorithms share similar limitations.

Fast string matching for multiple searches

Software: Practice and Experience, 2001

We present a string matching or pattern matching method which is especially useful when a single block of text must be searched repeatedly for different patterns. The method combines linking the text according to digrams, searching on the least-frequent digram, and probing selected characters as a preliminary filter before full pattern comparison. Tests on real alphabetic data show that the number of character comparisons may be decreased by two orders of magnitude compared with Knuth-Morris-Pratt and similar searching, but with an initialization overhead comparable to five to ten conventional searches.
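The idea of indexing the text by digrams once and then searching on the pattern's least-frequent digram can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: it stores digram positions in a dictionary of lists instead of links threaded through the text, and it omits the preliminary probing of selected characters, going straight to a full comparison. The names `build_digram_index` and `digram_search` are hypothetical.

```python
from collections import defaultdict

def build_digram_index(text):
    """Map each two-character substring of text to its start positions (one-off cost)."""
    index = defaultdict(list)
    for i in range(len(text) - 1):
        index[text[i:i + 2]].append(i)
    return index

def digram_search(text, index, pattern):
    """Return all start indices of pattern, visiting only its rarest digram's positions."""
    m = len(pattern)
    if m < 2:
        # Too short to contain a digram: fall back to a linear scan.
        return [i for i, ch in enumerate(text) if ch == pattern] if m == 1 else []
    # Offset within the pattern of the digram that occurs least often in the text.
    offset = min(range(m - 1), key=lambda k: len(index.get(pattern[k:k + 2], ())))
    hits = []
    for p in index.get(pattern[offset:offset + 2], ()):
        start = p - offset
        if start >= 0 and text.startswith(pattern, start):
            hits.append(start)
    return hits
```

The index is built once per text and reused for every pattern, which mirrors the trade-off described in the abstract: a noticeable initialization overhead that pays off only when the same text is searched repeatedly.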