Bit-level processor array architecture for flexible string matching

Processor Array Architectures for Flexible Approximate String Matching

In this paper, we present linear processor array architectures for flexible approximate string matching. These architectures are based on parallel realizations of dynamic programming and non-deterministic finite automaton algorithms. The algorithms consist of two phases: preprocessing and searching. Starting from the data dependence graphs of the searching phase, we derive parallel algorithms that can be mapped directly onto special-purpose processor array architectures for approximate string matching. The preprocessing phase is also accommodated on the same processor array designs. Finally, the proposed architectures support flexible patterns, i.e. patterns with a "don't care" symbol, patterns with a complement symbol, and patterns with a class symbol.
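To make the searching phase concrete, the following is a minimal sequential sketch of the standard dynamic programming recurrence for approximate matching (often attributed to Sellers), extended with the three flexible symbol kinds named in the abstract. It is not the paper's array design: the names `ANY`, `Not`, `symbol_matches`, and `approx_search`, and the choice to represent a class symbol as a Python set, are assumptions made only for illustration.

```python
# Hypothetical encoding of flexible pattern symbols (not taken from the paper):
#   ANY          -> "don't care" symbol, matches any character
#   Not("xy")    -> complement symbol, matches any character except x or y
#   {"a", "b"}   -> class symbol, matches any character in the set
#   "a"          -> ordinary character
ANY = object()


class Not:
    def __init__(self, chars):
        self.chars = frozenset(chars)


def symbol_matches(sym, ch):
    """Return True if pattern symbol `sym` accepts text character `ch`."""
    if sym is ANY:
        return True
    if isinstance(sym, Not):
        return ch not in sym.chars
    if isinstance(sym, (set, frozenset)):
        return ch in sym
    return sym == ch


def approx_search(pattern, text, k):
    """Report (end_index, distance) for every text position where the
    pattern matches with at most k edits (insert, delete, substitute)."""
    m = len(pattern)
    # col[i] holds D[i][j]: the best edit distance between pattern[:i]
    # and some substring of the text ending at position j.
    col = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text):
        prev_diag = col[0]  # D[0][j-1]; row 0 stays 0 so a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if symbol_matches(pattern[i - 1], ch) else 1
            new = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # extra text character
                      col[i - 1] + 1)    # missing pattern symbol
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            hits.append((j, col[m]))
    return hits
```

Each cell depends only on its left, upper, and upper-left neighbours, which is the kind of local data dependence that lends itself to a linear array of processing cells.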

A Programmable Array Processor Architecture for Flexible Approximate String Matching Algorithms

Approximate string matching is a common and often repeated task in information retrieval and bioinformatics. This paper proposes a generic design of a programmable array processor architecture for a wide variety of approximate string matching algorithms to gain high performance at low cost. We describe the architecture of the array and of the cell in detail, so that both the preprocessing and searching phases of most string matching algorithms can be implemented efficiently. The architecture also performs approximate string matching for complex patterns that contain don't care, complement, and class symbols. We simulate and evaluate the proposed architecture on a field programmable gate array (FPGA) device using the JHDL tool for synthesis and the Xilinx Foundation tools for mapping, placement, and routing. Finally, our programmable implementation runs about 8-340 times faster than a desktop computer with a 3.5 GHz Pentium 4 for all algorithms when the pattern length is 1024.

Implementation of a programmable array processor architecture for approximate string matching algorithms on FPGAs

20th International Parallel and Distributed Processing Symposium, IPDPS 2006, 2006

Approximate string matching is a common and often repeated task in information retrieval and bioinformatics. This paper proposes a generic design of a programmable array processor architecture for a wide variety of approximate string matching algorithms to gain high performance at low cost. We describe the architecture of the array and of the cell in detail, so that both the preprocessing and searching phases of most string matching algorithms can be implemented efficiently. The architecture also performs approximate string matching for complex patterns that contain don't care, complement, and class symbols. We implement and evaluate the proposed architecture on a field programmable gate array (FPGA) device using the JHDL tool for synthesis and the Xilinx Foundation tools for mapping, placement, and routing. Finally, our programmable implementation runs about 9-340 times faster than a desktop computer with a 3.5 GHz Pentium 4 for all algorithms when the pattern length is 1024.

FPGA-based string matching

2011

String matching has become essential for modern computers. It is used in many applications ranging from data mining to network security. The problem is that current general-purpose computers are no longer fast enough to deal with the ever-increasing amounts of data passed through them, owing to massive increases in network traffic and data storage capacity. This paper aims to demonstrate the significant performance gains that can be achieved by running string matching algorithms directly in hardware on an FPGA, as opposed to the traditional software-only solution. A possible future FPGA-based string matching board that could be installed in current computers is discussed.

A Memory-Efficient and Modular Approach for String Matching on FPGAs

2010

In Network Intrusion Detection Systems (NIDSs), string matching demands exceptionally high performance to match the content of network traffic against a predefined database of malicious patterns. Much work has been done in this field; however, existing designs have low memory efficiency. Because of the limited on-chip memory and number of I/O pins of Field Programmable Gate Arrays (FPGAs), state-of-the-art designs cannot support large dictionaries without using high-latency external DRAM. We propose a novel Memory efficient Architecture for large-scale String Matching (MASM), based on a pipelined binary search tree. With memory efficiency close to 1 byte/char, MASM can support a dictionary of over 4 MBytes using a single FPGA device. The architecture can also be easily partitioned so as to use external SRAM to handle even larger dictionaries of over 8 MBytes. Our implementation results show a sustained throughput of 3.5 Gbps, even when external SRAM is used. The MASM module can simply be duplicated to accept multiple characters per cycle, so throughput scales with the number of characters processed in each cycle. Dictionary update involves only rewriting the memory content, which can be done quickly without reconfiguring the chip.

A reconfigurable array based prototype of a specialised string lookup chip

2008 26th International Conference on Microelectronics, 2008

Different strategies for performing string lookups have been developed and deployed over time. They result from both the development of technology and the need to improve on existing solutions. Hence, the string lookup problem has been well studied, and a respectable number of good solutions exist. Due to the nature of the problem, most of the solutions are software based. Nevertheless, in modern computing environments, where the amount of data to be searched through keeps growing, the problem re-arises and demands different approaches that can reach multi-gigabit throughput rates so as to perform close to real-time string lookups. In that light, this paper studies the potential of migrating the well-known and widely used Boyer-Moore string lookup algorithm to a hardware-specific device capable of satisfying the demanded throughput, by proposing and characterising an initial implementation option on a reconfigurable platform.
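For readers unfamiliar with the algorithm family mentioned above, here is a minimal software sketch of the Boyer-Moore-Horspool simplification, which keeps only a bad-character shift table and omits the good-suffix rule of full Boyer-Moore. It says nothing about the paper's hardware design; the function name `horspool_search` is an assumption chosen for illustration.

```python
def horspool_search(pattern, text):
    """Return the start index of every occurrence of `pattern` in `text`,
    using the Horspool variant of Boyer-Moore (bad-character shifts only)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift table: for each character in pattern[:-1], the distance from
    # its rightmost occurrence to the last pattern position.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Skip ahead based on the text character aligned with the last
        # pattern position; characters not in the table allow a full shift.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```

The shift table is the part computed in preprocessing; the skipping loop is what lets the search examine fewer than all text positions on typical inputs.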

Super fast hardware string matching

2006 IEEE International Conference on Field Programmable Technology, 2006

With the appearance of multi-gigabit network infrastructure, a typical network intrusion detection system (NIDS) has to cope with the network speed. By examining each packet flowing through a network segment, suspicious packets are detected and reported to assure security. Up to 57% of the execution time in a NIDS is found to be spent comparing strings against predefined, known patterns. It is hard to implement a multi-gigabit performance NIDS without hardware support. This paper proposes a very high speed string matching algorithm that can be easily implemented in FPGAs. The parallel matching design takes a segment of text from the payload of a packet and detects all possible tokens, including those crossing text segment boundaries. Simulation results show a throughput of 23.43 Gbps with a moderate operating frequency of 366.2 MHz.

Bit Parallel String Matching Algorithms: A Survey

International Journal of Computer Applications, 2014

The intrinsic parallelism of bit operations such as AND/OR inside a computer word is known as bit parallelism. Since 1992, bit parallelism has been used directly in string matching to improve matching efficiency. Some of the popular bit-parallel string matching algorithms are Shift OR, Shift OR with Q-Gram, BNDM, TNDM, SBNDM, LBNDM, FBNDM, BNDMq, and Multiple pattern BNDM. This paper discusses the working of various bit-parallel string matching algorithms with examples and shows how bit parallelism improves efficiency in each of them.
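As an illustration of the first algorithm in that list, the following is a minimal sketch of exact matching with Shift-Or. Python integers stand in for the machine word, so the pattern length is not limited to the word size here; the function name `shift_or_search` is an assumption, and the code is not taken from the survey.

```python
def shift_or_search(pattern, text):
    """Return the start index of every exact occurrence of `pattern`
    in `text` using the Shift-Or bit-parallel algorithm."""
    m = len(pattern)
    if m == 0:
        return []
    all_ones = (1 << m) - 1
    # Preprocessing: masks[ch] has bit i cleared iff pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, all_ones) & ~(1 << i)
    # Searching: bit i of `state` is 0 iff pattern[:i + 1] matches the
    # text ending at the current position.
    state = all_ones
    hits = []
    for j, ch in enumerate(text):
        state = ((state << 1) | masks.get(ch, all_ones)) & all_ones
        if state & (1 << (m - 1)) == 0:
            hits.append(j - m + 1)
    return hits
```

One shift and one OR update all m prefix states at once, which is the word-level parallelism the abstract describes.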