p-ranav/hypergrep: Recursively search directories for a regex pattern

Highlights

Performance

The following tests compare the performance of hypergrep against ag (The Silver Searcher), ugrep, and ripgrep. Times are in seconds unless marked otherwise.

System Details

Processor: 11th Gen Intel(R) Core(TM) i9-11900KF @ 3.50GHz
Instruction set extensions: Intel® SSE4.1, Intel® SSE4.2, Intel® AVX2, Intel® AVX-512
Installed RAM: 32.0 GB (31.9 GB usable)
SSD: ADATA SX8200PNP
OS: Ubuntu 20.04 LTS
C++ compiler: g++ (Ubuntu 11.1.0-1ubuntu1-20.04) 11.1.0

Vcpkg Installed Libraries

vcpkg commit: 662dbb5

Library Version
argparse 2.9
concurrentqueue 1.0.3
fmt 10.0.0
hyperscan 5.4.2
libgit2 1.6.4

Single Large File Search: OpenSubtitles.raw.en.txt

The following searches are performed on a single large file (~13 GB, decompressed from OpenSubtitles.raw.en.gz) that is cached in memory. The commands refer to it as en.txt.

Search: command | Matching lines | ag | ugrep | ripgrep | hypergrep
Count the number of times Holmes did something: hgrep -c 'Holmes did \w' | 27 | n/a | 1.820 | 1.022 | 0.696
Literal with regex suffix: hgrep -nw 'Sherlock [A-Z]\w+' en.txt | 7882 | n/a | 1.812 | 1.509 | 0.803
Simple literal: hgrep -nw 'Sherlock Holmes' en.txt | 7653 | 15.764 | 1.888 | 1.524 | 0.658
Simple literal (case insensitive): hgrep -inw 'Sherlock Holmes' en.txt | 7871 | 15.599 | 6.945 | 2.162 | 0.650
Alternation of literals: hgrep -n 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' en.txt | 10078 | n/a
Alternation of literals (case insensitive): hgrep -in 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' en.txt | 10333 | n/a
Words surrounding a literal string: hgrep -n '\w+[\x20]+Holmes[\x20]+\w+' en.txt | 5020 | n/a | 6m 11s | 1.523 | 0.638

Git Repository Search: torvalds/linux

The following searches are performed on the entire Linux kernel source tree (after running make defconfig && make -j8). The commit used is f1fcb.

Search: command | Matching lines | ag | ugrep | ripgrep | hypergrep
Simple literal: hgrep -nw 'PM_RESUME' | 9 | 2.807 | 0.316 | 0.147 | 0.140
Simple literal (case insensitive): hgrep -niw 'PM_RESUME' | 39 | 2.904 | 0.435 | 0.149 | 0.141
Regex with literal suffix: hgrep -nw '[A-Z]+_SUSPEND' | 536 | 3.080 | 1.452 | 0.148 | 0.143
Alternation of four literals: hgrep -nw '(ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT)' | 16 | 3.085 | 0.410
Unicode Greek: hgrep -n '\p{Greek}' | 111 | 3.762 | 0.484 | 0.345 | 0.146

Git Repository Search: apple/swift

The following searches are performed on the entire Apple Swift source tree. The commit used is 3865b.

Search: command | Matching lines | ag | ugrep | ripgrep | hypergrep
Function/struct/enum declaration followed by a valid identifier and opening parenthesis: hgrep -n '(func|struct|enum)\s+[A-Za-z_][A-Za-z0-9_]*\s*\(' | 59026 | 1.148 | 0.954 | 0.154
Words of alphabetic characters followed by at least 2 digits: hgrep -nw '[A-Za-z]+\d{2,}' | 127858 | 1.169 | 1.238 | 0.156 | 0.095
Words starting with an uppercase letter, followed by alphanumeric characters and/or underscores: hgrep -nw '[A-Z][a-zA-Z0-9_]*' | 2012372 | 3.131 | 2.598 | 0.550 | 0.482
Guard let statement followed by a valid identifier: hgrep -n 'guard\s+let\s+[a-zA-Z_][a-zA-Z0-9_]*\s*=\s*\w+' | 839 | 0.828 | 0.174 | 0.054 | 0.047

Directory Search: /usr

The following searches are performed on the /usr directory.

Search: command | Matching lines | ag | ugrep | ripgrep | hypergrep
Any HTTP, HTTPS, or FTP URL: hgrep "(https?|ftp)://[^\s/$.?#].[^\s]*" | 13682 | 4.597 | 2.894 | 0.305 | 0.171
Any IPv4 address: hgrep -w "(?:\d{1,3}\.){3}\d{1,3}" | 12643 | 4.727 | 2.340 | 0.324 | 0.166
Any e-mail address: hgrep -w "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}" | 47509 | 5.477 | 37.209 | 0.494 | 0.220
Any date in MM/DD/YYYY format: hgrep "(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])/(19|20)\d{2}" | 116 | 4.239
Count the number of hex values: hgrep -cw "(?:0x)?[0-9A-Fa-f]+" | 68042 | 5.765 | 28.691 | 1.439 | 0.611
Search any C/C++ file for a literal: hgrep --filter "\.(c|cpp|h|hpp)$" test | 7355 | n/a | 0.505

Build

Install Dependencies with vcpkg

git clone https://github.com/microsoft/vcpkg
cd vcpkg
./bootstrap-vcpkg.sh
./vcpkg install concurrentqueue fmt argparse libgit2 hyperscan

Build hypergrep using cmake and vcpkg

Clone the repository

git clone https://github.com/p-ranav/hypergrep
cd hypergrep

If your CMake version is older than 3.19

mkdir build
cd build
cmake -DCMAKE_TOOLCHAIN_FILE=<path_to_vcpkg>/scripts/buildsystems/vcpkg.cmake ..
make

If your CMake version is 3.19 or newer (CMake presets are supported from 3.19 onward)

Use the release preset:

export VCPKG_ROOT=<path_to_vcpkg>
cmake -B build -S . --preset release
cmake --build build

Binary Portability

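To build a binary that is portable across x86_64 machines, invoke cmake with the -DBUILD_PORTABLE=on option. The build then uses:

- -march=x86-64 -mtune=generic, so the compiler targets the baseline x86-64 instruction set instead of assuming extensions such as AVX2 or AVX-512 that the build machine may have.
- -static-libgcc -static-libstdc++, which link the GCC runtime and the C++ standard library statically into the binary, reducing dependencies on the target system.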