Z algorithm (Linear time pattern searching Algorithm)

Given two strings, **text** and **pattern**, consisting of lowercase English letters, find all 0-based starting indices where the pattern occurs as a substring of the text.

Example:

**Input:** text = "aabxaabxaa", pattern = "aab"
**Output:** [0, 4]
**Explanation:** The pattern "aab" occurs in the text starting at indices 0 and 4.

In the Naive String Matching algorithm, we compare the pattern with every substring of the text of the same length, one by one, which takes O(n × m) time in the worst case, where n is the length of the text and m is the length of the pattern. This becomes inefficient for large inputs. The Z Algorithm improves on this by preprocessing the input into a Z-array of prefix-match lengths, which allows pattern searching in O(n + m) time.
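For comparison, the naive approach described above can be sketched in a few lines of Python. The function name `naive_search` is illustrative and not part of the original article; it simply tries every alignment of the pattern against the text.

```python
def naive_search(text, pattern):
    """Return all 0-based start indices of pattern in text (O(n * m) worst case)."""
    n, m = len(text), len(pattern)
    positions = []
    for start in range(n - m + 1):
        # Compare the pattern against the window text[start:start + m]
        if text[start:start + m] == pattern:
            positions.append(start)
    return positions


print(naive_search("aabxaabxaa", "aab"))  # [0, 4]
```

Each window comparison can cost up to m character checks, which is where the O(n × m) bound comes from.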

Core Idea

Z-Array to Track Prefix Matches

For a string s of length n, the Z-array stores:

z[i] = length of the longest substring starting at index i that is also a prefix of s

By convention z[0] is set to 0, since the whole string trivially matches itself.

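**Example:**
s = "aabxaab".

The Z-array is: z = [0, 1, 0, 0, 3, 1, 0]

For instance, z[4] = 3 because the substring "aab" starting at index 4 matches the prefix "aab", and z[5] = 1 because only the single character "a" at index 5 matches the start of the prefix.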

Calculation of Z Array

The naive way checks, for each index i, how many characters from s[i...] match the prefix s[0...], which can take O(n²) time. The Z-algorithm improves this and computes all values in O(n) time.
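The naive computation follows the definition directly. The sketch below is an illustration only (the name `naive_z_array` is an assumption, not from the article) and is useful mainly as a reference to check a faster implementation against.

```python
def naive_z_array(s):
    """Compute the Z-array directly from its definition (O(n^2) worst case)."""
    n = len(s)
    z = [0] * n  # z[0] stays 0 by convention
    for i in range(1, n):
        length = 0
        # Count how many characters of s[i:] agree with the prefix s[0:]
        while i + length < n and s[length] == s[i + length]:
            length += 1
        z[i] = length
    return z


print(naive_z_array("aabxaab"))  # [0, 1, 0, 0, 3, 1, 0]
```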

While computing, we maintain a window [l, r] called the Z-box: the segment s[l..r] with the largest r found so far that matches a prefix of s.

This helps reuse previous results instead of comparing again. When processing index i, there are two possibilities:

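**If i > r (outside the Z-box):** No earlier result covers position i, so compare characters from scratch starting with z[i] = 0, and update [l, r] if a match is found.

**If i ≤ r (inside the Z-box):** Since s[l..r] matches the prefix s[0..r - l], position i corresponds to position k = i - l in the prefix. Start from z[i] = min(r - i + 1, z[k]), then try to extend the match beyond r by direct comparison and update [l, r] if it grows.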
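To see the outside and inside cases in action, here is a minimal Python sketch that records which case each index falls into. Inside the Z-box it seeds z[i] with min(r - i + 1, z[i - l]) before extending; outside it starts from 0. The function name `z_array_with_trace` and the trace tuple layout are assumptions made for this illustration.

```python
def z_array_with_trace(s):
    """Return (z, trace); trace holds (i, case, z[i], l, r) after each step."""
    n = len(s)
    z = [0] * n
    l = r = 0
    trace = []
    for i in range(1, n):
        if i <= r:
            case = "inside"
            # s[l..r] equals the prefix s[0..r-l], so index i mirrors index i - l
            z[i] = min(r - i + 1, z[i - l])
        else:
            case = "outside"
        # Extend by direct comparison from the current value of z[i]
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        # Move the Z-box if this match reaches further right
        if i + z[i] - 1 > r:
            l, r = i, i + z[i] - 1
        trace.append((i, case, z[i], l, r))
    return z, trace


z, trace = z_array_with_trace("aabxaab")
for step in trace:
    print(step)
```

For "aabxaab", indices 1 to 4 are handled as "outside"; index 4 creates the Z-box [4, 6], so indices 5 and 6 are handled as "inside" and reuse z[1] and z[2].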

To search for a pattern, the key idea is to compute the Z-array of a new string formed by joining the pattern and the text, separated by a special delimiter (e.g., $) that doesn’t appear in either string. The delimiter prevents a match from running across the boundary between the pattern and the text.

We construct a new string as: s = pattern + '$' + text

We then compute the Z-array for this combined string. For a position i inside the text part, Z[i] is the length of the longest prefix of the pattern that matches the text starting at that position (offset by the pattern and the separator). Because the delimiter occurs nowhere else, Z[i] can never exceed the pattern length.

So, whenever we find a position i such that Z[i] == length of pattern, the entire pattern matches the text at:

match position = i - (pattern length + 1)
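The index arithmetic is easiest to check on the running example. The sketch below builds the combined string and maps each full match back to a text index. It uses a deliberately simple quadratic helper, `z_by_definition` (a hypothetical name), so that the focus stays on the offset calculation rather than on the linear-time computation.

```python
def z_by_definition(s):
    """Quadratic Z-array helper, used here only to keep the example short."""
    z = [0] * len(s)
    for i in range(1, len(s)):
        while i + z[i] < len(s) and s[z[i]] == s[i + z[i]]:
            z[i] += 1
    return z


pattern, text = "aab", "aabxaabxaa"
m = len(pattern)
combined = pattern + "$" + text  # "aab$aabxaabxaa"
z = z_by_definition(combined)

# Pair each index in the combined string where the whole pattern matches
# with the corresponding index in the original text.
hits = [(i, i - (m + 1)) for i in range(m + 1, len(combined)) if z[i] == m]
print(hits)         # [(4, 0), (8, 4)]
print(max(z) <= m)  # True: the delimiter caps every Z-value at m
```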

C++ `

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Z-function to compute Z-array
vector<int> zFunction(string &s) {
    int n = s.length();
    vector<int> z(n);
    int l = 0, r = 0;

    for (int i = 1; i < n; i++) {
        if (i <= r) {
            int k = i - l;

            // Case 2: reuse the previously computed value
            z[i] = min(r - i + 1, z[k]);
        }

        // Try to extend the Z-box beyond r
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
            z[i]++;
        }

        // Update the [l, r] window if extended
        if (i + z[i] - 1 > r) {
            l = i;
            r = i + z[i] - 1;
        }
    }

    return z;
}

// Function to find all occurrences of pattern in text
vector<int> search(string &text, string &pattern) {
    string s = pattern + '$' + text;
    vector<int> z = zFunction(s);
    vector<int> pos;
    int m = pattern.size();

    for (int i = m + 1; i < (int)z.size(); i++) {
        if (z[i] == m) {
            // pattern match starts here in text
            pos.push_back(i - m - 1);
        }
    }
    return pos;
}

int main() {
    string text = "aabxaabxaa";
    string pattern = "aab";

    vector<int> matches = search(text, pattern);

    for (int pos : matches)
        cout << pos << " ";

    return 0;
}

Java

import java.util.ArrayList;

public class GfG {

// Z-function to compute Z-array
static ArrayList<Integer> zFunction(String s) {
    int n = s.length();
    ArrayList<Integer> z = new ArrayList<>();
    for (int i = 0; i < n; i++) {
        z.add(0);
    }
    int l = 0, r = 0;
    
    for (int i = 1; i < n; i++) {
        if (i <= r) {
            int k = i - l;
            
            // Case 2: reuse the previously computed value
            z.set(i, Math.min(r - i + 1, z.get(k)));
        }
        
        // Try to extend the Z-box beyond r
        while (i + z.get(i) < n && 
                s.charAt(z.get(i)) == s.charAt(i + z.get(i))) {
            z.set(i, z.get(i) + 1);
        }
        
        // Update the [l, r] window if extended
        if (i + z.get(i) - 1 > r) {
            l = i;
            r = i + z.get(i) - 1;
        }
    }
    
    return z;
}

// Function to find all occurrences of pattern in text
static ArrayList<Integer> search(String text, String pattern) {
    String s = pattern + '$' + text;
    ArrayList<Integer> z = zFunction(s);
    ArrayList<Integer> pos = new ArrayList<>();
    int m = pattern.length();
    
    for (int i = m + 1; i < z.size(); i++) {
        if (z.get(i) == m){
            
            // pattern match starts here in text
            pos.add(i - m - 1); 
        }
    }
    return pos;
}

public static void main(String[] args) {
    String text = "aabxaabxaa";
    String pattern = "aab";
    
    ArrayList<Integer> matches = search(text, pattern);
    
    for (int pos : matches)
        System.out.print(pos + " ");
}

}

Python

def zFunction(s):
    n = len(s)
    z = [0] * n
    l, r = 0, 0

    for i in range(1, n):
        if i <= r:
            k = i - l

            # Case 2: reuse the previously computed value
            z[i] = min(r - i + 1, z[k])

        # Try to extend the Z-box beyond r
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1

        # Update the [l, r] window if extended
        if i + z[i] - 1 > r:
            l = i
            r = i + z[i] - 1

    return z


def search(text, pattern):
    s = pattern + '$' + text
    z = zFunction(s)
    pos = []
    m = len(pattern)

    for i in range(m + 1, len(z)):
        if z[i] == m:

            # pattern match starts here in text
            pos.append(i - m - 1)

    return pos


if __name__ == '__main__':
    text = 'aabxaabxaa'
    pattern = 'aab'

    matches = search(text, pattern)

    for pos in matches:
        print(pos, end=' ')

C#

using System;
using System.Collections.Generic;

public class GfG {

// Z-function to compute Z-array
static List<int> zFunction(string s)
{
    int n = s.Length;
    List<int> z = new List<int>(new int[n]);
    int l = 0, r = 0;

    for (int i = 1; i < n; i++) {
        if (i <= r) {
            int k = i - l;

            // Case 2: reuse the previously computed
            // value
            z[i] = Math.Min(r - i + 1, z[k]);
        }

        // Try to extend the Z-box beyond r
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
            z[i]++;
        }

        // Update the [l, r] window if extended
        if (i + z[i] - 1 > r) {
            l = i;
            r = i + z[i] - 1;
        }
    }

    return z;
}

// Function to find all occurrences of pattern in text
static List<int> search(string text, string pattern)
{
    string s = pattern + '$' + text;
    List<int> z = zFunction(s);
    List<int> pos = new List<int>();
    int m = pattern.Length;

    for (int i = m + 1; i < z.Count; i++) {
        if (z[i] == m) {

            // pattern match starts here in text
            pos.Add(i - m - 1);
        }
    }
    return pos;
}

public static void Main()
{
    string text = "aabxaabxaa";
    string pattern = "aab";

    List<int> matches = search(text, pattern);

    foreach(int pos in matches)
        Console.Write(pos + " ");
}

}

JavaScript

function zFunction(s) {
    let n = s.length;
    let z = new Array(n).fill(0);
    let l = 0, r = 0;

    for (let i = 1; i < n; i++) {
        if (i <= r) {
            let k = i - l;

            // Case 2: reuse the previously computed value
            z[i] = Math.min(r - i + 1, z[k]);
        }

        // Try to extend the Z-box beyond r
        while (i + z[i] < n && s[z[i]] === s[i + z[i]]) {
            z[i]++;
        }

        // Update the [l, r] window if extended
        if (i + z[i] - 1 > r) {
            l = i;
            r = i + z[i] - 1;
        }
    }

    return z;
}

function search(text, pattern) {
    let s = pattern + '$' + text;
    let z = zFunction(s);
    let pos = [];
    let m = pattern.length;

    for (let i = m + 1; i < z.length; i++) {
        if (z[i] === m) {

            // pattern match starts here in text
            pos.push(i - m - 1);
        }
    }
    return pos;
}

// Driver Code
let text = 'aabxaabxaa';
let pattern = 'aab';
let matches = search(text, pattern);
console.log(matches.join(" "));

`

**Time Complexity:** O(n + m), where n is the length of the text and m is the length of the pattern, since the combined string and Z-array are processed linearly.
**Auxiliary Space:** O(n + m), used to store the combined string and the Z-array.
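To make the linear bound concrete, the following Python sketch counts character comparisons with and without Z-box reuse. The function `count_comparisons` and its `use_z_box` flag are hypothetical names introduced for this illustration; with the flag off, the loop degenerates to the naive quadratic computation.

```python
def count_comparisons(s, use_z_box=True):
    """Count character comparisons made while building the Z-array of s."""
    n = len(s)
    z = [0] * n
    l = r = 0
    comparisons = 0
    for i in range(1, n):
        if use_z_box and i <= r:
            # Reuse the mirrored value instead of comparing from scratch
            z[i] = min(r - i + 1, z[i - l])
        while i + z[i] < n:
            comparisons += 1
            if s[z[i]] != s[i + z[i]]:
                break
            z[i] += 1
        if i + z[i] - 1 > r:
            l, r = i, i + z[i] - 1
    return comparisons


worst = "a" * 1000
print(count_comparisons(worst, use_z_box=True))   # 999
print(count_comparisons(worst, use_z_box=False))  # 499500
```

With the Z-box, every successful comparison pushes r to the right and every index causes at most one failed comparison, so the total stays below 2n for a string of length n.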

Advantages of Z-Algorithm

Real-Life Applications