Ukkonen's Suffix Tree Construction Part 5

Last Updated : 23 Jul, 2025

This article is a continuation of the following four articles:

Ukkonen’s Suffix Tree Construction – Part 1
Ukkonen’s Suffix Tree Construction – Part 2
Ukkonen’s Suffix Tree Construction – Part 3
Ukkonen’s Suffix Tree Construction – Part 4

Please go through Part 1, Part 2, Part 3 and Part 4 before reading this article. There we covered the basics of suffix trees, the high-level Ukkonen’s algorithm, suffix links, the three implementation tricks and some details of activePoint, along with the example string “abcabxabcd”, for which we went through the first six phases of building the suffix tree. Here, we will go through the rest of the phases (7 to 11) and build the tree completely.

***********************Phase 7***********************************
In phase 7, we read the 7th character (a) from string S.

[Figure 33: suffix tree after phase 7]

At the end of phase 7, remainingSuffixCount is 1 (one suffix, ‘a’, the last one, is not added explicitly to the tree, but it is present in the tree implicitly). Figure 33 above is the resulting tree after phase 7.

***********************Phase 8***********************************
In phase 8, we read the 8th character (b) from string S.

[Figure: suffix tree after phase 8]

At the end of phase 8, remainingSuffixCount is 2 (two suffixes, ‘ab’ and ‘b’, the last two, are not added explicitly to the tree, but they are present in the tree implicitly).

***********************Phase 9***********************************
In phase 9, we read the 9th character (c) from string S.

[Figure: suffix tree after phase 9]

At the end of phase 9, remainingSuffixCount is 3 (three suffixes, ‘abc’, ‘bc’ and ‘c’, the last three, are not added explicitly to the tree, but they are present in the tree implicitly).

***********************Phase 10***********************************
In phase 10, we read the 10th character (d) from string S.
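Before stepping through the extensions of phase 10, the remainingSuffixCount values reported for phases 7 to 9 can be cross-checked by brute force. The sketch below is only an illustrative aid and not part of Ukkonen's algorithm: it relies on the observation that a suffix stays implicit exactly when it also occurs earlier in the part of the string read so far, so the count equals the length of the longest such suffix. The function name remaining_suffix_count is our own and does not come from the series' code.

```python
def remaining_suffix_count(prefix: str) -> int:
    """Length of the longest suffix of `prefix` that also occurs earlier in it.

    Brute-force stand-in (assumption: used here only to cross-check the
    remainingSuffixCount values quoted in the article).
    """
    n = len(prefix)
    for length in range(n - 1, 0, -1):
        suffix = prefix[n - length:]
        # Searching prefix[0:n-1] only finds occurrences that start
        # before the suffix's own starting position.
        if prefix.find(suffix, 0, n - 1) != -1:
            return length
    return 0


S = "abcabxabcd$"
counts = [remaining_suffix_count(S[:i]) for i in range(1, len(S) + 1)]
for phase, count in enumerate(counts, start=1):
    print(f"phase {phase:2d}: remainingSuffixCount = {count}")
```

For ‘abcabxabcd$’ this reports 1, 2 and 3 for phases 7, 8 and 9, and 0 for phase 10: the new character d appears nowhere earlier, so the three pending suffixes plus the new one must all be added explicitly, which is why phase 10 runs extensions 7 to 10.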

[Figure: phase 10]

[Figure 37: phase 10, extension 7]

The internal node C newly created in the current extension 7 (in the figure above) will get its suffix link set in the next extension 8 (see Figure 38 below).

[Figure 38: phase 10, extension 8]

Please note that node C from the previous extension (see Figure 37 above) got its suffix link set here, and node D created in the current extension will get its suffix link set in the next extension. What happens if no new node is created in the next extension? We have seen this before in Phase 6 (Part 4) and will see it again in the last extension of this Phase 10. Stay tuned.

[Figure: phase 10, extension 9]

[Figure: phase 10, extension 10]

An internal node that was created in the previous extension and is waiting for its suffix link to be set in the next extension points to root if no internal node is created in that next extension. In the code implementation, as soon as a new internal node (say A) gets created in an extension j, we will set its suffix link to the root node. In the next extension j+1, if Rule 2 applies on an existing or newly created node (say B), or Rule 3 applies with some active node (say B), then the suffix link of node A will change to node B; otherwise node A will keep pointing to root.
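The bookkeeping described above can be sketched in isolation, without the rest of the tree-building logic. This is a minimal illustration under our own assumptions: the class SuffixLinkTracker, its method names and the node name E are hypothetical and are not taken from the series' implementation. It replays the four extensions of phase 10, where extensions 7, 8 and 9 each create an internal node and extension 10 only adds a leaf at root.

```python
class Node:
    def __init__(self, name, root=None):
        self.name = name
        # A new internal node points to root until a later extension
        # of the same phase redirects it.
        self.suffix_link = root


class SuffixLinkTracker:
    """Hypothetical helper: tracks the internal node awaiting its suffix link."""

    def __init__(self, root):
        self.root = root
        self.last_new_node = None

    def start_phase(self):
        self.last_new_node = None

    def internal_node_created(self, name):
        # Rule 2 with an edge split: a new internal node appears.
        node = Node(name, self.root)
        if self.last_new_node is not None:
            self.last_new_node.suffix_link = node
        self.last_new_node = node
        return node

    def extension_ended_at(self, active_node):
        # Rule 2 added a leaf at an existing node, or Rule 3 applied
        # with this active node: the waiting node links to it.
        if self.last_new_node is not None:
            self.last_new_node.suffix_link = active_node
            self.last_new_node = None


root = Node("root")
tracker = SuffixLinkTracker(root)
tracker.start_phase()
c = tracker.internal_node_created("C")  # extension 7
d = tracker.internal_node_created("D")  # extension 8
e = tracker.internal_node_created("E")  # extension 9 (name E is assumed)
tracker.extension_ended_at(root)        # extension 10: leaf added at root

for node in (c, d, e):
    print(node.name, "->", node.suffix_link.name)
```

Because every new node starts out pointing to root, the last node created in a phase needs no special handling when the following extension creates nothing new.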

We see the following facts in Phase 10:

***********************Phase 11***********************************
In phase 11, we read the 11th character ($) from string S.

[Figure: phase 11]

[Figure: suffix tree after phase 11]

Now we have added all suffixes of the string ‘abcabxabcd$’ to the suffix tree. There are 11 leaf ends in this tree, and the labels on the path from root to a leaf end represent one suffix. The only thing left is to assign a number (suffix index) to each leaf end; that number is the starting position of the suffix in string S. This can be done by a DFS traversal of the tree. During the DFS traversal, keep track of the label length, and when a leaf end is found, set the suffix index to “stringSize - labelSize + 1”. The indexed suffix tree will look like below:

[Figure: suffix tree with suffix indices]

In the above figure, suffix indices are shown as character positions starting with 1 (not zero-indexed). In the code implementation, the suffix index will be zero-indexed, i.e. where we see suffix index j (1 to m for a string of length m) in the above figure, in the code implementation it will be j-1 (0 to m-1).

And we are done!

Data Structure to represent suffix tree

How do we represent the suffix tree? There are nodes, edges, labels, suffix links and indices. Below are some of the operations/queries we will be doing while building the suffix tree and later while using it in different applications:

We may think of different data structures which can fulfil these requirements. In Part 6, we will discuss the data structure we will use in our code implementation, along with the code itself.
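In the meantime, the DFS indexing step described above can be tried out on a deliberately simple representation. The sketch below is an assumption-laden stand-in, not the data structure Part 6 uses: it builds the tree by naive quadratic insertion of every suffix rather than by Ukkonen's algorithm, stores edge labels as plain strings, and uses names (Node, build_naive_suffix_tree, set_suffix_indices, collect_leaves) that are our own. It assigns zero-indexed suffix indices, i.e. stringSize - labelSize.

```python
class Node:
    def __init__(self):
        self.children = {}      # first character -> (edge label, child Node)
        self.suffix_index = -1  # set only on leaves


def build_naive_suffix_tree(text):
    """Insert every suffix one by one (O(n^2)); assumes a unique last character."""
    root = Node()
    for i in range(len(text)):
        node, rest = root, text[i:]
        while rest:
            first = rest[0]
            if first not in node.children:
                node.children[first] = (rest, Node())  # new leaf edge
                break
            label, child = node.children[first]
            k = 0  # length of the common prefix of label and rest
            while k < len(label) and k < len(rest) and label[k] == rest[k]:
                k += 1
            if k < len(label):
                # Split the edge and hang the old child below the new node.
                mid = Node()
                mid.children[label[k]] = (label[k:], child)
                node.children[first] = (label[:k], mid)
                child = mid
            node, rest = child, rest[k:]
    return root


def set_suffix_indices(node, label_length, text_size):
    """DFS that tracks the path label length and indexes each leaf (zero-based)."""
    if not node.children:
        node.suffix_index = text_size - label_length
        return
    for label, child in node.children.values():
        set_suffix_indices(child, label_length + len(label), text_size)


def collect_leaves(node, path=""):
    """Return (path label, suffix index) for every leaf below node."""
    if not node.children:
        return [(path, node.suffix_index)]
    leaves = []
    for label, child in node.children.values():
        leaves.extend(collect_leaves(child, path + label))
    return leaves


text = "abcabxabcd$"
root = build_naive_suffix_tree(text)
set_suffix_indices(root, 0, len(text))
leaves = sorted(collect_leaves(root), key=lambda leaf: leaf[1])
for path, index in leaves:
    print(index, path)
```

Each of the 11 leaves ends up with the starting position of the suffix spelled out on its root-to-leaf path; adding 1 to each index gives the one-based numbering used in the figure.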