GitHub - gdetari/SymSpellSwift: Swift implementation of SymSpell: Spelling correction & Fuzzy search (original) (raw)

Swift implementation of SymSpell: Spelling correction & Fuzzy search: 1 million times faster through Symmetric Delete spelling correction algorithm

Description from https://github.com/wolfgarbe/SymSpell/

The Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance. It is six orders of magnitude faster (than the standard approach with deletes + transposes + replaces + inserts) and language independent.

Opposite to other algorithms only deletes are required, no transposes + replaces + inserts. Transposes + replaces + inserts of the input term are transformed into deletes of the dictionary term. Replaces and inserts are expensive and language dependent: e.g. Chinese has 70,000 Unicode Han characters!

The speed comes from the inexpensive delete-only edit candidate generation and the pre-calculation. An average 5 letter word has about 3 million possible spelling errors within a maximum edit distance of 3, but SymSpell needs to generate only 25 deletes to cover them all, both at pre-calculation and at lookup time. Magic!

Single word spelling correction

Lookup provides a very fast spelling correction of single words.

Applications

Compound aware multi-word spelling correction

Supports compound aware automatic spelling correction of multi-word input strings.

Compound splitting & decompounding

lookup() assumes every input string as single term. lookupCompound() also supports compound splitting / decompounding with three cases:

  1. mistakenly inserted space within a correct word led to two incorrect terms
  2. mistakenly omitted space between two correct words led to one incorrect combined term
  3. multiple input terms with/without spelling errors

Splitting errors, concatenation errors, substitution errors, transposition errors, deletion errors and insertion errors can by mixed within the same word.

  1. Automatic spelling correction

Large document collections make manual correction infeasible and require unsupervised, fully-automatic spelling correction. In conventional spelling correction of a single token, the user is presented with multiple spelling correction suggestions. For automatic spelling correction of long multi-word text the algorithm itself has to make an educated choice.

Examples:

Word Segmentation of noisy text

WordSegmentation divides a string into words by inserting missing spaces at appropriate positions.

Examples:

Applications:

Swift implementation

Current implementation builds on the original SymSpell, but uses Swift best practices and modern paradigms to achieve the same results with even better performance.

Usage

let symSpell = SymSpell(maxDictionaryEditDistance: 2, prefixLength: 3) if let path = Bundle.main.url(forResource: "frequency_dictionary_en_82_765", withExtension: "txt") { try? symSpell.loadDictionary(from: path, termIndex: 0, countIndex: 1, termCount: 82765) }

let results = symSpell.lookup("intermedaite")

print(results.first?.term)