FastText Working and Implementation

Last Updated : 12 Jun, 2026

FastText is a word embedding technique developed by Facebook that represents each word through its character-level subwords (n-grams). Because of this, it can produce vectors for words it never saw during training, and it captures both semantic and morphological information.

FastText Architecture and Working

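FastText extends traditional word embedding models such as Word2Vec by representing each word as a collection of character n-grams rather than as a single, indivisible unit. A word's vector is built from the vectors of its n-grams (plus a vector for the whole word, if it was seen in training). Words that share pieces such as "run" therefore share parameters, which captures word structure and makes it possible to generate embeddings for unseen words.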

Character N-Gram Representation

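FastText breaks each word into smaller groups of characters called n-grams. Instead of learning a vector only for the whole word, it also learns vectors for these character patterns, which helps it capture word structure and meaning. Consider the word "running":

Here, FastText first adds the boundary symbols < and > to mark the start and end of the word, giving <running>. With n = 3, the character n-grams are: <ru, run, unn, nni, nin, ing, ng>. In practice a range of lengths is used (Gensim's defaults are min_n=3 and max_n=6), and the word's vector is built from the vectors of all of these n-grams together with a vector for the whole word.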
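The splitting rule can be sketched in a few lines of Python. This is a minimal illustration, not library code: the function name `char_ngrams` is hypothetical, and real implementations (fastText itself and Gensim) additionally hash each n-gram into a fixed number of buckets. The sketch wraps the word in < and >, then collects every substring whose length lies between min_n and max_n.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word` (hypothetical helper).

    The word is wrapped in '<' and '>' so that prefixes and suffixes are
    distinguishable from the same letters in the middle of a word.
    """
    extended = f"<{word}>"
    return [
        extended[start:start + n]
        for n in range(min_n, max_n + 1)           # every n-gram length
        for start in range(len(extended) - n + 1)  # every start position
    ]


print(char_ngrams("running", min_n=3, max_n=3))
# ['<ru', 'run', 'unn', 'nni', 'nin', 'ing', 'ng>']
print(len(char_ngrams("running")))  # 22 n-grams of lengths 3 to 6
```

For a short word such as "is", the bracketed form <is> is only four characters long, so it also shows up in the n-gram list alongside <is and is>.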

Hierarchical Softmax Optimization

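Hierarchical Softmax is an optimization technique FastText can use to speed up training. Instead of computing a score for every word in the vocabulary at each step, it arranges the vocabulary as the leaves of a binary tree and computes a word's probability as a product of binary decisions along the path from the root to that word's leaf. For a vocabulary of size V this takes roughly log2(V) decisions instead of V scores. In Gensim it is enabled with hs=1 (together with negative=0); the default, which the model trained below uses, is negative sampling.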
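The probability computation can be illustrated with a toy tree. In this sketch the four-word balanced tree, the `PATHS` table and the random vectors are all hypothetical; the real implementation builds a Huffman tree from word frequencies, so frequent words get shorter paths. Each internal node has a vector, and at each node the sigmoid of its dot product with the hidden vector decides "left" versus "right".

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical complete binary tree over a 4-word vocabulary with 3 internal
# nodes (0 = root, 1 = left child, 2 = right child). Each path is a list of
# (internal_node, direction): +1 means "go left", -1 means "go right".
PATHS = {
    "king":  [(0, +1), (1, +1)],
    "queen": [(0, +1), (1, -1)],
    "runs":  [(0, -1), (2, +1)],
    "reads": [(0, -1), (2, -1)],
}


def hs_probability(word, hidden, node_vectors):
    """P(word | hidden) as a product of binary decisions along the word's path."""
    prob = 1.0
    for node, direction in PATHS[word]:
        prob *= sigmoid(direction * dot(node_vectors[node], hidden))
    return prob


random.seed(0)
dim = 4
hidden = [random.uniform(-1, 1) for _ in range(dim)]
node_vectors = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]

probs = {w: hs_probability(w, hidden, node_vectors) for w in PATHS}
print(probs)
print("sum:", sum(probs.values()))  # 1 up to floating-point error
```

Each word needs only two sigmoid evaluations here (log2 of 4), and the probabilities still form a valid distribution because sigmoid(x) + sigmoid(-x) = 1 at every node.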

Implementation

Step 1: Installing Required Libraries

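Install Gensim, which provides the FastText implementation used in this article, by running the following command in your terminal or command prompt: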

pip install gensim

Step 2: Importing Required Libraries

from gensim.models import FastText


Step 3: Creating Training Data

# Gensim expects an iterable of tokenized sentences (lists of strings)
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "helps", "the", "king"],
    ["running", "is", "good", "exercise"],
    ["the", "runner", "runs", "fast"],
    ["walking", "is", "healthy", "activity"],
    ["the", "walker", "walks", "slowly"],
    ["reading", "books", "is", "fun"],
    ["the", "reader", "reads", "daily"],
]

print("Training data created successfully")

Output:

Training data created successfully
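Real corpora usually arrive as raw text rather than pre-split lists. A minimal sketch, assuming a simple lowercase-and-keep-letters tokenizer (a stand-in for whatever tokenizer you actually use; `raw_text` is hypothetical), rebuilds the same sentences from raw lines and counts word frequencies. The counts show why the next step sets min_count=1: only "the" occurs at least five times, so Gensim's default min_count of 5 would discard every other word.

```python
import re
from collections import Counter

# Hypothetical raw text; each non-empty line becomes one training sentence.
raw_text = """The king rules the kingdom.
The queen helps the king.
Running is good exercise.
The runner runs fast.
Walking is healthy activity.
The walker walks slowly.
Reading books is fun.
The reader reads daily."""

# Lowercase each line and keep runs of letters as tokens.
sentences = [
    re.findall(r"[a-z]+", line.lower())
    for line in raw_text.splitlines()
    if line.strip()
]

counts = Counter(word for sentence in sentences for word in sentence)
print(sentences[0])              # ['the', 'king', 'rules', 'the', 'kingdom']
print(counts.most_common(3))     # [('the', 7), ('is', 3), ('king', 2)]
print([w for w, c in counts.items() if c >= 5])  # ['the']
```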

Step 4: Training a Basic FastText Model

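model = FastText(
    sentences,
    vector_size=50,  # dimensionality of each word and n-gram vector
    window=5,        # max distance between a target word and its context words
    min_count=1,     # keep every word; the default (5) would drop almost all of this tiny corpus
    min_n=3,         # shortest character n-gram
    max_n=6,         # longest character n-gram
    sg=1,            # 1 = skip-gram, 0 = CBOW
    epochs=10,       # number of passes over the corpus
)

print("Model trained successfully")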

Output:

Model trained successfully
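With sg=1 the model trains in skip-gram mode: each word is used to predict the words around it, up to `window` positions away. The sketch below (the function name `skipgram_pairs` is hypothetical) lists the (target, context) pairs produced from one sentence with a fixed window. Gensim's trainer by default also samples a smaller effective window for each target word and downsamples very frequent words, so its actual pairs differ; in FastText the target word is represented through its n-gram vectors rather than a single word vector.

```python
def skipgram_pairs(sentence, window=2):
    """Return (target, context) pairs for one tokenized sentence.

    Fixed-window sketch: every word within `window` positions of the
    target, excluding the target itself, counts as context.
    """
    pairs = []
    for i, target in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, sentence[j]))
    return pairs


for pair in skipgram_pairs(["the", "runner", "runs", "fast"], window=1):
    print(pair)
```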

Step 5: Getting Word Vectors

king_vector = model.wv["king"]

print("Vector for 'king':")
print(king_vector[:5])

print("Vector Shape:", king_vector.shape)

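Output:

The script prints the label, then the first five of the vector's 50 values (the exact numbers can vary between runs because training starts from random weights), and finally:

Vector Shape: (50,)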

Step 6: Handling Unseen Words (OOV)

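One of FastText's major advantages is that it can generate embeddings for words that never appeared in training by assembling them from the vectors of their character n-grams. In Gensim, the vector of such an out-of-vocabulary word is the average of its n-gram vectors.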

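# "kingship" does not appear in any training sentence
print("'kingship' in vocabulary:", "kingship" in model.wv.key_to_index)

kingship_vector = model.wv["kingship"]

print("Vector for 'kingship':")
print(kingship_vector[:5])

Output:

The first line prints False, confirming that "kingship" is out of vocabulary. FastText still returns a 50-dimensional vector for it, built from n-grams such as <ki, kin and king that it shares with "king" and "kingdom"; the five values printed after the label can vary between runs.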
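The averaging step can be sketched without a trained model. In this illustration the 2-dimensional `NGRAM_VECTORS` table and the `oov_vector` helper are hypothetical; n-grams missing from the table are simply skipped, whereas Gensim hashes every n-gram into one of its buckets, so it always finds some vector.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of '<word>' with lengths min_n..max_n."""
    extended = f"<{word}>"
    return [
        extended[start:start + n]
        for n in range(min_n, max_n + 1)
        for start in range(len(extended) - n + 1)
    ]


# Hypothetical n-gram vectors, standing in for ones learned during training.
NGRAM_VECTORS = {
    "<ki": [1.0, 0.0],
    "kin": [0.8, 0.2],
    "ing": [0.0, 1.0],
    "king": [0.9, 0.1],
}


def oov_vector(word, table, min_n=3, max_n=6):
    """Average the vectors of the word's n-grams that appear in `table`."""
    found = [table[g] for g in char_ngrams(word, min_n, max_n) if g in table]
    if not found:
        return None  # no known n-grams, so this sketch cannot build a vector
    dim = len(found[0])
    return [sum(vec[d] for vec in found) / len(found) for d in range(dim)]


print(oov_vector("kingship", NGRAM_VECTORS))  # mean of <ki, kin, ing, king
```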

Step 7: Finding Similar Words

print("Words similar to 'king':")

similar_words = model.wv.most_similar("king", topn=3)

for word, score in similar_words:
    print(word, ":", round(score, 4))

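Output:

The output lists the three words whose vectors have the highest cosine similarity to "king", each followed by its score rounded to four decimal places. With a corpus this small, the particular neighbours and scores can vary between runs.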
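most_similar ranks words by cosine similarity to the query's vector and, like this sketch, leaves the query word itself out of the results. The stdlib-only version below uses a hypothetical toy vocabulary of 2-dimensional vectors in place of a trained model, and the `most_similar` function here is a simplified stand-in, not Gensim's method.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, vectors, topn=3):
    """Rank every other word in `vectors` by cosine similarity to `query`."""
    scores = [(w, cosine(vectors[query], v)) for w, v in vectors.items() if w != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Hypothetical toy embeddings.
toy = {
    "king": [0.9, 0.1],
    "queen": [0.8, 0.3],
    "kingdom": [0.7, 0.0],
    "runs": [0.1, 0.9],
}

for word, score in most_similar("king", toy, topn=2):
    print(word, ":", round(score, 4))
```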


Applications

Advantages

Limitations