Lemmatization vs. Stemming (original) (raw)

Last Updated : 5 Jan, 2026

Lemmatization and stemming are two popular text‑preprocessing techniques in NLP used to reduce words to their base form. While stemming cuts words down to their root by trimming endings, lemmatization uses linguistic rules to return meaningful base words (lemmas). Understanding the difference helps improve text analysis, search accuracy and NLP model performance.

frame_3294

Lemmatization vs. Stemming

Stemming

Stemming is a rule-based text normalisation technique that reduces words to their root form by removing prefixes or suffixes. The resulting form called a stem, may not be a valid or meaningful word in the language.

In essence, stemming performs mechanical truncation of words.

Techniques Used

**Example:

Original Word Stem
running run
studies studi
smiling smile
communication commun

Implementation

Python `

from nltk.stem import PorterStemmer stemmer = PorterStemmer() words = ["running", "studying", "smiling", "communication"] stemmed_words = [stemmer.stem(word) for word in words] print(stemmed_words)

`

**Output:

['run', 'studi', 'smile', 'commun']

Applications

Lemmatization

Lemmatization is a linguistically driven text normalization technique that converts words into their base dictionary form, known as a lemma, by considering grammar, vocabulary and context.

Techniques Used

Example

Original Word Lemma POS
running run Verb
better good Adjective
studies study Noun
was be Verb

Implementation

Python `

from nltk.stem import WordNetLemmatizer from nltk.corpus import wordnet lemmatizer = WordNetLemmatizer() words = ["running", "better", "studies", "was"] lemmas = [ lemmatizer.lemmatize("running", pos=wordnet.VERB), lemmatizer.lemmatize("better", pos=wordnet.ADJ), lemmatizer.lemmatize("studies", pos=wordnet.NOUN), lemmatizer.lemmatize("was", pos=wordnet.VERB) ] print(lemmas)

`

**Output:

['run', 'good', 'study', 'be']

Applications

Lemmatization vs. Stemming

Let's compare the two techniques:

Aspect Lemmatization Stemming
Definition Converts words to their base or dictionary form (lemma). Reduces words to their root form (stem) which may not be a valid word.
Complexity Higher complexity, context-aware. Lower complexity, context-agnostic.
Algorithms Uses dictionaries and morphological analysis. Uses rule-based algorithms like Porter, Snowball and Lancaster Stemmers.
Accuracy Produces more accurate and meaningful words. Less accurate, may produce non-meaningful stems.
Output Example "Running" → "run", "Better" → "good". "Running" → "run" or "runn", "Better" → "bett".
Use in Search Engines Better search results through understanding context. Useful for quick search indexing
Text Analysis Essential for tasks needing accurate word forms (e.g., sentiment analysis, topic modeling) Used for initial stages of preprocessing to reduce word variability
Machine Translation Helps in producing grammatically correct translations Less common due to potential inaccuracy
Information Retrieval Suitable for detailed and precise analysis Useful for reducing data dimensionality