Natural Language Processing Interview Questions

Last Updated : 27 Sep, 2025

Natural Language Processing (NLP) is evolving rapidly, with interviews focusing not just on basics but also on advanced architectures, contextual understanding and real-world applications. Let's prepare for interviews with a few practice questions.

Q1. What is tokenization and what are its types?

Tokenization is the process of splitting text into smaller units called tokens, which can be words, subwords or characters. It is a fundamental step in NLP because most downstream tasks—like embeddings, parsing and classification—require input in a structured, tokenized form.

**Types of Tokenization:
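The granularities named above (words, subwords and characters) can be sketched with the standard library alone. This is a minimal illustration, not a production tokenizer: the regular expression and the `toy_subwords` helper are assumptions made for the example, and real subword tokenizers such as BPE learn their pieces from data rather than cutting fixed-size chunks.

```python
import re

text = "Tokenizers aren't magic."

# Word-level: runs of letters/digits/apostrophes, or a single punctuation mark.
word_tokens = re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)

# Character-level: every non-space character becomes a token.
char_tokens = [ch for ch in text if not ch.isspace()]

# Subword-level (toy): fixed 3-character chunks stand in for learned pieces.
def toy_subwords(word, size=3):
    return [word[i:i + size] for i in range(0, len(word), size)]

subword_tokens = [piece for word in word_tokens for piece in toy_subwords(word)]

print(word_tokens)
print(char_tokens[:5])
print(subword_tokens)
```

Coarser tokens keep the sequence short but enlarge the vocabulary; finer tokens do the opposite.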

Q2. What is the difference between stemming and lemmatization?

Let's see the difference between stemming and lemmatization:

| Feature | Stemming | Lemmatization |
|---|---|---|
| Definition | Reduces a word to its base or root form by removing suffixes/prefixes | Reduces a word to its dictionary or canonical form using linguistic rules and context |
| Output | Often produces non-words or truncated forms | Produces valid words found in a dictionary |
| Accuracy | Crude approximation; may remove too much | More precise; considers context and part-of-speech |
| Computation | Fast, computationally inexpensive | Slower due to use of dictionaries and POS tagging |
| Example | "studies" → "studi"; "running" → "run" | "better" → "good"; "running" → "run" |
| Use-case | Search engines, text indexing where speed matters | NLP tasks requiring semantic understanding, e.g., sentiment analysis, text summarization |
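The contrast can be made concrete with two toy functions. This is only a sketch: `toy_stem` is a hypothetical suffix stripper (far cruder than the Porter stemmer, so it leaves "running" as "runn"), and `LEMMAS` is a tiny hand-written dictionary standing in for a real lexical resource such as WordNet.

```python
# Toy stemmer: strips a known suffix without consulting any dictionary,
# so the result may not be a real word.
SUFFIXES = ("ing", "es", "ed", "s")

def toy_stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Toy lemmatizer: dictionary lookup (entries are illustrative assumptions).
LEMMAS = {"studies": "study", "better": "good", "running": "run"}

def toy_lemmatize(word):
    return LEMMAS.get(word, word)

for word in ("studies", "better", "running"):
    print(word, toy_stem(word), toy_lemmatize(word))
```

The stemmer is cheap but blind to irregular forms such as "better", which only the dictionary-based approach maps to "good".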

Q3. What is the Out-of-Vocabulary (OOV) problem in NLP?

The OOV problem occurs when a model encounters a word not seen during training, leading to poor representation or prediction failure. This is a major challenge for traditional embedding methods like Word2Vec or GloVe, which assign vectors only to words present in the training corpus.

**Example:** A model trained only on "I love NLP" has no vectors for "enjoy" or "NLU", so it may fail on "I enjoy NLU".

**Solutions:

  1. **Subword embeddings:** BPE and WordPiece break unknown words into known subwords.
  2. **Character-level embeddings:** Represent words via character sequences to handle rare/misspelled words.
  3. **Contextual embeddings:** Models like BERT or ELMo generate dynamic embeddings for any input, mitigating OOV issues.
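As an illustration of the subword idea, the sketch below splits a word into the longest pieces found in a vocabulary, scanning left to right. It is a simplified, WordPiece-style greedy match under stated assumptions: the `vocab` set is hypothetical, and the `##` continuation markers used by real WordPiece vocabularies are omitted.

```python
def greedy_subwords(word, vocab, unk="[UNK]"):
    """Split `word` into the longest known pieces, scanning left to right."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate span until it matches a vocabulary entry.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:  # no prefix of the remainder is known
            return [unk]
        pieces.append(word[start:end])
        start = end
    return pieces

vocab = {"un", "break", "able", "token", "s"}  # hypothetical subword vocabulary

print(greedy_subwords("unbreakable", vocab))
print(greedy_subwords("tokens", vocab))
print(greedy_subwords("xyz", vocab))
```

A word never seen as a whole, such as "unbreakable", is still represented through pieces the model already has vectors for.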

Q4. What is the Bag of Words (BoW) model and what are its limitations?

Bag of Words (BoW) is a feature extraction technique that represents text as a vector of word counts or frequencies, ignoring grammar and word order. It’s simple and widely used in classic NLP pipelines.

**Example:
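A minimal sketch of BoW vectors built with `collections.Counter`; the two sentences in `docs` are made up for illustration, and splitting on whitespace is a simplifying assumption.

```python
from collections import Counter

docs = ["the cat sat on the mat", "the dog sat"]

# Vocabulary: every distinct word in the corpus, in a fixed (sorted) order.
vocab = sorted({word for doc in docs for word in doc.split()})

def bow_vector(doc):
    counts = Counter(doc.split())
    # One count per vocabulary word; words absent from the doc count as 0.
    return [counts[word] for word in vocab]

vectors = [bow_vector(doc) for doc in docs]
print(vocab)
print(vectors)

# Word order is discarded: a shuffled sentence yields the same vector.
print(bow_vector("mat the on sat cat the") == vectors[0])
```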

**Limitations:

Q5. What is TF-IDF and how is it used in NLP?

TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure used in NLP to evaluate the importance of a word in a document relative to a collection of documents (corpus).

TF-IDF score is the product of TF and IDF, highlighting words that are important in a specific document but not common across the corpus.

**Formula:

$$\text{TF-IDF}(t, d) = \text{TF}(t, d) \times \text{IDF}(t)$$
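The product above can be computed by hand on a toy corpus. The sketch below assumes one common convention, TF as the relative frequency of a term in the document and IDF as log(N / df); libraries often use smoothed variants, so the exact numbers differ between implementations. The three documents are invented for the example.

```python
import math

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]

def tf(term, doc):
    # Relative frequency of the term inside one document.
    return doc.count(term) / len(doc)

def idf(term, corpus):
    # log(N / df), where df is the number of documents containing the term.
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0

def tf_idf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)

print(tf_idf("the", docs[0], docs))
print(tf_idf("cat", docs[0], docs))
```

Although "the" occurs twice in the first document, it appears in two of the three documents, so its score ends up lower than that of "cat", which occurs only in this one.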

**Applications:

Q6. What are word embeddings and why are they important?

Word embeddings are dense vector representations of words in a continuous space, capturing semantic and syntactic relationships. Unlike one-hot vectors, embeddings encode similarity and contextual relationships between words.

**Example: "king" and "queen" have vectors close together, reflecting their semantic similarity, whereas "king" and "apple" are distant.

**Word Embedding Techniques:

**Importance:

Q7. What is the difference between word embeddings and contextual embeddings?

Let's see the difference between word embeddings and contextual embeddings:

| Feature | Word Embeddings | Contextual Embeddings |
|---|---|---|
| Definition | Fixed vector representation for each word | Dynamic vectors that vary depending on context |
| Example | "bank" → same vector regardless of "river bank" or "financial bank" | "bank" → different vectors for "river bank" and "financial bank" |
| Techniques | Word2Vec, GloVe, FastText | BERT, ELMo, GPT |
| Context Awareness | None | Captures surrounding words and semantic meaning |
| Use-case | Basic NLP tasks, semantic similarity | Tasks requiring context understanding (QA, NER, disambiguation) |

Q8. What are the different types of embeddings in NLP?

Let's see the various types of embeddings,

**1. Word-level embeddings:

**2. Subword embeddings:

**3. Character-level embeddings:

**4. Contextual embeddings:

**5. Sentence/document embeddings:

Different embeddings capture meaning at different granularities—word, subword, character, sentence—depending on the task.

Q9. Dense vs. Sparse Embeddings

Let's see the difference between sparse and dense embeddings:

| Feature | Dense Embeddings | Sparse Embeddings |
|---|---|---|
| Vector Characteristics | Low-dimensional, mostly non-zero | High-dimensional, mostly zero |
| Representation | Learned via neural networks capturing semantic meaning | Based on explicit features like TF-IDF or one-hot encoding |
| Dimensionality | Typically 100–1000 dimensions | Thousands to millions of dimensions |
| Interpretability | Less interpretable; dimensions do not correspond directly to features | Highly interpretable; each dimension maps to a specific feature or term |
| Use-case | Semantic search, recommendations, NLP tasks needing contextual understanding | Keyword matching, traditional information retrieval, sparse data scenarios |
| Storage & Efficiency | Compact but computationally intensive | Larger storage, efficiently indexed for exact match retrieval |
| Strength | Captures subtle contextual and semantic relationships | Efficient for exact match retrieval and scalable |
| Limitation | Requires large datasets and training overhead | Cannot capture semantic similarity or context well |

Q10. What is the difference between pretrained embeddings and fine-tuning?

Let's see the difference between pretrained embeddings and fine-tuning:

| Feature | Pretrained Embeddings | Fine-tuning |
|---|---|---|
| Definition | Embeddings trained on large general corpora and used as-is | Pretrained embeddings further trained on task-specific data to improve performance |
| Flexibility | General-purpose | Task-specific adaptation |
| Computation | Low (no additional training required) | Higher (requires gradient updates on embeddings) |
| Example | Word2Vec trained on Wikipedia | BERT embeddings fine-tuned for sentiment analysis |
| Use-case | Quickly incorporate semantic knowledge into models | Improve accuracy for specific downstream tasks |

Q11. What are Recurrent Neural Networks (RNNs)?

Recurrent Neural Networks (RNNs) are a class of neural networks designed to handle sequential data, where the order of inputs is important. Unlike feedforward networks, RNNs maintain a hidden state that acts as memory, capturing information from previous time steps to influence current predictions.

**Mathematical Representation:

$$h_t = f(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$$
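The recurrence can be traced step by step with plain Python lists. In this sketch the activation f is assumed to be tanh (a common choice, not stated above), and the weights `W_xh`, `W_hh` and bias `b_h` are small hypothetical values rather than trained parameters.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h_t = []
    for i in range(len(b_h)):
        total = b_h[i]
        total += sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        total += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(total))
    return h_t

# Hypothetical parameters: 2 input features, 2 hidden units.
W_xh = [[0.5, -0.2], [0.1, 0.3]]
W_hh = [[0.4, 0.0], [0.0, 0.4]]
b_h = [0.0, 0.0]

h = [0.0, 0.0]  # initial hidden state
for x_t in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
    print(h)
```

The same weights are reused at every time step; only the hidden state `h` carries information forward.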

**Applications:

Q13. What is the vanishing gradient problem in RNNs?

The vanishing gradient problem occurs when gradients shrink exponentially as they are backpropagated through time steps in an RNN. This prevents the network from learning long-range dependencies effectively.

**Example: If an RNN is trying to predict a word based on a sequence of 50 previous words, gradients may become extremely small by the time they reach the first word, leading to ineffective weight updates.

**Solutions:

  1. **LSTM (Long Short-Term Memory)** networks with memory cells
  2. **GRU (Gated Recurrent Unit)** networks with simplified gating
  3. **Gradient clipping** to cap exploding gradients; it stabilizes training but does not by itself fix vanishing gradients

The vanishing gradient problem is why RNNs struggle with long sequences, motivating LSTM and GRU architectures.
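A back-of-the-envelope calculation shows how quickly the effect sets in. The sketch assumes a scalar recurrence in which the gradient is multiplied by the same factor at every time step; the factors 0.5 and 1.5 are hypothetical, chosen only to show shrinking and growing behavior.

```python
def gradient_scale(per_step_factor, steps):
    """Gradient magnitude after `steps` multiplications by the same factor."""
    return per_step_factor ** steps

for steps in (5, 20, 50):
    # A factor below 1 shrinks the gradient exponentially (vanishing);
    # a factor above 1 grows it exponentially (exploding).
    print(steps, gradient_scale(0.5, steps), gradient_scale(1.5, steps))
```

With a factor of 0.5, the signal reaching the first of 50 time steps is far too small to produce a meaningful weight update.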

Q14. What is the difference between RNN, LSTM and GRU networks?

Let's see the difference between RNN, LSTM and GRU:

| Feature | RNN | LSTM | GRU |
|---|---|---|---|
| Memory Mechanism | Simple hidden state | Separate memory cell and hidden state | Single hidden state combining memory |
| Gates | None | Input, Forget, Output | Update, Reset |
| Ability to handle long sequences | Poor due to vanishing gradients | Excellent due to gating mechanisms | Good, slightly less complex than LSTM |
| Complexity | Low | High | Moderate |
| Computation Cost | Lower | Higher | Lower than LSTM |
| Use-case | Short sequences or simple tasks | Long sequences, language modeling, translation | Medium-length sequences, lightweight tasks |
| Advantage | Simple and fast | Captures long-range dependencies | Efficient and less computationally heavy |
| Limitation | Cannot handle long-term dependencies | Computationally intensive | Slightly less powerful for very long sequences |

Q15. Explain sequence-to-sequence (Seq2Seq) models and their components

Sequence-to-sequence (Seq2Seq) models are neural network architectures designed to transform an input sequence into an output sequence. They are widely used in NLP tasks where the lengths of input and output sequences can vary, such as machine translation, text summarization and speech recognition.

**Components:

**1. Encoder:

**2. Decoder:

**3. Attention Mechanism:

**Example:

Seq2Seq allows mapping variable-length input sequences to variable-length outputs, a crucial aspect in NLP tasks like translation.

Q16. What is an Encoder-Decoder model in NLP?

The Encoder-Decoder architecture is a foundational framework for sequence-to-sequence (Seq2Seq) tasks in NLP. It separates input comprehension and output generation, enabling flexible transformation of variable-length sequences.

**Components:

**Encoder:

**Decoder:

**Attention Layer:

**Use-cases:

Encoder-Decoder models provide a structured way to handle variable-length sequences, bridging input understanding with output generation.

Q17. Explain the Transformer architecture and its impact on NLP

Transformers process sequences in parallel using self-attention, instead of sequentially like RNNs. Self-attention weighs the importance of every word with respect to others, capturing long-range dependencies effectively.

**Key Components:

**Impact on NLP:

Transformers outperform RNN-based architectures by modeling context more effectively and training faster on large corpora.

Q18. Give the difference between BERT and GPT architectures.

Let's see the difference between BERT and GPT architectures:

| Feature | BERT | GPT |
|---|---|---|
| Architecture | Encoder-only | Decoder-only |
| Training Objective | Masked Language Modeling (MLM) | Autoregressive Language Modeling |
| Context | Bidirectional (considers left and right context) | Unidirectional (left-to-right context) |
| Use-case | Understanding tasks: NER, QA, classification | Text generation, completion, dialogue systems |
| Fine-tuning | Requires task-specific fine-tuning | Can generate text with minimal adaptation |
| Strength | Captures full context for comprehension | Generates coherent, sequential text |
| Limitation | Not naturally suited for text generation | Limited bidirectional understanding |

Q19. Autoregressive vs Autoencoder models.

Let's see the differences between autoregressive and autoencoder models:

| Feature | Autoregressive Models | Autoencoder Models |
|---|---|---|
| Purpose | Predict next token based on previous tokens in sequence | Reconstruct the input from a corrupted (e.g., masked) or compressed representation |
| Context Usage | Left (previous) context only (unidirectional) | Both left and right context (bidirectional) |
| Training Objective | Maximize likelihood of next token | Minimize reconstruction loss between input and output |
| Typical Architecture | Decoder-only Transformer (e.g., GPT series) | Encoder-decoder or encoder-only (e.g., BERT) |
| Applications | Text generation, speech synthesis, time-series forecasting | Text classification, question answering, representation learning |
| Inference | Sequential token generation; slower | Parallel processing possible; faster |
| Strength | Excellent at coherent sequential generation | Strong at contextual understanding and embeddings |
| Limitation | Limited bidirectional context | Not naturally suited for free-form text generation |

Q20. What are the differences between Masked Language Modeling (MLM) and Causal Language Modeling (CLM)?

Let's see the difference between MLM and CLM:

| Feature | Masked Language Modeling (MLM) | Causal Language Modeling (CLM) |
|---|---|---|
| Objective | Predict masked tokens anywhere in the input | Predict the next token based on previous tokens only |
| Context | Bidirectional (uses both left and right context) | Unidirectional (left-to-right context) |
| Example | Input: "The [MASK] is bright" → predict "sun" | Input: "The sun is" → predict "bright" |
| Model Examples | BERT, RoBERTa | GPT series |
| Use-case | Language understanding tasks: NER, QA, classification | Language generation tasks: text completion, dialogue |
| Strength | Captures full context for better comprehension | Generates coherent sequential text |
| Limitation | Not naturally suited for free text generation | Limited to past context; cannot see future tokens |

Q21. How does dependency parsing differ from constituency parsing?

| Feature | Dependency Parsing | Constituency Parsing |
|---|---|---|
| Focus | Grammatical relationships between words (head-dependent relations) | Hierarchical structure of phrases (sub-phrases like NP, VP) |
| Output | Dependency tree: edges represent direct relationships | Constituency tree: nested tree structure of constituents |
| Example | Sentence: "She enjoys reading books" → "enjoys" is root; "She" → subject; "reading books" → object | Same sentence → NP: "She"; VP: "enjoys reading books" |
| Advantages | Highlights syntactic dependencies useful for relation extraction | Captures hierarchical grammatical structure |
| Use-cases | Information extraction, syntax-based sentiment analysis, NER | Grammar analysis, parsing for machine translation, text generation |
| Representation | Graph-based (nodes = words, edges = dependencies) | Tree-based (nested phrase structure) |

Q22. What are positional encodings in Transformers and why are they needed?

Transformers process sequences in parallel and do not inherently capture the order of tokens. Positional encodings add information about the position of each token in the sequence, enabling the model to recognize the relative or absolute position of words.

**Types of Positional Encodings:

  1. **Sinusoidal (fixed) encoding:** Uses sine and cosine functions of different frequencies.
  2. **Learned encoding:** Position embeddings are learned during training.
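The fixed variant can be sketched in a few lines. The code assumes the formulation from the original Transformer paper, where dimensions 2k and 2k+1 use sine and cosine of position / 10000^(2k / d_model); the function name and the tiny `d_model` are choices made for illustration.

```python
import math

def sinusoidal_encoding(position, d_model):
    """Return the d_model-dimensional sinusoidal encoding of one position."""
    encoding = []
    for i in range(d_model):
        # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
        angle = position / (10000 ** ((i - i % 2) / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding

for pos in range(3):
    print(pos, [round(v, 3) for v in sinusoidal_encoding(pos, 4)])
```

Each position receives a distinct vector, which is added to the token embedding before the first attention layer.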

Why Needed:

Positional encodings provide order information, essential for accurate context modeling in Transformers.

Q23. Explain the concept of embeddings for subwords and character-level models

Subword and character-level embeddings are designed to address the Out-of-Vocabulary (OOV) problem and handle rare, morphologically complex or unseen words in NLP tasks. They allow models to generate meaningful representations even for words not seen during training.

**1. Subword Embeddings:

Words are split into subword units such as prefixes, suffixes or frequent subword patterns. Common methods: Byte-Pair Encoding (BPE), WordPiece, FastText.

**Benefits:

**Example:

**2. Character-Level Embeddings:

Each character in a word is represented as an embedding. A sequence of character embeddings is processed (e.g., via CNNs or RNNs) to form a word-level representation.

**Benefits:

**Example:

Q24. Explain Named Entity Recognition (NER) and its importance

Named Entity Recognition (NER) is a subtask of information extraction that identifies and classifies entities in text into predefined categories such as persons, organizations, locations, dates, monetary values, percentages and more.

**Example:

**Sentence: "Apple Inc. was founded by Steve Jobs in Cupertino."

**NER Output:

**Importance of NER:

NER is foundational for structured understanding of unstructured text, enabling downstream NLP tasks to operate more effectively.

Q26. What is Word Sense Disambiguation (WSD)? Differentiate between WSD and NER.

Word Sense Disambiguation (WSD) is the process of determining the correct meaning of a word in context when the word has multiple possible senses. WSD is crucial for accurate understanding and downstream NLP tasks.

**Techniques:

  1. **Knowledge-based approaches:** Use lexical databases like WordNet to match context with word senses.
  2. **Supervised learning:** Train classifiers on labeled datasets where words are annotated with their correct senses.
  3. **Contextual embeddings:** Modern models like BERT produce dynamic embeddings that inherently disambiguate word senses based on surrounding context.

| Feature | WSD (Word Sense Disambiguation) | NER (Named Entity Recognition) |
|---|---|---|
| Definition | Identifies the correct meaning (sense) of a word based on context. | Identifies and classifies proper nouns or entities (like names, locations, organizations). |
| Focus | Resolving lexical ambiguity for common words. | Detecting specific entities in text. |
| Example | “Bank” → financial institution vs river bank depending on sentence. | “Apple” → organization; “Steve Jobs” → person. |
| Context Use | Requires surrounding words or sentence-level context to choose correct sense. | Uses surrounding words and sometimes grammar to classify entity type. |
| Applications | Machine translation, semantic search, word-level sense analysis. | Information extraction, question answering, knowledge graph construction. |

Q27. What is topic modeling and which algorithms are commonly used?

Topic modeling is an unsupervised learning technique that identifies hidden topics in large collections of text documents by analyzing word patterns and co-occurrences.

**Common Algorithms:

**Applications:

Q28. What is Information Extraction (IE)?

Information Extraction (IE) is the process of automatically converting unstructured text into structured data that can be easily analyzed and used in downstream applications. IE allows systems to extract meaningful facts, entities, relationships and events from raw text.

**Key Components:

  1. **Named Entity Recognition (NER):** Identify and classify entities (persons, organizations, locations, etc.)
  2. **Relation Extraction:** Detect relationships between entities (e.g., “Steve Jobs → founder → Apple Inc.”)
  3. **Event Extraction:** Identify events and participants, along with temporal and spatial details

**Applications:

Q29. What are the challenges faced in sentiment analysis and how can they be addressed?

Sentiment analysis determines the emotional tone of text (positive, negative, neutral), but it faces several challenges:

**1. Sarcasm and Irony:

**2. Contextual Ambiguity:

**3. Domain-Specific Language:

**4. Negation Handling:

**5. Imbalanced Data:

Q30. What are common challenges in text classification and how can they be solved?

Text classification assigns predefined categories to text documents, but it faces multiple challenges:

**High Dimensionality:

**Class Imbalance:

**Noise and Irrelevant Information:

**Ambiguity:

**Domain Adaptation:

Q31. How do attention mechanisms work in NLP?

Attention mechanisms in NLP allow models to focus on relevant parts of the input sequence when processing or generating text. Each word is assigned a weight based on its importance to other words in the sequence, enabling the model to capture context and long-range dependencies effectively.
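The weighting described above can be sketched as scaled dot-product attention for a single query. This is a minimal pure-Python illustration; the `keys`, `values` and `query` vectors are hypothetical, and real models compute them with learned projection matrices over whole batches.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Average of `values`, weighted by softmax(query . key / sqrt(d))."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(len(values[0]))]
    return weights, output

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, output = attention([1.0, 0.0], keys, values)
print(weights)
print(output)
```

Keys that align with the query receive larger weights, so their values dominate the output vector.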

Q32. What is the role of Layer Normalization in Transformer models?

Layer Normalization normalizes the activations of each token across the feature dimension to zero mean and unit variance, then applies a learned scale and shift. Because it does not depend on batch statistics, it stabilizes and accelerates training in deep neural networks, particularly Transformers.

Layer Normalization ensures stable and efficient training, improving convergence and model performance.
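A minimal sketch of the computation for a single feature vector, assuming the usual formulation with a small `eps` for numerical stability; `gamma` and `beta` stand for the learned scale and shift and default here to 1 and 0.

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one feature vector to zero mean / unit variance, then scale and shift."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    gamma = gamma or [1.0] * n
    beta = beta or [0.0] * n
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]

normalized = layer_norm([1.0, 2.0, 3.0, 4.0])
print(normalized)
```

The statistics are computed per vector, so the result does not change with batch size or with the other sequences in the batch.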

Q33. What is the role of context windows in NLP?

A context window is the set of words surrounding a target word that a model considers when interpreting its meaning. It defines the scope of context used to capture semantic and syntactic relationships.
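A symmetric window can be sketched as a generator of (target, context) pairs, similar in spirit to how training pairs are formed for Word2Vec-style models. The function name and the `window` parameter are assumptions made for the example.

```python
def context_windows(tokens, window=2):
    """Yield (target, context) pairs using `window` tokens on each side."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield target, left + right

tokens = "the quick brown fox jumps".split()
pairs = list(context_windows(tokens, window=1))
for target, context in pairs:
    print(target, context)
```

A larger `window` captures broader topical context, while a smaller one emphasizes the immediate syntactic neighbors.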

**Types:

Q34. What is zero-shot and few-shot learning in NLP?

Zero-shot learning in NLP refers to the ability of a model to perform a task without having seen any labeled examples of that task during training, relying solely on its pre-trained knowledge. **For example:** A sentiment analysis model trained on English being used to classify sentiments in Hindi without explicit Hindi training data.

Few-shot learning refers to the ability of a model to adapt to a task with only a small number of labeled examples, leveraging prior knowledge for generalization. **For example:** Fine-tuning a pre-trained model for intent classification with just a handful of labeled sentences.

Q35. Explain Cross-lingual Transfer Learning and its challenges.

Cross-lingual Transfer Learning is the process of using knowledge learned from a high-resource source language (e.g., English) to improve model performance in a low-resource target language (e.g., Swahili), enabling multilingual applications with limited labeled data.

**Challenges:

Q36. What is retrieval-augmented generation (RAG) in NLP?

Retrieval-Augmented Generation (RAG) is a hybrid NLP approach that combines retrieval-based methods with generative models to improve accuracy, factuality and knowledge coverage in text generation tasks.
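The retrieve-then-generate flow can be sketched without any model. In this toy version, `retrieve` ranks documents by word overlap (a stand-in for a real dense or sparse retriever) and `build_prompt` assembles the text that would be passed to a generator; both function names, the prompt layout and the two documents are hypothetical, and no generation step is performed.

```python
def retrieve(query, documents, k=1):
    """Rank documents by the number of words shared with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, documents, k=1):
    # The retrieved passages are placed ahead of the question as grounding context.
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

documents = [
    "BERT is an encoder-only Transformer trained with masked language modeling.",
    "GPT is a decoder-only Transformer trained to predict the next token.",
]
prompt = build_prompt("How is GPT trained?", documents)
print(prompt)
```

Because the generator sees the retrieved passage, its answer can be grounded in that text rather than only in what was memorized during training.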

**Applications:

RAG enhances generative models by anchoring responses in real-world data, making them more reliable and trustworthy.

Q37. How can knowledge graphs be integrated into NLP applications?

A knowledge graph (KG) is a structured representation of entities and their relationships. Integrating KGs into NLP allows models to use explicit symbolic knowledge alongside statistical learning for better reasoning and interpretability.

**Applications:

Q38. Describe how you would implement a chatbot using NLP techniques.

A chatbot is an AI system that simulates human conversation, often using Natural Language Processing (NLP) to understand user input and generate appropriate responses.

**Implementation Steps:

**1. Text Preprocessing

**2. Intent Recognition

**3. Entity Extraction

**4. Dialogue Management

**5. Response Generation

**6. Knowledge Integration

**Application:

Q39. What are machine translation approaches?

Machine Translation (MT) refers to the process of automatically converting text or speech from one natural language into another using computational methods. Over time, MT has evolved through three main paradigms: Rule-Based (RBMT), Statistical (SMT) and Neural Machine Translation (NMT). Each approach differs in how it models language, handles grammar and learns translation patterns.

**1. Rule-Based Machine Translation (RBMT):

RBMT is the earliest approach to MT, which relies on explicit linguistic rules and bilingual dictionaries crafted by experts. It uses knowledge of grammar, syntax and semantics of both the source and target languages to perform translation.

**2. Statistical Machine Translation (SMT):

SMT relies on probability and statistics derived from large bilingual corpora to generate translations. Instead of rules, it learns how words and phrases in one language map to another based on frequency and alignment.

**3. Neural Machine Translation (NMT):

NMT uses deep learning models, particularly sequence-to-sequence architectures with attention (and later Transformers), to perform translation. It represents words and sentences in continuous vector spaces (embeddings), enabling context-aware and fluent translations.

Q40. How can NLP be applied in recommendation systems, search engines and QA systems?

**1. NLP in Recommendation System

A recommendation system is an AI-based system that predicts and suggests items (such as products, movies or news articles) to users by analyzing their past interactions, preferences and available content. When NLP is applied, the system can also interpret textual content (e.g., item descriptions, user reviews) to make smarter and more personalized recommendations.

**How NLP is applied:

**2. NLP in Search Engines

A search engine is a system that retrieves and ranks relevant documents, web pages or content based on a user’s query. NLP improves search engines by enabling them to understand the meaning behind queries instead of just matching keywords.

**How NLP is applied:

**3. NLP in Question Answering (QA) Systems

A QA system is an NLP-powered application that provides direct answers to user queries expressed in natural language, instead of returning just a list of documents. Unlike search engines, QA systems aim to extract or generate exact responses from available knowledge.

**How NLP is applied:

Q41. What are the evaluation metrics in NLP?

Evaluation metrics in NLP vary by task and generally include:

**Classification Tasks:

**Machine Translation:

**Summarization:

**Semantic Similarity:

Q42. What is cosine similarity and Word Mover’s Distance (WMD) in semantic similarity?

Semantic similarity refers to measuring how closely two texts (words, sentences or documents) are related in meaning. Two popular approaches for this are cosine similarity and Word Mover’s Distance (WMD).

**Cosine Similarity:

Cosine similarity is a vector-based metric that measures the cosine of the angle between two vectors in high-dimensional space. It captures how similar the direction of two vectors is, regardless of their magnitude.

**Formula:

$$\text{Cosine Similarity} = \frac{A \cdot B}{\|A\| \times \|B\|}$$

where A and B are embedding vectors.
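The formula translates directly into a few lines of Python. The vectors below are hypothetical two-dimensional stand-ins for real embeddings, and the sketch assumes neither vector is all zeros.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))   # same direction, different length
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal vectors
```

Scaling a vector leaves the score unchanged, which is why cosine similarity is described as insensitive to magnitude.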

**Use cases:

**Word Mover’s Distance (WMD):

Word Mover’s Distance is a document-level distance metric that measures the minimum cumulative distance required to move words from one text to another, using their embeddings. It is based on the Earth Mover’s Distance (optimal transport theory).

**Advantages over cosine similarity:

**Example:

Cosine similarity may not capture the relation fully, but WMD shows high similarity because word embeddings align semantically.

Q43. What is pragmatic ambiguity?

Pragmatic ambiguity occurs when the meaning of an utterance depends on the context, situation or speaker intent, rather than the literal interpretation of the words themselves. It arises from how language is used in communication, not from grammatical or lexical ambiguity.

**Examples:

1. “Can you pass the salt?”

2. “I’ll meet you at the bank.”

Q44. What are Hugging Face Transformers and how are they used in NLP?

Hugging Face Transformers is an open-source library that provides pretrained Transformer-based models for a wide range of NLP tasks. It enables easy access to models like BERT, GPT, RoBERTa, T5, and DistilBERT, along with tools for training, fine-tuning, and deploying them efficiently.

**Key Features:

**Applications in NLP:

Q45. Apply a full text preprocessing pipeline.

Text preprocessing is the process of cleaning and transforming raw text into a structured format suitable for NLP tasks. It helps remove noise, standardize the input and prepare features for downstream models. Using NLTK (Natural Language Toolkit), we can implement a complete preprocessing pipeline.

**1. Import Necessary Libraries

We will be importing nltk, re, string and inflect.

```python
import nltk
import string
import re
import inflect
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer
```

**2. Convert to Lowercase

We convert the text to lowercase to reduce the size of the vocabulary of our text data.

```python
def text_lowercase(text):
    return text.lower()

input_str = "Hey, did you know that the summer break is coming? Amazing right !! It's only 5 more days !!"
print(text_lowercase(input_str))
```

**Output:

hey, did you know that the summer break is coming? amazing right !! it's only 5 more days !!

**3. Removing Numbers

We can either remove numbers or convert the numbers into their textual representations. To remove the numbers we can use regular expressions.

```python
def remove_numbers(text):
    # Delete every run of digits; surrounding spaces and punctuation are left as they are.
    return re.sub(r'\d+', '', text)

input_str = "There are 3 balls in this bag, and 12 in the other one."
print(remove_numbers(input_str))
```

**Output:

There are  balls in this bag, and  in the other one.

**4. Converting Numerical Values

We can also convert the numbers into words. This can be done by using the inflect library.

```python
p = inflect.engine()

def convert_number(text):
    temp_str = text.split()
    new_string = []
    for word in temp_str:
        # Replace standalone digit tokens with their spelled-out form.
        if word.isdigit():
            new_string.append(p.number_to_words(word))
        else:
            new_string.append(word)
    return ' '.join(new_string)

input_str = "There are 3 balls in this bag, and 12 in the other one."
print(convert_number(input_str))
```

**Output:

There are three balls in this bag, and twelve in the other one.

**5. Removing Punctuation

We remove punctuation so that the same word does not appear in several forms. For example, if punctuation is kept, "been.", "been," and "been!" are treated as separate tokens.

```python
def remove_punctuation(text):
    # Map every character in string.punctuation to None.
    translator = str.maketrans('', '', string.punctuation)
    return text.translate(translator)

input_str = "Hey, did you know that the summer break is coming? Amazing right !! It's only 5 more days !!"
print(remove_punctuation(input_str))
```

**Output:

Hey did you know that the summer break is coming Amazing right Its only 5 more days

**6. Removing Whitespace

We can use the join and split functions to remove all the white spaces in a string.

```python
def remove_whitespace(text):
    # split() drops leading, trailing and repeated whitespace; join() re-inserts single spaces.
    return " ".join(text.split())

input_str = "we    don't   need   the given questions"
print(remove_whitespace(input_str))
```

**Output:

we don't need the given questions

**7. Removing Stopwords

Stopwords are words that do not contribute much to the meaning of a sentence, so they can be removed.

```python
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('punkt_tab')

def remove_stopwords(text):
    stop_words = set(stopwords.words("english"))
    word_tokens = word_tokenize(text)
    # Keep only tokens whose lowercase form is not a stopword.
    filtered_text = [word for word in word_tokens if word.lower() not in stop_words]
    return filtered_text

example_text = "This is a sample sentence and we are going to remove the stopwords from this."
print(remove_stopwords(example_text))
```

**Output:

['sample', 'sentence', 'going', 'remove', 'stopwords', '.']

**8. Applying Stemming

Stemming is the process of getting the root form of a word. Stem or root is the part to which affixes like -ed, -ize, -de, -s, etc are added. The stem of a word is created by removing the prefix or suffix of a word.

```python
stemmer = PorterStemmer()

def stem_words(text):
    word_tokens = word_tokenize(text)
    stems = [stemmer.stem(word) for word in word_tokens]
    return stems

text = "data science uses scientific methods algorithms and many types of processes"
print(stem_words(text))
```

**Output:

['data', 'scienc', 'use', 'scientif', 'method', 'algorithm', 'and', 'mani', 'type', 'of', 'process']

**9. Applying Lemmatization

Lemmatization is an NLP technique that reduces a word to its dictionary form (lemma). This can be helpful for tasks such as text analysis and search, as it allows us to compare words that are related but have different forms.

```python
nltk.download('wordnet')
lemmatizer = WordNetLemmatizer()

def lemma_words(text):
    word_tokens = word_tokenize(text)
    # Without a POS argument, every token is lemmatized as a noun.
    lemmas = [lemmatizer.lemmatize(word) for word in word_tokens]
    return lemmas

input_str = "data science uses scientific methods algorithms and many types of processes"
print(lemma_words(input_str))
```

**Output:

['data', 'science', 'us', 'scientific', 'method', 'algorithm', 'and', 'many', 'type', 'of', 'process']

**10. POS Tagging

POS tagging is the process of assigning each word in a sentence its grammatical category, such as noun, verb, adjective or adverb.

```python
import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Optional: also search a system-wide NLTK data directory.
nltk_data_dir = '/usr/local/share/nltk_data'
if nltk_data_dir not in nltk.data.path:
    nltk.data.path.append(nltk_data_dir)

nltk.download('averaged_perceptron_tagger_eng')

def pos_tagging(text):
    word_tokens = word_tokenize(text)
    return pos_tag(word_tokens)

input_str = "Data science combines statistics, programming, and machine learning."
print(pos_tagging(input_str))
```

**Output:

[('Data', 'NNP'), ('science', 'NN'), ('combines', 'NNS'), ('statistics', 'NNS'), (',', ','), ('programming', 'NN'), (',', ','), ('and', 'CC'), ('machine', 'NN'), ('learning', 'NN'), ('.', '.')]

**Where,