Inverted Index

Last Updated : 18 Apr, 2026

An Inverted Index is a data structure used in information retrieval systems to efficiently retrieve documents or web pages containing a specific term or set of terms. In an inverted index, the index is organized by terms (words), and each term points to a list of documents or web pages that contain that term.

**Note:** Inverted indexes are widely used in search engines, database systems, and other applications where efficient text search is required.

An inverted index is an index data structure storing a mapping from content, such as words or numbers, to its locations in a document or a set of documents. In simple words, it is a hashmap-like data structure that directs you from a word to the documents or web pages that contain it. Inverted indexes are especially useful for large collections of documents, where scanning every document for each query would be prohibitively slow.

Features of Inverted Indexes

Example: Consider the following documents.

**Document 1:** The quick brown fox jumped over the lazy dog.
**Document 2:** The lazy dog slept in the sun.

To create an inverted index for these documents, we first tokenize each document into terms (words), ignoring case and punctuation.

Next, we create an index of the terms, where each term points to a list of documents that contain that term, as follows.

The -> Document 1, Document 2
Quick -> Document 1
Brown -> Document 1
Fox -> Document 1
Jumped -> Document 1
Over -> Document 1
Lazy -> Document 1, Document 2
Dog -> Document 1, Document 2
Slept -> Document 2
In -> Document 2
Sun -> Document 2

To search for documents containing a particular term or set of terms, the search engine queries the inverted index for those terms and retrieves the list of documents associated with each term. The search engine can then use this information to rank the documents based on relevance to the query and present them to the user in order of importance.
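As a rough illustration of the lookup step, the sketch below answers a multi-term query by intersecting the document lists of the query terms. The index literal mirrors the example above with lowercased terms; the function name `search` and the AND semantics (every term must match) are assumptions made for illustration, and ranking is left out.

```python
# Inverted index from the example above, with lowercased terms.
# Document IDs 1 and 2 stand for Document 1 and Document 2.
inverted_index = {
    "the": {1, 2},
    "quick": {1},
    "brown": {1},
    "fox": {1},
    "jumped": {1},
    "over": {1},
    "lazy": {1, 2},
    "dog": {1, 2},
    "slept": {2},
    "in": {2},
    "sun": {2},
}


def search(index, query):
    """Return the sorted IDs of documents containing every query term (AND)."""
    terms = query.lower().split()
    if not terms:
        return []
    # Look up each term's document set; unknown terms match nothing.
    postings = [index.get(term, set()) for term in terms]
    # Intersect the sets, starting from the smallest one.
    postings.sort(key=len)
    result = set(postings[0])
    for docs in postings[1:]:
        result &= docs
    return sorted(result)


print(search(inverted_index, "lazy dog"))   # [1, 2]
print(search(inverted_index, "lazy fox"))   # [1]
print(search(inverted_index, "quick sun"))  # []
```

Only the lists for the query terms are touched, so the cost depends on the length of those lists rather than on the total size of the collection.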

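There are two types of inverted indexes: a record-level inverted index, which stores for each term the list of documents that contain it (as in the example above), and a word-level inverted index, which additionally stores the position of each term within each document.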

Suppose we want to search the texts "hello everyone", "this article is based on an inverted index", and "which is hashmap-like data structure". If we index by (text, word within the text), the index with locations in the text is:

hello (1, 1)
everyone (1, 2)
this (2, 1)
article (2, 2)
is (2, 3); (3, 2)
based (2, 4)
on (2, 5)
an (2, 6)
inverted (2, 7)
index (2, 8)
which (3, 1)
hashmap (3, 3)
like (3, 4)
data (3, 5)
structure (3, 6)

The word "hello" is in document 1 ("hello everyone") starting at word 1, so it has the entry (1, 1). The word "is" appears in documents 2 and 3, at the 3rd and 2nd word positions respectively, so it has the entries (2, 3) and (3, 2). Positions here are counted in words, not characters.
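A minimal sketch of how such (text, word position) entries could be computed is shown below. The function name `build_positional_index` is hypothetical, and the tokenizer is an assumption: it lowercases the text and splits on non-word characters, so "hashmap-like" becomes the two words "hashmap" and "like".

```python
import re
from collections import defaultdict

texts = [
    "hello everyone",
    "this article is based on an inverted index",
    "which is hashmap-like data structure",
]


def build_positional_index(texts):
    """Map each word to a list of (text number, word position) pairs, both 1-based."""
    index = defaultdict(list)
    for text_no, text in enumerate(texts, start=1):
        # Assumed tokenizer: lowercase, then keep runs of word characters.
        words = re.findall(r"\w+", text.lower())
        for word_no, word in enumerate(words, start=1):
            index[word].append((text_no, word_no))
    return dict(index)


positional_index = build_positional_index(texts)
print(positional_index["hello"])  # [(1, 1)]
print(positional_index["is"])     # [(2, 3), (3, 2)]
```

Storing positions makes the index larger than a plain term-to-document mapping, but it is what allows phrase and proximity queries to be answered from the index alone.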

**Note:** The index may also store weights, frequencies, or other indicators alongside each entry.
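One common indicator of this kind is the term frequency, that is, how often a term occurs in each document. The sketch below is a minimal illustration using the two sample documents from the first example; the function name `build_frequency_index` and the nested-dictionary layout are assumptions, not a prescribed format.

```python
import re
from collections import Counter, defaultdict

documents = {
    1: "The quick brown fox jumped over the lazy dog.",
    2: "The lazy dog slept in the sun.",
}


def build_frequency_index(documents):
    """Map each term to {document ID: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        # Count every lowercased word in this document.
        counts = Counter(re.findall(r"\w+", text.lower()))
        for term, count in counts.items():
            index[term][doc_id] = count
    return dict(index)


frequency_index = build_frequency_index(documents)
print(frequency_index["the"])  # {1: 2, 2: 2}
print(frequency_index["fox"])  # {1: 1}
```

A ranking function could use these counts as one signal when ordering the matching documents.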

Steps to Build an Inverted Index

Example:

| Words | Document   |
|-------|------------|
| ant   | doc1       |
| demo  | doc2       |
| world | doc1, doc2 |

Implementing Inverted Index

```python
import re

# Define the documents
document1 = "The quick brown fox jumped over the lazy dog."
document2 = "The lazy dog slept in the sun."

# Step 1: Tokenize the documents
# Convert each document to lowercase and extract the words,
# dropping punctuation so that "dog." and "dog" are the same term
tokens1 = re.findall(r"\w+", document1.lower())
tokens2 = re.findall(r"\w+", document2.lower())

# Combine the tokens into a sorted list of unique terms
terms = sorted(set(tokens1 + tokens2))

# Step 2: Build the inverted index
# Create an empty dictionary to store the inverted index
inverted_index = {}

# For each term, find the documents that contain it
for term in terms:
    documents = []
    if term in tokens1:
        documents.append("Document 1")
    if term in tokens2:
        documents.append("Document 2")
    inverted_index[term] = documents

# Step 3: Print the inverted index
for term, documents in inverted_index.items():
    print(term, "->", ", ".join(documents))
```

Explanation of the Above Code

The first two assignments define two sample documents to be used as input to the algorithm.

**Step 1:** Tokenize the input documents by converting them to lowercase and extracting the words with a regular expression, which drops punctuation so that "dog." and "dog" count as the same term. Then combine the tokens from both documents into a single sorted list of unique terms.

**Step 2:** Create an empty dictionary to store the inverted index, and then iterate through each term in the list of unique terms. For each term, create an empty list of documents and check whether the term appears in each input document. If the term appears in a document, add that document to the list. Finally, store the list in the inverted index dictionary under the current term.

**Step 3:** Iterate through the entries in the inverted index dictionary and print each term along with the list of documents that contain it.

Output

```
brown -> Document 1
dog -> Document 1, Document 2
fox -> Document 1
in -> Document 2
jumped -> Document 1
lazy -> Document 1, Document 2
over -> Document 1
quick -> Document 1
slept -> Document 2
sun -> Document 2
the -> Document 1, Document 2
```
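The listing above hard-codes two documents. As one possible generalization, the sketch below builds the index in a single pass over any number of documents, tokenizing with a regular expression so that trailing punctuation is not kept as part of a term. The function name `build_inverted_index`, the dictionary keyed by document name, and the third sample document are illustrative assumptions.

```python
import re
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to the sorted list of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        # Visit every word once and record which document it came from.
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(name)
    # Sort terms and document names so the result is deterministic.
    return {term: sorted(names) for term, names in sorted(index.items())}


documents = {
    "Document 1": "The quick brown fox jumped over the lazy dog.",
    "Document 2": "The lazy dog slept in the sun.",
    "Document 3": "A quick dog.",  # hypothetical extra document
}

index = build_inverted_index(documents)
print(index["dog"])    # ['Document 1', 'Document 2', 'Document 3']
print(index["quick"])  # ['Document 1', 'Document 3']
```

Collecting document names in a set avoids duplicate entries when a term occurs several times in the same document, and each document is scanned only once regardless of how many distinct terms exist.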

Advantages

Disadvantages

Related article: Difference between Inverted and Forward Index