Build a Knowledge Graph in NLP

Last Updated : 23 Jul, 2025

A **knowledge graph** is a structured representation of knowledge that captures entities and the relationships between them in a way that allows machines to understand and reason about information. In **natural language processing** (NLP), the concept has gained prominence in recent years with the rise of semantic web technologies and advances in machine learning. Knowledge graphs in NLP model real-world entities and the relationships between them, providing a contextual understanding of information extracted from text data. This enables more nuanced language understanding and makes knowledge graphs a valuable tool for many NLP applications. In this article, we discuss what knowledge graphs are and walk through a simple implementation.

What is a knowledge graph?

A knowledge graph is a graph-based knowledge representation in which entities are nodes and the relationships between them are labeled edges. Such graphs are useful because they can be integrated with natural language processing models for tasks like question answering, summarization, and context-aware language understanding.
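To make the "entities connected through relationships" idea concrete, here is a minimal sketch in plain Python that stores a knowledge graph as a set of (subject, relation, object) triples, using the same four facts as the example dataset later in this article. The `query` helper is hypothetical and only illustrates pattern matching with wildcards; it is not part of any library.

```python
# A knowledge graph stored as (subject, relation, object) triples,
# using the facts from the article's example dataset.
triples = {
    ("Sandeep Jain", "founded", "GeeksforGeeks"),
    ("GeeksforGeeks", "known as", "GFG"),
    ("GeeksforGeeks", "is", "website"),
    ("Authors", "write for", "GFG"),
}


def query(subject=None, relation=None, obj=None):
    """Hypothetical helper: return matching triples in sorted order.

    Any argument left as None acts as a wildcard.
    """
    return sorted(
        (s, r, o)
        for s, r, o in triples
        if (subject is None or s == subject)
        and (relation is None or r == relation)
        and (obj is None or o == obj)
    )


# "What do we know about GeeksforGeeks?"
print(query(subject="GeeksforGeeks"))
# [('GeeksforGeeks', 'is', 'website'), ('GeeksforGeeks', 'known as', 'GFG')]

# "Which facts point at GFG?"
print(query(obj="GFG"))
# [('Authors', 'write for', 'GFG'), ('GeeksforGeeks', 'known as', 'GFG')]
```

Answering a simple question then reduces to matching a triple pattern, which is one reason knowledge graphs pair well with question-answering systems.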

Key steps in building a knowledge graph

Generating a knowledge graph involves several steps:

  1. **Data Acquisition:** Gather relevant textual data from diverse sources, such as books, articles, websites, or domain-specific documents.
  2. **Entity Recognition:** Use NLP techniques to identify entities (e.g., people, organizations, locations) within the text. Named Entity Recognition (NER) models are commonly used for this step.
  3. **Relation Extraction:** Determine the relationships between the identified entities. This can involve parsing the syntactic and semantic structure of sentences to extract meaningful connections.
  4. **Graph Construction:** Build a graph structure in which entities are nodes and relationships are edges, organizing the extracted information into a coherent representation. In advanced cases, the graph can be enriched with additional information such as entity attributes, sentiment, or contextual details derived from the text, but these enrichments are complex, time-consuming, and costly.
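The four steps above can be illustrated end to end with a deliberately naive sketch in plain Python. Step 1 is just a hard-coded list of the article's sentences. Steps 2 and 3 are replaced by a toy rule: a hypothetical, hand-written list of relation phrases (`RELATION_PHRASES`), with the text on either side of a matched phrase treated as the two entities. This is only a stand-in, not real NER or a trained relation extractor. Step 4 collects the triples into an adjacency list.

```python
# Toy pipeline: sentences -> (subject, relation, object) triples -> adjacency list.
# ASSUMPTION: RELATION_PHRASES is a hypothetical hand-written list; the text on
# either side of a phrase is treated as an entity. Real systems use NER and a
# trained relation extractor instead.
RELATION_PHRASES = ["is also known as", "founded", "write for", "is a"]


def extract_triple(sentence):
    """Steps 2-3 (toy version): return (subject, relation, object) or None."""
    text = sentence.strip().rstrip(".")
    # Try longer phrases first so "is also known as" wins over shorter matches.
    for phrase in sorted(RELATION_PHRASES, key=len, reverse=True):
        marker = f" {phrase} "
        if marker in text:
            subject, obj = text.split(marker, 1)
            return subject.strip(), phrase, obj.strip()
    return None


def build_graph(sentences):
    """Step 4: map each subject to a list of (relation, object) edges."""
    graph = {}
    for sentence in sentences:
        triple = extract_triple(sentence)
        if triple is None:
            continue  # no known relation phrase, so the sentence is skipped
        subject, relation, obj = triple
        graph.setdefault(subject, []).append((relation, obj))
    return graph


# Step 1: the "acquired" data is simply the article's four sentences.
sentences = [
    "Sandeep Jain founded GeeksforGeeks.",
    "GeeksforGeeks is also known as GFG.",
    "GeeksforGeeks is a website.",
    "Authors write for GFG.",
]
graph = build_graph(sentences)
for subject, edges in graph.items():
    print(subject, edges)
```

Note that the extracted labels ("is also known as", "is a") differ from the hand-written ones used in the dataset later in the article ("known as", "is"); normalizing relation labels like this is part of what a real extraction pipeline still has to handle.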

What are the benefits of building a knowledge graph?

Some of the key benefits of knowledge graphs are as follows:

Knowledge Graph step-by-step implementation

Importing required modules

First, we import the required Python modules: pandas, NetworkX, Matplotlib and NLTK.

```python
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
from nltk import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
import nltk
```

Downloading NLTK resources

As discussed earlier, generating a knowledge graph requires several NLP processing steps, so we need to download a few extra NLTK resources that are used to preprocess the sentence text: the Punkt tokenizer models, the stop-word list and the WordNet database.

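```python
# Download NLTK resources
nltk.download('punkt')      # tokenizer models used by word_tokenize
nltk.download('punkt_tab')  # also required by word_tokenize in newer NLTK releases (3.9+)
nltk.download('stopwords')  # English stop-word list
nltk.download('wordnet')    # lexical database used by WordNetLemmatizer
```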

**Output:**

```
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
```

Dataset loading

For this implementation, we use a small synthetic dataset to keep the visualization simple. Each row contains a sentence together with its source entity, target entity and relation, labeled by hand. We then initialize the WordNet lemmatizer and preprocess the sentences with a small function (preprocess_text).

```python
# Create a small custom dataset with sentences
data = {
    'sentence': ["Sandeep Jain founded GeeksforGeeks.",
                 "GeeksforGeeks is also known as GFG.",
                 "GeeksforGeeks is a website.",
                 "Authors write for GFG."],
    'source': ["Sandeep Jain", "GeeksforGeeks", "GeeksforGeeks", "Authors"],
    'target': ["GeeksforGeeks", "GFG", "website", "GFG"],
    'relation': ["founded", "known as", "is", "write for"],
}

df = pd.DataFrame(data)
print(df)
```

**Output:**

```
                              sentence         source         target  \
0  Sandeep Jain founded GeeksforGeeks.   Sandeep Jain  GeeksforGeeks
1  GeeksforGeeks is also known as GFG.  GeeksforGeeks            GFG
2          GeeksforGeeks is a website.  GeeksforGeeks        website
3               Authors write for GFG.        Authors            GFG

    relation
0    founded
1   known as
2         is
3  write for
```

Data pre-processing

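The preprocess_text function tokenizes each sentence, drops punctuation and English stop words, lowercases the remaining tokens and lemmatizes them with WordNet. The result is stored in a new processed_sentence column for inspection; the graph built below still uses the hand-labeled source, target and relation columns.

```python
# NLP preprocessing
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def preprocess_text(text):
    # Keep alphanumeric, non-stop-word tokens, lowercased and lemmatized
    words = [lemmatizer.lemmatize(word.lower())
             for word in word_tokenize(text)
             if word.isalnum() and word.lower() not in stop_words]
    return ' '.join(words)

# Apply preprocessing to sentences in the dataframe
df['processed_sentence'] = df['sentence'].apply(preprocess_text)
print(df)
```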

**Output:**

```
                              sentence         source         target  \
0  Sandeep Jain founded GeeksforGeeks.   Sandeep Jain  GeeksforGeeks
1  GeeksforGeeks is also known as GFG.  GeeksforGeeks            GFG
2          GeeksforGeeks is a website.  GeeksforGeeks        website
3               Authors write for GFG.        Authors            GFG

    relation                  processed_sentence
0    founded  sandeep jain founded geeksforgeeks
1   known as        geeksforgeeks also known gfg
2         is               geeksforgeeks website
3  write for                    author write gfg
```

Adding knowledge graph edges

Next, we loop over the rows of the dataset and read the subject (source), object (target) and relation of each sentence. In this example they come from the predefined columns rather than being extracted automatically. This step is central to the implementation: each subject and object becomes a node of the graph, and each relation becomes a labeled, directed edge between them.

```python
# Initialize a directed graph
G = nx.DiGraph()

# Add edges to the graph based on predefined source, target and relations
for _, row in df.iterrows():
    source = row['source']
    target = row['target']
    relation = row['relation']

    # add_edge() would also create missing nodes; explicit calls kept for clarity
    G.add_node(source)
    G.add_node(target)
    G.add_edge(source, target, relation=relation)
```
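Once the edges are in place, simple reasoning over the graph amounts to following edges. As a minimal sketch (plain Python, with the four edges copied from the dataset so it runs on its own), a breadth-first search along edge direction recovers the chain of facts linking two entities. `find_path` and `adjacency` are hypothetical names used only for this illustration.

```python
from collections import deque

# The article's edges as (source, relation, target) triples.
edges = [
    ("Sandeep Jain", "founded", "GeeksforGeeks"),
    ("GeeksforGeeks", "known as", "GFG"),
    ("GeeksforGeeks", "is", "website"),
    ("Authors", "write for", "GFG"),
]

# Adjacency list: node -> list of (relation, neighbor) for outgoing edges.
adjacency = {}
for source, relation, target in edges:
    adjacency.setdefault(source, []).append((relation, target))


def find_path(start, goal):
    """Breadth-first search following edge direction.

    Returns the list of triples on a shortest path, or None if goal is unreachable.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


print(find_path("Sandeep Jain", "GFG"))
# [('Sandeep Jain', 'founded', 'GeeksforGeeks'), ('GeeksforGeeks', 'known as', 'GFG')]
print(find_path("GFG", "Sandeep Jain"))
# None  (edges are directed, and GFG has no outgoing edges)
```

On the article's `G`, NetworkX offers the same kind of query directly: `nx.shortest_path(G, "Sandeep Jain", "GFG")` returns the route as a list of nodes, and each edge label can be read back with `G.edges[u, v]["relation"]`.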

Visualizing the knowledge graph

We now have the nodes and edges of our knowledge graph, so the remaining step is to draw it. To make the graph easier to read, we color nodes by their degree, i.e., the number of edges connected to a node (incoming plus outgoing for a directed graph): the most connected node is drawn in light green and all other nodes in light blue.

```python
# Visualize the knowledge graph with colored nodes

# Calculate node degrees
node_degrees = dict(G.degree)

# Assign colors based on node degrees: highlight the most connected node(s)
max_degree = max(node_degrees.values())
node_colors = ['lightgreen' if degree == max_degree else 'lightblue'
               for degree in node_degrees.values()]

# Adjust the layout for better spacing
pos = nx.spring_layout(G, seed=42, k=1.5)

labels = nx.get_edge_attributes(G, 'relation')
nx.draw(G, pos, with_labels=True, font_weight='bold', node_size=700,
        node_color=node_colors, font_size=8, arrowsize=10)
nx.draw_networkx_edge_labels(G, pos, edge_labels=labels, font_size=8)
plt.show()
```

**Output:**

The generated knowledge graph
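The coloring rule in the visualization code can be checked by hand. This sketch recomputes node degrees in plain Python from the four edges of the example (copied here so it runs on its own). For a directed graph, NetworkX's `G.degree` counts incoming plus outgoing edges, which is what the loop below does.

```python
# The article's four directed edges (relation labels are not needed for degree).
edges = [
    ("Sandeep Jain", "GeeksforGeeks"),
    ("GeeksforGeeks", "GFG"),
    ("GeeksforGeeks", "website"),
    ("Authors", "GFG"),
]

# Degree = number of outgoing edges + number of incoming edges.
degree = {}
for source, target in edges:
    degree[source] = degree.get(source, 0) + 1  # outgoing edge of source
    degree[target] = degree.get(target, 0) + 1  # incoming edge of target

# Same rule as the visualization: highlight node(s) with the maximum degree.
max_degree = max(degree.values())
colors = {node: "lightgreen" if d == max_degree else "lightblue"
          for node, d in degree.items()}

print(degree)
# {'Sandeep Jain': 1, 'GeeksforGeeks': 3, 'GFG': 2, 'website': 1, 'Authors': 1}
print(colors["GeeksforGeeks"], colors["GFG"])
# lightgreen lightblue
```

GeeksforGeeks has one incoming and two outgoing edges, so it is the only node drawn in light green. Because the rule compares against the maximum with `==`, several nodes would be highlighted if they tied for the highest degree.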

Conclusion

In conclusion, building a knowledge graph in NLP involves several steps, but Python libraries for NLP and graph processing make them easier to carry out, and the resulting graphs are valuable for many real-time applications. Working with knowledge graphs also brings challenges, including data integration, maintaining quality and accuracy, scalability and storage, and semantic heterogeneity.
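Semantic heterogeneity shows up even in this four-sentence example: "GeeksforGeeks" and "GFG" are separate nodes for the same entity, so the fact that authors write for the site is attached to the alias node GFG rather than to GeeksforGeeks itself. The sketch below shows one simple way to merge such aliases. It assumes that every `known as` edge declares a synonym and that the subject of that edge is the preferred name; both are assumptions of this sketch, not rules from the article, and `canonical` and `canon` are hypothetical names.

```python
# The article's facts as (subject, relation, object) triples.
triples = [
    ("Sandeep Jain", "founded", "GeeksforGeeks"),
    ("GeeksforGeeks", "known as", "GFG"),
    ("GeeksforGeeks", "is", "website"),
    ("Authors", "write for", "GFG"),
]

# ASSUMPTION: "X known as Y" means Y is an alias of the preferred name X.
canonical = {alias: name for name, rel, alias in triples if rel == "known as"}


def canon(entity):
    """Follow alias links to a preferred name; the seen-set guards against cycles."""
    seen = set()
    while entity in canonical and entity not in seen:
        seen.add(entity)
        entity = canonical[entity]
    return entity


# Rewrite every fact with canonical names; alias edges live in `canonical` instead.
merged = sorted({(canon(s), r, canon(o)) for s, r, o in triples if r != "known as"})
for triple in merged:
    print(triple)
# ('Authors', 'write for', 'GeeksforGeeks')
# ('GeeksforGeeks', 'is', 'website')
# ('Sandeep Jain', 'founded', 'GeeksforGeeks')
```

After merging, all three remaining facts point at a single GeeksforGeeks node. Real systems usually need far richer entity-resolution signals than one relation label, which is part of why data integration is listed as a challenge.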

Knowledge graph embedding methods aim to represent entities and relationships in continuous vector spaces, which can capture semantic relationships more clearly. In the future, knowledge graphs may also evolve dynamically to adapt to real-time changes, enabling systems to stay current and responsive in dynamic environments.