BeautifulSoup4 Module Python

Last Updated : 23 Jul, 2025

BeautifulSoup4 is a user-friendly Python library for parsing HTML and XML documents. It simplifies web scraping by letting developers navigate, search and modify the parse tree of a webpage. With BeautifulSoup4, we can extract specific elements, attributes and text from complex web pages using intuitive methods. The library abstracts away much of the complexity of HTML and XML structures, so we can focus on retrieving and processing the data we need. BeautifulSoup4 supports multiple parsers (Python's built-in html.parser, lxml and html5lib), giving us the flexibility to choose the best tool for the task. It is useful for gathering data for research, automating data extraction and building web applications.

**For example:**

```python
from bs4 import BeautifulSoup

html_doc = """
<html>
  <head>
    <title>Test Page</title>
  </head>
  <body>
    <p>Hello, BeautifulSoup!</p>
  </body>
</html>
"""

# Parsing the HTML content
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.title)
```

**Output:**

`<title>Test Page</title>`

**Explanation:**

- BeautifulSoup() constructor parses the provided HTML content.
- Accessing soup.title retrieves the `<title>` tag from the HTML.

Table of Content

- Syntax of BeautifulSoup4
- Parsing HTML with BeautifulSoup4
- Extracting Data with BeautifulSoup4
- Navigating the Parse Tree with BeautifulSoup4
- Using CSS Selectors with BeautifulSoup4

## Importing BeautifulSoup4

> from bs4 import BeautifulSoup<br>soup = BeautifulSoup(html_doc, 'html.parser')

**Parameters:**

- html_doc is a string containing the HTML or XML content to be parsed.
- 'html.parser' is the parser to use. (Alternatives include 'lxml' or 'html5lib'.)

**Return Type:** Returns a BeautifulSoup object that represents the parsed document.

## Parsing HTML with BeautifulSoup4

BeautifulSoup4 converts raw HTML content into a navigable parse tree.

```python
from bs4 import BeautifulSoup

html_doc = """
<html>
  <head>
    <title>Page Title</title>
  </head>
  <body>
    <h1>Welcome to BeautifulSoup4</h1>
    <p>This is a sample page.</p>
  </body>
</html>
"""

# Parsing the HTML content
soup = BeautifulSoup(html_doc, 'html.parser')

# Finding the first <h1> tag
header = soup.find('h1')
print(header.text)
```

**Output:**

Welcome to BeautifulSoup4

**Explanation:**

- find() method searches for the first `<h1>` tag in the document.
- Printing header.text outputs the text content of the `<h1>` tag.
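
One detail worth illustrating is what find() does when nothing matches: it returns None, so calling .text on the result would raise an AttributeError. The sketch below is illustrative only; the `id="main-title"` attribute and the variable names are assumptions added for the example, not part of the sample page above.

```python
from bs4 import BeautifulSoup

# Sample markup; the id attribute is a hypothetical addition for this sketch
html_doc = """
<html>
  <body>
    <h1 id="main-title">Welcome to BeautifulSoup4</h1>
    <p>This is a sample page.</p>
  </body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# find() returns None when no tag matches, so check before using the result
missing = soup.find('h2')
missing_text = missing.text if missing is not None else "(no <h2> found)"
print(missing_text)  # (no <h2> found)

# find() also accepts attribute filters as keyword arguments
title = soup.find('h1', id='main-title')
print(title.text)  # Welcome to BeautifulSoup4
```

Checking for None before reading .text is usually the safer pattern when scraping pages whose structure may change.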

## Extracting Data with BeautifulSoup4

BeautifulSoup4 offers methods like find_all() to extract multiple elements from an HTML document.

```python
from bs4 import BeautifulSoup

html_doc = """
<html>
  <head>
    <title>List Example</title>
  </head>
  <body>
    <ul>
      <li>Item 1</li>
      <li>Item 2</li>
      <li>Item 3</li>
    </ul>
  </body>
</html>
"""

# Parsing the HTML content
soup = BeautifulSoup(html_doc, 'html.parser')

# Finding all <li> tags
items = soup.find_all('li')
for item in items:
    print(item.text)
```

**Output:**

Item 1
Item 2
Item 3

**Explanation:**

- find_all() method retrieves all `<li>` elements.
- Iterating through the returned list prints the text of each list item.
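
As a small extension of the idea above, find_all() can also filter by class, cap the number of results and be combined with attribute access. The following is a minimal sketch; the markup, class names and URLs are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for this sketch
html_doc = """
<ul>
  <li class="fruit"><a href="/apple">Apple</a></li>
  <li class="fruit"><a href="/banana">Banana</a></li>
  <li class="vegetable"><a href="/carrot">Carrot</a></li>
</ul>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# class_ (with a trailing underscore) filters by CSS class
fruits = soup.find_all('li', class_='fruit')
fruit_names = [li.get_text(strip=True) for li in fruits]
print(fruit_names)  # ['Apple', 'Banana']

# .get() reads an attribute and returns None if it is absent
links = [a.get('href') for a in soup.find_all('a')]
print(links)  # ['/apple', '/banana', '/carrot']

# limit stops the search after the given number of matches
first_two = soup.find_all('li', limit=2)
print(len(first_two))  # 2
```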

## Navigating the Parse Tree with BeautifulSoup4

Beyond simple extraction, BeautifulSoup4 allows you to traverse the document structure using attributes like .parent, .children, .next_sibling and .previous_sibling.

```python
from bs4 import BeautifulSoup

html_doc = """
<html>
<div class="container">
<h1>Title</h1>
<p>Paragraph content</p>
</div>
</html>
"""

# Parsing the HTML content
soup = BeautifulSoup(html_doc, 'html.parser')

# Accessing the container and navigating to its parent
container = soup.find('div', class_='container')
print("Parent tag:", container.parent.name)
```

**Output:**

Parent tag: html

**Explanation:** The .parent attribute returns the immediate parent of the found tag, allowing you to traverse upwards in the DOM tree.
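
Moving down and sideways works in a similar way. The sketch below assumes a small document in which a div wraps a heading and a paragraph; it shows .children, find_next_sibling() and .parents. Note that .children also yields the whitespace text nodes between tags, which is why the example filters for Tag objects.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Assumed markup for this sketch
html_doc = """
<html>
<div class="container">
<h1>Title</h1>
<p>Paragraph content</p>
</div>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
container = soup.find('div', class_='container')

# .children includes whitespace text nodes, so keep only Tag objects
child_names = [child.name for child in container.children if isinstance(child, Tag)]
print(child_names)  # ['h1', 'p']

# find_next_sibling() skips text nodes and returns the next tag at the same level
heading = container.find('h1')
sibling = heading.find_next_sibling()
print(sibling.name)  # p

# .parents walks upwards from the immediate parent towards the document root
ancestors = [parent.name for parent in heading.parents]
print(ancestors[:2])  # ['div', 'html']
```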

## Using CSS Selectors with BeautifulSoup4

The select() method lets you search for elements using CSS selector syntax.

```python
from bs4 import BeautifulSoup

html_doc = """
<html>
  <head>
    <title>CSS Selector Example</title>
  </head>
  <body>
    <div id="main">
      <p class="info">Info Paragraph 1</p>
      <p class="info">Info Paragraph 2</p>
    </div>
  </body>
</html>
"""

# Parsing the HTML content
soup = BeautifulSoup(html_doc, 'html.parser')

# Using a CSS selector to find all <p> tags with class "info" inside the div with id "main"
elements = soup.select('div#main p.info')
for element in elements:
    print(element.get_text())
```

**Output:**

Info Paragraph 1
Info Paragraph 2

**Explanation:**

- CSS selector 'div#main p.info' locates all `<p>` tags with class "info" that are descendants of the `<div>` with id "main".
- select() method returns a list of matching elements.
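
When only the first match is needed, select_one() can be used in place of select(). The sketch below also shows an attribute selector; the link and its URL are hypothetical additions to the sample markup.

```python
from bs4 import BeautifulSoup

# Assumed markup for this sketch; the <a> element is a made-up addition
html_doc = """
<div id="main">
  <p class="info">Info Paragraph 1</p>
  <p class="info">Info Paragraph 2</p>
  <a href="https://example.com/docs">Docs</a>
</div>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# select_one() returns the first matching element, or None if nothing matches
first_info = soup.select_one('div#main p.info')
first_text = first_info.get_text()
print(first_text)  # Info Paragraph 1

no_match = soup.select_one('p.missing')
print(no_match)  # None

# Attribute selector: <a> tags whose href starts with "https://"
hrefs = [a['href'] for a in soup.select('a[href^="https://"]')]
print(hrefs)  # ['https://example.com/docs']
```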
