BeautifulSoup - Scraping Paragraphs from HTML
Last Updated : 23 Jul, 2025
In this article, we will discuss how to scrape paragraphs from HTML using Beautiful Soup.
Method 1: Using bs4 and urllib
Modules Needed:
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. To install it, run:
pip install bs4
- urllib: urllib is a package that collects several modules for working with URLs. It is part of Python's standard library, so it does not need to be installed separately.
Below is the implementation:
```python
# importing modules
import urllib.request
from bs4 import BeautifulSoup

# providing url
url = "https://www.geeksforgeeks.org/python/how-to-automate-an-excel-sheet-in-python/"

# opening the url for reading
html = urllib.request.urlopen(url)

# parsing the html file
htmlParse = BeautifulSoup(html, 'html.parser')

# getting all the paragraphs
for para in htmlParse.find_all("p"):
    print(para.get_text())
```
Output: the text of every <p> element on the page, printed in document order.
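In Method 1, the object returned by urllib.request.urlopen() is passed straight to BeautifulSoup(). This works because BeautifulSoup accepts an open file-like object as well as a string. The sketch below checks that offline. It assumes an io.BytesIO object holding a small, made-up HTML page as a stand-in for the real response, so no network access is needed.

```python
import io
from bs4 import BeautifulSoup

# Assumption: io.BytesIO stands in for the file-like response that
# urllib.request.urlopen() returns; the HTML content is illustrative only
fake_response = io.BytesIO(
    b"<html><body><h1>Title</h1>"
    b"<p>First paragraph.</p><p>Second paragraph.</p>"
    b"</body></html>"
)

# BeautifulSoup reads the file-like object itself, as in Method 1
htmlParse = BeautifulSoup(fake_response, "html.parser")

# only <p> elements are collected; the <h1> heading is skipped
paragraphs = [para.get_text() for para in htmlParse.find_all("p")]
for text in paragraphs:
    print(text)
```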

Method 2: Using requests and bs4
Modules Needed:
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It does not come built-in with Python. To install it, type the command below in a terminal:
pip install bs4
- requests: Requests lets you send HTTP/1.1 requests easily. It also does not come built-in with Python. To install it, type the command below in a terminal:
pip install requests
Approach:
- Import the modules.
- Get the HTML document (the code below downloads it with requests) and specify the <p> tag in the code.
- Pass the HTML document to the BeautifulSoup() function.
- Use the "p" tag with find_all() to extract the paragraphs from the BeautifulSoup object.
- Get the text of each paragraph with get_text().
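These steps can be sketched on a small HTML document written directly in the code, without any download. The document contents and variable names are illustrative assumptions. Note that get_text() concatenates the text of nested tags such as <b> and <a>, so each paragraph comes back as a single string.

```python
from bs4 import BeautifulSoup

# Step 2: an HTML document created in the code (illustrative content)
html_doc = """
<html>
  <body>
    <p>Plain paragraph.</p>
    <p>  Paragraph with <b>bold</b> and <a href="#">a link</a>.  </p>
    <div>Not a paragraph.</div>
  </body>
</html>
"""

# Step 3: pass the document to BeautifulSoup()
soup = BeautifulSoup(html_doc, "html.parser")

# Steps 4-5: find every <p> tag and get its text; get_text() joins the text
# of nested tags, and .strip() trims whitespace around the whole paragraph
texts = [p.get_text().strip() for p in soup.find_all("p")]
for text in texts:
    print(text)
```

Calling get_text(strip=True) instead of get_text().strip() strips each nested text fragment separately and joins the fragments with an empty separator. That can glue words together, giving "Paragraph withboldanda link." for the second paragraph here. Passing a separator, as in get_text(" ", strip=True), keeps words apart but may leave a space before punctuation.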
Code:
```python
# import modules (pandas was imported here but never used, so it is dropped)
import requests
from bs4 import BeautifulSoup
# link for extracting html data
def getdata(url):
    r = requests.get(url)
    return r.text

htmldata = getdata("https://www.geeksforgeeks.org/python/how-to-automate-an-excel-sheet-in-python/")
soup = BeautifulSoup(htmldata, 'html.parser')
for data in soup.find_all("p"):
    print(data.get_text())
```
Output: the text of every <p> element on the downloaded page, printed in document order.
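A possible hardening of Method 2, offered as a sketch rather than the article's own code. requests.get() is given a timeout (the 10-second value is an assumption) so a stalled server cannot block the script indefinitely. raise_for_status() turns 4xx/5xx responses into a requests.HTTPError instead of silently parsing an error page. The hypothetical helper paragraph_texts() returns the paragraphs as a list so they can be reused rather than only printed.

```python
import requests
from bs4 import BeautifulSoup

def getdata(url, timeout=10):
    # assumption: a 10-second timeout; without one, requests can wait indefinitely
    r = requests.get(url, timeout=timeout)
    # raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    r.raise_for_status()
    return r.text

def paragraph_texts(htmldata):
    # hypothetical helper: return paragraph texts as a list instead of printing them
    soup = BeautifulSoup(htmldata, "html.parser")
    return [p.get_text() for p in soup.find_all("p")]
```

For example, calling paragraph_texts(getdata(url)) and printing each item gives the same paragraphs as the script above when the request succeeds.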