BeautifulSoup - Scraping Paragraphs from HTML

Last Updated : 23 Jul, 2025

In this article, we will discuss how to scrape paragraphs from HTML using Beautiful Soup.

Method 1: using bs4 and urllib

Modules Needed:

pip install bs4

urllib is part of the Python standard library, so it does not need to be installed with pip.

An HTML file contains many kinds of tags, such as the anchor tag `<a>`, the span tag `<span>`, and the paragraph tag `<p>`. Beautiful Soup parses the HTML and lets us pull out just the parts we want, such as the paragraphs of a particular URL or HTML file.

Explanation:

After importing urllib and bs4, we store the URL to read in a variable. The urllib.request.urlopen() function sends a request to the server and returns the response for that URL. The BeautifulSoup() function parses the returned HTML, detecting its character encoding when it is given raw bytes. The loop then uses find_all() to find every paragraph tag `<p>`, and the get_text() method collects the text inside each one.
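To see find_all() and get_text() at work without any network access, here is a minimal sketch that parses an inline HTML string. The markup in `sample_html` is a made-up example, not taken from any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration
sample_html = """
<html>
  <body>
    <a href="https://example.com">A link</a>
    <span>A span</span>
    <p>First paragraph.</p>
    <p>Second paragraph with <b>bold</b> text.</p>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find_all("p") returns every <p> element; the <a> and <span> are skipped
paragraphs = [p.get_text() for p in soup.find_all("p")]

for text in paragraphs:
    print(text)
```

Note that get_text() also includes the text of tags nested inside the paragraph, so the `<b>` tag's text appears in the second result.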

Below is the implementation:

```python
# importing modules
import urllib.request
from bs4 import BeautifulSoup

# providing url
url = "https://www.geeksforgeeks.org/python/how-to-automate-an-excel-sheet-in-python/"

# opening the url for reading
html = urllib.request.urlopen(url)

# parsing the html file
htmlParse = BeautifulSoup(html, 'html.parser')

# getting all the paragraphs
for para in htmlParse.find_all("p"):
    print(para.get_text())
```

Output:

The script prints the text of each paragraph found on the page.
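In practice the urllib approach may need a little hardening. The sketch below is one possible variant, not part of the original method: it assumes a 10 second timeout and an arbitrary User-Agent value (some servers may reject urllib's default one), and the helper names `fetch_html` and `extract_paragraphs` are hypothetical.

```python
import urllib.request
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    # The User-Agent value here is an arbitrary example, not a requirement
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    # The with-block closes the connection once the body has been read
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def extract_paragraphs(html):
    soup = BeautifulSoup(html, "html.parser")
    # Trim surrounding whitespace from each paragraph's text
    texts = (p.get_text().strip() for p in soup.find_all("p"))
    # Drop paragraphs that contain no text at all
    return [t for t in texts if t]
```

Calling `extract_paragraphs(fetch_html(url))` would then return the paragraph texts as a list instead of printing them, which makes the result easier to reuse.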

Method 2: using requests and bs4

Modules Needed:

pip install bs4

pip install requests

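Approach:

Import the modules, fetch the page's HTML with requests.get(), pass it to BeautifulSoup(), find every paragraph tag with find_all("p"), and print each paragraph's text with get_text().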

Code:

```python
# import modules
import requests
from bs4 import BeautifulSoup


# fetch the html of the given url
def getdata(url):
    r = requests.get(url)
    return r.text


htmldata = getdata("https://www.geeksforgeeks.org/python/how-to-automate-an-excel-sheet-in-python/")
soup = BeautifulSoup(htmldata, 'html.parser')

# print the text of every paragraph
for data in soup.find_all("p"):
    print(data.get_text())
```

Output:

The script prints the text of each paragraph found on the page.
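The requests version can also be made more defensive. The following is a sketch under stated assumptions rather than the definitive way to do it: the 10 second timeout is an arbitrary choice, and `get_paragraphs` is a hypothetical helper name. It uses raise_for_status() so that an HTTP error response is reported instead of being parsed as if it were the page.

```python
import requests
from bs4 import BeautifulSoup


def getdata(url, timeout=10):
    # timeout (in seconds) stops the call from waiting indefinitely
    r = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses
    r.raise_for_status()
    return r.text


def get_paragraphs(htmldata):
    soup = BeautifulSoup(htmldata, "html.parser")
    # Collect the text of every <p> element in document order
    return [p.get_text() for p in soup.find_all("p")]
```

A caller could wrap `getdata(url)` in a try/except for `requests.exceptions.RequestException`, which covers connection failures, timeouts, and invalid URLs as well as HTTP errors.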