Scraping Indeed Job Data Using Python

Last Updated : 23 Jul, 2025

In this article, we will see how to scrape Indeed job data using Python. We will use Beautiful Soup and the requests module to scrape the data.

Modules needed

pip install bs4

pip install requests

Approach:

Syntax:

requests.get(url, args)
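As a minimal sketch of the call above, the helper below wraps requests.get with a timeout and a status check. The name fetch_html, the 10-second timeout, and the injectable get parameter are illustrative assumptions and are not part of the original program.

```python
import requests


def fetch_html(url, timeout=10, get=requests.get):
    """Return the body of `url` as text, raising on HTTP errors.

    `get` is a hypothetical injection point so the function can be
    exercised without network access; it defaults to requests.get.
    """
    response = get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.text
```

A call such as fetch_html(url) would then return the page source as a string, or raise an exception instead of silently returning an error page.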

When we search for a job and a location on Indeed, the URL becomes something like https://in.indeed.com/jobs?q="+job+"&l="+Location, where job and Location are the URL-encoded search terms. Hence we will build our URL string in this format.
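The URL format described above can also be produced from plain search terms instead of hand-encoded strings. The sketch below uses urlencode from the standard library; the names BASE_URL and build_search_url are assumptions made for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://in.indeed.com/jobs"


def build_search_url(job, location):
    """Build a search URL with `q` (job) and `l` (location) parameters."""
    # urlencode escapes spaces as '+' and commas as '%2C'
    return BASE_URL + "?" + urlencode({"q": job, "l": location})


print(build_search_url("data science internship", "Noida, Uttar Pradesh"))
# https://in.indeed.com/jobs?q=data+science+internship&l=Noida%2C+Uttar+Pradesh
```

Letting urlencode do the escaping avoids mistakes when a job title or location contains spaces, commas, or other reserved characters.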

Syntax: soup = BeautifulSoup(r.content, 'html5lib')

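Parameters: r.content is the raw HTML of the response, and 'html5lib' is the name of the parser to use. The program below uses Python's built-in 'html.parser' instead, which needs no extra installation.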
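To illustrate how a parsed document is queried, here is a small sketch that runs offline. The HTML snippet is made up and only imitates the class names used in this article; it is not real Indeed markup. The sketch uses the built-in 'html.parser' rather than 'html5lib', which is a separate package.

```python
from bs4 import BeautifulSoup

# Made-up HTML that only imitates the class names used in this article
SAMPLE_HTML = """
<div>
  <a class="jobtitle turnstileLink">Data Science Intern</a>
  <div class="sjcl">Example Corp</div>
  <a class="jobtitle turnstileLink">ML Intern</a>
  <div class="sjcl">Sample Labs</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns every matching tag; get_text(strip=True) trims whitespace
titles = [a.get_text(strip=True)
          for a in soup.find_all("a", class_="jobtitle turnstileLink")]
companies = [d.get_text(strip=True)
             for d in soup.find_all("div", class_="sjcl")]

# Pair each title with the company found at the same position
for title, company in zip(titles, companies):
    print(title, "|", company)
```

Collecting one list entry per tag, instead of joining all the text and splitting on newlines, keeps titles and companies aligned even when an entry contains blank lines.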

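Functions used:

The code for this implementation is divided into user-defined functions to improve readability and ease of use: getdata() fetches the page, html_code() parses it into a BeautifulSoup object, job_data() extracts the job titles, and company_data() extracts the company details.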

Program:

Python3 `

# import module
import requests
from bs4 import BeautifulSoup

# user-defined function:
# scrape the data
# and return it as a string
def getdata(url):
    r = requests.get(url)
    return r.text

# Get the HTML code by parsing the response
def html_code(url):

    # pass the url
    # into getdata function
    htmldata = getdata(url)
    soup = BeautifulSoup(htmldata, 'html.parser')

    # return html code
    return soup

# filter job data using
# the find_all function
def job_data(soup):

    # find the matching HTML tags
    # with find_all()
    # and join their text into one string
    data_str = ""
    for item in soup.find_all("a", class_="jobtitle turnstileLink"):
        data_str = data_str + item.get_text()
    result_1 = data_str.split("\n")
    return result_1

# filter company data using
# the find_all function
def company_data(soup):

    # find the matching HTML tags
    # with find_all()
    # and join their text into one string
    data_str = ""
    for item in soup.find_all("div", class_="sjcl"):
        data_str = data_str + item.get_text()
    result_1 = data_str.split("\n")

    # keep only the non-empty lines,
    # skipping the first entry
    res = []
    for i in range(1, len(result_1)):
        if len(result_1[i]) > 1:
            res.append(result_1[i])
    return res

# driver code / main function
if __name__ == "__main__":

    # Data for URL
    job = "data+science+internship"
    Location = "Noida%2C+Uttar+Pradesh"
    url = "https://in.indeed.com/jobs?q="+job+"&l="+Location

    # Pass this URL to html_code(),
    # which returns the parsed
    # BeautifulSoup object
    soup = html_code(url)

    # call job_data and company_data
    # and store the results
    job_res = job_data(soup)
    com_res = company_data(soup)

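    # Traverse both lists; each job is assumed to have
    # two company entries (name and address)
    temp = 0
    for i in range(1, len(job_res)):
        # stop early if com_res has fewer entries than expected
        for j in range(temp, min(temp + 2, len(com_res))):
            print("Company Name and Address : " + com_res[j])

        # move past the two entries just printed
        temp += 2
        print("Job : " + job_res[i])
        print("-----------------------------")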

`

Output: