How to read Emails from Gmail using Gmail API in Python?

Last Updated : 01 Oct, 2020

In this article, we will see how to read emails from your Gmail account using the Gmail API in Python. The Gmail API is a RESTful API that lets a program interact with a Gmail account and use its features, in our case from a Python script.

So, let’s go ahead and write a simple Python script to read emails.

Requirements

Installation

Install the required libraries by running these commands:

pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib

Run this to install Beautiful Soup and lxml, the parser that the script passes to Beautiful Soup:

pip install beautifulsoup4 lxml

Now, you have to set up your Google Cloud console to interact with the Gmail API. So, follow these steps:

Create a New Project

Go to APIs and Services

Go to Enable APIs and Services

Enable Gmail API

Configure Consent screen

Enter Application name

Go to Credentials

Create an OAuth Client ID

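Download the client configuration file, save it as ‘credentials.json’ in the same folder as your script (the code reads it from there), and please keep your Client ID and Client Secrets confidential.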

Now, everything is set up, and we are ready to write the code. So, let’s go.

Code

Approach :

The file ‘token.pickle’ stores the user’s access and refresh tokens, so we first check whether it exists. If it exists but the access token has expired, we refresh it. If it does not exist, or the stored credentials are invalid and cannot be refreshed, the program opens the browser, asks for access to the user’s Gmail, and saves the resulting credentials for next time.

Next, we connect to the Gmail API with these credentials. Once connected, we request a list of messages. By default, this returns the IDs of up to 100 of the most recent emails in that Gmail account. We can ask for a different number (up to 500 per request) by passing the optional argument ‘maxResults’.

The output of this request is a dictionary in which the value of the key ‘messages‘ is a list of dictionaries. Each dictionary contains the ID of an Email and the Thread ID.
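The listing request and the shape of its response can be sketched as follows. This is a minimal sketch, not part of the original script: the helper name `list_message_ids` and its `limit` and `page_size` parameters are hypothetical, and it assumes `service` is the object returned by `build('gmail', 'v1', ...)`. It follows the `nextPageToken` field that the API includes in a response when more results are available.

```python
def list_message_ids(service, limit=250, page_size=100):
    """Collect up to `limit` message IDs, following nextPageToken between pages.

    `service` is assumed to be a Gmail API service object; the function name
    and its parameters are illustrative, not part of the Gmail client library.
    """
    ids = []
    page_token = None
    while len(ids) < limit:
        kwargs = {
            "userId": "me",
            "maxResults": min(page_size, limit - len(ids)),
        }
        if page_token:
            kwargs["pageToken"] = page_token
        response = service.users().messages().list(**kwargs).execute()
        # The 'messages' key is absent when there is nothing to return.
        ids.extend(m["id"] for m in response.get("messages", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return ids[:limit]
```

Each entry under ‘messages’ carries only ‘id’ and ‘threadId’, so the content of an email still has to be fetched separately by its ID.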

Now, we will go through all of these dictionaries and request each email’s content by its ID.

This again returns a dictionary, in which the key ‘payload’ holds the main content of the email as another dictionary.

This dictionary contains ‘headers’, ‘parts’, ‘filename’, etc., so we can easily find headers such as the sender and the subject here. The key ‘parts’ is a list of dictionaries that holds all the parts of the email’s body, such as the plain text, the HTML version, and details of attached files. We can get the body of the email from this list; it is generally in the first element. Note that a message with a single part has no ‘parts’ key, and in some messages the parts are nested inside other parts.
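Reading the headers and locating the body inside ‘parts’ can be sketched as two small helpers. This is only an illustration under stated assumptions: the function names `header_value` and `find_body_data` are hypothetical, and the sketch relies on the ‘mimeType’, ‘headers’, ‘body’ and ‘parts’ fields of the payload dictionary. It searches nested parts as well, rather than looking only at the first element.

```python
def header_value(payload, name):
    """Return the value of the first header called `name` (case-insensitive), or None."""
    for header in payload.get("headers", []):
        if header["name"].lower() == name.lower():
            return header["value"]
    return None


def find_body_data(payload, mime_type="text/plain"):
    """Depth-first search for the first part of `mime_type` that carries body data.

    Returns the still-encoded 'data' string, or None if no such part exists.
    """
    if payload.get("mimeType") == mime_type:
        data = payload.get("body", {}).get("data")
        if data:
            return data
    # Multipart messages keep their children under 'parts', possibly nested.
    for part in payload.get("parts", []):
        found = find_body_data(part, mime_type)
        if found:
            return found
    return None
```

For example, `header_value(payload, 'From')` would give the sender, and `find_body_data(payload, 'text/html')` would look for the HTML version of the body instead of the plain text.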

The body is encoded in URL-safe Base64, so we have to decode it into a readable format. The decoded text may contain HTML markup, so we parse it with the BeautifulSoup library, using ‘lxml’ as the parser, and convert it to plain text.
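The decoding step on its own can be sketched with the standard library only. The helper name `decode_body` is hypothetical, and the sketch assumes the body is UTF-8 text; other encodings would need the charset from the part’s headers. URL-safe Base64 uses `-` and `_` in place of `+` and `/`, which `base64.urlsafe_b64decode` handles directly.

```python
import base64


def decode_body(data):
    """Decode URL-safe Base64 body data from the Gmail API into a str.

    Assumes the decoded bytes are UTF-8; undecodable bytes are replaced.
    """
    # Restore any '=' padding that may be missing, then decode.
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")
```

The result may still contain HTML tags, which is where Beautiful Soup comes in.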

At last, we will print the Subject, Sender, and Email.

Python3

from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
import pickle
import os.path
import base64
from bs4 import BeautifulSoup

# Scope requested from the user; read-only access is sufficient for reading emails.
# If you change it, delete token.pickle so that the consent flow runs again.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']


def getEmails():
    # Load the credentials saved by a previous run, if any
    creds = None
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)

    # If there are no valid credentials, refresh them or let the user log in
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)

        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    # Connect to the Gmail API
    service = build('gmail', 'v1', credentials=creds)

    # Request a list of messages; maxResults can be passed here as well
    result = service.users().messages().list(userId='me').execute()

    # 'messages' is a list of dictionaries, each holding a message ID
    messages = result.get('messages', [])

    for msg in messages:
        # Fetch the full message by its ID
        txt = service.users().messages().get(userId='me', id=msg['id']).execute()

        try:
            payload = txt['payload']
            headers = payload['headers']

            # Look for the Subject and From headers
            subject = sender = None
            for d in headers:
                if d['name'] == 'Subject':
                    subject = d['value']
                if d['name'] == 'From':
                    sender = d['value']

            # The body is generally in the first part, in URL-safe Base64
            parts = payload.get('parts')[0]
            data = parts['body']['data']
            decoded_data = base64.urlsafe_b64decode(data)

            # Parse the decoded content with the lxml parser and extract its text
            soup = BeautifulSoup(decoded_data, "lxml")
            body = soup.get_text()

            print("Subject: ", subject)
            print("From: ", sender)
            print("Message: ", body)
            print('\n')
        except (KeyError, TypeError, IndexError, ValueError):
            # Skip messages that lack the expected structure (e.g. no 'parts')
            pass


getEmails()

Now, save the script as email_reader.py and run it with:

python3 email_reader.py

This will attempt to open a new window in your default browser. If it fails, copy the URL from the console and manually open it in your browser.

Now, log in to your Google account if you aren’t already logged in. If there are multiple accounts, you will be asked to choose one of them. Then, click on the Allow button.

Your Application asking for Permission

After the authentication has been completed, your browser will display a message: “The authentication flow has been completed. You may close this window”.

The script will start printing the Email data in the console.

You can also extend this and save the emails in separate text or CSV files to build a collection of emails from a particular sender.
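One possible way to do this is sketched below with the standard `csv` module. The helper names `emails_from` and `write_emails_csv`, and the row layout with ‘sender’, ‘subject’ and ‘body’ keys, are assumptions made for illustration; the script above would first have to collect its results into such dictionaries instead of printing them.

```python
import csv

FIELDS = ["sender", "subject", "body"]


def emails_from(rows, address):
    """Keep only the rows whose 'sender' field contains `address` (case-insensitive)."""
    return [row for row in rows if address.lower() in row["sender"].lower()]


def write_emails_csv(rows, fileobj):
    """Write the rows to an open text file object as CSV and return the row count."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count
```

When writing to a real file, opening it with `open('emails.csv', 'w', newline='', encoding='utf-8')` lets the `csv` module control line endings itself; the file name here is only an example.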