Twitter Sentiment Analysis using Python

Last Updated : 9 Jul, 2025

Twitter Sentiment Analysis is the process of using Python to automatically understand the emotions or opinions expressed in tweets. By analyzing the text we can classify tweets as positive, negative or neutral. This helps businesses and researchers track public mood, brand reputation or reactions to events in real time. Python libraries like TextBlob, Tweepy and NLTK make it easy to collect tweets, process the text and perform sentiment analysis; the implementation below uses pandas and scikit-learn on a pre-collected dataset.

How is Twitter Sentiment Analysis Useful?

Twitter Sentiment Analysis is important because it helps people and businesses understand what the public thinks in real time.
Millions of tweets are posted every day, sharing opinions about brands, products, events or social issues. By analyzing this huge stream of data, companies can measure customer satisfaction, spot trends early, handle negative feedback quickly and make better decisions based on how people actually feel.
It’s also useful for researchers and governments to monitor public mood during elections, crises or big events as it turns raw tweets into valuable insights.

Step by Step Implementation

Step 1: Install Necessary Libraries

This block installs and imports the required libraries. pandas loads and handles the data, TfidfVectorizer turns text into numerical features and scikit-learn provides the models and the evaluation metrics.

```python
# Install the dependencies first (run in a shell, or prefix with "!" in a notebook):
#   pip install pandas scikit-learn

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, classification_report
```

Step 2: Load Dataset

Here we load the Sentiment140 dataset from a zipped CSV file, which you can download from Kaggle.
We keep only the polarity and tweet text columns, rename them for clarity and print the first few rows to check the data.

```python
# The file has no header row, so columns are addressed by position
df = pd.read_csv('training.1600000.processed.noemoticon.csv.zip',
                 encoding='latin-1', header=None)
df = df[[0, 5]]  # column 0 holds the polarity, column 5 the tweet text
df.columns = ['polarity', 'text']
print(df.head())
```

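For orientation, the sketch below reads a two-row stand-in for the file from memory and applies the same column selection by name. The six column names are an assumption based on the commonly published Sentiment140 layout (polarity, id, date, query, user, text), and the sample rows are invented for illustration.

```python
import io
import pandas as pd

# Two made-up rows in the assumed Sentiment140 layout (no header row)
raw = io.StringIO(
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","so tired today"\n'
    '"4","2","Mon Apr 06 22:20:00 PDT 2009","NO_QUERY","user_b","Loving the sunshine"\n'
)

# Assumed column names; the real file has none, hence header=None
columns = ['polarity', 'id', 'date', 'query', 'user', 'text']
sample = pd.read_csv(raw, header=None, names=columns)

# Same selection as in the step above, but by name instead of position
sample = sample[['polarity', 'text']]
print(sample)
```

Naming the columns up front is optional; selecting positions 0 and 5 and renaming afterwards, as in the step above, gives the same two-column frame.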

Step 3: Keep Only Positive and Negative Sentiments

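Here we remove neutral tweets (polarity 2) and remap the labels so that 0 stays negative and 4 becomes 1 for positive.
Then we print how many positive and negative tweets are left in the data.

```python
# Drop neutral tweets; .copy() avoids a SettingWithCopyWarning on the assignment below
df = df[df.polarity != 2].copy()

# Remap the labels: 0 stays negative, 4 becomes 1 (positive)
df['polarity'] = df['polarity'].map({0: 0, 4: 1})

print(df['polarity'].value_counts())
```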

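The filtering and remapping can be checked on a toy frame first. This is a minimal sketch with invented rows; the name `toy` is only for illustration. Because `map` returns NaN for any value missing from the dictionary, it also checks that no label was lost.

```python
import pandas as pd

# Invented rows covering all three raw polarity values (0, 2, 4)
toy = pd.DataFrame({
    'polarity': [0, 4, 2, 4, 0, 4],
    'text': ['bad', 'good', 'meh', 'great', 'awful', 'nice'],
})

toy = toy[toy.polarity != 2].copy()                  # drop the neutral row
toy['polarity'] = toy['polarity'].map({0: 0, 4: 1})  # 0 -> 0, 4 -> 1

# map() yields NaN for unmapped values, so this guards against stray labels
assert not toy['polarity'].isna().any()

counts = toy['polarity'].value_counts()
print(counts)
```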

Step 4: Clean the Tweets

Here we define a simple function that converts text to lowercase for consistency and apply it to every tweet in the dataset.
Then we show the original and cleaned versions of the first few tweets.

```python
def clean_text(text):
    # Lowercase so that "Good" and "good" count as the same token
    return text.lower()

df['clean_text'] = df['text'].apply(clean_text)

print(df[['text', 'clean_text']].head())
```

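Lowercasing is the only cleaning used in this walkthrough. If more is wanted, one possible extension is sketched below; `clean_tweet` is a hypothetical helper, not part of the original code, and the regular expressions are simple assumptions about what counts as a URL or mention.

```python
import re

def clean_tweet(text):
    """Hypothetical extension of clean_text: lowercase and strip tweet-specific noise."""
    text = text.lower()
    text = re.sub(r'https?://\S+|www\.\S+', ' ', text)  # remove URLs
    text = re.sub(r'@\w+', ' ', text)                   # remove @mentions
    text = text.replace('#', '')                        # keep the hashtag word, drop '#'
    text = re.sub(r'\s+', ' ', text)                    # collapse repeated whitespace
    return text.strip()

cleaned = clean_tweet("@Bob LOVING the new phone!! http://t.co/abc #Happy")
print(cleaned)
```

Whether this helps accuracy depends on the data, so it is worth comparing against the lowercase-only baseline before adopting it.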

Step 5: Train Test Split

This code splits the clean_text and polarity columns into training and testing sets using an 80/20 split.
Setting random_state=42 makes the split reproducible.

```python
X_train, X_test, y_train, y_test = train_test_split(
    df['clean_text'],
    df['polarity'],
    test_size=0.2,     # hold out 20% of the tweets for testing
    random_state=42,   # fixed seed so the split is the same on every run
)

print("Train size:", len(X_train))
print("Test size:", len(X_test))
```

**Output:**

Train size: 1280000
Test size: 320000
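To illustrate what the fixed seed buys, the sketch below splits a small invented list twice with the same random_state and confirms the test sets match. It also shows the optional `stratify` argument of train_test_split, which is not used in this walkthrough but keeps the class ratio equal in both parts.

```python
from sklearn.model_selection import train_test_split

# Invented toy data: ten "tweets" with alternating labels
texts = [f"tweet {i}" for i in range(10)]
labels = [0, 1] * 5

# Two splits with the same seed select the same test items
split_a = train_test_split(texts, labels, test_size=0.2, random_state=42)
split_b = train_test_split(texts, labels, test_size=0.2, random_state=42)
same_split = split_a[1] == split_b[1]  # index 1 holds the test texts
print("Identical test sets:", same_split)

# Optional: stratify keeps the label proportions the same in train and test
_, _, _, y_test_strat = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)
print("Stratified test labels:", sorted(y_test_strat))
```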

Step 6: Perform Vectorization

This code creates a TF-IDF vectorizer that converts text into numerical features using unigrams and bigrams, keeping only the 5000 most frequent terms.
It fits the vectorizer on the training data only, transforms both the training and the test data and then prints the shapes of the resulting TF-IDF matrices.

```python
vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))

# Learn the vocabulary from the training set only, then reuse it for the test set
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

print("TF-IDF shape (train):", X_train_tfidf.shape)
print("TF-IDF shape (test):", X_test_tfidf.shape)
```

**Output:**

TF-IDF shape (train): (1280000, 5000)
TF-IDF shape (test): (320000, 5000)

Step 7: Train Bernoulli Naive Bayes model

Here we train a Bernoulli Naive Bayes classifier on the TF-IDF features from the training data. BernoulliNB binarizes its input by default, so each feature is treated as present or absent rather than weighted.
The model then predicts sentiments for the test data and we print the accuracy and a detailed classification report.

```python
bnb = BernoulliNB()
bnb.fit(X_train_tfidf, y_train)

bnb_pred = bnb.predict(X_test_tfidf)

print("Bernoulli Naive Bayes Accuracy:", accuracy_score(y_test, bnb_pred))
print("\nBernoulliNB Classification Report:\n", classification_report(y_test, bnb_pred))
```

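One detail worth knowing is that BernoulliNB thresholds its input at its default binarize=0.0, so TF-IDF weights are reduced to "term present" or "term absent". The sketch below, on invented data, checks that training on TF-IDF values and on plain presence flags over the same vocabulary gives the same probabilities.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB

docs = ["love this phone", "hate this phone", "love love it", "hate it so much"]
y = [1, 0, 1, 0]

# TF-IDF weighted features
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs)

# Presence/absence features over the same vocabulary and column order
binary = CountVectorizer(binary=True, vocabulary=tfidf.vocabulary_)
X_binary = binary.transform(docs)

proba_tfidf = BernoulliNB().fit(X_tfidf, y).predict_proba(X_tfidf)
proba_binary = BernoulliNB().fit(X_binary, y).predict_proba(X_binary)

same = np.allclose(proba_tfidf, proba_binary)
print("Same probabilities:", same)
```

In other words, the TF-IDF weighting matters for the SVM and Logistic Regression models in this walkthrough, but not for BernoulliNB.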

Step 9: Train Support Vector Machine (SVM) model

This code trains a linear Support Vector Machine (LinearSVC) with a maximum of 1000 iterations on the TF-IDF features.
It predicts the test labels, then prints the accuracy and a detailed classification report showing how well the SVM performed.

```python
svm = LinearSVC(max_iter=1000)
svm.fit(X_train_tfidf, y_train)

svm_pred = svm.predict(X_test_tfidf)

print("SVM Accuracy:", accuracy_score(y_test, svm_pred))
print("\nSVM Classification Report:\n", classification_report(y_test, svm_pred))
```

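LinearSVC does not output probabilities, but its decision_function returns a signed score per tweet: the sign decides the class and the magnitude indicates how far the tweet lies from the decision boundary. A minimal sketch on invented data (all names here are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = ["love this", "great day", "really love it",
        "hate this", "awful day", "really hate it"]
labels = [1, 1, 1, 0, 0, 0]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)

clf = LinearSVC(max_iter=1000).fit(X, labels)

scores = clf.decision_function(X)  # signed distance-like score per document
preds = clf.predict(X)             # class 1 where the score is positive, else class 0

for doc, score, pred in zip(docs, scores, preds):
    print(f"{doc!r}: score={score:+.3f}, predicted={pred}")
```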

Step 10: Train Logistic Regression model

This code trains a Logistic Regression model with up to 100 iterations on the TF-IDF features.
It predicts sentiment labels for the test data and prints the accuracy and a detailed classification report for model evaluation.

```python
logreg = LogisticRegression(max_iter=100)
logreg.fit(X_train_tfidf, y_train)

logreg_pred = logreg.predict(X_test_tfidf)

print("Logistic Regression Accuracy:", accuracy_score(y_test, logreg_pred))
print("\nLogistic Regression Classification Report:\n", classification_report(y_test, logreg_pred))
```


Step 11: Make Predictions on sample Tweets

This code takes three sample tweets and transforms them into TF-IDF features using the same vectorizer.
It then predicts their sentiment using the trained BernoulliNB, SVM and Logistic Regression models and prints the results for each classifier, where 1 stands for Positive and 0 for Negative.

```python
sample_tweets = ["I love this!", "I hate that!", "It was okay, not great."]

# Reuse the fitted vectorizer so the features match what the models were trained on
sample_vec = vectorizer.transform(sample_tweets)

print("\nSample Predictions:")
print("BernoulliNB:", bnb.predict(sample_vec))
print("SVM:", svm.predict(sample_vec))
print("Logistic Regression:", logreg.predict(sample_vec))
```


Our models work as expected and give the same predictions on these tweets even though they use different approaches.
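For reuse, the vectorizer and a classifier can be bundled so that raw text goes in and a readable label comes out. The following is a minimal sketch using scikit-learn's make_pipeline on four invented training sentences; `label_names` is a hypothetical helper mapping, and the tiny training set is only there to keep the example self-contained.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented stand-in for the real training data
train_texts = ["i love this", "what a great day", "i hate that", "this is awful"]
train_labels = [1, 1, 0, 0]

# The pipeline applies TF-IDF and then the classifier in a single call
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=100),
)
model.fit(train_texts, train_labels)

# Hypothetical mapping from numeric labels to readable names
label_names = {0: "Negative", 1: "Positive"}

tweets = ["I love this!", "I hate that!"]
readable = [label_names[int(p)] for p in model.predict(tweets)]
for tweet, label in zip(tweets, readable):
    print(f"{tweet} -> {label}")
```

Because neutral tweets were removed before training, any model built this way has to answer Positive or Negative even for a mixed tweet such as "It was okay, not great."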

You can download the source code from here: Twitter Sentiment Analysis using Python