Twitter Sentiment Analysis using Python

Last Updated : 9 Jul, 2025

Twitter Sentiment Analysis is the process of using Python to automatically understand the emotions or opinions expressed in tweets. By analyzing the text, we can classify tweets as positive, negative or neutral. This helps businesses and researchers track public mood, brand reputation or reactions to events in real time. Python libraries like TextBlob, Tweepy and NLTK make it easy to collect tweets, process the text and perform sentiment analysis efficiently. This walkthrough itself uses pandas and scikit-learn to train classifiers on a labeled tweet dataset.

How is Twitter Sentiment Analysis Useful?

Step by Step Implementation

Step 1: Install Necessary Libraries

This block installs and imports the required libraries. It uses pandas to load and handle data, TfidfVectorizer to turn text into numbers and scikit-learn to train the models.

```python
# Install first, in a terminal: pip install pandas scikit-learn
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, classification_report
```

Step 2: Load Dataset

```python
# The file has no header row; column 0 is the polarity label, column 5 the tweet text.
df = pd.read_csv('training.1600000.processed.noemoticon.csv.zip',
                 encoding='latin-1', header=None)
df = df[[0, 5]]
df.columns = ['polarity', 'text']
print(df.head())
```

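The dataset file has no header row and six columns, of which the code above keeps column 0 (polarity) and column 5 (tweet text). A minimal sketch of the same selection on an in-memory sample follows. The two rows are made up for illustration, and the column names other than `polarity` and `text` are descriptive labels assumed here, not something the file provides.

```python
import io
import pandas as pd

# Two made-up rows in the six-column, headerless layout of the dataset.
sample_csv = io.StringIO(
    '"0","1","Mon Apr 06 2009","NO_QUERY","user_a","this is awful"\n'
    '"4","2","Mon Apr 06 2009","NO_QUERY","user_b","what a lovely day"\n'
)

# Assumed descriptive names; the real file ships without a header.
columns = ["polarity", "id", "date", "query", "user", "text"]

sample = pd.read_csv(sample_csv, header=None, names=columns)
sample = sample[["polarity", "text"]]  # keep only the label and the tweet
print(sample)
```

Passing `names=` at load time is an alternative to selecting by position and renaming afterwards; both approaches end with the same two-column frame.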

Step 3: Keep Only Positive and Negative Sentiments

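```python
# Drop neutral tweets (label 2); .copy() avoids pandas' SettingWithCopyWarning.
df = df[df.polarity != 2].copy()

# Remap labels: 0 stays 0 (negative), 4 becomes 1 (positive).
df['polarity'] = df['polarity'].map({0: 0, 4: 1})

print(df['polarity'].value_counts())
```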


Step 4: Clean the Tweets

```python
def clean_text(text):
    # Lowercase only; no other normalization is applied here.
    return text.lower()

df['clean_text'] = df['text'].apply(clean_text)

print(df[['text', 'clean_text']].head())
```

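Lowercasing alone leaves URLs, @mentions and punctuation in the text. If heavier cleaning is wanted, a sketch along these lines could stand in for `clean_text`. The function name `clean_tweet` and the specific regular expressions are assumptions for illustration, not part of the original code.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
SPACE_RE = re.compile(r"\s+")

def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions and non-letter characters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)  # drops digits, punctuation and '#'
    return SPACE_RE.sub(" ", text).strip()

print(clean_tweet("@bob I LOVE this!! http://t.co/xyz #happy"))
# prints: i love this happy
```

One trade-off to keep in mind: removing every non-letter also splits contractions ("don't" becomes "don t"), so whether this helps accuracy would need to be checked on the data.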

Step 5: Train Test Split

```python
# Hold out 20% of the tweets for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    df['clean_text'],
    df['polarity'],
    test_size=0.2,
    random_state=42
)

print("Train size:", len(X_train))
print("Test size:", len(X_test))
```

Output:

Train size: 1280000
Test size: 320000
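The split above is purely random. If the class ratio should be the same in both parts, `train_test_split` also accepts a `stratify` argument. The sketch below shows the effect on a tiny made-up dataset; the texts and labels are placeholders, not rows from the real data.

```python
from sklearn.model_selection import train_test_split

# Placeholder data: ten fake tweets, five negative (0) and five positive (1).
texts = [f"tweet {i}" for i in range(10)]
labels = [0, 1] * 5

X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# With stratification, the two test samples contain one tweet of each class.
print(sorted(y_te))
print(len(X_tr), len(X_te))
```

In the tutorial's code this would correspond to adding `stratify=df['polarity']` to the existing call.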

Step 6: Perform Vectorization

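```python
# Keep the 5000 most frequent unigrams and bigrams as features.
vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))

# Fit the vocabulary on the training set only, then reuse it for the test set.
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

print("TF-IDF shape (train):", X_train_tfidf.shape)
print("TF-IDF shape (test):", X_test_tfidf.shape)
```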

Output:

TF-IDF shape (train): (1280000, 5000)
TF-IDF shape (test): (320000, 5000)
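`ngram_range=(1,2)` tells the vectorizer to build features from single words and from pairs of adjacent words. The pure-Python sketch below mimics only that tokenization step (no TF-IDF weighting), using scikit-learn's default token pattern, which keeps tokens of two or more word characters. The helper name `word_ngrams` is hypothetical.

```python
import re

# scikit-learn's default token pattern: tokens of 2+ word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")

def word_ngrams(text: str, ngram_range=(1, 2)) -> list[str]:
    """Return word n-grams for every n in the inclusive ngram_range."""
    tokens = TOKEN_RE.findall(text.lower())
    low, high = ngram_range
    grams = []
    for n in range(low, high + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

print(word_ngrams("I do not like this"))
# prints: ['do', 'not', 'like', 'this', 'do not', 'not like', 'like this']
```

Note that the one-letter word "I" is dropped, and that bigrams such as "not like" give the classifiers a way to pick up simple negation that single words miss.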

Step 7: Train Bernoulli Naive Bayes model

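```python
# BernoulliNB binarizes its input (binarize=0.0 by default), so it works with
# term presence or absence rather than the TF-IDF weights themselves.
bnb = BernoulliNB()
bnb.fit(X_train_tfidf, y_train)

bnb_pred = bnb.predict(X_test_tfidf)

print("Bernoulli Naive Bayes Accuracy:", accuracy_score(y_test, bnb_pred))
print("\nBernoulliNB Classification Report:\n", classification_report(y_test, bnb_pred))
```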

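`accuracy_score` and `classification_report` summarize how the predictions compare with the true labels. As a rough illustration of what those numbers mean, the sketch below computes accuracy plus precision and recall for the positive class by hand. The labels are made up, and the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, plus precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Made-up labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
# prints: {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
```

Precision answers "of the tweets predicted positive, how many were positive?", while recall answers "of the positive tweets, how many were found?".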

Step 9: Train Support Vector Machine (SVM) model

```python
# Linear support vector classifier; max_iter caps the number of solver iterations.
svm = LinearSVC(max_iter=1000)
svm.fit(X_train_tfidf, y_train)

svm_pred = svm.predict(X_test_tfidf)

print("SVM Accuracy:", accuracy_score(y_test, svm_pred))
print("\nSVM Classification Report:\n", classification_report(y_test, svm_pred))
```


Step 10: Train Logistic Regression model

```python
# Logistic regression; max_iter=100 is also scikit-learn's default.
logreg = LogisticRegression(max_iter=100)
logreg.fit(X_train_tfidf, y_train)

logreg_pred = logreg.predict(X_test_tfidf)

print("Logistic Regression Accuracy:", accuracy_score(y_test, logreg_pred))
print("\nLogistic Regression Classification Report:\n", classification_report(y_test, logreg_pred))
```


Step 11: Make Predictions on sample Tweets

```python
sample_tweets = ["I love this!", "I hate that!", "It was okay, not great."]

# Reuse the fitted vectorizer so the samples share the training vocabulary.
sample_vec = vectorizer.transform(sample_tweets)

print("\nSample Predictions:")
print("BernoulliNB:", bnb.predict(sample_vec))
print("SVM:", svm.predict(sample_vec))
print("Logistic Regression:", logreg.predict(sample_vec))
```

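The models return the numeric labels defined in Step 3 (0 for negative, 1 for positive). A small helper can make the printout easier to read. The name `to_labels` and the example predictions below are placeholders, not actual model output.

```python
LABEL_NAMES = {0: "negative", 1: "positive"}

def to_labels(predictions):
    """Map numeric class labels (0/1) to readable sentiment names."""
    return [LABEL_NAMES[int(p)] for p in predictions]

example_predictions = [1, 0, 1]  # placeholder values for illustration
print(to_labels(example_predictions))
# prints: ['positive', 'negative', 'positive']
```

With the fitted models this could be called as `to_labels(logreg.predict(sample_vec))`. Because the classifiers are trained on two classes only, a lukewarm tweet such as "It was okay, not great." is still assigned to one of them.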

We can see that all three models run and give the same predictions on the sample tweets (0 = negative, 1 = positive), even though they use different approaches.
