Building a Text Classification System (TextBlob 0.19.0 documentation)

The textblob.classifiers module makes it simple to create custom classifiers.

As an example, let’s create a custom sentiment analyzer.

Loading Data and Creating a Classifier

First we’ll create some training and test data.

>>> train = [
...     ("I love this sandwich.", "pos"),
...     ("this is an amazing place!", "pos"),
...     ("I feel very good about these beers.", "pos"),
...     ("this is my best work.", "pos"),
...     ("what an awesome view", "pos"),
...     ("I do not like this restaurant", "neg"),
...     ("I am tired of this stuff.", "neg"),
...     ("I can't deal with this", "neg"),
...     ("he is my sworn enemy!", "neg"),
...     ("my boss is horrible.", "neg"),
... ]
>>> test = [
...     ("the beer was good.", "pos"),
...     ("I do not enjoy my job", "neg"),
...     ("I ain't feeling dandy today.", "neg"),
...     ("I feel amazing!", "pos"),
...     ("Gary is a friend of mine.", "pos"),
...     ("I can't believe I'm doing this.", "neg"),
... ]

Now we’ll create a Naive Bayes classifier, passing the training data into the constructor.

>>> from textblob.classifiers import NaiveBayesClassifier
>>> cl = NaiveBayesClassifier(train)

Loading Data from Files

You can also load data from common file formats including CSV, JSON, and TSV.

CSV files should be formatted like so:

I love this sandwich.,pos
This is an amazing place!,pos
I do not like this restaurant,neg

JSON files should be formatted like so:

[
    {"text": "I love this sandwich.", "label": "pos"},
    {"text": "This is an amazing place!", "label": "pos"},
    {"text": "I do not like this restaurant", "label": "neg"}
]

You can then pass the opened file into the constructor.

>>> with open('train.json', 'r') as fp:
...     cl = NaiveBayesClassifier(fp, format="json")
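If the training data currently lives in a Python list, files in the layouts described above can be produced with the standard library. The following is a minimal sketch and not part of TextBlob: the helper names write_csv and write_json, and the temporary file locations, are hypothetical.

```python
import csv
import json
import os
import tempfile

train = [
    ("I love this sandwich.", "pos"),
    ("This is an amazing place!", "pos"),
    ("I do not like this restaurant", "neg"),
]


def write_csv(rows, path):
    """Write one `text,label` row per example (hypothetical helper)."""
    with open(path, "w", newline="", encoding="utf-8") as fp:
        # csv.writer quotes any text that itself contains a comma.
        csv.writer(fp).writerows(rows)


def write_json(rows, path):
    """Write a list of {"text": ..., "label": ...} objects (hypothetical helper)."""
    records = [{"text": text, "label": label} for text, label in rows]
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(records, fp, indent=2)


# Write both files into a scratch directory.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "train.csv")
json_path = os.path.join(out_dir, "train.json")
write_csv(train, csv_path)
write_json(train, json_path)
```

Either file can then be opened and passed to the NaiveBayesClassifier constructor as shown above, with format="csv" or format="json" respectively.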

Classifying Text

Call the classify(text) method to use the classifier.

>>> cl.classify("This is an amazing library!")
'pos'

You can get the label probability distribution with the prob_classify(text) method.

>>> prob_dist = cl.prob_classify("This one's a doozy.")
>>> prob_dist.max()
'pos'
>>> round(prob_dist.prob("pos"), 2)
0.63
>>> round(prob_dist.prob("neg"), 2)
0.37
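For some intuition about where such a distribution comes from, here is a from-scratch sketch of a Naive Bayes calculation over word-presence features. It is an illustration under simplifying assumptions, not TextBlob's or NLTK's actual implementation: the tokenization, the add-one smoothing, the reduced training set, and the function names are all choices made for this example, so its probabilities will not match the output above.

```python
import math
from collections import Counter, defaultdict

train = [
    ("I love this sandwich.", "pos"),
    ("this is an amazing place!", "pos"),
    ("I do not like this restaurant", "neg"),
    ("my boss is horrible.", "neg"),
]


def tokenize(text):
    # Assumption: split on whitespace, strip punctuation, lowercase.
    return {w.strip(".,!?").lower() for w in text.split()} - {""}


def fit(examples):
    label_counts = Counter(label for _, label in examples)
    # label -> word -> number of documents with that label containing the word
    word_counts = defaultdict(Counter)
    for text, label in examples:
        word_counts[label].update(tokenize(text))
    vocab = set().union(*(tokenize(text) for text, _ in examples))
    return label_counts, word_counts, vocab


def prob_classify(model, text):
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    words = tokenize(text)
    log_scores = {}
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total)  # log prior P(label)
        for word in vocab:
            # Add-one smoothed estimate of P(word present | label).
            p_present = (word_counts[label][word] + 1) / (n_docs + 2)
            score += math.log(p_present if word in words else 1 - p_present)
        log_scores[label] = score
    # Convert log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    unnormalized = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {label: value / z for label, value in unnormalized.items()}


model = fit(train)
dist = prob_classify(model, "I love this place")
best = max(dist, key=dist.get)
```

Here dist plays the role of the probability distribution returned by prob_classify, and best corresponds to what prob_dist.max() reports.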

Classifying TextBlobs

Another way to classify text is to pass a classifier into the constructor of TextBlob and call its classify() method.

>>> from textblob import TextBlob
>>> blob = TextBlob("The beer is good. But the hangover is horrible.", classifier=cl)
>>> blob.classify()
'pos'

The advantage of this approach is that you can also classify each sentence within a TextBlob individually.

>>> for s in blob.sentences:
...     print(s)
...     print(s.classify())
...
The beer is good.
pos
But the hangover is horrible.
neg

Evaluating Classifiers

To compute the accuracy on our test set, use the accuracy(test_data) method.

>>> cl.accuracy(test)
0.8333333333333334
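Accuracy here is the fraction of test examples whose predicted label matches the true label; 0.8333333333333334 is 5/6, that is, five of the six test examples classified correctly. The sketch below shows the calculation in isolation. The accuracy function and the keyword-based toy_classify are hypothetical stand-ins for this example and do not reproduce the trained classifier's predictions.

```python
test = [
    ("the beer was good.", "pos"),
    ("I do not enjoy my job", "neg"),
    ("I ain't feeling dandy today.", "neg"),
    ("I feel amazing!", "pos"),
    ("Gary is a friend of mine.", "pos"),
    ("I can't believe I'm doing this.", "neg"),
]


def accuracy(classify, labeled_examples):
    """Fraction of examples for which classify(text) equals the true label."""
    correct = sum(1 for text, label in labeled_examples if classify(text) == label)
    return correct / len(labeled_examples)


def toy_classify(text):
    # Hypothetical rule: treat any negation ("not" or "n't") as negative.
    return "neg" if ("not" in text or "n't" in text) else "pos"


score = accuracy(toy_classify, test)
```

Any callable that maps a text to a label can be scored this way, which makes it easy to compare a trained classifier against a simple baseline on the same test set.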

Note

You can also pass a file object to the accuracy method. The file can be in any of the formats listed in the Loading Data from Files section.

Use the show_informative_features() method to display a listing of the most informative features.

>>> cl.show_informative_features(5)
Most Informative Features
            contains(my) = True              neg : pos    =      1.7 : 1.0
            contains(an) = False             neg : pos    =      1.6 : 1.0
             contains(I) = True              neg : pos    =      1.4 : 1.0
             contains(I) = False             pos : neg    =      1.4 : 1.0
            contains(my) = False             pos : neg    =      1.3 : 1.0
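Each row of that listing names a feature of the form contains(word) together with the value it took. As a rough illustration of such features, the sketch below builds a word-presence dictionary for one document against a small vocabulary. The function name contains_features and the plain whitespace tokenization are assumptions made for this example; the extractor TextBlob uses by default may tokenize differently.

```python
def contains_features(document, vocabulary):
    """Map each vocabulary word to whether it occurs in the document."""
    # Assumption: tokens are whitespace-separated and case-sensitive.
    tokens = set(document.split())
    return {f"contains({word})": word in tokens for word in sorted(vocabulary)}


vocabulary = {"my", "an", "I"}
features = contains_features("he is my sworn enemy!", vocabulary)
```

For this document, contains(my) is True while contains(an) and contains(I) are False. In the listing above, a ratio such as neg : pos = 1.7 : 1.0 indicates that the feature value is roughly 1.7 times as likely among neg examples as among pos examples.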

Updating Classifiers with New Data

Use the update(new_data) method to update a classifier with new training data.

>>> new_data = [
...     ("She is my best friend.", "pos"),
...     ("I'm happy to have a new friend.", "pos"),
...     ("Stay thirsty, my friend.", "pos"),
...     ("He ain't from around here.", "neg"),
... ]
>>> cl.update(new_data)
True
>>> cl.accuracy(test)
1.0

Next Steps

Be sure to check out the API Reference for the classifiers module.

Want to try different POS taggers or noun phrase chunkers with TextBlobs? Check out the Advanced Usage guide.