How To Implement Sentiment Analysis In Python [Best 5 Tools: TextBlob, Vader, NLTK, BERT, SpaCy]

by Neri Van Otten | Dec 16, 2022 | Data Science, Machine Learning, Natural Language Processing

Several powerful Python libraries and frameworks can be used for sentiment analysis. Each is introduced below, and code examples for all of them follow at the end. I hope this article helps you choose a library for your application.

Find the sentiment analysis tool in Python that you will love.

What is sentiment analysis?

Sentiment analysis uses natural language processing (NLP) and machine learning techniques to identify and extract ‘feeling’ information from text. Sentiment information can be positive, negative, or neutral and is often used to gauge a text’s overall emotion.

Is sentiment analysis supervised or unsupervised learning?

Sentiment analysis is typically a supervised learning task. This means the machine learning model is trained on a labelled dataset, where the text data has been annotated with the corresponding sentiment (e.g., positive, negative, or neutral). The model can then learn from this labelled data and predict the sentiment of new, unseen data.
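For illustration, a labelled dataset is simply a collection of texts paired with their annotated sentiment (the examples below are made up):

# a tiny, made-up labelled dataset: each example pairs a text
# with its annotated sentiment label
labelled_data = [
    ("I loved this film, the acting was superb", "positive"),
    ("The plot was dull and far too long", "negative"),
    ("The film was released in December 2022", "neutral"),
]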

In contrast, unsupervised learning is a type of machine learning where the model is not given any labelled data but is left to discover patterns and relationships in the data. Unsupervised learning algorithms are often used for tasks such as clustering, where the goal is to group data points based on shared characteristics rather than predict a specific label.

Sentiment analysis can be challenging because it involves understanding the context and meaning of the text, which requires some level of understanding of language and human emotions. This is why it is typically approached as a supervised learning task, where the model is given examples of labelled text data to learn from.

The best Python libraries for sentiment analysis

1. TextBlob sentiment analysis

TextBlob is a Python library for performing simple, high-level natural language processing tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and more. It is built on top of the Natural Language Toolkit (NLTK) and provides a simple, intuitive interface for working with text data.

One of the key advantages of using TextBlob for sentiment analysis is its simplicity. It provides a high-level, intuitive interface for working with text data, making it easy to perform sentiment analysis even if you are not an expert in natural language processing or machine learning.

2. Vader sentiment analysis

Vader (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon-based technique for analyzing the sentiment of text data. It is a rule-based sentiment analyzer that uses a lexicon of sentiment-scored words together with a set of pre-defined rules to identify the sentiment of a text.

Vader was developed specifically for social media data and is designed to identify the sentiment of texts even when they are written in informal, colloquial language. This makes it particularly well-suited for analyzing the sentiment of tweets, comments, and other types of social media data.

The key advantages of using Vader for sentiment analysis are its speed and accuracy. It is designed to be fast and efficient and can provide accurate sentiment analysis even on large datasets.

3. NLTK sentiment analysis

The Natural Language Toolkit (NLTK) is a powerful Python library for working with human language data. It provides many tools and resources for tasks such as tokenization, part-of-speech tagging, stemming, and more.

One of the key advantages of using NLTK for sentiment analysis is its versatility. It provides a wide range of tools and resources for working with human language data, making it a powerful choice for many different natural language processing tasks.

4. BERT sentiment analysis

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a state-of-the-art natural language processing model developed by Google. It is a type of transformer-based model that uses self-attention mechanisms to process and understand text data.

BERT can be used for various natural language processing tasks, including sentiment analysis. This involves using BERT to identify the overall sentiment of a piece of text, whether it is positive, negative, or neutral.
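For a quick start, the Hugging Face transformers library wraps this in a high-level pipeline. Here is a minimal sketch; note that the default checkpoint the pipeline downloads is chosen by the library and may change between versions:

from transformers import pipeline

# downloads a default sentiment-analysis model on first use
classifier = pipeline("sentiment-analysis")

result = classifier("This is a great movie! I really enjoyed it.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99}]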

To use BERT for sentiment analysis, you first need to fine-tune a pre-trained BERT model on a dataset of labelled text. You can do this with the open-source BERT implementation provided by Google or with a library such as Hugging Face transformers; alternatively, you can download a model that has already been fine-tuned for sentiment analysis.

Once you have a trained BERT model, you can use it to perform sentiment analysis on your text data. This typically involves feeding the text data into the BERT model and using the model’s outputs to identify the overall sentiment of the text.
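Here is a minimal sketch of a single fine-tuning step with the Hugging Face transformers library; the tiny dataset, label mapping, and learning rate are placeholder assumptions, not a recommended training setup:

import torch
from transformers import BertTokenizer, BertForSequenceClassification

# load a base BERT model with a 2-class classification head
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# a tiny, made-up labelled dataset (0 = negative, 1 = positive)
texts = ["I really enjoyed it.", "This movie was terrible."]
labels = torch.tensor([1, 0])

# tokenize the batch, padding shorter texts so they stack into one tensor
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# one training step: the model returns the loss when labels are passed
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(outputs.loss.item())  # the training loss for this batch

In practice, you would loop over many batches and epochs and evaluate on held-out data.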

The essential advantage of using BERT for sentiment analysis is its accuracy. BERT is a state-of-the-art model and can provide highly accurate sentiment analysis even on challenging text data.

5. SpaCy sentiment analysis

spaCy is a popular open-source natural language processing library for Python. It provides a fast and efficient platform for performing various natural language processing tasks, including part-of-speech tagging, named entity recognition, and more.

One of the key advantages of using spaCy for sentiment analysis is its speed and efficiency. spaCy is designed to be fast and efficient, making it a good choice for performing sentiment analysis on large datasets.

Examples of how to do sentiment analysis in Python

1. TextBlob

Here is an example of how you can use TextBlob to perform sentiment analysis on a piece of text:

from textblob import TextBlob

# create a TextBlob object from the text
text = "This is a great movie! I really enjoyed it."
blob = TextBlob(text)

# get the sentiment of the text
sentiment = blob.sentiment

print(sentiment)

This will output the sentiment of the text as a tuple of two values: (polarity, subjectivity). The polarity value is a float that ranges from -1 (most negative) to 1 (most positive), and the subjectivity value is a float that ranges from 0 (most objective) to 1 (most subjective).

You can also use the sentiment.polarity attribute on the TextBlob object to get just the polarity value:

polarity = blob.sentiment.polarity
print(polarity)

This will output the polarity value as a float. You can then use this value to classify the sentiment of the text as positive, negative, or neutral based on your own thresholds. For example, you could define a threshold of 0.5 and classify text with a polarity greater than 0.5 as positive, text with a polarity less than -0.5 as negative, and text with a polarity between -0.5 and 0.5 as neutral.
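In code, that ±0.5 threshold might look like this:

if polarity > 0.5:
  print("Positive sentiment")
elif polarity < -0.5:
  print("Negative sentiment")
else:
  print("Neutral sentiment")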

Keep in mind that TextBlob’s sentiment analysis is based on a simple algorithm that uses a pre-trained dictionary of positive and negative words to determine the sentiment of a piece of text. It is not as accurate as more sophisticated machine learning approaches, but it is a good starting point for simple sentiment analysis tasks.

2. Vader

Here is an example of how you can use Vader to perform sentiment analysis on a piece of text in Python:

from nltk.sentiment.vader import SentimentIntensityAnalyzer

# create a SentimentIntensityAnalyzer object
# (the Vader lexicon must be downloaded once: nltk.download('vader_lexicon'))
analyzer = SentimentIntensityAnalyzer()

# get the sentiment of the text
text = "This is a great movie! I really enjoyed it."
scores = analyzer.polarity_scores(text)

print(scores)

This will output a dictionary of scores for four different sentiment categories: neg, neu, pos, and compound. The neg and pos scores represent the negative and positive sentiment of the text, respectively, and range from 0 to 1. The neu score represents the neutral sentiment and ranges from 0 to 1. The compound score is a normalized, weighted composite score that ranges from -1 (most negative) to 1 (most positive).

You can use the compound score as an overall measure of the sentiment of the text. For example, you could define a threshold of 0 and classify text with a compound score greater than 0 as positive, text with a compound score less than 0 as negative, and text with a compound score of 0 as neutral.

Here’s an example of how you could use the scores to classify the sentiment of the text:

if scores['compound'] > 0:
  print("Positive sentiment")
elif scores['compound'] < 0:
  print("Negative sentiment")
else:
  print("Neutral sentiment")

Keep in mind that lexicon-based approaches like Vader rely on a pre-defined set of words and their associated sentiment scores. These scores are based on human annotations and are not always accurate, especially for text that contains unusual or rare words. However, lexicon-based approaches are generally fast and easy to use, making them a good choice for many sentiment analysis tasks.

3. NLTK

Here is an example of how you can use NLTK to perform sentiment analysis on a piece of text in Python:

from nltk.classify import NaiveBayesClassifier

# create a list of labelled training data
# each element is a tuple of the form (dict of features, label)
training_data = [
  ({"This": True, "is": True, "a": True, "great": True, "movie": True}, "positive"),
  ({"I": True, "really": True, "enjoyed": True, "it": True}, "positive"),
  ({"This": True, "movie": True, "was": True, "terrible": True}, "negative"),
  ({"I": True, "hated": True, "it": True}, "negative")
]

# train a Naive Bayes classifier on the labelled data
classifier = NaiveBayesClassifier.train(training_data)

# build the same word-presence features for the text to classify
# (a real system would also strip punctuation and lowercase the words)
text = "This is a great movie! I really enjoyed it."
features = {word: True for word in text.split()}

# classify the text
sentiment = classifier.classify(features)
print(sentiment)

This will output the sentiment of the text as either “positive” or “negative”.

In this example, we are using a simple Naive Bayes classifier to classify the sentiment of the text. The classifier is trained on a small dataset of labelled text data, where each piece of text is represented by a dict of unigram features (i.e., a dict where the keys are individual words in the text and the values are always True).

To classify new text, we build the same kind of feature dict, mapping each word in the text to True, and pass it to the classifier's classify() method to get the predicted sentiment. (NLTK does provide a helper, nltk.sentiment.util.extract_unigram_feats(), but it expects a pre-computed list of unigrams and produces differently named features, so a plain dict comprehension keeps this example consistent with the training data.)
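You can also inspect which words the trained classifier found most informative. With a toy training set like this one the output is not very meaningful, but the method is useful on real data:

# show the features that best separate "positive" from "negative"
classifier.show_most_informative_features(5)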

4. BERT

Here is an example of how you can use BERT to perform sentiment analysis on a piece of text in Python:

import torch
from transformers import BertTokenizer, BertForSequenceClassification

# load the BERT model and tokenizer
# note: bert-base-uncased adds a randomly initialised classification head,
# so for meaningful predictions load a checkpoint fine-tuned for sentiment
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# preprocess the text
text = "This is a great movie! I really enjoyed it."
input_ids = torch.tensor([tokenizer.encode(text, add_special_tokens=True)])

# classify the text
with torch.no_grad():
  logits = model(input_ids).logits

# get the predicted label
label_idx = logits.argmax().item()

# map the label index to a label
# (the index-to-label mapping depends on how the model was fine-tuned)
label_map = {0: "negative", 1: "positive"}
sentiment = label_map[label_idx]
print(sentiment)

This will output the sentiment of the text as either “positive” or “negative”.

In this example, we load a BERT model with a sequence-classification head using the from_pretrained() method. Note that bert-base-uncased by itself is not fine-tuned for sentiment, so the classification head starts with random weights; for meaningful predictions, you would load a checkpoint that has been fine-tuned for sentiment analysis. We then preprocess the text by encoding it using the tokenizer and adding the special tokens that BERT requires.

Next, we pass the encoded input to the model and use the argmax() method to get the index of the predicted label. Finally, we map the label index to a label using a dictionary and output the sentiment of the text.

Remember that this is a basic example of using BERT for sentiment analysis. There are many other factors to consider, such as handling variable-length input, handling multiple labels, and fine-tuning the model on specific datasets. BERT is a complex and powerful model, and using it effectively for sentiment analysis can require a good understanding of machine learning and natural language processing concepts.
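For instance, variable-length input is usually handled by letting the tokenizer pad and truncate a whole batch at once. Here is a minimal sketch using the same model and tokenizer as above (the second example text is made up):

texts = [
    "This is a great movie! I really enjoyed it.",
    "Terrible. I walked out halfway through.",
]

# pad to the longest text in the batch and truncate anything over
# BERT's maximum sequence length
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**batch).logits

# one predicted label index per text in the batch
print(logits.argmax(dim=-1).tolist())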

5. SpaCy

spaCy does not ship with a built-in sentiment model, so a common approach is to add the spacytextblob pipeline component (pip install spacytextblob), which wraps TextBlob's sentiment analysis. Here is an example of how you can use it to perform sentiment analysis on a given text:

import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

# requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# register the spacytextblob component on the pipeline
nlp.add_pipe("spacytextblob")

text = "I love chocolate ice cream!"
doc = nlp(text)

# polarity ranges from -1 (most negative) to 1 (most positive);
# note: older spacytextblob versions expose this as doc._.polarity
print(doc._.blob.polarity)

This example attaches TextBlob's sentiment scores to the spaCy Doc object as custom extension attributes. The polarity score is between -1 and 1, with negative values indicating negative sentiment and positive values indicating positive sentiment.

In this example, the text “I love chocolate ice cream!” contains the strongly positive word “love”, so the output will be a positive polarity value.

Key Takeaways

Many great Python libraries do sentiment analysis for you. They all have their differences, but there is no single clear winner. As a result, which library you choose probably depends on what libraries you are already using for pre-processing and, therefore, what other functionality you are implementing.

Any of the libraries discussed here are a good choice, so you can’t go wrong, but we hope this article helps you decide which is suitable for your application.

Let us know in the comments which one you are going with.

About the Author

Neri Van Otten

Neri Van Otten is the founder of Spot Intelligence, a machine learning engineer with over 12 years of experience specialising in Natural Language Processing (NLP) and deep learning innovation. Dedicated to making your projects succeed.
