How To Implement Sentiment Analysis In Python [Best 5 Tools: TextBlob, Vader, NLTK, BERT, SpaCy]

by Neri Van Otten | Dec 16, 2022 | Data Science, Machine Learning, Natural Language Processing

Python offers several powerful libraries and frameworks for sentiment analysis. Each library is covered below, with code examples at the end. I hope this article helps you choose a library for your application.

Find the sentiment analysis tool in Python that you will love.

What is sentiment analysis?

Sentiment analysis uses natural language processing (NLP) and machine learning techniques to identify and extract ‘feeling’ information from text. Sentiment information can be positive, negative, or neutral and is often used to gauge a text’s overall emotion.

Is sentiment analysis supervised or unsupervised learning?

Sentiment analysis is typically a supervised learning task. This means the machine learning model is trained on a labelled dataset, where the text data has been annotated with the corresponding sentiment (e.g., positive, negative, or neutral). The model can then learn from this labelled data and predict the sentiment of new, unseen data.

In contrast, unsupervised learning is a type of machine learning where the model is not given any labelled data but is left to discover patterns and relationships in the data. Unsupervised learning algorithms are often used for tasks such as clustering, where the goal is to group data points based on shared characteristics rather than predict a specific label.

Sentiment analysis can be challenging because it involves understanding the context and meaning of the text, which requires some level of understanding of language and human emotions. This is why it is typically approached as a supervised learning task, where the model is given examples of labelled text data to learn from.
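
To make the supervised setup concrete, here is a minimal sketch using scikit-learn (a general-purpose machine learning library, not one of the five tools covered below). The tiny labelled dataset is made up for illustration; a real model would need far more data.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# labelled training data: each text is annotated with its sentiment
texts = ["I loved this film", "What a great movie", "Terrible acting", "I hated it"]
labels = ["positive", "positive", "negative", "negative"]

# bag-of-words features fed into a Naive Bayes classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

# predict the sentiment of new, unseen text
print(model.predict(["a great film"]))  # expected: ['positive']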

The best Python libraries for sentiment analysis

1. TextBlob sentiment analysis

TextBlob is a Python library for performing simple, high-level natural language processing tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and more. It is built on top of the Natural Language Toolkit (NLTK) and provides a simple, intuitive interface for working with text data.

One of the key advantages of using TextBlob for sentiment analysis is its simplicity. It provides a high-level, intuitive interface for working with text data, making it easy to perform sentiment analysis even if you are not an expert in natural language processing or machine learning.

2. Vader sentiment analysis

Vader (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon-based technique for analyzing the sentiment of text data. It is a rule-based approach that combines a sentiment lexicon with a set of pre-defined rules to score the sentiment of the text.

Vader was developed specifically for social media data and is designed to identify the sentiment of texts even when they are written in informal, colloquial language. This makes it particularly well-suited for analyzing the sentiment of tweets, comments, and other types of social media data.

The key advantages of using Vader for sentiment analysis are its speed and accuracy. It is designed to be fast and efficient and can provide accurate sentiment analysis even on large datasets.

3. NLTK sentiment analysis

The Natural Language Toolkit (NLTK) is a powerful Python library for working with human language data. It provides many tools and resources for tasks such as tokenization, part-of-speech tagging, stemming, and more.

One of the key advantages of using NLTK for sentiment analysis is its versatility. It provides a wide range of tools and resources for working with human language data, making it a powerful choice for a wide range of natural language processing tasks.

4. BERT sentiment analysis

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a state-of-the-art natural language processing model developed by Google. It is a type of transformer-based model that uses self-attention mechanisms to process and understand text data.

BERT can be used for various natural language processing tasks, including sentiment analysis. This involves using BERT to identify the overall sentiment of a piece of text, whether it is positive, negative, or neutral.

To use BERT for sentiment analysis, you first need to fine-tune a pre-trained BERT model on a labelled sentiment dataset. This can be done with the open-source BERT implementation provided by Google, or you can use a model that has already been fine-tuned and made available online.

Once you have a trained BERT model, you can use it to perform sentiment analysis on your text data. This typically involves feeding the text data into the BERT model and using the model’s outputs to identify the overall sentiment of the text.

The essential advantage of using BERT for sentiment analysis is its accuracy. BERT is a state-of-the-art model and can provide highly accurate sentiment analysis even on challenging text data.

5. SpaCy sentiment analysis

SpaCy is a popular open-source natural language processing library for Python. It provides a fast and efficient platform for performing various natural language processing tasks, including part-of-speech tagging, named entity recognition, and more.

One of the key advantages of using spaCy for sentiment analysis is its speed and efficiency. spaCy is designed to be fast and efficient, making it a good choice for performing sentiment analysis on large datasets.

Examples of how to do sentiment analysis in Python

1. TextBlob

Here is an example of how you can use TextBlob to perform sentiment analysis on a piece of text:

from textblob import TextBlob

# create a TextBlob object from the text
text = "This is a great movie! I really enjoyed it."
blob = TextBlob(text)

# get the sentiment of the text
sentiment = blob.sentiment

print(sentiment)

This will output the sentiment of the text as a named tuple of two values: (polarity, subjectivity). The polarity value is a float that ranges from -1 (most negative) to 1 (most positive), and the subjectivity value is a float that ranges from 0 (most objective) to 1 (most subjective).

You can also use the blob.sentiment.polarity attribute to get just the polarity value:

polarity = blob.sentiment.polarity
print(polarity)

This will output the polarity value as a float. You can then use this value to classify the sentiment of the text as positive, negative, or neutral based on your own thresholds. For example, you could define a threshold of 0.5 and classify text with a polarity greater than 0.5 as positive, text with a polarity less than -0.5 as negative, and text with a polarity between -0.5 and 0.5 as neutral.
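
Using the thresholds above, the classification could look like this:

if polarity > 0.5:
  print("Positive sentiment")
elif polarity < -0.5:
  print("Negative sentiment")
else:
  print("Neutral sentiment")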

Keep in mind that TextBlob’s sentiment analysis is based on a simple algorithm that uses a pre-trained dictionary of positive and negative words to determine the sentiment of a piece of text. It is not as accurate as more sophisticated machine learning approaches, but it is a good starting point for simple sentiment analysis tasks.

2. Vader

Here is an example of how you can use Vader to perform sentiment analysis on a piece of text in Python:

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# download the Vader lexicon (only needed on first use)
nltk.download('vader_lexicon')

# create a SentimentIntensityAnalyzer object
analyzer = SentimentIntensityAnalyzer()

# get the sentiment of the text
text = "This is a great movie! I really enjoyed it."
scores = analyzer.polarity_scores(text)

print(scores)

This will output a dictionary of scores for four different sentiment categories: neg, neu, pos, and compound. The neg and pos scores represent the negative and positive sentiment of the text, respectively, and range from 0 to 1. The neu score represents the neutral sentiment and ranges from 0 to 1. The compound score is a normalized, weighted composite score that ranges from -1 (most negative) to 1 (most positive).

You can use the compound score as an overall measure of the sentiment of the text. For example, you could define a threshold of 0 and classify text with a compound score greater than 0 as positive, text with a compound score less than 0 as negative, and text with a compound score of 0 as neutral.

Here’s an example of how you could use the scores to classify the sentiment of the text:

if scores['compound'] > 0:
  print("Positive sentiment")
elif scores['compound'] < 0:
  print("Negative sentiment")
else:
  print("Neutral sentiment")

Keep in mind that lexicon-based approaches like Vader rely on a pre-defined set of words and their associated sentiment scores. These scores are based on human annotations and are not always accurate, especially for text that contains unusual or rare words. However, lexicon-based approaches are generally fast and easy to use, making them a good choice for many sentiment analysis tasks.

3. NLTK

Here is an example of how you can use NLTK to perform sentiment analysis on a piece of text in Python:

from nltk.classify import NaiveBayesClassifier

# create a list of labeled training data
# each element is a tuple of the form (dict of features, label)
training_data = [
  ({"This": True, "is": True, "a": True, "great": True, "movie": True}, "positive"),
  ({"I": True, "really": True, "enjoyed": True, "it": True}, "positive"),
  ({"This": True, "movie": True, "was": True, "terrible": True}, "negative"),
  ({"I": True, "hated": True, "it": True}, "negative")
]

# create a classifier
classifier = NaiveBayesClassifier.train(training_data)

# create a dict of unigram features for the text to classify,
# stripping punctuation so tokens like "movie!" match the training features
text = "This is a great movie! I really enjoyed it."
features = {word.strip(".,!?"): True for word in text.split()}

# classify the text
sentiment = classifier.classify(features)
print(sentiment)

This will output the sentiment of the text as either “positive” or “negative”.

In this example, we are using a simple Naive Bayes classifier to classify the sentiment of the text. The classifier is trained on a small dataset of labelled text data, where each piece of text is represented by a dict of unigram features (i.e., a dict where the keys are individual words in the text and the values are always True).

We build the feature dict for the new text with a simple comprehension that maps each word to True, matching the format of the training data, and then pass the resulting dict to the classify() method of the classifier to get the predicted sentiment.
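
NLTK also ships a helper for building unigram features: extract_unigram_feats() in nltk.sentiment.util. Note that it takes both the tokenised document and a fixed list of unigrams to look for, and it produces feature keys of the form contains(word), so the training data would need to use that same key format:

from nltk.sentiment.util import extract_unigram_feats

# check the document for a fixed vocabulary of unigrams
unigrams = ["great", "terrible"]
features = extract_unigram_feats("this movie was great".split(), unigrams)
print(features)  # {'contains(great)': True, 'contains(terrible)': False}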

4. BERT

Here is an example of how you can use BERT to perform sentiment analysis on a piece of text in Python:

import torch
from transformers import BertTokenizer, BertForSequenceClassification

# load the BERT model and tokenizer
# note: the classification head of 'bert-base-uncased' is randomly initialised,
# so you would fine-tune it first, or load a checkpoint that has already
# been fine-tuned for sentiment analysis
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model.eval()

# preprocess the text
text = "This is a great movie! I really enjoyed it."
input_ids = torch.tensor([tokenizer.encode(text, add_special_tokens=True)])

# classify the text (in transformers v4+, the model returns an output object
# whose .logits attribute holds the classification scores)
with torch.no_grad():
  logits = model(input_ids).logits

# get the predicted label
label_idx = logits.argmax().item()

# map the label index to a label
label_map = {0: "negative", 1: "positive"}
sentiment = label_map[label_idx]
print(sentiment)

This will output the sentiment of the text as either “positive” or “negative”.

In this example, we load a pre-trained BERT model with a sequence-classification head using the from_pretrained() method. Note that the classification head of bert-base-uncased is randomly initialised, so its predictions only become meaningful after the model has been fine-tuned on labelled sentiment data (or if you load a checkpoint that has already been fine-tuned). We then preprocess the text by encoding it with the tokenizer and adding the special tokens that BERT requires.

Next, we pass the encoded input to the model, take the logits from its output, and use the argmax() method to get the index of the predicted label. Finally, we map the label index to a label using a dictionary and output the sentiment of the text.

Remember that this is a basic example of using BERT for sentiment analysis. There are many other factors to consider, such as handling variable-length input, handling multiple labels, and fine-tuning the model on specific datasets. BERT is a complex and powerful model, and using it effectively for sentiment analysis can require a good understanding of machine learning and natural language processing concepts.
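
If you just want quick results, the Hugging Face transformers library also provides a higher-level pipeline API that downloads a model already fine-tuned for sentiment analysis and handles the tokenization and label mapping for you:

from transformers import pipeline

# downloads a default model fine-tuned for sentiment analysis
classifier = pipeline("sentiment-analysis")

result = classifier("This is a great movie! I really enjoyed it.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]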

5. SpaCy

Out of the box, spaCy's standard pipelines such as en_core_web_sm do not include a sentiment component (Token.sentiment simply defaults to 0.0). A common approach is to add one via the spacytextblob extension (installed separately with pip install spacytextblob), which plugs TextBlob's sentiment scores into the spaCy pipeline. Here is an example:

import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

# load a spaCy pipeline and add the sentiment component
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacytextblob")

text = "I love chocolate ice cream!"
doc = nlp(text)

# polarity ranges from -1 (most negative) to 1 (most positive)
# (older spacytextblob versions expose this as doc._.polarity instead)
print(doc._.blob.polarity)

This calculates the polarity of the text using TextBlob's lexicon under the hood, while keeping the rest of spaCy's pipeline (tokenization, part-of-speech tagging, named entity recognition) available on the same Doc object. The polarity score is between -1 and 1, with negative values indicating negative sentiment and positive values indicating positive sentiment.

In this example, the text "I love chocolate ice cream!" receives a positive polarity score, driven mainly by the word "love". If you need higher accuracy on your own domain, spaCy also lets you train a custom text classification component (a TextCategorizer) on labelled data, as sketched below.
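
For completeness, here is a minimal sketch of training such a TextCategorizer from scratch with the spaCy v3 API. The two training sentences are made up for illustration; a real component would need a much larger labelled dataset and proper evaluation.

import spacy
from spacy.training import Example

# start from a blank English pipeline and add a text classifier
nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat")
textcat.add_label("positive")
textcat.add_label("negative")

# toy labelled data: category scores of 1.0/0.0 mark the gold label
train_data = [
    ("I really enjoyed it", {"cats": {"positive": 1.0, "negative": 0.0}}),
    ("This movie was terrible", {"cats": {"positive": 0.0, "negative": 1.0}}),
]
examples = [Example.from_dict(nlp.make_doc(t), ann) for t, ann in train_data]

# initialise the model and run a few training passes
nlp.initialize(get_examples=lambda: examples)
for _ in range(20):
    nlp.update(examples)

# doc.cats holds a score per label
doc = nlp("a terrible film")
print(doc.cats)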

Key Takeaways

Many great Python libraries do sentiment analysis for you. Although they all have their differences, there isn't one clear winner. Which library you choose will therefore probably depend on what libraries you are already using for pre-processing and what other functionality you are implementing.

Any of the libraries discussed here are a good choice, so you can’t go wrong, but we hope this article helps you decide which is suitable for your application.

Let us know in the comments which one you are going with.

About the Author

Neri Van Otten

Neri Van Otten is the founder of Spot Intelligence, a machine learning engineer with over 12 years of experience specialising in Natural Language Processing (NLP) and deep learning innovation. Dedicated to making your projects succeed.

