This is a complete guide to utilising NLTK to build an entire preprocessing pipeline. Read through the individual components so you know how to start building your own pipeline. What is an...

In this guide, we cover how to start with the bag-of-words technique in Python. We first cover what a bag-of-words approach is and provide an example. We then cover the advantages and disadvantages...
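As a rough illustration of the bag-of-words idea (a sketch, not the code from the guide itself), the example below builds a sorted vocabulary from a small list of documents and turns each document into a vector of word counts. The letters-only tokenizer and the `bag_of_words` helper are assumptions made for brevity.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase the text and keep only runs of letters
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    # The vocabulary is every distinct word across all documents, sorted for a stable order
    vocab = sorted({word for doc in docs for word in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        # One count per vocabulary word; the order of words in the document is discarded
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


vocab, vectors = bag_of_words(["The cat sat", "The cat sat on the mat"])
print(vocab)    # ['cat', 'mat', 'on', 'sat', 'the']
print(vectors)  # [[1, 0, 0, 1, 1], [1, 1, 1, 1, 2]]
```

Because only counts are kept, "dog bites man" and "man bites dog" end up with identical vectors, which is one of the best-known drawbacks of the approach.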
Text classification is an important natural language processing (NLP) technique that turns unstructured data into structured data. Many different algorithms can do this, and so...
Text similarity is a useful natural language processing (NLP) tool: it lets you find similar pieces of text and has many real-world use cases. This article discusses text similarity, its...
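One common way to score text similarity is cosine similarity between word-count vectors. The sketch below is a minimal version under simplifying assumptions (regex tokenization, raw counts, no stop-word removal); the `cosine_similarity` helper is illustrative, and the full article may rely on other measures or on embeddings.

```python
import math
import re
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    # Word-count vectors for each text (lowercased, letters only)
    ca = Counter(re.findall(r"[a-z]+", a.lower()))
    cb = Counter(re.findall(r"[a-z]+", b.lower()))
    # The dot product only involves words the two texts share
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an empty text as dissimilar to everything
    return dot / (norm_a * norm_b)


print(cosine_similarity("the cat", "the dog"))  # about 0.5: one shared word out of two in each text
```

Scores range from 0 (no shared words) to 1 (identical word distributions, regardless of word order).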
What is text generation in NLP? Text generation is a subfield of natural language processing (NLP) that deals with generating text automatically. It has a wide range of applications, including...
This guide covers how to translate text in Python. Machine translation is a prominent natural language processing (NLP) application, but it is far from straightforward. We start by covering what is...
What is sentence embedding? Sentence embedding is a technique for representing a natural language sentence as a fixed-length numerical vector. The goal is to encode the semantic meaning and content...
Several powerful Python libraries and frameworks can be used for sentiment analysis, and each is covered below. Code examples for the various libraries are given at the...
What is topic modelling? Topic modelling is a technique used in natural language processing (NLP) to automatically identify and group similar words or phrases in a text. This lets us figure out the...
What is stemming? Stemming is the process of reducing a word to its base or root form. For example, the stem of the word "running" is "run," and the stem of the word "swimming" is "swim." Stemming...
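To make the idea concrete, here is a deliberately tiny suffix-stripping stemmer. It is a toy written for illustration (the `toy_stem` name and its three suffix rules are assumptions), loosely modelled on one clean-up rule from the Porter algorithm, which removes a doubled final consonant after stripping "-ing" or "-ed". NLTK's real stemmers, such as `PorterStemmer`, apply many more rules.

```python
def toy_stem(word: str) -> str:
    """Toy stemmer: strip one common suffix, then tidy a doubled final consonant."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Only strip if at least three letters remain, so "sing" is left alone
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            # Porter-style clean-up after -ing/-ed: "runn" -> "run", but keep "fall" and "buzz"
            if (
                suffix in ("ing", "ed")
                and len(word) >= 2
                and word[-1] == word[-2]
                and word[-1] not in "aeiouylsz"
            ):
                word = word[:-1]
            break
    return word


for w in ["running", "swimming", "jumps", "falling"]:
    print(w, "->", toy_stem(w))
# running -> run, swimming -> swim, jumps -> jump, falling -> fall
```

Rules like these are fast but blind to meaning, so a real stemmer can still produce stems that are not dictionary words.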
What is keyword extraction? Keyword extraction is the task of identifying the most important words and phrases in a piece of text. These keywords can be used to summarise the content of the text. A...
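The simplest baseline for keyword extraction is to count the content words in a text and keep the most frequent ones. The sketch below assumes a tiny hand-written stop-word list and a hypothetical `extract_keywords` helper; more sophisticated approaches, such as TF-IDF weighting, usually give better results.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not a standard one
STOP_WORDS = {"a", "an", "and", "the", "in", "of", "to", "is"}


def extract_keywords(text: str, k: int = 3) -> list[str]:
    words = re.findall(r"[a-z]+", text.lower())
    # Count only words that are not stop words
    counts = Counter(w for w in words if w not in STOP_WORDS)
    # most_common returns (word, count) pairs, highest count first
    return [word for word, _ in counts.most_common(k)]


print(extract_keywords("The cat and the dog chased the cat", k=1))  # ['cat']
```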
What is a self-learning system? A self-learning system is a type of artificial intelligence (AI) system that is able to improve its performance over time. In essence, it can do this without the need...
What is stop word removal? Stop words are commonly used words that carry very little meaning, such as "a," "an," "the," or "in." Stop words are typically excluded from natural language processing...
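Stop word removal itself is just a filter over tokens. The sketch below uses a small hand-picked stop-word set, which is an assumption for illustration; NLTK ships much fuller lists through its `stopwords` corpus (available after `nltk.download("stopwords")`).

```python
# Assumption: a small hand-picked stop-word set for illustration only
STOP_WORDS = {"a", "an", "the", "in", "is", "on", "of", "and"}


def remove_stop_words(tokens: list[str]) -> list[str]:
    # Compare case-insensitively so "The" is removed as well as "the"
    return [t for t in tokens if t.lower() not in STOP_WORDS]


print(remove_stop_words(["The", "cat", "is", "in", "the", "garden"]))
# ['cat', 'garden']
```

Using a set keeps each membership check constant-time, which matters when filtering large corpora.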
Lemmatization is the conversion of a word to its base form or lemma. This differs from stemming, which takes a word down to its root form by removing its prefixes and suffixes. Lemmatization, on the...
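The key difference from stemming is that lemmatization relies on a dictionary of word forms, so it can map "better" to "good", which no amount of affix stripping would produce. The sketch below fakes that dictionary with a hypothetical five-entry lexicon; a real lemmatizer, such as NLTK's `WordNetLemmatizer`, consults a full lexical database and usually needs the word's part of speech.

```python
# Hypothetical mini-lexicon mapping inflected forms to their lemmas
LEMMAS = {
    "ran": "run",
    "running": "run",
    "better": "good",
    "mice": "mouse",
    "was": "be",
}


def lemmatize(word: str) -> str:
    # Look the word up; fall back to the (lowercased) word when it is not in the lexicon
    return LEMMAS.get(word.lower(), word.lower())


print([lemmatize(w) for w in ["Mice", "ran", "better", "table"]])
# ['mouse', 'run', 'good', 'table']
```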
Tokenization is a process in natural language processing (NLP) where a piece of text is split into smaller units called tokens. This is important for many NLP tasks because it lets the model...
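A minimal word-level tokenizer can be written with a single regular expression that treats each run of word characters, and each punctuation mark, as a token. This `tokenize` helper is an illustrative sketch rather than NLTK's tokenizer; NLTK's `word_tokenize` (which needs its tokenizer models downloaded first) handles cases such as contractions more carefully, splitting "doesn't" into "does" and "n't".

```python
import re


def tokenize(text: str) -> list[str]:
    # A token is either a run of word characters or a single punctuation mark;
    # whitespace only separates tokens and is discarded
    return re.findall(r"\w+|[^\w\s]", text)


print(tokenize("Tokenization splits text, doesn't it?"))
# ['Tokenization', 'splits', 'text', ',', 'doesn', "'", 't', 'it', '?']
```

The awkward split of "doesn't" shows why production tokenizers add language-specific rules on top of simple patterns like this one.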
What is Named Entity Recognition (NER)? Named entity recognition (NER) is a part of natural language processing (NLP) that involves finding and classifying named entities in text. Named entities are...
Text summarization is so prominent in natural language processing (NLP) that it made our top ten list of NLP techniques to know. NLP text summarization has a sizable...
In natural language processing (NLP), word embedding describes how words are represented for text analysis. Typically, this representation takes the form of a real-valued vector that...