The Curse Of Dimensionality, When It Occurs And How To Overcome It

by | Nov 29, 2022 | Artificial Intelligence, Data Science, Machine Learning, Natural Language Processing

What is the curse of dimensionality?

The “curse of dimensionality” refers to a collection of problems that arise when working with high-dimensional data. The number of attributes or features in a dataset is called its dimension. High-dimensional data refers to a dataset with many features, typically on the order of 100 or more. The problem with high-dimensional data is that it is hard to draw correct conclusions from it: as the number of dimensions increases, the error in the data grows, and it becomes easier to mistake noise for real correlations.

A typical example of high-dimensional data is text data. When converting text to a numerical representation (or a vector), we give each word a number; see our article on tf-idf for how to do this. As a result, every word becomes a feature, and we quickly end up with well over one hundred features, so we are dealing with high-dimensional data. We therefore need to be aware of the curse of dimensionality before further processing the data or making any decisions based on it.
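
To see how quickly text becomes high-dimensional, here is a minimal sketch using scikit-learn’s TfidfVectorizer (the toy documents are purely illustrative): every distinct word in the vocabulary turns into its own feature column.

```python
# Minimal sketch: vectorising a tiny corpus with tf-idf.
# Every distinct word becomes a feature, so dimensionality grows
# with the vocabulary, not with the number of documents.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the curse of dimensionality affects text data",
    "every word in the vocabulary becomes a feature",
    "high dimensional data is hard to analyse",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # sparse document-term matrix

# (number_of_documents, number_of_distinct_words);
# a real corpus easily reaches tens of thousands of features
print(X.shape)
```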

High-dimensional data typically has more than a hundred features.

The problem with high-dimensional data

High-dimensional data presents several challenges when analyzing or visualizing the data to find patterns and develop machine learning models.

The “curse of dimensionality” says that:

The error grows with the number of features.

In practice, this means that machine learning algorithms for high-dimensional data are more challenging to design, because genuine patterns are hard to distinguish from noise in the data.

These algorithms also frequently have running times that grow exponentially with the number of dimensions.

What domains are affected by the curse of dimensionality?

Many domains are directly affected by the curse of dimensionality: any field that works with data containing many attributes faces this issue.

Natural Language Processing (NLP)

When working with textual data, we often turn text into vectors to obtain numerical input that can be passed to a machine learning model. Unfortunately, turning text into numbers results in sparse datasets with complicated patterns, which leads to the “curse of dimensionality”, an issue in most NLP solutions. To combat this, we often spend more time on feature engineering to reduce the number of features, or we collect more data to increase the size of the dataset.

Anomaly Detection

Anomaly detection is the task of finding unexpected elements or events in a dataset. In high-dimensional data, anomalies frequently display numerous attributes that are unrelated to their actual nature, which makes them harder to isolate.

For example, network traffic is monitored for threats and unusual activity in cyber security. But with so much activity originating from so many different sources, it is hard to distinguish “normal” activity from a threat to the system.

Machine Learning

To maintain the same level of performance in a machine learning model, a slight increase in dimensionality requires a significant increase in the volume of training data. The opposite is also true: if we can reduce the number of features in our dataset, we need much less data to train our models. So when working on feature selection, it is crucial to keep the number of features low enough that it does not lead to the curse of dimensionality.

How to combat the curse of dimensionality?

Dimensionality reduction allows us to cut the number of features and, therefore, to counteract the curse of dimensionality. It transforms a high-dimensional space into a lower-dimensional space while preserving its essential properties, reducing the number of input variables in a dataset. Removing the extra variables makes the data easier for analysts to work with and helps algorithms produce faster and better results.

Dimensionality reduction selects which features to keep and which to discard.

Dimensionality reduction algorithms broadly fall into two categories: “feature selection” and “feature extraction” techniques.

Feature selection techniques

In feature selection techniques, the attributes are tested to determine their value before being chosen or rejected. The most frequently used feature selection methods are discussed below.

Low Variance Filter

This method compares the variance of each feature’s distribution in the dataset and disregards attributes with very low variance. Attributes with very little variability are assumed to be nearly constant and therefore do not improve the model’s predictive power.
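
As a minimal sketch, scikit-learn’s VarianceThreshold implements this kind of filter; the threshold value and the toy data below are assumptions to be tuned per dataset.

```python
# Low variance filter: drop features whose variance is below a threshold.
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([
    [0.0, 2.5, 1.0],
    [0.0, 1.5, 3.0],
    [0.0, 2.0, 2.0],
])  # the first column is constant and carries no information

selector = VarianceThreshold(threshold=0.01)
X_reduced = selector.fit_transform(X)

print(X_reduced.shape)         # (3, 2): the constant feature is dropped
print(selector.get_support())  # [False  True  True]
```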

High Correlation Filter

This method finds the pair-wise correlation between attributes. In pairs with a very high correlation, one feature is dropped and the other is kept, since the retained feature still captures the variation of the eliminated attribute.
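
A minimal sketch of this filter with pandas is shown below; the 0.9 cut-off and the synthetic features are assumptions, not fixed rules.

```python
# High correlation filter: drop one feature from each highly correlated pair.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.01, size=200),  # almost a copy of "a"
    "c": rng.normal(size=200),                      # independent feature
})

corr = df.corr().abs()
# Keep only the upper triangle so each pair is considered once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]

print(to_drop)  # ['b']
df_reduced = df.drop(columns=to_drop)
```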

Multicollinearity

Sometimes a high correlation may not be found for any pair of attributes, but if each attribute is regressed as a function of the others, we may see that the variability of some features is entirely captured by the others. This phenomenon is called multicollinearity, and the variance inflation factor (VIF) is widely used to identify it. Attributes with high VIF values, generally greater than 10, are eliminated.
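
A minimal sketch of a VIF check using statsmodels is given below; the synthetic features are illustrative, and the VIF > 10 rule of thumb is the one mentioned above.

```python
# Detect multicollinearity with the variance inflation factor (VIF).
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
df = pd.DataFrame({
    "x1": x1,
    "x2": x2,
    "x3": x1 + x2 + rng.normal(scale=0.05, size=200),  # linear mix of x1 and x2
})

X = add_constant(df)  # the VIF calculation expects an intercept column
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif)  # x1, x2 and x3 all show very high VIFs; dropping x3 resolves it
```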

Feature Ranking

The attributes can be ranked according to their significance or contribution to the model’s predictive power using decision tree models such as CART. In high-dimensional data, some of the lower-ranked variables can then be removed to reduce the dimensions.
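
As a minimal sketch, a CART-style decision tree from scikit-learn can produce such a ranking via its feature importances; the synthetic data and tree settings below are illustrative only.

```python
# Rank features by their contribution to a decision tree's predictions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))
# Only features 0 and 3 drive the target; the rest are noise.
y = 3 * X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.1, size=500)

tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)

ranking = np.argsort(tree.feature_importances_)[::-1]
print(tree.feature_importances_)  # importance score per feature
print(ranking)                    # feature 0 (the strongest signal) comes first
```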

Feature Extraction Techniques 

In feature extraction techniques, the high-dimensional attributes are combined into low-dimensional components (PCA or ICA) or factored into low-dimensional latent factors (FA).

Principal Component Analysis (PCA)

Principal component analysis (PCA) is a dimensionality-reduction technique that transforms highly correlated, high-dimensional data into a set of uncorrelated, lower-dimensional components known as principal components. These lower-dimensional principal components capture most of the information in the high-dimensional dataset. After n-dimensional data is transformed into n principal components, a subset of these components is chosen based on the percentage of variance in the data we intend to capture. As a straightforward example, 10-dimensional data is transformed into 10 principal components, but only 3 of them are required to account for 90% of the variance, so the 10-dimensional dataset can be condensed into just 3 dimensions.
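
A minimal sketch with scikit-learn’s PCA is shown below, keeping enough components to explain 90% of the variance as in the example above; the synthetic 10-dimensional data is an assumption.

```python
# PCA: keep the smallest number of components explaining 90% of the variance.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
latent = rng.normal(size=(300, 3))   # 3 underlying signals
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + rng.normal(scale=0.05, size=(300, 10))  # 10 observed features

pca = PCA(n_components=0.90)  # a float keeps components up to 90% explained variance
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (300, k) with k far below 10
print(pca.explained_variance_ratio_.sum())  # at least 0.90
```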

Factor Analysis (FA)

In factor analysis, every observed attribute of a dataset is assumed to be representable as a weighted linear combination of latent factors. The underlying premise of this method is that n dimensions of data can be represented by m factors (m < n). The primary distinction between PCA and FA is that PCA builds components from the underlying attributes, whereas FA decomposes the attributes into latent factors.
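
A minimal sketch with scikit-learn’s FactorAnalysis follows; choosing m = 2 latent factors for n = 6 observed attributes is an assumption for illustration.

```python
# Factor analysis: explain 6 observed attributes with 2 latent factors.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(4)
factors = rng.normal(size=(400, 2))   # m latent factors
loadings = rng.normal(size=(2, 6))    # weights per attribute
X = factors @ loadings + rng.normal(scale=0.1, size=(400, 6))  # n observed attributes

fa = FactorAnalysis(n_components=2, random_state=0)
X_factors = fa.fit_transform(X)  # each row: scores on the 2 latent factors

print(X_factors.shape)       # (400, 2)
print(fa.components_.shape)  # (2, 6): how each factor loads on each attribute
```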

Independent Component Analysis (ICA)

ICA assumes that all attributes are a mixture of separate, independent components and resolves the variables into a combination of these components. ICA is typically used when PCA and FA fail, as it is considered more robust than PCA.
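
A minimal sketch with scikit-learn’s FastICA is given below; the toy source signals and mixing matrix are assumptions chosen so the independent components are easy to recover.

```python
# ICA: unmix observed attributes into independent source components.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(5)
t = np.linspace(0, 8, 1000)
s1 = np.sin(2 * t)             # independent source 1
s2 = np.sign(np.cos(3 * t))    # independent source 2
S = np.c_[s1, s2] + 0.05 * rng.normal(size=(1000, 2))

mixing = np.array([[1.0, 0.5], [0.4, 1.2]])
X = S @ mixing.T               # observed attributes are mixtures of the sources

ica = FastICA(n_components=2, random_state=0)
S_estimated = ica.fit_transform(X)  # recovered independent components

print(S_estimated.shape)  # (1000, 2)
```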

Key Takeaways – Curse of Dimensionality

  • When dealing with high-dimensional data, several issues arise that are collectively known as the “Curse of Dimensionality.” First, the error grows with the number of features. Second, high-dimensional machine learning algorithms are more challenging to design because patterns are hard to distinguish from the noise in the data.
  • Natural language processing, anomaly detection, and more general machine learning problems are the three main areas affected by the curse of dimensionality.
  • Dimensionality reduction transforms a high-dimensional dataset into a lower-dimensional one, reducing the number of input variables. This makes the data easier for analysts to work with and more tractable for algorithms, and the smaller number of variables makes analysis faster and more effective.
  • Dimensionality reduction algorithms come in two types: “feature selection” and “feature extraction.” The main feature selection algorithms are the low variance filter, the high correlation filter, multicollinearity (VIF) and feature ranking. The most prominent feature extraction algorithms are Principal Component Analysis (PCA), Factor Analysis (FA) and Independent Component Analysis (ICA).
  • Feature selection is the process of testing attributes to determine their value before they are chosen or rejected. In contrast, feature extraction techniques combine multiple features into a smaller, richer set of new features.

The curse of dimensionality at Spot Intelligence

The curse of dimensionality is a genuine problem that needs to be carefully considered when developing machine learning models or doing analysis. If it is ignored, you can find all sorts of correlations in your data that are not significant or representative, which leads to inaccurate results or decisions based on incorrect analysis.

At Spot Intelligence, we process text and use many natural language processing techniques. As we often transform text into vectors, we create a lot of high-dimensional data. This high-dimensional data suffers from the “curse of dimensionality.” So we, too, must be very careful when processing our data.

A good pre-processing pipeline that optimizes the number of features for a given problem and data set helps us manage this problem. Working with data that has its feature space reduced helps remove the noise in our predictions and extractions. However, we must also be careful with what we remove, as we don’t want to remove those features with predictive power.

Have you faced the curse of dimensionality in your projects? Have you heard of the curse of variability? What are your favourite techniques to combat the problem? We would love to hear about them in the comment section below.

About the Author

Neri Van Otten

Neri Van Otten is the founder of Spot Intelligence, a machine learning engineer with over 12 years of experience specialising in Natural Language Processing (NLP) and deep learning innovation. Dedicated to making your projects succeed.
