How To Guide To Bias-Variance Trade-Off [2 Examples In Python: Polynomial Regression & SVM]


What are bias, variance and the bias-variance trade-off?

The bias-variance trade-off is a fundamental concept in supervised machine learning that refers to the trade-off between the error due to bias and the error due to variance in a model.

What is bias?

Bias is the difference between a model's predictions and the actual values; it represents how far, on average, the model's predictions deviate from the correct values. High bias indicates that the model is underfitting the data, meaning it cannot capture the underlying patterns.

What is variance?

Conversely, variance represents the variability of the model’s predictions for different training datasets. High variance indicates that the model is overfitting the data, meaning it memorises the training data and cannot generalise well to new, unseen data.

The trade-off

A good model aims to find a balance between bias and variance. A too-simple model will have a high bias and low variance, while a too-complex model will have a low bias and high variance.

To achieve a good balance between bias and variance, various techniques such as cross-validation, regularisation, and ensemble methods are used to control the complexity of the model and reduce the error due to bias and variance.
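
As a quick, hedged illustration of how cross-validation exposes this trade-off (the synthetic data and model choices below are assumptions made purely for demonstration), we can compare a deliberately simple model against a deliberately complex one with scikit-learn's cross_val_score:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Synthetic non-linear data (assumed setup for illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

# A simple model (high bias, low variance) vs a complex one (low bias, higher variance)
simple_model = LinearRegression()
complex_model = make_pipeline(PolynomialFeatures(degree=10), LinearRegression())

for name, model in [("linear", simple_model), ("degree-10 polynomial", complex_model)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print(name, "mean CV MSE:", -scores.mean())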

High bias, low variance: underfitting

A model that has high bias and low variance is said to be underfitting the data. This means that the model is too simple and cannot capture the underlying patterns in the data. As a result, the model has high training and testing errors. In other words, the model cannot fit the training data well or generalise to new, unseen data.

Increasing the complexity of the model, adding more features, or using a more robust model may help reduce the error due to bias and improve the model’s performance. Another approach to reducing bias is to improve the quality or quantity of the training data, for example, by collecting more data or improving the data preprocessing steps. However, it’s essential to remember that increasing the model’s complexity may also increase the error due to variance, so finding the right balance is crucial.
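
For instance, one common way to reduce bias is to give a linear model richer features. The sketch below (synthetic cubic data, chosen purely for illustration) shows how adding polynomial features lets the same linear regressor fit a curved relationship far better:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

# Assumed synthetic data with a cubic relationship
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=(150, 1))
y = x.ravel() ** 3 + rng.normal(scale=0.2, size=150)

# Underfitting model: plain linear regression on the raw feature
linear = LinearRegression().fit(x, y)
print("linear training MSE:", mean_squared_error(y, linear.predict(x)))

# More expressive model: the same regressor on cubic polynomial features
x_poly = PolynomialFeatures(degree=3).fit_transform(x)
cubic = LinearRegression().fit(x_poly, y)
print("cubic training MSE:", mean_squared_error(y, cubic.predict(x_poly)))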

Low bias, high variance: overfitting

A model with low bias and high variance is said to be overfitting the data. This means the model is too complex and memorises the training data instead of learning the underlying patterns. As a result, the model has a low training error but a high testing error. In other words, the model can fit the training data well but not generalise to new, unseen data.

In such cases, reducing the complexity of the model, removing irrelevant features, or using regularisation techniques such as L1/L2 regularisation or dropout may help reduce the error due to variance and improve the model’s performance. Another approach is to increase the training data or use data augmentation techniques to generate more training samples.
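
As a rough sketch of the regularisation route (the data, the degree-15 polynomial, and the Ridge alpha below are arbitrary choices for illustration), L2 regularisation shrinks the coefficients of an over-flexible model and typically narrows the gap between training and testing error:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

# Small, noisy synthetic dataset (assumed for illustration)
rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2)

# Compare an unregularised and an L2-regularised fit of the same over-flexible model
for name, reg in [("no regularisation", LinearRegression()),
                  ("ridge (alpha=1.0)", Ridge(alpha=1.0))]:
    model = make_pipeline(PolynomialFeatures(degree=15), StandardScaler(), reg)
    model.fit(X_train, y_train)
    print(name,
          "train MSE:", mean_squared_error(y_train, model.predict(X_train)),
          "test MSE:", mean_squared_error(y_test, model.predict(X_test)))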

It’s important to note that reducing the error due to variance may increase the error due to bias, so finding the right balance between bias and variance is essential to develop a good model. One way to achieve this balance is by using cross-validation techniques to evaluate the model’s performance on multiple data splits and choosing the model that performs well on both the training and testing data.

Bias-variance trade-off in machine learning

In machine learning, the bias-variance trade-off refers to the relationship between the complexity of a model and its ability to fit the data. A model with high bias is too simple and cannot capture the genuine relationship between the input and output variables. On the other hand, a model with high variance is too complex and captures the random noise in the data, resulting in poor generalisation to new data.

Machine learning aims to develop a model that can generalise well to new, unseen data. To achieve this, we need to find a balance between bias and variance. A too-simple model will have a high bias and low variance, while a too-complex model will have a low bias and high variance.

The bias-variance trade-off is important for all machine learning models, including trading algorithms

We must evaluate the model’s performance on the training and testing data to find the optimal balance between bias and variance. If the model has a high bias, we need to increase its complexity by adding more features, using a more complex algorithm, or increasing the number of iterations. Conversely, if the model has high variance, we must reduce its complexity by removing irrelevant features, using regularisation techniques, or increasing the training data size.
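
One convenient way to do this in scikit-learn is a validation curve, which reports training and cross-validated error as a single complexity parameter varies. The sketch below (synthetic data and a polynomial-degree pipeline, both assumptions for illustration) previews the fuller worked example later in this post:

import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Assumed synthetic non-linear data
rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

# Vary the polynomial degree (model complexity) and record train/validation error
model = make_pipeline(PolynomialFeatures(), LinearRegression())
degrees = [1, 2, 3, 5, 8, 12]
train_scores, val_scores = validation_curve(
    model, X, y,
    param_name="polynomialfeatures__degree", param_range=degrees,
    cv=5, scoring="neg_mean_squared_error")

for d, tr, va in zip(degrees, -train_scores.mean(axis=1), -val_scores.mean(axis=1)):
    print(f"degree {d}: train MSE={tr:.3f}, validation MSE={va:.3f}")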

Cross-validation is a helpful technique to evaluate a model’s performance and select the best model to balance bias and variance. By splitting the data into training, validation, and testing sets, we can evaluate the model’s performance on multiple data splits and choose the model that performs well on both the training and testing data.
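
A minimal sketch of such a three-way split (the 60/20/20 proportions and the synthetic dataset are arbitrary assumptions) uses two calls to train_test_split:

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression

# Assumed synthetic regression data
X, y = make_regression(n_samples=1000, n_features=5, noise=10, random_state=0)

# First carve off the test set, then split the remainder into train/validation
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600, 200, 200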

Bias-variance trade-off example

Let’s consider an example of the bias-variance trade-off in the context of polynomial regression. Suppose we have a set of data points that follow a non-linear relationship and want to fit a model that can capture this relationship. Then, we can use polynomial regression to fit a polynomial function to the data.

If we fit a linear function (i.e., a straight line) to the data, the model will have high bias, as it is too simple to capture the non-linear relationship between the input and output variables. As a result, the model will have high training and testing errors, indicating that it is underfitting the data.

On the other hand, if we fit a high-degree polynomial function to the data, the model will have low bias but high variance. As a result, the model will have a low training error, as it can fit the data very well, but it will have a high testing error, as it overfits the data and fails to generalise to new, unseen data.

We can use cross-validation to evaluate the model’s performance on multiple data splits to find the optimal balance between bias and variance. We can train models with different degrees of polynomial functions and select the model that achieves the best balance between bias and variance, i.e., the model that has the lowest testing error.

For example, suppose a quadratic polynomial function (i.e., a second-degree polynomial) achieves the best balance between bias and variance. This model is more complex than a linear model, but it is still simple enough not to overfit the data. By finding the right balance between bias and variance, we can develop a model that captures the non-linear relationship between the input and output variables and generalises well to new, unseen data.

Bias-variance trade-off Python code example

Here’s an example code snippet in Python to illustrate the bias-variance trade-off using polynomial regression:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate some synthetic data with a non-linear relationship
np.random.seed(0)
x = np.linspace(-5, 5, num=100)
y = x ** 3 + np.random.normal(size=100)

# Split the data into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Fit polynomial regression models with different degrees of polynomials
degrees = [1, 2, 3, 4, 5]
train_errors, test_errors = [], []
for degree in degrees:
    # Transform the features to polynomial features
    poly_features = PolynomialFeatures(degree=degree)
    x_poly_train = poly_features.fit_transform(x_train.reshape(-1, 1))
    x_poly_test = poly_features.transform(x_test.reshape(-1, 1))

    # Fit the linear regression model to the polynomial features
    model = LinearRegression()
    model.fit(x_poly_train, y_train)

    # Evaluate the model on the training and testing data
    y_pred_train = model.predict(x_poly_train)
    y_pred_test = model.predict(x_poly_test)
    train_error = mean_squared_error(y_train, y_pred_train)
    test_error = mean_squared_error(y_test, y_pred_test)
    train_errors.append(train_error)
    test_errors.append(test_error)

# Plot the training and testing errors as a function of the degree of polynomial
import matplotlib.pyplot as plt
plt.plot(degrees, train_errors, label='Training error')
plt.plot(degrees, test_errors, label='Testing error')
plt.legend()
plt.xlabel('Degree of polynomial')
plt.ylabel('Mean squared error')
plt.show()

In this code, we generate synthetic data with a non-linear relationship and split it into training and testing sets. Then, we fit polynomial regression models with different degrees of polynomials and evaluate their performance on the training and testing data. Finally, we plot the training and testing errors as a function of the degree of polynomial to visualise the bias-variance trade-off.

Training and testing errors plotted against the degree of the polynomial to visualise the bias-variance trade-off

The plot shows that the training error decreases as the degree of the polynomial increases, indicating that the model becomes more complex and fits the training data more closely.

Due to the scale of the graph, we can't see that the test error starts to increase again as the degree of the polynomial grows. When we print the test errors, we can observe this phenomenon:

print(test_errors)
[367.3606600042872, 367.89470510195736, 0.8264371039076602, 0.8460879311084801, 0.8399674514960231]

The testing error first drops sharply and then creeps back up as the degree of the polynomial increases, indicating that the model first reaches a good balance between bias and variance and then starts to overfit the data. In this case, we might choose a third-degree polynomial, as it achieves the lowest testing error, which is consistent with the cubic relationship used to generate the data.
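
Continuing from the arrays computed above, the same choice can be made programmatically by picking the degree with the lowest testing error:

# Select the polynomial degree with the lowest testing error
best_degree = degrees[int(np.argmin(test_errors))]
print("Best degree:", best_degree)  # 3 for this data and random seed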

Bias-variance trade-off SVM

Support Vector Machines (SVMs) are robust machine learning algorithms that can be used for classification and regression tasks. The bias-variance trade-off also applies to SVMs.

In SVMs, the trade-off between bias and variance is controlled by the choice of the regularisation parameter C and the kernel function. The regularisation parameter C controls the penalty for misclassifying points in the training data. A higher value of C leads to a more complex model that can fit the training data better but may overfit. Conversely, a lower value of C leads to a simpler model with higher bias but lower variance.
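
To make this concrete, the hedged sketch below (synthetic data and arbitrary C values, not taken from the example further down) fits an RBF SVM with a small, medium, and large C and compares training and testing accuracy:

from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Assumed synthetic classification data with some label noise
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Small C -> simpler boundary (higher bias); large C -> tighter fit (higher variance)
for C in [0.01, 1, 100]:
    clf = SVC(kernel="rbf", C=C).fit(X_train, y_train)
    print(f"C={C}: train accuracy={clf.score(X_train, y_train):.3f}, "
          f"test accuracy={clf.score(X_test, y_test):.3f}")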

The choice of the kernel function also affects the bias-variance trade-off in SVMs. The linear kernel is a simple, low-variance option that works well when the data is linearly separable. On the other hand, non-linear kernels such as the polynomial or Gaussian (RBF) kernels are more complex and can capture more complex patterns in the data. However, they may lead to overfitting and higher variance if the regularisation parameter is not chosen carefully.
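
A small, illustrative comparison of the two kinds of kernel (again on assumed synthetic data) can be made with cross-validation:

from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

# Assumed synthetic classification data
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=42)

# Compare a simple linear kernel against the more flexible RBF kernel
for kernel in ["linear", "rbf"]:
    scores = cross_val_score(SVC(kernel=kernel, C=1.0), X, y, cv=5)
    print(f"{kernel} kernel: mean CV accuracy = {scores.mean():.3f}")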

To find the optimal value of the regularisation parameter C and the kernel function, cross-validation is used to evaluate the model’s performance on multiple data splits. This helps select the hyperparameters that best balance bias and variance.

Keep the bias-variance trade-off in mind when working with SVMs or any other machine learning algorithm. Choosing an appropriate regularisation parameter and kernel function helps strike a good balance between underfitting and overfitting and leads to models that generalise well to new data.

Bias-variance trade-off in SVMs Python code example

Here’s an example code snippet in Python using scikit-learn to demonstrate the bias-variance trade-off in SVMs:

from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.datasets import make_classification

# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=0, random_state=42)

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define a range of values for the regularization parameter C to search over
param_grid = {'C': [0.01, 0.1, 1, 10, 100]}

# Create a grid search object to search over hyperparameters
grid_search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)

# Fit the grid search object to the training data
grid_search.fit(X_train, y_train)

# Print the best hyperparameters found by the grid search
print("Best hyperparameters:", grid_search.best_params_)

# Evaluate the model on the test data
svm = grid_search.best_estimator_
print("Test accuracy:", svm.score(X_test, y_test))

Here, we first generate a synthetic dataset with 1000 samples and 10 features. Then we split the data into training and test sets using an 80/20 split.

Next, we define a range of values for the regularisation parameter C to search over with a grid search object, using 5-fold cross-validation to evaluate each candidate value.

We fit the grid search object to the training data and print the best hyperparameters found by the grid search. We then evaluate the performance of the SVM model with the best hyperparameters on the test data using the score method.

We can find an optimal balance between bias and variance in the SVM model by searching over a range of values for the regularisation parameter C and selecting the best hyperparameters based on cross-validation performance. This approach can help prevent overfitting and ensure the model generalises well to new, unseen data.
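
As a quick follow-up to the grid search above, printing the mean cross-validated accuracy for each candidate C (via cv_results_) makes the bias-variance pattern visible directly: accuracy typically rises as C increases from very small values and then flattens or drops once the model starts to overfit.

# Inspect the mean cross-validated accuracy for each candidate value of C
cv = grid_search.cv_results_
for C, score in zip(cv['param_C'], cv['mean_test_score']):
    print(f"C={C}: mean CV accuracy = {score:.3f}")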

Conclusion

The bias-variance trade-off is a fundamental concept in machine learning that refers to the trade-off between the ability of a model to fit the training data (low bias) and its ability to generalise to new, unseen data (low variance). A model with high bias is too simple to capture the underlying pattern in the data, while a model with high variance is too complex and overfits the data.

The goal of a machine learning practitioner is to find the optimal balance between bias and variance by choosing an appropriate model complexity and regularisation technique. Then, cross-validation can be used to evaluate the model’s performance on multiple data splits and select the model that achieves the best balance between bias and variance.

By understanding the bias-variance trade-off, machine learning practitioners can develop models that can capture the underlying pattern in the data and generalise well to new, unseen data.

Now that you understand the bias-variance trade-off, make sure you also understand endogenous and exogenous variables and the problems they cause.

About the Author

Neri Van Otten

Neri Van Otten is the founder of Spot Intelligence, a machine learning engineer with over 12 years of experience specialising in Natural Language Processing (NLP) and deep learning innovation. Dedicated to making your projects succeed.
