One-shot Learning Explained, How It Works & How To Tutorial In Python

by | Aug 24, 2023 | Data Science, Machine Learning

What is one-shot learning?

One-shot learning is a machine learning paradigm that trains models to recognize new objects or patterns based on a single example or a minimal number of examples. Traditional machine learning algorithms often require large amounts of labelled data for training. However, in many real-world scenarios, obtaining such data can be expensive, time-consuming, or even practically impossible.

One-shot learning aims to bridge this gap by focusing on training models that can generalize from a single instance. The primary challenge in one-shot learning is to design algorithms that can effectively extract relevant features from the limited data available and make accurate predictions or classifications when encountering new, unseen examples.

one-shot learning : predict with only a single example

One-shot learning: generalise from a single example.

We often come across zero-, one- and few-shot learning. Have you ever wondered what the difference is?

What is the difference between zero-shot, one-shot and few-shot learning models?

Zero-, one- and few-shot learning are all techniques designed to address the challenge of learning from limited labelled data. Each approach refers to the number of examples available for training and the level of generalization required. Here’s a breakdown of the differences:

Zero-Shot Learning:

  • Number of Examples: The model is trained with no examples for certain classes.
  • Generalization: The goal is to classify instances from classes not seen during training. The model learns to generalize its knowledge to unseen classes.
  • Common Technique: Transfer learning, semantic embeddings, and attribute-based methods are often used in zero-shot learning. These approaches leverage auxiliary information about classes to perform classification.

One-Shot Learning:

  • Number of Examples: The model is trained with just one example per class in.
  • Generalization: The goal is to learn a representation that can differentiate between classes based on a single example.
  • Common Technique: Siamese, matching, and prototypical networks are often used in one-shot learning. These architectures focus on learning relationships between examples.

Few-Shot Learning:

  • Number of Examples: This is a more general term that encompasses both one-shot learning and scenarios where a small number of examples (more than one) per class are available.
  • Generalization: The goal is to enable the model to classify instances from classes with very few training examples.
  • Common Techniques: These methods include variants of one-shot learning models and meta-learning strategies. These techniques teach the ability to adapt quickly to new tasks or classes.

The key differences lie in the number of examples available for training and the level of generalization required. Zero-shot learning deals with recognizing classes not seen during training, one-shot learning aims to classify with just one example per class, and few-shot learning encompasses scenarios where the training data is limited but consists of more than one example per class. All these techniques are important when collecting abundant labelled data is challenging or not feasible.

Now that we understand the differences let’s get back to one-shot learning.

How does one-shot learning work?

One-shot learning is a machine learning approach that trains models to recognize or classify new objects or patterns based on a single example per class. The primary goal is to enable a model to generalize effectively from a minimal amount of labelled training data. Here’s a high-level overview of how it works:

1. Dataset Preparation:

  • In traditional machine learning, large datasets with multiple examples per class are used for training. However, each class has only one example (or very few) available during training.
  • The dataset is organized into classes, each associated with a single example or a small set of examples.

2. Feature Extraction:

  • The examples in the dataset are typically raw data such as images, text, or audio.
  • Feature extraction is performed to transform the raw data into a format that the model can process. This step is crucial for capturing relevant information from the limited data.

3. Model Architecture:

  • One-shot learning models often use specialized architectures that can effectively extract features and learn relationships between examples.
  • Architectures like Siamese, matching, and prototypical networks are commonly used in one-shot learning tasks. These architectures are designed to work well with small training datasets.

4. Training:

  • During training, the model is exposed to pairs of examples. For each pair, one example is treated as the “query” example, and the other is treated as a “support” or “reference” example.
  • The model learns to differentiate between similar and dissimilar pairs. It learns to embed the examples so that similar examples are close in the embedding space and distinct examples are far apart.

5. Loss Function:

  • The loss function used for training depends on the chosen architecture and the learning objective. For instance, Siamese networks might use a contrastive loss that encourages the embeddings of similar examples to be close and embeddings of dissimilar examples to be distant.

6. Testing and Inference:

  • After training, the model’s performance is evaluated on new, unseen examples.
  • To make predictions, the model typically calculates distances or similarities between the embeddings of the query example and the support examples.
  • The class associated with the closest support example is predicted as the class for the query example.

7. Fine-Tuning and Transfer Learning (Optional):

  • Sometimes, one-shot learning models can benefit from pretraining on a larger dataset and fine-tuning on the one-shot learning dataset. This leverages knowledge gained from the larger dataset to improve generalization.

One-shot learning is instrumental in scenarios where obtaining ample labelled data is challenging. It has applications in various fields, including computer vision, natural language processing, medical imaging, and more, where learning from very few examples is essential. However, due to the limited data, the success of one-shot learning highly depends on the model architecture, data representation, and task complexity.

What are the different approaches to one-shot learning?

There are several approaches to one-shot learning.

  1. Siamese Networks: Siamese networks involve training a neural network to learn a similarity metric between pairs of input examples. This allows the network to distinguish between similar and dissimilar instances, making it suitable for one-shot learning tasks like face recognition.
  2. Matching Networks: Matching networks combine the concepts of attention mechanisms and recurrent networks to make predictions based on a context set of examples. These networks learn to weigh the importance of each sample in the context when making predictions for a new instance.
  3. Prototypical Networks: Prototypical networks learn a prototype representation for each class based on a few examples. During testing, new samples are compared to these prototypes to make predictions.
  4. Memory-Augmented Neural Networks: These networks incorporate external memory structures to store information about seen examples. This allows them to learn and remember patterns even with limited data.
  5. Transfer Learning and Fine-Tuning: Transfer learning involves pre-training a model on a large dataset and then fine-tuning it on the limited one-shot learning dataset. This leverages the knowledge gained from the larger dataset to improve performance on the smaller dataset.

What is one-shot learning used for?

One-shot learning is used in various applications where labelled training data are scarce, making traditional machine learning approaches less effective. It’s advantageous when collecting a large amount of labelled data for each class is impractical, expensive, or time-consuming. Some of the typical applications of one-shot learning include:

  1. Object Recognition: It can be applied to recognize objects or visual patterns from a single example. This is useful in scenarios where you want a model to identify rare or novel things that aren’t present in the training dataset.
  2. Face Recognition: It is relevant in face recognition tasks, where the goal is to identify individuals based on a single image or a few images. This is valuable for security and authentication systems.
  3. Handwriting Recognition: In handwriting recognition, we can create models that recognize characters or words written by different individuals without needing a large corpus of each individual’s handwriting.
  4. Medical Imaging: In medical imaging, obtaining labelled data can be challenging due to privacy concerns and the need for expert annotations. It can help diagnose diseases based on a limited number of medical images.
  5. Species Identification: For identifying plant or animal species, it can be employed when dealing with rare or endangered species, where collecting ample data is difficult.
  6. Rare Event Detection: It can aid in detecting rare events or anomalies in various domains such as surveillance, industrial monitoring, and finance.
  7. Text Categorization: In natural language processing, it can be used to classify text documents with limited labelled examples, enabling models to generalize to new topics.
  8. Art Restoration and Authentication: The model can help identify a specific artist’s work or detect forged artworks.
  9. Language Learning: For language translation, language modelling, and speech recognition, these approaches can be used to generalize from a limited number of examples.
  10. Zero-shot Learning Extension: One-shot learning is also closely related to zero-shot learning, where the model is expected to recognize classes not seen during training. This has applications in various domains, including semantic labelling and cross-modal retrieval.

While one-shot learning offers promising solutions for these applications, it’s important to note that its effectiveness depends on the choice of algorithm, architecture, and the specific problem at hand.

Matching networks

Matching networks are neural network architectures designed for one-shot learning tasks. They were introduced to address the challenge of learning from a few examples, often just one, per class. Matching networks combine the concepts of attention mechanisms and recurrent networks to make predictions based on a context set of examples.

The basic idea behind matching networks is to use an attention mechanism to dynamically weigh the importance of different examples in the context set when predicting a new sample. This enables the model to focus on the most relevant information in the context set while predicting new examples.

Here’s an overview of how matching networks work:

  1. Context Set: The context set consists of a few examples (support set) for each class in the training data. These examples represent the information available for each class during training. Each class has its own context set.
  2. Query: The query is the new example that needs to be classified based on the available context sets. The goal is to predict the class of the query.
  3. Embedding: The context set examples, and the query are embedded using neural networks. These embeddings capture the relevant features of each example.
  4. Attention Mechanism: An attention mechanism is applied to the embedded context set examples concerning the query. This attention mechanism computes weights for each context set example based on its similarity to the query. Examples that are more relevant to the query are assigned higher weights.
  5. Weighted Sum: The embeddings of the context set examples are combined using the attention weights to create a context representation specific to the query.
  6. Classification: The context representation is used to make predictions for the class of the query. This can be done through various methods, such as feeding the context representation through a neural network classifier or computing distances to class prototypes.
  7. Training: The model uses gradient descent and a loss function that encourages correct predictions. During training, the model learns to adjust the attention mechanism and embeddings to utilize the context information for one-shot learning effectively.

Matching networks are versatile and can be applied to various data types, including images, text, and more. They are well-suited for situations where the amount of training data is limited, and a model needs to generalize from a small number of examples.

How to implement one-shot learning in machine learning with Python

Implementations in Python typically involve deep learning frameworks like TensorFlow or PyTorch. Here’s a high-level outline of how you might approach building a simple model using Python and TensorFlow as an example:

  1. Data Preparation: Organize your dataset into classes, each with one or a few examples. You’ll need to structure the data to support one-shot learning, often by creating pairs or sets of examples.
  2. Data Loading: Use data loaders or generators to load the data efficiently. You might need to create pairs or examples to train your model effectively.
  3. Model Architecture: Design your neural network architecture. For example, you might use a Siamese network or a matching network. Define the network using TensorFlow’s or PyTorch’s APIs.
  4. Loss Function: Define the loss function that encourages similar examples to have similar embeddings and dissimilar examples to have dissimilar embeddings. For example, you might use a contrastive loss or a triplet loss.
  5. Training Loop: Implement a training loop where you feed pairs or sets of examples through the network, calculate the loss, and update the model’s weights using backpropagation. Use an optimizer like SGD or Adam.
  6. Evaluation: Evaluate your model’s performance on a validation or test set after training. You might use metrics like accuracy or precision-recall curves.
  7. Inference: Your model’s generalisation ability from a single example is crucial for one-shot learning. Test your model with new examples to see how well it can classify them.

Here’s a simplified example using TensorFlow and a Siamese network:

import tensorflow as tf
from tensorflow.keras.layers import Input, Flatten, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Define the Siamese network architecture
def siamese_network(input_shape):
    input_a = Input(shape=input_shape)
    input_b = Input(shape=input_shape)

    # Shared subnetwork
    shared_network = tf.keras.Sequential([
        Dense(128, activation='relu'),
        Dense(64, activation='relu'),
        Dense(32, activation='relu')

    embedding_a = shared_network(input_a)
    embedding_b = shared_network(input_b)

    return Model(inputs=[input_a, input_b], outputs=[embedding_a, embedding_b])

# Create a Siamese network instance
input_shape = (28, 28)  # Example input shape
siamese_model = siamese_network(input_shape)

# Compile the model
siamese_model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error')

# Train the model
# ... Load and prepare your data ...[data_a, data_b], labels, epochs=num_epochs, batch_size=batch_size)

# Evaluate the model
# ... Prepare your validation or test set ...
loss = siamese_model.evaluate([val_data_a, val_data_b], val_labels)

# Make predictions for new examples
new_example = ...  # Load your new example
embedding = siamese_model.predict([new_example, existing_example])
# Use the embeddings for classification or similarity comparison

Remember that this is a simplified example, and the actual implementation might require more complexities depending on your specific task and dataset. Additionally, you might need to consider techniques like data augmentation, learning rate scheduling, and regularization to improve your model’s performance.


One-shot learning is a remarkable approach to machine learning that addresses the challenge of recognizing or classifying objects or patterns based on a single example per class. This paradigm has gained significance when acquiring abundant labelled data is impractical or expensive. It has shown promise in various fields by leveraging specialized neural network architectures and innovative training methodologies.

The advent of architectures like Siamese networks, matching networks, and prototypical networks has provided practical tools to tackle one-shot learning tasks. These architectures capture essential relationships between examples and generalize from limited data.

Furthermore, one-shot learning is closely related to zero-shot learning, which aims to classify instances from classes not seen during training. Both approaches aim to extend the model’s capabilities beyond the training data.

However, it’s essential to recognize that while one-shot learning offers exciting opportunities, it still faces challenges related to the complexity of learning from minimal examples and the potential for overfitting. The choice of architecture, appropriate loss functions, and thoughtful data representation remain crucial for achieving success in one-shot learning tasks.

In the broader context, one-shot learning, along with other related techniques such as few-shot learning and zero-shot learning, contributes to advancing the field of machine learning and artificial intelligence, enabling models to exhibit impressive generalization and adaptability even when data is scarce.

About the Author

Neri Van Otten

Neri Van Otten

Neri Van Otten is the founder of Spot Intelligence, a machine learning engineer with over 12 years of experience specialising in Natural Language Processing (NLP) and deep learning innovation. Dedicated to making your projects succeed.

Recent Articles

online machine learning process

Online Machine Learning Explained & How To Build A Powerful Adaptive Model

What is Online Machine Learning? Online machine learning, also known as incremental or streaming learning, is a type of machine learning in which models are updated...

data drift in machine learning over time

Data Drift In Machine Learning Explained: How To Detect & Mitigate It

What is Data Drift Machine Learning? In machine learning, the accuracy and effectiveness of models heavily rely on the quality and consistency of the data on which they...

precision and recall explained

Classification Metrics In Machine Learning Explained & How To Tutorial In Python

What are Classification Metrics in Machine Learning? In machine learning, classification tasks are omnipresent. From spam detection in emails to medical diagnosis and...

example of a co-occurance matrix for NLP

Co-occurrence Matrices Explained: How To Use Them In NLP, Computer Vision & Recommendation Systems [6 Tools]

What are Co-occurrence Matrices? Co-occurrence matrices serve as a fundamental tool across various disciplines, unveiling intricate statistical relationships hidden...

use cases of query understanding

Query Understanding In NLP Simplified & How It Works [5 Techniques]

What is Query Understanding? Understanding user queries lies at the heart of efficient communication between humans and machines in the vast digital information and...

distributional semantics example

Distributional Semantics Simplified & 7 Techniques [How To Understand Language]

What is Distributional Semantics? Understanding the meaning of words has always been a fundamental challenge in natural language processing (NLP). How do we decipher...

4 common regression metrics

10 Regression Metrics For Machine Learning & Practical How To Guide

What are Evaluation Metrics for Regression Models? Regression analysis is a fundamental tool in statistics and machine learning used to model the relationship between a...

find the right document

Natural Language Search Explained [10 Powerful Tools & How To Tutorial In Python]

What is Natural Language Search? Natural language search refers to the capability of search engines and other information retrieval systems to understand and interpret...

the difference between bagging, boosting and stacking

Bagging, Boosting & Stacking Made Simple [3 How To Tutorials In Python]

What is Bagging, Boosting and Stacking? Bagging, boosting and stacking represent three distinct ensemble learning techniques used to enhance the performance of machine...


Submit a Comment

Your email address will not be published. Required fields are marked *

nlp trends

2024 NLP Expert Trend Predictions

Get a FREE PDF with expert predictions for 2024. How will natural language processing (NLP) impact businesses? What can we expect from the state-of-the-art models?

Find out this and more by subscribing* to our NLP newsletter.

You have Successfully Subscribed!