One-shot Learning Explained, How It Works & How To Tutorial In Python

by | Aug 24, 2023 | Data Science, Machine Learning

What is one-shot learning?

One-shot learning is a machine learning paradigm that trains models to recognize new objects or patterns based on a single example or a minimal number of examples. Traditional machine learning algorithms often require large amounts of labelled data for training. However, in many real-world scenarios, obtaining such data can be expensive, time-consuming, or even practically impossible.

One-shot learning aims to bridge this gap by focusing on training models that can generalize from a single instance. The primary challenge in one-shot learning is to design algorithms that can effectively extract relevant features from the limited data available and make accurate predictions or classifications when encountering new, unseen examples.

one-shot learning : predict with only a single example

One-shot learning: generalise from a single example.

We often come across zero-, one- and few-shot learning. Have you ever wondered what the difference is?

What is the difference between zero-shot, one-shot and few-shot learning models?

Zero-, one- and few-shot learning are all techniques designed to address the challenge of learning from limited labelled data. Each approach refers to the number of examples available for training and the level of generalization required. Here’s a breakdown of the differences:

Zero-Shot Learning:

  • Number of Examples: The model is trained with no examples for certain classes.
  • Generalization: The goal is to classify instances from classes not seen during training. The model learns to generalize its knowledge to unseen classes.
  • Common Technique: Transfer learning, semantic embeddings, and attribute-based methods are often used in zero-shot learning. These approaches leverage auxiliary information about classes to perform classification.

One-Shot Learning:

  • Number of Examples: The model is trained with just one example per class in.
  • Generalization: The goal is to learn a representation that can differentiate between classes based on a single example.
  • Common Technique: Siamese, matching, and prototypical networks are often used in one-shot learning. These architectures focus on learning relationships between examples.

Few-Shot Learning:

  • Number of Examples: This is a more general term that encompasses both one-shot learning and scenarios where a small number of examples (more than one) per class are available.
  • Generalization: The goal is to enable the model to classify instances from classes with very few training examples.
  • Common Techniques: These methods include variants of one-shot learning models and meta-learning strategies. These techniques teach the ability to adapt quickly to new tasks or classes.

The key differences lie in the number of examples available for training and the level of generalization required. Zero-shot learning deals with recognizing classes not seen during training, one-shot learning aims to classify with just one example per class, and few-shot learning encompasses scenarios where the training data is limited but consists of more than one example per class. All these techniques are important when collecting abundant labelled data is challenging or not feasible.

Now that we understand the differences let’s get back to one-shot learning.

How does one-shot learning work?

One-shot learning is a machine learning approach that trains models to recognize or classify new objects or patterns based on a single example per class. The primary goal is to enable a model to generalize effectively from a minimal amount of labelled training data. Here’s a high-level overview of how it works:

1. Dataset Preparation:

  • In traditional machine learning, large datasets with multiple examples per class are used for training. However, each class has only one example (or very few) available during training.
  • The dataset is organized into classes, each associated with a single example or a small set of examples.

2. Feature Extraction:

  • The examples in the dataset are typically raw data such as images, text, or audio.
  • Feature extraction is performed to transform the raw data into a format that the model can process. This step is crucial for capturing relevant information from the limited data.

3. Model Architecture:

  • One-shot learning models often use specialized architectures that can effectively extract features and learn relationships between examples.
  • Architectures like Siamese, matching, and prototypical networks are commonly used in one-shot learning tasks. These architectures are designed to work well with small training datasets.

4. Training:

  • During training, the model is exposed to pairs of examples. For each pair, one example is treated as the “query” example, and the other is treated as a “support” or “reference” example.
  • The model learns to differentiate between similar and dissimilar pairs. It learns to embed the examples so that similar examples are close in the embedding space and distinct examples are far apart.

5. Loss Function:

  • The loss function used for training depends on the chosen architecture and the learning objective. For instance, Siamese networks might use a contrastive loss that encourages the embeddings of similar examples to be close and embeddings of dissimilar examples to be distant.

6. Testing and Inference:

  • After training, the model’s performance is evaluated on new, unseen examples.
  • To make predictions, the model typically calculates distances or similarities between the embeddings of the query example and the support examples.
  • The class associated with the closest support example is predicted as the class for the query example.

7. Fine-Tuning and Transfer Learning (Optional):

  • Sometimes, one-shot learning models can benefit from pretraining on a larger dataset and fine-tuning on the one-shot learning dataset. This leverages knowledge gained from the larger dataset to improve generalization.

One-shot learning is instrumental in scenarios where obtaining ample labelled data is challenging. It has applications in various fields, including computer vision, natural language processing, medical imaging, and more, where learning from very few examples is essential. However, due to the limited data, the success of one-shot learning highly depends on the model architecture, data representation, and task complexity.

What are the different approaches to one-shot learning?

There are several approaches to one-shot learning.

  1. Siamese Networks: Siamese networks involve training a neural network to learn a similarity metric between pairs of input examples. This allows the network to distinguish between similar and dissimilar instances, making it suitable for one-shot learning tasks like face recognition.
  2. Matching Networks: Matching networks combine the concepts of attention mechanisms and recurrent networks to make predictions based on a context set of examples. These networks learn to weigh the importance of each sample in the context when making predictions for a new instance.
  3. Prototypical Networks: Prototypical networks learn a prototype representation for each class based on a few examples. During testing, new samples are compared to these prototypes to make predictions.
  4. Memory-Augmented Neural Networks: These networks incorporate external memory structures to store information about seen examples. This allows them to learn and remember patterns even with limited data.
  5. Transfer Learning and Fine-Tuning: Transfer learning involves pre-training a model on a large dataset and then fine-tuning it on the limited one-shot learning dataset. This leverages the knowledge gained from the larger dataset to improve performance on the smaller dataset.

What is one-shot learning used for?

One-shot learning is used in various applications where labelled training data are scarce, making traditional machine learning approaches less effective. It’s advantageous when collecting a large amount of labelled data for each class is impractical, expensive, or time-consuming. Some of the typical applications of one-shot learning include:

  1. Object Recognition: It can be applied to recognize objects or visual patterns from a single example. This is useful in scenarios where you want a model to identify rare or novel things that aren’t present in the training dataset.
  2. Face Recognition: It is relevant in face recognition tasks, where the goal is to identify individuals based on a single image or a few images. This is valuable for security and authentication systems.
  3. Handwriting Recognition: In handwriting recognition, we can create models that recognize characters or words written by different individuals without needing a large corpus of each individual’s handwriting.
  4. Medical Imaging: In medical imaging, obtaining labelled data can be challenging due to privacy concerns and the need for expert annotations. It can help diagnose diseases based on a limited number of medical images.
  5. Species Identification: For identifying plant or animal species, it can be employed when dealing with rare or endangered species, where collecting ample data is difficult.
  6. Rare Event Detection: It can aid in detecting rare events or anomalies in various domains such as surveillance, industrial monitoring, and finance.
  7. Text Categorization: In natural language processing, it can be used to classify text documents with limited labelled examples, enabling models to generalize to new topics.
  8. Art Restoration and Authentication: The model can help identify a specific artist’s work or detect forged artworks.
  9. Language Learning: For language translation, language modelling, and speech recognition, these approaches can be used to generalize from a limited number of examples.
  10. Zero-shot Learning Extension: One-shot learning is also closely related to zero-shot learning, where the model is expected to recognize classes not seen during training. This has applications in various domains, including semantic labelling and cross-modal retrieval.

While one-shot learning offers promising solutions for these applications, it’s important to note that its effectiveness depends on the choice of algorithm, architecture, and the specific problem at hand.

Matching networks

Matching networks are neural network architectures designed for one-shot learning tasks. They were introduced to address the challenge of learning from a few examples, often just one, per class. Matching networks combine the concepts of attention mechanisms and recurrent networks to make predictions based on a context set of examples.

The basic idea behind matching networks is to use an attention mechanism to dynamically weigh the importance of different examples in the context set when predicting a new sample. This enables the model to focus on the most relevant information in the context set while predicting new examples.

Here’s an overview of how matching networks work:

  1. Context Set: The context set consists of a few examples (support set) for each class in the training data. These examples represent the information available for each class during training. Each class has its own context set.
  2. Query: The query is the new example that needs to be classified based on the available context sets. The goal is to predict the class of the query.
  3. Embedding: The context set examples, and the query are embedded using neural networks. These embeddings capture the relevant features of each example.
  4. Attention Mechanism: An attention mechanism is applied to the embedded context set examples concerning the query. This attention mechanism computes weights for each context set example based on its similarity to the query. Examples that are more relevant to the query are assigned higher weights.
  5. Weighted Sum: The embeddings of the context set examples are combined using the attention weights to create a context representation specific to the query.
  6. Classification: The context representation is used to make predictions for the class of the query. This can be done through various methods, such as feeding the context representation through a neural network classifier or computing distances to class prototypes.
  7. Training: The model uses gradient descent and a loss function that encourages correct predictions. During training, the model learns to adjust the attention mechanism and embeddings to utilize the context information for one-shot learning effectively.

Matching networks are versatile and can be applied to various data types, including images, text, and more. They are well-suited for situations where the amount of training data is limited, and a model needs to generalize from a small number of examples.

How to implement one-shot learning in machine learning with Python

Implementations in Python typically involve deep learning frameworks like TensorFlow or PyTorch. Here’s a high-level outline of how you might approach building a simple model using Python and TensorFlow as an example:

  1. Data Preparation: Organize your dataset into classes, each with one or a few examples. You’ll need to structure the data to support one-shot learning, often by creating pairs or sets of examples.
  2. Data Loading: Use data loaders or generators to load the data efficiently. You might need to create pairs or examples to train your model effectively.
  3. Model Architecture: Design your neural network architecture. For example, you might use a Siamese network or a matching network. Define the network using TensorFlow’s or PyTorch’s APIs.
  4. Loss Function: Define the loss function that encourages similar examples to have similar embeddings and dissimilar examples to have dissimilar embeddings. For example, you might use a contrastive loss or a triplet loss.
  5. Training Loop: Implement a training loop where you feed pairs or sets of examples through the network, calculate the loss, and update the model’s weights using backpropagation. Use an optimizer like SGD or Adam.
  6. Evaluation: Evaluate your model’s performance on a validation or test set after training. You might use metrics like accuracy or precision-recall curves.
  7. Inference: Your model’s generalisation ability from a single example is crucial for one-shot learning. Test your model with new examples to see how well it can classify them.

Here’s a simplified example using TensorFlow and a Siamese network:

import tensorflow as tf
from tensorflow.keras.layers import Input, Flatten, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Define the Siamese network architecture
def siamese_network(input_shape):
    input_a = Input(shape=input_shape)
    input_b = Input(shape=input_shape)

    # Shared subnetwork
    shared_network = tf.keras.Sequential([
        Dense(128, activation='relu'),
        Dense(64, activation='relu'),
        Dense(32, activation='relu')

    embedding_a = shared_network(input_a)
    embedding_b = shared_network(input_b)

    return Model(inputs=[input_a, input_b], outputs=[embedding_a, embedding_b])

# Create a Siamese network instance
input_shape = (28, 28)  # Example input shape
siamese_model = siamese_network(input_shape)

# Compile the model
siamese_model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error')

# Train the model
# ... Load and prepare your data ...[data_a, data_b], labels, epochs=num_epochs, batch_size=batch_size)

# Evaluate the model
# ... Prepare your validation or test set ...
loss = siamese_model.evaluate([val_data_a, val_data_b], val_labels)

# Make predictions for new examples
new_example = ...  # Load your new example
embedding = siamese_model.predict([new_example, existing_example])
# Use the embeddings for classification or similarity comparison

Remember that this is a simplified example, and the actual implementation might require more complexities depending on your specific task and dataset. Additionally, you might need to consider techniques like data augmentation, learning rate scheduling, and regularization to improve your model’s performance.


One-shot learning is a remarkable approach to machine learning that addresses the challenge of recognizing or classifying objects or patterns based on a single example per class. This paradigm has gained significance when acquiring abundant labelled data is impractical or expensive. It has shown promise in various fields by leveraging specialized neural network architectures and innovative training methodologies.

The advent of architectures like Siamese networks, matching networks, and prototypical networks has provided practical tools to tackle one-shot learning tasks. These architectures capture essential relationships between examples and generalize from limited data.

Furthermore, one-shot learning is closely related to zero-shot learning, which aims to classify instances from classes not seen during training. Both approaches aim to extend the model’s capabilities beyond the training data.

However, it’s essential to recognize that while one-shot learning offers exciting opportunities, it still faces challenges related to the complexity of learning from minimal examples and the potential for overfitting. The choice of architecture, appropriate loss functions, and thoughtful data representation remain crucial for achieving success in one-shot learning tasks.

In the broader context, one-shot learning, along with other related techniques such as few-shot learning and zero-shot learning, contributes to advancing the field of machine learning and artificial intelligence, enabling models to exhibit impressive generalization and adaptability even when data is scarce.

About the Author

Neri Van Otten

Neri Van Otten

Neri Van Otten is the founder of Spot Intelligence, a machine learning engineer with over 12 years of experience specialising in Natural Language Processing (NLP) and deep learning innovation. Dedicated to making your projects succeed.

Related Articles

Continual learning addresses these challenges by allowing machine learning models to adapt and evolve alongside changing data and tasks.

Continual Learning Made Simple, How To Get Started & Top 4 Models

The need for continual learning In the ever-evolving landscape of machine learning and artificial intelligence, the ability to adapt and learn continuously (continual...

Sequence-to-sequence encoder-decoder architecture

Sequence-to-Sequence Architecture Made Easy & How To Tutorial In Python

What is sequence-to-sequence? Sequence-to-sequence (Seq2Seq) is a deep learning architecture used in natural language processing (NLP) and other sequence modelling...

Cross-entropy can be interpreted as a measure of how well the predicted probability distribution aligns with the true distribution.

Cross-Entropy Loss — Crucial In Machine Learning — Complete Guide & How To Use It

What is cross-entropy loss? Cross-entropy Loss, often called "cross-entropy," is a loss function commonly used in machine learning and deep learning, particularly in...

nlg can generate product descriptions

Natural Language Generation Explained & 2 How To Tutorials In Python

What is natural language generation? Natural Language Generation (NLG) is a subfield of artificial intelligence (AI) and natural language processing (NLP) that focuses...

y_actual - y_predicted

Top 8 Loss Functions Made Simple & How To Implement Them In Python

What are loss functions? Loss functions, also known as a cost or objective functions, are critical component in training machine learning models. It quantifies a...

chatbots are commonly used for Cross-lingual Transfer Learning

How To Implement Cross-lingual Transfer Learning In 5 Different Ways

What is cross-lingual transfer learning? Cross-lingual transfer learning is a machine learning technique that involves transferring knowledge or models from one...

In text labelling and classification, each document or piece of text is assigned to one or more predefined categories or classes

Text Labelling Made Simple With How To Guide & Tools List

What is text labelling? Text labelling, or text annotation or tagging, assigns labels or categories to text data to make it more understandable and usable for various...

Automatically identifying these languages is crucial for search engines, content recommendation systems, and social media platforms.

Language Identification Complete How To Guide In Python [With & Without Libraries]

What is language identification? Language identification is a critical component of Natural Language Processing (NLP), a field dedicated to interacting with computers...

Multilingual NLP is important for an ever globalising world

Multilingual NLP Made Simple — Challenges, Solutions & The Future

Understanding Multilingual NLP In the era of globalization and digital interconnectedness, the ability to understand and process multiple languages is no longer a...


Submit a Comment

Your email address will not be published. Required fields are marked *

nlp trends

2023 NLP Expert Trend Predictions

Get a FREE PDF with expert predictions for 2023. How will natural language processing (NLP) impact businesses? What can we expect from the state-of-the-art models?

Find out this and more by subscribing* to our NLP newsletter.

You have Successfully Subscribed!