Simulated Annealing Made Simple & How To Tutorial In Python

by Neri Van Otten | Oct 22, 2025 | Data Science, Machine Learning

Introduction: The Search for the Best Solution

Imagine you’re trying to find the fastest route through a city with hundreds of streets, or the optimal design for a complex circuit. The number of possible solutions can be staggering, and finding the absolute best one can feel like searching for a needle in an enormous haystack. This is the world of optimization problems — problems where the goal is to find the best solution according to some criteria.

Traditional methods, such as brute-force search, quickly become impractical as the number of possibilities increases. Even sophisticated algorithms like gradient descent can become stuck in local minima, where a solution appears optimal in its immediate neighbourhood but is far from the global optimum.

This is where Simulated Annealing (SA) comes in. Inspired by the way metals are heated and slowly cooled to form stable structures, SA is a clever approach that balances exploration and exploitation. It doesn’t just settle for the first promising solution it finds; it sometimes accepts worse solutions temporarily, giving it the ability to escape local traps and steadily move toward a globally optimal or near-optimal solution.

In this post, we’ll explore the intuition behind Simulated Annealing, break down how it works, and see why this algorithm remains a powerful tool for solving some of the most challenging optimization problems in science, engineering, and beyond.

The Inspiration: Annealing in Metallurgy

Simulated Annealing gets its name from a process in metallurgy called annealing, where metals are heated to a high temperature and then cooled slowly to remove defects and achieve a stable crystalline structure.

When a metal is heated, its atoms vibrate wildly, exploring different arrangements. If it’s cooled too quickly, the atoms can get “stuck” in a disordered state, resulting in a brittle or flawed material. But if it’s cooled slowly, the atoms have time to settle into an optimal, low-energy configuration — the strongest and most stable arrangement possible.

In the world of optimization, we can think of potential solutions as the arrangement of atoms. A “good” solution corresponds to a low-energy state, while a “bad” solution is a high-energy state. Just like the metal, our solution space is full of local traps (suboptimal solutions) that can prevent us from finding the best overall answer.

Simulated Annealing borrows this idea: by introducing a controlled amount of randomness (like the heat in annealing) and gradually reducing it (cooling), the algorithm can explore widely at first, then slowly focus on the most promising solutions. This makes it a powerful tool for tackling problems where traditional methods get stuck too easily.

The Core Idea Behind Simulated Annealing

At its heart, Simulated Annealing is about balancing exploration and exploitation. Unlike algorithms that always move toward better solutions, SA sometimes takes a step backwards — accepting a worse solution temporarily. This counterintuitive move enables the algorithm to escape local minima — regions where traditional methods often get stuck — and continue searching for a global optimum.

The algorithm relies on a few key components:

  • Solution Space: The set of all possible solutions to the problem. For example, in route optimization, each possible path is a point in this space.
  • Objective Function: A measure of how good a solution is. Lower cost or higher efficiency typically indicates a better solution.
  • Temperature (T): A control parameter that determines how likely the algorithm is to accept worse solutions. High temperatures allow more freedom to explore, while low temperatures make the algorithm more selective.
  • Cooling Schedule: The plan for reducing the temperature over time. Slow cooling encourages careful refinement, while rapid cooling can cause the algorithm to settle too early.

The brilliance of Simulated Annealing lies in its ability to mimic nature: just as a hot metal slowly cools to a stable state, the algorithm starts with high randomness to explore widely, then gradually focuses on the best solutions as the “temperature” drops.

This simple yet powerful idea enables SA to tackle problems that are too complex for exhaustive search and too irregular for standard gradient-based methods.
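
To make these components concrete, here is a minimal Python sketch of how they might look for the route-optimization example mentioned above. The city coordinates are made up, and the helper names (random_solution, evaluate, neighbor, cool) are purely illustrative; they match the helpers assumed by the pseudocode later in this post.

import math
import random

# Illustrative only: a tiny route-optimization instance with made-up coordinates.
cities = [(0, 0), (2, 4), (5, 2), (6, 6), (8, 3)]  # (x, y) positions of five cities

def random_solution():
    """Solution space: one possible visiting order (a permutation of city indices)."""
    tour = list(range(len(cities)))
    random.shuffle(tour)
    return tour

def evaluate(tour):
    """Objective function: total length of the closed tour (lower is better)."""
    total = 0.0
    for i in range(len(tour)):
        x1, y1 = cities[tour[i]]
        x2, y2 = cities[tour[(i + 1) % len(tour)]]
        total += math.hypot(x2 - x1, y2 - y1)
    return total

def neighbor(tour):
    """Neighbouring solution: swap the positions of two randomly chosen cities."""
    new_tour = tour[:]
    i, j = random.sample(range(len(new_tour)), 2)
    new_tour[i], new_tour[j] = new_tour[j], new_tour[i]
    return new_tour

def cool(T, alpha=0.95):
    """Cooling schedule: shrink the temperature by a fixed factor (geometric cooling)."""
    return T * alpha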

The Simulated Annealing Algorithm Step-by-Step

Simulated Annealing is surprisingly simple once you break it down. Here’s how it works in practice:

  1. Begin with an initial solution by selecting a random one from the solution space.
  2. Evaluate its quality – calculate the cost or objective function value.
  3. Generate a neighbouring solution – make a small random change to the current solution.
  4. Decide whether to accept the new solution:
    • If it’s better (lower cost), accept it.
    • If it’s worse, accept it with a probability that decreases as the temperature drops.
  5. Reduce the temperature according to the cooling schedule.
  6. Repeat steps 3–5 until the system “freezes” (temperature is low or no significant improvements are observed).

Pseudocode Example:

current_solution = random_solution()
current_cost = evaluate(current_solution)
T = initial_temperature

while T > final_temperature:
    new_solution = neighbor(current_solution)
    new_cost = evaluate(new_solution)
    if new_cost < current_cost:
        current_solution = new_solution
        current_cost = new_cost
    else:
        probability = exp(-(new_cost - current_cost) / T)
        if random() < probability:
            current_solution = new_solution
            current_cost = new_cost
    T = cool(T)
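
The pseudocode above translates almost line for line into runnable Python. The sketch below packages it as a reusable function that takes the problem-specific pieces as callables; the toy usage at the bottom minimises f(x) = x^2 + 10*sin(x), the same function used in the tutorial at the end of this post, and the temperature settings are illustrative rather than tuned recommendations.

import math
import random

def simulated_annealing(random_solution, evaluate, neighbor, cool,
                        initial_temperature=10.0, final_temperature=1e-3):
    """Generic SA loop; the four problem-specific pieces are passed in as functions."""
    current_solution = random_solution()
    current_cost = evaluate(current_solution)
    T = initial_temperature
    while T > final_temperature:
        new_solution = neighbor(current_solution)
        new_cost = evaluate(new_solution)
        if new_cost < current_cost:
            # Better solution: always accept.
            current_solution, current_cost = new_solution, new_cost
        elif random.random() < math.exp(-(new_cost - current_cost) / T):
            # Worse solution: accept with the Boltzmann probability exp(-ΔE / T).
            current_solution, current_cost = new_solution, new_cost
        T = cool(T)
    return current_solution, current_cost

# Toy usage: minimise a bumpy 1D function over a single real variable.
solution, cost = simulated_annealing(
    random_solution=lambda: random.uniform(-10, 10),
    evaluate=lambda x: x**2 + 10 * math.sin(x),
    neighbor=lambda x: x + random.uniform(-1, 1),
    cool=lambda T: T * 0.99,
)
print(f"Found x = {solution:.3f} with cost {cost:.3f}")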

Visual intuition:

  • Early on, the algorithm jumps around freely, exploring widely.
  • As temperature decreases, the “jumps” become smaller and more selective, homing in on promising solutions.

This balance of randomness and refinement is what makes Simulated Annealing effective at avoiding local traps and gradually approaching an optimal solution.

Why Accept Worse Solutions?

At first glance, it might seem counterintuitive to accept a worse solution. After all, shouldn’t an optimization algorithm always move toward improvement? The magic of Simulated Annealing lies in its willingness to take occasional steps backwards — and here’s why:

Escaping Local Minima:

In complex solution spaces, it’s easy to get stuck in a “valley” that isn’t the lowest point overall. By occasionally accepting worse solutions, SA can jump out of these local traps and continue searching for a better global solution.

This stochasticity gives SA the ability to traverse the optimization landscape more dynamically, potentially avoiding local minima and converging to better solutions.

The Role of Temperature:

SA uses a temperature parameter T to control the likelihood of accepting worse solutions. Early in the process, when T is high, the algorithm is more adventurous, exploring a wide range of solutions. As the system cools, the algorithm becomes more conservative, focusing on refinement.

Boltzmann Probability:

The probability of accepting a worse solution is calculated using the formula:

P(accept) = exp(-ΔE / T)

where ΔE is the increase in cost and T is the current temperature. A larger increase is less likely to be accepted, and lower temperatures make acceptance rarer. This mechanism mimics the behaviour of particles in a heated metal as it slowly cools into a stable structure.
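
To get a feel for the formula, the short sketch below prints exp(-ΔE / T) for a few illustrative cost increases and temperatures (the numbers themselves are arbitrary):

import math

# How likely is a worse move to be accepted at different temperatures?
for T in (10.0, 1.0, 0.1):
    for delta_E in (0.5, 2.0, 5.0):
        p = math.exp(-delta_E / T)
        print(f"T = {T:>4}: cost increase {delta_E} accepted with probability {p:.3f}")

At T = 10 even a cost increase of 5 is accepted about 60% of the time, while at T = 0.1 the same increase is essentially never accepted.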

By embracing occasional setbacks, Simulated Annealing cleverly balances exploration and exploitation. It doesn’t blindly climb uphill; it roams widely at first, then gradually homes in on the best solutions — much like a traveller cautiously descending into the deepest valley after surveying the landscape.

Practical Applications of Simulated Annealing

Simulated Annealing isn’t just a theoretical concept — it’s a powerful tool for solving real-world optimization problems across many fields. Here are some of the areas where it shines:

Travelling Salesman Problem (TSP):

Finding the shortest route that visits a set of cities and returns to the start is notoriously tricky. SA can efficiently explore different paths, temporarily accepting longer routes in order to ultimately reach a near-optimal tour.
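
For illustration, a common neighbourhood move for TSP-style problems is the 2-opt reversal, which reverses a segment of the tour and tends to remove route crossings. A minimal sketch, assuming the tour is represented as a list of city indices:

import random

def two_opt_neighbor(tour):
    """Reverse a randomly chosen segment of the tour (a classic 2-opt move)."""
    i, j = sorted(random.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

print(two_opt_neighbor([0, 1, 2, 3, 4, 5]))  # e.g. [0, 1, 4, 3, 2, 5]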

Circuit Design & Layout Optimization:

Engineers use SA to place components on a chip or route circuits efficiently, balancing multiple constraints like space, heat, and wiring length.

Scheduling and Resource Allocation:

From airline crew schedules to factory production lines, SA helps find solutions that satisfy multiple constraints while optimizing efficiency.

Machine Learning Hyperparameter Tuning:

Tuning parameters for models like neural networks can be seen as a high-dimensional optimization problem. SA can explore parameter combinations that gradient-based methods might miss.

Portfolio Optimization in Finance:

Investors can use SA to select a mix of assets that maximises returns while controlling risk, navigating a complex landscape of possibilities.

In essence, any problem where there’s a vast space of possible solutions and multiple local optima is a candidate for Simulated Annealing. Its flexibility and simplicity make it a go-to method when traditional optimization techniques struggle.

Tuning and Pitfalls of Simulated Annealing

While Simulated Annealing is powerful, its effectiveness depends heavily on how it’s tuned. A few key factors can make the difference between finding a near-optimal solution and wasting time wandering aimlessly through the solution space:

  1. Cooling Schedule:
    • The temperature must decrease gradually. Too fast → the algorithm “freezes” before finding a good solution. Too slow → long runtime with diminishing returns.
    • Common strategies include linear, geometric, or adaptive cooling schedules (see the sketch after this list).
  2. Initial Temperature:
    • It should be high enough to allow exploration, but not so high that almost every solution is accepted randomly.
    • It often requires experimentation or domain knowledge to set up effectively.
  3. Neighbourhood Function:
    • Determines how new candidate solutions are generated from the current one.
    • Small changes encourage fine-tuning, while large changes enhance exploration. The right balance is crucial.
  4. Stopping Criteria:
    • SA can stop when the temperature reaches a minimum, after a set number of iterations, or when improvements become negligible.
    • Choosing the wrong criterion can either cut the search short or waste computational resources.
  5. Common Pitfalls:
    • Premature convergence: Cooling too fast leads to suboptimal solutions.
    • Excessive randomness: Cooling too slowly or taking overly large steps can prevent convergence.
    • Problem-specific challenges: Poorly designed objective functions or neighbourhoods can make exploration ineffective.
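
As a rough sketch of what those cooling strategies can look like in code (the constants are illustrative, and the adaptive variant is just one simple acceptance-rate heuristic among many):

def linear_cooling(T, step=0.01):
    """Subtract a fixed amount each iteration, with a small positive floor."""
    return max(T - step, 1e-6)

def geometric_cooling(T, alpha=0.95):
    """Multiply the temperature by a constant factor each iteration."""
    return T * alpha

def adaptive_cooling(T, acceptance_rate, fast=0.90, slow=0.99):
    """Cool faster while many moves are still being accepted, slower once acceptance drops."""
    return T * (fast if acceptance_rate > 0.5 else slow)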

Tip: Many practical implementations combine SA with local search or other optimization methods to balance global exploration and local refinement, improving efficiency and solution quality.

Comparing Simulated Annealing to Other Methods

Simulated Annealing sits between simple local search methods and more complex global optimization algorithms. Understanding its strengths and limitations helps clarify when it’s the right tool for the job:

  1. Vs. Gradient Descent:
    • Gradient descent relies on derivatives and moves steadily downhill toward a local minimum.
    • SA doesn’t require derivatives and can escape local minima by occasionally accepting worse solutions.
    • Advantage: SA is particularly effective for problems involving rugged, discontinuous, or non-differentiable landscapes.
  2. Vs. Hill Climbing:
    • Hill climbing always moves to a better neighbouring solution.
    • It can easily get stuck in local minima (see the sketch after this list).
    • SA introduces randomness to enable “downhill” moves, thereby improving the likelihood of finding a global optimum.
  3. Vs. Genetic Algorithms (GA):
    • GA uses populations of solutions and mimics natural selection with crossover and mutation.
    • SA works with a single solution and relies on probabilistic acceptance of worse solutions.
    • SA is simpler and often faster for problems where a single solution trajectory is sufficient, while GA is better for exploring diverse solution spaces.
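
For contrast, here is a bare hill-climbing loop on the same kind of bumpy 1D function used in the tutorial below. Because it only ever accepts improvements, it settles into whichever valley it happens to start in:

import numpy as np

def f(x):
    return x**2 + 10*np.sin(x)

x = np.random.uniform(-10, 10)
for _ in range(1000):
    candidate = x + np.random.uniform(-1, 1)
    if f(candidate) < f(x):  # only ever accept strict improvements
        x = candidate

print("Hill climbing stopped at x =", x)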

Key Takeaway:

Simulated Annealing is a flexible, robust, and relatively simple optimization technique. It excels in problems where local search methods fail and population-based methods are unnecessarily complex. Its stochastic nature makes it particularly effective for combinatorial and highly irregular optimization problems.

Simulated Annealing How To Tutorial

Let’s see Simulated Annealing in action with a simple, visual example: finding the minimum of a 2D function. Imagine a landscape of hills and valleys representing the function’s values — the goal is to reach the lowest valley.

Step-by-step visualisation:

  1. Start randomly: The algorithm begins at a random point in the landscape. Early on, the “temperature” is high so that the algorithm can jump to higher points (worse solutions) as well as lower ones.
  2. Exploration: The algorithm moves around, occasionally climbing hills to escape local valleys. This prevents it from getting stuck in suboptimal minima.
  3. Cooling: As the temperature decreases, the algorithm becomes more selective, favouring moves that improve the solution.
  4. Convergence: Eventually, the system “cools” and the algorithm settles in the lowest valley it has found — ideally close to the global minimum.

Visual intuition:

  • Early steps involve large, random jumps that cover the landscape.
  • Middle steps: selective exploration of promising valleys.
  • Final steps: fine-tuning around the lowest found point.

Python Code Example:

import numpy as np

# Simple 1D function to minimise: f(x) = x^2 + 10*sin(x)
def f(x):
    return x**2 + 10*np.sin(x)

current_x = np.random.uniform(-10, 10)  # random starting point
T = 10.0                                # initial temperature
cooling_rate = 0.95                     # geometric cooling factor

for i in range(1000):
    # Propose a neighbouring solution with a small random step
    new_x = current_x + np.random.uniform(-1, 1)
    delta = f(new_x) - f(current_x)
    # Accept if better, or with probability exp(-delta/T) if worse
    if delta < 0 or np.random.rand() < np.exp(-delta / T):
        current_x = new_x
    T *= cooling_rate

print("Approximate minimum at x =", current_x)

This example illustrates how SA explores broadly first and refines later, giving a near-optimal solution without exhaustively checking every point.
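
One refinement worth making in practice is to remember the best solution seen so far, since the current point can still drift to a slightly worse spot even late in the run. A minimal variation on the loop above:

import numpy as np

def f(x):
    return x**2 + 10*np.sin(x)

current_x = np.random.uniform(-10, 10)
best_x = current_x                    # best solution seen so far
T, cooling_rate = 10.0, 0.95

for i in range(1000):
    new_x = current_x + np.random.uniform(-1, 1)
    delta = f(new_x) - f(current_x)
    if delta < 0 or np.random.rand() < np.exp(-delta / T):
        current_x = new_x
        if f(current_x) < f(best_x):
            best_x = current_x        # keep the best-so-far separately
    T *= cooling_rate

print("Best x found =", best_x, "with f(x) =", f(best_x))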

Summary and Key Takeaways

Simulated Annealing is a stochastic, nature-inspired optimization algorithm that balances exploration and exploitation to find near-optimal solutions in complex problem spaces. By occasionally accepting worse solutions early on and gradually reducing randomness, it avoids local traps and steadily homes in on the best possibilities.

Key points to remember:

  • Inspired by physics: Mimics how metals are heated and slowly cooled to reach stable, low-energy states.
  • Accepts setbacks strategically: Temporary “worse” moves help escape local minima.
  • Temperature is key: High temperature = more exploration; low temperature = more refinement.
  • Flexible and widely applicable: Works for combinatorial problems, scheduling, route optimization, hyperparameter tuning, and more.
  • Tuning matters: Cooling schedule, initial temperature, neighborhood design, and stopping criteria all affect performance.

In short, Simulated Annealing is a simple yet powerful tool for tackling optimization problems that are too complex for exhaustive search and too irregular for traditional methods. Its combination of randomness and refinement makes it a timeless technique in the toolbox of engineers, data scientists, and problem solvers alike.

About the Author

Neri Van Otten

Neri Van Otten is the founder of Spot Intelligence, a machine learning engineer with over 12 years of experience specialising in Natural Language Processing (NLP) and deep learning innovation. Dedicated to making your projects succeed.
