Restricted Boltzmann Machines Explained & How To Tutorial

How are RBMs used in deep learning? Examples, applications and how they are used in collaborative filtering, with a step-by-step tutorial in Python.

What are Restricted Boltzmann Machines?

Restricted Boltzmann Machines (RBMs) are generative neural network models that learn the probability distribution of a set of binary inputs. An RBM comprises two layers of nodes, a visible layer and a hidden layer, with each node being a binary unit that takes the value 0 or 1. The model is called restricted because connections run only between the two layers, never within a layer.
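For reference, the standard energy-based formulation of a binary RBM can be written as below. The symbols are notation introduced here rather than taken from the article: v and h are the visible and hidden vectors, W is the weight matrix, a and b are the visible and hidden biases, and σ is the logistic sigmoid.

```latex
E(\mathbf{v}, \mathbf{h}) = -\mathbf{a}^{\top}\mathbf{v} - \mathbf{b}^{\top}\mathbf{h} - \mathbf{v}^{\top} W \mathbf{h},
\qquad
p(\mathbf{v}, \mathbf{h}) = \frac{e^{-E(\mathbf{v}, \mathbf{h})}}{Z}

p(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_i v_i W_{ij}\Big),
\qquad
p(v_i = 1 \mid \mathbf{h}) = \sigma\Big(a_i + \sum_j W_{ij} h_j\Big)
```

Because there are no connections within a layer, each conditional factorises over units, which is what makes sampling one layer given the other cheap.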

The main goal of an RBM is to learn a set of weights between the visible and hidden layers that models the distribution of the input data. Training is typically done with contrastive divergence, an approximation to gradient ascent on the log-likelihood: the weights are updated to reduce the difference between statistics measured on the input data and the same statistics measured on samples reconstructed by the model.
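As a rough illustration, a single contrastive divergence step with one Gibbs iteration (often called CD-1) might look like the NumPy sketch below. The function name cd1_step, the toy data and the hyperparameter values are assumptions made for this example, not part of any library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_visible, b_hidden, learning_rate=0.1, rng=None):
    """One CD-1 update on a batch v0 of shape (batch, n_visible).

    Returns the updated (W, b_visible, b_hidden).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Positive phase: hidden probabilities and a binary sample given the data
    h0_prob = sigmoid(v0 @ W + b_hidden)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(v0.dtype)
    # Negative phase: one Gibbs step down to the visible layer and back up
    v1_prob = sigmoid(h0 @ W.T + b_visible)
    h1_prob = sigmoid(v1_prob @ W + b_hidden)
    # Approximate gradient: data statistics minus reconstruction statistics
    batch = v0.shape[0]
    W = W + learning_rate * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / batch
    b_visible = b_visible + learning_rate * (v0 - v1_prob).mean(axis=0)
    b_hidden = b_hidden + learning_rate * (h0_prob - h1_prob).mean(axis=0)
    return W, b_visible, b_hidden

# Toy usage: 8 random binary vectors with 6 visible and 3 hidden units
rng = np.random.default_rng(0)
v0 = (rng.random((8, 6)) < 0.5).astype(np.float64)
W = 0.01 * rng.standard_normal((6, 3))
W_new, bv_new, bh_new = cd1_step(v0, W, np.zeros(6), np.zeros(3), rng=rng)
```

The sketch uses probabilities rather than binary samples for the reconstruction, a common choice that reduces sampling noise in the update.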

One of the critical features of Restricted Boltzmann Machines is their ability to learn distributed representations of the input data in the hidden layer. This means that each hidden node represents a different combination of features from the input data, and the activations of these nodes can be used to reconstruct the input data.

RBMs have found many applications in machine learning, including dimensionality reduction, feature learning, and generative modelling. They have also been used as building blocks for more complex deep learning models, such as deep belief networks and deep autoencoders.

Restricted Boltzmann Machines (RBM) in deep learning

Restricted Boltzmann Machines (RBMs) are building blocks for various deep learning architectures, including deep belief networks, deep autoencoders, and deep Boltzmann machines. RBMs offer a way to learn unsupervised representations of the input data that can then be fed into a supervised learning algorithm for a classification or regression task.

In a deep belief network (DBN), multiple RBMs are stacked on top of each other, with the hidden layer of one RBM as the visible layer of the next RBM. Then, using an unsupervised learning algorithm like contrastive divergence, the DBN is trained layer by layer to learn a hierarchy of representations of the input data that are more and more abstract.
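The layer-by-layer stacking described above can be sketched in NumPy as follows. This is a minimal illustration under simplifying assumptions (full-batch CD-1, a handful of epochs, random toy data); train_rbm and train_stack are hypothetical helper names.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, learning_rate=0.1, seed=0):
    """Train a binary RBM with full-batch CD-1; returns (W, b_visible, b_hidden)."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_visible = np.zeros(n_visible)
    b_hidden = np.zeros(n_hidden)
    for _ in range(epochs):
        h0_prob = sigmoid(data @ W + b_hidden)
        h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
        v1_prob = sigmoid(h0 @ W.T + b_visible)
        h1_prob = sigmoid(v1_prob @ W + b_hidden)
        W += learning_rate * (data.T @ h0_prob - v1_prob.T @ h1_prob) / len(data)
        b_visible += learning_rate * (data - v1_prob).mean(axis=0)
        b_hidden += learning_rate * (h0_prob - h1_prob).mean(axis=0)
    return W, b_visible, b_hidden

def train_stack(data, layer_sizes):
    """Greedy layer-wise pretraining: each RBM is trained on the hidden
    probabilities produced by the RBM below it."""
    layers, layer_input = [], data
    for n_hidden in layer_sizes:
        W, b_visible, b_hidden = train_rbm(layer_input, n_hidden)
        layers.append((W, b_visible, b_hidden))
        # The hidden layer of this RBM becomes the visible layer of the next
        layer_input = sigmoid(layer_input @ W + b_hidden)
    return layers, layer_input

# Toy usage: 20 random binary vectors, stacked as 12 -> 8 -> 4 units
rng = np.random.default_rng(1)
toy_data = (rng.random((20, 12)) < 0.5).astype(float)
layers, top_features = train_stack(toy_data, layer_sizes=[8, 4])
```

Here the upper RBM is fed hidden probabilities rather than binary samples from the layer below; either option appears in practice, and this sketch picks the simpler one.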

Deep autoencoders are another type of deep learning architecture that can be built using RBMs. A stack of RBMs is first pretrained layer by layer. The stack is then unrolled into an encoder, which compresses the input into a low-dimensional code, and a decoder, which mirrors the encoder's weights and maps the code back to the input. The whole network is finally fine-tuned to minimise the reconstruction error between the original input and the reconstructed output, so the weights initialised by the RBMs end up giving a good representation of the input data.

Finally, deep Boltzmann machines (DBMs) are deep learning architectures with several hidden layers stacked above the visible layer. Unlike in a DBN, every connection in a DBM is undirected, so each hidden layer receives input from both the layer below and the layer above it, while the visible layer connects only to the first hidden layer. A DBM is usually pretrained greedily, one RBM-style layer at a time, and then trained jointly, learning a hierarchy of increasingly abstract representations of the input data.

Overall, Restricted Boltzmann Machines are a helpful tool for deep learning because they make it possible to learn complex, hierarchical representations of input data without labelled examples.

Restricted Boltzmann Machines example

Let’s consider an example to illustrate how a Restricted Boltzmann Machine (RBM) can learn to represent a distribution of binary inputs.

Suppose we have a dataset of binary images, where each image is a 28×28 pixel array of black and white pixels. Each pixel can be represented as a binary value of 0 or 1, with 0 indicating a black pixel and 1 marking a white pixel. Our goal is to use an RBM to learn a set of weights that can be used to generate new images that are similar to the images in our dataset.

Training the Restricted Boltzmann Machine

To train the RBM, we start by randomly initialising the weights between the visible and hidden layers. We then present each image in our dataset to the RBM as input and use an unsupervised learning algorithm, such as contrastive divergence, to update the weights so that they better represent the input data distribution.

During training, the RBM learns to represent the input data distribution by learning a set of hidden features that capture the most salient patterns in the input images. Each hidden unit in the RBM is a binary feature turned on when the input image has a particular combination of pixels.

Once the RBM is trained, we can generate new images similar to the images in our dataset. To develop a new image, we first randomly initialise the visible layer to a set of binary values. We then use the RBM to sample a set of binary values for the hidden layer and use these hidden values to sample a new set of binary values for the visible layer. We repeat this alternating process, known as Gibbs sampling, for a number of iterations until the visible layer settles into a new image.
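The sampling loop described above could be sketched like this in NumPy. The weights here are random stand-ins for a trained model, and gibbs_sample is a hypothetical helper name, so the output is noise rather than a digit; the point is only the alternation between layers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sample(W, b_visible, b_hidden, n_steps=100, rng=None):
    """Run alternating Gibbs sampling from a random binary visible vector."""
    rng = np.random.default_rng() if rng is None else rng
    n_visible = W.shape[0]
    # Random binary starting point for the visible layer
    v = (rng.random(n_visible) < 0.5).astype(float)
    for _ in range(n_steps):
        # Sample the hidden layer given the visible layer
        h_prob = sigmoid(v @ W + b_hidden)
        h = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Sample the visible layer given the hidden layer
        v_prob = sigmoid(h @ W.T + b_visible)
        v = (rng.random(v_prob.shape) < v_prob).astype(float)
    return v

rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((784, 256))  # stand-in for trained weights
sample = gibbs_sample(W, np.zeros(784), np.zeros(256), n_steps=10, rng=rng)
image = sample.reshape(28, 28)  # back to a 28x28 binary image
```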

In this way, the RBM can be used as a generative model for the distribution of binary images, allowing us to generate new images similar to the images in our dataset.

Restricted Boltzmann Machines applications

Restricted Boltzmann Machines (RBMs) have been used for many things in machine learning, such as:

Dimensionality reduction: RBMs can be used to learn a low-dimensional representation of high-dimensional data. By learning a compressed representation of the input data in the hidden layer, RBMs can reduce the dimensionality of the input while preserving most of the salient features.
Feature learning: RBMs can be used to learn a set of features from the input data, which can be used as input to a supervised learning algorithm for classification or regression tasks. RBMs can improve the performance of a supervised learning algorithm by teaching it a set of hidden features that capture the most basic patterns in the input data.
Collaborative filtering: RBMs can be used for recommender systems and collaborative filtering. By learning a set of hidden features that capture the preferences of users and the attributes of items, RBMs can predict the rating that a user would give to an item they have not yet rated.
Image generation: RBMs can generate new images similar to those in the training data. By sampling from the learned distribution, alternating between the hidden and visible layers, RBMs can produce new images that show the most salient features of the training data.
Natural language processing: RBMs have been used for language modelling and text generation. By learning a set of hidden features that capture the relationships between words in a corpus of text, RBMs can generate new text similar to the training data.
Anomaly detection: RBMs can detect anomalies in the input data. By learning a set of hidden features that capture the most common patterns in the input data, RBMs can identify input that deviates significantly from the learned distribution and flag it as an anomaly.
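For the anomaly detection use case in the list above, one common way to score an input is the free energy of a binary RBM, where a higher free energy corresponds to a lower unnormalised probability. The sketch below assumes already-trained parameters; the function names and the threshold are hypothetical and would need to be chosen on validation data.

```python
import numpy as np

def free_energy(v, W, b_visible, b_hidden):
    """Free energy F(v) of a binary RBM; p(v) is proportional to exp(-F(v))."""
    hidden_input = v @ W + b_hidden
    # np.logaddexp(0, x) computes log(1 + exp(x)) without overflow
    return -(v @ b_visible) - np.logaddexp(0.0, hidden_input).sum(axis=-1)

def flag_anomalies(data, W, b_visible, b_hidden, threshold):
    """Flag rows whose free energy exceeds a chosen threshold."""
    return free_energy(data, W, b_visible, b_hidden) > threshold

# Toy check with all-zero parameters and a single hidden unit:
# every input then has free energy -log(2)
W = np.zeros((2, 1))
scores = free_energy(np.array([[0.0, 1.0], [1.0, 1.0]]), W, np.zeros(2), np.zeros(1))
flags = flag_anomalies(np.array([[0.0, 1.0]]), W, np.zeros(2), np.zeros(1), threshold=0.0)
```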

Figure: Illustration of item-based collaborative filtering.

Overall, RBMs are a powerful and flexible tool in machine learning that has been used in many fields.

Restricted Boltzmann Machines collaborative filtering

What is it?

Restricted Boltzmann Machines have been successfully applied to collaborative filtering, which predicts user preferences for items based on their past behaviour or the behaviour of similar users. Collaborative filtering is often used in recommendation systems to show users things they might like based on what they already like.

In an RBM-based collaborative filtering approach, the input to the RBM is a matrix of user-item ratings, where each row represents a user and each column represents an item. Each entry in the matrix holds the rating a user gave an item, or a missing value if the user has not rated that item.
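A small NumPy sketch of such an input matrix is shown below. The ratings are made-up examples, and reducing the 1 to 5 scale to a binary like/dislike signal is a simplifying assumption chosen so that the data fits a binary RBM; a separate boolean mask records which entries were actually observed.

```python
import numpy as np

# Hypothetical (user, item, rating) triples on a 1-5 scale
ratings = [(0, 0, 5), (0, 2, 3), (1, 1, 4), (2, 0, 1), (2, 2, 2)]
n_users, n_items = 3, 3

rating_matrix = np.zeros((n_users, n_items))
observed = np.zeros((n_users, n_items), dtype=bool)
for user, item, rating in ratings:
    rating_matrix[user, item] = rating
    observed[user, item] = True  # mark this entry as not missing

# Simplifying assumption: binarise to like (rating >= 3) or dislike.
# Unobserved entries stay 0 and must be ignored via the mask.
liked = (rating_matrix >= 3).astype(float) * observed
```

Keeping the mask separate matters because a 0 in liked can mean either "disliked" or "not rated", and only the mask tells the two apart.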

Training the Restricted Boltzmann Machine

The RBM is trained to learn a set of hidden features that capture the preferences of users and the attributes of items. Each hidden unit in the RBM represents a binary feature that is activated when certain combinations of items and users are present in the input matrix. An unsupervised learning algorithm, like contrastive divergence, is used to train the RBM to learn a set of weights that approximately maximise the likelihood of the input matrix.

Using the RBM to make predictions

Once the Restricted Boltzmann Machine is trained, we can use it to predict the rating a user would give a new item. To do this, we first clamp the visible units to the user’s known ratings for all other items and use the RBM to compute, or sample, the values of the hidden layer. We then use these hidden values to reconstruct the visible unit corresponding to the new item. The expected value of that visible unit is then used as the predicted rating for the new item.
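The prediction step can be sketched as follows for a binary like/dislike RBM. This is a simplified illustration, not the exact method of any particular recommender: it uses hidden probabilities instead of samples, and predict_like_probability, the mask argument and the toy parameters are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_like_probability(user_likes, observed, W, b_visible, b_hidden, item):
    """Probability that the user likes `item`, using only the items they rated.

    user_likes: (n_items,) vector of 0/1 preferences
    observed:   (n_items,) boolean mask of rated items
    W:          (n_items, n_hidden) weight matrix
    """
    # Unrated items are masked out and contribute nothing to the hidden layer
    h_prob = sigmoid((user_likes * observed) @ W + b_hidden)
    # Expected value of the single visible unit for the target item
    return sigmoid(h_prob @ W[item] + b_visible[item])

# Toy usage with untrained (all-zero) parameters: the prediction is 0.5
user_likes = np.array([1.0, 0.0, 1.0, 0.0])
observed = np.array([True, True, True, False])
p = predict_like_probability(
    user_likes, observed, np.zeros((4, 2)), np.zeros(4), np.zeros(2), item=3
)
```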

This RBM-based collaborative filtering approach is effective in predicting user preferences for items and has been used in various recommendation systems, such as movie and music recommendation systems. The RBM-based approach can also handle missing data in the input matrix, a common problem in collaborative filtering.

Restricted Boltzmann Machine Tutorial in Python

Here is a step-by-step guide on how to use Python and TensorFlow to make a Restricted Boltzmann Machine (RBM):

Step 1: Import the necessary libraries

import numpy as np
import tensorflow as tf

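Step 2: Load the data

Load the input data that will be used to train the RBM. In this example, we will use the MNIST dataset of handwritten digits. Note that the input_data helper used below, like the placeholder and session API in the later steps, belongs to TensorFlow 1.x.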

from tensorflow.examples.tutorials.mnist import input_data 
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Step 3: Set the hyperparameters

Set the hyperparameters that will be used to train the RBM. In this example, we will use the following hyperparameters:

learning_rate = 0.01 
training_epochs = 50 
batch_size = 100 
n_hidden = 256 
n_visible = 784

Step 4: Define the input and output placeholders

Define the placeholders that will be used to feed data into the RBM: X holds the original images and X_noise holds a noise-corrupted copy of the same batch.

X = tf.placeholder("float", [None, n_visible], name='X') 
X_noise = tf.placeholder("float", [None, n_visible], name='X_noise')

Step 5: Define the RBM parameters

Define the RBM parameters, including the weights and biases for the visible and hidden layers.

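# stddev is passed by keyword: the second positional argument of tf.random_normal is the mean
W = tf.Variable(tf.random_normal([n_visible, n_hidden], stddev=0.01), name='W')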
b_visible = tf.Variable(tf.zeros([n_visible]), name='b_visible') 
b_hidden = tf.Variable(tf.zeros([n_hidden]), name='b_hidden')

Step 6: Define the activation functions

Define the activation functions for the visible and hidden layers.

def sigmoid(x): 
  return 1. / (1. + tf.exp(-x)) 
  
def sample_prob(probs): 
  return tf.nn.relu(tf.sign(probs - tf.random_uniform(tf.shape(probs))))
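The sample_prob helper may look cryptic, so here is a small NumPy sketch of the same trick. It checks that relu(sign(p - u)), with u drawn uniformly from [0, 1), gives the same 0/1 values as directly testing u < p, which is a Bernoulli sample with probability p. The array values are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(0)
probs = np.array([0.0, 0.25, 0.9, 1.0])
uniform = rng.random(probs.shape)  # one uniform draw per unit

# sign(...) gives -1, 0 or +1; relu maps -1 and 0 to 0, leaving a 0/1 sample
sign_relu = np.maximum(np.sign(probs - uniform), 0.0)

# Direct Bernoulli sampling with the same uniform draws
bernoulli = (uniform < probs).astype(float)
```

Both arrays are identical for any draw, so a unit with probability 0 is never on and a unit with probability 1 is always on.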

Step 7: Define the positive phase

Define the positive phase, which calculates the expected value of the hidden layer activations given the input data.

h_prob = sigmoid(tf.matmul(X_noise, W) + b_hidden) 
h_state = sample_prob(h_prob)

Step 8: Define the negative phase

Define the negative phase, which calculates the expected value of the visible layer activations given the hidden layer activations.

v_prob = sigmoid(tf.matmul(h_state, tf.transpose(W)) + b_visible) 
v_state = sample_prob(v_prob)

Step 9: Define the reconstruction error

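Define the reconstruction error, which measures the difference between the input data and the reconstructed data. The error is computed on the reconstruction probabilities v_prob rather than on the sampled v_state, because the sampling step has zero gradient and would leave the optimizer nothing to train on.

err = tf.reduce_mean(tf.square(X - v_prob))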

Step 10: Define the training operation

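Define the training operation, which updates the RBM parameters by plain gradient descent to minimize the reconstruction error. This is a simplification: it is not contrastive divergence, the usual training procedure for RBMs.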

train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(err)

Step 11: Train the RBM

Train the RBM on the input data.

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples / batch_size)
        for i in range(total_batch):
            batch_xs, _ = mnist.train.next_batch(batch_size)
            # Corrupt the input with Gaussian noise, then clip back to [0, 1]
            batch_xs_noise = batch_xs + 0.3*np.random.randn(*batch_xs.shape)
            batch_xs_noise = np.clip(batch_xs_noise, 0., 1.)
            batch_xs = np.clip(batch_xs, 0., 1.)
            # Run one update and fetch the current reconstruction error
            _, c = sess.run([train_op, err], feed_dict={X: batch_xs, X_noise: batch_xs_noise})
        # Report the error of the last batch in each epoch
        print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c))
    print("RBM training finished!")

    # Test the RBM on the test data (still inside the session, so the trained variables are available)
    batch_xs, _ = mnist.test.next_batch(10)
    batch_xs_noise = batch_xs + 0.3*np.random.randn(*batch_xs.shape)
    batch_xs_noise = np.clip(batch_xs_noise, 0., 1.)
    batch_xs = np.clip(batch_xs, 0., 1.)
    err_test = sess.run(err, feed_dict={X: batch_xs, X_noise: batch_xs_noise})
    print("RBM test error:", err_test)

In this example, we train the RBM for 50 epochs with a learning rate of 0.01 and a batch size of 100. We also add Gaussian noise to the input data and clip the values between 0 and 1. Finally, we test the RBM on 10 test images and calculate the reconstruction error.

Note that this is just a basic example of how to implement an RBM in TensorFlow. The RBM can be changed and improved in many ways, such as by adding regularization, using contrastive divergence for training, or using a different activation function.

Conclusion

A Restricted Boltzmann Machine (RBM) is a generative model that can learn a compressed representation of its input data. RBMs have been used in various applications, such as collaborative filtering, feature learning, and dimensionality reduction.

In this tutorial, we showed how to implement an RBM in TensorFlow using the MNIST dataset of handwritten digits. We loaded the data, defined the RBM parameters, the activation functions, the positive and negative phases, the reconstruction error and the training operation, and then trained the RBM on the input data.

While this tutorial provides a basic understanding of how to implement an RBM in TensorFlow, there are many ways to customize and optimize the RBM for different applications. With their ability to learn a compressed representation of the input data, RBMs can be a powerful tool for deep learning practitioners looking to build generative models for a variety of use cases.