Handwritten Digit Recognition using a Recurrent Neural Network in PyTorch

Kiran Prajapati
3 min read · Jun 27, 2020


This tutorial demonstrates how to use a recurrent neural network to predict handwritten digits in PyTorch.

After completing this tutorial, you will know:

  • How to load the MNIST dataset in PyTorch.
  • How to implement a recurrent neural network for MNIST.
  • How to evaluate the performance of the model.

Since we will be using PyTorch, let's import torch, along with the nn module we will need for the model:

import torch
import torch.nn as nn

PyTorch provides a package called torchvision that contains datasets, image transformations, and models. To load the data, add the following lines:

import torchvision.datasets as datasets
import torchvision.transforms as transforms

train_dataset = datasets.MNIST(root='../../data/',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)
test_dataset = datasets.MNIST(root='../../data/',
                              train=False,
                              transform=transforms.ToTensor())
  • The root parameter gives the location where the data is saved.
  • Setting train=True initializes the MNIST training split; train=False gives the test split.
  • The dataset is downloaded only if it is not already present in the folder, which is controlled by the download=True parameter.
  • The transform parameter manipulates the images; here transforms.ToTensor() converts each image to a tensor and scales its pixel values to the range [0, 1] (a quick sanity check follows this list).
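To confirm what a single sample looks like, a minimal check against the train_dataset defined above:

img, label = train_dataset[0]
print(img.shape)                           # torch.Size([1, 28, 28]): channel, height, width
print(img.min().item(), img.max().item())  # pixel values lie in [0.0, 1.0]
print(label)                               # an integer class label from 0 to 9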

Let's understand the data. Each image is a 28 by 28 pixel (784 pixels total) grayscale square. The training dataset contains 60,000 images and a separate test dataset contains 10,000 images.

The task is to classify a given image of a handwritten digit into one of 10 classes that represent values from 0 to 9.

Sample images from MNIST dataset
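An RNN expects sequential input, so we need a way to present an image as a sequence. The trick used later in the training loop is to treat each 28 x 28 image as a sequence of 28 rows, where each row is a 28-dimensional feature vector. A minimal sketch with dummy data:

# A dummy batch of 4 grayscale images: (batch, channel, height, width)
images = torch.rand(4, 1, 28, 28)

# View each image as 28 time steps of 28 features each,
# i.e. (batch, sequence_length, input_size), the layout our LSTM will expect
sequences = images.reshape(-1, 28, 28)
print(sequences.shape)  # torch.Size([4, 28, 28])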

Now let's create an iterable that returns the data in mini-batches; this is handled by DataLoader in PyTorch (batch_size is set with the other hyper-parameters below).

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
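As a quick check (a sketch, assuming batch_size = 64 as set below), we can pull one batch from the loader and inspect its shape:

images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([64, 1, 28, 28]): batch, channel, height, width
print(labels.shape)  # torch.Size([64])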

To start building our own neural network model, we define a class that inherits from PyTorch's base class, nn.Module. For this model we will use one LSTM layer followed by a fully connected layer.

We then define the forward pass as a class method, forward(). The operations inside forward() execute sequentially: we first pass a zero-initialized hidden state and cell state, together with the input, through the LSTM layer, and then feed the output of the last time step to the fully connected layer.

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers,
                 num_classes):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Zero-initialized hidden state and cell state:
        # (num_layers, batch_size, hidden_size)
        h = torch.zeros(self.num_layers, x.size(0),
                        self.hidden_size).to(device)
        c = torch.zeros(self.num_layers, x.size(0),
                        self.hidden_size).to(device)

        # out: (batch_size, sequence_length, hidden_size)
        out, _ = self.lstm(x, (h, c))

        # Classify using the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out

To set the device dynamically, we will create the following helper functions and a small wrapper class.

def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')

def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)

Let's call the helper to pick the device.

device = get_default_device()
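Note that DeviceDataLoader is not used in the rest of this walkthrough, since the training loop below moves each batch to the device explicitly. Wrapping the loaders would look like this (the names train_dl and test_dl are illustrative):

# Optional: wrap the loaders so every batch lands on the device automatically
train_dl = DeviceDataLoader(train_loader, device)
test_dl = DeviceDataLoader(test_loader, device)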

Now we will instantiate the model and define the hyper-parameters.

# Hyper-parameters
learning_rate = 0.001
sequence_length = 28  # rows per image (time steps)
hidden_size = 128
num_classes = 10
batch_size = 64
input_size = 28       # pixels per row (features per time step)
num_layers = 2
num_epochs = 3

model = RNN(input_size, hidden_size, num_layers, num_classes)
to_device(model, device)  # nn.Module.to() moves the model in place
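As a sanity check (a sketch with random data), one forward pass on a dummy batch should yield one row of 10 class scores per image:

dummy = torch.rand(4, sequence_length, input_size).to(device)  # 4 fake "images"
print(model(dummy).shape)  # torch.Size([4, 10])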

We will use CrossEntropyLoss, since our task is to classify the digits, along with the common Adam optimizer at a learning rate of 0.001.

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
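Note that nn.CrossEntropyLoss expects raw, unnormalized scores (logits) together with integer class labels, and applies log-softmax internally. A toy example with made-up numbers:

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])  # 2 samples, 3 classes
targets = torch.tensor([0, 2])            # correct class indices
print(criterion(logits, targets))         # small scalar loss: both predictions are right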

Let's begin training.

# Train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Reshape each image to a sequence of rows: (batch, 28, 28)
        images = images.reshape(-1, sequence_length, input_size).to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

Finally, let's evaluate the model on the test set.

# Evaluate the model
model.eval()
with torch.no_grad():
    right = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, sequence_length,
                                input_size).to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        right += (predicted == labels).sum().item()

    print('Test Accuracy of the model on the 10000 test images: {} %'
          .format(100 * right / total))
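And a short sketch of how one might classify a single test image with the trained model:

image, label = test_dataset[0]  # one test image and its true label
image = image.reshape(1, sequence_length, input_size).to(device)
with torch.no_grad():
    pred = model(image).argmax(dim=1).item()
print('Predicted: {}, actual: {}'.format(pred, label))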
