Face Landmarks Detection With PyTorch

Can computers really understand the human face?

Abdur Rahman Kalim
Towards Data Science


A sample landmark detection on a photo by Ayo Ogunseinde taken from Unsplash

Ever wondered how Instagram applies stunning filters to your face? The software detects key points on your face and projects a mask on top. This tutorial will show you how to build such software using PyTorch.

Colab Notebook

The complete code can be found in the interactive Colab Notebook below.

Dataset

In this tutorial, we will use the official DLib dataset, which contains 6666 images of varying dimensions. Additionally, labels_ibug_300W_train.xml (shipped with the dataset) contains the coordinates of 68 landmarks for each face. The script below downloads the dataset and extracts it inside the Colab notebook.
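A minimal sketch of such a download cell is shown below; the archive URL is assumed to be dlib's public mirror of the iBUG 300-W dataset, so verify it before running.

```python
# Download and extract the iBUG 300-W landmark dataset.
# The URL below is assumed to be dlib's public mirror -- verify it before running.
import os
import tarfile
import urllib.request

DATASET_URL = "http://dlib.net/files/data/ibug_300W_large_face_landmark_dataset.tar.gz"
ARCHIVE = "ibug_300W_large_face_landmark_dataset.tar.gz"

if not os.path.exists("ibug_300W_large_face_landmark_dataset"):
    urllib.request.urlretrieve(DATASET_URL, ARCHIVE)  # fetch the archive
    with tarfile.open(ARCHIVE) as tar:
        tar.extractall()                              # unpack next to the notebook
```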

Here is a sample image from the dataset. We can see that the face occupies a very small fraction of the entire image. If we feed the full image to the neural network, it will also process the background (irrelevant information), making it difficult for the model to learn. Therefore, we need to crop the image and feed only the face portion.

Sample Image and Landmarks from the Dataset

Data Preprocessing

To prevent the neural network from overfitting the training dataset, we need to randomly transform the data. We will apply the following operations to the training and validation datasets (a sketch of these transforms follows the list):

  • Since the face occupies a very small portion of the entire image, crop the image and use only the face for training.
  • Resize the cropped face into a (224x224) image.
  • Randomly change the brightness and saturation of the resized face.
  • Randomly rotate the face after the above three transformations.
  • Convert the image and landmarks into torch tensors; normalize the image to [-1, 1] and zero-centre the landmarks (see the note below).
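A minimal sketch of one way to implement these transforms is shown below. It assumes image is a grayscale PIL image, landmarks is a NumPy array of shape (68, 2) in pixel coordinates, and crops holds the bounding-box attributes parsed from the dataset XML; the class and parameter names are illustrative rather than the notebook's exact ones.

```python
# Illustrative transforms for (image, landmarks) pairs -- not the notebook's
# exact implementation. `crops` is the dict of box attributes from the XML.
import random
import numpy as np
import torch
import torchvision.transforms.functional as TF

class Transforms:
    def crop_face(self, image, landmarks, crops):
        left, top = int(crops['left']), int(crops['top'])
        width, height = int(crops['width']), int(crops['height'])
        image = TF.crop(image, top, left, height, width)
        landmarks = landmarks - np.array([left, top])      # shift into the crop frame
        return image, landmarks

    def resize(self, image, landmarks, size=(224, 224)):
        w, h = image.size
        image = TF.resize(image, size)
        landmarks = landmarks * np.array([size[1] / w, size[0] / h])
        return image, landmarks

    def color_jitter(self, image):
        # random brightness/saturation change; landmarks are unaffected
        image = TF.adjust_brightness(image, random.uniform(0.7, 1.3))
        image = TF.adjust_saturation(image, random.uniform(0.7, 1.3))  # no-op on grayscale
        return image

    def rotate(self, image, landmarks, max_angle=10):
        angle = random.uniform(-max_angle, max_angle)
        image = TF.rotate(image, angle)                    # counter-clockwise on screen
        # rotate the landmarks about the image centre by the same angle
        theta = np.radians(-angle)                         # y axis points down in images
        rot = np.array([[np.cos(theta), -np.sin(theta)],
                        [np.sin(theta),  np.cos(theta)]])
        centre = np.array(image.size) / 2
        landmarks = (landmarks - centre) @ rot.T + centre
        return image, landmarks

    def __call__(self, image, landmarks, crops):
        image, landmarks = self.crop_face(image, landmarks, crops)
        image, landmarks = self.resize(image, landmarks)
        image = self.color_jitter(image)
        image, landmarks = self.rotate(image, landmarks)
        image = TF.to_tensor(image)
        image = TF.normalize(image, [0.5], [0.5])               # image in [-1, 1]
        landmarks = torch.tensor(landmarks / 224 - 0.5).float() # zero-centred landmarks
        return image, landmarks
```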

Dataset Class

Now that we have our transformations ready, let’s write our dataset class. The labels_ibug_300W_train.xml contains the image path, landmarks and coordinates for the bounding box (for cropping the face). We will store these values in lists to access them easily during training. In this tutorial, the neural network will be trained on grayscale images.
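A minimal sketch of such a dataset class follows. It assumes the dlib imglab XML layout (each image element holds a box with 68 part children, and file paths are relative to the XML file) and the Transforms class sketched above; the class name is illustrative.

```python
# Illustrative dataset class that parses labels_ibug_300W_train.xml.
# Assumes the dlib imglab XML layout and image paths relative to the XML file.
import os
import xml.etree.ElementTree as ET
import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class FaceLandmarksDataset(Dataset):
    def __init__(self, xml_path, transform=None):
        root_dir = os.path.dirname(xml_path)
        tree = ET.parse(xml_path)
        self.image_filenames, self.crops, self.landmarks = [], [], []
        self.transform = transform

        for image in tree.getroot().find('images'):
            box = image.find('box')                     # bounding box for cropping
            self.image_filenames.append(os.path.join(root_dir, image.attrib['file']))
            self.crops.append(box.attrib)               # left / top / width / height
            pts = [[int(part.attrib['x']), int(part.attrib['y'])] for part in box]
            self.landmarks.append(np.array(pts, dtype=np.float32))

    def __len__(self):
        return len(self.image_filenames)

    def __getitem__(self, index):
        # train on grayscale images, as noted above
        image = Image.open(self.image_filenames[index]).convert('L')
        landmarks = self.landmarks[index].copy()
        if self.transform:
            image, landmarks = self.transform(image, landmarks, self.crops[index])
        return image, landmarks
```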

Note: landmarks = landmarks - 0.5 is done to zero-centre the landmarks as zero-centred outputs are easier for the neural network to learn.

The output of the dataset after preprocessing will look something like this (landmarks have been plotted on the image).

Preprocessed Data Sample

Neural Network

We will use ResNet18 as the backbone and modify its first and last layers to suit our purpose. In the first layer, we change the input channel count to 1 so the network accepts grayscale images. Similarly, in the final layer, the output count becomes 68 * 2 = 136 so the model can predict the (x, y) coordinates of all 68 landmarks for each face.
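A minimal sketch of this modification, built on torchvision's ResNet18 (the class name is illustrative):

```python
# ResNet18 adapted for grayscale input and 68 (x, y) landmark outputs.
import torch.nn as nn
from torchvision import models

class Network(nn.Module):
    def __init__(self, num_landmarks=68):
        super().__init__()
        self.model = models.resnet18()
        # first layer: accept a single (grayscale) channel instead of RGB
        self.model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2,
                                     padding=3, bias=False)
        # last layer: predict (x, y) for each of the 68 landmarks
        self.model.fc = nn.Linear(self.model.fc.in_features, num_landmarks * 2)

    def forward(self, x):
        return self.model(x)
```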

Training the Neural Network

We will use the Mean Squared Error between the predicted landmarks and the true landmarks as the loss function. Keep in mind that the learning rate should be kept low to avoid exploding gradients. The network weights will be saved whenever the validation loss reaches a new minimum value. Train for at least 20 epochs to get the best performance.
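A minimal sketch of such a training loop is shown below; train_loader, valid_loader, the Adam optimizer, the learning rate, and the checkpoint filename are assumptions rather than the notebook's exact settings, and Network is the class from the sketch above.

```python
# Illustrative training loop: MSE loss, a small learning rate, and
# checkpointing whenever the validation loss hits a new minimum.
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
network = Network().to(device)
criterion = nn.MSELoss()
optimizer = optim.Adam(network.parameters(), lr=1e-4)   # keep the learning rate small

best_valid_loss = float('inf')
for epoch in range(1, 21):                              # train for at least 20 epochs
    network.train()
    for images, landmarks in train_loader:
        images = images.to(device)
        landmarks = landmarks.view(landmarks.size(0), -1).to(device)  # (B, 136)
        optimizer.zero_grad()
        loss = criterion(network(images), landmarks)
        loss.backward()
        optimizer.step()

    network.eval()
    valid_loss = 0.0
    with torch.no_grad():
        for images, landmarks in valid_loader:
            images = images.to(device)
            landmarks = landmarks.view(landmarks.size(0), -1).to(device)
            valid_loss += criterion(network(images), landmarks).item()
    valid_loss /= len(valid_loader)

    if valid_loss < best_valid_loss:                    # save on a new best
        best_valid_loss = valid_loss
        torch.save(network.state_dict(), 'face_landmarks.pth')
```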

Predict on Unseen Data

Use the code snippet below to predict landmarks in unseen images.
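A minimal sketch of that inference step is shown below. It assumes the trained weights were saved as face_landmarks.pth (as in the training sketch above), that OpenCV's pre-trained frontal-face Haar cascade XML sits in the working directory, and that unseen_image.jpg is a placeholder path.

```python
# Illustrative inference: detect faces with a Haar cascade, crop and
# resize each face to 224x224, predict landmarks, and draw them back
# onto the original image.
import cv2
import torch
import torchvision.transforms.functional as TF

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

network = Network()
network.load_state_dict(torch.load('face_landmarks.pth', map_location='cpu'))
network.eval()

image = cv2.imread('unseen_image.jpg')                  # placeholder input path
grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(grayscale, 1.1, 4)

with torch.no_grad():
    for (x, y, w, h) in faces:
        face = cv2.resize(grayscale[y:y + h, x:x + w], (224, 224))
        face = TF.normalize(TF.to_tensor(face), [0.5], [0.5]).unsqueeze(0)
        landmarks = network(face).view(-1, 2)
        # undo the normalization: map zero-centred outputs back to pixel coordinates
        landmarks = (landmarks + 0.5) * torch.tensor([w, h]) + torch.tensor([x, y])
        for (px, py) in landmarks.numpy():
            cv2.circle(image, (int(px), int(py)), 2, (0, 255, 0), -1)

cv2.imshow('Face landmarks', image)                     # not supported in Colab
cv2.waitKey(0)
```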

The above code snippet will not work in the Colab notebook, since some OpenCV functionality (such as cv2.imshow) is not yet supported there. Run this cell on your local machine instead.

OpenCV's Haar Cascade classifier is used to detect faces in an image. Object detection with Haar cascades is a machine-learning approach in which a cascade function is trained on many positive and negative example images. OpenCV already ships with many pre-trained classifiers for faces, eyes, pedestrians, and more. In our case, we will use the frontal-face classifier, for which you need to download the pre-trained classifier XML file and save it in your working directory.

Face Detection

Detected faces in the input image are then cropped, resized to (224, 224) and fed to our trained neural network to predict landmarks in them.

Landmarks Detection on the Cropped Face

The predicted landmarks in the cropped faces are then overlaid on top of the original image. The result is the image shown below. Pretty impressive, right?

Final Result

Similarly, landmarks detection on multiple faces:

Here, you can see that the OpenCV Haar Cascade classifier has detected multiple faces, including a false positive (a fist is classified as a face), so the network has plotted landmarks on it as well.

That’s all folks!

If you made it this far, hats off to you! You just trained your very own neural network to detect face landmarks in any image. Try predicting face landmarks on your webcam feed!

If you have any suggestions, please leave a comment. I write articles regularly so you should consider following me to get more such articles in your feed.

Visit my website to learn more about me and my work.
