The Matlab code for this post is provided on Mathworks at:
It also requires additional files such as Sift.exe; please email us at firstname.lastname@example.org to get a copy.
Face recognition is an important practical application. Law enforcement agencies around the world use it to identify criminals, and high-security facilities use it to restrict access to authorized employees. Human beings can recognise and distinguish faces easily, whereas the task is far more challenging for computers. Many techniques can be used for face recognition. This post concentrates on the widely used Scale Invariant Feature Transform (SIFT) proposed by Lowe.
Face recognition consists of determining whether a face image of a person matches face images stored in a database. It is a difficult problem because of factors such as varying illumination, facial expression and rotation. However, SIFT features are invariant to image rotation and scaling and robust to changes in lighting, 3D camera viewpoint and partial occlusion, which makes them suitable for face recognition.
The SIFT algorithm transforms an image into scale-invariant coordinates relative to local features. It consists of four main stages of computation: scale-space extrema detection, keypoint localization, orientation assignment and keypoint descriptor generation. Each SIFT feature extracted from an image consists of a 1×128 descriptor vector, which is orientation-invariant, and a 1×4 vector holding the keypoint location (x and y coordinates), scale and orientation. For comparing images, only the 1×128 vectors are used here.
Figure 1 below shows an image with the SIFT features.
Figure 1 – Image with Extracted SIFT features
Details of the algorithm
The face recognition algorithm was written in Matlab and is based on the code provided by Lowe. SIFT usually generates a large number of features, and the number generated from an image cannot be predicted, so different face images tend to yield different numbers of features. To overcome this problem, a simple classification scheme such as Nearest Neighbour is used. The code provided by Lowe already contains a classification approach based on the k-d tree algorithm.
The image set provided for this post contains images of 18 persons. Each person in the set has at least 10 images. Ten images of each person were put in a training set and the rest of the images were used for test images. So the training set consists of 180 images.
For face recognition, the SIFT features of the training images are extracted and stored in a database. Each training image is also assigned a group number, such that face images of the same person share the same group number. To match an image from the test set against the training images, the SIFT features of the test image are extracted and each one is compared individually with the training database. The best matching features are found by calculating the Euclidean distance between the feature vectors. A feature from the test image matches a feature in a training image if the distance between the two features is the smallest among all candidates and is below a threshold.
The training image with the highest number of matches is said to correspond to the test image. The test image is then assigned to the group number of the training image. The group number therefore tells the user which person is the closest match with the test image.
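The matching and voting steps described above can be sketched roughly as follows. This is a minimal illustration and not the code that accompanies this post: the tiny 4-dimensional descriptors, the variable names and the threshold value maxDist are all assumptions chosen so the example runs on its own (real SIFT descriptors have 128 values), and it uses a plain brute-force search.

```matlab
% Synthetic stand-ins for SIFT descriptors (real ones would have 128 columns).
% Descriptors 1-2 come from training image 1, 3 from image 2, 4-5 from image 3.
trainDesc    = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1; 1 1 0 0];
trainImageId = [1; 1; 2; 3; 3];   % training image each descriptor came from
imageGroup   = [1; 1; 2];         % group (person) number of each training image
testDesc     = [0.1 0.9 0 0; 0.9 0.1 0 0; 0 0 0 5];  % features of the test image
maxDist      = 0.5;               % assumed distance threshold

votes = zeros(numel(imageGroup), 1);   % number of matches per training image
for k = 1:size(testDesc, 1)
    % Euclidean distance from test feature k to every training feature
    diffs = trainDesc - repmat(testDesc(k, :), size(trainDesc, 1), 1);
    dists = sqrt(sum(diffs .^ 2, 2));
    [dMin, idx] = min(dists);
    % Count a match only if the nearest neighbour is below the threshold
    if dMin < maxDist
        votes(trainImageId(idx)) = votes(trainImageId(idx)) + 1;
    end
end

[~, bestImage] = max(votes);         % training image with the most matches
bestGroup = imageGroup(bestImage);   % group number assigned to the test image
fprintf('Best match: training image %d (person %d)\n', bestImage, bestGroup);
```

In this toy data the first two test features fall close to descriptors of training image 1, while the third has no neighbour within the threshold and is ignored, so the test image is assigned to the group of training image 1.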
Figure 2 – Face detection and crop
In cluttered images such as Figure 1 above, the SIFT algorithm extracts features from the background as well, and many of these background features produce false matches. To prevent this, a face detection algorithm based on Viola-Jones object detection was written and included in the code. The object detection uses OpenCV trained classifiers. The face detection locates the face region in the image and then crops the image to the detected region. The process is shown in Figure 2. Face detection not only reduces the number of features/descriptors but also speeds up the image matching computation.
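As a rough illustration of the cropping step only (the detection itself is not shown), the sketch below crops a matrix to a bounding box. The [x y width height] layout of bbox and the synthetic image are assumptions made for illustration and are not taken from the code that accompanies this post.

```matlab
img  = reshape(1:48, 6, 8);   % synthetic 6-by-8 "image" (rows by columns)
bbox = [3 2 4 3];             % assumed detector output: [x y width height]

% Convert the box to row and column ranges (x runs along columns, y along rows)
rows = bbox(2) : bbox(2) + bbox(4) - 1;
cols = bbox(1) : bbox(1) + bbox(3) - 1;

% Discard any indices that fall outside the image
rows = rows(rows >= 1 & rows <= size(img, 1));
cols = cols(cols >= 1 & cols <= size(img, 2));

face = img(rows, cols);       % cropped region passed on to feature extraction
fprintf('Cropped region is %d-by-%d pixels\n', size(face, 1), size(face, 2));
```

Because only the cropped region is passed to SIFT, background keypoints outside the box never enter the database.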
To create the training database, the training images are run in a script one by one. The face detection algorithm is run on the image and it is cropped to the detected region. The SIFT algorithm is then run on the cropped image and the resulting features are saved in a structure.
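One possible layout for such a structure is sketched below. The field names (name, group, descriptors), the file names and the constant placeholder descriptors are hypothetical; in practice each descriptors field would hold the K-by-128 matrix produced by SIFT for the cropped image.

```matlab
% Empty struct array with one entry per training image (field names assumed)
db = struct('name', {}, 'group', {}, 'descriptors', {});

db(1).name = 'person1_a.pgm';  db(1).group = 1;  db(1).descriptors = ones(3, 128);
db(2).name = 'person2_a.pgm';  db(2).group = 2;  db(2).descriptors = zeros(2, 128);

% Stack all descriptors into one matrix and remember which image each row
% came from, so a matched row can be traced back to its training image.
allDesc = vertcat(db.descriptors);
imageId = zeros(size(allDesc, 1), 1);
row = 0;
for i = 1:numel(db)
    n = size(db(i).descriptors, 1);
    imageId(row + 1 : row + n) = i;
    row = row + n;
end
fprintf('%d descriptors from %d training images\n', size(allDesc, 1), numel(db));
```

Keeping the group number per image rather than per descriptor means a match only needs to look up the image index to recover the person.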
To compare a test image with the training data, the test image is first run through the face detection and the SIFT features are then extracted from the cropped image. The keypoints are then classified. For this post, the classifier is based on the code by Lowe.
The aim of this post is to show how local features can be used to design a simple face recognition algorithm in Matlab.
The Matlab code is provided on Mathworks at:
First of all, you need to obtain some face images (training images) to get the algorithm to work. A good source of face images is:
The code only accepts images in ‘*.pgm’ format, so you will need to convert the images to .pgm format.
A = imread('yalefaces/subject01.glasses');
imwrite(A, 'subject01_glasses.pgm');  % output file name is only an example
In the above example, we are reading an image from the yalefaces database and saving it in .pgm format.
To create the training data, run the batchSift.m function. This function scans all the images saved in our Training folder and detects the face. The image is then cropped to leave only the face area. In our example, the cropped images are saved in a folder called ‘croptrain’. The training data is then saved in a .mat file.
We now have the training data available and can compare an image with our stored data. A law enforcement agency such as the police, for example, would have a database of thousands of known criminals and would compare a subject image with it to see whether the subject is already known to them.
Our subject photo is stored in the testset folder, labelled ‘image_001.pgm’
To compare the subject photo with the database, type this at the Matlab command line:
Matlab will then compare /testset/image_001.pgm with the training database and find the suspect.
You should see two outputs:
Of course in a police database, it would probably display the person’s name as opposed to person 1. To make things easier in our case, we gave the suspects/training images a number.
David G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
Paul Viola and Michael Jones, “Rapid object detection using a boosted cascade of simple features,” Conference on Computer Vision and Pattern Recognition (2001).
Paul Viola and Michael Jones, “Robust real-time object detection,” Second International Workshop on Statistical and Computational Theories of Vision: Modeling, Learning, Computing, and Sampling, Vancouver, Canada (2001).
Give the software a go! You could also use it to detect faces in your personal photos. If you have CCTV in your house or business, you could build an automated system that compares all the captured still images against known photos of you and your family and warns you if an image does not match, meaning that a stranger was detected by your CCTV system.
We currently use this technique at BTS instead of paying an external company to go through our hours of camera surveillance. We have so far detected only one stranger entering our private offices… it was the postman! 🙂