Of the many ways that artificial intelligence can be applied, few have caused more of a stir than facial recognition. Despite some reservations about this technology, facial recognition on its own is nothing to fear. It has no will of its own; like other tools developed by humanity, it is only as harmful as the way we choose to wield it.
Below, Lambert Wixson, Clarifai’s Head of Applied Machine Learning, will offer some clarity on the subject. With his guidance, you will learn the basics of how this application of AI works.
What is facial recognition?
Facial recognition is the ability of an algorithm to take images from a camera, detect faces in those images, and compare each face to a database of people so that it can identify the person.
It's important to note that there are several distinct use cases for face recognition.
Use Case #1: Face Verification
In some applications, you may already know the subject's identity. For example, at an ATM, a person may have their identity confirmed through facial recognition after inserting their card. In this case, the technology just compares the captured image with a model of the card owner's face to see whether the two match. The same situation occurs when you use your face to unlock your phone. Essentially, the algorithm doesn't have to compare the captured face against a population of faces from different people, only against the card or phone owner's face.
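As a rough illustration, here is a minimal Python sketch of the 1:1 comparison that verification requires. It assumes the captured and enrolled faces have already been reduced to feature vectors (a process described later in this article); the 0.6 threshold is an illustrative placeholder, not a recommended value.

```python
import numpy as np

def verify(candidate_vec, enrolled_vec, threshold=0.6):
    """1:1 verification: does the captured face match the enrolled owner?

    Both arguments are feature vectors produced by a face-embedding
    pipeline. The threshold is a placeholder; real systems tune it on
    validation data.
    """
    # Cosine similarity: near 1.0 for very similar vectors.
    sim = np.dot(candidate_vec, enrolled_vec) / (
        np.linalg.norm(candidate_vec) * np.linalg.norm(enrolled_vec))
    return sim >= threshold
```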
Use Case #2: Closed-set Face Recognition
This use case arises when you know that all the face images that will be presented to the algorithm are from people that are in your database. That is, there will be no faces from "unknown" subjects, so the algorithm can avoid having to say "this face is not in the database." This might occur, for example, if you are working with a set of archival images like portraits of world leaders in a particular year, where you are guaranteed to know all the subjects in the photos.
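Because every query is known to be enrolled, closed-set identification can simply return the best-matching identity, with no rejection option. A minimal sketch, again assuming precomputed feature vectors:

```python
import numpy as np

def closed_set_identify(query_vec, database):
    """Closed-set identification: every query subject is known to be
    in `database` (a dict mapping names to feature vectors), so we
    return the best match with no 'unknown' option and no threshold."""
    def cosine(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(database, key=lambda name: cosine(query_vec, database[name]))
```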
Use Case #3: Open-set Face Recognition
This is the most general, and hardest, use case for this technology. Here, the algorithm must take a face image and either match it to a specific person in its database or declare, "This face is not in the database." Many factors can affect the difficulty of a recognition scenario.
A key question for developers using this technology is what range of angles the subjects' faces will present to the camera. For instance, will the subject always be facing the camera, or does the algorithm also have to work when the subject is looking away, such as when the face is in profile? The angle a subject's face makes with the camera is called yaw. A face looking straight at the camera has a yaw of 0 degrees; a face in profile view has a yaw of 90 degrees. The more you can constrain the yaw in your application, the more reliable face recognition can be.
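In practice, constraining yaw often amounts to a simple gate on each detection. A trivial sketch, assuming some upstream component has already estimated the head pose (the 30-degree limit is an illustrative placeholder):

```python
def within_yaw_limit(estimated_yaw_degrees, max_yaw_degrees=30.0):
    """Gate a detection by head pose. A yaw of 0 degrees means the face
    looks straight at the camera; 90 degrees is a full profile view.
    The limit should reflect how much pose variation the downstream
    recognizer tolerates."""
    return abs(estimated_yaw_degrees) <= max_yaw_degrees
```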
How does facial recognition work?
Typically, a face recognition algorithm is divided into stages, organized as a pipeline from one stage to the next. We will call this the detect > align > extract pipeline. Each stage is explained below:
Stage 1: Face Detection Module
The face detection module’s job is to find each face in an image. It doesn't have to associate each face with a specific person; it just needs to find all the faces.
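As a concrete illustration, here is a minimal detection sketch using OpenCV's bundled Haar-cascade detector. Modern systems typically use deep-learning detectors instead, but the interface is the same: an image goes in, and a list of face boxes comes out.

```python
import cv2  # OpenCV; pip install opencv-python

# OpenCV ships this classic Haar-cascade face model; deep-learning
# detectors are more accurate but expose a similar image-in, boxes-out
# interface.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(image_bgr):
    """Return (x, y, width, height) boxes, one per detected face."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    return detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```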
Stage 2: Alignment Module
An alignment module takes a detected face and normalizes it to a frame of reference. For example, if the person's head is cocked to one side, the alignment module may try to estimate this "roll" and rotate the face image so that it is vertical.
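One common way to estimate and remove roll is from eye landmarks. The sketch below assumes a landmark detector has already located the two eyes; it then rotates the crop so the eye line is horizontal.

```python
import numpy as np
import cv2

def align_by_roll(face_img, left_eye, right_eye):
    """De-roll a face crop so the eyes lie on a horizontal line.

    `left_eye` and `right_eye` are (x, y) landmark coordinates; finding
    them is the job of a landmark detector, which this sketch assumes
    has already run.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    roll_degrees = np.degrees(np.arctan2(dy, dx))  # tilt of the eye line
    h, w = face_img.shape[:2]
    rotation = cv2.getRotationMatrix2D((w / 2, h / 2), roll_degrees, 1.0)
    return cv2.warpAffine(face_img, rotation, (w, h))
```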
Stage 3: Feature Extractor
The feature extractor is a deep neural network (a type of artificial neural network) that takes the normalized image and computes a set of feature responses that characterize the face. This network is typically trained in advance on databases of hundreds of thousands to millions of face images.
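The sketch below shows the mechanics of turning an aligned face crop into a fixed-length vector. It uses a generic ImageNet-pretrained ResNet from torchvision purely as a stand-in; real face embeddings come from networks trained specifically on large face databases, often with specialized loss functions.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Stand-in extractor: a generic ResNet with its classifier removed,
# so it outputs a 512-dimensional feature vector instead of class scores.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.ToTensor(),
    T.Resize((224, 224)),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(aligned_face_rgb):
    """Map one aligned face image (H x W x 3, RGB) to a 512-d vector."""
    batch = preprocess(aligned_face_rgb).unsqueeze(0)  # add batch dim
    vec = backbone(batch).squeeze(0)
    return vec / vec.norm()  # unit-normalize for cosine comparison
```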
Stage 4: Vectors and Match Metrics
To build the database for the population of people in your application, you must capture multiple images of each person, ideally showing a variety of facial expressions and head rotations. You then run each of these images through the detect > align > extract pipeline, so that each face is reduced to a set of feature responses. Typically these feature responses are stored in the database as an ordered list called a vector.
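A minimal enrollment sketch, assuming a `pipeline` callable that wraps the detect > align > extract stages above. Averaging each person's vectors into a single template is just one simple design; some systems instead keep every vector.

```python
import numpy as np

def enroll(person_name, face_images, pipeline, database):
    """Add one person to the database.

    `pipeline` maps one image to one feature vector. The per-person
    vectors are averaged into a single normalized template here, an
    assumption made for simplicity.
    """
    vectors = np.stack([np.asarray(pipeline(img)) for img in face_images])
    template = vectors.mean(axis=0)
    database[person_name] = template / np.linalg.norm(template)
```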
Once the database has been constructed, many algorithms also learn a "match metric," which tells them how to compare the vector of one face image to the vector of another. Others use predefined match metrics. The ideal match metric assigns high scores to pairs of face vectors from the same person, and low scores to pairs that come from different people.
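For example, two common predefined match metrics look like this; a learned metric would replace these with a function fit to training data.

```python
import numpy as np

def cosine_similarity(a, b):
    """Higher is better; 1.0 means the vectors point the same way."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def negative_euclidean(a, b):
    """Distance turned into a score so that, as with cosine, higher
    means a better match."""
    return -float(np.linalg.norm(a - b))
```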
To summarize, in operation, a facial recognition algorithm works as follows:
- A newly-captured image is run through the detect > align > extract pipeline.
- Its resulting feature vector is then compared, using the match metric, to the feature vector of each image in the database. (Additional data structures can be used to avoid checking every image in the database, which can be slow if the database holds many individuals.)
- The closest-matching face in the database is then chosen as the identity.
Note: In open-set recognition, if the best match score is below a pre-defined threshold, the algorithm declares the face to be unknown. The sketch below ties these steps together.
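Putting the pieces together, a minimal open-set identification routine might look like the following. The linear scan is the simplest possible approach; as noted above, large databases use additional data structures (e.g., approximate nearest-neighbor indexes) to avoid it. The threshold is again an illustrative placeholder.

```python
import numpy as np

def identify(query_image, pipeline, database, threshold=0.6):
    """Open-set identification against an enrolled database.

    `pipeline` wraps detect > align > extract; `database` maps names to
    enrolled feature vectors (see the enrollment sketch above).
    """
    query_vec = np.asarray(pipeline(query_image))

    def cosine(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    best_name, best_score = None, -1.0
    for name, enrolled_vec in database.items():
        score = cosine(query_vec, enrolled_vec)
        if score > best_score:
            best_name, best_score = name, score

    # Below threshold, declare the face unknown rather than force a match.
    return best_name if best_score >= threshold else "unknown"
```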
When it comes to building a facial recognition system, deep learning can be used to implement some or all of the stages mentioned above. In some cases, deep learning may also combine adjacent stages (e.g. alignment and extraction) into an all-in-one network. The exact design depends on the size of the training set, the computational constraints of the platform where the recognizer will run once deployed, and the types of variation in face images that are likely to arise in the desired application.