As facial recognition aims for the mainstream – just how reliable is it?

The original press release was issued by the University of Washington and written by Jennifer Langston.
In the last few years, several groups have announced that their facial recognition systems have achieved near-perfect accuracy rates, performing better than humans at picking the same face out of the crowd.
But those tests were performed on a dataset with only 13,000 images — fewer people than attend an average professional U.S. soccer game. What happens to their performance as those crowds grow to the size of a major U.S. city?
University of Washington researchers answered that question with the MegaFace Challenge, the world’s first competition aimed at evaluating and improving the performance of face recognition algorithms at the million-person scale. All of the algorithms suffered in accuracy when confronted with more distractions, but some fared much better than others.

“We need to test facial recognition on a planetary scale to enable practical applications — testing on a larger scale lets you discover the flaws and successes of recognition algorithms,” said Ira Kemelmacher-Shlizerman, a UW assistant professor of computer science and the project’s principal investigator. “We can’t just test it on a very small scale and say it works perfectly.”

The UW team first developed a dataset with one million Flickr images from around the world that are publicly available under a Creative Commons license, representing 690,572 unique individuals. Then they challenged facial recognition teams to download the database and see how their algorithms performed when they had to distinguish between a million possible matches.
Google’s FaceNet showed the strongest performance on one test, dropping from near-perfect accuracy when confronted with a smaller number of images to 75 percent on the million-person test. A team from Russia’s N-TechLab came out on top on another test set, dropping to 73 percent.
By contrast, the accuracy rates of other algorithms that had performed well — above 95 percent — at a small scale dropped by much larger percentages to as low as 33 percent accuracy when confronted with the harder task.
The MegaFace challenge tested the algorithms on verification, or how well they could correctly identify whether two photos were of the same person. That’s how an iPhone security feature, for instance, could recognize your face and decide whether to unlock your phone instead of asking you to type in a password.
“What happens if you lose your phone in a train station in Amsterdam and someone tries to steal it?” said Kemelmacher-Shlizerman, who co-leads the UW Graphics and Imaging Laboratory (GRAIL). “I’d want certainty that my phone can correctly identify me out of a million people, or 7 billion, not just 10,000 or so.”
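As a rough illustration of the verification task described above, here is a minimal sketch in Python. It assumes each photo has already been converted into a numeric embedding vector by some face recognition model. The toy vectors, the `same_person` helper and the 0.8 threshold are all hypothetical and are not taken from the MegaFace paper or from any entrant's system.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_person(emb_a, emb_b, threshold=0.8):
    """Verification: decide whether two embeddings show the same person.

    The threshold is an illustrative assumption; a real system would
    tune it on held-out data to balance false accepts and false rejects.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold


# Hypothetical embeddings: the enrolled phone owner and two unlock attempts.
enrolled = [0.9, 0.1, 0.3]
attempt_owner = [0.85, 0.15, 0.35]
attempt_stranger = [0.1, 0.9, 0.2]

owner_accepted = same_person(enrolled, attempt_owner)
stranger_accepted = same_person(enrolled, attempt_stranger)
```

In this toy setup the owner's attempt clears the threshold and the stranger's does not. The harder question the challenge poses is whether a single threshold keeps working when the pool of possible impostors grows to a million.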
They also tested the algorithms on identification, or how accurately they could match a photo of a single individual to a different photo of the same person buried among a million “distractors.” That’s what happens, for instance, when law enforcement has a single photograph of a criminal suspect and is combing through images taken on a subway platform or at an airport to see if the person is trying to escape.
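The identification task can be sketched in the same spirit. The example below assumes photos are represented as embedding vectors and scores a system by how often the closest gallery entry is the right person, a measure often called rank-1 accuracy. The names, vectors and distractor entries are invented for illustration; this is not the MegaFace evaluation code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(probe, gallery):
    """Return the gallery label whose embedding is most similar to the probe."""
    return max(gallery, key=lambda label: cosine_similarity(probe, gallery[label]))


def rank1_accuracy(probes, gallery):
    """Fraction of probes whose top match carries the correct label."""
    hits = sum(1 for label, emb in probes if identify(emb, gallery) == label)
    return hits / len(probes)


# Hypothetical gallery holding one photo each of two known people.
gallery = {"alice": [0.9, 0.1, 0.2], "bob": [0.1, 0.9, 0.3]}

# Different photos of the same two people, used as search queries.
probes = [("alice", [0.7, 0.3, 0.3]), ("bob", [0.2, 0.8, 0.4])]

small_scale = rank1_accuracy(probes, gallery)

# Add distractors: unrelated faces, one of which happens to resemble
# the query photo of "alice" more closely than her own gallery photo.
crowded = dict(gallery)
crowded["distractor_1"] = [0.7, 0.3, 0.31]
crowded["distractor_2"] = [0.0, 0.1, 0.9]

large_scale = rank1_accuracy(probes, crowded)
```

With only the two known faces in the gallery, both queries are matched correctly. Once the look-alike distractor is added, one query is matched to the wrong face and accuracy falls by half, which is a small-scale version of the effect the challenge measures with a million distractors.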
“You can see where the hard problems are: recognizing people across different ages is an unsolved problem. So is identifying people from their doppelgängers and matching people who are in varying poses like side views to frontal views,” said Kemelmacher-Shlizerman. The paper also analyzes age and pose invariance in face recognition when evaluated at scale.
In general, algorithms that “learned” how to find correct matches out of larger image datasets outperformed those that only had access to smaller training datasets. But the SIAT MMLab algorithm developed by a research team from China, which learned on a smaller number of images, bucked that trend by outperforming many others.
The MegaFace challenge is ongoing and still accepting results.
The team’s next steps include assembling half a million identities, each with a number of photographs, for a dataset that will be used to train facial recognition algorithms. This will help level the playing field and test which algorithms outperform others given the same amount of large-scale training data, as most researchers don’t have access to image collections as large as Google’s or Facebook’s. The training set will be released towards the end of the summer.
“State-of-the-art deep neural network algorithms have millions of parameters to learn and require a plethora of examples to accurately tune them,” said Aaron Nech, a UW computer science and engineering master’s student working on the training dataset. “Unlike people, these models are initially a blank slate. Having diversity in the data, such as the intricate identity cues found across more than 500,000 unique individuals, can increase algorithm performance by providing examples of situations not yet seen.”