Face recognition, one of the most well-studied problems in computer vision, consists of two subproblems: verification and identification. Face verification determines whether two given face images belong to the same person, while face identification typically retrieves
the most similar faces in a gallery image set for a given query image. In this paper, we define our face recognition task as determining the identity of a person from a face image of that individual, using all the face images of the individual that can be collected as training data.
More specifically, our task is to recognize the face in an image and link it to the corresponding entity key in a knowledge base. With the unique key and the rich associated information provided by the knowledge base, our face recognition task is an end-to-end simulation of human behavior in
face recognition. For this purpose, we design a benchmark task: recognizing one million celebrities in the world from their face images, which probably leads to one of the largest classification problems in computer vision. We describe and provide both training and measurement datasets
to facilitate research in this area. Our datasets are larger than any existing publicly available dataset and can help close the gap to the scale of the datasets used privately in industry.