Biometric authentication takes many forms; among the most researched are fingerprint and facial authentication. Because of the volume of research in these areas, benchmark datasets are readily accessible for new researchers to use when evaluating new systems. A newer, less researched biometric method is lip motion authentication. In these systems, a user produces a lip motion password to authenticate, meaning they must utter the same word or phrase to gain access. Because this method is less researched, there is no large-scale dataset that can be used to compare methods or to determine the actual level of security they provide. We propose an automated dataset collection pipeline that extracts a lip motion authentication dataset from collections of videos. This pipeline will enable the collection of large-scale datasets for this problem, thus advancing the capability of lip motion authentication systems.
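A core step in such a pipeline is cropping the lip region from each video frame given facial landmarks. The abstract does not specify the implementation; the following is a minimal sketch in which the landmark detector is assumed to exist upstream, and the mouth coordinates and margin are illustrative values.

```python
import numpy as np

def crop_lip_region(frame: np.ndarray, landmarks: np.ndarray, margin: int = 8) -> np.ndarray:
    """Crop a bounding box around mouth landmarks, padded by `margin` pixels.

    `landmarks` is an (N, 2) array of (x, y) mouth points, e.g. from an
    off-the-shelf facial landmark detector (not shown here).
    """
    xs, ys = landmarks[:, 0], landmarks[:, 1]
    h, w = frame.shape[:2]
    x0 = max(int(xs.min()) - margin, 0)
    x1 = min(int(xs.max()) + margin, w)
    y0 = max(int(ys.min()) - margin, 0)
    y1 = min(int(ys.max()) + margin, h)
    return frame[y0:y1, x0:x1]

# Synthetic example: a 100x100 grayscale frame and mock mouth landmarks.
frame = np.zeros((100, 100), dtype=np.uint8)
mouth = np.array([[40, 60], [50, 58], [60, 60], [60, 66], [50, 68], [40, 66]])
crop = crop_lip_region(frame, mouth)
print(crop.shape)  # (26, 36)
```

Applied per frame, the resulting crops form a lip motion sequence for one utterance, which is the unit a dataset of this kind would store.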
As facial authentication systems become an increasingly prevalent technology, their subtle inaccuracy on certain subgroups grows in importance. As researchers apply data augmentation to increase subgroup accuracies, it is critical that the augmentation approaches be well understood. We specifically research the impact that the data augmentation method of racial transformation has on the identity of the individual as perceived by a facial authentication network. This demonstrates whether the racial transformation maintains aspects critical to an individual's identity, or whether the augmentation method creates the equivalent of an entirely new individual for networks to train on. We demonstrate our method for racial transformation, based on the methods of other leading research articles; display the embedding distance distribution of augmented faces compared with that of non-augmented faces; and explain to what extent racial transformation maintains aspects critical to an individual's identity.
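The embedding-distance comparison described above can be illustrated with a small sketch. The embeddings, noise levels, and distance metric below are mock stand-ins (the abstract does not specify the network or metric); the point is the comparison of the two distance distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine distance between two batches of embeddings."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - np.sum(a * b, axis=1)

# Mock 128-d face embeddings: anchors, same-identity pairs (small perturbation),
# and race-transformed pairs (larger perturbation, standing in for augmentation).
anchors = rng.normal(size=(500, 128))
genuine = anchors + 0.05 * rng.normal(size=(500, 128))
augmented = anchors + 0.50 * rng.normal(size=(500, 128))

d_genuine = cosine_distance(anchors, genuine)
d_augmented = cosine_distance(anchors, augmented)

# If the augmentation preserved identity, the two distributions would overlap;
# a clear rightward shift suggests the network sees a different individual.
print(f"genuine mean distance:   {d_genuine.mean():.3f}")
print(f"augmented mean distance: {d_augmented.mean():.3f}")
```

In practice the two arrays of distances would be histogrammed against the network's verification threshold to judge whether augmented faces still match their source identity.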
DeepFakes are a recent trend in computer vision, posing a threat to the authenticity of digital media. Neural-network-based approaches are the most prominent means of detecting DeepFakes. Due to their black-box nature, these detectors often lack explanatory power as to why a given decision was made. Furthermore, from a social, ethical, and legal perspective (e.g. the European Commission's upcoming Artificial Intelligence Act), black-box decision methods should be avoided and human oversight should be guaranteed. In terms of explainability of AI systems, many approaches rely on post-hoc visualization methods (e.g. back-propagation) or on the reduction of complexity. Our paper takes a different approach, combining hand-crafted and neural-network-based components that analyze the same phenomenon in order to achieve explainability. The semantic phenomenon chosen as an example is the eye blinking behavior in a genuine or DeepFake video. Furthermore, the impact of video duration on the classification result is evaluated empirically, so that a minimum duration threshold can be set to reasonably detect DeepFakes.
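A common hand-crafted building block for blink analysis is the eye aspect ratio (EAR) of Soukupová and Čech (2016), which drops sharply when the eye closes. The abstract does not state which features its hand-crafted component uses, so the following is only an illustrative sketch; the landmark coordinates, threshold, and EAR series are made up.

```python
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """EAR from six (x, y) eye landmarks: ratio of vertical to horizontal extent."""
    v1 = np.linalg.norm(eye[1] - eye[5])
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])
    return (v1 + v2) / (2.0 * h)

def count_blinks(ear_series, threshold=0.2):
    """Count falling edges where the per-frame EAR drops below the threshold."""
    below = [e < threshold for e in ear_series]
    return sum(1 for prev, cur in zip(below, below[1:]) if cur and not prev)

# Mock per-frame EAR values: open eyes around 0.3, two brief closures near 0.1.
ears = [0.30, 0.31, 0.10, 0.09, 0.30, 0.29, 0.08, 0.30]
print(count_blinks(ears))  # two dips below the threshold -> 2
```

Statistics of such blink events (rate, duration, regularity) over a video are the kind of interpretable signal that could be compared against a neural detector analyzing the same phenomenon; longer videos yield more blink events, which is why a minimum duration threshold matters.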