Audio captcha breaking and consequences for human users

Fabian Oberthür; Martin Steinebach; Verena Battis

doi:10.2352/EI.2023.35.4.MWSF-373

Abstract

On the Internet, users must repeatedly prove that they are human to gain access to information or to use services. To check whether a request was sent by a human being and not by a computer, the user is asked to solve a task. These tasks are called CAPTCHAs and are designed to be easy for most people to solve while being as hard as possible for a computer. In the context of automated OSINT, which requires solving CAPTCHAs automatically, we investigate the solving of audio CAPTCHAs. For this purpose, we wrote a program that integrates two common speech-to-text methods. The program reaches an accuracy of about 81 percent. Because CAPTCHAs are also an important tool for Internet access security, we use the results of our attack to suggest improvements to the security of audio CAPTCHAs. We compare human listeners with computers and reveal weaknesses of audio CAPTCHAs.

Electronic Imaging

2470-1173

Society for Imaging Science and Technology

IS&T 7003 Kilworth Lane, Springfield, VA 22151 USA

MWSF-373

Article

Fabian Oberthür

TU Darmstadt, Germany

Martin Steinebach

Fraunhofer Institute for Secure Information Technology, Germany

Verena Battis

Fraunhofer Institute for Secure Information Technology, Germany

16 January 2023

MWSF

Media Watermarking, Security, and Forensics 2023

Pages 373-1 to 373-6

2023

Keywords: Audio Captchas; Noise; Website Security; Speech to text
