On the Internet, humans must repeatedly prove that they are human in order to access information or use services. To check whether a request was sent by a human and not by a computer, the user must solve a task. These tasks are called CAPTCHAs. They are designed to be easy for most people to solve and, at the same time, as hard as possible for a computer. Automated OSINT requires CAPTCHAs to be solved automatically, and in this context we investigate the solving of audio CAPTCHAs. For this purpose, we wrote a program that integrates two common speech-to-text methods. The program achieves very good results, reaching an accuracy of about 81 percent. CAPTCHAs are also an important tool for securing Internet access, so we use the results of our attack to suggest ways to improve the security of these CAPTCHAs. We compare human listeners with computers and reveal weaknesses of audio CAPTCHAs.
Fabian Oberthür, Martin Steinebach, Verena Battis, "Audio captcha breaking and consequences for human users," in Electronic Imaging, 2023, pp. 373-1 to 373-6, https://doi.org/10.2352/EI.2023.35.4.MWSF-373