Multilingual Email Phishing Attacks Detection using Open-source Intelligence and Machine Learning

Panharith An; Rana Shafi; Tionge Mughogho; Allan Onyango; Nikola Nachevski; Reiner Creutzburg

doi:10.2352/EI.2026.38.3.MOBMU-330

Abstract

Email phishing remains a prevalent cyber threat, targeting victims to extract sensitive information or deploy malicious software. This paper explores the integration of open-source intelligence (OSINT) tools and machine learning (ML) models to enhance phishing detection across multilingual datasets. Using Nmap and theHarvester, this study extracted 17 features, including domain names, IP addresses, and open ports, to improve detection accuracy. Multilingual email datasets, including English and Arabic, were analyzed to address the limitations of ML models trained predominantly on English-language data. Experiments with five classification algorithms (Decision Tree, Random Forest, Support Vector Machine, XGBoost, and Multinomial Naïve Bayes) revealed that Random Forest achieved the highest performance, with an accuracy of 97.37% on both the English and Arabic datasets. On the OSINT-enhanced datasets, the model achieved higher accuracy than baseline models trained without OSINT features. These findings highlight the potential of combining OSINT tools with advanced ML models to detect phishing emails more effectively across diverse languages and contexts. This study contributes an approach to phishing detection that incorporates OSINT features and evaluates their impact on multilingual datasets, addressing a critical gap in cybersecurity research.

Electronic Imaging

2470-1173

Society for Imaging Science and Technology

IS&T 7003 Kilworth Lane, Springfield, VA 22151 USA

MOBMU-330

Proceedings Paper

Panharith An

Kadir Has University, Turkey