
Email phishing remains a prevalent cyber threat, targeting victims to extract sensitive information or deploy malicious software. This paper explores the integration of open-source intelligence (OSINT) tools and machine learning (ML) models to enhance phishing detection across multilingual datasets. Using Nmap and theHarvester, this study extracted 17 features, including domain names, IP addresses, and open ports, to improve detection accuracy. Multilingual email datasets, including English and Arabic, were analyzed to address the limitations of ML models trained predominantly on English-language data. Experiments with five classification algorithms (Decision Tree, Random Forest, Support Vector Machine, XGBoost, and Multinomial Naïve Bayes) revealed that Random Forest achieved the highest performance, with an accuracy of 97.37% on both the English and Arabic datasets. On the OSINT-enhanced datasets, the model achieved higher accuracy than baseline models trained without OSINT features. These findings highlight the potential of combining OSINT tools with advanced ML models to detect phishing emails more effectively across diverse languages and contexts. This study contributes an approach to phishing detection that incorporates OSINT features and evaluates their impact on multilingual datasets, addressing a critical gap in cybersecurity research.
Panharith An, Rana Shafi, Tionge Mughogho, Allan Onyango, Nikola Nachevski, Reiner Creutzburg, "Multilingual Email Phishing Attacks Detection using Open-source Intelligence and Machine Learning" in Electronic Imaging, 2026, pp 330-1 - 330-10, https://doi.org/10.2352/EI.2026.38.3.MOBMU-330