Practical OSINT investigation - Similarity calculation using Reddit user profile data

Valeria Vishnevskaya; Klaus Schwarz; Reiner Creutzburg

doi:10.2352/EI.2023.35.3.MOBMU-356

Abstract

This paper presents a practical Open Source Intelligence (OSINT) use case: measuring user similarity from open profile data on the Reddit social network. This proof-of-concept (PoC) work combines open Reddit data with part of the state-of-the-art BERT model. Using the PRAW Python library, the project fetches users' comments and posts. These texts are then converted into a feature vector, a single representation of all of a user's posts and comments. The main idea is to compute a comparable similarity score for each pair of users based on their comments and posts. For example, if we fix one user and calculate the scores of all pairs formed with other users, we obtain a total order on the set of all pairs containing that user. This total order can be read as the degree of similarity in writing to the chosen user. The set of "similar" users for a particular user can be used to recommend people who may interest that user. The similarity score also has a "transitive property": if $user_1$ is "similar" to $user_2$ and $user_2$ is "similar" to $user_3$, then the inner properties of our model guarantee that $user_1$ and $user_3$ are fairly "similar" too. The score can therefore be used to cluster users into sets of "similar" users. It could be used in recommendation algorithms, or to tune existing algorithms to account for a cluster's peculiarities. The model can also be extended to calculate feature vectors for subreddits, which makes it possible to find subreddits similar to a user and recommend them to that user.
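The ranking idea in the abstract can be illustrated with a minimal sketch. It assumes each user is already represented by a feature vector and that the pair score is cosine similarity. The abstract does not name the scoring function, so that choice is an assumption made for illustration. The helper names `cosine_similarity` and `rank_by_similarity` and the three-dimensional toy vectors are hypothetical and do not come from the paper. Vectors derived from BERT would have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def rank_by_similarity(anchor, vectors):
    """Score every other user against `anchor`, most similar first."""
    scores = [
        (name, cosine_similarity(vectors[anchor], vec))
        for name, vec in vectors.items()
        if name != anchor
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy feature vectors (assumption: one vector per user).
user_vectors = {
    "user_1": [1.0, 0.0, 0.0],
    "user_2": [0.9, 0.1, 0.0],
    "user_3": [0.0, 1.0, 0.0],
}

ranking = rank_by_similarity("user_1", user_vectors)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Fixing `user_1` and sorting the remaining users by score gives the ordering described in the abstract. The same pairwise scores could feed a clustering step.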

Electronic Imaging

2470-1173

Society for Imaging Science and Technology

IS&T 7003 Kilworth Lane, Springfield, VA 22151 USA

MOBMU-356

Article

Valeria Vishnevskaya

SRH Berlin University of Applied Sciences, Germany

Klaus Schwarz

SRH Berlin University of Applied Sciences, Germany

University of Granada, Spain

Reiner Creutzburg

SRH Berlin University of Applied Sciences, Germany

Technische Hochschule Brandenburg, Germany

16 January 2023

MOBMU

Mobile Devices and Multimedia: Enabling Technologies, Algorithms, and Applications 2023

Pages 356-1 to 356-10

2023

Keywords: OSINT, open-source intelligence, SOCMINT, social media intelligence, cybersecurity, social media investigation, Reddit