Volume: 35 | Article ID: MOBMU-356
Practical OSINT investigation - Similarity calculation using Reddit user profile data
DOI: 10.2352/EI.2023.35.3.MOBMU-356  Published Online: January 2023
Abstract

This paper presents a practical Open Source Intelligence (OSINT) use case: measuring user similarity with open profile data from the Reddit social network. This proof-of-concept (PoC) work combines open data from Reddit with part of the state-of-the-art BERT model. Using the PRAW Python library, the project fetches the comments and posts of users. These texts are then converted into a feature vector, a single representation of all of a user's posts and comments. The main idea is to compute a comparable similarity score for each pair of users based on their comments and posts. For example, if we fix one user and calculate the scores of all pairs formed with other users, we obtain a total order on the set of all pairs that include that user. This total order can be read as the degree of written similarity to the chosen user. The resulting set of "similar" users can be used to recommend people who may interest that user. The similarity score also has a "transitive property": if $user_1$ is "similar" to $user_2$ and $user_2$ is "similar" to $user_3$, then the inner properties of our model guarantee that $user_1$ and $user_3$ are fairly "similar" too. The score can therefore be used to cluster a set of users into groups of "similar" users. Such clusters could feed recommendation algorithms or be used to tune existing algorithms to account for a cluster's peculiarities. We can also extend the model to calculate feature vectors for subreddits, which makes it possible to find subreddits similar to a user and recommend them.
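The ranking step described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: the per-text embeddings are hard-coded toy values standing in for BERT-derived vectors, a user's feature vector is taken to be the mean of those embeddings, and cosine similarity is used as the score. The helper names `mean_vector`, `cosine_similarity`, and `rank_similar_users` are hypothetical, and fetching texts with PRAW is left out so that the example runs offline.

```python
import math


def mean_vector(vectors):
    """Average equal-length vectors into one user-level feature vector.

    Assumption: mean pooling; the paper may aggregate differently.
    """
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (assumed similarity score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_similar_users(target, user_vectors):
    """Order every other user by descending similarity to `target`."""
    scores = [
        (name, cosine_similarity(user_vectors[target], vector))
        for name, vector in user_vectors.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy per-text embeddings (hypothetical stand-ins for BERT outputs).
text_embeddings = {
    "user_1": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "user_2": [[0.9, 0.1, 0.0]],
    "user_3": [[0.0, 0.0, 1.0]],
}

# One feature vector per user, then a ranking relative to user_1.
user_vectors = {
    name: mean_vector(vectors) for name, vectors in text_embeddings.items()
}
ranking = rank_similar_users("user_1", user_vectors)

for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Fixing `user_1` and sorting the remaining users by score gives the ordering the abstract refers to; in this toy data `user_2` ranks above `user_3`. Ties in the score would need an explicit tie-breaking rule for the ordering to be strictly total.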

  Cite this article 

Valeria Vishnevskaya, Klaus Schwarz, Reiner Creutzburg, "Practical OSINT investigation - Similarity calculation using Reddit user profile data" in Electronic Imaging, 2023, pp 356-1 - 356-10, https://doi.org/10.2352/EI.2023.35.3.MOBMU-356

  Copyright statement 
Copyright © 2023, Society for Imaging Science and Technology
Electronic Imaging
2470-1173
Society for Imaging Science and Technology
IS&T 7003 Kilworth Lane, Springfield, VA 22151 USA