Volume: 30 | Article ID: art00012
Implementation and Evaluation of Distributed Graph Sampling Methods with Spark
DOI: 10.2352/ISSN.2470-1173.2018.01.VDA-379 | Published Online: January 2018

The growth of graph size has created new problems in graph visualization and graph analysis. To address these problems, several graph sampling techniques have been proposed for obtaining a representative subgraph from a complex network. Prior research indicates that sampling a large-scale graph is not an easy task, especially for topology-based sampling methods (e.g., breadth-first sampling). However, topology-based sampling methods can produce a more accurate subgraph than node sampling and edge sampling in preserving statistical graph properties. In this paper, we propose three types of distributed sampling algorithms and develop a sampling package on Spark. To evaluate the effectiveness of these distributed sampling techniques, we apply them to three graph datasets and compare them with traditional, non-distributed sampling approaches. The results show that (1) our distributed sampling approaches are as reliable as the non-distributed sampling techniques, (2) they greatly improve sampling efficiency, especially for topology-based sampling, and (3) their distributed architecture gives them horizontal scalability.
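
For readers unfamiliar with the three sampling families named in the abstract, the following is a minimal single-machine sketch of node sampling, edge sampling, and breadth-first (topology-based) sampling. It is an illustration only and not the authors' distributed Spark package: the function names, the adjacency-dictionary representation, and the choice to return the induced edge set are all assumptions made for this example.

```python
import random
from collections import deque


def induced_edges(adj, nodes):
    """Return the undirected edges of `adj` whose endpoints are both in `nodes`."""
    return {(u, v) for u in nodes for v in adj[u] if v in nodes and u < v}


def node_sample(adj, k, rng):
    """Node sampling: pick k nodes uniformly at random, keep the edges among them."""
    nodes = set(rng.sample(sorted(adj), k))
    return induced_edges(adj, nodes)


def edge_sample(adj, k, rng):
    """Edge sampling: pick k undirected edges uniformly at random."""
    edges = sorted({(min(u, v), max(u, v)) for u in adj for v in adj[u]})
    return set(rng.sample(edges, k))


def bfs_sample(adj, seed, k):
    """Breadth-first sampling: grow outward from `seed` until k nodes are collected."""
    seen = {seed}
    queue = deque([seed])
    while queue and len(seen) < k:
        u = queue.popleft()
        for v in sorted(adj[u]):  # sorted only to make the sketch deterministic
            if v not in seen:
                seen.add(v)
                queue.append(v)
                if len(seen) == k:
                    break
    return induced_edges(adj, seen)


# Hypothetical toy graph (undirected, stored as adjacency lists).
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
rng = random.Random(0)
bfs_edges = bfs_sample(adj, seed=0, k=3)
node_edges = node_sample(adj, k=5, rng=rng)
edge_edges = edge_sample(adj, k=2, rng=rng)
```

Node and edge sampling need only independent random draws, whereas the breadth-first variant must follow the graph's connectivity step by step, which is consistent with the abstract's remark that topology-based sampling is the harder case on large graphs.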

  Cite this article 

Fangyan Zhang, Song Zhang, Christopher Lightsey, "Implementation and Evaluation of Distributed Graph Sampling Methods with Spark," in Proc. IS&T Int’l. Symp. on Electronic Imaging: Visualization and Data Analysis, 2018, pp. 379-1 - 379-9.

  Copyright statement 
Copyright © Society for Imaging Science and Technology 2018
Electronic Imaging
Society for Imaging Science and Technology