Volume: 30 | Article ID: art00012
Implementation and Evaluation of Distributed Graph Sampling Methods with Spark
DOI: 10.2352/ISSN.2470-1173.2018.01.VDA-379  Published Online: January 2018
Abstract

The growth of graph size has created new problems in graph visualization and graph analysis. To address these problems, several graph sampling techniques have been proposed for obtaining a representative subgraph from a complex network. Prior research indicates that sampling a large-scale graph is not an easy task, especially for topology-based sampling methods (e.g., breadth-first sampling), even though topology-based methods can produce a more accurate subgraph than node sampling and edge sampling in preserving statistical graph properties. In this paper, we propose three types of distributed sampling algorithms and develop a sampling package on Spark. To evaluate the effectiveness of these distributed sampling techniques, we apply them to three graph datasets and compare them with traditional, non-distributed sampling approaches. The results show that (1) our distributed sampling approaches are as reliable as the non-distributed sampling techniques, (2) they greatly improve sampling efficiency, especially for topology-based sampling, and (3) the distributed architecture of these algorithms gives them horizontal scalability.
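
For readers unfamiliar with the three sampling families named in the abstract, the following is a minimal, non-distributed sketch in plain Python. It is not the authors' Spark package: the function names (`node_sample`, `edge_sample`, `bfs_sample`), the edge-list representation, and the choice to return the induced subgraph are all assumptions made for illustration.

```python
import random
from collections import deque


def node_sample(edges, k, rng):
    """Node sampling: pick k nodes uniformly at random and keep
    only the edges whose two endpoints were both picked."""
    nodes = sorted({v for e in edges for v in e})
    chosen = set(rng.sample(nodes, min(k, len(nodes))))
    return chosen, [(u, v) for u, v in edges if u in chosen and v in chosen]


def edge_sample(edges, k, rng):
    """Edge sampling: pick k edges uniformly at random; the sampled
    nodes are the endpoints of the picked edges."""
    picked = rng.sample(edges, min(k, len(edges)))
    return {v for e in picked for v in e}, picked


def bfs_sample(edges, k, seed_node):
    """Topology-based (breadth-first) sampling: expand outward from
    seed_node until k nodes are visited, then keep the induced edges."""
    # Build an undirected adjacency list from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    visited = {seed_node}
    queue = deque([seed_node])
    while queue and len(visited) < k:
        u = queue.popleft()
        for w in adj.get(u, []):
            if w not in visited:
                visited.add(w)
                queue.append(w)
                if len(visited) == k:
                    break
    return visited, [(u, v) for u, v in edges if u in visited and v in visited]
```

Node and edge sampling need only independent random draws, whereas breadth-first sampling must follow neighbor links step by step. That dependency on graph structure is one plausible reason topology-based methods are harder to run on a large graph, and it is the case a distributed implementation has to handle.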

  Cite this article 

Fangyan Zhang, Song Zhang, Christopher Lightsey, "Implementation and Evaluation of Distributed Graph Sampling Methods with Spark," in Proc. IS&T Int’l. Symp. on Electronic Imaging: Visualization and Data Analysis, 2018, pp 379-1 - 379-9, https://doi.org/10.2352/ISSN.2470-1173.2018.01.VDA-379

  Copyright statement 
Copyright © Society for Imaging Science and Technology 2018
72010604
Electronic Imaging
2470-1173
Society for Imaging Science and Technology