We present BGS (Big Graph Surfer), a scalable graph visualization tool that creates hierarchical structure from original graphs and provide interactive navigation along the hierarchy by expanding or collapsing clusters when visualizing large-scale graphs. A distributed computing framework-Spark provides the backend for BGS on clustering and visualization. This architecture makes it capable of visualizing a graph bigger than 1 billion nodes or edges in real-time after preprocessing. In addition, BGS provides a series of hierarchy and graph exploration methods, such as hierarchy view, hierarchy navigation, hierarchy search, graph view, graph navigation, graph search, and other useful interactions. These functionalities facilitate the exploration of very large-scale graphs. To evaluate the effectiveness of BGS, we apply BGS to several large-scale graph datasets, and discuss its scalability, usability, and flexibility.
The growth of graph size has created new problems in graph visualization and graph analysis. To solve the problem, several graph sampling techniques have been proposed dedicated to obtaining a representative subgraph from a complex network. While prior research indicates that sampling on a large-scale graph is not an easy task, especially for topology-based sampling methods (e.g. breadth first sampling). Topology-based sampling methods can produce a more accurate subgraph than node sampling and edge sampling in preserving statistical graph properties. In this paper, we propose three types of distributed sampling algorithms and develop a sampling package on Spark. To evaluate the effectiveness of these distributed sampling techniques, we apply them to three graph datasets and compare them with traditional/non-distributed sampling approaches. The results show that (1) our distributed sampling approaches are as reliable as the non-distributed sampling techniques, and (2) they are a great improvement in sampling efficiency, especially for topology-based sampling. In addition, (3) the distributed architecture of these algorithms causes them to have horizontal scalability.