The growth of graph size has created new problems in graph visualization and graph analysis. To address these problems, several graph sampling techniques have been proposed for obtaining a representative subgraph from a complex network. Prior research indicates that sampling a large-scale graph is not an easy task, especially for topology-based sampling methods (e.g., breadth-first sampling), even though topology-based methods can produce a more accurate subgraph than node sampling and edge sampling in preserving statistical graph properties. In this paper, we propose three types of distributed sampling algorithms and develop a sampling package on Spark. To evaluate the effectiveness of these distributed sampling techniques, we apply them to three graph datasets and compare them with traditional, non-distributed sampling approaches. The results show that (1) our distributed sampling approaches are as reliable as the non-distributed sampling techniques, (2) they greatly improve sampling efficiency, especially for topology-based sampling, and (3) their distributed architecture makes them horizontally scalable.
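To make the contrast between topology-based sampling and node sampling concrete, here is a minimal single-machine sketch in Python. It is not the distributed Spark implementation the abstract describes; the function names, the adjacency-dict representation, and the toy ring graph are all assumptions made for illustration.

```python
import random
from collections import deque


def induced_subgraph(adj, nodes):
    """Return the adjacency dict restricted to `nodes`."""
    return {n: {m for m in adj[n] if m in nodes} for n in nodes}


def random_node_sample(adj, k, seed=0):
    """Node sampling: pick k nodes uniformly at random, keep edges among them."""
    rng = random.Random(seed)
    nodes = set(rng.sample(sorted(adj), k))
    return induced_subgraph(adj, nodes)


def breadth_first_sample(adj, k, start):
    """Topology-based sampling: collect nodes in BFS order until k are found."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < k:
        node = queue.popleft()
        for neighbor in sorted(adj[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append(neighbor)
                if len(visited) == k:
                    break
    return induced_subgraph(adj, set(visited))


def edge_count(adj):
    """Count undirected edges in a symmetric adjacency dict."""
    return sum(len(neighbors) for neighbors in adj.values()) // 2


# Toy graph: a ring of 12 nodes (hypothetical data, for illustration only).
ring = {i: {(i - 1) % 12, (i + 1) % 12} for i in range(12)}

bfs_sample = breadth_first_sample(ring, 5, start=0)
node_sample = random_node_sample(ring, 5, seed=0)
```

On this toy ring the breadth-first sample is always a connected path, while the random node sample may split into disconnected pieces. This is one intuition for why topology-based methods can preserve structure better, although the sketch says nothing about the cost of running them at scale.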
The characterization and abstraction of large multivariate time series data often pose challenges with respect to effectiveness or efficiency. Taking human motion capture data as an example, challenges exist in creating compact solutions that still reflect semantics and kinematics in a meaningful way. We present a visual-interactive approach for the semi-supervised labeling of human motion capture data. Users can assign labels to the data, which can subsequently be used to represent the multivariate time series as sequences of motion classes. The approach combines multiple views that support the user in the visual-interactive labeling process. Visual guidance concepts further ease the labeling process by propagating the results of supportive algorithmic models. The abstraction of motion capture data to sequences of event intervals allows overview and detail-on-demand visualizations even for large and heterogeneous data collections. The guided selection of candidate data for the extension and improvement of the labeling closes the feedback loop of the semi-supervised workflow. We demonstrate the effectiveness and the efficiency of the approach in two usage scenarios, taking visual-interactive learning and human motion synthesis as examples.
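The abstraction of labeled motion data to sequences of event intervals can be pictured as a run-length collapse of per-frame labels. The sketch below is one plausible reading of that step, not the authors' implementation; the function name `to_event_intervals` and the example labels are hypothetical.

```python
from itertools import groupby


def to_event_intervals(frame_labels):
    """Collapse per-frame labels into (label, first_frame, last_frame) runs."""
    intervals = []
    position = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        intervals.append((label, position, position + length - 1))
        position += length
    return intervals


# Hypothetical per-frame motion classes for a short capture.
labels = ["stand", "stand", "walk", "walk", "walk", "jump", "stand"]
intervals = to_event_intervals(labels)
```

Each interval stores a motion class together with its inclusive frame range, so a long capture shrinks to a short sequence that an overview visualization could draw as colored bars.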
The exploration of text document collections is a complex and cumbersome task. Clustering techniques can help to group documents based on their content for the generation of overviews. However, the underlying clustering workflows, comprising preprocessing, feature selection, clustering algorithm selection, and parameterization, offer several degrees of freedom. Since no "best" clustering workflow exists, users have to evaluate clustering results based on the data and analysis tasks at hand. We present an interactive system for the creation and validation of text clustering workflows with the goal of exploring document collections. The system allows users to control every step of the text clustering workflow. First, users are supported in the feature selection process via feature ranking based on feature selection metrics and via linguistic filtering (e.g., part-of-speech filtering). Second, users can choose between different clustering methods and their parameterizations. Third, the clustering results can be explored based on the cluster content (documents and relevant feature terms) and cluster quality measures. Fourth, the results of different clusterings can be compared, and frequent document subsets in clusters can be identified. We validate the usefulness of the system with a usage scenario describing how users can explore document collections in a visual and interactive way.
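One way to picture the comparison of clusterings is to look for documents that end up in the same cluster in every run. The sketch below is an assumption about what "frequent document subsets" could mean, not the system's actual method; the function name and the two example runs are hypothetical.

```python
from collections import defaultdict


def stable_document_groups(clusterings):
    """Group documents that share a cluster in every clustering.

    `clusterings` is a list of dicts mapping document id -> cluster label.
    All dicts are assumed to cover the same set of documents.
    """
    groups = defaultdict(set)
    for doc in clusterings[0]:
        # The tuple of labels across runs identifies a co-clustered group.
        signature = tuple(c[doc] for c in clusterings)
        groups[signature].add(doc)
    # Keep only groups of two or more documents, in a deterministic order.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical results of two clustering runs over six documents.
run_a = {"d1": 0, "d2": 0, "d3": 0, "d4": 1, "d5": 1, "d6": 1}
run_b = {"d1": "x", "d2": "x", "d3": "y", "d4": "y", "d5": "z", "d6": "z"}
groups = stable_document_groups([run_a, run_b])
```

Documents `d3` and `d4` change partners between the two runs, so they drop out, while the remaining pairs form subsets that are stable across both clusterings.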
Storytelling animation has great potential to be widely adopted by domain scientists for exploring trends in scientific simulations. However, due to the dynamic nature and generation methods of animations, serious concerns have been raised regarding their effectiveness for analytical tasks. This has led to interactive techniques often being favored over animations, as they provide the user with complete control over the visualization. This trend in scientific visualization design has not yet considered newer algorithmic animation generation methods that are driven by the automatic analysis of data features and storytelling techniques. In this work, the authors performed an experiment that compares feature-driven storytelling animations to common interactive visualization techniques for time-varying scientific simulations. They discuss the design of the experiment, including tasks for storm-surge analysis that are representative of common scientific visualization projects. Their results illustrate the relative advantages of both feature-driven storytelling animations and interactive visualizations, which may provide useful design guidelines for future storytelling and scientific visualization techniques. © 2016 Society for Imaging Science and Technology.
We evaluate a dozen prevailing graph-sampling techniques with the ultimate goal of better visualizing and understanding big and complex graphs that exhibit different properties and structures. The evaluation uses eight benchmark datasets covering four different graph types, collected from the Stanford Network Analysis Platform and NetworkX, to give a comprehensive comparison across various types of graphs. The study provides a practical guideline for visualizing big graphs of different sizes and structures. The paper discusses results and important observations from the study.
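The abstract does not say which measures the study uses to compare a sample with its original graph. As one illustrative possibility only, the sketch below compares degree distributions with a Kolmogorov-Smirnov-style distance, the largest gap between two cumulative degree distributions. The function names and the toy star graph are assumptions.

```python
def degree_cdf(adj, max_degree):
    """Cumulative share of nodes with degree <= d, for d in 0..max_degree."""
    counts = [0] * (max_degree + 1)
    for neighbors in adj.values():
        counts[len(neighbors)] += 1
    cdf, running = [], 0
    for count in counts:
        running += count
        cdf.append(running / len(adj))
    return cdf


def degree_ks_distance(original, sample):
    """Largest gap between the degree CDFs of two graphs (0 means identical)."""
    max_degree = max(len(n) for g in (original, sample) for n in g.values())
    cdf_original = degree_cdf(original, max_degree)
    cdf_sample = degree_cdf(sample, max_degree)
    return max(abs(a - b) for a, b in zip(cdf_original, cdf_sample))


# Hypothetical data: a star with four leaves and a sample keeping two leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
star_sample = {0: {1, 2}, 1: {0}, 2: {0}}
distance = degree_ks_distance(star, star_sample)
```

A distance near zero suggests the sample keeps the shape of the degree distribution. A visualization-oriented evaluation would likely combine such a statistic with other structural and visual criteria.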