IS&T | Library

Text/Figure Separation in Document Images Using Docstrum Descriptor and Two-Level Clustering


DOCUMENT IMAGE ANALYSIS
CLUSTERING
KERNEL MAPS
ROI CLASSIFICATION

Valery Anisimovskiy, Ilya Kurilin, Andrey Shcherbinin, Petr Pohl

Pages 253-1 - 253-12, January 2018, © Society for Imaging Science and Technology 2018

DOI

10.2352/ISSN.2470-1173.2018.2.VIPC-253

Volume 30

Issue 2

We propose a novel algorithm for text/figure separation tailored to binary document images containing line drawings, block diagrams, charts, schemes, and other kinds of business graphics. Most approaches to this task rely either on a carefully designed visual descriptor that makes text and graphics regions easy to distinguish, or on supervised learning using a dataset of labeled text/figure regions. Such approaches often provide only moderate separation accuracy when applied to document images that contain a very diverse set of figure classes and for which no sufficiently representative labeled training dataset exists. In contrast, our method is well suited to a wide variety of figure classes and can operate in either semi-supervised or unsupervised mode. We achieve this by applying unsupervised learning algorithms to Docstrum descriptors extracted from regions of interest, followed by semi-supervised label propagation or unsupervised label inference. Another advantage of our method is its suitability for large-scale data processing, which is achieved through an efficient kernel-approximating feature mapping applied to the Docstrum descriptors and through two-level clustering: a fast mini-batch K-means algorithm is first applied to the large-scale data, and only the small number of resulting cluster centroids is subsequently processed by one of the more sophisticated clustering algorithms.
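The two-level clustering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Docstrum descriptors and the kernel-approximating feature map are omitted, plain coordinate tuples stand in for feature vectors, and a naive centroid-linkage agglomerative merge stands in for the unspecified "more sophisticated" second-level algorithm. All function names and parameters are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def mini_batch_kmeans(points, k, batch_size=20, n_iter=100, seed=0):
    """Level 1: mini-batch K-means with per-centroid learning rate 1/count."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k
    for _ in range(n_iter):
        batch = rng.sample(points, min(batch_size, len(points)))
        for p in batch:
            i = nearest(p, centroids)
            counts[i] += 1
            lr = 1.0 / counts[i]
            centroids[i] = [(1 - lr) * c + lr * x for c, x in zip(centroids[i], p)]
    return centroids


def merge_centroids(centroids, n_groups):
    """Level 2 (assumed stand-in): agglomerative merge of the few centroids."""
    groups = [[i] for i in range(len(centroids))]
    dim = len(centroids[0])

    def group_mean(group):
        return [sum(centroids[i][d] for i in group) / len(group) for d in range(dim)]

    while len(groups) > n_groups:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = sq_dist(group_mean(groups[a]), group_mean(groups[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a].extend(groups.pop(b))  # b > a, so index a stays valid
    return {i: g for g, members in enumerate(groups) for i in members}


def two_level_cluster(points, k_fine, k_coarse, seed=0):
    """Cluster all points cheaply, then group only the resulting centroids."""
    centroids = mini_batch_kmeans(points, k_fine, seed=seed)
    centroid_group = merge_centroids(centroids, k_coarse)
    return [centroid_group[nearest(p, centroids)] for p in points]
```

The expensive pairwise step runs only over `k_fine` centroids rather than over every data point, which is the property the abstract relies on for large-scale data.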

Digital Library: EI

Published Online: January 2018

BGS: A Large-Scale Graph Visualization Tool


GRAPH VISUALIZATION
CLUSTERING
BIG DATA
GRAPH HIERARCHY

Fangyan Zhang, Song Zhang, Christopher Lightsey, Sarah Harun, Pak Chung Wong

Pages 378-1 - 378-9, January 2018, © Society for Imaging Science and Technology 2018

DOI

10.2352/ISSN.2470-1173.2018.01.VDA-378

Volume 30

Issue 1

We present BGS (Big Graph Surfer), a scalable graph visualization tool that creates a hierarchical structure from the original graph and provides interactive navigation along the hierarchy by expanding or collapsing clusters when visualizing large-scale graphs. Spark, a distributed computing framework, provides the backend for clustering and visualization in BGS. This architecture makes BGS capable of visualizing a graph of more than 1 billion nodes or edges in real time after preprocessing. In addition, BGS provides a series of hierarchy and graph exploration methods, such as hierarchy view, hierarchy navigation, hierarchy search, graph view, graph navigation, graph search, and other useful interactions. These functionalities facilitate the exploration of very large-scale graphs. To evaluate the effectiveness of BGS, we apply it to several large-scale graph datasets and discuss its scalability, usability, and flexibility.
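Navigation by expanding and collapsing clusters can be sketched in a few lines. This is an illustrative, single-machine toy and not the BGS or Spark implementation; the class `Hierarchy`, its methods, and the child-to-parent dictionary used to describe the cluster tree are all assumptions made for the example.

```python
class Hierarchy:
    """Cluster tree with a 'visible' cut that the user expands or collapses."""

    def __init__(self, parent):
        # parent maps each node or cluster to its enclosing cluster;
        # roots have no entry.
        self.parent = dict(parent)
        self.children = {}
        for child, par in self.parent.items():
            self.children.setdefault(par, []).append(child)
        nodes = set(self.parent) | set(self.parent.values())
        # Start fully collapsed: only root clusters are shown.
        self.visible = {n for n in nodes if n not in self.parent}

    def _has_ancestor(self, node, ancestor):
        while node in self.parent:
            node = self.parent[node]
            if node == ancestor:
                return True
        return False

    def expand(self, cluster):
        """Replace a visible cluster with its direct children."""
        if cluster in self.visible and cluster in self.children:
            self.visible.remove(cluster)
            self.visible.update(self.children[cluster])

    def collapse(self, cluster):
        """Hide every visible descendant of `cluster` and show it instead."""
        hidden = {n for n in self.visible if self._has_ancestor(n, cluster)}
        if hidden:
            self.visible -= hidden
            self.visible.add(cluster)

    def representative(self, leaf):
        """Visible node that currently stands for an original graph node."""
        node = leaf
        while node not in self.visible:
            node = self.parent[node]
        return node

    def visible_edges(self, edges):
        """Aggregate original edges onto the currently visible nodes."""
        out = set()
        for u, v in edges:
            a, b = self.representative(u), self.representative(v)
            if a != b:
                out.add(frozenset((a, b)))
        return out
```

Only the visible cut of the tree is drawn at any time, so the number of rendered nodes depends on how far the user has expanded the hierarchy rather than on the size of the original graph.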

Digital Library: EI

Published Online: January 2018

An interactive tool for Analyzing the Correlation, Uncertainty, and Clustering (ACUC) over ensembles in climate dataset


CORRELATION
UNCERTAINTY
CLUSTERING

Najmeh Abedzadeh

Pages 5 - 11, January 2017, © Society for Imaging Science and Technology 2017

DOI

10.2352/ISSN.2470-1173.2017.1.VDA-384

Volume 29

Issue 1

Weather scientists are looking to better understand atmospheric conditions. We propose a new tool to detect the most significant associations between variables in multidimensional, multivariate, time-varying climate datasets. In this case, we represent the correlation between variables, the uncertainty between different members within ensembles, and several clustering methods. The climate dataset is collected at different time steps and locations. One of the most important research questions for weather scientists is the relationship between various variables at different time steps or at dissimilar spatial locations. In this paper, we present a set of techniques to evaluate the correlation and association between different variables within a time step and spatial location. In other words, we perform a static analysis on a single point in space-time, extend that analysis in either the temporal or spatial dimension(s), and then aggregate the individual results to get an "overall" correlation. We created a tool that can be used to visualize the correlation and uncertainty not only between two time series across all ensembles, but also across spatial locations. Mini-batch K-means clustering is applied to these datasets to identify the most substantial patterns within them. We study the Pearson correlation and integrate glyphs and color mapping into our design to show how the correlation values of a single variable, a pair, or a triple of variables change. Statistical calculations are applied to derive an accurate interpretation of the time-varying correlations between members within all of the ensembles as well as the uncertainty of the correlation values. The uncertainty visualizations provide insight into the effects of parameter perturbation, sensitivity to initial conditions, and inconsistencies in model outputs. To evaluate the tool, we apply this technique to a climatology dataset.
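As a rough illustration of per-member correlation followed by aggregation, the sketch below computes the Pearson correlation of two variables for each ensemble member and then summarizes the results. Using the mean as the "overall" correlation and the sample standard deviation across members as the uncertainty is an assumption made for this example; the abstract does not specify the statistics used, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ensemble_correlation(members):
    """Aggregate per-member correlations.

    `members` is a list of (series_x, series_y) pairs, one pair per
    ensemble member. Returns (mean correlation, spread across members).
    The spread is an assumed proxy for uncertainty and needs at least
    two members.
    """
    rs = [pearson(x, y) for x, y in members]
    return mean(rs), stdev(rs)
```

A large spread indicates that the ensemble members disagree about how strongly the two variables are related, which is the kind of inconsistency an uncertainty view is meant to surface.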

Digital Library: EI

Published Online: January 2017

Visual-Interactive Semi-Supervised Labeling of Human Motion Capture Data


VISUAL ANALYTICS
MOTION CAPTURE DATA
LABELING
CLUSTERING
CLASSIFICATION
ACTIVE LEARNING

Jürgen Bernard, Eduard Dobermann, Anna Vögele, Björn Krüger, Jörn Kohlhammer, Dieter Fellner

DOI

10.2352/ISSN.2470-1173.2017.1.VDA-387

Volume 29

Issue 1

The characterization and abstraction of large multivariate time series data often poses challenges with respect to effectiveness or efficiency. Taking human motion capture data as an example, challenges exist in creating compact solutions that still reflect semantics and kinematics in a meaningful way. We present a visual-interactive approach for the semi-supervised labeling of human motion capture data. Users can assign labels to the data, which can subsequently be used to represent the multivariate time series as sequences of motion classes. The approach combines multiple views that support the user in the visual-interactive labeling process. Visual guidance concepts further ease the labeling process by propagating the results of supportive algorithmic models. The abstraction of motion capture data to sequences of event intervals allows overview and detail-on-demand visualizations even for large and heterogeneous data collections. The guided selection of candidate data for the extension and improvement of the labeling closes the feedback loop of the semi-supervised workflow. We demonstrate the effectiveness and efficiency of the approach in two usage scenarios, taking visual-interactive learning and human motion synthesis as examples.

Digital Library: EI

Published Online: January 2017

Visual Interactive Creation and Validation of Text Clustering Workflows to Explore Document Collections


VISUAL ANALYTICS
TEXT ANALYSIS
CLUSTERING

Tobias Ruppert, Michael Staab, Andreas Bannach, Hendrik Lücke-Tieke, Jürgen Bernard, Arjan Kuijper, Jörn Kohlhammer

DOI

10.2352/ISSN.2470-1173.2017.1.VDA-388

Volume 29

Issue 1