The HPE Cognitive Computing Toolkit (CCT) is an open-source modeling platform backed by Hewlett Packard Enterprise. CCT provides a domain-specific language designed for problems such as vision modeling and deep learning. The CCT platform compiles programs written in this language to native graphics processing unit (GPU) code. Developing vision models in CCT is far simpler and more productive than writing GPU code directly, while preserving the performance gains of GPU acceleration. This programming model scales to challenging problems such as dense optic flow, anisotropic diffusion, and deep learning. CCT is particularly powerful when combining multiple state-of-the-art techniques in a single algorithm.
Understanding the depth order of surfaces in the natural world is one of the most fundamental operations of the visual systems of many species. Humans reliably perceive the depth order of visually adjacent surfaces when there is relative motion between them such that one surface appears or disappears behind another. We have adapted a computational model of primate vision that fits important classical and recent psychophysical data on ordinal depth from motion in order to develop a fast, robust, and reliable algorithm for determining the depth order of regions in natural scene video. The algorithm uses dense optic flow to delineate moving surfaces and their relative depth order with respect to the static parts of the environment. The algorithm categorizes surfaces according to whether they are emerging, disappearing, unoccluded, or doubly occluded. We have tested this algorithm on real video in which pedestrians and cars sometimes pass behind and sometimes in front of trees. Because the algorithm extracts surfaces and labels their depth order, it is suitable as a low-level pre-processing step for complex surveillance applications. Our implementation of the algorithm uses the open-source HPE Cognitive Computing Toolkit and can be scaled to very large video streams.
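The occlusion categories above (emerging, disappearing, unoccluded) can be illustrated with a standard forward-backward flow consistency check: a pixel whose forward flow, followed by the backward flow at its landing point, fails to return near its origin is likely being covered by another surface. The following is a minimal one-dimensional toy sketch of that idea, not the paper's CCT implementation; all names (`classify`, `forward_flow`, `backward_flow`, `tol`) are illustrative assumptions.

```python
def classify(forward_flow, backward_flow, tol=0.5):
    """Label each pixel of frame t as 'visible' or 'disappearing'.

    forward_flow[x]  : displacement from frame t to frame t+1 (1-D toy grid)
    backward_flow[y] : displacement from frame t+1 back to frame t
    A pixel is consistent (visible in both frames) when following the
    forward flow and then the backward flow returns near its origin;
    a large round-trip error suggests it is occluded in frame t+1.
    """
    labels = []
    n = len(backward_flow)
    for x, fx in enumerate(forward_flow):
        y = x + fx                      # landing position in frame t+1
        if 0 <= y < n:
            err = abs(fx + backward_flow[y])
            labels.append("visible" if err <= tol else "disappearing")
        else:
            labels.append("disappearing")  # flowed outside the frame
    return labels

# Toy example: a two-pixel surface moving right by 1 px over a static
# background. Background pixel 4 is covered by the surface in frame t+1,
# so its forward/backward round trip is inconsistent.
fwd = [0, 0, 1, 1, 0, 0]        # pixels 2-3 move right; rest static
bwd = [0, 0, 0, -1, -1, 0]      # frame t+1 points back to its sources
print(classify(fwd, bwd))
# → ['visible', 'visible', 'visible', 'visible', 'disappearing', 'visible']
```

The symmetric check on the backward flow would label pixels of frame t+1 with no preimage in frame t as emerging; dense 2-D flow fields replace the toy lists in practice.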
In this paper, we propose an accurate and robust video segmentation method. The main contributions are threefold: (1) multiple cues (appearance and shape) are explicitly used and adaptively combined to determine segment probability; (2) motion is implicitly used to compute the shape cue; and (3) the segment labeling is improved by utilizing geodesic graph cuts. Experimental results show the effectiveness of the proposed method. © 2016 Society for Imaging Science and Technology.
Summarization techniques can be applied to non-text data in order to perform classification and clustering of important imaging, video, and other non-text content associated with documents. The advantage of this approach is that there is a multiplicity of inexpensive (even free) summarization engines, so a robust solution can be crafted with relatively modest effort. In this paper, we present the applicability of this approach to video and imaging data, in addition to broader binary and genetic data.