Deep learning has enabled rapid advancements in the field of image processing. Learning based approaches have achieved stunning success over their traditional signal processing-based counterparts for a variety of applications such as object detection, semantic segmentation etc. This has resulted in the parallel development of hardware architectures capable of optimizing the inferencing of deep learning algorithms in real time. Embedded devices tend to have hard constraints on internal memory space and must rely on larger (but relatively very slow) DDR memory to store vast data generated while processing the deep learning algorithms. Thus, associated systems have to be evolved to make use of the optimized hardware balancing compute times with data operations. We propose such a generalized framework that can, given a set of compute elements and memory arrangement, devise an efficient method for processing of multidimensional data to optimize inference time of deep learning algorithms for vision applications.
A system-on-chip (SoC) platform having a dual-core microprocessor (μP) and a field-programmable gate array (FPGA), as well as interfaces for sensors and networking, is a promising architecture for edge computing applications in computer vision. In this paper, we consider a case study involving the low-cost Zynq- 7000 SoC, which is used to implement a three-stage image signal processor (ISP), for a nonlinear CMOS image sensor (CIS), and to interface the imaging system to a network. Although the highdefinition imaging system operates efficiently in hard real time, by exploiting an FPGA implementation, it sends information over the network on demand only, by exploiting a Linux-based μP implementation. In the case study, the Zynq-7000 SoC is configured in a novel way. In particular, to guarantee hard real time performance, the FPGA is always the master, communicating with the μP through interrupt service routines and direct memory access channels. Results include a validation of the overall system, using a simulated CIS, and an analysis of the system complexity. On this low-cost SoC, resources are available for significant additional complexity, to integrate a computer vision application, in future, with the nonlinear CMOS imaging system.