IS&T | Library

AVM Conference Overview and Papers Program

28 3

autonomous vehicles
perception
machine learning
deep learning
sensors and processors

Pages A17-1 - A17-7, January 2021, © Society for Imaging Science and Technology 2021

DOI

10.2352/ISSN.2470-1173.2021.17.AVM-A17

Volume 33

Issue 17

Digital Library: EI

Published Online: January 2021

Single Chip Auto-Valet Parking System with TDA4VMID SoC

84 16

Surround View
System Integration
Jacinto Platform
Deep Learning
OpenVx
Graphics
Analytics
ADAS

Mihir Mody, Kedar Chitnis, Hemant Hariyani, Shyam Jagannathan, Jason Jones, Gregory Shurtz, Abhishek Shankar, Ankur, Mayank Mangla, Sriramakrishnan Govindarajan, Aish Dubey, Kai Chirca

Pages 113-1 - 113-6, January 2021, © Society for Imaging Science and Technology 2021

DOI

10.2352/ISSN.2470-1173.2021.17.AVM-113

Volume 33

Issue 17

Auto-Valet parking is a key emerging function for Advanced Driver Assistance Systems (ADAS) enhancing traditional surround view system providing more autonomy during parking scenario. Auto-Valet parking system is typically built using multiple HW components e.g. ISP, micro-controllers, FPGAs, GPU, Ethernet/PCIe switch etc. Texas Instrument’s new Jacinto7 platform is one of industry’s highest integrated SoC replacing these components with a single TDA4VMID chip. The TDA4VMID SoC can concurrently do analytics (traditional computer vision as well as deep learning) and sophisticated 3D surround view, making it a cost effective and power optimized solution. TDA4VMID is a truly heterogeneous architecture and it can be programmed using an efficient and easy to use OpenVX based middle-ware framework to realize distribution of software components across cores. This paper explains typical functions for analytics and 3D surround view in auto-valet parking system with data-flow and its mapping to multiple cores of TDA4VMID SoC. Auto-valet parking system can be realized on TDA4VMID SOC with complete processing offloaded of host ARM to the rest of SoC cores, providing ample headroom for customers for future proofing as well as ability to add customer specific differentiation.

Digital Library: EI

Published Online: January 2021

DRAM Bandwidth Optimal Perspective Transform Engine

43 3

Homography
Perspective Transform
Lens Distortion Correction
DRAM Memory Bandwidth
Geometric Correction Engine
Warping

Mihir Mody, Rajasekhar Allu, Gang Hua, Brijesh Jadav, Niraj Nandan, Ankur Ankur, Mayank Mangla

Pages 114-1 - 114-6, January 2021, © Society for Imaging Science and Technology 2021

DOI

10.2352/ISSN.2470-1173.2021.17.AVM-114

Volume 33

Issue 17

Perspective transform (or Homography) is commonly used algorithms in ADAS and Automated Driving System. Perspective transform is used in multiple use-cases e.g. viewpoint change, fisheye lens distortion correction, chromatic aberration correction, stereo image pair rectification, This algorithm needs high external DRAM memory bandwidth due to inherent scaling, resulting in nonaligned two dimensional memory burst accesses, resulting in large degradation in system performance and latencies. In this paper, we propose a novel perspective transform engine to reduce external memory DRAM bandwidth to alleviate this problem. The proposed solution consists of multiple regions slicing of input video frame with block size tuned for each region. The paper also gives an algorithm for finding optimal region boundaries with corresponding block size tuned for each region. The proposed solution enables average BW reduction of 67% compared to traditional implementation and achieves clock up-to 720 MHz with output pixel throughput of 1 cycle/pixel in 16nm FinFET process node.

Digital Library: EI

Published Online: January 2021

Data Collection Through Translation Network Based on End-to-End Deep Learning for Autonomous Driving

64 5

Autonomous Vehicles
End-to-End
Deep Learning
Data collection
Translation Network

Zelin Zhang, Jun Ohya

Pages 115-1 - 115-7, January 2021, © Society for Imaging Science and Technology 2021

DOI

10.2352/ISSN.2470-1173.2021.17.AVM-115

Volume 33

Issue 17

To avoid manual collections of a huge amount of labeled image data needed for training autonomous driving models, this paperproposes a novel automatic method for collecting image data with annotation for autonomous driving through a translation network that can transform the simulation CG images to real-world images. The translation network is designed in an end-to-end structure that contains two encoder-decoder networks. The forepart of the translation network is designed to represent the structure of the original simulation CG image with a semantic segmentation. Then the rear part of the network translates the segmentation to a realworld image by applying cGAN. After the training, the translation network can learn a mapping from simulation CG pixels to the realworld image pixels. To confirm the validity of the proposed system, we conducted three experiments under different learning policies by evaluating the MSE of the steering angle and vehicle speed. The first experiment demonstrates that the L1+cGAN performs best above all loss functions in the translation network. As a result of the second experiment conducted under different learning policies, it turns out that the ResNet architecture works best. The third experiment demonstrates that the model trained with the real-world images generated by the translation network can still work great in the real world. All the experimental results demonstrate the validity of our proposed method.

Digital Library: EI

Published Online: January 2021

GG-Net: Gaze Guided Network for Self-driving Cars

132 17

Eye Tracking
Imitation Learning
Autonomous Driving
STN
Multitask Learning

M. Abdelkarim, M.K. Abbas, Alaa Osama, Dalia Anwar, Mostafa Azzam, M. Abdelalim, H. Mostafa, Samah El-Tantawy, Ibrahim Sobh

Pages 171-1 - 171-8, January 2021, © Society for Imaging Science and Technology 2021

DOI

10.2352/ISSN.2470-1173.2021.17.AVM-171

Volume 33

Issue 17

Imitation learning is used massively in autonomous driving for training networks to predict steering commands from frames using annotated data collected by an expert driver. Believing that the frames taken from a front-facing camera are completely mimicking the driver’s eyes raises the question of how eyes and the complex human vision system attention mechanisms perceive the scene. This paper proposes the idea of incorporating eye gaze information with the frames into an end-to-end deep neural network in the lane-following task. The proposed novel architecture, GG-Net, is composed of a spatial transformer network (STN), and a multitask network to predict steering angle as well as the gaze map for the input frame. The experimental results of this architecture show a great improvement in steering angle prediction accuracy of 36% over the baseline with inference time of 0.015 seconds per frame (66 fps) using NVIDIA K80 GPU enabling the proposed model to operate in real-time. We argue that incorporating gaze maps enhances the model generalization capability to the unseen environments. Additionally, a novel course-steering angle conversion algorithm with a complementing mathematical proof is proposed.

Digital Library: EI

Published Online: January 2021

Quantitative study of vehicle-pedestrian interactions: Towards pedestrian-adapted lighting communication functions for autonomous vehicles

77 9

Autonomous vehicles
Lighting communication function
Vehicle-Pedestrian interaction
Autonomous Vehicle-to-Pedestrian communication
Human factors
Pedestrian behaviors

Guoqin Zang, Shéhérazade Azouigui, Sébastien Saudrais, Olivier Peyricot, Mathieu Hebert

Pages 172-1 - 172-8, January 2021, © Society for Imaging Science and Technology 2021

DOI

10.2352/ISSN.2470-1173.2021.17.AVM-172

Volume 33

Issue 17

This paper reports the main conclusions of a fielding observation of vehicle-pedestrian interactions at urban crosswalks, by describing the types, sequences, spatial distributions and probabilities of occurrence of the vehicle and pedestrian behaviors. This study was motivated by the fact that in a near future, with the introduction of autonomous vehicles (AVs), human drivers will become mere passengers, no longer being able to participate into the traffic interactions. With the purpose of recreating the necessary interactions, there is a strong need of new communication abilities for AVs to express their status and intentions, especially to pedestrians who constitute the most vulnerable road users. As pedestrians highly rely on the actual behavioral mechanism to interact with vehicles, it looks preferable to take into account this mechanism in the design of new communication functions. In this study, through more than one hundred of video-recorded vehicle-pedestrian interaction scenes at urban crosswalks, eight scenarios were classified with respect to the different behavioral sequences. Based on the measured position of pedestrians relative to the vehicle at the time of the significant behaviors, quantitative analysis shows that distinct patterns exist for the pedestrian gaze behavior and the vehicle slowing down behavior as a function of Vehicle-to-Pedestrian (V2P) distance and angle.

Digital Library: EI

Published Online: January 2021

End-to-End Imaging System Optimization for Computer Vision in Driving Automation

99 16

Imaging System Optimization
End-to-End Optimization
Driving Automation
Joint Camera and CV Optimization

Korbinian Weikl, Damien Schroeder, Daniel Blau, Zhenyi Liu, Walter Stechele

Pages 173-1 - 173-7, January 2021, © Society for Imaging Science and Technology 2021

DOI

10.2352/ISSN.2470-1173.2021.17.AVM-173

Volume 33

Issue 17

Full driving automation imposes to date unmet performance requirements on camera and computer vision systems, in order to replace the visual system of a human driver in any conditions. So far, the individual components of an automotive camera hav mostly been optimized independently, or without taking into account the effect on the computer vision applications. We propose an end-to-end optimization of the imaging system in software, from generation of radiometric input data over physically based camera component models to the output of a computer vision system. Specifically, we present an optimization framework which extends the ISETCam and ISET3d toolboxes to create synthetic spectral data of high dynamic range, and which models a stateof-the-art automotive camera in more detail. It includes a stateof-the-art object detection system as benchmark application. We highlight in which way the framework approximates the physical image formation process. As a result, we provide guidelines for optimization experiments involving modification of the model parameters, and show how these apply to a first experiment on high dynamic range imaging.

Digital Library: EI

Published Online: January 2021

Boosting computer vision performance by enhancing camera ISP

158 48

Image signal processors (ISP)
Computer vision
Deep learning
Convolutional neural networks (CNN)
Object detection
Face recognition
Stereo disparity estimation

Peter van Beek, Chyuan-Tyng (Roger) Wu, Baishali Chaudhury, Thomas R. Gardos

DOI

10.2352/ISSN.2470-1173.2021.17.AVM-174

Volume 33

Issue 17

Traditional image signal processors (ISPs) are primarily designed and optimized to improve the image quality perceived by humans. However, optimal perceptual image quality does not always translate into optimal performance for computer vision applications. In [1], Wu et al. proposed a set of methods, termed VisionISP, to enhance and optimize the ISP for computer vision purposes. The blocks in VisionISP are simple, content-aware, and trainable using existing machine learning methods. VisionISP significantly reduces the data transmission and power consumption requirements by reducing image bit-depth and resolution, while mitigating the loss of relevant information. In this paper, we show that VisionISP boosts the performance of subsequent computer vision algorithms in the context of multiple tasks, including object detection, face recognition, and stereo disparity estimation. The results demonstrate the benefits of VisionISP for a variety of computer vision applications, CNN model sizes, and benchmark datasets.

Digital Library: EI

Published Online: January 2021

Unify The View of Camera Mesh Network to a Common Coordinate System

49 1

Continuous Tracking
Object Tracking
Depth Estimation
Stereo Vision
Trajectory Prediction
Surveillance

Haney W. Williams, Steven J. Simske, Fr. Gregory Bishay

DOI

10.2352/ISSN.2470-1173.2021.17.AVM-175

Volume 33

Issue 17

The demand for object tracking (OT) applications has been increasing for the past few decades in many areas of interest, including security, surveillance, intelligence gathering, and reconnaissance. Lately, newly-defined requirements for unmanned vehicles have enhanced the interest in OT. Advancements in machine learning, data analytics, and AI/deep learning have facilitated the improved recognition and tracking of objects of interest; however, continuous tracking is currently a problem of interest in many research projects. [1] In our past research, we proposed a system that implements the means to continuously track an object and predict its trajectory based on its previous pathway, even when the object is partially or fully concealed for a period of time. The second phase of this system proposed developing a common knowledge among a mesh of fixed cameras, akin to a real-time panorama. This paper discusses the method to coordinate the cameras' view to a common frame of reference so that the object location is known by all participants in the network.

Digital Library: EI

Published Online: January 2021

Data driven degradation of automotive sensors and effect analysis

190 36

sensor
AI
Automotive
Validation
Camera
multimodal
LiDAR
Radar

Sven Fleck, Benjamin May, Gwen Daniel, Chris Davies

DOI

10.2352/ISSN.2470-1173.2021.17.AVM-180

Volume 33

Issue 17