### Single Chip Auto-Valet Parking System with TDA4VMID SoC

Mihir Mody, Kedar Chitnis, Hemant Hariyani<sup>\*</sup>, Shyam Jagannathan, Jason Jones<sup>+</sup>, Gregory Shurtz<sup>+</sup>, Abhishek Shankar<sup>+</sup>, Ankur, Mayank Mangla<sup>\*</sup>, Sriramakrishnan Govindarajan, Aish Dubey<sup>\*</sup> and Kai Chirca<sup>\*</sup>; Embedded Processor Business, Texas Instruments; Bangalore, India; <sup>\*</sup>Dallas and <sup>+</sup>Houston, USA

#### Abstract

Auto-valet parking is a key emerging function for Advanced Driver Assistance Systems (ADAS) that enhances the traditional surround view system by providing more autonomy during parking. An auto-valet parking system is typically built using multiple hardware components, e.g. ISP, microcontrollers, FPGAs, GPU, and Ethernet/PCIe switch. Texas Instruments' new Jacinto7 platform is one of the industry's most highly integrated SoC platforms, replacing these components with a single TDA4VMID chip. The TDA4VMID SoC can concurrently run analytics (traditional computer vision as well as deep learning) and sophisticated 3D surround view, making it a cost-effective and power-optimized solution. TDA4VMID is a truly heterogeneous architecture, and it can be programmed using an efficient and easy-to-use OpenVX-based middleware framework that distributes software components across cores. This paper explains the typical analytics and 3D surround view functions in an auto-valet parking system, along with their data flow and mapping to the multiple cores of the TDA4VMID SoC. An auto-valet parking system can be realized on the TDA4VMID SoC with the complete processing offloaded from the host ARM to the rest of the SoC cores, providing ample headroom for future-proofing as well as the ability to add customer-specific differentiation ...

#### I. Introduction

To reduce the number of collisions, increase reliability and shorten reaction time compared to human drivers, ADAS systems perform analytics on the captured scene to alert drivers to obstacles in and around the driving or parking path. Surround view (SV) is a key application of ADAS and autonomous driving systems [1][2]. The emerging trend is to combine the surround view system with analytics so that vehicles can move around to find parking spots and then follow path-planning maneuvers to put the vehicle in the identified parking slot.

Typically, these systems are developed using multiple components, as shown in figure 1. Automotive-grade cameras with discrete Image Signal Processors (ISP) are mounted on the body of the vehicle. The data captured by them is fed into an application processor, which typically renders the 3D surround view on its built-in GPU to give the driver a view of the environment. Auto-valet parking additionally needs computationally complex analytics, consisting of deep learning and traditional computer vision algorithms, across all captured surround view images. This is typically achieved with one or more analytics companion chips (e.g. an FPGA doing optical flow, a GPU doing deep learning). To improve the robustness of the algorithms, ultrasound sensors or radar/lidar sensors are also connected to a safety microcontroller that performs fusion. The display output is shown on the dashboard of the vehicle, and the analytics output is fed into the control systems of the vehicle to perform parking maneuvers. Using such multiple discrete components, along with their DDR, power solutions and interconnections, significantly increases the power and cost of the system.



Figure 1. Typical Auto Valet Parking System

#### II. TDA4VMID SoC

Texas Instruments' new Jacinto7 platform [3] introduces the industry's highest level of integration of multiple components, targeting the ADAS, Cockpit and Gateway markets. The TDA4x series is a customization of the Jacinto7 platform targeting multiple ADAS markets such as front camera, surround view, auto-valet parking, highway driving and driver monitoring. The TDA4VMID SoC [4] is an optimal solution for the low and mid end of the ADAS segment, offering a highly integrated solution for auto-valet parking and highway driving systems, as shown in figure 2.



Figure 2. Jacinto7 Platform - TDA4VMID SoC



Figure 3. Jacinto7 Platform - TDA4VMID SoC

The SoC contains a dual-core ARM Cortex-A72 as the host or application CPU, an Imagination 8XE series GPU, two C66x DSP cores, one core of the new-generation C7x DSP with neural network acceleration via the Matrix Multiply Accelerator (MMA), an integrated imaging pipeline (VPAC), hardware engines for optical flow and stereo disparity (DMPAC), a video codec, and a display subsystem (DSS). It also integrates an Ethernet switch, a PCIe backplane, and a safety microcontroller (Cortex-R5F based MCU island). Four additional MCU cores (2x dual-core Cortex-R5F) help with fusion and offload low-latency tasks from the host ARM Cortex-A72. From an automotive functional safety standpoint [5], the MCU island supports the highest functional safety level, ASIL-D, while the rest of the chip supports ASIL-B (except the GPU and video codec). With the level of integration introduced by TDA4VMID, the typical auto-valet parking system described in figure 1 is transformed into a significantly more cost-effective and cohesive solution, as shown in figure 3.

#### III. Use-case Data Flow

The analytics for camera perception starts by passing the captured image to the image signal processor, followed by lens distortion correction and pyramid generation in the Vision Pre-processing Accelerator (VPAC), as shown in figure 4. The result is then passed to the Dense Optical Flow engine (DOF) and optionally the Stereo Disparity Engine (SDE), both part of the Depth and Motion Processing Accelerator (DMPAC). Finally, the data is processed on the DSP cores (C66x and C7x), with neural network inference running on the Matrix Multiply Accelerator (MMA). Multiple algorithms, namely motion segmentation, depth estimation, multiple object detection (pedestrian, cyclist, vehicle etc.), semantic segmentation, parking spot detection and visual localization, are performed on the DSP cores with MMA. The algorithms can use traditional computer vision analytics and/or deep learning, both of which are supported by the flexible TDA4 SoC architecture.



Figure 4. Analytics Data Flow



Figure 5. Adaptive Surround View Data Flow



Figure 6. Software Architecture and stack

Figure 5 shows the typical 3D surround view data flow on the TDA4VMID SoC. It consists of applying photometric and geometric correction to the images captured across cameras. The photometric correction is performed by the image signal processing pipe (VPAC). The geometric correction and the stitching of multiple images into the final image for display are performed by the GPU (IMG-8XE). Additionally, the car model is updated according to the viewing angle requirements and blended with the stitched output from the cameras. Finally, the GPU output is scaled to the display resolution by the hardware display pipeline in the display subsystem (DSS) before it is sent to the head unit controlling the dashboard display.
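The two data flows described above can be summarized as ordered stage-to-engine mappings. The following is a minimal illustrative sketch, not TI software: the `Stage` type, the stage names and the helper functions are hypothetical, and only the assignment of stages to hardware blocks is taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One processing step and the hardware block the text assigns it to."""
    name: str
    engine: str
    optional: bool = False


# Analytics path (figure 4), in the order given in the text.
ANALYTICS_FLOW = [
    Stage("image signal processing", "VPAC"),
    Stage("lens distortion correction", "VPAC"),
    Stage("pyramid generation", "VPAC"),
    Stage("dense optical flow", "DMPAC"),
    Stage("stereo disparity", "DMPAC", optional=True),
    Stage("neural network inference and perception algorithms", "DSP+MMA"),
]

# Surround view path (figure 5). The car-model blend is placed on the GPU
# because the text calls the result of this step the "GPU output".
SURROUND_VIEW_FLOW = [
    Stage("photometric correction", "VPAC"),
    Stage("geometric correction and stitching", "GPU"),
    Stage("car model update and blending", "GPU"),
    Stage("scaling to display resolution", "DSS"),
]


def plan(flow, include_optional=False):
    """Return the stages that will run, dropping optional ones by default."""
    return [s for s in flow if include_optional or not s.optional]


def engines_in_order(stages):
    """Collapse consecutive stages on the same engine into one hop."""
    hops = []
    for stage in stages:
        if not hops or hops[-1] != stage.engine:
            hops.append(stage.engine)
    return hops


if __name__ == "__main__":
    print(" -> ".join(engines_in_order(plan(ANALYTICS_FLOW))))
    print(" -> ".join(engines_in_order(SURROUND_VIEW_FLOW)))
```

Neither list contains the host A72: in this description every pixel-processing stage runs on an accelerator, a DSP or the GPU.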

#### III. SW architecture and OpenVX middleware

Figure 6 shows the distribution of SW across the different CPUs to realize the data flows shown in figures 4 and 5. As seen, most of the HW accelerator functions are allocated to the R5F and DSPs, leaving the A72 free for high-level application control.

OpenVX [6] is used as the middleware, which allows SW on different cores such as the DSPs, R5F and accelerators to appear as "nodes" on the host application CPU (A72). The host can therefore create an OpenVX data flow graph from these disparate nodes and realize a coherent system application for surround view and analytics. The OpenVX extension [7] for pipelined processing lets the data flow execute such that all HW accelerators are simultaneously active, which allows the system to meet real-time performance. Low-overhead inter-CPU communication allows HW accelerators and DSPs to talk directly to each other without intervention of the host A72. This results in extremely low overhead (<1% due to middleware) and low-latency operation.
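The benefit of pipelined execution can be illustrated with a small timing model. This is a sketch under stated assumptions, not a model of the actual OpenVX implementation: the three stage times are hypothetical placeholders rather than measured TDA4VMID numbers, each engine is assumed to process one frame at a time, and buffering between stages is assumed not to limit throughput.

```python
def sequential_finish_ms(stage_ms, frames):
    """Finish time when a frame must clear every stage before the next starts."""
    return frames * sum(stage_ms)


def pipelined_finish_ms(stage_ms, frames):
    """Finish time when every stage can work on a different frame concurrently."""
    done = [0.0] * len(stage_ms)  # when each stage finished its latest frame
    for _ in range(frames):
        prev = 0.0  # when the previous stage finished the current frame
        for i, t in enumerate(stage_ms):
            # A stage starts once its input is ready and the engine is free.
            start = max(prev, done[i])
            done[i] = start + t
            prev = done[i]
    return done[-1]


if __name__ == "__main__":
    # Hypothetical per-frame times in ms for three engines in series.
    stage_ms = [4.0, 6.0, 10.0]
    frames = 5
    print("sequential:", sequential_finish_ms(stage_ms, frames), "ms")
    print("pipelined: ", pipelined_finish_ms(stage_ms, frames), "ms")
```

With constant stage times, the steady-state frame period of the pipelined schedule is set by the slowest stage rather than by the sum of all stages, which is why keeping all accelerators simultaneously active helps meet a real-time frame rate.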

| Layer              | Components                                                                                          |
|--------------------|-----------------------------------------------------------------------------------------------------|
| Tests and examples | Khronos Conformance Test; TI Extension Conformance Test; Examples / Use-cases (Processor SDK Auto)  |
| API                | OpenVX API; TIOVX API                                                                               |
| TIOVX Framework    | Context, Node, Target, Obj Desc, Image, Array, Graph, Kernel, Target Kernel, Scalar, Pyramid        |
| TIOVX Platform     | Queue, Mutex, Task, Event, IPC, Platform, Memory                                                    |
| Kernels            | VPAC NF, VPAC MSC, DMPAC DOF, VXLIB, Target kernels, VPAC LDC, VPAC VISS, DMPAC SDE, User kernels   |
| Legend (ownership) | TI, Khronos, Customer                                                                               |

Figure 7. OpenVX Middleware

#### IV. Results

Figure 8 shows the analytics output, namely semantic segmentation, vehicle detection and parking spot detection, across 3 cameras (left, right and back). Figure 9 shows the 3D surround view stitched from 4 cameras and rendered from a free viewpoint selected by the user, which helps in understanding the external environment. Figure 10 shows the CPU and HW accelerator loading for the output shown in figures 8 and 9. As seen in figure 10, much of the heavy lifting of pixel processing and analytics is done by the underlying accelerators and DSPs such as C7x/MMA, VPAC and GPU, leaving ARM A72 and R5F DMIPS free for customers to plug in their custom algorithms for sensor fusion, localization and path planning.



Figure 10. CPU and HW Accelerator Loading



Figure 8. AVP – Multi Camera Analytics Output



Figure 9. 3D Surround View Output

#### V. Conclusion

This paper presents the fully integrated TDA4VMID SoC, which can concurrently run 3D surround view and analytics for auto-valet parking systems. The high level of integration is achieved by bringing on-chip what were previously external components: the safety microcontroller (as the MCU island), ISP (as VPAC), FPGA (as DMPAC), additional GPU (as MMA), Ethernet switch, PCIe backplane, and additional MCUs (2x dual-core Cortex-R5F). The amount of integration, the mix of automotive safety levels and the multiple heterogeneous high-performance processor cores in TDA4VMID, along with an efficient and flexible SW architecture using OpenVX, enable a cost-optimized auto-valet parking solution for mass-market deployment.

#### References

- [1] Buyue Zhang et al., "A Surround View Camera Solution for Embedded Systems", IEEE Computer Vision and Pattern Recognition Workshops (CVPRW), July 2014
- [2] Mihir Mody, Piyali Goswami, Rajat Sagar, Gregory Shurtz et al., "Fully-Integrated Surround Vision and Mirror Replacement SoC for ADAS/Automated Driving", HotChips, 2017
- [3] Texas Instruments Jacinto Platform URL, http://www.ti.com/processors/automotive-processors/featuredplatform.html
- [4] R. Venkatasubramanian, D. Steiss, G. Shurtz, T. Anderson et al., "A 16nm 3.5B Transistor >14TOPS 2-to-10W Multicore SoC Platform for Automotive and Embedded Applications with Integrated Safety MCU, 512b Vector VLIW DSP, Embedded Vision and Imaging Acceleration", IEEE International Solid-State Circuits Conference (ISSCC), Feb 2020
- [5] K. Chitnis et al., "Enabling Functional Safety ASIL Compliance for Autonomous Driving Software Systems", Electronic Imaging and System, Feb 2017
- [6] Kedar Chitnis, Jesse Villarreal, Lucas Weaver, Brijesh Jadav, Mihir Mody et al., "System Data Flow Pipelining for Embedded Heterogenous SoCs using OpenVX", IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), 2020
- [7] Kedar Chitnis, Jesse Villarreal, Brijesh Jadav, Mihir Mody, "Novel OpenVX Implementation for Heterogeneous Multi-Core Systems", IEEE Consumer Electronics Conference (ICCE), 2017

#### Author Biography

Mihir Mody is a Senior Principal Architect (DMTS) in the embedded processor business, responsible for defining IP and SoCs for the automotive and industrial markets at Texas Instruments (TI). He received his master's in electrical engineering from the Indian Institute of Science (IISc) in 2000. He has been working at TI since 2001, architecting area-efficient and low-power solutions for image processing, computer vision, video coding, deep learning and control systems.

Kedar Chitnis is currently the Principal Software Architect for the Processors business unit at TI. He has 19+ years of experience in embedded systems, primarily in the domain of image and video processing, systems and software for automotive, industrial and consumer markets. He also contributes to open standards as part of the Khronos OpenVX working group. He completed his B.E. in Computer Engineering at the University of Mumbai and is currently pursuing an M.Tech in Software Systems at BITS Pilani.

Hemant Hariyani is the architect responsible for defining the graphics architecture for automotive and industrial SoCs at Texas Instruments. He has been working at Texas Instruments for the past 19 years and has delivered the overall graphics solution, including software, across multiple product lines.

Shyam Jagannathan is a Senior Technical Lead and Member Group of Technical Staff in the Embedded Processors Group at Texas Instruments. He received a master's degree from IIT Chicago in the field of Signal Processing and Communications and has been with Texas Instruments since 2005. His areas of interest include embedded DSP architecture, SoC architecture, hardware accelerators, deep learning, computer vision and optimizations.

Jason Jones graduated from Texas A&M University in 1993 and has been at TI for 27 years, where he is currently a Distinguished Member of Technical Staff. He worked in TI's DSP/SoC design group for 15 years in the areas of RTL, DFT and physical design, with numerous innovations and patents in low-power design, clocking and multi-core/heterogeneous architectures, to name a few. For the last 12 years he has been lead system architect for the Automotive Processor organization and has been responsible for the definition of the last 3 generations of TI's ADAS and Infotainment SoCs.

Gregory Shurtz is Principal Systems Architect and Distinguished Member of Technical Staff in the automotive business at Texas Instruments Incorporated, leading SoC architecture for Automotive ADAS and Gateway solutions. With 25 years industry experience, former positions include Logic Design Engineer for Lockheed Martin supporting the US space program, and Hardware Design Engineer for Symbios Logic supporting mass storage system design. He graduated summa cum laude with a BSEE degree from Wichita State University (1995).

Abhishek Shankar is a Computer Engineer from IIT Roorkee and has been at TI for the last 18 years. He has an avid interest in computer architecture and algorithms and also enjoys listening to music and spending time with family. Currently, he is on the team that architects the latest automotive SoCs, with special focus on SoC bus fabric and performance.

Ankur completed his B.Tech in Computer Engineering at Netaji Subhas Institute of Technology (2012). He has worked on wireless connectivity (Wi-Fi) and now works in the ADAS processors team, with a focus on Vision HWA, at Texas Instruments India.

Mayank Mangla is an experienced imaging engineer, leading camera R&D for Texas Instruments Embedded Processors product line. He has been passionate about image science from days when digital cameras had just arrived on the horizon. Over the years he has been instrumental in 100s of millions of cameras reaching the consumers. His current research interests include application of imaging and computer vision in ADAS and Robotics.

Sriramakrishnan received his B.E. (Hons) in Computer Science from BITS, Pilani (2002). Since then he has been with Texas Instruments India. He has rich experience in system software design and development for embedded systems, including automotive. His key focus areas include low latency, high-performance compute, multi-chip heterogeneous architecture and virtualization technology.

Aish Dubey is a systems architect for ADAS. His interests include perception algorithms, energy-efficient architecture and resilient systems design.

Kai Chirca received her MSEE from Columbia University. She joined Texas Instruments Inc. in 2004. Her work has focused on high-performance, low-power DSP processors, memory and cache controller circuits, deep learning accelerator design and computer architecture. She is a Distinguished Member of Technical Staff at TI.
