As applications of drones proliferate, it has become increasingly important to equip drones with automatic sense and avoid (SAA) algorithms to address safety and liability concerns. Sense and avoid algorithms can be based on either active or passive sensing methods. Each has advantages over the other, but neither is sufficient by itself. Therefore, especially for applications such as autonomous navigation where failure could be catastrophic, deploying both passive and active sensors simultaneously and utilizing inputs from both becomes critical for detecting and avoiding objects reliably. As part of the solution, in this paper we present an efficient SAA algorithm based on input from multiple stereo cameras, which can be implemented on a low-cost, low-power embedded processor. In this algorithm, we construct an instantaneous 3D occupancy grid (OG) map at each time instant using the disparity information from the stereo cameras. We then filter noise using spatial information, and further filter noise using a probabilistic approach based on temporal information. Using this OG map, we detect threats to the drone in order to determine the best trajectory for it to reach a destination.
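As a rough illustration of the occupancy-grid update described above, the sketch below back-projects stereo disparity into 3D points and accumulates temporal evidence per voxel in log-odds form; the intrinsics, voxel size, and log-odds increments are illustrative assumptions rather than the paper's actual parameters.

```python
import numpy as np

def disparity_to_points(disparity, fx, fy, cx, cy, baseline):
    """Back-project a stereo disparity map into 3D camera-frame points."""
    v, u = np.nonzero(disparity > 0)
    z = fx * baseline / disparity[v, u]          # depth from disparity
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def update_log_odds(log_odds, points, voxel_size, l_occ=0.85, l_min=-4.0, l_max=4.0):
    """Temporal filtering: accumulate evidence for occupied voxels in log-odds form."""
    idx = np.floor(points / voxel_size).astype(int)
    idx = idx[np.all((idx >= 0) & (idx < np.array(log_odds.shape)), axis=1)]
    np.add.at(log_odds, tuple(idx.T), l_occ)     # observed points raise occupancy evidence
    return np.clip(log_odds, l_min, l_max)

# Voxels with log_odds > 0 (occupancy probability > 0.5) would be treated as potential threats.
```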
Object detection using aerial drone imagery has received a great deal of attention in recent years. While visible light images are adequate for detecting objects in most scenarios, thermal cameras can extend the capabilities of object detection to night-time or occluded objects. As such, RGB and infrared (IR) fusion methods for object detection are useful and important. One of the biggest challenges in applying deep learning methods to RGB/IR object detection is the lack of available training data for drone IR imagery, especially at night. In this paper, we develop several strategies for creating synthetic IR images using the AirSim simulation engine and CycleGAN. Furthermore, we utilize an illumination-aware fusion framework to fuse RGB and IR images for object detection on the ground. We characterize and test our methods on both simulated and actual data. Our solution is implemented on an NVIDIA Jetson Xavier running on an actual drone, requiring about 28 milliseconds of processing per RGB/IR image pair.
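The following sketch illustrates one simple form of illumination-aware fusion, blending per-detection confidences from the RGB and IR branches with a weight derived from scene brightness; the sigmoid parameters and helper names are assumptions, not the paper's implementation.

```python
import numpy as np

def illumination_weight(rgb, gain=10.0, offset=0.35):
    """Map mean image brightness in [0, 1] to a weight that favors the RGB branch in daylight."""
    brightness = rgb.astype(np.float32).mean() / 255.0
    return 1.0 / (1.0 + np.exp(-gain * (brightness - offset)))

def fuse_scores(score_rgb, score_ir, w):
    """Convex combination of per-detection confidences; the IR branch dominates at night (w -> 0)."""
    return w * score_rgb + (1.0 - w) * score_ir
```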
A drone-projector, a beam projector mounted on a drone, has been investigated in order to develop a projector that can overcome restrictions on where an image can be projected. For stability, the drone-projector requires its mass to be centered, and the additional weight of the projector must be within the payload capacity of the drone. In addition to this requirement, the drone-projector should be designed to minimize the distortion of the image caused by 3D translations or rotations of the drone during hovering, which arise from propeller vibration or global positioning system (GPS) errors. In this paper, we consider rotation of the drone-projector, which makes the projected image tilted, keystoned, and shifted. To overcome this problem, we propose a software-based stabilization method which pre-corrects the image to be projected based on flight information. Our experimental results show that the distortion of the projected image due to rotations of the proposed drone-projector is attenuated by applying our stabilization method.
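A minimal sketch of the pre-correction idea, assuming the projector behaves like a pinhole device with intrinsics K and that roll/pitch/yaw are available from flight data: a pure rotation induces a homography K R K^-1 on the projected image, so pre-warping the source image with the inverse of that homography compensates the tilt, keystone, and shift. The axis convention is also an assumption.

```python
import cv2
import numpy as np

def precorrect(image, roll, pitch, yaw, K):
    """Pre-warp `image` to compensate a pure rotation of the projector (axis order assumed)."""
    rvec = np.array([pitch, yaw, roll], dtype=np.float64).reshape(3, 1)
    R, _ = cv2.Rodrigues(rvec)                       # rotation matrix from flight angles
    H = K @ R @ np.linalg.inv(K)                     # homography induced by the rotation
    h, w = image.shape[:2]
    return cv2.warpPerspective(image, np.linalg.inv(H), (w, h))
```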
In this work, we present a computer vision and machine learning backed autonomous drone surveillance system designed to protect critical locations. The system is composed of a wide-angle, high-resolution daylight camera and a relatively narrow-angle thermal camera mounted on a rotating turret. The wide-angle daylight camera allows the detection of flying intruders as small as 20 pixels with a very low false alarm rate. The primary detection is based on a YOLO convolutional neural network (CNN) rather than conventional background subtraction algorithms, due to its low false alarm rate. The detected flying objects are then tracked by the rotating turret and classified by the narrow-angle, zoomed thermal camera, where the classification algorithm is also based on CNNs. The algorithms are trained on artificial and augmented datasets due to the scarcity of infrared videos of drones.
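The detect-track-classify loop might look roughly like the sketch below; the detector, turret, and classifier objects are hypothetical placeholders standing in for the YOLO and thermal-CNN components described above, not an API from the paper.

```python
def surveillance_step(daylight_frame, detector, turret, thermal_cam, classifier):
    """One iteration: detect small flying objects, point the turret, classify in the thermal view."""
    detections = detector.detect(daylight_frame)       # wide-angle YOLO-style CNN detections
    for box in detections:
        turret.point_at(box.center)                    # keep the target in the zoomed thermal FOV
        thermal_chip = thermal_cam.grab()
        label = classifier.predict(thermal_chip)       # thermal CNN: drone / bird / other
        yield box, label
```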
In recent years, the flexibility and ease of use of unmanned aerial vehicles (UAVs), together with their affordable cost, have increased drone use by industry and private users. However, drones carry the potential for many illegal activities, from smuggling illicit material and unauthorized reconnaissance and surveillance of targets and individuals, to electronic and kinetic attacks in the most threatening scenarios. As a consequence, it has become important to develop effective and affordable countermeasures to report a drone flying over critical areas. In this context, our research evaluates different short-term parametrizations of environmental audio data in the time and frequency domains to develop a machine learning based UAV warning system which employs support vector machines to recognize the drone audio fingerprint. Preliminary experimental results have shown the effectiveness of the proposed approach.
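As one possible instance of the short-term audio parametrization, the sketch below summarizes each clip with MFCC statistics and feeds them to an SVM; the feature choice, kernel, and hyperparameters are assumptions, not the configuration used in the paper.

```python
import librosa
import numpy as np
from sklearn.svm import SVC

def clip_features(path, sr=22050, n_mfcc=13):
    """Load an audio clip and summarize its MFCC trajectory with per-coefficient means and stds."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# X: feature matrix (one row per clip), y: 1 = drone present, 0 = background noise.
# clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X, y)
```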
The rapid development of Unmanned Aerial Vehicle (UAV) technology, also known as drones, has raised concerns about the safety of critical locations such as governmental buildings, nuclear stations, and crowded places. A computer vision based approach for detecting these threats appears to be a viable solution due to its various advantages. We envision an autonomous drone detection and tracking system for the protection of strategic locations. It has been reported numerous times that one of the main challenges for aerial object recognition with computer vision is discriminating birds from the targets. In this work, we have used 2-dimensional scale, rotation, and translation invariant Generic Fourier Descriptor (GFD) features and classified targets as a drone or bird with a neural network. For the training of this system, a large dataset composed of birds and drones was gathered from open sources. We have achieved up to 85.3% overall correct classification rate.
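A minimal sketch of GFD extraction from a binary silhouette, assuming centroid centering for translation invariance, Fourier magnitudes for rotation invariance, and DC normalization for scale invariance; the polar grid sizes and the number of retained radial/angular frequencies are illustrative assumptions.

```python
import numpy as np

def gfd(mask, n_rad=32, n_ang=64, m=4, n=9):
    """Generic Fourier Descriptor of a binary silhouette `mask`."""
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    r_max = np.sqrt(((ys - cy) ** 2 + (xs - cx) ** 2).max())
    # Sample the silhouette on a polar (radius, angle) grid centered at the centroid.
    r = np.linspace(0, r_max, n_rad)[:, None]
    t = np.linspace(0, 2 * np.pi, n_ang, endpoint=False)[None, :]
    sy = np.clip((cy + r * np.sin(t)).astype(int), 0, mask.shape[0] - 1)
    sx = np.clip((cx + r * np.cos(t)).astype(int), 0, mask.shape[1] - 1)
    polar = mask[sy, sx].astype(float)
    # Low-frequency 2D Fourier magnitudes, normalized by the DC term for scale invariance.
    F = np.abs(np.fft.fft2(polar))
    return (F[:m, :n] / (F[0, 0] + 1e-9)).ravel()
```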
A reliable method to estimate population sizes of wild turkeys (Meleagris gallopavo) using unmanned aerial vehicles and thermal video imaging data collected at several field sites in Texas is described. Automating the data processing of airborne survey videos provides a fast and reproducible way to count wild turkeys for wildlife management and conservation. A deep learning semantic segmentation pipeline is developed to detect and count roosting Rio Grande wild turkeys (M. g. intermedia), which appear as small, faint objects in drone-based thermal IR videos. To detect the roosting turkeys, the proposed approach relies on Mask R-CNN, a deep semantic segmentation architecture, followed by a post-processing data association and filtering (DAF) step for counting the number of roosting birds. DAF eliminates false positives such as rocks and other small bright objects, whose detections are typically noisy across temporally adjacent video frames and can therefore be filtered using appearance association and distance-based gating across time. Transfer learning was used to train the Mask R-CNN network by initializing it with ImageNet weights. Drone-based thermal IR videos are extremely challenging due to the complexity of the natural environment, including weather effects, occlusion of birds, terrain, trees, complex tree shapes, rocks, water, and thermal inversion. The transect videos were collected at night at several times and altitudes to optimize data collection opportunities without disturbing the roosting turkeys. Preliminary performance evaluation using 280 video frames is promising.
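The temporal part of DAF can be sketched as a simple persistence check: a detection is kept only if it can be matched within a gating radius across a minimum number of adjacent frames. The gating radius and persistence threshold below are assumptions, and the appearance-association term is omitted for brevity.

```python
import numpy as np

def persistent_detections(frames, gate=10.0, min_track_len=3):
    """`frames` is a list of (N_i, 2) arrays of detection centroids, one array per video frame."""
    tracks = [[p] for p in frames[0]]                  # seed tracks from the first frame
    for dets in frames[1:]:
        for tr in tracks:
            if len(dets) == 0:
                continue
            d = np.linalg.norm(dets - tr[-1], axis=1)  # distance-based gating to the last position
            if d.min() < gate:
                tr.append(dets[d.argmin()])            # extend track with the nearest detection
    # Tracks that persist across enough frames are counted as birds; the rest are discarded.
    return [tr for tr in tracks if len(tr) >= min_track_len]
```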
Coral reef ecosystems are some of the most diverse and valuable ecosystems on Earth. They support more species per unit area than any other marine environment and are essential to the sustenance of life in our oceans. However, due to climate change, less than 46% of the world's coral was considered healthy as of 2008. One of the biggest challenges in coral conservation is that reef mapping is currently carried out manually, with a group of divers moving and placing a large PVC quadrat for every unit area of the reef and then photographing and analyzing each unit separately. Hence, there is a pressing need to improve the methodology of imaging, stitching, and analyzing coral reef maps in order to make it feasible to protect them and sustain life in our oceans. To improve the current methodology, a reef-mapping surface drone robot which photographs, stitches, and analyzes the reef autonomously was built. This robot replaces the physical quadrat used today with a projected laser quadrat, which eliminates the need to dive to the bottom of the sea and allows relative pose estimation. The robot then captures and processes the images and, using 3D reconstruction and computer vision algorithms, maps and classifies the coral autonomously.
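One way the projected laser quadrat can support relative pose estimation is via a planar PnP solve on its four corners, as sketched below; the quadrat size, corner ordering, and camera intrinsics are illustrative assumptions, and the laser-dot detection step is omitted.

```python
import cv2
import numpy as np

QUADRAT_M = 0.5  # assumed side length of the projected laser quadrat, in meters
OBJ_PTS = np.array([[0, 0, 0], [QUADRAT_M, 0, 0],
                    [QUADRAT_M, QUADRAT_M, 0], [0, QUADRAT_M, 0]], dtype=np.float32)

def relative_pose(corner_pixels, K, dist=None):
    """corner_pixels: (4, 2) detected laser-corner image coordinates, in the same order as OBJ_PTS."""
    ok, rvec, tvec = cv2.solvePnP(OBJ_PTS, corner_pixels.astype(np.float32), K, dist)
    return rvec, tvec  # rotation (Rodrigues vector) and translation of the camera relative to the reef plane
```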
Video capture is becoming more and more widespread. The technical advances of consumer devices have led to improved video quality and to a variety of new use cases presented by social media and artificial intelligence applications. Device manufacturers and users alike need to be able to compare different cameras. These devices may be smartphones, automotive components, surveillance equipment, DSLRs, drones, action cameras, etc. While quality standards and measurement protocols exist for still images, there is still a need for measurement protocols for video quality. These need to include parts that are non-trivially adapted from photo protocols, particularly concerning temporal aspects. This article presents a comprehensive hardware and software measurement protocol for the objective evaluation of the whole video acquisition and encoding pipeline, as well as its experimental validation.
Advancements in sensing, computing, image processing, and computer vision technologies are enabling unprecedented growth and interest in autonomous vehicles and intelligent machines, from self-driving cars to unmanned drones to personal service robots. These new capabilities have the potential to fundamentally change the way people live, work, commute, and connect with each other, and will undoubtedly spur entirely new applications and commercial opportunities for generations to come. The main focus of AVM is perception, which begins with sensing. While imaging continues to be an essential emphasis in all EI conferences, AVM also embraces other sensing modalities important to autonomous navigation, including radar, LiDAR, and time of flight. Realization of autonomous systems also requires purpose-built processors, e.g., ISPs, vision processors, and DNN accelerators, as well as core image processing and computer vision algorithms, system design and architecture, simulation, and image/video quality. AVM topics are at the intersection of these multi-disciplinary areas. AVM is the Perception Conference that bridges the imaging and vision communities, connecting the dots across the entire software and hardware stack for perception and helping people design globally optimized algorithms, processors, and systems for intelligent “eyes” for vehicles and machines.