This conference brings together real-world practitioners and researchers in intelligent robots and computer vision to share recent applications and developments. Topics of interest include the integration of imaging sensors, supporting hardware, computers, and algorithms for intelligent robots, manufacturing inspection, characterization, and/or control.

The decreasing cost of computational power and vision sensors has driven the rapid proliferation of machine vision technology in a variety of industries, including aluminum, automotive, forest products, textiles, glass, steel, metal casting, aircraft, chemicals, food, fishing, agriculture, archaeological products, medical products, and artistic products. Other industries, such as semiconductor and electronics manufacturing, have employed machine vision technology for several decades.

Machine vision supporting handling robots is another main topic. Within intelligent robotics, a further approach is sensor fusion: combining multi-modal sensors spanning audio, location, image, and video data, together with other 3D capture devices, for signal processing, machine learning, and computer vision. There is a need for accurate, fast, and robust detection of objects and their position in space. Object surfaces, backgrounds, and illumination are uncontrolled, and in most cases the objects of interest lie within a bulk of many others.

For both new and existing industrial users of machine vision, there are numerous innovative methods to improve productivity, quality, and compliance with product standards. Several broad problem areas have received significant attention in recent years. For example, some industries are collecting enormous amounts of image data from product monitoring systems, and new, efficient methods are required to extract insight and to perform process diagnostics based on this historical record.
Regarding the physical scale of the measurements, microscopy techniques are nearing resolution limits in fields such as semiconductors, biology, and other nano-scale technologies. Techniques such as resolution enhancement, model-based methods, and statistical imaging may provide the means to extend these systems beyond current capabilities. Furthermore, obtaining real-time and robust measurements in-line or at-line in harsh industrial environments is a challenge for machine vision researchers, especially when the manufacturer cannot make significant changes to their facility or process.
The number of spikes, the number of spikelets per spike, and the number of spikes per square meter are essential metrics for plant breeders and researchers in predicting wheat crop yield. Evaluating crop yield by counting wheat ears is still done manually, which is a labor-intensive, tedious, and costly task. Thus, there is a significant need for a real-time wheat spike/ear counting system that gives plant breeders effective and efficient crop yield predictions. This paper proposes two deep learning-based methods, built on EfficientDet and Faster R-CNN, to detect and count spikes. The images are taken using high-throughput phenotyping techniques under natural field conditions, and the algorithms localize and automatically count wheat spikes/ears. Faster R-CNN with ResNet-50 as the backbone architecture produced an overall accuracy of 88.7% on the test images. We also used the recent state-of-the-art models EfficientDet-D5 and EfficientDet-D7, with backbone architectures EfficientNet-B5 and EfficientNet-B7, respectively. A comprehensive quantitative analysis is performed on standard performance metrics: the EfficientDet-D5 model produces an accuracy of 92.7% on the test images, and EfficientDet-D7 produces an accuracy of 93.6%.
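Whatever the backbone, the counting step reduces to filtering detector outputs. A minimal NumPy sketch (hypothetical boxes and scores, not the paper's trained models), using confidence thresholding plus greedy non-maximum suppression:

```python
import numpy as np

def count_spikes(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Count detections after confidence filtering and greedy NMS.
    boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences."""
    keep = scores >= score_thresh
    boxes, scores = boxes[keep], scores[keep]
    order = np.argsort(-scores)          # process highest-confidence first
    selected = []
    while order.size:
        i = order[0]
        selected.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        # intersection-over-union of box i with the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou < iou_thresh]   # drop duplicates of box i
    return len(selected)
```

The spike count per image is then simply the number of surviving boxes; spikes per square meter follows from the known ground footprint of each image.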
Given a suitable dataset, transfer learning using deep convolutional neural networks is an effective way to develop a system that detects and classifies objects. Despite the availability of models pretrained on large general-purpose datasets, the requirement to manually label an application-specific dataset remains a limiting factor in system development. We consider this wider problem in the context of the purity analysis of canola seeds, where end users wish to distinguish species of interest from contaminants in images taken with optical microscopes. We use a Detector network, trained only to detect seeds, to help label the dataset used to train an Analyzer network, capable of both seed detection and classification. We present results over three experiments involving 25 contaminant species, including Primary and Secondary Noxious Weed Seeds (as per the Canadian Weed Seeds Order), to validate our incremental approach. We also compare the proposed system to competing approaches from the literature.
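The detector-assisted labeling idea can be sketched as follows. Here `detect` and `assign_species` are hypothetical callables standing in for the trained Detector network and the human annotator, not the authors' code:

```python
def build_training_set(detect, assign_species, images):
    """Detector-assisted labeling: the Detector proposes seed boxes, so the
    human annotator only supplies a species label per crop instead of drawing
    boxes. The (crop, box, label) records then train the Analyzer network.
    detect(img) -> list of (x1, y1, x2, y2); assign_species(crop) -> label."""
    records = []
    for img in images:
        for (x1, y1, x2, y2) in detect(img):
            crop = img[y1:y2, x1:x2]        # seed region proposed by the Detector
            label = assign_species(crop)    # manual step: class only, no localization
            records.append((crop, (x1, y1, x2, y2), label))
    return records
```

The manual effort per seed drops from drawing and labeling a box to a single class decision, which is the limiting factor the abstract identifies.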
Multi-object tracking is an active computer vision problem that has attracted consistent interest due to its wide range of applications in areas such as surveillance, autonomous driving, entertainment, and gaming, to name a few. In the age of deep learning, many computer vision tasks have benefited from convolutional neural networks and have progressed rapidly, whereas multi-target tracking remains challenging. A variety of models have drawn on the representational power of deep learning to tackle this issue. This paper inspects three CNN-based models that have achieved state-of-the-art performance in addressing this problem. The three models follow different paradigms and provide key insight into the development of the field. We examine the models and conduct experiments on all three using a benchmark dataset. The quantitative results from these state-of-the-art models are reported in the standard metrics and provide a basis for future research in the field.
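As a point of reference for the tracking-by-detection paradigm (a classic baseline, not one of the three surveyed models), a minimal greedy IoU tracker can be sketched in a few lines:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def iou_tracker(frame_detections, iou_thresh=0.3):
    """Greedy frame-to-frame association: each detection inherits the id of the
    best-overlapping unused track from the previous frame, or starts a new one."""
    next_id, prev, tracks = 0, [], []
    for dets in frame_detections:
        assigned, used = [], set()
        for box in dets:
            best_id, best_iou = None, iou_thresh
            for tid, pbox in prev:
                overlap = iou(box, pbox)
                if tid not in used and overlap > best_iou:
                    best_id, best_iou = tid, overlap
            if best_id is None:
                best_id, next_id = next_id, next_id + 1
            used.add(best_id)
            assigned.append((best_id, box))
        tracks.append(assigned)
        prev = assigned
    return tracks
```

Identity switches under occlusion and missed detections are exactly where such a baseline fails, which is the gap the deep, appearance-aware models examined in the paper aim to close.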
Roadway “corners” are common for pedestrian use, whether designated with markings or not. Different types of markings have been deployed, ranging from simple parallel lines to more complex designs. Understanding the impact of different types of crosswalks is important for public safety. In this work we explore methods to improve the logging of marked crosswalk types. We used the Roadway Information Database from the Second Strategic Highway Research Project and applied active learning methods with transfer learning to identify the crosswalk types (marked or unmarked). Upon completion we found our classifiers were unable to perform above roughly 94% correct classification. To improve their efficacy, we separated the crosswalks into their “fine-grained” types and used Gradient-weighted Class Activation Mapping to isolate and study the features that classified the crosswalks. We compared these with a sample of manually labeled crosswalks and present our findings. We believe this use case can represent a process for improving the active learning method in some visual machine learning applications.
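The Gradient-weighted Class Activation Mapping (Grad-CAM) computation itself is compact. Given the last convolutional layer's activations and the gradients of a class score with respect to them (shapes assumed to be `(K, H, W)`; obtaining them is framework-specific and omitted here), a sketch:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM: weight each feature map by its spatially averaged gradient,
    sum over channels, and keep only positive evidence (ReLU)."""
    weights = gradients.mean(axis=(1, 2))             # one weight per channel, (K,)
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum -> (H, W)
    return np.maximum(cam, 0)                         # ReLU: positive evidence only
```

Upsampled to the input resolution, the resulting map highlights which image regions drove the crosswalk-type decision, which is how the features of each fine-grained type were isolated.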
We propose a deep learning-based approach for pig keypoint detection. In a nutshell, we explore transfer learning to adapt a human pose estimation model to pigs. In total, we tested three different models and ultimately trained OpenPose on the pig data. For training, the data was annotated in COCO format. Additionally, we visualize the pixel-level part affinity field (PAF) responses of the network on the test frames to highlight the model's learning capabilities. The trained model shows promising results and opens a new door for further research.
This paper proposes a landslide detection method based on UAV visual analysis. The fundamental strategy is to detect ground surface elevation changes caused by landslides. Our method consists of five steps: multi-temporal image acquisition, ground surface reconstruction, georeferencing, elevation data export, and landslide detection. In order to improve efficiency, we use Visual Simultaneous Localization and Mapping for ground surface reconstruction, which can run faster than conventional methods based on Structure-from-Motion. In addition, we introduce a convolutional neural network (CNN) to detect landslides robustly in the multi-temporal elevation data. The experimental results in a simulation environment show that the proposed method runs 5.5 times as fast as the conventional methods. In addition, the CNN-based model achieved an F1 score of 0.79–0.84, showing robustness against reconstruction noise and registration error.
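The core detection signal, elevation change between two reconstructions, can be sketched with NumPy. This is a plain per-cell threshold on co-registered gridded elevation models; the paper's CNN replaces such a threshold precisely to stay robust to reconstruction noise and registration error:

```python
import numpy as np

def elevation_change_mask(dem_before, dem_after, thresh_m=0.5):
    """Flag grid cells whose elevation changed by more than thresh_m metres
    between two co-registered digital elevation models (2-D arrays)."""
    return np.abs(dem_after - dem_before) > thresh_m

def changed_area(mask, cell_size_m=0.1):
    """Total area (m^2) covered by the flagged cells."""
    return mask.sum() * cell_size_m ** 2
```

The threshold `thresh_m` and cell size here are illustrative values, not parameters from the paper.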
Tunnels have been constructed in various places for transportation and lifelines. During tunnel construction, industrial accidents have occurred due to rocks falling from the tunnel face, and a large amount of rockfall is a known precursor of tunnel collapse. It is therefore necessary to detect falling rocks both to prevent industrial accidents and to grasp the situation at the tunnel face. Conventional approaches include the inter-frame difference method and laser measurement; however, these struggle to monitor the entire tunnel face and also detect moving objects other than falling rocks. In this paper, we propose a falling-rock detection method that combines moving object detection on the tunnel face with an estimation method for excavation points using single-color markers. In an excavation experiment, we confirmed that only falling rocks were detected during excavation.
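The conventional inter-frame difference baseline mentioned above is simple to state (grayscale frames as NumPy arrays; the proposed method builds on such motion masks by adding marker-based excavation-point estimation to reject non-rockfall motion):

```python
import numpy as np

def frame_difference_mask(prev_frame, curr_frame, thresh=25):
    """Classic inter-frame difference: flag pixels whose intensity changed by
    more than `thresh` between consecutive grayscale frames (uint8 arrays).
    Cast to a signed type first so the subtraction cannot wrap around."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > thresh
```

As the abstract notes, a raw mask like this fires on any motion (workers, machinery, dust), which is exactly why the proposed method restricts attention using the estimated excavation points.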