Automated extraction of intersection topologies from aerial and street-level images is relevant for Smart City traffic-control and safety applications. The intersection topology is expressed in the number of approach lanes, the crossing (conflict) area, and the availability of painted striping for guidance and road delineation. Segmentation of the road surface and other basic information can be obtained with a score of 80% or higher, but the segmentation and modeling of intersections is much more complex, due to multiple lanes in various directions and occlusion of the painted striping. This paper addresses this complicated problem by proposing a dualistic channel model featuring direct segmentation and incorporating domain knowledge. These channels develop specific features such as drive lines and lane information based on painted striping, which are filtered and then fused to determine an intersection-topology model. The algorithms and models are evaluated on two datasets: a large mixture of highway and urban intersections and a smaller dataset with intersections only. Experiments with the GEO metric show that the proposed late-fusion system increases the recall score by 47 percentage points. This recall gain is consistent whether using aerial imagery alone or a mixture of aerial and street-level orthographic image data. The obtained recall for intersections is much lower than for highway data because of the complexity, occlusions by trees, and the small number of annotated intersections. Future work should aim at consolidating this model improvement at a higher recall level with more annotated intersection data.
We introduce an efficient distributed sequence-parallel approach for training transformer-based deep learning image segmentation models. The neural network models combine a Vision Transformer encoder with a convolutional decoder to produce image segmentation maps. The distributed sequence-parallel approach is especially useful when the tokenized embedding representation of the image data is too large to fit into standard computing hardware memory. To demonstrate the performance and characteristics of models trained in sequence-parallel fashion compared to standard models, we evaluate our approach on a 3D MRI brain tumor segmentation dataset. We show that training with a sequence-parallel approach can match standard sequential model training in terms of convergence. Furthermore, we show that our sequence-parallel approach supports training of models that would not be possible on standard computing resources.
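The core idea of sequence parallelism, partitioning the token sequence across workers while keeping attention mathematically equivalent to the single-device computation, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: real systems replace the emulated all-gather with collective communication, and the function names (`split_sequence`, `sequence_parallel_attention`) are hypothetical.

```python
import numpy as np

def split_sequence(tokens, n_workers):
    """Partition the token sequence along the sequence axis,
    one chunk per (simulated) worker."""
    return np.array_split(tokens, n_workers, axis=0)

def sequence_parallel_attention(chunks):
    """Each worker attends with its local query chunk against the
    full key/value sequence (obtained here by an emulated all-gather)."""
    full = np.concatenate(chunks, axis=0)  # stands in for an all-gather
    outs = []
    for q in chunks:
        scores = q @ full.T / np.sqrt(q.shape[1])
        w = np.exp(scores - scores.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        outs.append(w @ full)
    return np.concatenate(outs, axis=0)
```

Because each chunk computes a disjoint block of rows of the same attention matrix, concatenating the per-worker outputs reproduces the single-device result exactly, which is why convergence can match sequential training.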
Starch plays a pivotal role in human society, serving as a vital component of our food sources and finding widespread applications in various industries. Microscopic imaging offers a straightforward, efficient, and precise approach to examine the distribution, morphology, and dimensions of starch granules. Quantitative analysis through the segmentation of starch granules from the background aids researchers in exploring their physicochemical properties. This article presents a novel approach utilizing a modified U-Net model in deep learning to achieve the segmentation of starch granule microscope images with remarkable accuracy. The method yields impressive results, with mean values for several evaluation metrics including JS, Dice, Accuracy, Precision, Sensitivity and Specificity reaching 89.67%, 94.55%, 99.40%, 94.89%, 94.23% and 99.70%, respectively.
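The JS (Jaccard similarity) and Dice metrics quoted above are standard overlap scores between a predicted mask and its ground truth. A minimal sketch for flat binary masks (the function name `dice_jaccard` is ours, not the article's):

```python
def dice_jaccard(pred, target):
    """Dice and Jaccard similarity (JS) for flat binary masks.

    Dice = 2|P∩T| / (|P|+|T|),  JS = |P∩T| / |P∪T|.
    Empty masks on both sides count as a perfect match.
    """
    inter = sum(p and t for p, t in zip(pred, target))
    psum, tsum = sum(pred), sum(target)
    union = psum + tsum - inter
    dice = 2 * inter / (psum + tsum) if psum + tsum else 1.0
    js = inter / union if union else 1.0
    return dice, js
```

For example, masks overlapping in one of their two foreground pixels each give Dice = 0.5 and JS = 1/3, which illustrates the known relation Dice = 2·JS / (1 + JS).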
This paper examines two new methodological approaches exploring Reflectance Transformation Imaging (RTI) data processing for detecting, documenting, and tracking surface changes. The first approach is unsupervised and applies per-pixel calculations on the raw image stack to extract information related to specific surface attributes (angular reflectance, micro-geometry). The second method proposes a supervised segmentation approach that, based on machine learning algorithms, uses coefficients of a fitting model to separate the surface’s characteristics and assign them to a class. Both methodologies were applied to monitor coating failure, in the form of filiform corrosion, on low carbon steel test samples, mimicking treated historical metal objects’ surfaces. The results demonstrate the feasibility of creating accurate cartographies that depict the surface characteristics and their location. Additionally, they provide a qualitative evaluation of corrosion progression that allows tracking and monitoring changes on challenging surfaces.
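The unsupervised approach's per-pixel calculations on the raw RTI stack can be illustrated with two simple statistics across the lighting positions. This is only a stand-in sketch: the actual surface attributes extracted (angular reflectance, micro-geometry) involve more specific calculations, and the function name `per_pixel_descriptors` is hypothetical.

```python
def per_pixel_descriptors(stack):
    """Per-pixel statistics across an RTI image stack (one grayscale
    image per light position): mean reflectance and the range of
    reflectance over angles, two crude proxies for the angular
    behavior of the surface at that pixel."""
    h, w = len(stack[0]), len(stack[0][0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [img[i][j] for img in stack]
            row.append((sum(vals) / len(vals), max(vals) - min(vals)))
        out.append(row)
    return out
```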
Infrastructure maintenance of complex environments like railroads is a very expensive operation. Recent advances in mobile mapping systems to collect 3D point cloud data and in deep learning for detection and segmentation can prove to be very helpful in automating this maintenance, allowing preventive maintenance at certain locations before big failures occur. Some fully-supervised methods have been developed for understanding dynamic railroad environments. These methods often fail to generalize to infrastructure changes or to new classes when labeled data are scarce. To address this issue, we propose a railroad segmentation method that leverages few-shot learning by generating class prototypes for the most relevant infrastructure classes. This method takes advantage of existing embedding networks for point clouds, taking the geometrical and spatial context into account for feature representation of complex connected classes. We evaluate our method on real-world data measured on Belgian railway tracks. Our model achieves promising results on connected classes, exposed to only a few annotated samples at test time.
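The prototype mechanism at the heart of such few-shot methods is simple: each class prototype is the mean embedding of its few support samples, and queries are assigned to the nearest prototype. A minimal sketch with plain lists (the embedding network itself is assumed to exist upstream; function names are ours):

```python
def class_prototypes(embeddings, labels):
    """Mean embedding per class from a few labeled support samples."""
    sums, counts = {}, {}
    for e, y in zip(embeddings, labels):
        acc = sums.setdefault(y, [0.0] * len(e))
        for i, v in enumerate(e):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def nearest_prototype(protos, query):
    """Classify a query embedding by squared Euclidean distance
    to the class prototypes."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(protos, key=lambda y: dist(protos[y], query))
```

Because only the prototypes change when a new class appears, a handful of annotated samples at test time is enough to extend the classifier, which is what lets the method cope with infrastructure changes.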
We introduce a physics-guided data-driven method for image-based multi-material decomposition of dual-energy computed tomography (CT) scans. The method is demonstrated for CT scans of virtual human phantoms containing more than two types of tissues. The method is a physics-driven supervised learning technique. We take advantage of the higher mass attenuation coefficient of dense materials compared to that of muscle tissue to perform a preliminary extraction of the dense material from the images using unsupervised methods. We then perform supervised deep learning on the images combined with the extracted dense material to obtain the final multi-material tissue map. The method is demonstrated on simulated breast models with calcifications as the dense material placed amongst the muscle tissues. The physics-guided machine learning method accurately decomposes the various tissues from input images, achieving a normalized root-mean-squared error of 2.75%.
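The physics prior exploited here is that dense materials (e.g. calcifications) attenuate far more strongly than muscle, so a simple unsupervised threshold on attenuation values already isolates them. A crude sketch of that preliminary extraction step, with illustrative function names and a hand-picked threshold rather than anything from the paper:

```python
def extract_dense(image, threshold):
    """Unsupervised dense-material mask: pixels whose attenuation
    value meets the threshold are flagged as dense material."""
    return [[1 if v >= threshold else 0 for v in row] for row in image]

def remove_dense(image, mask):
    """Zero out the extracted dense material so the remaining image
    can be handed to the supervised soft-tissue decomposition."""
    return [[0 if m else v for v, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]
```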
The field of automated working machines not only reflects the general trend towards automation in industry, transport and logistics; new areas of application and markets are also constantly emerging. In this paper we present a pipeline for terrain classification in offroad environments and in the field of "automated maintenance of slopes", which offers potential for addressing numerous socio-economic needs. Working tasks can be made more efficient, more ergonomic and, in particular, much safer when mature, automated vehicles are used. At present, however, such tasks can only be carried out remotely or semi-automatically, under the supervision of a trained specialist, which only partially facilitates the work. The real benefit only comes when the supervising person is released from this task and is able to pursue other work. In addition to the development of a safe integrated system and sensor concept for use in public spaces, a basic prerequisite for vehicles to be licensed in the future, increased situational awareness of mobile systems through machine learning, in order to increase their efficiency and flexibility, is also of great importance.
In a previous paper [1], published last year, we described the color management pipeline applied to our nail inkjet printer. However, the resulting prints are not as vivid as we would like, since they are not well saturated. In this paper, we propose a saturation enhancement method based on image segmentation and hue angle. This method will not necessarily give the closest representation of the colors in the input image, but it can produce more saturated prints. The main idea of our saturation enhancement method is to keep the lightness and hue constant while stretching the chroma component.
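The constant-lightness, constant-hue chroma stretch can be sketched in CIELAB: chroma is the radial distance in the (a*, b*) plane and hue is the angle, so scaling chroma while recomputing a* and b* from the unchanged hue angle leaves L* and h untouched. The `gain` and `c_max` parameters below are illustrative, not the paper's values.

```python
import math

def enhance_saturation(lab, gain, c_max=128.0):
    """Stretch chroma in CIELAB while keeping lightness L* and
    hue angle h constant. c_max is a crude gamut bound so the
    stretched chroma does not run away."""
    L, a, b = lab
    c = math.hypot(a, b)          # chroma = sqrt(a^2 + b^2)
    h = math.atan2(b, a)          # hue angle, unchanged by the stretch
    c_new = min(c * gain, c_max)  # scale, then clip to the bound
    return (L, c_new * math.cos(h), c_new * math.sin(h))
```

For instance, doubling the chroma of (L*, a*, b*) = (50, 3, 4) yields (50, 6, 8): the chroma goes from 5 to 10 while lightness and hue are untouched.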
We present a high-quality sky segmentation model for depth refinement and investigate residual architecture performance to inform optimally shrinking the network. We describe a model that runs in near real-time on a mobile device, present a new, high-quality dataset, and detail a unique weighting to trade off false positives and false negatives in binary classifiers. We show how the optimizations improve bokeh rendering by correcting stereo depth misprediction in sky regions. We detail techniques used to preserve edges, reject false positives, and ensure generalization to the diversity of sky scenes. Finally, we present a compact model and compare performance of four popular residual architectures (ShuffleNet, MobileNetV2, ResNet-101, and ResNet-34-like) at constant computational cost.
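Trading off false positives against false negatives in a binary classifier is commonly done by weighting the two error types differently in the loss. A generic asymmetrically weighted cross-entropy sketch, with illustrative weights; the paper's specific weighting scheme differs in detail:

```python
import math

def weighted_bce(p, y, w_fp=1.0, w_fn=4.0, eps=1e-7):
    """Per-pixel binary cross-entropy where missed positives
    (y = 1 with low p, i.e. false negatives) cost w_fn and
    spurious positives (y = 0 with high p) cost w_fp."""
    p = min(max(p, eps), 1 - eps)  # guard against log(0)
    return -(w_fn * y * math.log(p) + w_fp * (1 - y) * math.log(1 - p))
```

With `w_fn > w_fp` the trained classifier is pushed to favor recall on the positive (sky) class; swapping the weights instead suppresses false positives.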
State departments of transportation often maintain extensive “video logs” of their roadways that include signs, lane markings, as well as non-image-based information such as grade, curvature, etc. In this work we use the Roadway Information Database (RID), developed for the Second Strategic Highway Research Program, as a surrogate for a video log to design and test algorithms to detect rumble strips in the roadway images. Rumble strips are grooved patterns at the lane extremities designed to produce an audible cue to drivers who are in danger of lane departure. The RID contains 6,203,576 images of roads in six locations across the United States with extensive ground truth information and measurements, but the rumble strip measurements (length and spacing) were not recorded. We use an image correction process along with automated feature extraction and convolutional neural networks to detect rumble strip locations and measure their length and pitch. Based on independent measurements, we estimate our true positive rate to be 93% and false positive rate to be 10%, with errors in length and spacing on the order of 0.09 meters RMS and 0.04 meters RMS. Our results illustrate the feasibility of this approach to add value to video logs after initial capture as well as identify potential methods for autonomous navigation.
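Once individual grooves have been detected and geo-rectified, pitch reduces to the mean spacing between consecutive groove positions along the roadway. A trivial sketch of that final measurement step (the detection pipeline itself is assumed to exist upstream; the function name is ours):

```python
def strip_pitch(positions):
    """Mean spacing (pitch) between consecutive detected grooves.

    positions: groove centers in meters along the road, sorted
    ascending, as produced by an upstream detector.
    """
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)
```

Averaging over many gaps is what allows per-groove localization noise to shrink toward the 0.04 m RMS spacing error reported above.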