Stripe noise removal is a fundamental task in remote sensing image processing and is of great significance for improving image quality and supporting subsequent applications. The standard nuclear norm has been widely used to remove stripe noise, but it treats every singular value equally, which limits its capability and flexibility in destriping. In this paper, we propose a weighted low-rank spatial–spectral total variation (WLRSSTV) model that exploits the weighted nuclear norm and global spatial–spectral total variation regularization. Split Bregman iteration is used to optimize the WLRSSTV model and to estimate the weights of the nuclear norm. Extensive experiments on both synthetic and real remote sensing images validate that the proposed model can effectively remove stripe noise and preserve more fine-scale details.
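The core operation behind a weighted nuclear norm is per-singular-value shrinkage instead of one global threshold. The NumPy sketch below shows that step; the reweighting rule (weights inversely proportional to singular value magnitude) is a common heuristic assumed here for illustration, not necessarily the paper's exact estimator.

```python
import numpy as np

def weighted_svt(X, weights):
    """Weighted singular value thresholding: shrink each singular
    value by its own weight instead of a single global threshold."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - weights, 0.0)  # per-value soft threshold
    return U @ np.diag(s_shrunk) @ Vt

def reweight(X, C=1.0, eps=1e-6):
    """Reweighting heuristic (an assumption, not the paper's rule):
    large singular values get small weights and are preserved;
    small ones get large weights and are suppressed."""
    s = np.linalg.svd(X, compute_uv=False)
    return C / (s + eps)

X = np.random.rand(64, 64)
X_lowrank = weighted_svt(X, reweight(X))
```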
Computed tomography (CT) images provide a wealth of anatomical information crucial for diagnosing femoral fractures. However, predicting these fractures poses challenges due to postural variability of the femur and device-related factors. This study introduces an approach for predicting femoral fracture from CT and mask images. The approach comprises several stages: mask annotation, the scaling iterative closest point (SICP) algorithm for registration, three-dimensional (3D) affine transformation of images, image histogram matching, and a two-channel 3D convolutional neural network (3DCNN). In the proximal femoral region, SICP adjusts the size and posture of the point cloud via a 3D affine transformation to align it with the target point cloud. The 3D affine transformation generated by SICP registration is then applied to the original CT and mask images, systematically normalizing variances in femoral posture and size across subjects. Image histogram matching diminishes the variance in image grayscale values that originates from the scanning devices: it redistributes the pixel grayscale distributions in CT images, aligning them more closely with a reference histogram. The two-channel 3DCNN takes as input the CT images (first channel) that have undergone 3D affine transformation and histogram matching, along with their corresponding masks (second channel), and outputs the probability of a fracture. Results show that the predictive capability of the 3DCNN-based model is notable, achieving an accuracy of 91.299%, specificity of 91.551%, sensitivity of 91.071%, and an area under the curve of 0.973. In conclusion, this approach effectively minimizes the impact of irrelevant factors on prediction and makes full use of the image information to assess femoral fracture risk, enhancing the accuracy and reliability of fracture prediction.
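As an illustration of the histogram matching stage, the sketch below uses scikit-image's `match_histograms` to pull a CT volume's grayscale distribution toward a reference scan before stacking it with its mask as the 3DCNN's two input channels. Array shapes and the mask-threshold value are hypothetical.

```python
import numpy as np
from skimage.exposure import match_histograms

def normalize_ct(ct_volume: np.ndarray, reference_volume: np.ndarray) -> np.ndarray:
    """Redistribute the grayscale values of a CT volume so its histogram
    approximates a reference scan, reducing device-dependent differences."""
    return match_histograms(ct_volume, reference_volume)

# Two-channel input for the 3DCNN: matched CT volume plus its binary mask.
ct = np.random.randint(0, 2000, size=(64, 128, 128)).astype(np.float32)
ref = np.random.randint(0, 1800, size=(64, 128, 128)).astype(np.float32)
mask = (ct > 300).astype(np.float32)           # hypothetical mask threshold
two_channel = np.stack([normalize_ct(ct, ref), mask], axis=0)  # (2, D, H, W)
```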
In this paper, we present a novel high-resolution projector photometric compensation method named HRPC. This method leverages a convolutional neural network architecture to compensate the projector input image before projection. The network incorporates multi-scale image feature pyramids and Laplacian pyramids to capture features at different levels. This enables scale-invariant learning of complex mappings among the projection surface image, the uncompensated projected image, and the ground truth image. Additionally, a non-local attention module and a multi-layer perceptron module are introduced into the bottleneck to enhance long-range dependency modeling and non-linear transformation abilities. Experiments on high-resolution projection datasets demonstrate HRPC’s ability to effectively compensate images with reduced color inconsistencies, illumination variations, and detail loss compared to state-of-the-art methods.
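As a rough illustration of the pyramid decomposition the network builds on, the sketch below constructs a Laplacian pyramid with OpenCV: each level stores the band-pass detail lost between resolutions, which is what lets a compensation network learn corrections scale by scale. This is a generic construction, not HRPC's internal implementation; `levels` is an illustrative parameter.

```python
import cv2
import numpy as np

def laplacian_pyramid(img: np.ndarray, levels: int = 3):
    """Decompose an image into band-pass Laplacian levels plus a
    low-resolution residual, enabling scale-by-scale learning."""
    pyramid = []
    current = img.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(current)
        up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
        pyramid.append(current - up)  # high-frequency detail at this scale
        current = down
    pyramid.append(current)           # low-frequency residual
    return pyramid
```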
Land surface temperature data derived from satellite remote sensing are of great practical significance for urban planning, ecological protection, and sustainable development. However, due to the limitations of satellite observation modes and resolution, the accuracy and completeness of retrieved land surface temperature data must be improved through data reconstruction. Traditional reconstruction methods such as interpolation struggle to handle temporal dependence and spatial heterogeneity. Taking the Miyun Reservoir in Beijing as an example, and based on an analysis of the spatiotemporal distribution characteristics of surface temperature, this paper proposes a reconstruction method combining multi-source data fusion with spatiotemporal bidirectional attention prediction. First, time series features are extracted from temperature series near meteorological stations, and a temporal attention function outputs enhanced time series features for the current region. Then, a neural network extracts more accurate microclimate boundary features from high-resolution satellite images, and a spatial attention function outputs enhanced image boundary features for the current region. Finally, the enhanced features of the two channels are fused by a Long Short-Term Memory (LSTM) network. The results show the following: (1) microclimate effect characteristics clearly influence the prediction results; predictions are more stable within a single microclimate zone, whereas they fluctuate strongly in microclimate boundary areas; (2) the spatial attention mechanism significantly suppresses outliers across different coverage features, reflecting the data fusion benefit of high-precision satellite images. These results indicate that the developed method maximizes the potential of remote sensing inversion to provide high-precision, continuous surface temperature data, which is valuable for studying the microclimate characteristics of urban lakes and for sustainable development.
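A heavily simplified PyTorch sketch of the two-channel fusion described above: temporal attention re-weights the station temperature series, spatial attention re-weights the image boundary features, and an LSTM consumes the concatenated result. All layer sizes, and the use of simple linear attention scoring, are assumptions for illustration; the paper's bidirectional attention mechanism is more elaborate.

```python
import torch
import torch.nn as nn

class BiAttentionLSTM(nn.Module):
    """Two-channel fusion sketch: attention-weighted temporal and
    spatial features concatenated and fed to an LSTM predictor."""
    def __init__(self, t_dim=16, s_dim=32, hidden=64):
        super().__init__()
        self.t_attn = nn.Linear(t_dim, 1)   # temporal attention scores
        self.s_attn = nn.Linear(s_dim, 1)   # spatial attention scores
        self.lstm = nn.LSTM(t_dim + s_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)    # temperature prediction

    def forward(self, t_feat, s_feat):
        # t_feat: (B, T, t_dim) station series; s_feat: (B, T, s_dim) image features
        t_w = torch.softmax(self.t_attn(t_feat), dim=1)
        s_w = torch.softmax(self.s_attn(s_feat), dim=1)
        fused = torch.cat([t_feat * t_w, s_feat * s_w], dim=-1)
        out, _ = self.lstm(fused)
        return self.head(out[:, -1])        # predict from the last step

model = BiAttentionLSTM()
pred = model(torch.randn(4, 24, 16), torch.randn(4, 24, 32))  # (4, 1)
```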
The aim of this work is to transfer a model trained on magnetic resonance (MR) images of human autosomal dominant polycystic kidney disease (ADPKD) to rat and mouse ADPKD models. A dataset of 756 MR images of ADPKD kidneys was employed to train a modified UNet3+ architecture, which incorporated residual layers, switchable normalization, and concatenated skip connections for kidney and cyst segmentation. The trained model was then subjected to transfer learning (TL) using data from two commonly used animal PKD models: the Pkdh1pck (PCK) rat and the Pkd1RC∕RC (RC) mouse. Transfer learning achieved Dice similarity coefficients of 0.93±0.04 and 0.63±0.16 (mean±SD) for a combined PCK+RC test set of kidneys and cysts, respectively. We showcase the use of TL in situations involving constrained source and target datasets and achieve good accuracy in the presence of class imbalance.
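The reported scores use the Dice similarity coefficient, 2|A∩B|/(|A|+|B|), between predicted and reference masks. A minimal NumPy version, with a hypothetical `eps` smoothing term to avoid division by zero on empty masks:

```python
import numpy as np

def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient between two binary masks:
    2 * |A & B| / (|A| + |B|), here scoring kidney or cyst masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

gt = np.zeros((128, 128), dtype=bool); gt[32:96, 32:96] = True
pr = np.zeros((128, 128), dtype=bool); pr[40:96, 32:96] = True
print(round(dice(pr, gt), 3))
```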
Point clouds generated from 3D scans of part surfaces consist of discrete 3D points, some of which may be incorrect, i.e., outliers. Outlier points can be caused by the scanning method, part surface attributes, and data acquisition techniques. Filtering techniques that remove these outliers from point clouds frequently require a “guess and check” approach to determine proper filter parameters. This paper presents two novel approaches that automatically determine proper filter parameters using the relationships among point cloud outlier removal, principal component variance, and average nearest neighbor distance. Two post-processing workflows were developed that reduce outlier frequency in point clouds using these relationships. These workflows were applied to point clouds with artificially generated noise and outliers, along with two real-world point clouds. Analysis of the results showed that both approaches effectively reduce outlier frequency when used in suitable circumstances.
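One of the relationships the workflows exploit, the average nearest neighbor distance, underlies classic statistical outlier removal. The sketch below fixes `k` and the standard-deviation ratio by hand; the paper's contribution is precisely to choose such parameters automatically rather than by guess and check.

```python
import numpy as np
from scipy.spatial import cKDTree

def remove_outliers(points: np.ndarray, k: int = 8, std_ratio: float = 2.0):
    """Discard points whose mean distance to their k nearest neighbors
    is far above the cloud-wide average (statistical outlier removal)."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)  # first neighbor is the point itself
    mean_d = dists[:, 1:].mean(axis=1)
    keep = mean_d < mean_d.mean() + std_ratio * mean_d.std()
    return points[keep]

cloud = np.vstack([np.random.rand(1000, 3),          # surface points
                   np.random.rand(20, 3) * 10 + 5])  # synthetic outliers
print(remove_outliers(cloud).shape)
```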
In response to challenges such as large parameter counts, difficult deployment, low accuracy, and slow speed of facial state recognition models in driver fatigue detection, we propose a lightweight real-time facial state recognition model called YOLOv5-fatigue, based on YOLOv5n. First, a bilateral convolution is proposed that fully exploits the feature information in each channel. Then, an innovative deep lightweight module is proposed that reduces both the number of network parameters and the computational cost by replacing the ordinary convolutions in the neck network. Lastly, a normalization-based attention module is added to counteract the accuracy drop caused by lightweight design while keeping the number of parameters unchanged. In this paper, we first recognize the facial state with YOLOv5-fatigue and then use the proportion of time the eyes are closed and the proportion of time the mouth is open per unit of time to determine fatigue. In comparison experiments on our self-built VIGP-fatigue dataset against other detection algorithms, the proposed method achieved a 1% increase in AP50 over the baseline YOLOv5n, reaching 92.6%; inference time was reduced by 9% to 2.1 ms, and the parameter count decreased by 42.6% to 1.01 M.
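As a rough sketch of the downstream fatigue decision, the snippet below aggregates per-frame eye and mouth states over a time window and thresholds the resulting proportions; the window length and threshold values are illustrative assumptions, not the paper's tuned parameters.

```python
import numpy as np

def fatigue_score(eye_states, mouth_states, perclos_thresh=0.4, pom_thresh=0.5):
    """Decide fatigue from per-frame facial states over a time window.
    eye_states / mouth_states: 1 = eyes closed / mouth open in that frame.
    Thresholds are illustrative assumptions, not the paper's values."""
    perclos = np.mean(eye_states)   # proportion of frames with eyes closed
    pom = np.mean(mouth_states)     # proportion of frames with mouth open
    return perclos > perclos_thresh or pom > pom_thresh

frames_eye = np.random.binomial(1, 0.3, size=300)   # hypothetical 300-frame window
frames_mouth = np.random.binomial(1, 0.1, size=300)
print(fatigue_score(frames_eye, frames_mouth))
```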
Object detection and single-frame video detection have seen substantial advances in recent years, particularly with deep-learning-based approaches demonstrating strong performance. However, these detectors often struggle in practical scenarios such as the analysis of video frames captured by unmanned aerial vehicles, performing poorly on objects with small area, large scale variation, dense distribution, and motion blur. To address these challenges, we propose a new feature extraction network: the Attention-based Weighted Fusion Network. Our method incorporates a Self-Attention Residual Block to enhance feature extraction. To accurately locate and identify objects of interest, we introduce a Mixed Attention Module, which significantly improves detection accuracy. Additionally, we assign adaptive learnable weights to each feature map to emphasize the contributions of feature maps with different resolutions during feature fusion. The performance of our method is evaluated on two datasets: PASCAL VOC and VisDrone2019. Experimental results demonstrate that the proposed method is superior to the baseline and other detectors: it achieves 87.1% mean average precision on the PASCAL VOC 2007 test set, surpassing the baseline by 3.1% AP50, and also exhibits lower false detection and missed detection rates than other detectors.
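A minimal sketch of adaptive learnable fusion weights, in the style of BiFPN-normalized weighting (an assumption; the paper's exact scheme may differ): each input feature map gets a trainable non-negative weight, normalized to sum to one before fusion.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Fuse feature maps of different resolutions with adaptive
    learnable weights; inputs are resized to a common spatial size."""
    def __init__(self, num_inputs: int):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))

    def forward(self, feats):
        size = feats[0].shape[-2:]
        feats = [F.interpolate(f, size=size, mode="nearest") for f in feats]
        w = F.relu(self.w)
        w = w / (w.sum() + 1e-4)  # normalized, non-negative fusion weights
        return sum(wi * fi for wi, fi in zip(w, feats))

fuse = WeightedFusion(3)
feats = [torch.randn(1, 64, s, s) for s in (80, 40, 20)]
out = fuse(feats)  # (1, 64, 80, 80)
```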
The utility and ubiquity of virtual reality make it an extremely popular tool across scientific research areas. Owing to its ability to present naturalistic scenes in a controlled manner, virtual reality may be an effective option for conducting color science experiments and studying different aspects of color perception. However, head-mounted displays have their limitations, and the investigator should choose a display device that meets the colorimetric requirements of their color science experiments. This paper presents a structured method to characterize the colorimetric profile of a head-mounted display with the aid of color characterization models. By way of example, two commercially available head-mounted displays (Meta Quest 2 and Meta Quest Pro) are characterized using four models (Look-up Table, Polynomial Regression, Artificial Neural Network, and Gain-Gamma-Offset), and the appropriateness of each model is investigated.
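Of the four models, polynomial regression is the easiest to sketch: a least-squares fit from device RGB to measured CIE XYZ over a set of training patches. The second-order, 10-term expansion below is one common choice and an assumption here; the paper's exact polynomial order and terms may differ.

```python
import numpy as np

def _poly_terms(rgb: np.ndarray) -> np.ndarray:
    """Second-order polynomial expansion of device RGB values."""
    r, g, b = np.atleast_2d(rgb).T
    return np.stack([np.ones_like(r), r, g, b, r*g, r*b, g*b,
                     r**2, g**2, b**2], axis=1)

def fit_polynomial_model(rgb: np.ndarray, xyz: np.ndarray) -> np.ndarray:
    """Least-squares fit from device RGB (N, 3) to measured XYZ (N, 3)."""
    coeffs, *_ = np.linalg.lstsq(_poly_terms(rgb), xyz, rcond=None)
    return coeffs  # shape (10, 3)

def apply_model(rgb: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Predict XYZ for new RGB inputs from the fitted coefficients."""
    return _poly_terms(rgb) @ coeffs

rgb_train = np.random.rand(125, 3)          # hypothetical training patches
xyz_train = np.random.rand(125, 3) * 100    # hypothetical measurements
coeffs = fit_polynomial_model(rgb_train, xyz_train)
print(apply_model(np.array([0.5, 0.5, 0.5]), coeffs))
```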