Computed tomography (CT) images provide a wealth of anatomical information crucial for diagnosing femoral fractures. However, predicting these fractures poses challenges due to postural variability of the femur and device-related factors. This study introduces an approach for predicting femoral fracture from CT and mask images. The approach comprises several stages: mask annotation, the scaling iterative closest point (SICP) algorithm for registration, three-dimensional (3D) affine transformation of images, image histogram matching, and a two-channel 3D convolutional neural network (3DCNN). In the proximal femoral region, SICP adjusts the size and posture of the point cloud through a 3D affine transformation so that it aligns with the target point cloud. This 3D affine transformation, generated by SICP registration, is then applied to the original CT and mask images, systematically normalizing variances in femoral posture and size across subjects. Image histogram matching diminishes the variance in image grayscale values that originates from the scanning devices: it redistributes the pixel grayscale distribution in each CT image, aligning it more closely with a reference histogram. The two-channel 3DCNN takes as input CT images (the first channel) that have undergone 3D affine transformation and image histogram matching, along with their corresponding masks (the second channel), and outputs the probability of a fracture. Results show that the predictive capability of the 3DCNN-based model is notable, achieving an accuracy of 91.299%, specificity of 91.551%, sensitivity of 91.071%, and an area under the curve of 0.973. In conclusion, this approach effectively minimizes the impact of irrelevant factors on prediction, making full use of image information to assess the risk of femoral fracture, and enhances the accuracy and reliability of fracture prediction.
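Histogram matching of the kind described can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it remaps a source image's grayscale values so that their empirical distribution follows that of a reference image.

```python
import numpy as np

def match_histogram(source, reference):
    """Remap source grayscale values so their empirical distribution
    follows the reference image's distribution (a minimal sketch of
    histogram matching; the paper's CT-specific pipeline may differ)."""
    src = np.asarray(source).ravel()
    ref = np.asarray(reference).ravel()
    # Unique gray levels with their empirical CDF positions.
    s_values, s_counts = np.unique(src, return_counts=True)
    r_values, r_counts = np.unique(ref, return_counts=True)
    s_cdf = np.cumsum(s_counts) / src.size
    r_cdf = np.cumsum(r_counts) / ref.size
    # For each source CDF level, look up the reference gray level there.
    matched = np.interp(s_cdf, r_cdf, r_values)
    return matched[np.searchsorted(s_values, src)].reshape(np.shape(source))
```

After matching, each CT volume's intensity distribution approximates the chosen reference histogram, reducing device-dependent grayscale differences before the images reach the network.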
With advances in deep learning technology, the study and application of Human Action Recognition (HAR) systems in competitive sports have evolved, becoming more profound and diverse. These systems have demonstrated the potential to enhance athletes’ training and competitive performance while introducing innovation and progress into sports education and entertainment. This paper addresses the practical needs of sports training by designing a HAR system tailored to competitive sports scenarios, and subsequently analyzes its recognition performance and applied models. The primary contribution of this paper lies in its exploration of HAR technology through Convolutional Neural Networks (CNNs) in the context of competitive sports. It systematically investigates and applies HAR requirements in competitive settings. Additionally, this paper evaluates the real-world performance of AlexNet and GoogleNet, constructs a CNN-based HAR system, and assesses its capabilities on publicly available datasets. These efforts provide valuable insights and technical support for the implementation of CNN-based HAR technology in competitive sports and other related fields, offering both academic and practical applications. The results indicate that the different models achieve recognition accuracies of 94.45%, 95.04%, 93.01%, 93.23%, and 90.54% under five distinct decision-level fusion equations (A#–E#, respectively). Following fine-tuning and optimization, the recognition accuracy of the AlexNet, GoogleNet, and ResNet networks improved significantly, with the model achieving a remarkable 99.94% accuracy in recognizing and analyzing the same athlete. In comparison to alternative algorithms, the designed HAR system prioritizes immediacy and interactivity while offering superior accuracy and broader application potential. It successfully fulfills its intended function, accurately recognizing human actions from video images, thereby proving invaluable for research in competitive sports.
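The abstract does not give the five fusion equations A#–E#, but the general idea of decision-level fusion can be illustrated with a hypothetical weighted average of two networks' per-class probabilities. This is only a sketch of the concept, not the paper's method; the weight `w` is an assumed parameter.

```python
import numpy as np

def fuse_decisions(probs_a, probs_b, w=0.5):
    """Illustrative decision-level fusion: weighted average of per-class
    probabilities from two classifiers (e.g. AlexNet and GoogleNet),
    followed by argmax. The weight w is hypothetical; the paper's actual
    fusion equations A#-E# are not specified in the abstract."""
    fused = w * np.asarray(probs_a, dtype=float) + (1 - w) * np.asarray(probs_b, dtype=float)
    return int(np.argmax(fused))
```

Each fusion equation in the paper presumably combines the per-network scores differently; this averaging rule is one of the simplest members of that family.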
Forest fires wreak havoc on natural ecosystems and represent a grave threat to environmental stability. Establishing a rapid and efficient network for the early detection of forest fires remains a critical challenge and a focal point of research. In response to this problem, this paper proposes Fire & Smoke - You Only Look Once (FS-YOLO) for real-time forest fire detection. FS-YOLO significantly enhances fire detection performance through the integration of three innovative modules: Mixed Attention Cross Stage Partial (MACSP), Cross Stage Feature Pyramid Network (CSFPN), and Scalable Spatial Pyramid Pooling (SSPP). First, the MACSP module targets diverse colors and shapes characteristic of forest fires. By combining channel attention with local spatial attention, it precisely weights the network’s features, achieving greater accuracy in capturing fire characteristics. Second, the CSFPN method merges high-level semantic information with low-level detail via both top-down and bottom-up pathways, creating multi-scale feature maps that boast expanded receptive fields. Lastly, the SSPP method enhances the network’s focus on fire targets across varied scenes through scaling factors, bolstering the model’s robustness. Additionally, this paper organizes and annotates a forest fire dataset. The experimental results show that compared to the baseline model, FS-YOLO achieves an 8% improvement in mean average precision, and the average precision values for flames and smoke increase by 10.1% and 5.7%, respectively, indicating a significant overall performance improvement of the model. Compared to other object detection algorithms, FS-YOLO consistently achieves optimal performance.
Magnetic induction tomography (MIT) is an emerging imaging technology holding significant promise for cerebral hemorrhage monitoring. The imaging method commonly employed in MIT is time-difference imaging. However, this approach relies on magnetic field signals acquired before cerebral hemorrhage, which are often difficult to obtain. This study augments single-frequency information with multi-frequency bioelectrical impedance information: the collected signals at different frequencies are identified to obtain the magnetic field signal generated by each single-layer heterogeneous tissue. A Stacked Autoencoder (SAE) neural network algorithm is used to reconstruct images of the multi-layer tissues of the head. Both numerical simulations and phantom experiments are carried out. The results indicate that the relative error of the multi-frequency SAE reconstruction is only 7.82%, outperforming traditional algorithms. Moreover, under a noise level of 40 dB, the anti-interference capability of the MIT algorithm based on frequency identification and SAE is superior to that of traditional algorithms. This research explores a novel approach for the dynamic monitoring of cerebral hemorrhage and demonstrates the potential advantages of MIT in non-invasive monitoring.
Rapid economic development and changing consumption patterns impose higher requirements on the efficient circulation of industrial products. Improving the speed of goods circulation while ensuring the safety and quality of finished products has become an important issue, yet relevant research is still lacking. To address missing barcodes, box damage, and brand errors in existing industrial products within the park, this paper proposes a barcode and damage detection method for industrial packaging surfaces based on the GoogLeNet network. First, Delta Machine Vision (DMV) products capture and quickly read barcodes from six directions. Second, a corresponding training set is created through sample and data collection; it categorizes damage into three types: damaged holes, cracks, and indentations. Image enhancement and data augmentation are then applied to the dataset. Finally, an improved GoogLeNet network model is designed by combining the GoogLeNet architecture with regularization, facilitating feature extraction and training on images despite the interference of packaging surface patterns. This design achieves a damage identification accuracy of 96.63%, which is 14.66%, 3.72%, 14.05%, and 12.78% higher than that of the AlexNet, GoogLeNet, VGG, and ResNet convolutional neural networks, respectively.
Detecting solder ball defects in ball grid array (BGA) packaged chips currently suffers from slow detection speed, low efficiency, and poor accuracy. We have designed an algorithm for detecting these defects by leveraging their specific characteristics and harnessing the advantages of deep learning. Building upon the YOLOv8 network model, we have made adaptive improvements. First, we have introduced an adaptive weighted downsampling method to boost detection accuracy and make the model more lightweight. Second, to improve the extraction of image features, we have proposed an efficient multi-scale convolution method. Finally, to enhance convergence speed and regression accuracy, we have replaced the traditional Complete Intersection over Union loss function with Minimum Points Distance Intersection over Union (MPDIoU). In a series of controlled experiments, our enhanced model has shown significant improvements over the original network: a 1.7% increase in mean average precision, a 1.5% boost in precision, a 0.9% increase in recall, a reduction of 4.3 M parameters, and a decrease of 0.4 G floating-point operations. In comparative experiments, our algorithm has demonstrated superior overall performance compared to other networks, effectively achieving the goal of solder ball defect detection.
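MPDIoU differs from earlier IoU variants by penalizing the distances between matching box corners rather than, say, box centers. A minimal sketch in plain Python, assuming corner-format boxes `(x1, y1, x2, y2)` (an illustration of the metric, not the authors' implementation):

```python
def mpdiou(box_pred, box_gt, img_w, img_h):
    """Minimum Points Distance IoU between two axis-aligned boxes:
    standard IoU minus the squared distances between the top-left and
    bottom-right corner pairs, normalized by the squared image diagonal.
    The corresponding regression loss would be 1 - mpdiou(...)."""
    x1p, y1p, x2p, y2p = box_pred
    x1g, y1g, x2g, y2g = box_gt
    # Intersection and union areas.
    iw = max(0.0, min(x2p, x2g) - max(x1p, x1g))
    ih = max(0.0, min(y2p, y2g) - max(y1p, y1g))
    inter = iw * ih
    union = (x2p - x1p) * (y2p - y1p) + (x2g - x1g) * (y2g - y1g) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared corner distances, normalized by the image diagonal squared.
    diag2 = img_w ** 2 + img_h ** 2
    d1 = (x1p - x1g) ** 2 + (y1p - y1g) ** 2  # top-left corners
    d2 = (x2p - x2g) ** 2 + (y2p - y2g) ** 2  # bottom-right corners
    return iou - d1 / diag2 - d2 / diag2
```

For a perfect match the corner penalties vanish and the score equals 1; misaligned corners pull the score down even when the IoU alone would be identical, which is what sharpens regression.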
Model-based approaches to imaging, such as specialized image enhancements in astronomy, facilitate explanations of relationships between observed inputs and computed outputs. These models may be expressed with extended matrix-vector (EMV) algebra, especially when they involve only scalars, vectors, and matrices, and with n-mode or index notations, when they involve multidimensional arrays, also called numeric tensors or, simply, tensors. Although this paper features an example, inspired by exoplanet imaging, that employs tensors to reveal (inverse) 2D fast Fourier transforms in an image enhancement model, the work is actually about the tensor algebra and software, or tensor frameworks, available for model-based imaging. The paper proposes a Ricci-notation tensor (RT) framework, comprising a dual-variant index notation, with Einstein summation convention, and codesigned object-oriented software, called the RTToolbox for MATLAB. Extensions to Ricci notation offer novel representations for entrywise, pagewise, and broadcasting operations popular in EMV frameworks for imaging. Complementing the EMV algebra computable with MATLAB, the RTToolbox demonstrates programmatic and computational efficiency via careful design of numeric tensor and dual-variant index classes. Compared to its closest competitor, also a numeric tensor framework that uses index notation, the RT framework enables superior ways to model imaging problems and, thereby, to develop solutions.
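The Einstein summation convention at the heart of such index-notation frameworks can be illustrated outside MATLAB and the RTToolbox, for example with NumPy's `einsum`, which follows the same convention:

```python
import numpy as np

# Einstein summation: a repeated index (here j) is summed over implicitly.
# This mirrors the index-notation modeling style the RT framework offers
# in MATLAB, shown here with NumPy's einsum instead of the RTToolbox.
A = np.arange(6.0).reshape(2, 3)      # A[i, j]
B = np.arange(12.0).reshape(3, 4)     # B[j, k]
C = np.einsum('ij,jk->ik', A, B)      # C[i, k] = sum_j A[i, j] * B[j, k]

# Entrywise operations, popular in EMV frameworks, also map to einsum:
D = np.einsum('ij,ij->ij', A, A)      # entrywise square of A
```

The contraction `'ij,jk->ik'` is exactly matrix multiplication; higher-order contractions over tensors are written the same way, which is what makes index notation attractive for model-based imaging.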
Stripe noise removal is a fundamental task in remote sensing image processing, of great significance for improving image quality and for subsequent applications. The standard nuclear norm has been widely used to remove stripe noise, but it treats each singular value equally, which limits its capability and flexibility in destriping. In this paper, we propose a weighted low-rank spatial–spectral total variation (WLRSSTV) model by exploiting the weighted nuclear norm and global spatial–spectral total variation regularization. The split Bregman iteration is used to optimize the WLRSSTV model and to estimate the weights of the nuclear norm. Extensive experiments on both synthetic and real remote sensing images validate that the proposed model effectively removes stripe noise while preserving fine-scale details.
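The key departure from the standard nuclear norm is that each singular value gets its own threshold. Under the common reweighting choice w_i = c / (sigma_i + eps), which the abstract does not specify and is assumed here for illustration, the low-rank proximal step inside a split Bregman loop looks like:

```python
import numpy as np

def weighted_svt(X, c=1.0, eps=1e-6):
    """Weighted singular value thresholding: proximal step for the
    weighted nuclear norm (a sketch of the low-rank update inside a
    split Bregman destriping iteration). The weight c/(sigma + eps)
    shrinks small singular values (mostly noise) far more than large
    ones (image structure); c and eps are illustrative parameters,
    not values from the paper."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    w = c / (s + eps)                    # large sigma -> small weight
    s_shrunk = np.maximum(s - w, 0.0)    # soft-threshold each singular value
    return U @ np.diag(s_shrunk) @ Vt
```

In contrast, the standard nuclear norm would subtract the same constant from every singular value, which is exactly the inflexibility the weighted model is designed to avoid.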
Steel surface defect detection in industrial quality control has always been a challenging object detection task in computer vision. Unlike other detection problems, some surface defects on steel are relatively small compared to the entire inspection object, making defect features less prominent during detection. To address these issues, we propose a YOLOv5-based steel defect detection method enhanced with multi-scale feature extraction and contextual augmentation (MSCA-YOLO). Specifically, adopting YOLOv5 as the backbone network, we first add the C3-RFE module to expand the receptive field. Then, we design a neck network structure combining multi-scale guided upsampling, which effectively enhances the model's ability to handle multi-scale features and improves its feature extraction for small defects. Finally, we propose a context mechanism that provides the model with deeper context analysis capability, offering richer contextual information. Experiments on the NEU-DET dataset show that MSCA-YOLO achieves a mean Average Precision of 0.645 at an Intersection over Union threshold of 0.5 while maintaining rapid detection. It also exhibits substantial improvements in Precision compared to YOLOv5 across six defect types: Crazing (18.5% increase), Inclusion (1.2% increase), Patches (1.9% increase), Pitted_Surface (7.8% increase), Rolled-in_Scale (8.9% increase), and Scratches (6.5% increase). These results mark the efficiency and reliability of MSCA-YOLO in automated steel surface defect detection, providing a new solution for real-time inspection of steel surface defects.
The 3D acquisition of an indoor scene with colorimetric information is currently achieved by vision systems using structured light. Existing solutions are specific to small scenes at short distances. For larger scenes where the vision system is far from the analysed objects, i.e. more than 4 m from the acquisition system, current solutions cannot acquire the scene with a measurement error of the order of a millimetre. Drawing on existing algorithms in the literature, we propose Bi-Frequency and Gray Code Phase Shifting (B-GCPS), a method combining two structured-light algorithms. The main idea is to use several reference planes to minimize the measurement error. Our method uses the Gray Code + Phase Shifting (GC+PS) algorithm to assist in acquiring the reference planes and selecting the most relevant ones. The Bi-frequency algorithm then estimates the 3D coordinates of the scene with a very low measurement error, thanks to the previously acquired reference planes. With the proposed method, scenes at long distances and larger than 2 m in width and length can be acquired with a very low measurement error. The method reduces the measurement error on the three axes (X, Y, Z) by a factor of at least 400, down to the order of a millimetre.
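The phase-shifting half of GC+PS recovers a wrapped phase per pixel from N sinusoidal fringe images. The sketch below shows the standard N-step arctangent formula (an illustration of the principle, not the authors' code); the Gray-code images, not shown, would then disambiguate the 2π fringe order.

```python
import numpy as np

def wrapped_phase(images):
    """Wrapped phase from N phase-shifted fringe images.

    Each image is modeled as I_n = A + B*cos(phi + 2*pi*n/N); the
    wrapped phase phi is recovered per pixel with the standard N-step
    arctangent formula. Gray-code patterns (handled elsewhere in GC+PS)
    resolve the 2*pi ambiguity to give an absolute phase."""
    imgs = np.asarray(images, dtype=float)
    n = imgs.shape[0]
    deltas = 2.0 * np.pi * np.arange(n) / n
    # Project the image stack onto sine and cosine of the shift angles.
    num = -np.tensordot(np.sin(deltas), imgs, axes=(0, 0))
    den = np.tensordot(np.cos(deltas), imgs, axes=(0, 0))
    return np.arctan2(num, den)  # phase wrapped to (-pi, pi]
```

With at least three shifts, the ambient term A and the modulation B cancel out of the ratio, which is why phase-shifting methods are robust to uneven illumination across a large scene.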