
The relentless drive toward miniaturization in electronics has led to increasingly complex printed circuit board (PCB) designs, posing significant challenges for object detection and inspection processes. This research introduces a method to improve the detection of small objects on PCBs by integrating multiscale layer fusion techniques with the YOLOv8 object detection framework. Building on YOLOv8, the proposed methodology addresses the limitations imposed by low-resolution imaging systems, thereby enhancing the reliability and accuracy of small-object detection. The approach is evaluated experimentally on the detection of small components and defects on PCBs. The results indicate superior performance compared to existing methods, with a mean Average Precision (mAP@0.5) of 99.30% and an inference speed of 161 frames per second (FPS). This combination of speed and accuracy supports real-time processing, making the model suitable for deployment in time-sensitive industrial environments.

To address insufficient layout rationality and the difficulty of jointly optimizing style and function in automated interior graphic design, in particular the low efficiency of manual operation in computer-aided design (CAD) systems and the poor engineering adaptability of existing automation methods, this study proposes an interior graphic design method based on a graph generation network and a dual-branch Transformer. The graph generation network constructs the topology of functional areas and their spatial connections. The layout is then generated incrementally with an improved breadth-first search algorithm, and a semantic prediction network is introduced to jointly optimize geometry and semantics. The dual-branch Transformer processes geometric topology and functional semantics in separate branches, refines design details with a cross-modal attention mechanism, and dynamically adjusts the feature fusion weights. Experiments show that the method achieves a mean intersection over union (IoU) of 84.03% and a pixel error of 4.84% in layout generation quality, with a processing time of 0.09 s per scheme, which meets the real-time interaction requirements of CAD tools. In visual quality evaluation, the generated designs achieve a peak signal-to-noise ratio (PSNR) of 34.17 dB and a structural similarity of 0.91, indicating high clarity, structural rationality, and close agreement with professional designs. Compared with existing methods, the method shows significant advantages in generation efficiency, layout rationality, and design diversity. Compared with the ground-truth drawings, the generated results come close to actual design requirements in space utilization and layout consistency. Through dual-branch collaborative modeling of geometry and semantics, this research significantly raises the automation level of interior graphic design and the practical value of the generated solutions in CAD-integrated scenarios.
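As a rough illustration of the two kinds of metric reported above, the following sketch computes a mean intersection over union for a label map and a peak signal-to-noise ratio for an image. It assumes flattened pixel lists and an 8-bit intensity range; the function names `mean_iou` and `psnr` are hypothetical and are not taken from the paper, whose exact evaluation protocol is not given in the abstract.

```python
import math


def mean_iou(pred, truth, labels):
    """Mean intersection over union across the given class labels.

    pred and truth are flattened label maps of equal length.
    Labels absent from both maps are skipped.
    """
    ious = []
    for lab in labels:
        inter = sum(1 for p, t in zip(pred, truth) if p == lab and t == lab)
        union = sum(1 for p, t in zip(pred, truth) if p == lab or t == lab)
        if union:
            ious.append(inter / union)
    if not ious:
        raise ValueError("none of the labels occur in pred or truth")
    return sum(ious) / len(ious)


def psnr(pred, truth, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit pixels."""
    err = sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)
```

For example, `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], [0, 1])` averages an IoU of 1/2 for label 0 and 2/3 for label 1.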

Knowledge graphs play a critical role in intelligent systems, but they face persistent challenges of incomplete data acquisition, noisy information, and inefficient inference under dynamic updates. To address these issues, the authors propose a graph-embedding-based framework that integrates three novel components: (1) a neighborhood-enhanced embedding module that captures richer structural semantics, (2) an inference optimization mechanism based on contextual consistency and confidence reweighting, and (3) a dynamic update strategy for efficient incremental learning. Extensive experiments on FB15k-237, WN18RR, and MedKG show clear improvements over state-of-the-art baselines. The proposed framework achieves Mean Reciprocal Rank gains of 8–15% and Hits@10 gains of 3–6%, demonstrating substantial accuracy improvements in link prediction. On dynamic update tasks, the proposed method maintains almost identical accuracy to full retraining (AUC difference < 0.2%) while achieving a 7.7-fold reduction in update time. These results verify that the proposed framework significantly enhances both the effectiveness and efficiency of knowledge graph reasoning.
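For readers unfamiliar with the link-prediction metrics quoted above, here is a minimal sketch of how Mean Reciprocal Rank and Hits@k are conventionally computed from the rank of the correct entity in each test query. The function names and the example ranks are illustrative assumptions, not values or code from the paper.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all test queries (ranks are 1-based)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Fraction of queries whose correct entity is ranked in the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Made-up ranks for four test triples, for illustration only.
example_ranks = [1, 2, 4, 20]
mrr = mean_reciprocal_rank(example_ranks)      # (1 + 1/2 + 1/4 + 1/20) / 4
hits10 = hits_at_k(example_ranks, k=10)        # 3 of 4 ranks are <= 10
```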

Underwater images suffer from dynamic blur, low illumination, poor contrast, and noise interference, which hampers the accuracy of underwater robot proximity detection and its application in marine development. This study introduces a solution based on the MIMO-UNet network. The network integrates an Atrous Spatial Pyramid Pooling module between the encoder and the decoder to strengthen feature extraction and the capture of contextual information. In addition, a channel attention module in the decoder improves the extraction of fine detail. A combined loss function comprising multi-scale content loss, frequency loss, and mean squared error loss is used to optimize network weight updates, recover high-frequency information that would otherwise be lost, and ensure network convergence. The method is evaluated on the UIEB dataset. Ablation experiments confirm the efficacy of and rationale for each module design, while performance comparisons demonstrate the algorithm's superiority over other underwater enhancement methods.
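The combination of a multi-scale content term, a frequency term, and a mean squared error term can be sketched on a toy 1D signal as follows. This is only an illustration under stated assumptions: the weights `w_freq` and `w_mse`, the pairwise-average downsampling, and the use of an L1 distance for the content and frequency terms are hypothetical choices, and a real implementation would operate on 2D images inside a deep learning framework.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform of a 1D real signal."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def l1(a, b):
    """Mean absolute difference; also works for complex spectra."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mse(a, b):
    """Mean squared error between two real signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def downsample(signal):
    """Halve the resolution by averaging neighbouring pairs."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]


def combined_loss(pred, target, scales=3, w_freq=0.1, w_mse=1.0):
    """Multi-scale content (L1) + weighted frequency (L1 on DFT) + weighted MSE."""
    if len(pred) != len(target) or len(pred) < 2 ** (scales - 1):
        raise ValueError("signals must match in length and cover all scales")
    content = freq = 0.0
    p, t = list(pred), list(target)
    for _ in range(scales):
        content += l1(p, t)            # spatial-domain term at this scale
        freq += l1(dft(p), dft(t))     # frequency-domain term at this scale
        p, t = downsample(p), downsample(t)
    return content + w_freq * freq + w_mse * mse(pred, target)
```

The frequency term penalizes differences in the spectrum directly, which is one common way to keep fine detail from being smoothed away by purely spatial losses.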

Sparse representation is a key part of shape registration, compression, and regeneration. Most existing models generate sparse representations by detecting salient points directly from input point clouds, but they are susceptible to noise, deformations, and outliers. The authors propose an alternative solution that combines global distribution probabilities and local contextual features to learn semantic structural consistency and adaptively generate sparse structural representations for arbitrary 3D point clouds. First, they construct a 3D variational auto-encoder network to learn an optimal latent space aligned with multiple anisotropic Gaussian mixture models (GMMs). Second, they combine GMM parameters with contextual properties to construct enhanced point features that effectively resist noise and geometric deformations, better revealing the underlying semantic structural consistency. Third, they design a weight scoring unit that computes a contribution matrix to the semantic structure and adaptively generates sparse structural points. Finally, the authors enforce semantic correspondence and structural consistency to ensure that the generated structural points have stronger discriminative ability in both feature and distribution domains. Extensive experiments on shape benchmarks show that the proposed network outperforms state-of-the-art methods in shape segmentation and classification at lower cost.

We propose an efficient multi-scale residual network that integrates 3D face alignment with head pose estimation from an RGB image. Existing methods excel at each task independently but often fail to account for the interdependence between them. Additionally, these approaches lack a progressive fine-tuning process for 3D face alignment, because such a process would otherwise require excessive computational resources and memory. To address these limitations, we introduce a hierarchical network that incorporates a frontal face constraint, significantly enhancing the accuracy of both tasks. Moreover, we implement a multi-scale residual merging process that allows for multi-stage refinement without compromising the efficiency of the model. Our experimental results demonstrate the superiority of our method compared to state-of-the-art approaches.

In recent years, deep learning has achieved excellent results in applications across various fields. However, as the scale of deep learning models increases, their training time also increases dramatically. Furthermore, hyperparameters have a significant influence on training results, so selecting them efficiently is essential. In this study, the orthogonal array of the Taguchi method is used to find the best experimental combination of hyperparameters. Three hyperparameters of the You Only Look Once version 3 (YOLOv3) detector and five data augmentation hyperparameters serve as the control factors of the Taguchi method, and the results are analyzed with the traditional signal-to-noise ratio (S/N ratio) method under the larger-the-better (LB) characteristic. Experimental results show a mean average precision of 84.67% on the Blood Cell Count and Detection (BCCD) dataset, which is better than the results available in the literature. The proposed method provides a fast and effective search strategy for optimizing hyperparameters in deep learning.
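The larger-the-better S/N ratio used in Taguchi analysis is conventionally defined as S/N = -10 log10((1/n) * sum(1/y_i^2)) over the n responses y_i of a run. The sketch below applies it to a standard L4 orthogonal array (4 runs, 3 two-level factors) and picks, for each factor, the level with the highest mean S/N. The array size, the function names, and the response values are illustrative assumptions; the study itself uses eight control factors and does not state its array in the abstract.

```python
import math
from collections import defaultdict


def sn_larger_the_better(responses):
    """Taguchi larger-the-better S/N ratio in dB for one experimental run."""
    n = len(responses)
    return -10.0 * math.log10(sum(1.0 / (y * y) for y in responses) / n)


# Standard L4 orthogonal array: each row is a run, each column a factor level.
L4 = [(1, 1, 1), (1, 2, 2), (2, 1, 2), (2, 2, 1)]


def best_levels(array, run_responses):
    """Return, per factor, the level whose runs have the highest mean S/N."""
    sn = [sn_larger_the_better(r) for r in run_responses]
    best = []
    for f in range(len(array[0])):
        by_level = defaultdict(list)
        for row, s in zip(array, sn):
            by_level[row[f]].append(s)
        means = {lvl: sum(v) / len(v) for lvl, v in by_level.items()}
        best.append(max(means, key=means.get))
    return best


# Made-up mAP-like responses (one repetition per run), for illustration only.
toy_responses = [[0.8], [0.6], [0.7], [0.5]]
chosen = best_levels(L4, toy_responses)
```

With only four runs the array covers three factors that would need eight runs in a full factorial design, which is the source of the method's search efficiency.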

Innovations in computer vision have steered research towards recognizing compound facial emotions, which are complex mixtures of basic emotions. Although deep convolutional neural networks have significantly improved accuracy, their inherent limitations, such as vanishing or exploding gradients, a lack of global contextual information, and overfitting, may degrade performance or cause misclassification when processing complex emotion features. This study proposes an ensemble method in which three pre-trained models, DenseNet-121, VGG-16, and ResNet-18, are concatenated rather than used individually. In this layer-sharing design, we remove the head of each model and add dropout layers, fully connected layers, activation functions, and pooling layers to it before concatenation, so that each branch can learn further before the individually learned features are combined. The proposed model uses an early stopping mechanism to prevent overfitting and improve performance. The proposed ensemble method surpasses the state of the art (SOTA) with 74.4% and 71.8% accuracy on the RAF-DB and CFEE datasets, respectively, offering a new benchmark for real-world compound emotion recognition research.
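The early stopping mechanism mentioned above is commonly implemented as a patience counter on the validation loss. The following is a generic sketch under that assumption; the class name, the `patience` and `min_delta` parameters, and the loss values are hypothetical, and the abstract does not state which criterion the study actually uses.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def stopping_epoch(losses, patience=3):
    """Return the 0-based epoch at which training would stop, or None."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(losses):
        if stopper.step(loss):
            return epoch
    return None
```

In a training loop, `step` would be called once per epoch with the validation loss, and the weights from the best epoch would typically be restored after stopping.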

Deep learning (DL) has advanced computer-aided diagnosis, yet the limited data available at local medical centers and privacy concerns associated with centralized AI approaches hinder collaboration. Federated learning (FL) offers a privacy-preserving solution by enabling distributed DL training across multiple medical centers without sharing raw data. This article reviews research conducted from 2016 to 2024 on the use of FL in cancer detection and diagnosis, aiming to provide an overview of the field’s development. Studies show that FL effectively addresses privacy concerns in DL training across centers. Future research should focus on tackling data heterogeneity and domain adaptation to enhance the robustness of FL in clinical settings. Improving the interpretability and privacy of FL is crucial for building trust. This review promotes FL adoption and continued research to advance cancer detection and diagnosis and improve patient outcomes.

Image compression is an essential technology in image processing because it reduces the storage required for the growing volume of image and video content. Deep learning-based image compression has made significant progress, surpassing traditional coding and decoding approaches in specific cases. Current methods employ autoencoders, typically consisting of convolutional neural networks, to map input images to lower-dimensional latent spaces for compression. However, these approaches often overlook low-frequency information, leading to sub-optimal compression performance. To address this challenge, this study proposes a novel image compression technique, Transformer and Convolutional Dual Channel Networks (TCDCN). The method extracts both edge detail and low-frequency information, achieving a balance between high- and low-frequency compression. It uses a variational autoencoder architecture with parallel stacked transformer and convolutional networks to create a compact representation of the input image through end-to-end training. This content-adaptive transform captures low-frequency information dynamically, leading to improved compression efficiency. Compared to the classic JPEG method, the model shows Bjøntegaard Delta rate (BD-rate) improvements of up to 19.12% and 18.65% on the Kodak and CLIC test datasets, respectively. These improvements also surpass state-of-the-art solutions by margins of 0.47% and 0.74%, indicating a gain in image compression encoding efficiency. The results underscore the effectiveness of the approach in enhancing the capabilities of existing techniques.