We propose an efficient multi-scale residual network that integrates 3D face alignment with head pose estimation from a single RGB image. Existing methods excel at each task independently but often fail to exploit the interdependence between them. Additionally, these approaches lack a progressive refinement process for 3D face alignment, in part because naive multi-stage refinement would require excessive computational resources and memory. To address these limitations, we introduce a hierarchical network that incorporates a frontal face constraint, significantly enhancing the accuracy of both tasks. Moreover, we implement a multi-scale residual merging process that allows multi-stage refinement without compromising the efficiency of the model. Our experimental results demonstrate the superiority of our method over state-of-the-art approaches.
As pets now outnumber newborns in many households, the demand for pet medical care has surged, placing a significant burden on pet owners. To address this, our experiment uses image recognition technology to make a preliminary assessment of the health condition of dogs, providing a rapid and economical health assessment method. Working with collaborators, we collected 2613 stool photos, augmented them to a total of 6079 images, and analyzed them using LabVIEW and the YOLOv8 segmentation model. The model performed excellently, achieving a precision of 86.805%, a recall of 74.672%, and an mAP50 of 83.354%, demonstrating a high recognition rate in determining the condition of dog stools. With the advancement of technology and the proliferation of mobile devices, this experiment aims to develop an application that allows pet owners to assess their pets' health at any time and manage it more conveniently. Additionally, the experiment aims to expand the database through cloud computing, optimize the model, and establish a global pet health interactive community. These developments not only propel innovation in the field of pet medical care but also provide practical health management tools for pet families, potentially offering substantial help to more pet owners in the future.
Multichannel methods have attracted much attention in color image denoising. These are denoising methods that combine the low-rankness of a matrix with the nonlocal self-similarity of a natural image, and they apply to color images whose noise intensity differs in each color channel. Denoising methods based on the low-rankness of tensors, which extend matrices, have also attracted attention in recent years. Many tensor-based methods have been proposed as extensions of matrix-based methods and have achieved higher denoising performance than their matrix-based counterparts. Tensor-based methods perform denoising using an approximation of the tensor rank. However, unlike multichannel methods, they do not assume different noise intensities for each channel. Meanwhile, the tensor nuclear norm minus Frobenius norm (TNNFN) has been proposed in the domain of traffic data completion. The TNNFN is one such tensor rank approximation and is known to perform well in traffic data completion, but it has not previously been applied to image restoration. In this paper, we propose MC-TNNFN, a tensor-based multichannel method that uses the TNNFN to remove noise from a tensor constructed from similar patches and then estimates the original image. Experimental results on natural images show that the proposed method outperforms existing methods both objectively and subjectively.
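The abstract does not give the exact optimization problem, so the following is only a plausible sketch of a TNNFN-based multichannel patch-denoising model; the regularization weight λ, data-term weight β, and per-channel weighting are assumptions. With 𝒴 the noisy patch tensor, ‖·‖_TNN the t-SVD-based tensor nuclear norm, and σ_c the noise standard deviation of channel c:

```latex
\min_{\mathcal{X}}\;
\underbrace{\|\mathcal{X}\|_{\mathrm{TNN}} - \lambda\,\|\mathcal{X}\|_{F}}_{\text{TNNFN rank surrogate}}
\;+\;
\sum_{c} \frac{\beta}{2\,\sigma_c^{2}}\,
\bigl\|\mathcal{Y}_{c} - \mathcal{X}_{c}\bigr\|_{F}^{2}
```

The first two terms form the TNNFN approximation of tensor rank; the per-channel weights 1/σ_c² in the data term would express the multichannel assumption that each color channel carries noise of a different intensity.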
Underwater images are afflicted by dynamic blur, low illumination, poor contrast, and noise interference, hampering the accuracy of underwater robot proximity detection and its application in marine development. This study introduces a solution utilizing the MIMO-UNet network. The network integrates an Atrous Spatial Pyramid Pooling module between the encoder and the decoder to augment feature extraction and contextual information retrieval. Furthermore, the addition of a channel attention module in the decoder enhances detailed feature extraction. A novel technique combines a multi-scale content loss, a frequency loss, and a mean squared error loss to guide network weight updates, recover high-frequency information, and ensure network convergence. The effectiveness of the method is assessed on the UIEB dataset. Ablation experiments confirm the efficacy of and rationale behind each module design, while performance comparisons demonstrate the algorithm's superiority over other underwater enhancement methods.
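The combined loss can be sketched numerically. The following is a minimal numpy illustration, not the paper's implementation: the term weights `w_freq` and `w_mse`, the use of L1 distances, and the FFT-magnitude form of the frequency loss are all assumptions.

```python
import numpy as np

def content_loss(pred, target):
    """L1 content loss at one scale (assumed L1; the paper may differ)."""
    return np.mean(np.abs(pred - target))

def frequency_loss(pred, target):
    """L1 distance between FFT magnitudes, which weights high-frequency
    (edge/detail) errors that a plain pixel loss under-penalizes."""
    fp = np.abs(np.fft.fft2(pred))
    ft = np.abs(np.fft.fft2(target))
    return np.mean(np.abs(fp - ft))

def mse_loss(pred, target):
    """Mean squared error, here applied at the full-resolution output."""
    return np.mean((pred - target) ** 2)

def total_loss(preds, targets, w_freq=0.1, w_mse=1.0):
    """Multi-scale total: preds/targets are lists ordered coarse-to-fine,
    as in MIMO-UNet-style multi-output decoders."""
    loss = 0.0
    for p, t in zip(preds, targets):
        loss += content_loss(p, t) + w_freq * frequency_loss(p, t)
    loss += w_mse * mse_loss(preds[-1], targets[-1])
    return loss
```

In a real training loop these operations would run on framework tensors so gradients flow; the numpy version only shows how the three terms combine.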
We propose a new convolutional neural network, the Physics-guided Encoder–Decoder Network (PEDNet), designed for end-to-end single image dehazing. The network uses a reformulated atmospheric scattering model, which is embedded into the network for end-to-end learning. The overall structure is an encoder–decoder that fully extracts and fuses contextual information from four different scales through skip connections. In addition, in view of the uneven distribution of haze in the real world, we design a Res2FA module based on Res2Net, which introduces a Feature Attention block able to focus on important information at a finer granularity. Because it employs a physically grounded dehazing model, PEDNet adapts well to various types of hazy images. Ablation experiments demonstrate the efficacy of every network module, and experimental results on both synthetic and real-world datasets show that the proposed method outperforms current state-of-the-art methods.
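The abstract does not spell out the reformulation. One widely used rewriting of the atmospheric scattering model (introduced by AOD-Net) folds the transmission t(x) and atmospheric light A into a single learnable map K(x); a physics-embedded network of this kind plausibly uses a variant of it:

```latex
I(x) = J(x)\,t(x) + A\bigl(1 - t(x)\bigr)
\quad\Longrightarrow\quad
J(x) = K(x)\,I(x) - K(x) + b,
\qquad
K(x) = \frac{\tfrac{1}{t(x)}\bigl(I(x) - A\bigr) + (A - b)}{I(x) - 1}
```

Here I is the hazy input, J the haze-free scene radiance, and b a constant bias; the network then only needs to estimate K(x), and the clean image follows from the model in closed form. PEDNet's exact embedded formulation may differ.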
Physicians attempt to detect different types of colonic polyps at the same time during endoscopic inspection. A deep-learning-based object detection method is proposed to address the problem of simultaneously detecting different colonic polyps. This study used a single-shot detector (SSD) with a Resnet50 backbone, called the SSD-Resnet50 model, to detect two types of colonic polyps, adenomas and hyperplastic polyps, in endoscopic images. The Taguchi method was used to optimize the algorithm hyperparameter combination of the SSD-Resnet50 model and thereby improve its detection accuracy. The SSD-Resnet50 model with its optimized hyperparameters was then employed for simultaneous detection of the two polyp types. The experimental findings revealed that the SSD-Resnet50 model achieved an average mAP of 0.8933 on a test set comprising 300 × 300 × 3 images of colonic polyps. Notably, the detection accuracy attained with the hyperparameters derived from the Taguchi method surpassed that obtained with the hyperparameter combination from the Matlab example. Additionally, the SSD-Resnet50 model achieved higher detection accuracy than the SSD-MobileNetV2, SSD-InceptionV3, SSD-Shufflenet, SSD-Squeezenet, and SSD-VGG16 models. The proposed SSD-Resnet50 model with its optimized hyperparameters thus detects adenomas and hyperplastic polyps in endoscopic images simultaneously with higher accuracy.
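The Taguchi step above can be sketched as a main-effects analysis over an orthogonal array. The sketch below uses the standard L9(3^4) array with three factors at three levels; the factor names and mAP numbers are purely illustrative, not from the paper, and full Taguchi practice would typically analyze signal-to-noise ratios rather than raw means.

```python
import numpy as np

# Standard L9 orthogonal array (first three columns); rows are training
# runs, columns are hyperparameter factors, entries are level indices 0-2.
L9 = np.array([
    [0, 0, 0], [0, 1, 1], [0, 2, 2],
    [1, 0, 1], [1, 1, 2], [1, 2, 0],
    [2, 0, 2], [2, 1, 0], [2, 2, 1],
])

def taguchi_best_levels(responses):
    """Main-effect analysis for a larger-is-better response (e.g. mAP):
    for each factor, average the response over the runs at each level
    and select the level with the highest mean."""
    responses = np.asarray(responses, dtype=float)
    best = []
    for col in range(L9.shape[1]):
        level_means = [responses[L9[:, col] == lv].mean() for lv in range(3)]
        best.append(int(np.argmax(level_means)))
    return best

# Hypothetical mAP of 9 runs, e.g. varying learning rate, batch size,
# and momentum (illustrative numbers only).
maps = [0.81, 0.84, 0.83, 0.86, 0.88, 0.82, 0.85, 0.83, 0.87]
print(taguchi_best_levels(maps))  # → [1, 1, 1]
```

Only 9 training runs are needed instead of the 27 of a full 3×3×3 grid, which is the practical appeal of the Taguchi design for expensive model training.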
To solve the problem of color separation of printed images, this paper proposes a black generation algorithm that can maximize the gamut of the CMYK output color space. The proposed method treats the printing gamut as a CMY color cube and divides it into six sub-gamut spaces using the gamut center diagonal. First, color targets and algorithms are designed to calculate a gamut center diagonal black lookup table and two gamut boundary black lookup tables for each sub-gamut. Then, for an input color, the three corresponding lookup tables are found by determining the sub-gamut space in which the input color lies, and the final CMYK amounts are determined by interpolating the corresponding color points on these lookup tables. Finally, the interpolation results are further optimized using a neighborhood-search strategy. The color-difference evaluation experiment shows that the proposed algorithm achieves a mean color difference of less than 1 CIEDE2000 standard color-difference unit when reproducing the standard test color target. The gamut reproduction evaluation experiment shows that the gamut distribution and gamut size obtained by the proposed algorithm are closer to the source gamut space than those of the Gray Balance algorithm. The image reproduction test shows that the proposed algorithm effectively reproduces the dark details of images and meets the image reproduction requirements of the printing industry.
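The sub-gamut lookup step can be made concrete. Cutting the CMY cube along the gray (center) diagonal yields six tetrahedral sub-volumes, one per ordering of the three components, so locating an input color reduces to sorting its components. This is a minimal sketch of that lookup only; it assumes the six-way partition is by component ordering (the abstract does not say so explicitly), and the LUT construction and interpolation from the paper are not reproduced.

```python
from itertools import permutations

# Six sub-gamuts around the gray diagonal from (0,0,0) to (1,1,1),
# indexed by the descending ordering of the C, M, Y components.
ORDERINGS = list(permutations("CMY"))

def sub_gamut_index(c, m, y):
    """Return the index (0-5) of the sub-gamut containing a CMY point,
    determined by sorting its components in descending order."""
    ranked = sorted(zip((c, m, y), "CMY"), reverse=True)
    order = tuple(ch for _, ch in ranked)
    return ORDERINGS.index(order)
```

Given this index, the method described above would then select the center-diagonal LUT plus the two boundary LUTs of that sub-gamut and interpolate the CMYK amounts between them.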
Accurate segmentation and recognition of retinal vessels is a very important medical image analysis technique, enabling clinicians to precisely locate and identify vessels and other tissues in fundus images. However, most existing U-net-based vessel segmentation models suffer from two problems. The first is that retinal vessels have very low contrast with the image background, resulting in the loss of much detailed information. The second is that the complex curvature patterns of capillaries prevent models from accurately capturing the continuity and coherence of the vessels. To solve these two problems, we propose MATR-Net, a joint Transformer–Residual network based on a multiscale attention feature (MSAF) mechanism for effective retinal vessel segmentation. In MATR-Net, the convolutional layers of U-net are replaced with Residual modules, and a dual-branch encoder incorporating a Transformer captures both the local information and the global contextual information of retinal vessels. In addition, an MSAF module is introduced in the encoder. By combining features of different scales to recover detailed pixels lost in the pooling layers, the segmentation model improves feature extraction for capillaries with complex curvature patterns and accurately captures the continuity of vessels. To validate the effectiveness of MATR-Net, this study conducts comprehensive experiments on the DRIVE and STARE datasets and compares it with state-of-the-art deep learning models. The results show that MATR-Net exhibits excellent segmentation performance, with a Dice similarity coefficient and Precision of 84.57% and 80.78% on DRIVE, and 84.18% and 80.99% on STARE, respectively.
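One way a multiscale attention fusion of this kind can work is sketched below in numpy; this is a schematic illustration (nearest-neighbour upsampling, channel concatenation, and a squeeze-style sigmoid gate), and the actual MSAF design in MATR-Net may differ in all of these choices.

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def msaf_fuse(fine, coarse):
    """Fuse a fine-scale feature map with a coarser one: upsample the
    coarse map back to the fine resolution, concatenate along channels,
    then reweight channels with a global-average-pool + sigmoid gate so
    informative channels (e.g. thin-vessel detail) are emphasized."""
    x = np.concatenate([fine, upsample2x(coarse)], axis=0)  # (2C, H, W)
    gap = x.mean(axis=(1, 2))                               # (2C,) descriptor
    gate = 1.0 / (1.0 + np.exp(-gap))                       # sigmoid weights
    return x * gate[:, None, None]
```

In a trained network the gate would of course be produced by learned layers rather than applied directly to the pooled descriptor; the sketch only shows the data flow that lets fine-scale detail survive pooling.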
Real-world scenes typically have a larger dynamic range than a camera can capture, and temporally and spatially varying exposures have become widely used techniques for capturing high dynamic range (HDR) images. A key question is what the optimal set of exposure settings should be in order to achieve good image quality. In response, this paper introduces a lightweight learning-based exposure strategy network. The proposed network optimizes the exposure strategy for direct fusion of standard dynamic range (SDR) images without access to RAW-domain images. Unlike most direct-fusion exposure strategies, which focus primarily on tone optimization alone, the proposed method also incorporates the worst-case signal-to-noise ratio (SNR) into the loss function design. This ensures that the SNR remains consistently above an acceptable threshold while enabling visually pleasing tones in lower-noise regions. The lightweight network achieves a significantly shorter inference time than other state-of-the-art methods, making it a more practical HDR enhancement technique for real-time and on-device applications. The code can be found at https://github.com/JieyuLi/exposure-bracketing-strategy.
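A worst-case SNR term of the kind described can be illustrated with a simple shot-plus-read-noise sensor model. The sketch below is not the paper's loss: the full-well and read-noise constants, the dB threshold, and the hinge form are all assumptions chosen for illustration.

```python
import numpy as np

FULL_WELL = 15000.0   # assumed sensor full-well capacity, in electrons
READ_NOISE = 3.0      # assumed read noise, electrons rms

def snr_db(radiance, exposure):
    """Shot-noise-limited SNR (dB) of one exposure of a pixel with the
    given scene radiance; saturated pixels carry no usable signal."""
    electrons = radiance * exposure
    if electrons >= FULL_WELL:
        return -np.inf
    return 20.0 * np.log10(electrons / np.sqrt(electrons + READ_NOISE ** 2))

def worst_case_snr_db(exposures, radiances):
    """For each scene radiance, take the best SNR available anywhere in
    the bracket, then return the minimum over radiances: the worst case
    the exposure strategy must still cover."""
    best_per_radiance = [max(snr_db(r, t) for t in exposures) for r in radiances]
    return min(best_per_radiance)

def snr_penalty(exposures, radiances, threshold_db=20.0):
    """Hinge penalty that vanishes once the worst-case SNR clears the
    threshold; a term like this can be added to a tone-quality loss."""
    return max(0.0, threshold_db - worst_case_snr_db(exposures, radiances))
```

Under such a term, the network is free to shape tones however it likes in well-lit regions, but any bracket whose darkest covered radiance falls below the SNR floor pays a growing cost.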