A collection of articles for the Journal of Perceptual Imaging on remote research in cognition and perception using the Internet is presented. Four original articles cover exact versus conceptual replication of cognitive effects (e.g., mental accounting), the effects of facial cues on the perception of avatars, cultural influences on perceptual image and video quality assessment, and how Internet habits influence social cognition and social cognitive research. The essentials of these articles are summarized here, and their contributions are placed within a wider historical perspective on Internet-based remote research in cognition and perception.
In recent years, deep learning has achieved excellent results in many applications across various fields. However, as the scale of deep learning models increases, training time also increases dramatically. Furthermore, hyperparameters have a significant influence on training results, so selecting a model's hyperparameters efficiently is essential. In this study, an orthogonal array from the Taguchi method is used to find the best experimental combination of hyperparameters. Three hyperparameters of the You Only Look Once version 3 (YOLOv3) detector and five data-augmentation hyperparameters serve as the control factors of the Taguchi method, and the traditional signal-to-noise ratio (S/N ratio) analysis with the larger-the-better (LB) characteristic is used to evaluate each combination.
Experimental results show that the mean average precision (mAP) on the blood cell count and detection dataset is 84.67%, which is better than results previously reported in the literature. The proposed method provides a fast and effective search strategy for optimizing hyperparameters in deep learning.
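For readers unfamiliar with the Taguchi analysis step, the sketch below illustrates the larger-the-better S/N ratio on a toy L4(2^3) orthogonal array; the factors, levels, and mAP values are invented for illustration and are not the combinations studied in the article.

```python
import numpy as np

# L4(2^3) orthogonal array: three 2-level factors in four runs (illustrative).
# Columns stand for hypothetical factors: learning rate, batch size, mosaic prob.
L4 = np.array([
    [0, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
])
factors = ["learning rate", "batch size", "mosaic prob."]

# Invented mAP results (%) from two repeated runs per experiment row.
map_results = np.array([
    [78.1, 77.6],
    [79.3, 80.0],
    [82.0, 81.5],
    [84.5, 84.9],
])

def sn_larger_the_better(y):
    """Larger-the-better S/N ratio: -10 * log10(mean(1 / y^2))."""
    return -10.0 * np.log10(np.mean(1.0 / np.square(y), axis=-1))

sn = sn_larger_the_better(map_results)  # one S/N value (dB) per experiment row

# Average S/N per factor level; the level with the larger mean S/N is preferred.
for f, name in enumerate(factors):
    for level in (0, 1):
        mean_sn = sn[L4[:, f] == level].mean()
        print(f"{name}, level {level}: mean S/N = {mean_sn:.3f} dB")
```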
This article uses the concept of feature fusion to establish a deep learning model that can quickly recognize objects and complete an anti-counterfeit label recognition system. The system is combined with the technology acceptance model (TAM) to evaluate users' satisfaction after completing anti-counterfeit label classification training. In this study, a fusion-based recognition program extracted feature sets for different categories of anti-counterfeit labels using multilayer convolutional neural networks (CNNs) of different depths. Using neighborhood components analysis, ten important sets of features from the different CNN models were selected and recombined in parallel into a new, small-scale fused feature dataset. Naive Bayes and support vector machine classifiers then efficiently classified the fused wine label image features. The proposed feature fusion anti-counterfeit label recognition system reached a maximum recognition accuracy of 99.29% with a data compression ratio of about 1/50, reducing training time while maintaining high accuracy. This study also established a TAM for the feature fusion anti-counterfeit label recognition system; the model was tested on 100 consumers, and a satisfaction evaluation and validation analysis with partial least squares structural equation modeling were completed. The efficiency of the fusion-based deep learning model met consumers' level of satisfaction, which should help educate consumers in its use and enhance their willingness to promote and repurchase wine products in the future.
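The fusion pipeline described above can be sketched as follows. This is an illustrative outline only: the deep features are replaced by random placeholder arrays, and a mutual-information filter stands in for the neighborhood components analysis ranking used in the article (scikit-learn's NCA learns a transformation rather than per-feature importances).

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder feature blocks standing in for deep features pulled from two
# CNN backbones of different depths (sizes are arbitrary for this sketch).
n_samples, n_classes = 300, 5
labels = rng.integers(0, n_classes, size=n_samples)
feats_model_a = rng.normal(size=(n_samples, 512)) + labels[:, None] * 0.05
feats_model_b = rng.normal(size=(n_samples, 2048)) + labels[:, None] * 0.05

def select_top_features(features, y, k=10):
    """Keep the k most informative columns of one CNN's feature block.

    The article ranks features with neighborhood components analysis (NCA);
    a mutual-information filter is used here as a simple stand-in.
    """
    return SelectKBest(mutual_info_classif, k=k).fit_transform(features, y)

# Fuse the reduced blocks side by side into one small feature set.
fused = np.hstack([
    select_top_features(feats_model_a, labels),
    select_top_features(feats_model_b, labels),
])

X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, test_size=0.3, random_state=0)
for clf in (GaussianNB(), SVC(kernel="rbf")):
    acc = clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(type(clf).__name__, f"accuracy on placeholder data: {acc:.3f}")
```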
Physicians attempt to detect different types of colonic polyps simultaneously during endoscopic inspection. A deep-learning-based object detection method is proposed to address this problem of simultaneous detection. This study used a single-shot detector (SSD) with a ResNet-50 backbone, called the SSD-Resnet50 model, to detect two types of colonic polyps, adenomas and hyperplastic polyps, in endoscopic images. The Taguchi method was used to optimize the algorithm hyperparameter combination for the SSD-Resnet50 model and thereby improve detection accuracy. The SSD-Resnet50 model with its optimized hyperparameters was then employed for simultaneous detection of the two polyp types. Experimental findings revealed that the SSD-Resnet50 model achieved a mean average precision (mAP) of 0.8933 on a test set of 300 × 300 × 3 colonic polyp images. Notably, the detection accuracy attained with the Taguchi-optimized hyperparameters surpassed that obtained with the hyperparameter combination from the MATLAB example. Additionally, the SSD-Resnet50 model achieved higher detection accuracy than the SSD-MobileNetV2, SSD-InceptionV3, SSD-Shufflenet, SSD-Squeezenet, and SSD-VGG16 models. The proposed SSD-Resnet50 model with its optimized hyperparameters thus detects adenomas and hyperplastic polyps in endoscopic images simultaneously with higher accuracy.
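As a rough structural sketch (not the authors' implementation), the snippet below attaches SSD-style class and box prediction heads to a truncated ResNet-50 trunk for 300 × 300 × 3 inputs; default-box generation, anchor matching, the multibox loss, and the Taguchi-tuned hyperparameters are all omitted, and the class and anchor counts are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class SSDResNet50Sketch(nn.Module):
    """Skeleton of an SSD-style detector on a ResNet-50 trunk.

    Only the backbone and one prediction head are shown; default-box
    generation, matching, and the multibox loss are omitted.
    """
    def __init__(self, num_classes=3, anchors_per_cell=4):
        super().__init__()
        trunk = resnet50(weights=None)
        # Keep everything up to the last residual stage as the feature extractor.
        self.backbone = nn.Sequential(*list(trunk.children())[:-2])
        out_ch = 2048  # channel width of ResNet-50's final stage
        self.cls_head = nn.Conv2d(out_ch, anchors_per_cell * num_classes, 3, padding=1)
        self.box_head = nn.Conv2d(out_ch, anchors_per_cell * 4, 3, padding=1)

    def forward(self, x):
        feats = self.backbone(x)
        return self.cls_head(feats), self.box_head(feats)

# 300x300x3 input, matching the image size reported in the abstract.
model = SSDResNet50Sketch(num_classes=3)  # e.g., adenoma, hyperplastic, background
cls_logits, box_deltas = model(torch.randn(1, 3, 300, 300))
print(cls_logits.shape, box_deltas.shape)
```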
Detecting urban appearance violations in unmanned aerial vehicle imagery faces several challenges. To address this problem, an optimized urban appearance violation detection model based on YOLOv8n is proposed. Because no sufficient public dataset exists, a custom dataset covering four classes is created. The Convolutional Block Attention Module (CBAM) attention mechanism is applied to improve the model's feature extraction ability. A small-target detection head is added to capture the characteristics and context information of small targets more effectively. The Wise Intersection over Union (WIoU) loss function is applied to improve bounding-box regression performance and detection robustness. Experimental results show that, compared with the YOLOv8n baseline, the Precision, Recall, mAP@0.5, and mAP@0.5:0.95 of the optimized method increase by 3.8%, 2.1%, 3.3%, and 4.8%, respectively. In addition, an intelligent urban appearance violation detection system is developed that generates and delivers warning messages via the WeChat official account platform.
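For reference, a generic CBAM block is sketched below in PyTorch; this is the standard channel-then-spatial attention formulation, not the article's YOLOv8n integration, and the reduction ratio and kernel size are typical defaults rather than the authors' settings.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel then spatial attention."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Channel attention: shared MLP applied to avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over channel-wise average and max maps.
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        spatial_in = torch.cat([x.mean(dim=1, keepdim=True),
                                x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(spatial_in))

# Example: refine a 64-channel feature map before a detection head.
feat = torch.randn(1, 64, 80, 80)
print(CBAM(64)(feat).shape)  # torch.Size([1, 64, 80, 80])
```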
This study applies YOLOv7-tiny object detection to inspect guava coverings and count them. Real-time monitoring enhances efficiency and reduces labor costs in agriculture. A custom dataset was created by collecting and labeling guava images. The YOLOv7-tiny model trained with default parameters achieved an initial mean average precision (mAP) of 66.7%. To improve accuracy, parameter adjustments, data augmentation (mosaic, mixup), and learning rate strategies (warm-up, decay) were employed, raising the mAP to 76.7%. The optimized model was deployed to mobile devices for convenient on-site detection. This research provides an effective method for guava covering inspection and quantity counting, contributing to advancements in agricultural applications.
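A minimal sketch of a warm-up plus decay learning rate schedule of the kind mentioned above is shown below; the base rate, warm-up length, and cosine decay shape are assumptions rather than the values used in the study, and the mosaic/mixup augmentations are omitted.

```python
import math

def lr_at_step(step, total_steps, base_lr=0.01, warmup_steps=500, final_lr=0.0005):
    """Linear warm-up followed by cosine decay (one common recipe; the exact
    schedule and numbers used in the study are not specified here)."""
    if step < warmup_steps:
        # Ramp linearly from ~0 to base_lr over the warm-up period.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr down to final_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress))

total = 10_000
for s in (0, 250, 500, 5_000, total - 1):
    print(s, f"{lr_at_step(s, total):.5f}")
```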
Deep neural networks (DNNs) have heavily relied on traditional computational units such as CPUs and GPUs. However, this conventional approach imposes a significant computational burden, latency, and high power consumption, limiting effectiveness and motivating lightweight networks such as ExtremeC3Net. Meanwhile, notable advances in optical computational units, particularly metamaterials, offer the prospect of energy-efficient neural networks operating at the speed of light. Yet the digital design of metamaterial neural networks (MNNs) faces precision, noise, and bandwidth challenges, which has so far limited their application to intuitive tasks and low-resolution images. In this study, we proposed ExtremeMETA, a large-kernel lightweight segmentation model. Based on ExtremeC3Net, ExtremeMETA maximized the capacity of the first convolution layer by exploring a larger convolution kernel and multiple processing paths, and with this large-kernel convolution design we extended the application boundary of optical neural networks to the segmentation task. To further lighten the computational burden of the digital processing part, a set of model compression methods was applied to improve efficiency at the inference stage. Experimental results on three publicly available datasets demonstrated that the optimized, efficient design improved segmentation performance from 92.45 to 95.97 mIoU while reducing computational cost from 461.07 MMACs to 166.03 MMACs. The large-kernel lightweight ExtremeMETA model showcased the hybrid design's ability on complex tasks.
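The idea of strengthening the first convolution layer with a larger kernel and multiple processing paths can be illustrated as below; this is not the ExtremeMETA architecture itself, and the kernel sizes and channel counts are invented for the example.

```python
import torch
import torch.nn as nn

class LargeKernelStem(nn.Module):
    """Illustrative first layer with a large kernel plus parallel paths.

    This mirrors the idea described in the abstract (a wider receptive field
    and multiple processing paths in the first convolution), not the actual
    ExtremeMETA layer; kernel sizes and channel counts are assumptions.
    """
    def __init__(self, in_ch=3, out_ch=32):
        super().__init__()
        self.paths = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=k, stride=2, padding=k // 2)
            for k in (3, 7, 15)  # one small and two progressively larger kernels
        ])
        # 1x1 conv fuses the concatenated path outputs back to out_ch channels.
        self.fuse = nn.Conv2d(out_ch * 3, out_ch, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([p(x) for p in self.paths], dim=1))

stem = LargeKernelStem()
print(stem(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 32, 112, 112])
```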
Innovations in computer vision have steered research toward recognizing compound facial emotions, complex mixtures of basic emotions. Despite significant accuracy gains from deep convolutional neural networks, their inherent limitations, such as the vanishing/exploding gradient problem, lack of global contextual information, and overfitting, can degrade performance or cause misclassification when processing complex emotion features. This study proposes an ensemble method in which three pre-trained models, DenseNet-121, VGG-16, and ResNet-18, are concatenated instead of used individually. In this layer-sharing approach, the head of each model is removed and replaced with dropout layers, fully connected layers, activation functions, and pooling layers before the models are concatenated, allowing each branch to learn further before its features are combined. The proposed model uses early stopping to prevent overfitting and improve performance. The ensemble surpassed the state of the art (SOTA) with 74.4% and 71.8% accuracy on the RAF-DB and CFEE datasets, respectively, offering a new benchmark for real-world compound emotion recognition research.
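A rough sketch of the concatenation idea is given below: three headless torchvision backbones are pooled and their features joined before a shared classifier. The dropout/fully connected arrangement, input size, and class count are assumptions rather than the authors' configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class ConcatEnsemble(nn.Module):
    """Three headless backbones whose pooled features are concatenated.

    A rough sketch of the layer-sharing idea in the abstract; the exact
    dropout/FC arrangement, pretraining, and class count are assumptions.
    """
    def __init__(self, num_classes=11, dropout=0.5):
        super().__init__()
        self.densenet = models.densenet121(weights=None).features   # 1024 channels
        self.vgg = models.vgg16(weights=None).features               # 512 channels
        resnet = models.resnet18(weights=None)
        self.resnet = nn.Sequential(*list(resnet.children())[:-2])   # 512 channels
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(1024 + 512 + 512, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        feats = [self.pool(b(x)).flatten(1)
                 for b in (self.densenet, self.vgg, self.resnet)]
        return self.classifier(torch.cat(feats, dim=1))

model = ConcatEnsemble(num_classes=11)
print(model(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 11])
```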
Deep learning (DL) has advanced computer-aided diagnosis, yet the limited data available at local medical centers and privacy concerns associated with centralized AI approaches hinder collaboration. Federated learning (FL) offers a privacy-preserving solution by enabling distributed DL training across multiple medical centers without sharing raw data. This article reviews research conducted from 2016 to 2024 on the use of FL in cancer detection and diagnosis, aiming to provide an overview of the field’s development. Studies show that FL effectively addresses privacy concerns in DL training across centers. Future research should focus on tackling data heterogeneity and domain adaptation to enhance the robustness of FL in clinical settings. Improving the interpretability and privacy of FL is crucial for building trust. This review promotes FL adoption and continued research to advance cancer detection and diagnosis and improve patient outcomes.
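As one concrete example of the kind of training loop the review surveys, the sketch below implements FedAvg-style weighted parameter averaging across simulated clients; it is a minimal illustration, not a method proposed in the review, and local training is reduced to a placeholder perturbation.

```python
import copy
import torch
import torch.nn as nn

def federated_average(client_states, client_sizes):
    """FedAvg: weight each client's parameters by its local sample count."""
    total = sum(client_sizes)
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(state[key] * (n / total)
                       for state, n in zip(client_states, client_sizes))
    return avg

# Toy example: three "medical centers" sharing model weights, never raw data.
global_model = nn.Linear(10, 2)
clients = [copy.deepcopy(global_model) for _ in range(3)]
sizes = [120, 300, 80]  # illustrative local dataset sizes

for client in clients:
    # Placeholder for local training on each center's private data.
    with torch.no_grad():
        for p in client.parameters():
            p.add_(0.01 * torch.randn_like(p))

# The server aggregates only parameters, preserving data privacy.
global_model.load_state_dict(
    federated_average([c.state_dict() for c in clients], sizes))
```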