Blind and low vision (BLV) individuals face unique challenges in appreciating visual art due to a lack of objective explanations and shared artistic vocabulary. This study introduces Cultural ArtQA (C-ArtQA), a benchmark designed to assess whether current multimodal large language models (MLLMs; GPT-4V and Gemini) meet BLV needs by integrating structured visual art descriptions into auditory and tactile domains. The approach categorizes art perception into Visual, Multimodal Extended, and Imagery Perceptions, distributed across 19 fine-grained categories. The study employs visual question answering with 361 questions generated from a dataset of modern artworks, selected for their accessibility and cultural richness by BLV volunteers and art experts. Results indicate that GPT-4V excels in Visual and Imagery Perceptions, while both models underperform in Multimodal Extended Perceptions, highlighting areas for improvement in AI’s support for BLV individuals. This study lays the foundation for developing MLLMs to meet the visual art appreciation needs of the BLV community.
Accurately predicting the remaining shelf life can effectively reduce the risk of spoilage during the storage process of agricultural products. The quality of agricultural products can be indirectly indicated by changes in environmental parameters. To better explore the intrinsic relationship between key environmental parameters during banana storage and their remaining shelf life, this paper proposes a novel causal convolution lightweight Transformer network. This model utilizes causal convolution operations to mine the temporal features of sensor data and applies positional encoding to the input signals. It employs a Transformer encoder to extract and fuse features while also utilizing a probabilistic sparse self-attention mechanism instead of the conventional self-attention mechanism. Moreover, a distillation operation is introduced, which effectively reduces the number of trainable parameters in the Transformer-based model and shortens the training time. Compared to traditional machine learning algorithms (BP, SVM) and conventional time series data mining algorithms (LSTM, RNN), the proposed prediction method achieves a mean squared error of 0.0221, a root mean squared error of 0.1486, a mean absolute error of 0.1101, and a maximum prediction error of 0.2221 days, allowing for more accurate and efficient predictions of bananas’ remaining shelf life.
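As an illustration of the causal convolution mentioned in the abstract above, the following is a minimal sketch, not the paper's network. It shows only the defining property: each output sample depends on the current and earlier sensor readings, never on later ones. The function name `causal_conv1d`, the two-tap averaging kernel, and the sample readings are assumptions made for this example.

```python
from typing import List


def causal_conv1d(signal: List[float], kernel: List[float]) -> List[float]:
    """Causal 1-D convolution with implicit zero padding on the left.

    kernel[0] weights the current sample, kernel[1] the previous one, and so on,
    so output[t] never uses signal[t + 1] or later.
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if t - k >= 0:  # samples before the start are treated as zero
                acc += w * signal[t - k]
        out.append(acc)
    return out


# Hypothetical sensor readings and a hypothetical averaging kernel.
readings = [1.0, 2.0, 3.0, 4.0]
smoothed = causal_conv1d(readings, [0.5, 0.5])
print(smoothed)
```

Because the padding is on the left only, the output has the same length as the input, and changing a later reading leaves all earlier outputs unchanged.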
In recent years, abnormal operation behaviors in logistics have caused significant losses and poor experiences for both enterprises and customers. Identifying diverse abnormal behaviors remains a significant challenge in this field, so an objective and quantitative monitoring and evaluation method is needed. This paper uses a high-precision, compact, and low-power barometric pressure sensor to detect the internal pressure of small packages for rapid identification of abnormal logistics operation behaviors. The authors introduce a recognition fusion algorithm based on variance analysis and support vector machines (SVMs). This algorithm can identify various abnormal logistics operation behaviors, including unilateral extrusion, bilateral extrusion, treading, dropping, and stepping. The SVM model is trained to recognize these abnormal behaviors, achieving an average recognition accuracy of 98%. The proposed method outperforms five other methods, including Naive Bayes, by 4.9%, 2.12%, 2.76%, 4.46%, and 3.22% in detection accuracy. The shortest training time in the experiment is 2.6862 s, and the classification rate reaches up to 3700 classifications per second. The barometric pressure sensor emerges as a promising tool for identifying abnormal logistics operation behaviors, contributing significantly to improving the current logistics security environment.
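The abstract above does not say how the variance analysis is applied, so the following is only one plausible sketch: a windowed variance check that flags segments of a pressure trace for a later classifier such as an SVM. The function name `flag_high_variance_windows`, the window length, the threshold, and the sample trace are all assumptions introduced for illustration.

```python
from statistics import pvariance
from typing import List


def flag_high_variance_windows(
    pressure: List[float], window: int, threshold: float
) -> List[int]:
    """Return start indices of non-overlapping windows whose variance exceeds threshold."""
    flagged = []
    for start in range(0, len(pressure) - window + 1, window):
        segment = pressure[start:start + window]
        if pvariance(segment) > threshold:
            flagged.append(start)
    return flagged


# Hypothetical internal-pressure trace: a steady segment followed by a disturbance.
trace = [101.3, 101.3, 101.3, 101.3, 101.3, 104.0, 108.5, 102.0]
candidates = flag_high_variance_windows(trace, window=4, threshold=0.5)
print(candidates)
```

In a pipeline of this shape, only the flagged windows would be passed on for classification, which keeps the classifier from running on quiet segments.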
In the smart manufacturing process, it is important to closely monitor manufactured parts. To solve the problem of part anomaly detection, this paper proposes a GAM–Boost anomaly detection model using a large-scale dataset (14.3 GB) from the Kaggle competition “Bosch Production Line Performance.” The model first selects the important features using the XGBoost algorithm and then captures the nonlinear relationships between the features using the generalized additive model. To improve the model’s ability to understand the data relationships, feature engineering techniques are applied to transform the nonlinear relationships while retaining the features with linear relationships. Finally, the XGBoost model is optimized for anomaly detection using the Bayesian algorithm. The experimental results show that the model achieves lower errors on both the training and test sets. Its generalization performance is significantly improved, it adapts better to various data situations, and it achieves better results in terms of flexibility and prediction accuracy.
In response to the challenge of monitoring the quality of ink droplet injection in digital inkjet printing, this study designs and implements a visual measurement system for ink droplets based on high-definition video image processing technology. The aim is to provide a convenient and accurate method that alerts users in a timely manner to the quality of ink droplet injection. The system uses high-definition camera equipment to capture real-time images of ink droplets sprayed by an inkjet head and analyzes them to monitor and evaluate injection quality. By using image processing algorithms, the system can accurately extract key parameters such as the number, position, volume, and flight speed of ink droplets. Through detailed experimental verification, the algorithm and system developed by our research institute have demonstrated excellent performance in detecting ink droplet spray anomalies, achieving precise detection and evaluation of ink droplets. Experimental results demonstrate that the proposed droplet visual inspection system significantly outperforms other systems, validating its effectiveness in droplet detection applications. The results of this study provide strong technical support for quality control in inkjet printing and, through real-time monitoring and automated processing, significantly improve on traditional ink droplet detection methods. This improves the efficiency and accuracy of inkjet printing and promotes the application of inkjet printing technology in various fields, especially high-precision printing, which in turn can significantly improve product quality and production efficiency.
With the rapid development of artificial intelligence and image processing technologies, images are susceptible to accidental or deliberate tampering attacks. This paper presents a watermark tampering detection method to address the challenges of marking and identifying image sources in specialized fields such as medicine and justice. A quaternary encoding matrix is proposed to map the geometric information of an arbitrary watermark onto the carrier image, bridging the watermark and the magic matrix that performs steganographic modifications. The specific magic matrix serves as the key to retrieving and visualizing the steganographically embedded watermark. In comparative experiments, our method shows a significant improvement over the existing method, with the peak signal-to-noise ratio of stego images being 24.58 dB higher and the Intersection over Union increasing by 20.30%. This demonstrates that our method effectively maintains the authenticity and integrity of carrier images while producing high-quality images after steganography.
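For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of their standard definitions, not the paper's evaluation code. PSNR is computed as 10·log10(MAX² / MSE), and Intersection over Union as the ratio of overlapping to combined foreground pixels of two binary masks. The function names, the flat pixel lists, and the 8-bit maximum of 255 are assumptions made for this example.

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], modified: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(original) != len(modified) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, modified)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


def iou(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Intersection over Union of two binary masks given as flat 0/1 sequences."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return intersection / union if union else 1.0


# Hypothetical pixel values: every pixel differs by one gray level.
print(psnr([100, 100], [101, 101]))
# Hypothetical masks: one shared foreground pixel out of three marked in total.
print(iou([1, 1, 0, 0], [1, 0, 1, 0]))
```

A higher PSNR means the stego image is closer to the carrier image, and an IoU closer to 1 means the recovered watermark region overlaps the reference region more fully.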
Embedding conspicuous digital visible watermarks on images inevitably compromises their original integrity. To address this challenge, we propose a reversible adaptive visible watermarking method that allows the watermark to be color adaptive, highly salient, and capable of traceless removal with authorization. This method relies on statistical analysis of the color information within the original image to determine the optimal stamp code for encoding the watermark pixels. Additionally, the use of the stamp code facilitates the reverse decoding of the watermark pixels, enabling traceless removal of the visible watermark. Experimental results show that the peak signal-to-noise ratio values of the carrier images reach approximately 50 dB after watermark removal. The normalized energy of watermark visibility is significantly improved compared to other reversible visible watermarking methods. These experiments highlight the method’s ability to realize adaptive addition and traceless removal of visible watermarks, resolving the difficulty of balancing original image integrity against watermark clarity.
A manufacturing service composition optimization method based on non-cooperative game theory is proposed to address the gap in specialized service composition optimization methods tailored for guide rollers in cloud manufacturing. This research encompasses analysis of the structure and production intricacies of guide rollers as well as consideration of the interests of both the manufacturing service demand side and the cloud manufacturing platform operator. The research commences by designing an attribute index system that serves as a foundation for service composition optimization. The key players in this scenario are the manufacturing service demand side and the cloud manufacturing service platform operator, with the objective of enhancing quality of service and flexibility. Based on the attribute index system, a non-cooperative game decision model is constructed to effectively optimize the manufacturing service composition for guide rollers within the cloud manufacturing environment. Furthermore, an algorithmic enhancement strategy rooted in NSGA-II is proposed to tackle this model. This optimization strategy aims to refine the NSGA-II algorithm by optimizing the initial population and augmenting the jump operation. Experimental results show the effectiveness and superiority of this approach and the enhancement strategy in optimizing the service composition specific to guide roller manufacturing.
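NSGA-II, on which the enhancement strategy above builds, ranks candidate solutions by Pareto dominance. The following is a minimal sketch of that dominance test and the resulting non-dominated front, not the paper's enhanced algorithm. It assumes two objectives that are both maximized (for instance, quality of service and flexibility scores); the function names and the candidate scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the points that no other point dominates, in their original order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (quality of service, flexibility) scores for four service compositions.
candidates = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9)]
front = pareto_front(candidates)
print(front)
```

Here the composition scored (0.5, 0.5) drops out because (0.6, 0.6) is better on both objectives, while the remaining three represent different trade-offs between the two players' interests.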
With the wide application of information and communication technology, the smart library, which is mainly characterized by the provision of digital content and smart services, has entered a new stage. The digital literacy of users has become a key factor influencing how effectively digital content is read and smart services are delivered in the smart library. This study takes college students who use smart libraries as its research subjects. It uses the literature research method, the Delphi method, and the analytic hierarchy process; draws on existing authoritative digital literacy frameworks and evaluation methods; and combines two rounds of expert interviews to establish a digital literacy evaluation system for college students using the smart library. It also carries out an empirical analysis through a questionnaire survey. The results show that the proposed digital literacy evaluation system can accurately evaluate the digital literacy of college students in smart libraries. Furthermore, it can provide a useful reference for improving the digital literacy of these students and for the construction of smart libraries.
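As a rough illustration of how the analytic hierarchy process turns expert pairwise judgments into indicator weights, the following sketch uses the common geometric-mean approximation. It is not the study's evaluation system: the function name `ahp_weights`, the three unnamed indicators, and the comparison values are assumptions made for this example.

```python
import math
from typing import List


def ahp_weights(pairwise: List[List[float]]) -> List[float]:
    """Approximate AHP priority weights with the row geometric-mean method.

    pairwise[i][j] states how much more important indicator i is than indicator j,
    so pairwise[j][i] should be its reciprocal and the diagonal should be 1.
    """
    n = len(pairwise)
    geometric_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


# Hypothetical judgments for three indicators: the first is twice as important
# as the second and four times as important as the third.
matrix = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(matrix)
print(weights)
```

For this perfectly consistent matrix the weights come out in the ratio 4 : 2 : 1. Real expert judgments are rarely this consistent, so a full AHP application would also check a consistency ratio before accepting the weights.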