This study presents a novel character-level writer verification framework for ancient manuscripts, employing a building-block approach that integrates decision strategies across multiple token levels, including characters, words, and sentences. The proposed system utilized edge-directional and hinge features along with machine learning techniques to verify the hands that wrote the Great Isaiah Scroll. A custom dataset containing over 12,000 samples of handwritten characters from the associated scribes was used for training and testing. The framework incorporated character-specific parameter tuning, resulting in 22 separate models, and demonstrated that each character has distinct features that enhance system performance. Evaluation was conducted through soft voting, comparing probability scores across the different token levels, and contrasting the results with majority voting. This approach provides a detailed method for multi-scribe verification, bridging computational and paleographic methods for historical manuscript studies.
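The contrast between soft voting and majority voting at a higher token level can be made concrete with a minimal sketch. The per-character probabilities, threshold, and scribe labels below are illustrative assumptions, not the paper's actual models or features.

    # Minimal sketch (assumptions): each character image is scored by a
    # character-specific model returning P(scribe A); thresholds and labels
    # are placeholders, not the paper's configuration.
    from collections import Counter
    from statistics import mean

    def soft_vote(char_probs, threshold=0.5):
        """Word/sentence-level decision by averaging character-level probabilities."""
        return "scribe_A" if mean(char_probs) >= threshold else "scribe_B"

    def majority_vote(char_probs, threshold=0.5):
        """Baseline: each character casts a hard vote; the majority label wins."""
        votes = ["scribe_A" if p >= threshold else "scribe_B" for p in char_probs]
        return Counter(votes).most_common(1)[0][0]

    # Example: probabilities emitted by hypothetical per-character models for one word
    word_probs = [0.62, 0.41, 0.58, 0.71]
    print(soft_vote(word_probs), majority_vote(word_probs))

Soft voting keeps the confidence of each character-level decision, so a few highly confident characters can outweigh several borderline ones, whereas majority voting discards that information.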
An ideal archival storage system combines longevity, accessibility, low cost, high capacity, and human readability to ensure the persistence and future readability of stored data. At Archiving 2024 [B. M. Lunt, D. Kemp, M. R. Linford, and W. Chiang, “How long is long-term? An update,” Archiving (2024)], the authors’ research group presented a paper summarizing several efforts in this area, including magnetic tapes, optical disks, hard disk drives, solid-state drives, Project Silica (a Microsoft project), DNA, and projects C-PROM, Nano Libris, and Mil Chispa (the last three being the authors’ own research). Each storage option offers distinct advantages with respect to these desirable characteristics. This paper provides information on other efforts in this area, including the work by Cerabyte, Norsam Technologies, and Group 47 DOTS, and an update on the authors’ projects C-PROM, Nano Libris, and Mil Chispa.
Predicting the perceived brightness and lightness of image elements using color appearance models is important for the design and evaluation of HDR displays. This paper presents a series of experiments examining perceived brightness/lightness for displayed stimuli of differing sizes. Seven observers participated in the first pilot experiment, six in the second and third pilot experiments, and fourteen in the main experiment. The target and test stimuli in the main experiment subtended 10° and 1° fields of view, respectively. The results indicate a small but consistent effect whereby brightness increases with stimulus size. The effect depends on the lightness level of the stimulus but not on its hue or saturation. A preliminary model is also introduced to extend models such as CIECAM16 with the capability of predicting brightness and lightness as a function of stimulus size. The proposed model yields good performance in predicting perceived brightness/lightness.
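The abstract does not give the form of the preliminary model, so the sketch below is purely illustrative: it assumes a lightness-weighted, logarithmic size correction applied on top of a CIECAM16 brightness value Q, with hypothetical constants k and a 2° reference field.

    # Illustrative sketch only: the functional form, reference size, and constant k
    # are assumptions used to show how a size-dependent correction could extend a
    # CIECAM16 brightness prediction; they are not the paper's model.
    import math

    def size_corrected_brightness(Q_ciecam16, stimulus_deg, lightness_J,
                                  ref_deg=2.0, k=0.05):
        """Scale CIECAM16 brightness Q by a factor that grows with stimulus size.

        The correction is weighted by lightness J (0-100) to reflect the finding
        that the size effect depends on lightness but not on hue or saturation.
        """
        size_factor = 1.0 + k * (lightness_J / 100.0) * math.log(stimulus_deg / ref_deg)
        return Q_ciecam16 * size_factor

    print(size_corrected_brightness(Q_ciecam16=120.0, stimulus_deg=10.0, lightness_J=60.0))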
With escalating demands on rendering technology, exclusive reliance on rendering programs is no longer sufficient, and the collection of surface information from real-world materials has become a crucial aspect of computer graphics. The acquisition of material surface information is pivotal in digital reconstruction and virtual reality. In this article, we introduce a material acquisition device that employs multiple cameras and lights to capture objects from various angles and under various lighting conditions, yielding more comprehensive and realistic material surface information. Four cameras are positioned at 10°, 35°, 60°, and 85° to capture different aspects of the object surface, while 24 point lights are placed in three layers of a hemisphere at 10°, 35°, and 60°, with eight lights per layer spaced 45° apart. Integrating these multiple perspectives and lighting conditions facilitates the acquisition of rich and diverse material surface information.
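The capture geometry described above can be enumerated directly. The sketch below assumes the quoted angles are elevations above the horizontal plane and that all cameras share a single azimuth; the actual mounting of the device may differ.

    # Sketch of the capture geometry: 4 camera elevations and 3 light layers of
    # 8 point lights each, spaced 45 degrees in azimuth (assumed conventions).
    import math

    def direction(elevation_deg, azimuth_deg):
        """Unit vector on the hemisphere for a given elevation/azimuth (degrees)."""
        el, az = math.radians(elevation_deg), math.radians(azimuth_deg)
        return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

    cameras = [direction(el, 0.0) for el in (10, 35, 60, 85)]       # 4 viewpoints
    lights = [direction(el, az) for el in (10, 35, 60)              # 3 layers
              for az in range(0, 360, 45)]                          # 8 lights per layer
    print(len(cameras), len(lights))  # 4 cameras, 24 point lights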
Low-light images often fail to accurately capture color and texture, limiting their practical applications in imaging technology. Low-light image enhancement technology can effectively restore the color and texture information contained in such images. However, most current low-light enhancement methods map directly from low-light to normal-light images, ignoring the basic principles of imaging, which limits the enhancement effect. The Retinex model decomposes an image into illumination and reflection components and uses these components to achieve end-to-end enhancement of low-light images. Inspired by Retinex theory, this study proposes a low-light image enhancement method based on multispectral reconstruction. The method first uses a multispectral reconstruction algorithm to reconstruct a metameric multispectral image of a normal-light RGB image. It then uses a deep learning network to learn the end-to-end mapping from a low-light RGB image to a normal-light multispectral image, so that any low-light image can be reconstructed into a normal-light multispectral image. Finally, the corresponding normal-light RGB image is calculated according to colorimetry theory. To test the proposed method, the popular LOw-Light (LOL) dataset for low-light image enhancement is adopted to compare the proposed method with existing methods. During the test, a multispectral reconstruction method based on reversing the image signal processing of RGB imaging is used to reconstruct the corresponding metameric multispectral image of each normal-light RGB image in LOL. The deep learning architecture proposed by Zhang et al., with a convolutional block attention module added, is used to establish the mapping between the low-light RGB images and the corresponding reconstructed multispectral images. The proposed method is compared with existing methods including self-supervised learning, RetinexNet, RRM, KinD, RUAS, and URetinex-Net. On the LOL dataset, with an illuminant chosen for rendering, the results show that the proposed low-light image enhancement method outperforms the existing methods.
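The final colorimetric rendering step, converting a reconstructed multispectral image to RGB under a chosen illuminant, follows standard colorimetry. The sketch below assumes the color matching functions, illuminant spectral power distribution, and spectral image are supplied by the caller; the enhancement network itself is not shown.

    # Minimal sketch of colorimetric rendering (assumed array shapes):
    #   spectral_img: H x W x N reflectance, illuminant: N, cmf: N x 3 (CIE CMFs).
    import numpy as np

    M_XYZ_TO_SRGB = np.array([[ 3.2406, -1.5372, -0.4986],
                              [-0.9689,  1.8758,  0.0415],
                              [ 0.0557, -0.2040,  1.0570]])

    def render_rgb(spectral_img, illuminant, cmf):
        """Render an H x W x 3 sRGB image from a multispectral image and an illuminant."""
        radiance = spectral_img * illuminant            # reflectance under the illuminant
        xyz = radiance @ cmf                            # integrate against the CMFs
        xyz /= (illuminant @ cmf[:, 1])                 # normalise so the white point has Y = 1
        rgb_lin = np.clip(xyz @ M_XYZ_TO_SRGB.T, 0.0, 1.0)
        return np.where(rgb_lin <= 0.0031308,           # sRGB gamma encoding
                        12.92 * rgb_lin,
                        1.055 * rgb_lin ** (1 / 2.4) - 0.055)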
In eye-tracking-based 3D displays, system latency due to eye tracking and 3D rendering causes an error between the actual eye position and the tracked position, which is proportional to the viewer’s movement. This discrepancy causes viewers to see 3D content from a non-optimal position, thereby increasing 3D crosstalk and degrading the quality of 3D images under dynamic viewing conditions. In this paper, we investigate the latency issue, distinguish each source of system latency, and study the display margin of an eye-tracking-based 3D display. To reduce 3D crosstalk during viewer motion, we propose a motion compensation method that predicts the viewer’s eye position. The effectiveness of the method is validated by experiments using a previously implemented 3D display prototype; the results show that the prediction error decreased to 24.6%, meaning that the accuracy of the eye-pupil position estimate improved by roughly a factor of four, and that crosstalk was reduced to a level similar to that of a system with one-quarter of the latency.
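The core idea of compensating latency by predicting the eye position can be illustrated with a simple extrapolation. A constant-velocity predictor and the numeric values below are assumptions for illustration only; the paper's actual predictor and latency figures are not reproduced.

    # Sketch of latency compensation: extrapolate the tracked eye position forward
    # by the total system latency (constant-velocity assumption).
    def predict_eye_position(prev_pos, curr_pos, dt, latency):
        """prev_pos/curr_pos: (x, y, z) from the last two tracker frames,
        dt: time between those frames (s), latency: tracking + rendering delay (s)."""
        velocity = tuple((c - p) / dt for p, c in zip(prev_pos, curr_pos))
        return tuple(c + v * latency for c, v in zip(curr_pos, velocity))

    # Example: tracker at 60 Hz (dt ~ 16.7 ms), assumed total latency of 50 ms
    print(predict_eye_position((0.0, 0.0, 600.0), (2.0, 0.0, 600.0), 1 / 60, 0.05))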
Object detection in varying traffic scenes presents significant challenges in real-world applications. Thermal imagery is acknowledged as a beneficial complement to RGB-based detection, especially in suboptimal lighting conditions. However, harnessing the combined potential of RGB and thermal images remains a formidable task. We tackle this by applying illumination-guided adaptive information fusion across both modalities and propose illumination-guided crossmodal attention transformer fusion (ICATF), a novel object detection framework that integrates features from RGB and thermal data. An illumination-guided module adapts features to the current lighting conditions, steering the learning process toward the most informative data fusion. We further incorporate frequency-domain convolutions within the network’s backbone to assimilate spectral context and derive more nuanced features. In addition, we fuse the differential modality features for multispectral pedestrian detection using illumination-guided feature weights and the transformer fusion architecture. Experimental results show that our method achieves state-of-the-art performance on multispectral detection datasets, including FLIR-aligned, LLVIP, and KAIST.
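The illumination-guided weighting idea can be sketched as a small module that predicts an illumination score from the RGB branch and uses it to balance the two modalities before fusion. Module names, sizes, and the fusion operator below are assumptions; the full ICATF transformer fusion and frequency-domain backbone are not reproduced.

    # Conceptual sketch of illumination-guided feature fusion (PyTorch, assumed sizes).
    import torch
    import torch.nn as nn

    class IlluminationGuidedFusion(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.illum_head = nn.Sequential(           # predicts a weight w in (0, 1) from RGB features
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(channels, 1), nn.Sigmoid())
            self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

        def forward(self, feat_rgb, feat_thermal):
            w = self.illum_head(feat_rgb).view(-1, 1, 1, 1)   # brighter scene -> larger RGB weight
            weighted = torch.cat([w * feat_rgb, (1 - w) * feat_thermal], dim=1)
            return self.fuse(weighted)

    fusion = IlluminationGuidedFusion(channels=64)
    out = fusion(torch.randn(2, 64, 40, 40), torch.randn(2, 64, 40, 40))
    print(out.shape)  # torch.Size([2, 64, 40, 40])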
Crop diseases have always been a major threat to agricultural production, significantly reducing both the yield and quality of agricultural products. Traditional methods for disease recognition suffer from high costs and low efficiency, making them inadequate for modern agricultural requirements. With the continuous development of artificial intelligence technology, utilizing deep learning for crop disease image recognition has become a research hotspot. Convolutional neural networks can automatically extract features for end-to-end learning, resulting in better recognition performance, but they also face challenges such as high computational costs and difficulties in deployment on mobile devices. In this study, we aim to improve recognition accuracy, reduce computational costs, and produce a model compact enough for deployment on mobile platforms. Targeting the recognition of tomato leaf diseases, we propose an image recognition method based on a lightweight MCA-MobileNet and a WGAN. By incorporating an improved multiscale feature fusion module and a coordinate attention mechanism into MobileNetV2, we developed the lightweight MCA-MobileNet model, which focuses more on disease-spot information in tomato leaves while significantly reducing the parameter count. We employ a WGAN for data augmentation to address insufficient and imbalanced original sample data. Experimental results demonstrate that the augmented dataset effectively improves the model’s recognition accuracy and enhances its robustness. Compared to traditional networks, MCA-MobileNet shows significant improvements in metrics such as accuracy, precision, recall, and F1-score. With a training parameter count of only 2.75M, it exhibits outstanding performance in recognizing tomato leaf diseases and can be widely applied in mobile or embedded devices.
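Coordinate attention, one of the two additions to MobileNetV2 mentioned above, pools features separately along height and width so the attention weights stay direction-aware. The simplified sketch below uses an assumed reduction ratio and layer layout for illustration, not the paper's exact MCA-MobileNet configuration.

    # Simplified sketch of a coordinate attention block (PyTorch, assumed hyperparameters).
    import torch
    import torch.nn as nn

    class CoordinateAttention(nn.Module):
        def __init__(self, channels, reduction=8):
            super().__init__()
            mid = max(channels // reduction, 8)
            self.encode = nn.Sequential(nn.Conv2d(channels, mid, 1),
                                        nn.BatchNorm2d(mid), nn.ReLU())
            self.attn_h = nn.Conv2d(mid, channels, 1)
            self.attn_w = nn.Conv2d(mid, channels, 1)

        def forward(self, x):
            b, c, h, w = x.shape
            pool_h = x.mean(dim=3, keepdim=True)                       # B x C x H x 1
            pool_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # B x C x W x 1
            y = self.encode(torch.cat([pool_h, pool_w], dim=2))        # joint encoding
            y_h, y_w = torch.split(y, [h, w], dim=2)
            a_h = torch.sigmoid(self.attn_h(y_h))                          # height attention
            a_w = torch.sigmoid(self.attn_w(y_w.permute(0, 1, 3, 2)))      # width attention
            return x * a_h * a_w

    ca = CoordinateAttention(channels=32)
    print(ca(torch.randn(1, 32, 56, 56)).shape)  # torch.Size([1, 32, 56, 56])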
To address the challenges in chip logo detection, such as the small size of the logos, which makes them difficult to detect accurately, and the slow convergence of traditional models, we propose a real-time detection algorithm for small objects called small-DETR. First, to reduce production costs and enhance efficiency, we employ a semi-automated data annotation method based on template matching instead of traditional manual annotation, generating label files for model training and testing. Subsequently, building upon the RT-DETR algorithm, we enhance the cross-scale feature fusion module (CCFM) using the semantics and details injection (SDI) module from U-Net v2; this improvement retains detailed image information and accurately captures edges, textures, and subtle variations within the marks. Lastly, using FasterNet as the backbone of the detection model, we optimize the network structure with partial convolution (PConv) to reduce redundant computation and improve convergence speed. Experimental results demonstrate that the small-DETR model achieves satisfactory convergence in just 200 training cycles, with a detection precision of 91.8% and a loss value of 6.1%. Compared to other models, small-DETR exhibits outstanding performance within shorter training periods, providing robust support for real-time chip pin mark detection in industrial contexts.
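The semi-automated annotation step can be sketched with standard template matching: locate the logo in each image and emit a label line for training. The score threshold, class index, and YOLO-style label format below are assumptions; the paper's actual annotation pipeline and format are not specified in the abstract.

    # Sketch of template-matching-based semi-automated annotation (OpenCV; assumed
    # threshold and label format).
    import cv2

    def annotate(image_path, template_path, class_id=0, score_thresh=0.8):
        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        tpl = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
        result = cv2.matchTemplate(img, tpl, cv2.TM_CCOEFF_NORMED)
        _, score, _, (x, y) = cv2.minMaxLoc(result)     # best-match top-left corner
        if score < score_thresh:
            return None                                  # low confidence: flag for manual review
        h_img, w_img = img.shape
        h_tpl, w_tpl = tpl.shape
        cx, cy = (x + w_tpl / 2) / w_img, (y + h_tpl / 2) / h_img
        return f"{class_id} {cx:.6f} {cy:.6f} {w_tpl / w_img:.6f} {h_tpl / h_img:.6f}"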