1.
Introduction
Humans evaluate the quality and appearance of objects based on visual information [1]. “Shitsukan” refers to the surface condition of an object and plays a crucial role in this evaluation. The term shitsukan is a Japanese word that encompasses the overall sensation derived from physical stimuli, including higher-order psychophysical elements related to an object’s appearance, such as glossiness, transparency, and roughness perception. In recent years, owing to growing recognition of the importance of accurately reproducing and conveying the texture of objects, research on shitsukan has been actively pursued in various fields, including information engineering, psychophysics, and neuroscience [2]. Researchers have used evaluation experiments and tactile feedback to assess not only the visual appearance of objects but also higher-order psychophysical elements such as glossiness and roughness.
Roughness perception, a key component of shitsukan, is directly linked to an object’s surface condition and is influenced by both visual and tactile sensations. Bergmann Tiest et al. [3] demonstrated an interaction between tactile and visual sensations in roughness perception and showed that physical roughness does not always correlate with perceived roughness, revealing a clear discrepancy between the two. In manufacturing, surface roughness is strictly defined by International Organization for Standardization (ISO) standards [4], and various methods have been proposed for measuring surface roughness from images. For example, Dhanasekar et al. [5] proposed a non-contact surface roughness evaluation method that combines image processing with Bayesian estimation. Thus, surface roughness perception and measurement have become significant issues in manufacturing.
Roughness perception is a process in which the visual system processes information about fine surface irregularities and texture and integrates it into a perceptual impression. Fleming [6] argued that roughness perception is not a direct evaluation of the physical properties of a surface but is formed by a statistical generative model. The visual system integrates multiple visual cues, such as the spatial frequency characteristics of the surface and light reflectance patterns, to construct perceptual properties from physical stimuli. In addition, viewing conditions and context significantly influence roughness perception; for instance, lighting conditions and changes in viewpoint can strongly affect subjective evaluations of roughness. Furthermore, the visual system incorporates higher-order cognitive processes, such as prior experience and expectations, when integrating these cues. Consequently, identical physical stimuli can elicit different roughness impressions among observers.
Recent developments in display technology have also increased the demand for reproducing shitsukan in displayed images. Although red, green, and blue (RGB) sub-pixel arrays have traditionally been the standard, new arrays such as red, green, blue, and white (RGBW) and PenTile have been introduced to improve luminous efficiency and display quality. A sub-pixel array is the arrangement pattern of the sub-pixels that compose each pixel, and it plays a critical role in determining the resolution and display quality of a screen. Each pixel typically consists of three sub-pixels: red (R), green (G), and blue (B). However, differences in the arrangement pattern can significantly affect display performance and power efficiency. The RGBW array adds a white (W) sub-pixel to the conventional RGB array to increase luminance; a disadvantage of this design is the potential degradation in resolution and color accuracy caused by the added white sub-pixel [7]. By contrast, the PenTile array features an asymmetrical sub-pixel layout optimized for human visual characteristics. This design increases the proportion of green sub-pixels, allowing a reduced sub-pixel count while maintaining visual resolution and reducing manufacturing costs [8]. These sub-pixel arrays offer diverse options for balancing energy efficiency, manufacturing cost, and visual quality. Although their exact influence remains unclear, these varying pixel structures are hypothesized to affect not only perceptual resolution but also shitsukan perception. In a previous study, we compared the effects of RGB, RGBW, and PenTile RGBG sub-pixel arrays on perceptual resolution and found that perceptual resolution varied with the sub-pixel array [9]. Additionally, our experiments on the pixel aperture ratio confirmed that perceptual resolution also depends on this ratio [10]. Differences in pixel structure, such as sub-pixel arrays and pixel aperture ratios, thus clearly affect perceptual resolution, and advances in display technology have made texture in displayed images easier to perceive. Consequently, the effects of pixel structure must be investigated not only for perceptual resolution but also for shitsukan perception.
Previous studies on roughness perception have explored the interaction between visual and tactile mechanisms. Roughness perception refers to the sensation of non-uniformity on an object’s surface, perceived both visually and tactilely, and is typically caused by physical surface irregularities ranging from several hundred microns to millimeters. Bergmann Tiest et al. [11] demonstrated a strong influence of visual information on tactile roughness perception, indicating that vision can dominate touch. Other studies have examined shitsukan perception, including roughness, glossiness, and transparency, using both real objects and images, and have analyzed the relationship between physical quantities and shitsukan perception through psychophysical experiments [12–14]. For instance, Tanaka et al. [12] investigated texture perception using 34 materials (e.g., stone, paper, and glass) and showed that shitsukan perception, including roughness, can differ significantly between real materials and their images. In particular, resolution discrepancies were found to influence shitsukan perception.
Additionally, experiments have been conducted to explore how different sub-pixel arrays affect the perceived glossiness and transparency of displayed images [15]. That study investigated the effects of RGB, RGBW, and PenTile RGBG sub-pixel arrays on shitsukan perception using three image stimuli with identical image signals and luminance. RGB was rated highest for glossiness, followed by PenTile and RGBW, and highest for transparency, followed by RGBW and PenTile. These findings suggest that a display’s sub-pixel array is a significant factor in shitsukan perception. However, the relationship between display characteristics and roughness perception has not yet been fully explored.
Given these findings, we aim to elucidate the impact of different sub-pixel arrays on the roughness perception of images displayed on screens. Through evaluation experiments, we investigate the effects of RGB, RGBW, and PenTile RGBG sub-pixel arrays on roughness perception using three types of natural images of different materials. The experimental results are analyzed by calculating effect sizes, clustering observers, and examining image features to understand individual tendencies and the influence of image characteristics.
3.
Results and Discussions
This section presents the findings and the analysis of the impact of each sub-pixel array on roughness perception, along with modulation transfer function (MTF) analysis. Additionally, the influence of observer response tendencies and image characteristics on the results is examined. The experiment involved 11 observers, all with normal color vision and binocular acuity equivalent to 20∕20.
3.1
MTF Calculation
Modulation transfer function (MTF) represents the magnitude of the response to sinusoidal waves of different spatial frequencies. Previous studies [9, 10] have used MTF to analyze and discuss perceptual resolution because it provides objective, quantitative spatial frequency characteristics of displays. In this study, the MTF [17] was calculated to quantify differences in physical conditions, such as the sub-pixel array used. We expected the MTF to be an effective tool for quantitatively analyzing how differences in spatial frequency characteristics arising from the sub-pixel array affect roughness perception. It is reasonable to expect that a higher MTF produces a sharper displayed image, which in turn should heighten perceived roughness. This study focused solely on the vertical projection because no discernible difference in MTF was observed among the three sub-pixel arrays in the horizontal projection. For illustration, the line spread function $\mathrm{LSF}(x)$ of the luminance profile of the lateral projection of the RGB sub-pixel array is expressed as a sum of rectangular functions, one for each sub-pixel, weighted by the sub-pixel luminances.
Here, rect denotes the rectangular function, and $L_R$, $L_G$, and $L_B$ denote the luminances of the red, green, and blue sub-pixels, respectively, with $L_{RGB} = L_R + L_G + L_B$. The MTF, $\mathrm{MTF}(\xi)$, was calculated by Fourier transforming $\mathrm{LSF}(x)$ and normalizing the result to 1 at $\xi = 0$:

$$\mathrm{MTF}(\xi) = \frac{\left|\int_{-\infty}^{\infty} \mathrm{LSF}(x)\, e^{-i 2\pi \xi x}\, dx\right|}{\left|\int_{-\infty}^{\infty} \mathrm{LSF}(x)\, dx\right|}.$$
Figure 3 shows the MTF of each sub-pixel array, calculated using the same procedure. The RGB sub-pixel array exhibited a higher MTF than both the PenTile and RGBW sub-pixel arrays. At the Nyquist frequency of 0.5 cycles/pixel, the MTF differences between the RGB, PenTile, and RGBW sub-pixel arrays were approximately 21.6%, 10.0%, and 11.6%, respectively. Here, the MTF difference is the magnitude of the difference between the MTF values of two sub-pixel arrays, expressed as a percentage.
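To make the LSF-to-MTF procedure concrete, the following Python sketch computes the MTF of an idealized RGB stripe in two ways: numerically (Fourier transform of the LSF, normalized to 1 at ξ = 0) and in closed form. It is a minimal sketch under stated assumptions, not the model behind Fig. 3. The sub-pixel geometry (pixel pitch of 1, each sub-pixel one third of a pixel wide with 100% aperture, centered at −1/3, 0, and +1/3) and all function names are hypothetical, and the LSF is normalized by $L_{RGB}$.

```python
import numpy as np

# Hypothetical geometry: RGB stripe, pixel pitch 1, each sub-pixel 1/3 wide
# (100% aperture assumed), centred at -1/3, 0 and +1/3 within the pixel.
SUBPIXEL_WIDTH = 1.0 / 3.0
CENTRES = np.array([-1.0 / 3.0, 0.0, 1.0 / 3.0])


def rect(u):
    """Rectangular function: 1 for |u| < 1/2, 0 otherwise."""
    return np.where(np.abs(u) < 0.5, 1.0, 0.0)


def lsf_rgb(x, l_r, l_g, l_b):
    """LSF as a sum of rect-shaped sub-pixels weighted by their luminances."""
    weights = np.array([l_r, l_g, l_b], dtype=float)
    profile = sum(w * rect((x - c) / SUBPIXEL_WIDTH) for w, c in zip(weights, CENTRES))
    return profile / weights.sum()  # divide by L_RGB = L_R + L_G + L_B


def mtf_numeric(freqs, l_r, l_g, l_b, n=30000):
    """Numerically Fourier transform the LSF and normalize to 1 at xi = 0."""
    dx = 1.0 / n
    x = -0.5 + (np.arange(n) + 0.5) * dx  # midpoint samples across one pixel
    lsf = lsf_rgb(x, l_r, l_g, l_b)
    spectrum = np.array([np.sum(lsf * np.exp(-2j * np.pi * f * x)) * dx for f in freqs])
    return np.abs(spectrum) / np.abs(np.sum(lsf) * dx)


def mtf_analytic(freqs, l_r, l_g, l_b):
    """Closed form: |sinc(w xi)| * |sum_k L_k exp(-i 2 pi xi c_k)| / L_RGB."""
    freqs = np.asarray(freqs, dtype=float)
    weights = np.array([l_r, l_g, l_b], dtype=float)
    phases = np.exp(-2j * np.pi * np.outer(freqs, CENTRES))  # shape (n_freqs, 3)
    envelope = np.abs(np.sinc(SUBPIXEL_WIDTH * freqs))  # np.sinc(t) = sin(pi t)/(pi t)
    return envelope * np.abs(phases @ weights) / weights.sum()
```

For example, `mtf_analytic([0.0, 0.25, 0.5], 0.2126, 0.7152, 0.0722)` evaluates the idealized stripe with Rec. 709 luminance weights used as stand-in sub-pixel luminances. With equal luminances, the closed form at the Nyquist frequency reduces to $2/\pi \approx 0.637$, because $\mathrm{sinc}(1/6) = 3/\pi$ and the three phase terms sum to 2.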
3.2
Response Rate and Significant Differences
Figure 4 shows the average response rates of the 11 observers, that is, the rate at which each stimulus in a pair was judged to appear rougher, at viewing distances equivalent to 20 and 30 cpd. For each stimulus pair, the average response rate was obtained by averaging the response rates of the individual observers, each based on 16 evaluations. For example, if an observer selected RGB nine times and RGBW seven times for the RGB-RGBW pair, the response rates were 9/16 (56.25%) for RGB and 7/16 (43.75%) for RGBW. The same calculation was performed for all observers, so the average response rate for each pair was based on 176 evaluations (11 observers × 16 evaluations per observer). The percentages for each comparison pair are shown to one decimal place, with the larger value rounded down so that the pair sums to 100%. To determine whether perceived roughness differed significantly between the sub-pixel arrays in each stimulus pair, p-values were calculated, as shown in Table III. Standard deviations, effect sizes, and effect size indices were also computed. Cohen’s d was used as the effect size, and the effect size index was assigned by selecting the closest benchmark in the guidelines of Sawilowsky and Cohen [18, 19] (e.g., d = 1.2 for “very large” and d = 2.0 for “huge”). A “huge” effect size indicates a large difference in perceived roughness between the stimuli in a pair. Because Cohen’s d does not depend on sample size, this approach allowed us to evaluate the magnitude of the differences independently of the number of observers.
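The response-rate bookkeeping described above can be sketched as follows. These are hypothetical helpers, not the authors’ code. In particular, the display rule is one reading of “the larger value rounded down so that the pair sums to 100%”: floor the larger percentage to one decimal place and assign the remainder to the other stimulus.

```python
import math

import numpy as np

N_TRIALS = 16  # evaluations per observer for each stimulus pair (from the text)


def average_response_rate(first_counts, n_trials=N_TRIALS):
    """Mean percentage of trials in which the first stimulus of a pair was judged rougher.

    first_counts holds one count (0..n_trials) per observer; with 11 observers
    this pools 11 x 16 = 176 evaluations. Returns (first_pct, second_pct).
    """
    rates = np.asarray(first_counts, dtype=float) / n_trials * 100.0
    first_pct = float(rates.mean())
    return first_pct, 100.0 - first_pct


def display_pair(first_pct, second_pct):
    """Assumed display rule: floor the larger value to one decimal, give the rest to the other."""

    def floor_1dp(value):
        # round() first guards against float error such as 562.9999999 -> 562
        return math.floor(round(value * 10, 9)) / 10

    if first_pct >= second_pct:
        larger = floor_1dp(first_pct)
        return larger, round(100.0 - larger, 1)
    larger = floor_1dp(second_pct)
    return round(100.0 - larger, 1), larger
```

With a single observer choosing RGB 9 of 16 times, `average_response_rate([9])` gives 56.25% and 43.75%, which `display_pair` shows as 56.2% and 43.8%.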
Table III.
Effect sizes between stimulus pairs for all responses.
| Viewing distance | 20 cpd | 20 cpd | 20 cpd | 30 cpd | 30 cpd | 30 cpd |
|---|---|---|---|---|---|---|
| Stimulus pair | RGB – RGBW | RGBW – PenTile | PenTile – RGB | RGB – RGBW | RGBW – PenTile | PenTile – RGB |
| p-value | 0.1344 | 0.1905 | 0.2442 | 0.0187 | 0.0419 | 0.3590 |
| Std. dev. | 7.71 | 11.19 | 9.45 | 1.50 | 3.71 | 5.55 |
| Effect size (Cohen’s d) | 2.82 | 2.25 | 1.88 | 8.32 | 5.47 | 1.36 |
| Effect size index | Huge | Huge | Huge | Huge | Huge | Very large |
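The index assignment can be illustrated with a short Python sketch. The benchmarks follow Sawilowsky’s extended rules of thumb for Cohen’s d (0.01, 0.2, 0.5, 0.8, 1.2, 2.0), and the “closest match” rule reproduces the labels in Table III (e.g., 1.88 maps to “Huge” and 1.36 to “Very large”). How Cohen’s d was computed from the response rates is not detailed here, so the pooled-standard-deviation formula below is an assumption, and the function names are hypothetical.

```python
import numpy as np

# Extended rules of thumb for Cohen's d (Sawilowsky).
BENCHMARKS = {
    "Very small": 0.01,
    "Small": 0.2,
    "Medium": 0.5,
    "Large": 0.8,
    "Very large": 1.2,
    "Huge": 2.0,
}


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (one common definition; assumed here)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


def effect_size_index(d):
    """Label |d| with the nearest benchmark, mirroring the 'closest match' rule."""
    return min(BENCHMARKS, key=lambda label: abs(abs(d) - BENCHMARKS[label]))
```

Because the rule picks the nearest benchmark rather than applying thresholds, any |d| above 1.6 (the midpoint between 1.2 and 2.0) is labeled “Huge”.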
The p-values, which indicate whether the average response rates differ significantly, showed statistically significant differences (p < 0.05) for the RGB-RGBW and RGBW-PenTile stimulus pairs at 30 cpd, supporting the idea that observers perceived different levels of roughness within these pairs. At 20 cpd, no p-value reached statistical significance; nevertheless, the effect size, which is independent of sample size, was classified as “huge” for all stimulus pairs.
Additionally, the relative roughness relationships between sub-pixel arrays were analyzed in terms of effect size. Regardless of viewing distance, perceived roughness consistently followed the order PenTile > RGB > RGBW. These findings suggest that sub-pixel array configurations can influence perceived roughness. However, the results do not always align with theoretical predictions based on MTF. In theory, a higher MTF should yield greater sharpness and perceptual resolution, thereby enhancing roughness perception. Yet, although the RGB array has a higher MTF than PenTile, observers perceived PenTile as rougher. This discrepancy suggests that MTF alone does not fully explain roughness perception; roughness appears to be a complex phenomenon influenced not only by spatial frequency characteristics but also by pixel arrangement and subjective evaluation. Moreover, individual differences in perceived roughness were observed among the observers, possibly reflecting differences in their visual experience and in how they responded to the stimuli.
Individual observer results are summarized as follows. Observer 1 consistently perceived PenTile as rougher than RGBW in the RGBW-PenTile stimulus pair, regardless of viewing distance, and also perceived PenTile as rougher than RGB in the PenTile-RGB pair, with a substantial effect size. Thus, Observer 1 consistently perceived PenTile as rougher than the other arrays. In contrast, Observer 6 perceived RGBW as rougher than PenTile in the RGBW-PenTile pair, irrespective of viewing distance. Additionally, in the PenTile-RGB pair, Observer 6 perceived RGB as rougher (effect size: “huge” at both 20 and 30 cpd, and “very large” at 30 cpd). These discrepancies support the hypothesis that some observers perceive RGB and RGBW as rougher than PenTile. Such variation indicates individual differences in the perceived roughness relationships among sub-pixel arrays, suggesting that the overall average results may not fully capture these nuances.
In summary, the results indicate that differences in sub-pixel arrays influence roughness perception, although the effect varies among individuals. This highlights the importance of considering observer biases when evaluating the impact of sub-pixel array variations on shitsukan. A detailed classification of observers and an analysis of each natural image are provided in the following subsections.
3.3
Cluster Analysis
The results of the evaluation experiment were used to classify observers through cluster analysis, thereby accounting for the response tendencies of each observer. Because individual observers may exhibit different tendencies, we grouped them according to their response data to elucidate these differences. The analysis used hierarchical clustering (Ward’s method) on 18-dimensional response-rate data derived from the 16 trials conducted for each image pair. For example, for an RGB-RGBW pair, the proportions of RGB selections (9/16 = 56.25%) and RGBW selections (7/16 = 43.75%) were recorded, yielding two response-rate values per pair. With three sub-pixel array pairs, six values were obtained per natural image, and with three natural images, an 18-dimensional response-rate vector was constructed for each observer. Based on these data, cluster analysis was performed to classify the observers’ response tendencies. Figure 5 shows the resulting dendrograms. The 20 cpd dendrogram in (a) is divided into two clusters, whereas the 30 cpd dendrogram in (b) is divided into three clusters. The color-coded groups are labeled RO1_20 cpd and RO2_20 cpd for 20 cpd, and RO1_30 cpd, RO2_30 cpd, and RO3_30 cpd for 30 cpd. Figure 6 and Table IV present the average response rate for each cluster, together with the corresponding p-values and effect sizes.
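A minimal sketch of the clustering step, assuming SciPy is available: each observer is represented by the 18-dimensional vector described above, and the Ward linkage tree is cut into a fixed number of clusters. The counts below are synthetic placeholders, not the study’s data, and the image/pair ordering and helper names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

N_TRIALS = 16
IMAGES = ("orange", "wood", "wool")
PAIRS = ("RGB-RGBW", "RGBW-PenTile", "PenTile-RGB")


def response_vector(first_counts):
    """Build one observer's 18-D vector.

    first_counts[i][j] = times the first stimulus of PAIRS[j] was chosen for IMAGES[i].
    Each pair contributes (first rate, second rate): 3 images x 3 pairs x 2 = 18 values.
    """
    vec = []
    for image_counts in first_counts:
        for k in image_counts:
            vec.extend([k / N_TRIALS, (N_TRIALS - k) / N_TRIALS])
    return np.array(vec)


def cluster_observers(vectors, n_clusters):
    """Ward's hierarchical clustering (Euclidean), cut into n_clusters groups."""
    z = linkage(vectors, method="ward")
    return fcluster(z, t=n_clusters, criterion="maxclust")


# Synthetic counts (NOT the study's data): two hypothetical observer groups.
rng = np.random.default_rng(0)
counts_a = rng.integers(11, 15, size=(6, 3, 3))  # tends to pick the first stimulus
counts_b = rng.integers(2, 6, size=(5, 3, 3))  # tends to pick the second stimulus
vectors = np.array([response_vector(c) for c in np.concatenate([counts_a, counts_b])])
labels = cluster_observers(vectors, n_clusters=2)
```

Cutting the tree with `n_clusters=3` instead of 2 corresponds to the three-cluster partition reported at 30 cpd; in practice the cut would be chosen by inspecting the dendrogram, as in Fig. 5.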
Table IV.
Effect sizes between stimulus pairs for different observer clusters.
As shown by the p-values in Fig. 6 and Table IV, no statistically significant difference was found at 20 cpd. However, the effect sizes, which are independent of sample size, were “huge” for all pairs except the RGB-RGBW pair in RO1_20 cpd. This suggests that roughness judgments differed notably among the sub-pixel arrays. Specifically, in RO1_20 cpd, the RGB and RGBW arrays were perceived as nearly equally rough, whereas the PenTile array was perceived as markedly less rough. In other words, for this group of observers, the roughness of the PenTile array was clearly distinct from that of the other arrays. By contrast, the RO2_20 cpd group exhibited the opposite pattern, perceiving PenTile as the roughest, followed by RGB and then RGBW. This finding supports the hypothesis that some observers perceive the PenTile array as rougher, whereas others perceive it as less rough.
Next, we consider the results at a viewing distance of 30 cpd. As shown in Fig. 6 and Table IV, the RGB-RGBW and RGBW-PenTile pairs in RO2_30 cpd showed statistically significant differences (p < 0.05). The effect sizes for these pairs were also substantial, indicating that observers strongly perceived a difference in roughness between these sub-pixel arrays. A cluster-by-cluster analysis of the roughness relationships revealed the order PenTile > RGB > RGBW for RO1_30 cpd and RO2_30 cpd, whereas RO3_30 cpd showed the reverse order, RGBW > RGB > PenTile. Thus, even at 30 cpd, observers exhibited different response tendencies. Notably, all clusters showed a “huge” effect size for the RGBW-PenTile pair, indicating a particularly pronounced difference in roughness perception between these arrays. All observers were male graduate students in their 20s specializing in information engineering; therefore, age, gender, and expertise are unlikely to account for the differences among clusters. The changes in cluster composition between 20 and 30 cpd may reflect shifts in the visual features the observers emphasized owing to the different spatial frequencies.
In general, the PenTile > RGB > RGBW relationship was confirmed for the overall average response rate, regardless of viewing distance. However, the clustering results indicate that some observers exhibited the lowest and highest roughness perception for the PenTile and RGBW arrays, respectively. The RGB array consistently ranked in the middle, showing no significant superiority or inferiority in roughness perception, regardless of viewing distance or clustering.
3.4
Image Features
Beyond the sub-pixel array, we examined which image features influence observers’ response tendencies by calculating features of each natural image. The features used were the contrast (a measure of local variation in luminance) and energy (a measure of texture repetition) of the gray-level co-occurrence matrix (GLCM) [20], which is considered an effective method for analyzing object texture in images. Additionally, kurtosis and skewness were calculated from the luminance histogram of each image. Because kurtosis indicates the peakedness or flatness of a distribution and skewness its asymmetry, both are widely used to capture texture features [21].
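The spatial-domain texture features above can be sketched with NumPy alone. This is an illustrative implementation under assumptions the text does not specify: luminance scaled to [0, 1], 8 gray levels, a single non-negative pixel offset (horizontal by default), a symmetric normalized GLCM, energy taken as the square root of the angular second moment (some works report the angular second moment itself), and kurtosis reported as excess kurtosis. The function names are hypothetical.

```python
import numpy as np


def glcm(image, levels=8, offset=(0, 1)):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset.

    image: 2-D luminance array scaled to [0, 1].
    offset: non-negative (row, col) step; (0, 1) pairs each pixel with its right neighbour.
    """
    q = np.clip((np.asarray(image, dtype=float) * levels).astype(int), 0, levels - 1)
    dr, dc = offset
    rows, cols = q.shape
    ref = q[: rows - dr, : cols - dc]
    nbr = q[dr:, dc:]
    m = np.zeros((levels, levels))
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1)  # count co-occurring level pairs
    m = m + m.T  # symmetric: count each pair in both directions
    return m / m.sum()


def glcm_contrast_energy(p):
    """GLCM contrast sum p(i,j)(i-j)^2 and energy sqrt(sum p(i,j)^2)."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    energy = np.sqrt(np.sum(p ** 2))
    return float(contrast), float(energy)


def histogram_moments(image):
    """Skewness and excess kurtosis of the luminance distribution (population moments)."""
    x = np.asarray(image, dtype=float).ravel()
    sd = x.std()
    if sd == 0:
        return 0.0, 0.0  # degenerate case: constant image
    z = (x - x.mean()) / sd
    return float(np.mean(z ** 3)), float(np.mean(z ** 4) - 3.0)
```

A binary checkerboard illustrates the extremes: every horizontal neighbour pair differs, so contrast is 1 and energy is $\sqrt{0.5}$, while a uniform image gives contrast 0 and energy 1.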
Three features were used in the frequency domain. The first is the mean frequency, defined as the mean value of the amplitude spectrum; it indicates the central tendency of the frequency components and helps identify the dominant spectral components. The second is the spectral entropy, which represents the energy distribution of the image and can capture changes in the energy distribution, dominant frequencies and directions, and texture of an image for image quality evaluation [22]. The third is the slope of the amplitude spectrum from low to high frequencies, which is also considered effective for representing image texture and structural information [23]. Figure 7 shows the calculated image features. In Fig. 7, the kurtosis values for the orange and wood images are $4.85 \times 10^{-4}$ and $-1.75 \times 10^{-4}$, respectively.
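The three frequency-domain features can likewise be sketched in NumPy. The definitions are assumptions: “mean frequency” is implemented here as the amplitude-weighted mean radial frequency (a swapped-in reading, since the text defines it as the mean of the amplitude spectrum), spectral entropy is computed over the normalized power spectrum, and the slope is a least-squares fit of the log radially averaged amplitude against log frequency. The DC component is removed first, and the function names are hypothetical.

```python
import numpy as np


def radial_frequency(shape):
    """Radial spatial frequency (cycles/pixel) of every 2-D FFT bin."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.sqrt(fx ** 2 + fy ** 2)


def amplitude_spectrum(image):
    """|FFT| of the image after removing its mean (DC) luminance."""
    img = np.asarray(image, dtype=float)
    return np.abs(np.fft.fft2(img - img.mean()))


def mean_frequency(image):
    """Amplitude-weighted mean radial frequency (one possible reading of 'mean frequency')."""
    amp = amplitude_spectrum(image)
    return float(np.sum(radial_frequency(amp.shape) * amp) / np.sum(amp))


def spectral_entropy(image):
    """Shannon entropy (bits) of the normalized power spectrum."""
    power = amplitude_spectrum(image) ** 2
    p = power[power > 0] / power.sum()
    return float(-np.sum(p * np.log2(p)))


def spectral_slope(image):
    """Slope of log radially averaged amplitude versus log frequency (near-square images)."""
    amp = amplitude_spectrum(image)
    n = min(amp.shape)
    bins = np.round(radial_frequency(amp.shape) * n).astype(int)
    radii = np.arange(1, n // 2)  # skip DC, stop below Nyquist
    radial_amp = np.array([amp[bins == r].mean() for r in radii])
    return float(np.polyfit(np.log(radii / n), np.log(radial_amp), 1)[0])
```

For an image whose amplitude falls off roughly as 1/f, `spectral_slope` returns a value near −1; a more negative slope indicates stronger low-frequency dominance. A single vertical grating has all its power in two bins, so `spectral_entropy` returns 1 bit and `mean_frequency` returns the grating frequency.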
Analyzing the image features individually revealed that the orange image had moderate GLCM_Contrast and a slightly higher mean frequency than the wood image. This suggests that the texture of the orange image is relatively regular, which may create an impression of visual smoothness owing to the lack of random noise. The low spectral entropy of the orange image also indicates regular patterns that may reduce perceived roughness. Given the relatively minor effect of sub-pixel array differences on this image, sub-pixel arrays may have less influence on perceived roughness in images with regular patterns.
The wood image exhibited a higher GLCM_Energy value than the orange image, indicating a repetitive texture pattern. Its histogram kurtosis, the lowest of the three images, suggests a relatively broad luminance distribution, implying minimal randomness. Additionally, the wood image had the lowest mean frequency and spectral slope, indicating a predominance of low-frequency components and an overall smooth structure. In images with strong low-frequency components, the effect of sub-pixel array differences on perceived roughness is expected to be minimal.
Conversely, the wool image exhibited high GLCM_Contrast and GLCM_Energy values, indicating a complex and dynamic texture that likely contributes to a stronger perception of roughness. The wool image also had markedly higher mean frequency, spectral entropy, and spectral slope values than the other images, highlighting a substantial presence of high-frequency content. These high-frequency components and random patterns are likely the main contributors to its strong perceived roughness.
Figure 6 shows that, of the 11 stimulus pairs rated as having a “huge” effect size across the clusters, seven showed their largest differences in response rate for the wool image at both the 20 and 30 cpd viewing distances. This suggests that differences in sub-pixel arrays strongly affect roughness perception, particularly in images with complex, high-energy textures and high-frequency components; the complexity and frequency content of an image appear to modulate this effect.
These findings demonstrate that roughness perception varies considerably with the sub-pixel array, even when the image stimulus is constant. However, this variation cannot be attributed to differences in MTF alone. Perceived roughness depends on both the observer’s response tendencies and the image content, with the sub-pixel array effect being most pronounced in complex images with high-frequency components. Thus, MTF alone is insufficient to explain human roughness perception, which is strongly influenced by image structure and texture.
Therefore, to achieve effective shitsukan management across various displays, it is crucial to account for perceptual characteristics that cannot be explained solely by MTF (e.g., individual observer tendencies and image content), in addition to MTF-based methods [24]. When developing shitsukan management techniques, human visual characteristics, observer biases, and image content should all be considered.
4.
Conclusions
This study experimentally investigated the effects of different sub-pixel arrays on roughness perception, which had not been fully elucidated. Three types of sub-pixel arrays (RGB, RGBW, and PenTile RGBG) were used to display natural images, and differences in perceived roughness among the arrays were assessed through evaluation experiments. The results confirmed that perceived roughness differs among the sub-pixel arrays: the PenTile array elicited a stronger sense of roughness than the RGB array, and the RGB array elicited a stronger sense of roughness than the RGBW array. However, this ranking did not fully match the one expected from MTF. Theoretically, a higher MTF should correspond to higher perceptual resolution and, consequently, a stronger perception of roughness. This was not consistently reflected in the perceptual evaluations, suggesting that factors beyond MTF play a role in roughness perception.
Effect sizes were calculated and a cluster analysis was conducted to examine how observer response tendencies and image characteristics affect the impact of sub-pixel arrays. The cluster analysis revealed that some observers’ responses diverged from the overall average: some gave the lowest roughness ratings to the PenTile array, whereas others perceived the RGBW array as the roughest. These results indicate that inter-observer variability strongly influences roughness perception and should be taken into account in display design.
Furthermore, an analysis of image features demonstrated that the impact of sub-pixel arrays on roughness perception depends on the intrinsic characteristics of the image, particularly the complexity and frequency content of its texture. The perceptual differences between sub-pixel arrays were especially pronounced in images with complex textures and high-frequency content. For example, the complex texture of the wool image elicited the greatest perceived roughness, whereas the simpler textures of the orange and wood images showed less variation in roughness perception across sub-pixel arrays. These findings suggest that the influence of sub-pixel arrays on roughness perception depends not only on physical resolution and MTF but also on image content and individual observers’ evaluation tendencies. However, because only three natural images were used, these conclusions require further verification. Future research should test this hypothesis using a broader dataset with textures of varying complexity and high-frequency content.
Additionally, the observed effects may be related to the quantitative characteristics of textures and the perceived magnitude of roughness. Therefore, cross-image comparisons of roughness perception (e.g., orange versus wool) may provide valuable insights. Although statistical validation was performed using data from 11 participants, future studies should include more observers and a more diverse set of image stimuli to improve the generalizability of the findings. Moreover, this study simulated the different sub-pixel arrays on a single monitor, with viewing distances adjusted so that groups of monitor pixels were perceived as individual sub-pixel elements. However, real displays with specific sub-pixel arrays may yield different results owing to the unique optical and display characteristics of each device. Furthermore, displays that differ only in their sub-pixel arrays are not readily available, which poses significant technical challenges for such experiments. Considering these issues, validation using real displays should be addressed in future studies.
In summary, shitsukan perception involves many attributes beyond roughness. As display technology advances, there is a growing need for technology that can effectively reproduce these shitsukan characteristics. To gain further insight into how the hardware structure of displays affects shitsukan perception, the effects on other shitsukan attributes should also be examined.