With recent advancements in display technology, perceiving objects through their displayed images has become a crucial aspect of human visual experience. “Shitsukan” refers to the comprehensive perception of an object’s appearance, encompassing attributes such as roughness, glossiness, and transparency. Accurate reproduction of these attributes is increasingly required in a range of applications. However, the impact of the pixel structures of different displays on shitsukan perception remains unclear, and clarifying this impact is essential for consistent reproduction and effective shitsukan management across displays. This study investigated the effect of display sub-pixel arrangement on roughness perception. In an evaluation experiment using natural images, the effects of three sub-pixel arrays (red, green, and blue [RGB]; red, green, blue, and white [RGBW]; and PenTile) on roughness perception were analyzed. The results showed that the sub-pixel array significantly influenced roughness perception under the tested conditions. Averaged across all observers, perceived roughness was highest for the PenTile array, followed by the RGB array and then the RGBW array. A cluster analysis of individual observer responses further indicated that the relative influence of the sub-pixel array on roughness perception varied among observers. Differences in perceived roughness also depended on image content and texture complexity: the effect of the sub-pixel array was more pronounced for images with complex textures and high-frequency components, whereas differences between arrays were less noticeable for images with simpler textures.