Filip and Vilímovská

An efficient computational characterization of real-world materials is one of the challenges in image understanding. Automatic assessment of materials with performance similar to that of a human observer usually relies on complicated image filtering derived from models of human perception. However, these models become too complicated when a real material is observed in the form of dynamic stimuli. This study approaches the challenge from the opposite direction. First, we collected human ratings of the most common visual attributes for videos of wood samples and analyzed their relationship to selected image statistics. In our experiments on a set of sixty wood samples, we found that such image statistics perform surprisingly well in discriminating individual samples and correlate reasonably with human ratings. We also show that these statistics can be effective in discriminating images of the same material taken under different illumination and viewing conditions.

Digital representations of materials are widely used in various applications. However, automatically interpreting the visual properties of captured materials remains an ongoing research challenge. To achieve automatic material assessment matching human-level performance, researchers often resort to intricate image filtering based on models of human perception. In the past decade, there has been an increasing trend toward employing generative machine learning networks to extract latent information about material appearance. These networks have been applied to tasks ranging from material classification and style transfer to appearance synthesis [

This paper adopts a different perspective on this challenge, using image statistics to improve material discrimination. Our primary motivation is to propose a parametric description of material appearance that relies on a small number of parameters, each with a directly interpretable meaning. We selected specific statistics inspired by low-level mechanisms of human vision that can explain the major visual attributes of material appearance. These statistics are also compact and easy to evaluate on images or videos.

Our goal is to identify a concise set of well-defined image statistics that can serve as features for assessing visual similarity between materials. The set should also allow users to focus on selected visual features. Once we identify such descriptive features that are highly correlated with human ratings, they can act as visual fingerprints of materials. This parametric description could then be used for sorting or retrieving materials by similarity.

To assess human visual ratings of materials, one can use either real specimens or their digital representations. Currently, one of the best approaches to digitally representing real-world materials is the bidirectional texture function (BTF) [

The following sections provide an overview of past research in material appearance understanding, describe the process of obtaining the human rating data, and discuss the selected image statistics. These statistics are then compared to the ratings and applied to inter-material and intra-material comparison tasks.

The analysis of human visual perception of material appearance has been extensively studied in the past [

Another branch of research focused on the analysis of predefined visual and subjective attributes. Fleming et al. [

Computational features relating to visual attributes have also been widely studied. Related research in material recognition identifies local material properties, so-called visual material traits [

Multimodal perception in three modalities (vision, audition, and touch), using wood as the target object, was analyzed in [

This work uses rating data obtained in Ref. [

We used thirty wood veneers (SET 1) carefully selected from a catalog of over one hundred wood veneers to provide as broad and uniform a range of appearances as possible (see Figure

Two sets of wood veneer samples used in our experiments, shown in the specular condition (light opposite the camera).

An example of a rating stimulus shown to the observers. Note that the stimulus is dynamic because the viewpoint rotates around the sample's surface.

Based on a review of previous studies identifying visual dimensions of materials [

Dendrograms obtained by hierarchical clustering of samples on both datasets: (a,c) human ratings, (b,d) computational statistics.

Similarity matrices for both datasets: (a,d) rating data, (b,e) computational statistics, (c) regression of ratings using statistics, (f) rating prediction for SET 2 using statistics and regression coefficients from SET 1. R and R² scores compare similarity-matrix values, excluding the diagonal.

We considered normalizing the rating scores (z-scoring) but eventually kept the original 0–100 scale, as it yielded significantly better results. The standard deviation across subjects, averaged across all attributes, was 20.54.
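The two options compared above can be sketched as follows. This is a minimal sketch assuming ratings stored as an (observers, samples, attributes) array and z-scoring applied per observer and per attribute across samples; the text does not state along which axes z-scoring was performed, so this layout is a hypothetical choice. The toy example shows that z-scoring removes each observer's offset on the 0–100 scale.

```python
import numpy as np

def zscore_per_observer(ratings):
    """z-score each observer's ratings of each attribute across the samples.

    ratings: array of shape (observers, samples, attributes) on the 0-100 scale.
    Assumes no observer gave the same score to every sample for one attribute
    (that would make the standard deviation zero).
    """
    ratings = np.asarray(ratings, dtype=float)
    mean = ratings.mean(axis=1, keepdims=True)
    std = ratings.std(axis=1, ddof=1, keepdims=True)
    return (ratings - mean) / std

def mean_inter_observer_sd(ratings):
    """SD across observers for every (sample, attribute) cell, averaged over all cells."""
    ratings = np.asarray(ratings, dtype=float)
    return float(ratings.std(axis=0, ddof=1).mean())

# Hypothetical toy data: 2 observers, 3 samples, 1 attribute.
ratings = np.array([[[10.0], [20.0], [30.0]],
                    [[20.0], [30.0], [40.0]]])
print(zscore_per_observer(ratings)[:, :, 0])  # both observers become [-1, 0, 1]
print(mean_inter_observer_sd(ratings))        # sqrt(50), about 7.07
```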

To evaluate agreement among observers, we used Krippendorff's alpha [_{K} = 0.371, while the highest values were obtained for
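For reference, Krippendorff's alpha compares observed within-unit disagreement with the disagreement expected by chance, α = 1 − D_o / D_e. The sketch below assumes the interval (squared-difference) metric, rows as rated units (for example, one row per sample and attribute) and columns as observers, with NaN for missing ratings; the metric and the unit layout used in the study are not stated here, so both are assumptions.

```python
import numpy as np

def _sum_sq_pair_diffs(v):
    """Sum of (v_i - v_j)^2 over all ordered pairs i != j (pairs with i == j add zero)."""
    n = v.size
    return 2.0 * n * np.sum(v ** 2) - 2.0 * np.sum(v) ** 2

def krippendorff_alpha_interval(ratings):
    """Krippendorff's alpha with the interval metric.

    ratings: 2-D array, rows = rated units, columns = observers; NaN = missing.
    """
    ratings = np.asarray(ratings, dtype=float)
    units = [row[~np.isnan(row)] for row in ratings]
    units = [u for u in units if u.size >= 2]  # only units with at least two ratings are pairable
    values = np.concatenate(units)
    n = values.size
    # Observed disagreement: within-unit pairs, each unit weighted by 1 / (m_u - 1).
    observed = sum(_sum_sq_pair_diffs(u) / (u.size - 1) for u in units)
    # Expected disagreement: all pairs of pairable values, regardless of unit.
    expected = _sum_sq_pair_diffs(values)
    return 1.0 - (n - 1) * observed / expected

# Hypothetical toy data: 2 units rated by 2 observers.
print(krippendorff_alpha_interval([[1, 2], [3, 4]]))  # about 0.7
```

The closed form for the pair sums avoids building the full coincidence matrix, which keeps the sketch short for continuous 0–100 ratings.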

Additionally, hypothesis testing of attribute means using repeated-measures ANOVA confirmed significant differences between attribute means with
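A one-way repeated-measures ANOVA can be sketched as below, assuming a hypothetical (subjects, attributes) matrix where each cell is one subject's mean rating of an attribute; the exact data layout and any sphericity correction used in the study are not given here, and none is applied in this sketch. The subject effect is removed from the error term, which is what distinguishes it from an ordinary one-way ANOVA. A p-value could then be obtained from the F distribution, for example with `scipy.stats.f.sf(F, df1, df2)`.

```python
import numpy as np

def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA without sphericity correction.

    data: array (subjects, conditions); here a condition would be one rating attribute.
    Returns (F, df_conditions, df_error).
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    ss_cond = n * np.sum((data.mean(axis=0) - grand) ** 2)   # between attributes
    ss_subj = k * np.sum((data.mean(axis=1) - grand) ** 2)   # between subjects
    ss_total = np.sum((data - grand) ** 2)
    ss_error = ss_total - ss_cond - ss_subj                   # residual after subject effects
    df_cond, df_error = k - 1, (n - 1) * (k - 1)
    F = (ss_cond / df_cond) / (ss_error / df_error)
    return F, df_cond, df_error

# Hypothetical toy data: 3 subjects x 3 attributes.
F, df1, df2 = rm_anova_oneway([[1, 2, 4], [2, 3, 3], [3, 5, 5]])
print(F, df1, df2)  # F of about 7 (up to rounding) with (2, 4) degrees of freedom
```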

When selecting a narrow set of image statistics, we preferred those that could be evaluated quickly, had a limited total number of parameters, and could explain all the tested rating attributes.

We were inspired by statistics used in image synthesis approaches [

To identify clusters of similar wooden materials, we performed hierarchical clustering. Since combining individual attributes into a single clustering distance (e.g., using Euclidean distance) does not make much sense, we used the Pearson correlation between the sets of eleven attributes of two material samples as the similarity measure between the ratings/statistics of the sample pair. Dendrogram plots in Figure
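The clustering step can be sketched with a correlation distance (1 − Pearson r between the attribute vectors of two samples) and a naive agglomerative loop. The linkage rule is not stated in the text; average linkage (UPGMA) is assumed here. In practice one would typically call `scipy.cluster.hierarchy.linkage(X, method="average", metric="correlation")`; the explicit loop only makes the merging rule visible.

```python
import numpy as np

def correlation_distance(features):
    """1 - Pearson r between every pair of samples (rows) of a samples x attributes matrix."""
    return 1.0 - np.corrcoef(features)

def average_linkage(dist):
    """Naive agglomerative clustering with average linkage (UPGMA).

    Returns a list of merges (cluster_a, cluster_b, distance); new clusters get
    ids n, n+1, ... as in a dendrogram.
    """
    n = dist.shape[0]
    clusters = {i: [i] for i in range(n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        ids = list(clusters)
        best = None
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                # average distance between all members of a and all members of b
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, float(d)))
        next_id += 1
    return merges

# Hypothetical toy data: 4 samples x 3 attributes.
X = np.array([[1, 2, 3], [2, 4, 6.5], [3, 2, 1], [6, 4, 1.9]], dtype=float)
for merge in average_linkage(correlation_distance(X)):
    print(merge)
```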

To assess overall similarities between materials, we computed aggregated similarity matrices for all human ratings and statistics, again using the correlation of ratings/statistics vectors as the similarity measure. Similarity matrices (30 × 30 samples) are shown in Figure
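A minimal sketch of the aggregated similarity matrices and of comparing two such matrices without their diagonals (as for the R and R² scores in the similarity-matrix figure). The data below are random placeholders, not the study's ratings; the sizes of 10 rating attributes and 11 statistics are assumptions for illustration. The diagonal is excluded because it is 1 in every correlation matrix and would inflate the agreement.

```python
import numpy as np

def similarity_matrix(features):
    """samples x samples Pearson correlation between per-sample feature vectors."""
    return np.corrcoef(np.asarray(features, dtype=float))

def offdiag_r(a, b):
    """Pearson R between two equally sized square matrices, ignoring the diagonal."""
    mask = ~np.eye(a.shape[0], dtype=bool)
    return float(np.corrcoef(a[mask], b[mask])[0, 1])

# Hypothetical placeholder data: 30 samples, 10 rating attributes, 11 statistics.
rng = np.random.default_rng(0)
ratings = rng.uniform(0, 100, size=(30, 10))
stats = rng.normal(size=(30, 11))
S_rat, S_stat = similarity_matrix(ratings), similarity_matrix(stats)
r = offdiag_r(S_rat, S_stat)
print(S_rat.shape, r, r ** 2)
```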

In Ref. [

In the next step, we computed correlations between visual ratings and computational statistics. The correlation was computed on similarity matrices obtained independently for each rating attribute and statistic from the pairwise differences of its values between samples. The diagonal of each similarity matrix was excluded from the correlation computation. The correlation plot is shown in Figure
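The per-attribute comparison can be sketched as follows, assuming that each similarity matrix holds absolute pairwise differences |v_i − v_j| of one attribute or statistic; the exact difference measure is not specified here. Note the design consequence: with signed differences the result would reduce exactly to the Pearson correlation of the raw values, whereas absolute differences make the measure insensitive to the sign of the relationship.

```python
import numpy as np

def difference_matrix(values):
    """Pairwise absolute differences of one attribute or statistic over samples."""
    v = np.asarray(values, dtype=float)
    return np.abs(v[:, None] - v[None, :])

def rating_statistic_correlations(ratings, stats):
    """R between difference matrices of every rating attribute and every statistic.

    ratings: (samples, n_attributes), stats: (samples, n_statistics).
    Diagonals (always zero) are excluded.
    """
    n = ratings.shape[0]
    mask = ~np.eye(n, dtype=bool)
    out = np.empty((ratings.shape[1], stats.shape[1]))
    for i in range(ratings.shape[1]):
        di = difference_matrix(ratings[:, i])[mask]
        for j in range(stats.shape[1]):
            dj = difference_matrix(stats[:, j])[mask]
            out[i, j] = np.corrcoef(di, dj)[0, 1]
    return out

# Hypothetical placeholder data: 30 samples, 10 attributes, 11 statistics.
rng = np.random.default_rng(0)
corr = rating_statistic_correlations(rng.uniform(0, 100, (30, 10)), rng.normal(size=(30, 11)))
print(corr.shape)  # rows = rating attributes, columns = statistics
```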

Correlation between the visual ratings and suggested statistics computed on the similarity matrices.

To evaluate the contribution of individual statistics to the rating representation, we performed a leave-one-out analysis, removing individual statistics and evaluating the change in rating prediction. Results shown in Figure
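The leave-one-out analysis can be sketched as below. It assumes the prediction model is ordinary least squares with intercept per rating attribute, that prediction quality is the pooled Pearson R over all samples × attributes values, and that the model is fitted and evaluated on the same set; these choices are assumptions of this sketch. The synthetic data are hypothetical: statistic 2 is deliberately irrelevant, so removing it should cost almost nothing.

```python
import numpy as np

def prediction_r(stats, ratings):
    """Fit OLS with intercept for every rating attribute and return the pooled
    Pearson R between predicted and observed ratings (all samples x attributes)."""
    X = np.column_stack([np.ones(len(stats)), stats])
    coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
    pred = X @ coef
    return float(np.corrcoef(pred.ravel(), ratings.ravel())[0, 1])

def leave_one_statistic_out(stats, ratings):
    """Return the full-model R and the drop in R when each statistic is removed in turn."""
    full = prediction_r(stats, ratings)
    drops = np.array([full - prediction_r(np.delete(stats, k, axis=1), ratings)
                      for k in range(stats.shape[1])])
    return full, drops

# Hypothetical data: ratings depend on statistics 0 and 1 only.
rng = np.random.default_rng(0)
stats = rng.normal(size=(30, 3))
ratings = np.column_stack([2 * stats[:, 0], stats[:, 0] + stats[:, 1]])
full, drops = leave_one_statistic_out(stats, ratings)
print(full, drops)
```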

Correlation drop obtained for leave-one-out analysis of individual statistics.

As our stimuli are dynamic, we also analyzed whether adding time-varying statistics improves our results. We found that adding the standard deviation of individual statistics as additional parameters did not improve the correlation of prediction, even when we added them one by one. A larger improvement was obtained by using statistic values for distinct frames of the sequence instead of their average. This doubled the number of parameters, i.e., one set for the specular frame and another for the non-specular frame. This configuration increased the correlation of predicted values on SET 1 from 0.83 to 0.98; however, we consider this model over-fitted, as the correlation of prediction on SET 2 dropped from 0.74 to 0.51. Therefore, we consider the mean values of the statistics a good balance between prediction accuracy and model generalization.
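The three ways of summarizing a frame sequence discussed above (mean over frames, added standard deviation over frames, and separate values for a specular and a non-specular frame) can be sketched as follows. The per-frame statistics here (mean, standard deviation, skewness and excess kurtosis of gray values) are hypothetical stand-ins, not the eleven statistics used in the study, and the frame indices are placeholders.

```python
import numpy as np

def frame_statistics(gray):
    """Illustrative global statistics of one grayscale frame: mean, standard
    deviation, skewness and excess kurtosis of the pixel values."""
    x = np.asarray(gray, dtype=float).ravel()
    mu, sigma = x.mean(), x.std()
    z = (x - mu) / sigma          # assumes the frame is not constant
    return np.array([mu, sigma, np.mean(z ** 3), np.mean(z ** 4) - 3.0])

def sequence_statistics(frames):
    """Mean over frames (compact descriptor) and SD over frames (time variability)."""
    per_frame = np.stack([frame_statistics(f) for f in frames])
    return per_frame.mean(axis=0), per_frame.std(axis=0)

def two_frame_statistics(frames, i_specular, i_nonspecular):
    """Twice as many parameters: one set per selected frame (hypothetical indices)."""
    return np.concatenate([frame_statistics(frames[i_specular]),
                           frame_statistics(frames[i_nonspecular])])

# Hypothetical two-frame sequence.
frames = [np.array([[0.0, 1.0], [0.0, 1.0]]), np.array([[0.0, 0.0], [1.0, 1.0]])]
mean_stats, sd_stats = sequence_statistics(frames)
print(mean_stats, sd_stats, two_frame_statistics(frames, 0, 1).shape)
```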

Our statistics demonstrated promising descriptive performance, and we expected that an even better fit to the rating data could be obtained by modelling their mutual relationship. To this end, we used a linear regression model with intercept, taking each rating dimension as the dependent variable and the statistics as the independent variables. Analysis of linear model coefficients in Figure
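The regression and the cross-set check can be sketched with ordinary least squares: coefficients are fitted on SET 1 and reused to predict SET 2, and agreement is measured by the pooled Pearson R over all samples × attributes values. The data below are random placeholders, and the sizes (11 statistics, 10 attributes) are assumptions for illustration.

```python
import numpy as np

def fit_linear(stats, ratings):
    """OLS with intercept; one coefficient column per rating attribute.

    stats: (samples, n_statistics), ratings: (samples, n_attributes).
    Returns coef of shape (1 + n_statistics, n_attributes); row 0 is the intercept.
    """
    X = np.column_stack([np.ones(len(stats)), stats])
    coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
    return coef

def predict(coef, stats):
    return np.column_stack([np.ones(len(stats)), stats]) @ coef

def pooled_r(pred, ratings):
    """Pearson R over all samples x attributes values at once."""
    return float(np.corrcoef(pred.ravel(), ratings.ravel())[0, 1])

# Hypothetical placeholder data for two sets of 30 samples.
rng = np.random.default_rng(0)
true_coef = rng.normal(size=(12, 10))      # intercept + 11 statistics -> 10 attributes
stats_set1, stats_set2 = rng.normal(size=(30, 11)), rng.normal(size=(30, 11))
ratings_set1 = predict(true_coef, stats_set1) + rng.normal(scale=0.5, size=(30, 10))
ratings_set2 = predict(true_coef, stats_set2) + rng.normal(scale=0.5, size=(30, 10))
coef = fit_linear(stats_set1, ratings_set1)               # trained on SET 1 only
r_train = pooled_r(predict(coef, stats_set1), ratings_set1)
r_test = pooled_r(predict(coef, stats_set2), ratings_set2)  # SET 1 coefficients on SET 2
print(r_train, r_train ** 2, r_test, r_test ** 2)
```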

Loadings of linear regression coefficients obtained for individual rating attributes.

First, regression on SET 1 provided rating predictions with Pearson correlation to human ratings … (R² = 0.85), computed on all 30 ⋅ 10 = 300 values. Results for individual rating attributes are shown as blue bars in Figure … (R² = 0.55), as shown by the yellow bars in Fig.

Pearson correlation between the visual ratings (rows) and computational statistics (columns).

Fig. ^{2} score in Fig.

Although a linear model with an intercept was used, there is still a risk of collinearity between individual statistics. Therefore, we tested ridge regression as an alternative to linear regression, as it is more robust to collinearity among the independent variables (i.e., image statistics). We used ridge regression with an intercept and a regularization parameter obtained independently for each attribute. Unfortunately, the gain in predictive performance of this model was negligible.
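A minimal ridge-regression sketch with an unpenalized intercept (obtained by centering statistics and ratings) and a regularization parameter chosen separately per rating attribute. How the parameter was selected in the study is not stated; leave-one-sample-out error over a small hypothetical grid is assumed here. Because the ridge penalty depends on feature scale, standardizing the statistics first would usually be advisable.

```python
import numpy as np

def fit_ridge(stats, y, lam):
    """Ridge regression with an unpenalized intercept (features and target centered)."""
    x_mean, y_mean = stats.mean(axis=0), y.mean()
    Xc, yc = stats - x_mean, y - y_mean
    p = stats.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return b, w

def loo_error(stats, y, lam):
    """Leave-one-sample-out mean squared prediction error for a given lambda."""
    errs = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        b, w = fit_ridge(stats[keep], y[keep], lam)
        errs.append((y[i] - (b + stats[i] @ w)) ** 2)
    return float(np.mean(errs))

def choose_lambda(stats, y, grid=(1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0)):
    """Pick the lambda with the lowest LOO error for one rating attribute."""
    return min(grid, key=lambda lam: loo_error(stats, y, lam))

# Hypothetical placeholder data: lambda chosen independently for each attribute.
rng = np.random.default_rng(0)
stats = rng.normal(size=(30, 11))
ratings = stats @ rng.normal(size=(11, 10)) + rng.normal(size=(30, 10))
lams = [choose_lambda(stats, ratings[:, j]) for j in range(ratings.shape[1])]
print(lams)
```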

In the next subsections, we demonstrate how our statistical description performs in applied tasks related to inter- and intra-material comparison.

In this section, we compare the similarity of the thirty materials in both datasets using t-distributed stochastic neighbor embedding (

The descriptive performance of our statistics can also be used to assess the similarity of the same material observed under variable illumination and viewing conditions. One example of such a dataset is the bidirectional texture function (BTF) [

Maps of the preserved BTF images for two tested metrics and their three different thresholds preserving 2000, 100, and 10 images. The preserved images are shown as green dots in BTF space of 81 illumination (rows) and 81 viewing (columns) directions.
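To illustrate the idea of keeping only a subset of the 81 × 81 BTF images and substituting each removed image with a statistically similar one, here is a hypothetical sketch. The two reduction metrics used in the study are not described here; this sketch swaps in greedy farthest-point selection on standardized statistic vectors with Euclidean distance, which is an assumption rather than the paper's method.

```python
import numpy as np

def reduce_by_statistics(stats, k):
    """Hypothetical BTF reduction: keep k images and replace every other image
    by the most similar kept image in statistic space.

    stats: (n_images, n_statistics), e.g. 81 x 81 = 6561 rows for a BTF.
    Returns (kept, replacement): kept image indices and, for each image, the
    index of the kept image that stands in for it.
    """
    stats = np.asarray(stats, dtype=float)
    z = (stats - stats.mean(axis=0)) / stats.std(axis=0)  # no statistic dominates
    kept = [0]                                   # arbitrary starting image
    dist = np.linalg.norm(z - z[0], axis=1)      # distance to the nearest kept image
    replacement = np.zeros(len(z), dtype=int)
    while len(kept) < k:
        nxt = int(np.argmax(dist))               # image least well represented so far
        kept.append(nxt)
        d_new = np.linalg.norm(z - z[nxt], axis=1)
        closer = d_new < dist
        replacement[closer] = nxt
        dist = np.where(closer, d_new, dist)
    return np.array(kept), replacement

# Hypothetical placeholder statistics for 81 x 81 BTF images, reduced to 100 images.
rng = np.random.default_rng(0)
kept, replacement = reduce_by_statistics(rng.normal(size=(81 * 81, 11)), 100)
print(len(kept), len(np.unique(replacement)))
```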

Figure

A comparison of BTF renderings using all images with their reduced variants for three reduction thresholds (columns) and two different reduction metrics (rows), together with difference images from the reference.

Our statistical similarity representation has several advantages. First, it is fast to evaluate, as its values are global for the image and thus do not require pixel-wise comparison. Second, it is fully parametric, allowing comparison on only a selected subset of attributes. For example, we can disregard image directionality and compare similarity based only on the other statistics.
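Comparing on a subset of attributes amounts to masking statistics before computing the similarity. The sketch uses the same correlation-based similarity as the clustering step; the statistic names are hypothetical placeholders, since the full list of statistics is not reproduced here.

```python
import numpy as np

# Hypothetical statistic names for illustration only.
STATS = ["mean_luminance", "contrast", "skewness", "directionality"]

def subset_similarity(a, b, names, exclude=()):
    """Pearson r between two samples' statistic vectors using only the statistics
    whose names are not listed in `exclude`."""
    excluded = set(exclude)
    keep = [i for i, name in enumerate(names) if name not in excluded]
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.corrcoef(a[keep], b[keep])[0, 1])

a = [1.0, 2.0, 3.0, 10.0]
b = [2.0, 4.0, 6.0, -5.0]
print(subset_similarity(a, b, STATS))                               # all statistics
print(subset_similarity(a, b, STATS, exclude=("directionality",)))  # directionality ignored
```

At least two (in practice, several) statistics must remain after exclusion for the correlation to be meaningful.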

Our material dataset comprised a limited number of 30 wooden samples. Although the samples were carefully selected to span the variability within the category of wood materials, they are not sufficient to cover all the variability of natural woods. Additionally, our work reports results obtained for one initial selection of image statistics used as material texture similarity criteria. Although this selection showed promising descriptive and discriminative performance, we expect that it must be extended to better discriminate colorful materials and materials with multidirectional patterns, which are common in fabrics. We plan to extend the number of tested statistics to better represent human ratings. Concerning the application to BTF data reduction, we plan to improve its performance by introducing linear scaling when an image is replaced by its similar counterpart.

We plan to extend our analysis to a larger set of materials spanning different material categories, to identify a unified statistical description that acts effectively as a material visual fingerprint.

Source code for the statistics computation is available at

This paper investigates the extent to which basic image statistics can reproduce human ratings of visual attributes. We collected visual rating responses for two sets of 30 wood veneers shown as image sequences under variable illumination and viewing conditions. We selected a set of eleven image statistics that were averaged across all frames of the sequence. The similarity between human ratings, image statistics, and rating predictions obtained by linear regression on image statistics was analyzed using hierarchical clustering and correlations of ratings and similarity matrices. Our results suggest that basic image statistics perform well for both inter- and intra-material comparison, as demonstrated by comparing different materials using

This research has been supported by the Czech Science Foundation grant GA22-17529S. We thank Jiri Lukavsky and Filip Dechterenko from Psychology Institute of CAS for help with rating data collection in the online study. The authors have no competing interests to declare that are relevant to the content of this article.