1. Introduction
Naturalness is a key attribute that is used to assess the appearance of an object [15]. It is a complex attribute that involves the interaction of a multitude of other appearance attributes, mainly color, roughness, and gloss [15]. This complexity is evident when defining naturalness, which can be approached in two ways:
(1) Objects that have not been manipulated by humans, that is, possessing the quality of being in accordance with nature; for example, natural wood versus varnished wood [19].
(2) A close match between the understanding of a scene in the observer's mind, including (but not limited to) the materials comprising the scene and scene depth, and the observer's memory of such scenes and materials [12].
The main difference between these definitions is that the first is more restrictive than the second. While the former considers objects natural only if they are in their "raw" form, the latter relates naturalness more to the realism of the presented scene. These definitions reveal a layer of complexity in understanding the naturalness attribute and show how naturalness perception depends on an observer's preferences and familiarity with the scenes and materials involved.
Despite its complexity, naturalness remains an important attribute to render correctly for an accurate, high-quality reproduction [19, 34]. Studies have shown that observers react more favorably to natural-looking samples than to fake-looking replicas. This was the case in the study by Overvliet and Soto-Faraco [19], where observers deemed natural-looking wood samples more valuable than fake-looking samples, as well as in the study by Reinhard et al. [23], where a "natural look" was essential for the acceptance of eye prostheses by patients. Interestingly, naturalness judgment also applies to "unnatural" objects, such as 3D printed processed foods: Groot [10] showed that when 3D printed food looked more natural, its acceptance by observers was higher. Naturalness is also highly sought after in many other applications such as cultural heritage conservation and reproduction, interior decor and design, and art [15].
3D printing technology is a process in which three-dimensional objects are constructed layer by layer from digital design files. It offers many advantages over traditional manufacturing techniques, such as enabling the production of complex, customized objects at lower cost [6]. 3D printing is an umbrella term for many different techniques that use different materials and are employed for applications ranging from industrial manufacturing to artistic reproduction all the way to building construction [18]. Of these techniques, PolyJet 3D printing allows for color 3D printing. A PolyJet 3D printer jets UV-curable colored ink droplets from its printheads and cures (solidifies) them using UV light. The positioning of the ink droplets is determined by an error-diffusion color halftoning step that is intrinsic to the printing pipeline [38]. This technique is useful for an approach called graphical 3D printing: the reproduction of an object's appearance using 3D printing [31]. This involves matching the appearance attributes of the reproduced model with those of the original so that the reproduction looks and feels the same as the original object. As naturalness is a key attribute in appearance reproduction, it contributes significantly to the perceived quality of an object.
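The error-diffusion halftoning step mentioned above can be illustrated with a generic Floyd–Steinberg sketch for a single ink channel. The PolyJet pipeline's actual diffusion kernel and thresholds are not published here, so this is only a minimal stand-in showing the principle: each pixel is quantized to droplet/no-droplet, and the quantization error is pushed onto unprocessed neighbors.

```python
import numpy as np

def floyd_steinberg_halftone(channel: np.ndarray) -> np.ndarray:
    """Binary error-diffusion halftoning of one ink channel (values in [0, 1])."""
    img = channel.astype(float).copy()
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = 1.0 if old >= 0.5 else 0.0   # place an ink droplet or not
            out[y, x] = new
            err = old - new                    # quantization error
            # diffuse the error to unprocessed neighbors (Floyd-Steinberg weights)
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return out
```

Because the error is conserved rather than discarded, the local droplet density tracks the input tone, which is why error diffusion is favored over simple thresholding in print pipelines.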
Although progress has been made in understanding how humans perceive the naturalness of objects in 2D and 2.5D applications, research on naturalness perception for 3D printing is still in its early stages. This is also true for studies investigating the influence of algorithmically generated surface textures, printed at different elevations, on the perceived naturalness of 3D printed objects. Previous research has explored naturalness perception in various contexts, including 2D images [34] and 2.5D printed reliefs [14, 35]. These studies identified appearance attributes such as color, gloss, and roughness as important contributors to naturalness. These attributes were linked to technical variables that allow controlling an object's perceived naturalness, such as roughness, texture elevation [15, 16], and surface texture profile [35]. However, the challenges of achieving naturalness in 3D printed objects, specifically the complex interactions between elevation levels, texture roughness, and surface texture profiles, remain unexplored. An investigation into the inherent subjectivity of naturalness perception by observers, using adequate statistical models, is also missing from previous research.
This paper aims to fill these gaps in understanding and controlling naturalness perception in color 3D printing by investigating the influence of algorithmically generated surface textures and their elevation, and the interaction between them, on perceived naturalness. This study also employs advanced statistical methods to model observer subjectivity in naturalness perception. We hypothesize that image processing algorithms can be used to produce more meaningful surface texture profiles and to control surface features, such as roughness, as demonstrated by Wang et al. [35]. We expect these generated surface texture profiles to produce more natural-looking 3D printed samples, although this also depends on the texture elevation levels applied. Based on the findings of Kadyrova et al. [15], we hypothesize that lower elevation levels enhance perceived naturalness. Finally, we employ Bayesian analysis to reveal individual differences in naturalness perception that are not captured by the statistical methods used in the previous literature.
We investigate these hypotheses by applying novel texture extraction algorithms to reference 2D images to produce semantically meaningful 3D surface texture reproductions at different elevations. We then conduct a subjective experiment to evaluate the perceived naturalness of the 3D printed samples. Instead of relying on simple Mean Opinion Scores or z-scores, we use a Bayesian approach that accounts for individual observer differences and preferences, providing deeper insights into naturalness perception.
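To make the Bayesian approach concrete, the sketch below computes rating-category probabilities under a cumulative-normal (ordered probit) model with a per-observer shift on a latent naturalness scale. The effect names and cutpoint values here are illustrative assumptions, not the exact parameterization fitted in this study.

```python
import numpy as np
from scipy.stats import norm

def rating_probs(algorithm_effect, elevation_effect, observer_shift, cutpoints):
    """Probability of each ordinal rating (1..K) for one observer and one sample.

    The latent naturalness score is a sum of effects; `cutpoints` (length K-1,
    increasing) slice the latent scale into K rating categories. Effect names
    are illustrative, not the paper's exact model terms."""
    mu = algorithm_effect + elevation_effect + observer_shift
    edges = np.concatenate(([-np.inf], np.asarray(cutpoints, float), [np.inf]))
    # P(rating = k) is the normal probability mass between consecutive cutpoints
    return np.diff(norm.cdf(edges, loc=mu))

# example: an observer with a positive shift rates this sample more favorably
p = rating_probs(0.4, -0.2, 0.5, cutpoints=[-1.5, -0.5, 0.5, 1.5])
```

The observer-specific shift is what lets such a model separate genuine algorithm effects from individual rating tendencies, which simple Mean Opinion Scores conflate.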
The main contributions of this work are as follows:
(1) a novel application of texture extraction algorithms for generating displacement maps in color 3D printing;
(2) a rigorous subjective evaluation of the impact of these algorithms and varying elevation levels on perceived naturalness;
(3) the use of a Bayesian statistical approach to analyze ordinal naturalness ratings, providing a more nuanced understanding of observer preferences;
(4) identification of key texture features that significantly influence naturalness perception in 3D printed objects.
This article is organized as follows. A brief literature review of similar works is presented in Section 2. Section 3 details the methodology, explaining the generation of displacement maps, the design and printing of 3D models, and the assessment procedure. Results are presented in Section 4 and discussed in Section 5. Finally, Section 6 concludes the paper.
5. Discussion
Looking at Table III, we see that model performance varies considerably depending on the choice of included variables. Models with only a subset of variables perform poorly compared to those where all variables are accounted for. This validates our initial choice of variables: all chosen variables have an effect on naturalness rating, albeit to varying extents. Furthermore, the variables should not be treated independently, as evidenced by the Interaction model's superior performance compared to the Combined model. With the nested interaction among Algorithm, Image, and ID giving the highest predictive performance, we also validate our starting hypothesis that observer preference and image content affect naturalness perception. With this initial verification of our choice of variables and hypotheses, we proceed to extract data from the best-performing prediction model.
As shown in Figs. 7 and 8, both models perform well, and the results obtained from the subjective psychometric experiments can be used for further evaluation. The model predictions (yrep) are close to the observed data (y), both overall and per algorithm, so the results from these models can be trusted and used for further analysis of the subjective data. The cumulative normal curve (Fig. 7) is "right-leaning": lower categories had more counts than higher categories, indicating that over half of the 3D samples appeared unnatural to observers. The model estimates are generally in accordance with the observed data, although with some inconsistencies in the predictions. This may be because the ratings are inherently noisy or because some factors are still missing from the model. Nonetheless, it is still possible to rely on our model for further analysis. These prediction results confirm our first hypothesis concerning the ability to use more complex statistical tools for more in-depth analysis of naturalness perception.
To evaluate general trends and observer-specific preferences from the subjective experiment responses, we look at Figs. 9–11. From Fig. 9 we can see that, as a group, observers perceived the Gray DM output as the most natural and the Canny Edge output as the least natural of the seven DM algorithms. It is also apparent that observers preferred the lowest elevation (0.75 mm) for all algorithms except Gray DM, where the 1 mm elevation was slightly more favorable. Furthermore, the decrease in observer response scores between elevations was not consistent across algorithms. For example, the ForeFront algorithm shows the steepest drop, going from an average score of 3.59/5 at 0.75 mm to 2.49/5 at 1.5 mm, a 1.1-point decrease. This decline was less pronounced for some algorithms, notably Gray DM (mean score difference = 0.35) and 2bit_TAED (mean score difference = 0.7). Overall, there is a clear preference for Gray DMs and lower elevations for naturalness reproduction. These findings confirm our initial hypothesis that lower elevations are preferred for a more natural output. Furthermore, the choice of algorithm has a significant effect on the naturalness rating and, consequently, its perception. We therefore validate our starting hypotheses from the findings in Figs. 7–9.
These results represent the aggregate responses of all observers. However, the statistical models reveal that analyzing scoring at the individual level provides a deeper understanding of the scoring process. To test our final hypothesis, concerning the subjectivity of naturalness perception, we look at the findings from the nested models (see Table II): ImageNest, which accounts for the effect of image content on naturalness ratings, and IDNest, which accounts for observer preferences. Figs. 10 and 11 illustrate the rating shift per algorithm for observers and for the reference images used, respectively. The rating shift is the difference between an individual's score and the average score for the group. The group average is indicated by the dashed black line; thus, the rating shift depends on the positioning of an observer's scoring curve relative to that line. If an observer's curve for a given algorithm lies above the group average, the observer rated that algorithm higher than average, and vice versa. For instance, ID3 generally rated higher than the group (their rating curve consistently lies above the group average), while ID10 generally assigned lower ratings than the group. The same type of analysis can be conducted to see whether there is a preference for an image as well.
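As a simplified, purely descriptive analogue of the rating shift (in this study the shifts are estimated within the nested Bayesian models), the sketch below subtracts the group-average curve from each observer's per-algorithm mean scores; the numbers are invented for illustration.

```python
import numpy as np

def rating_shifts(scores: np.ndarray) -> np.ndarray:
    """Empirical rating shift per observer and algorithm.

    `scores` has shape (n_observers, n_algorithms), holding each observer's
    mean rating per algorithm. The shift is the observer's curve minus the
    group-average curve (the dashed black line in the figures); positive
    values mean the observer rates above the group."""
    group_mean = scores.mean(axis=0)
    return scores - group_mean

# hypothetical mean ratings for 3 observers across 3 algorithms
scores = np.array([[3.8, 2.1, 3.0],
                   [3.2, 1.8, 2.6],
                   [4.1, 2.7, 3.4]])
shifts = rating_shifts(scores)
```

Here the second observer's shift is negative for every algorithm and the third observer's is positive throughout, the same kind of consistent above/below-average behavior described for ID3 and ID10.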
Knowing this, we can see from Fig. 10 that no single trend describes all of the observers' naturalness ratings, as the rating shifts differ markedly from each other and from the average group rating. This demonstrates the impact of subjectivity on naturalness perception as defined in Section 1. It suggests that observers perceived the naturalness of the samples differently, likely relating naturalness to their individual preconceived notions of the materials and scenes depicted in the samples.
The same holds for images: Fig. 11 shows multiple curve structures across the different sets, indicating variability in image preferences as well. This difference in rating shifts does not appear to be separated by material category: for example, the Reddish Wood set was rated higher than average across most algorithms, but the Natural Wood set was rated lower than average. Consequently, the behavior of these rating shifts, for both observers and images, appears too subjective to be explained by a single trend across all algorithms. One consistent pattern does emerge, however: the Gray DM algorithm is rated the highest and Canny Edge the lowest, with the ForeFront and 2bit_TAED algorithms alternating between the second and third most preferred algorithms for naturalness reproduction. Despite the inherent subjectivity of human ratings, this trend suggests a general agreement across observers and images on the best-performing and worst-performing algorithms.
Therefore, to understand why observers preferred an algorithm or an image over others, we rely on the audio recordings of the sessions to examine each observer's thought process when judging a sample's naturalness and which appearance attributes influenced their categorization the most.
Analysis of the audio recordings revealed that participants used different definitions of naturalness, equating it either with the beauty of the sample or with realism. This is a potential point of contention: some observers were, in effect, rating a different attribute than others, since realism, beauty, and naturalness are distinct subjective concepts. Although observers defined naturalness differently, they primarily used the same descriptors, such as roughness (sometimes referred to as graininess), sharpness, and color, to assess a sample's naturalness. This indicates a common baseline among observers when rating the samples, even if it was expressed using different terminologies.
With this, we proceeded to identify the relevant appearance attributes using the descriptions given by the observers for each sample. Color was the first attribute assessed by nearly all the observers (similar to the findings of Kadyrova et al. [15]). This was deduced from descriptions of samples as "looking normal," "looking regular," or "having nothing weird" upon first receiving them. It is important to note that color is theoretically not a variable in the sample creation process, because all samples of the same set were produced with the same color input and with the surface profile as the only variable. Consequently, we anticipated that color may not be the primary determinant of naturalness perception. Indeed, for the majority of observers (12 out of 15), the defining factor was the surface finish (the physical texture reproduction) and its effect on the visual texture (color + gloss), as illustrated in Fig. 5. However, even with the same color input for all samples, we obtain varying color outputs for each sample, which explains the observers' reliance on color when rating a sample's naturalness.
Overall, observers described a natural surface finish as follows: smooth, with minimal roughness; having clear and smooth details and edges; having a physical texture that matches or enhances the visual texture (primarily gloss); and having the correct color (based on material comprehension), preferably with some color uniformity. In simpler terms, smooth surfaces and clear physical textures that align with the scene's visual texture were deemed natural. Conversely, what was deemed unnatural was the opposite: rough (also described as noisy or granular) samples where the roughness overshadowed the visual texture, either by creating unrealistic light reflections and/or by altering the color tint of the sample. For example, in Fig. 5, rough samples have a yellowish tint that most observers deemed unnatural. Another significant factor contributing to an unnatural appearance was edge and color sharpness. In essence, samples were deemed unnatural due to roughness, color, edge sharpness, and a discrepancy between visual and physical textures. However, this is a generalized view over all observers, and some may prefer certain attributes over others, which explains the individual differences seen in Fig. 10. For example, some observers might prefer a rougher or sharper sample, leading them to rate ForeFront-MAE higher than other observers do. Generally, though, these individual preferences tend to be overridden when the whole group is taken into account.
With this understanding of appearance preferences, we can evaluate the statistical model to interpret the results obtained for each of the algorithms. Referring to Figs. 2 and 9, the analysis follows.
DM1 performed the best. This is because it produces a full-range grayscale image (2⁸ = 256 levels), and the surface change between neighboring pixels is smooth, resulting in smooth surfaces and edges and a physical texture that is coherent with the visual texture. Although DM1 was evaluated as the best in this particular study, there may be many cases where it would not be the best option for reproducing the surface. As Eq. (1) shows, grayscale values depend on the color of the RGB reference image. Therefore, in cases where the foreground of an image is dark and the background is bright, an RGB-to-grayscale transformation alone would not suffice, because the placement would be inverted, requiring an additional step to correct the placement of objects in the image. However, the surface finish should be smooth enough to be perceived as natural regardless of the placement of the foreground and background. Testing this particularity would be an interesting direction for future work.
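A minimal sketch of such a Gray DM is shown below, using the common Rec. 601 luma weights as a stand-in for Eq. (1) (the exact weighting used in the pipeline may differ), with an optional inversion flag for the dark-foreground case just described.

```python
import numpy as np

def grayscale_displacement_map(rgb, max_elevation_mm=0.75, invert=False):
    """Gray DM: map an RGB reference image to per-pixel elevation.

    Uses the Rec. 601 luma weights as a stand-in for the paper's Eq. (1).
    `invert=True` handles the dark-foreground / bright-background case,
    where the raw grayscale would place objects at the wrong height."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    gray = (0.299 * r + 0.587 * g + 0.114 * b) / 255.0  # normalize to [0, 1]
    if invert:
        gray = 1.0 - gray
    return gray * max_elevation_mm  # per-pixel elevation in millimetres
```

Because neighboring pixels in a natural photograph usually have similar luma, the resulting elevation field varies gradually, which is consistent with the smooth surfaces observers favored.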
DM7 performed the worst. This is because it outputs a binary DM that, when applied, produces very rough and granular surfaces. This excessive roughness hides the details of the samples, makes them unpleasant to touch and look at, and creates incoherent physical and visual textures.
DM2 and DM5 perform relatively poorly compared to the other algorithms as elevation increases. This is because the ForeFront algorithms exhibit a drastic change in pixel value in certain parts of the DM, producing sharper samples than the others. They perform much worse at higher elevations because the higher the elevation, the sharper the edges become; consequently, ForeFront DMs appear unnatural at higher elevations. ForeFront-MAE has the added drawback of being rough, which explains its poorer performance compared to the regular ForeFront DM. However, these DMs reproduce a physical texture that is consistent with the visual texture, which is why they are rated higher than the Canny Edge DM.
DM3 and DM4 perform very poorly overall, particularly at high elevations. Similar to DM7, they generate very granular surfaces that obscure details and are generally perceived as unpleasant. However, when we go from 1bit_TAED to 2bit_TAED, where the grayscale range expands from 2¹ = 2 to 2² = 4 levels, the average score rises from 2.11 to 3.54, an increase of 1.43 score points. This corresponds to a shift in category from "unnatural" to "adequate," highlighting the crucial role of surface and edge smoothness in determining perceived naturalness.
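The effect of bit depth can be seen in a simple uniform quantizer: 1 bit leaves only the two extremes, while 2 bits add intermediate gray levels that soften surface transitions. This illustrates only the level count; the actual TAED thresholding and error-diffusion details may differ.

```python
def quantize_levels(value: float, bits: int) -> float:
    """Uniformly quantize a value in [0, 1] to 2**bits gray levels.

    1 bit yields {0, 1}; 2 bits yield {0, 1/3, 2/3, 1}. More levels mean
    smaller elevation steps between neighboring pixels, hence smoother
    printed surfaces."""
    n_levels = 2 ** bits
    return round(value * (n_levels - 1)) / (n_levels - 1)
```

For a mid-gray input of 0.4, the 1-bit quantizer collapses it to 0 (a full-height step against any bright neighbor), whereas the 2-bit quantizer maps it to 1/3, halving the worst-case step size.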
This assessment of appearance preferences also helps explain the observers' image-set preferences. By relating the rating shift to the ease of reproducing smooth edges and surfaces from the reference images, we can likely explain how observers rated the naturalness of the different image sets. For instance, comparing the Reddish Wood reference image with the Textured Stone Wall image (see Fig. 1), Reddish Wood has a very uniform surface and color, while the opposite holds for Textured Stone Wall. It is therefore much easier to achieve smooth surfaces and color uniformity with Reddish Wood than with Textured Stone Wall, where the edges between sections of the image are already highly pronounced. As can be seen in Fig. 11, Reddish Wood samples are consequently rated more natural overall, while Textured Stone Wall samples tend to be rated as unnatural.