3.
The significance of the evaluation responses for each pattern was tested with a t-test, after excluding outliers with the Smirnov–Grubbs test and checking the equality of variances with an F-test. Here, we assumed that the evaluation data were normally distributed. These tests were conducted separately for all 36 pattern pairs (9 × 8/2) for each evaluation index (faithfulness and preference). Of the 47 observers, seven (six males, one female) were excluded from the faithfulness data and six (three males, three females) were excluded from the preference data. The standard pattern was evaluated twice to confirm reproducibility, and there was no significant difference between the two sets of responses; therefore, we used the second scores of the standard pattern for the analysis. The intra-observer variances (calculated from the two evaluations of the standard pattern) and inter-observer variances (calculated from the evaluations of all projected patterns) were 0.35 and 1.16, respectively, for the faithfulness evaluation, and 0.40 and 1.25, respectively, for the preference evaluation.
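The testing procedure above (outlier exclusion, F-test, then t-test over every pattern pair) can be sketched in Python with only the standard library. This is a minimal illustration under stated assumptions, not the authors' implementation: it computes the Grubbs statistic, the F variance ratio, and the pooled (Student) or Welch t statistic, and enumerates the pattern pairs. Critical values and p-values, which would normally come from the t and F distributions (for example via `scipy.stats`), are left out; the `critical_value` callable is hypothetical, the Grubbs step is applied to a single list of ratings (how ratings were aggregated per observer before exclusion is not stated), and switching to Welch's test when variances differ is our assumption.

```python
import math
import statistics
from itertools import combinations

# The nine pattern labels that appear in this section (ordering is arbitrary).
PATTERNS = ["Std", "C-1", "C-2", "C-3", "L-1", "L-2", "B", "S-1", "S-2"]


def grubbs_statistic(ratings):
    """Return (G, index) for the rating farthest from the mean,
    where G = max|x_i - mean| / s and s is the sample standard deviation."""
    mean = statistics.mean(ratings)
    s = statistics.stdev(ratings)
    index = max(range(len(ratings)), key=lambda i: abs(ratings[i] - mean))
    if s == 0:
        return 0.0, index  # all ratings identical: nothing stands out
    return abs(ratings[index] - mean) / s, index


def exclude_outliers(ratings, critical_value):
    """Iteratively remove the most extreme rating while G exceeds the critical
    value. `critical_value` is a hypothetical callable n -> G_crit; in practice
    it would be derived from the t distribution at the chosen alpha."""
    kept = list(ratings)
    while len(kept) > 2:
        g, index = grubbs_statistic(kept)
        if g <= critical_value(len(kept)):
            break
        kept.pop(index)
    return kept


def f_ratio(a, b):
    """Larger sample variance over the smaller one: the F statistic
    for testing equality of variances."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)


def t_statistic(a, b, equal_var=True):
    """Two-sample t statistic: pooled (Student) if equal_var, else Welch."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    if equal_var:
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = math.sqrt(pooled * (1 / na + 1 / nb))
    else:
        se = math.sqrt(va / na + vb / nb)
    return (statistics.mean(a) - statistics.mean(b)) / se


# One test per unordered pair of patterns: 9 * 8 / 2 = 36 comparisons.
PATTERN_PAIRS = list(combinations(PATTERNS, 2))
```

For example, `len(PATTERN_PAIRS)` is 36, matching the 9 × 8/2 pattern pairs; each pair would be tested once for faithfulness and once for preference.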
3.1
Faithfulness Evaluation
First, we checked whether the faithfulness evaluation for each projection pattern agreed with the results in Ref. [17], because the observers in this study had different characteristics from those in the previous experiments. Figure 5(a) shows the average rating value (with the standard error among observers) for each pattern. For comparison, Fig. 5(b) shows the results of Ref. [17]. As shown in the figure, except for projection pattern S-2 (p < 0.05), no projection pattern showed a significant difference between the previous and present data. Unlike the subjects in the previous study, the subjects in the present study observed the actual starry sky on a daily basis, as described in Section 2. Their ratings may therefore have become more sensitive to the faithfulness of brighter stars, which may explain why the evaluation of pattern S-2 differed significantly. Even for S-2, the average rating values of the previous and present experiments were −1.0 and −0.3, respectively, so both showed the same negative tendency. These results confirm that the faithfulness ratings in this experiment are consistent with the results in Ref. [17].
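The error bars in Fig. 5 are standard errors among observers. A minimal sketch of how such a per-pattern summary could be computed, assuming the ratings are stored as one list of numeric scores per pattern label (the function names and data layout are illustrative, not taken from the study):

```python
import math
import statistics


def mean_and_standard_error(ratings):
    """Mean rating and standard error of the mean, SE = s / sqrt(n),
    where s is the sample standard deviation among observers."""
    return statistics.mean(ratings), statistics.stdev(ratings) / math.sqrt(len(ratings))


def summarize(ratings_by_pattern):
    """Map each pattern label to (mean, SE) of its observers' ratings."""
    return {pattern: mean_and_standard_error(r) for pattern, r in ratings_by_pattern.items()}
```

Because SE shrinks with the square root of the number of observers, groups of different sizes (such as the 33 male and 14 female observers in Section 3.4) will have noticeably different error bars even at similar spreads of opinion.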
Figure 5.
Faithfulness evaluation results for each projection pattern. (a) Present experiment. (b) Previous experiment.
As shown in Fig. 5, pattern S-1 received the highest faithfulness rating, and it differed significantly from all other patterns. In addition, a significant difference was confirmed between the Std pattern and pattern S-2, which had a negative rating value. On the other hand, apart from their difference from S-1, the C- and L-patterns did not differ significantly from any other pattern. These results indicate that the projection size significantly influenced the faithfulness evaluation. In contrast, the observers were highly tolerant of the unfaithfulness caused by the color-temperature shifts in the patterns prepared for this experiment, as these patterns did not differ significantly in rating.
We can summarize the conclusions on faithful reproduction as follows. First, the faithfulness rating improved for projection patterns with smaller stars. Second, the rating was markedly negative when the stars were larger than those in the standard pattern. Third, regarding shifts in the color temperature of the stars, the observers could distinguish the differences in color but did not give negative ratings to the color-changed patterns.
3.2
Preference Evaluation
In this subsection, we consider preferred reproduction. The preference evaluation was also performed by the 47 observers. Figure 6 shows, for each pattern, the average rating value with the standard error among the 41 observers remaining after outlier exclusion. Significant differences are indicated by symbols (*: p < 0.05, **: p < 0.01). As shown in Fig. 6, the Std pattern received the highest preference rating. There was a significant difference between patterns C-1 and C-2. In contrast, pattern L-1 received the lowest preference rating and differed significantly from all other patterns (p < 0.01). Meanwhile, the color-changed, size-changed, and brighter patterns, such as C-3, B, S-1, S-2, and L-2, received ratings similar to that of the highest-rated pattern Std, without a significant difference. These results indicate that a wide range of observers accepted the Std pattern, which was prepared to be perceptually equivalent to the appearance of the actual starry sky, as the preferred star reproduction. In contrast, darker star reproduction had a strong negative influence on the preference evaluation.
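The significance symbols in Fig. 6 follow a simple thresholding rule. A minimal sketch, assuming the pairwise p-values are available as a mapping from pattern pairs to p (the `marked_pairs` helper and its input layout are hypothetical):

```python
def significance_mark(p):
    """Symbol convention of Fig. 6: '**' for p < 0.01, '*' for p < 0.05."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


def marked_pairs(p_values):
    """Keep only the pattern pairs whose p-value earns a symbol.
    `p_values` maps (pattern_a, pattern_b) -> p."""
    return {pair: significance_mark(p) for pair, p in p_values.items() if significance_mark(p)}
```

Checking the stricter threshold first ensures that a p-value below 0.01 is marked "**" rather than "*".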
Figure 6.
Preference evaluation results of all of the observers for each projection pattern.
3.3
Relationship of the Evaluations between Faithfulness and Preference
In this subsection, we investigate the relationship between faithfulness and preference of stars in a planetarium.
3.3.1
Comparison of Evaluations for Faithfulness and Preference
Figure 7 summarizes the average rating values for faithful and preferred reproductions of each projection pattern. For patterns L-1, L-2, and S-2, the two evaluations differed significantly (p < 0.05). For the luminance variation patterns, the preference rating varied more widely than the faithfulness rating. Notably, compared with the faithfulness ratings, the preference ratings decreased when the stars were dark and increased when the stars were bright.
Figure 7.
Total average rating values of preference and faithfulness evaluations. (a) Std and pattern C. (b) Std and pattern L. (c) Std and pattern S.
For the size variation patterns, the faithfulness rating decreased monotonically as star size increased; however, the preference ratings of patterns S-1 and S-2 were almost the same. That is, although the image with larger stars was not faithful to the actual star field, it was evaluated as preferable. For the color patterns C, there was no significant difference in rating values between the faithful and preferred reproductions.
The faithfulness score for Std was close to the average over all patterns, but it was the second highest overall, behind pattern S-1. This is because the S-1 reproduction, with stars 2/3 the size of those in Std, resembled the star size in the genuine starry sky. However, because fewer stars were visible in the S-1 pattern than in the Std pattern, the preference score of S-1 decreased, and the Std pattern received the highest preference score.
3.4
Gender Evaluation Differences
The star-image analysis in Section 3.3.1 revealed differences between the faithfulness and preference evaluations for three luminance and size patterns (L-1, L-2, and S-2). We further found a characteristic gender difference in these evaluations.
Figure 8 shows the gender differences in the average rating values for the preference and faithfulness evaluations. The error bars represent the standard errors within each gender (male: 33 observers, female: 14 observers) for each pattern. As shown in Fig. 8(a), the faithfulness evaluation appears to vary between male and female observers; however, the difference was not statistically significant except for projection pattern C-2 (p < 0.05). This result for pattern C-2 is plausible because, according to Ref. [17], the evaluations of this pattern split into a positive and a negative group. For the preference evaluation, as shown in Fig. 8(b), the tendency of evaluation, positive (plus rating) or negative (minus rating), was consistent between male and female observers, and there was no statistically significant difference except for pattern L-2 (p < 0.05). Notably, the female observers rated high-brightness projection patterns such as L-2 and S-2 as highly preferred.
Figure 8.
Gender differences of average rating values for preference and faithfulness evaluations. (a) Faithfulness. (b) Preference.
Further, within each group of male and female observers, we investigated the relationship between the faithfulness and preference evaluations. Figure 9 shows the differences in rating values between the two evaluations. As shown in Fig. 9(a), for the male observers, there was no significant difference for any pattern. This suggests that the male observers regarded faithful reproduction as the preferred reproduction in a planetarium. In contrast, as shown in Fig. 9(b), the female observers had different impressions. Their evaluations differed significantly for two patterns (L-1 and L-2; p < 0.05): they preferred high-luminance reproduction to faithful reproduction. In other words, they expected the planetarium to reproduce brilliant stars.
Figure 9.
Relationships between faithfulness and preference evaluations for each gender. (a) Male. (b) Female.
Incidentally, we surveyed the observers' experience of astronomical observation with a questionnaire after the experiments. We analyzed data such as the years of astronomical observation experience, the frequency of observation, and the locations where the sky was observed, using several methods including correlation analysis and clustering. However, we found no relationship between the observers' ratings and their experience. We therefore concluded that the gender differences did not depend on experience in astronomical observation.
Figure 10.
Rating distributions of faithfulness and preference evaluations for total rating, color-changed patterns and Std, luminance-changed patterns and Std, and size-changed patterns and Std. (a) Results evaluated by the male observers. (b) Results evaluated by the female observers.
Figure 10 shows the rating distributions of the faithfulness and preference evaluations for each projection pattern for the male (Fig. 10(a)) and female (Fig. 10(b)) observers. (Figure A1 in the Appendix shows the gender difference in rating distributions between the faithfulness and preference evaluations for each pattern.) The size of each circle represents the answer ratio for each pattern. From the upper-left panel to the bottom-right panel, the panels show the total distribution for all patterns, Std and the color variation patterns, Std and the luminance variation patterns, and Std and the projection size variation patterns, respectively. Each panel also shows the Pearson correlation coefficient between the faithfulness and preference evaluations, together with its significance where applicable. The total distribution had a significant correlation (**: p < 0.01) for both genders; the correlation coefficient for the male observers (N = 33) was higher than that for the female observers (N = 14), even though there were more than twice as many male observers as female observers. Furthermore, the results of the male observers showed significant correlations for each variation pattern (p < 0.01). In contrast, for the female observers, a significant correlation was confirmed only for the size variation patterns (p < 0.05); in particular, the faithful and preferred reproductions were not correlated for the luminance variation patterns. These results support the results in Fig. 9; i.e., the faithful and preferred reproductions were consistent for the male observers, whereas for the female observers they differed, in particular with respect to luminance.
Figures 11(a) and (b) show the inter-participant correlation for each pattern, sorted in ascending order of correlation. As shown in the figure, for the male observers, a significant correlation between the faithfulness and preference evaluations is confirmed for all patterns. In contrast, for the female observers, there is no significant correlation for most patterns; in particular, no correlation can be confirmed for the L patterns. These results also support the finding that the faithful and preferred reproductions were consistent for the male observers, whereas for the female observers they differed with respect to luminance.
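The quantities plotted in Figs. 10 and 11 (answer ratios, Pearson correlation with its significance, and per-pattern correlations sorted in ascending order) can be sketched as follows. This is a minimal stdlib-only illustration, assuming each observer's faithfulness and preference ratings for a pattern are stored as two lists paired by observer, and reading "inter-participant correlation" as the Pearson r across participants for each pattern; all function names are hypothetical. The t statistic has n − 2 degrees of freedom, and its p-value would come from the t distribution (for example, `scipy.stats.pearsonr` reports both r and p).

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Pearson correlation of paired ratings (undefined if either list is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic with n - 2 degrees of freedom for testing r != 0."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)  # perfect correlation
    return r * math.sqrt((n - 2) / (1 - r * r))


def answer_ratios(faithfulness, preference):
    """Fraction of responses at each (faithfulness, preference) rating pair,
    i.e. the quantity encoded by circle size in a bubble plot."""
    counts = Counter(zip(faithfulness, preference))
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def correlations_sorted(ratings_by_pattern):
    """Per-pattern Pearson r between faithfulness and preference across
    participants, sorted in ascending order of r.
    `ratings_by_pattern` maps pattern -> (faithfulness_list, preference_list)."""
    result = [(pattern, pearson_r(f, p)) for pattern, (f, p) in ratings_by_pattern.items()]
    return sorted(result, key=lambda item: item[1])
```

Note that for a fixed r, the t statistic grows with n, so the same correlation coefficient is more likely to reach significance in the larger male group (N = 33) than in the female group (N = 14).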
Figure 11.
Inter-participant correlation per pattern. (a) Male. (b) Female.