Photo aesthetic quality prediction with machine learning techniques is an active yet challenging research topic. One of the most critical components of this task is obtaining reliable ground truth for photo aesthetic quality through psychophysical experiments. A common approach
is to use the average or the majority vote of all collected scores of a photo as its ground truth. However, such traditional approaches do not take into account the different levels of expertise of the experiment subjects. Furthermore, they tend to be unstable when the number of assessments
is small. In this paper, we propose a strategy that focuses on improving the reliability of the ground truth estimated from human-given photo aesthetic scores. Instead of simply calculating the majority vote score or average score of each photo, we adopt a generative Bayesian approach to simultaneously
infer each photo’s true aesthetic quality score, the difficulty of correctly assessing this photo, and each subject’s expertise. The statistical model fits into the expectation-maximization (EM) framework. This approach models the collected data with a discrete truncated Gaussian
distribution whose parameters represent the hidden ground truth score, the difficulty of correctly assessing each photo, and each subject’s expertise.
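
To make the modeling idea concrete, the following is a minimal sketch of how a single rating could be scored under a discrete truncated Gaussian. It is an illustration under stated assumptions, not the parameterization used in this paper: the function names, the 1-to-5 score scale, and in particular the choice of spread `sigma = difficulty / expertise` are hypothetical and chosen only so that harder photos and less expert subjects give noisier ratings.

```python
import math


def discrete_truncated_gaussian_pmf(mu, sigma, scores):
    """Gaussian density restricted to the allowed scores and renormalized.

    mu     : location parameter (here: the hidden true score of a photo)
    sigma  : spread parameter, must be > 0
    scores : list of allowed discrete scores, e.g. [1, 2, 3, 4, 5]
    """
    # Unnormalized log-weights; subtracting the max avoids underflow.
    log_w = [-0.5 * ((s - mu) / sigma) ** 2 for s in scores]
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


def rating_probability(rating, true_score, difficulty, expertise, scores):
    """Probability that a subject gives `rating` to a photo.

    ASSUMPTION (not from the paper): the spread grows with the photo's
    difficulty and shrinks with the subject's expertise.
    """
    sigma = difficulty / expertise
    pmf = discrete_truncated_gaussian_pmf(true_score, sigma, scores)
    return pmf[scores.index(rating)]


if __name__ == "__main__":
    scale = [1, 2, 3, 4, 5]  # assumed rating scale
    for expertise in (0.5, 1.0, 2.0):
        p = rating_probability(3, true_score=3.0, difficulty=1.0,
                               expertise=expertise, scores=scale)
        print(f"expertise={expertise}: P(rating=3) = {p:.3f}")
```

Under this assumed parameterization, a more expert subject concentrates probability mass on the true score. An EM procedure would alternate between estimating the hidden true scores given the current difficulty and expertise values and re-estimating those values given the scores.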