We propose novel techniques for evaluating perceived facial gloss across subjects with varying surface reflections. Given a database of facial skin images from multiple subjects, ordered by perceived gloss within each subject, we propose a head-tail selective comparison approach for ordering the entire database, where the head and tail are the least and most glossy images of each subject. We conducted a two-alternative forced-choice empirical study to compare facial gloss across subjects within each group. Using the gloss scores of the selected candidates and the gloss range of a reference subject, we fit each within-subject gloss range to a global gloss range and quantized the scores into distinct gloss levels. We then conducted a second empirical study to validate the quantized gloss levels. The results show that in 90% of cases the levels are consistent with human judgments. Based on the database with quantized gloss levels, we develop a max-margin learning model for facial skin gloss estimation. The model relies on gloss-related statistics extracted from surface and subsurface reflection images obtained using multimodal photography. The gloss level is predicted from the nearest neighbors under the learned scoring function. Performance tests show that the highest accuracy, 82%, is obtained when local statistics from both surface and subsurface reflections are combined.
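As an illustration of the range-fitting and quantization step, the following is a minimal sketch and not the procedure used in the study. It assumes that the fit is a linear rescaling between a subject's head and tail scores on the global scale, and that the gloss levels are equal-width bins. The function names, the global range of 0 to 10, and the choice of five levels are all hypothetical.

```python
import numpy as np


def fit_to_global_range(within_scores, head_global, tail_global):
    """Map one subject's within-subject scores onto the global gloss scale.

    Assumption: the subject's least glossy (head) and most glossy (tail)
    images already have global scores, and the mapping between them is linear.
    """
    s = np.asarray(within_scores, dtype=float)
    lo, hi = s.min(), s.max()
    if hi == lo:
        # Degenerate range: place every image at the midpoint.
        return np.full_like(s, (head_global + tail_global) / 2.0)
    return head_global + (s - lo) * (tail_global - head_global) / (hi - lo)


def quantize(global_scores, n_levels, g_min, g_max):
    """Quantize global scores into integer levels 1..n_levels.

    Assumption: equal-width bins over [g_min, g_max].
    """
    g = np.asarray(global_scores, dtype=float)
    edges = np.linspace(g_min, g_max, n_levels + 1)[1:-1]  # interior bin edges
    return np.digitize(g, edges) + 1


# Hypothetical subject: five images ranked 0..4 within the subject, whose
# head and tail images received global scores of 1.0 and 9.0.
fitted = fit_to_global_range([0, 1, 2, 3, 4], head_global=1.0, tail_global=9.0)
levels = quantize(fitted, n_levels=5, g_min=0.0, g_max=10.0)
```

A linear map preserves the within-subject ordering, so only the head and tail images of each subject need to be compared across subjects.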