<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         article-type="research-article"
         dtd-version="3.0"><front>
      <journal-meta>
         <journal-id journal-id-type="publisher-id">jpi</journal-id>
         <journal-title-group>
            <journal-title>Journal of Perceptual Imaging</journal-title>
            <abbrev-journal-title abbrev-type="IST">J. Percept. Imaging</abbrev-journal-title>
            <abbrev-journal-title abbrev-type="publisher">J. Percept. Imaging</abbrev-journal-title>
         </journal-title-group>
         <issn pub-type="epub">2575-8144</issn>
         <publisher>
            <publisher-name>Society for Imaging Science and Technology</publisher-name>
         </publisher>
      </journal-meta>
      <article-meta>
         <article-id pub-id-type="publisher-id">000407</article-id>
         <article-id pub-id-type="doi">10.2352/J.Percept.Imaging.2025.7.000407</article-id>
         <article-id pub-id-type="manuscript">0187</article-id>
         <article-categories><subj-group subj-group-type="article-type"><subject>Remote Research Special Issue</subject></subj-group>
         </article-categories>
         <title-group>
            <article-title>Uncovering Cultural Influences on Perceptual Image and Video Quality Assessment through Adaptive Quantized Metric Models</article-title>
            <alt-title alt-title-type="short">Uncovering cultural influences on perceptual image and video quality assessment through adaptive quantized metric models</alt-title>
         </title-group>
         <contrib-group content-type="all">
            <contrib contrib-type="author">
               <name>
                  <surname>Saupe</surname>
                  <given-names>Dietmar</given-names>
               </name>
               <xref ref-type="aff" rid="jpi0187af1"/>
               <xref ref-type="aff" rid="jpi0187em1"/>
            </contrib>
            <contrib contrib-type="author">
               <name>
                  <surname>Del Pin</surname>
                  <given-names>Simon Hviid</given-names>
               </name>
               <xref ref-type="aff" rid="jpi0187af2"/>
            </contrib>
            <aff id="jpi0187af1">Department of Computer and Information Science, University of Konstanz, Konstanz, Germany</aff>
            <aff id="jpi0187af2">Department of Computer Science, Norwegian University of Science and Technology, Gj&#x00F8;vik, Norway</aff>
            <ext-link id="jpi0187em1" ext-link-type="email">dietmar.saupe@uni-konstanz.de</ext-link>
            <author-comment content-type="short-author-list">
               <p>Saupe and Del Pin</p>
            </author-comment>
         </contrib-group>
         <pub-date pub-type="ppub">
            <month>4</month>
            <year>2025</year>
         </pub-date>
         <volume>8</volume>
         <issue seq="8">0</issue>
         <issue-id>JPI_SPECIAL_002</issue-id>
         <issue-title>Special Issue on Remote Research</issue-title>
         <fpage>1</fpage>
         <lpage>13</lpage>
         <history>
            <date date-type="received">
               <day>31</day>
               <month>3</month>
               <year>2024</year>
            </date>
            <date date-type="accepted">
               <day>4</day>
               <month>12</month>
               <year>2024</year>
            </date>
         </history>
         <permissions>
            <copyright-statement>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit <uri xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</uri>.</copyright-statement>
            <copyright-year>2025</copyright-year>
         </permissions>
         <abstract>
            <title>Abstract</title>
            <p>Evaluating perceptual image and video quality is crucial for multimedia technology development. This study investigated nation-based differences in quality assessment using three large-scale crowdsourced datasets (KonIQ-10k, KADID-10k, NIVD), analyzing responses from diverse countries including the US, Japan, India, Brazil, Venezuela, Russia, and Serbia. We hypothesized that cultural factors influence how observers interpret and apply rating scales like the Absolute Category Rating (ACR) and Degradation Category Rating (DCR). Our advanced statistical models, employing both frequentist and Bayesian approaches, incorporated country-specific components such as variable thresholds for rating categories and lapse rates to account for unintended errors. Our analysis revealed significant cross-cultural variations in rating behavior, particularly regarding extreme response styles. Notably, US observers showed a 35&#x2013;39% higher propensity for extreme ratings compared to Japanese observers when evaluating the same video stimuli, aligning with established research on cultural differences in response styles. Furthermore, we identified distinct patterns in threshold placement for rating categories across nationalities, indicating culturally influenced variations in scale interpretation. These findings contribute to a more comprehensive understanding of image quality in a global context and have important implications for quality assessment dataset design, offering new opportunities to investigate cultural differences difficult to capture in laboratory environments.</p>
         </abstract>
         <kwd-group>
            <kwd>image and video quality assessment</kwd>
            <kwd>absolute category ratings</kwd>
            <kwd>degradation category ratings</kwd>
            <kwd>category thresholds</kwd>
            <kwd>lapse rates</kwd>
            <kwd>extreme rating style</kwd>
            <kwd>statistical modeling</kwd>
            <kwd>maximum likelihood estimation</kwd>
            <kwd>method of successive intervals</kwd>
            <kwd>cumulative link mixed effects models</kwd>
            <kwd>crowdsourcing</kwd>
            <kwd>national differences</kwd>
         </kwd-group>
         <counts>
            <page-count count="13"/>
         </counts>
         <custom-meta-group>
            <custom-meta>
               <meta-name>ccc</meta-name>
               <meta-value>2575-8144/2025/7/000407/13/$00.00</meta-value>
            </custom-meta>
            <custom-meta>
               <meta-name>printed</meta-name>
               <meta-value>Printed in the USA</meta-value>
            </custom-meta>
         </custom-meta-group>
      </article-meta>
   </front>
   <body>
      <sec id="jpi0187us1">
         <label>1.</label>
         <title>Introduction</title>
         <p>Subjective image quality assessment involves observers rating sets of images, but beneath the surface lies a complex interplay of cultural influences on response styles. This study investigated cross-cultural differences in image quality assessment by examining whether observers from different countries demonstrate distinct tendencies when providing their ratings.</p>
         <p>Several phenomena may lead to variations in how people interpret and utilize discrete rating scales such as Absolute Category Rating (ACR) and Degradation Category Rating (DCR), which are ordinal scales with five categories ranging from &#x2018;bad&#x2019; to &#x2018;excellent&#x2019; for ACR and from &#x2018;imperceptible&#x2019; to &#x2018;very annoying&#x2019; for DCR.</p>
         <p>We developed and applied statistical models to explore nation-based differences in the use of the 5-level ACR and DCR scales for image and video quality assessment. Our study was based on data collected from observers of diverse countries who rated the same images or videos. Our objective was to uncover whether cultural nuances play a role in how observers tend to assign stimuli to given quality categories and to what extent extreme ratings are chosen.</p>
          <p>Many subjective image and video quality assessment studies have been carried out across several countries, either in different labs or on crowdsourcing platforms. The category labels for the subjects&#x2019; responses were either presented uniformly in English to participants from all countries or adapted to the respective languages. In either case, the interpretation of the category labels may depend on the cultural background of the participants.</p>
         <p>For example, an Italian observer might rate an image as &#x2018;mediocre&#x2019; (level 2) on the Italian language ACR scale shown in Table <xref ref-type="table" rid="jpi0187tabI">I</xref>, but rate the same image as &#x2018;fair&#x2019; (level 3) on an English language scale, despite the primary meaning of &#x2018;mediocre&#x2019; being &#x2018;poor&#x2019; (level 2). This is because &#x2018;mediocre&#x2019; can also be translated as &#x2018;moderate&#x2019; indicating &#x2018;average in quality&#x2019;, i.e., something that is neither particularly good nor particularly bad, which is just how &#x2018;fair&#x2019; quality can be defined.</p>
         <table-wrap id="jpi0187tabI">
            <label>Table&#x00A0;I.</label>
            <caption id="jpi0187tcI">
               <p>Graphical scaling for the CCIR (Consultative Committee on International Radio) quality scale terms in two populations with different languages. Data from&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib16">16</xref>].</p>
            </caption>
            <table frame="void">
               <colgroup>
                  <col align="center"/>
                  <col align="center"/>
                  <col align="center"/>
                  <col align="center"/>
                  <col align="center"/>
               </colgroup>
               <thead>
                  <tr>
                     <th align="center">ACR</th>
                     <th colspan="2" align="center">US</th>
                     <th colspan="2" align="center">Italy</th>
                  </tr>
                  <tr>
                     <th align="center"> Ordinal</th>
                     <th align="center">Name</th>
                     <th align="center">Value</th>
                     <th align="center">Name</th>
                     <th align="center">Value</th>
                  </tr>
               </thead>
               <tbody>
                  <tr>
                     <td align="center">5</td>
                     <td align="center">Excellent</td>
                     <td align="center">6.5 &#x00B1; 0.6</td>
                     <td align="center">Ottimo</td>
                     <td align="center">6.4 &#x00B1; 0.6</td>
                  </tr>
                  <tr>
                     <td align="center">4</td>
                     <td align="center">Good</td>
                     <td align="center">4.9 &#x00B1; 0.7</td>
                     <td align="center">Buono</td>
                     <td align="center">5.5 &#x00B1; 0.7</td>
                  </tr>
                  <tr>
                     <td align="center">3</td>
                     <td align="center">Fair</td>
                     <td align="center">3.5 &#x00B1; 0.8</td>
                     <td align="center">Discreto</td>
                     <td align="center">4.3 &#x00B1; 1.0</td>
                  </tr>
                  <tr>
                     <td align="center">2</td>
                     <td align="center">Poor</td>
                     <td align="center">1.4 &#x00B1; 0.6</td>
                     <td align="center">Mediocre</td>
                     <td align="center">1.9 &#x00B1; 1.5</td>
                  </tr>
                  <tr>
                     <td align="center">1</td>
                     <td align="center">Bad</td>
                     <td align="center">1.1 &#x00B1; 0.6</td>
                     <td align="center">Cattivo</td>
                     <td align="center">1.5 &#x00B1; 1.3</td>
                  </tr>
               </tbody>
            </table>
         </table-wrap><p>Thus, the interpretation of the terms for perceived quality on ACR and DCR scales can be influenced by language and culture. This was already investigated nearly 30 years ago in several studies using a technique known as graphic scaling. In&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib16">16</xref>], for example, subjects placed a marker for each of the terms on an interval scale with a length of 7.1 inches. Table&#x00A0;<xref ref-type="table" rid="jpi0187tabI">I</xref> shows abridged results, demonstrating that the terms are anchored at different positions by two study groups of US and Italian citizens. Moreover, the labeled positions of the ACR categories are not evenly distributed on the interval scale. Other studies have confirmed this, e.g., for the Dutch-language terms&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib29">29</xref>].</p>
         <p>Moreover, subjects from different cultural backgrounds may give different category ratings for the same stimulus, even when the perceived qualities are identical. For example, the chances for an image of very good quality to receive a rating &#x2018;excellent&#x2019; could be much larger when asking subjects from one country than from another one.</p>
          <p>Similarly, some people, due to cultural background or personal style, prefer choosing the most extreme option on the scale instead of more moderate middle responses. This is called an extreme response style. It means they are more likely to pick &#x2018;bad&#x2019; or &#x2018;excellent&#x2019; on the 5-point ACR scale rather than a mid-point response like &#x2018;fair&#x2019;&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib6">6</xref>, <xref ref-type="bibr" rid="jpi0187bib7">7</xref>, <xref ref-type="bibr" rid="jpi0187bib9">9</xref>].</p>
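          <p>As a minimal illustration of how an extreme response style can be quantified, the following sketch computes the share of endpoint responses (&#x2018;bad&#x2019; = 1 or &#x2018;excellent&#x2019; = 5) in a list of ACR ratings. The ratings shown are hypothetical, not taken from any of the datasets discussed here.</p>

```python
from collections import Counter

def extreme_response_share(ratings):
    """Share of responses at the endpoints (1 or 5) of a 5-point ACR scale."""
    counts = Counter(ratings)
    return (counts[1] + counts[5]) / len(ratings)

# Hypothetical ratings from two groups of observers viewing the same stimuli.
group_a = [1, 5, 5, 2, 3, 5, 1, 4, 5, 5]  # leans towards extreme responses
group_b = [3, 3, 4, 2, 3, 4, 3, 3, 2, 4]  # leans towards mid-scale responses
print(extreme_response_share(group_a))  # 0.7
print(extreme_response_share(group_b))  # 0.0
```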
          <p>An extreme response style is not inherently positive or negative. However, it can lead to biases when comparing research findings across different cultures. If a particular group consistently leans towards extreme responses, it could distort the perceived cultural differences, making them appear larger or smaller than they truly are. A thorough understanding of response styles is crucial for the accurate interpretation of results.</p>
          <p>Perceived image and video quality is inherently subjective and cannot be directly measured. We therefore rely on latent variable models to infer perceived quality from observable data like subjective ratings. These models assume that an underlying latent variable, representing the viewer&#x2019;s perceived visual quality of a stimulus, drives the observed ratings. This latent variable is influenced by objective factors like resolution or color fidelity. The observer&#x2019;s judgement can also be shaped by subjective factors, including individual preferences and cultural background. Our study focuses on uncovering how cultural influences affect these judgements, leading to systematic differences in how viewers from different cultures interpret and use rating scales.</p>
          <p>Our study is presented as follows. In the next section, we provide a brief overview of related work on cultural differences that focused on the use of rating categories. We then explain our main modeling tools, namely discrete models derived from quantized continuous models of perceived quality on a latent scale, which we adapt to examine national differences in rating behavior. We then explain the two computational approaches for these models, i.e., maximum likelihood estimation and cumulative link mixed effects models that are usually solved by Bayesian estimation. In Section&#x00A0;<xref ref-type="sec" rid="jpi0187us4">4</xref>, we present the previously published large datasets that we selected for our study and explain how we created more balanced subsets of them. Moreover, we provide details of the analysis of the complete datasets and their subsets using different models and reconstruction techniques. Section&#x00A0;<xref ref-type="sec" rid="jpi0187us5">5</xref> presents the computational results of our models, focusing on adaptive country-specific thresholds for the rating categories and probabilities for extreme ratings. Before concluding, we point out the limitations of our study.</p>
         <p>This article builds upon and extends our previous work presented in&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib23">23</xref>], &#x201C;National differences in image quality assessment: An investigation on three large-scale IQA datasets,&#x201D; at the 16th International Conference on Quality of Multimedia Experience (QoMEX 2024). In that study, we investigated nation-based differences in image and video quality assessment using large-scale crowdsourced datasets and adaptive quantized metric models. We explored country-specific variations in rating thresholds and extreme response styles using maximum likelihood estimation (MLE) on the full datasets. We extend that study to the present one in several key aspects. First, recognizing potential biases due to the unbalanced nature of the full datasets, we introduce carefully constructed balanced subsets of KonIQ-10k, KADID-10k, and NIVD. This allows for a more controlled comparison of rating behavior across countries. Second, in addition to MLE, we employ cumulative link mixed effects models (CLMMs) with Bayesian parameter estimation, offering an alternative robust and nuanced approach to analyzing ordinal rating data while accounting for dependencies within the data. Finally, we refine the taxonomy and presentation of the different modeling approaches, providing a clearer and more comprehensive overview of the methodologies employed.</p>
         <table-wrap id="jpi0187tabII">
            <table frame="void">
               <colgroup>
                  <col align="right"/>
                  <col align="left"/>
               </colgroup>
               <thead>
                  <tr>
                     <th align="right"/>
                     <th align="left">Frequently used acronyms</th>
                  </tr>
               </thead>
               <tbody>
                  <tr>
                     <td align="right">ACR</td>
                     <td align="left">Absolute category rating</td>
                  </tr>
                  <tr>
                     <td align="right">DCR</td>
                     <td align="left">Degradation category rating</td>
                  </tr>
                  <tr>
                     <td align="right">VAS</td>
                     <td align="left">Visual analog scale</td>
                  </tr>
                  <tr>
                     <td align="right">MOS</td>
                     <td align="left">Mean opinion score</td>
                  </tr>
                  <tr>
                     <td align="right">NIVD</td>
                     <td align="left">Netflix International Video Dataset</td>
                  </tr>
                  <tr>
                     <td align="right">KonIQ-10k</td>
                     <td align="left">Konstanz Image Quality Dataset</td>
                  </tr>
                   <tr>
                      <td align="right">KADID-10k</td>
                      <td align="left">Konstanz Artificially Distorted Image Quality Database</td>
                   </tr>
                  <tr>
                     <td align="right">MLE</td>
                     <td align="left">Maximum likelihood estimation</td>
                  </tr>
                  <tr>
                     <td align="right">CLMM</td>
                     <td align="left">Cumulative link mixed effects models</td>
                  </tr>
               </tbody>
            </table>
          </table-wrap>
      </sec>
      <sec id="jpi0187us2">
         <label>2.</label>
         <title>Related Work Regarding Cultural Differences in Perceptual Rating Categories</title>
          <p>Cultural effects may manifest in international image quality assessment studies using CCIR terms (Consultative Committee on International Radio, founded 1927). However, little work has been done to extract national differences. An international study&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib21">21</xref>] did not determine any apparent influence of language or culture on the Mean Opinion Score (MOS) from ACR of audio-visual stimuli.</p>
         <p>Scott et&#x00A0;al.&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib25">25</xref>] investigated how personality and cultural traits influenced the perception of multimedia quality. Their study used a dataset of 144 video sequences rated by 114 participants from diverse cultural backgrounds. Analysis showed that personality and cultural traits accounted for 9.3% of the variance in perceived quality, a significant proportion compared to system factors. Specifically, cultural dimensions like individualism, masculinity, uncertainty avoidance, and indulgence showed correlations with perceived quality and enjoyment, highlighting the impact of national cultural differences on subjective video quality experiences. The study underscored the importance of considering individual and cultural factors in multimedia quality assessments.</p>
          <p>Recently, Bampis et&#x00A0;al.&#x00A0;created a much larger video quality dataset (NIVD) by collecting ratings from 12,812 people in four countries&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib1">1</xref>]. The work focused on how different spatial video resolutions and screen sizes affected perceived quality. A few scatter plots indicated nation-based differences. The authors suggested developing better subject models to reduce cross-national biases, which would aggregate the data across countries appropriately. Their NIVD dataset has been made publicly available and has been used in our study.</p>
         <p>Extreme response styles can vary across cultures. For example, participants from individualistic cultures, like the US, are often more inclined towards extreme responses than those from collectivist cultures, like East Asian countries&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib4">4</xref>, <xref ref-type="bibr" rid="jpi0187bib5">5</xref>, <xref ref-type="bibr" rid="jpi0187bib10">10</xref>]. Even within the US, there are differences in extreme response styles among different ethnic groups&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib6">6</xref>, <xref ref-type="bibr" rid="jpi0187bib11">11</xref>]. In a study by Zax and Takahashi (1967), it was determined that US respondents were 41% more likely to select the extreme responses compared to Japanese respondents (19.2% versus 13.6% respectively). Conversely, Japanese respondents selected the neutral response 33% more frequently (23.2% versus 17.4%)&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib34">34</xref>].</p>
         <p>In another study by Chen, Lee, and Stevenson (1995), respondents from four cultures were found to make differential use of certain points on scales. Japanese and Chinese students were more likely than US and Canadian students to select midpoints; US students, more frequently than Japanese, Chinese, or Canadian students, selected the extreme values&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib4">4</xref>].</p>
         <p>The design of questionnaires can also impact the prevalence of extreme response styles. Adjustments such as modifying the number of response options, altering the phrasing of questions, or changing the response format can reduce a scale&#x2019;s sensitivity to a respondent&#x2019;s cultural inclinations&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib6">6</xref>, <xref ref-type="bibr" rid="jpi0187bib13">13</xref>].</p>
         <p>Given the large Japanese and American subsamples in the NIVD dataset, we focus our examination on whether these previously documented cultural differences in extreme response styles manifest in these video quality ratings.</p>
      </sec>
      <sec id="jpi0187us3">
         <label>3.</label>
         <title>Adaptive Quantized Metric Models</title>
         <p>By nature, perceived image or video quality is a latent variable. It cannot be measured directly, but must be inferred by a mathematical model from responses of subjects who judge the quality of the stimuli in an experiment. In these models, latent variables are commonly treated as continuous normally distributed variables. Such models were introduced by Thurstone in 1927&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib30">30</xref>] and are referred to as Thurstonian.</p>
          <p>Likert items, commonly used in research to collect subjective judgments, provide ordinal data that is often summarized, in a metric model, by per-item means and standard deviations. For example, the five ACR categories are commonly interpreted as the values 1, 2, &#x2026;, 5 on an interval scale, i.e., the categories are not only ordered but also have values that are evenly spaced. The mean opinion score (MOS) is the average of the collected ratings for a stimulus. It follows that the MOS is the maximum likelihood estimate (MLE) of the mean of the corresponding normally distributed random variable&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib18">18</xref>].</p>
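          <p>As a small numerical sketch with hypothetical ratings, the MOS of a stimulus under the metric model is simply the sample mean of its category values, which is also the MLE of the mean of the assumed normal latent variable:</p>

```python
import statistics

# Hypothetical ACR ratings (1 = 'bad' ... 5 = 'excellent') for one stimulus.
ratings = [3, 4, 4, 5, 3, 4, 2, 4]

# The metric model treats categories as equally spaced values, so the
# mean opinion score (MOS) is the sample mean of the ratings.
mos = statistics.mean(ratings)
sd = statistics.stdev(ratings)
print(f"MOS = {mos:.3f}, SD = {sd:.3f}")  # MOS = 3.625
```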
          <p>In a recent study, Liddell and Kruschke found for three top-tier journals in psychology that treating ordinal data as interval/ratio scale data is the rule rather than the exception&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib19">19</xref>]. However, this approach may lead to erroneous conclusions due to the inherent unequal distances between categories and the different variances of stimulus ratings. Metric models combined with statistical tests such as the t-test may fail to detect existing differences between stimulus qualities, lead to reversals in the ranking of quality estimates, and produce unreliable effect size estimates. The debate about the validity of applying metric models to discrete, categorical data is not new. It has been going on for decades in many areas of science, as elucidated by Seufert&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib26">26</xref>].</p>
          <p>As in psychology, the vast majority of data analyses for ACR/DCR data in quality of experience research to date have used the metric modeling approach, i.e., reporting the MOS values and occasionally the variances. In addition, such methods are recommended in the published standards of the International Telecommunication Union&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib14">14</xref>].</p>
         <p>In this study, we depart from this position and apply ordinal statistical models derived from quantized metric models, which are outlined in this section and elaborated in Sections&#x00A0;<xref ref-type="sec" rid="jpi0187us4-2">4.2</xref> and&#x00A0;<xref ref-type="sec" rid="jpi0187us4-3">4.3</xref>. We thus follow the conclusion of Liddell and Kruschke&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib19">19</xref>]: &#x201C;Because it is impossible to know in advance whether or not treating a particular ordinal dataset as metric would produce a different result than treating it as ordinal, we recommend that the default treatment of ordinal data should be with an ordinal model&#x201D;. Another, equally important reason is that our adaptive quantized metric models permit inclusion of country-specific components that can better explain the differences between groups than simply comparing their MOS.</p>
          <p>As an alternative to MOS, Liddell and Kruschke proposed the use of cumulative ordinal models. In these models, the continuous cumulative distribution function of perceived quality is evaluated at multiple threshold values, and the successive differences yield the modeled probabilities of the rating categories.</p>
          <p>This approach can also be described as a quantized metric model based on continuous distributions that model the perceived stimulus quality on the latent scale (Figures&#x00A0;<xref ref-type="fig" rid="jpi0187fig1">1</xref> and&#x00A0;<xref ref-type="fig" rid="jpi0187fig2">2</xref>). The probabilities for the rating categories are determined by quantizing the corresponding random variable using fitted thresholds. Used in this way, the fitted thresholds capture potential nonlinear associations between the ordinal data and the latent quality scale, providing a more accurate interpretation of the ordinal data.</p>
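          <p>A minimal sketch of this quantization step, assuming unit latent variance and hypothetical thresholds, computes the five category probabilities as successive differences of the normal cumulative distribution function:</p>

```python
from math import erf, sqrt

def norm_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def category_probabilities(mu, sigma, thresholds):
    """Quantize N(mu, sigma^2) into successive intervals separated by thresholds."""
    edges = [float("-inf")] + list(thresholds) + [float("inf")]
    return [norm_cdf(hi, mu, sigma) - norm_cdf(lo, mu, sigma)
            for lo, hi in zip(edges[:-1], edges[1:])]

# Hypothetical thresholds tau_1 < ... < tau_4 on the latent quality scale;
# mean 3.0 puts the largest probability mass on the middle ('fair') category.
probs = category_probabilities(mu=3.0, sigma=1.0, thresholds=[1.5, 2.5, 3.5, 4.5])
print([round(p, 3) for p in probs])
```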
         <fig id="jpi0187fig1"><label>Figure&#x00A0;1.</label>
            <caption id="jpi0187fc1">
                <p>The quantized metric model for perceived quality. The probabilities for the ACR ratings &#x2018;bad&#x2019; to &#x2018;excellent&#x2019; can be modeled in a two-stage process. First, the latent perceived quality is assumed to be a normally distributed random variable parameterized by its mean and variance. Second, the random variable is quantized into ACR categories that correspond to successive intervals on the quality scale and are separated by thresholds <italic>&#x03C4;</italic><sub>1</sub> &#x003C; &#x22EF; &#x003C; <italic>&#x03C4;</italic><sub>4</sub>. The probabilities of an ACR classification are indicated by the areas under the curve in the corresponding interval. Here, the mean value is 3.0 and the probability of a &#x2018;fair&#x2019; rating (3) is the highest.</p>
            </caption>
            <graphic id="jpi0187f1_online" content-type="online" xlink:href="jpi0187f1_online.jpg"/>
         </fig><fig id="jpi0187fig2"><label>Figure&#x00A0;2.</label>
            <caption id="jpi0187fc2">
               <p>The quantized metric model as viewed in cumulative models with random effects. The figure shows how a typical person from the US would rate a typical video in the NIVD dataset. For a concrete video stimulus, the distribution would be shifted left or right by the value of an appropriate intercept. The figure is based on code provided by&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib28">28</xref>].</p>
            </caption>
            <graphic id="jpi0187f2_online" content-type="online" xlink:href="jpi0187f2_online.jpg"/>
          </fig><p>There is a fundamental difference between a metric model and a quantized one: The metric model specifies the likelihood of a rating as the corresponding density value of the continuous distribution&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib14">14</xref>, <xref ref-type="bibr" rid="jpi0187bib18">18</xref>], while the cumulative ordinal model specifies the probability of an ACR-type rating as the integral of the density function over the interval corresponding to the rating. This integral is equal to the difference between the values of the cumulative distribution function at the boundaries of the interval.</p>
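          <p>This distinction can be made concrete with a small sketch, again with unit latent variance and hypothetical thresholds: for the same &#x2018;fair&#x2019; rating, the metric model evaluates the density at the category value, whereas the quantized model integrates the density over the category&#x2019;s interval.</p>

```python
from math import erf, exp, pi, sqrt

mu, sigma = 3.0, 1.0
thresholds = [1.5, 2.5, 3.5, 4.5]  # hypothetical tau_1 < ... < tau_4
rating = 3                          # observed ACR response 'fair'

# Metric model: likelihood is the density value at the category value.
density = exp(-0.5 * ((rating - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

# Quantized (cumulative ordinal) model: probability mass of the interval
# [tau_2, tau_3) that corresponds to the rating.
def cdf(x):
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

mass = cdf(thresholds[rating - 1]) - cdf(thresholds[rating - 2])
print(f"density = {density:.4f}, interval probability = {mass:.4f}")
```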
          <p>To account for the effect that the ACR categories may not be equally spaced on the quality scale, we let the thresholds define intervals of different widths. For the five categories this yields a sequence of five successive intervals that partition the real number line, as shown in Figures&#x00A0;<xref ref-type="fig" rid="jpi0187fig1">1</xref> and&#x00A0;<xref ref-type="fig" rid="jpi0187fig2">2</xref>. For a given number of observers and a set of stimuli, the corresponding statistical model is given by the mean and variance for each stimulus and by the list of thresholds, which act as intercepts in a cumulative model and separate the category intervals shown in the figures.</p>
          <p>The quantized metric model was introduced by Thurstone in his lectures in the framework of his Law of Categorical Judgement. It was first reported by Saffir in 1937&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib22">22</xref>] under the title Method of Successive Intervals. In the following years, a number of techniques were developed to solve the system of equations for the parameters that arises with the approach, the most prominent ones being least-squares methods. The standard reference is Torgerson&#x2019;s book&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib31">31</xref>]. (The Law of Categorical Judgement is more general by letting the thresholds be random variables instead of fixed numbers. However, this allows the order of the thresholds to vary, which complicates theory and algorithms.)</p>
          <p>A quantized metric model is probabilistic by definition and gives rise to two natural computational approaches to estimate its model parameters: maximum likelihood estimation (MLE) and Bayesian estimation. Only when electronic computing machinery became available did it become practical to use MLE to estimate the model parameters. Sch&#x00F6;nemann and Tucker were the first to develop this method, in 1967, including an implementation on an ILLIAC supercomputer&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib24">24</xref>]. In this study, we apply both estimation methods.</p>
         <p>The quantized metric model as applied for Bayesian estimation of cumulative models with random effects (Fig.&#x00A0;<xref ref-type="fig" rid="jpi0187fig2">2</xref>) is very similar to the standard one using MLE (Fig.&#x00A0;<xref ref-type="fig" rid="jpi0187fig1">1</xref>). The probability of observing a given ACR response is the probability of a value being drawn from the latent zero-mean distribution within that response&#x2019;s region. Several factors such as the stimulus and the subject for the rating may shift the mean (and additionally change the variance) of the continuous distribution. In contrast to the previous models, these effects are taken to be &#x2018;random&#x2019;, averaging to zero. Therefore, latent values are spread around zero. For example, the figure presents our model&#x2019;s estimates for the US, accounting for variability across videos and raters. It shows how a typical person from the US would rate a typical video in the NIVD dataset. For a concrete video stimulus, the distribution would be shifted left or right by the value of an appropriate intercept. The density plot shows latent values, and the bar graph shows response percentages, with a central tendency towards rating 3. This visualization highlights the model&#x2019;s ability to disentangle rating tendencies and make reliable cross-cultural inferences.</p>
         <p>Two recent articles have built on this approach to demonstrate how Bayesian cumulative link mixed models (CLMMs) can be applied to provide more principled norms from ordinal rating data&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib3">3</xref>, <xref ref-type="bibr" rid="jpi0187bib28">28</xref>]. CLMMs extend the basic cumulative link model by allowing random effects that capture dependencies in the data due to clustered observations (e.g. by participants or items). Taylor et al.&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib28">28</xref>] posited that CLMMs should be used to calculate rating norms from ordinal data, rather than taking means of the ratings directly. Their simulations showed that CLMMs can determine latent means and standard deviations for items in a way that is disentangled from overall response patterns and biases in the ratings.</p>
         <p>The CLMM framework offers additional flexibility to estimate discrimination (i.e. variance) parameters that allow item differences in latent variance as well as means&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib3">3</xref>]. CLMMs make fewer assumptions about the shape of the underlying latent distribution compared to traditional modeling approaches. Overall, CLMMs provide a powerful and flexible tool to analyze ordinal data, accounting for overall response patterns and dependencies to yield more appropriate item-level estimates&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib28">28</xref>]. Given the widespread collection and analysis of ordinal ratings across psychological research, these advantages of CLMMs represent an important methodological consideration.</p>
          <p>Similar cumulative models have been used in only a few studies to estimate the quality of experience (QoE). In&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib15">15</xref>, <xref ref-type="bibr" rid="jpi0187bib27">27</xref>], the effect of several factors such as channel bandwidth, link capacity, task content, user bias, and gender on QoE was studied. Another study&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib8">8</xref>] analyzed the non-linear usage of ACR scales using CLMMs, but did not investigate changes in rating thresholds.</p>
          <p>In this study, we applied quantized metric models to investigate potential nation-based differences in perceptual image and video quality assessment. Specifically, we fitted such models using maximum likelihood estimation or Bayesian hierarchical regression to incorporate country-specific components. People from different cultural or national backgrounds may associate the rating categories with different intervals on the scale of perceptual quality. Thus, our main mechanism to account for country-specific differences in rating behavior was to adapt the thresholds and intercepts for each country. In this approach, we assume that the quality of each image or video stimulus on the latent scale is a fixed value. The differences between countries in the adjusted thresholds then imply different probabilities for the ACR/DCR categories. In addition, we adapted other parameters in a similar way; for example, the variance parameter (dispersion) of the ratings was adapted per country.</p>
          <p>Extreme response style refers to a preference of individuals for choosing options at the extreme ends of rating scales, a preference influenced by cultural background and personal style. To examine country-specific extreme response styles, we extracted the probabilities of extreme ratings from the results of our models fitted to the data. We also compared the empirical proportions of extreme ratings between countries.</p>
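The empirical comparison of extreme ratings can be sketched as follows; the (country, rating) pairs below are invented for illustration, not taken from the datasets.

```python
from collections import defaultdict

# invented (country, ACR rating) pairs; the study used the real datasets
ratings = [("US", 5), ("US", 3), ("US", 1), ("JP", 3), ("JP", 2), ("JP", 3)]

counts = defaultdict(lambda: [0, 0])   # country -> [extreme, total]
for country, r in ratings:
    counts[country][0] += r in (1, 5)  # extreme categories of a 5-point scale
    counts[country][1] += 1

extreme_share = {c: ex / tot for c, (ex, tot) in counts.items()}
```

The same tallying yields, for each country, the proportion of ratings that fall into the two outermost categories.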
          <p>An additional, technical contribution is the adoption of a lapse rate. When the model parameters are reconstructed by MLE, a stimulus of high quality can yield a probability for the lowest category &#x2018;bad&#x2019; that is almost equal to zero. According to the model, a &#x2018;bad&#x2019; rating is therefore extremely unlikely. In practice, however, such ratings can occur if subjects are momentarily inattentive and make a wrong decision, or if they accidentally press the wrong answer key even though they had made a correct decision (a &#x2018;finger error&#x2019;). Such lapses distort the MLE of the model parameters and thereby impair the model quality. A lapse rate introduces a small prior probability for all categories, which is then combined with the evidence, i.e., the ratings in the experiment. This helps to mitigate the negative effects of lapses. Lapse rates are often used in cognitive science to fit models of psychometric functions&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib32">32</xref>], but have not yet been considered for reconstructions by MLE from ACR/DCR response data.</p>
      </sec>
      <sec id="jpi0187us4">
         <label>4.</label>
         <title>Materials and Methods</title>
         <p>This section details the datasets and the statistical models used to extract country-specific traits in image and video quality assessment.</p>
         <sec id="jpi0187us4-1">
            <label>4.1</label>
            <title>Datasets</title>
             <p>To ensure the statistical reliability of our results, we focused on three datasets with large numbers of ratings from a diverse range of countries. KonIQ-10k&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib12">12</xref>] and KADID-10k&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib20">20</xref>] were collected via crowdsourcing, attracting participants from over 70 countries, with the largest contributions coming from Russia (KonIQ-10k) and from Venezuela, Egypt, and India (KADID-10k). The NIVD dataset&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib1">1</xref>], by focusing on four key countries (Japan, Brazil, the US, and India) and having the largest population of observers, offered the greatest potential for a cross-cultural analysis. Table&#x00A0;<xref ref-type="table" rid="jpi0187tabIII">II</xref> lists the dataset summaries. The first two are image quality datasets; KonIQ-10k uses no-reference IQA with ACR, and KADID-10k uses full-reference IQA with DCR. NIVD is a video quality dataset assessed on a visual analog scale (VAS). The nationality was unknown for a few subjects, so we removed their ratings from the datasets.</p>
            <table-wrap id="jpi0187tabIII">
               <label>Table&#x00A0;II.</label>
               <caption id="jpi0187tcIII">
                  <p>Overview of datasets. The average number of ratings per image, subject, and country are given.</p>
               </caption>
               <table frame="void">
                  <colgroup>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="center"/>
                  </colgroup>
                  <thead>
                     <tr>
                        <th align="center">Dataset</th>
                        <th colspan="2" align="center">KonIQ-10k</th>
                        <th colspan="2" align="center">KADID-10k</th>
                        <th colspan="2" align="center">NIVD</th>
                     </tr>
                     <tr>
                        <th align="center">Reference/year</th>
                        <th colspan="2" align="center">[<xref ref-type="bibr" rid="jpi0187bib12">12</xref>]/2020</th>
                        <th colspan="2" align="center">[<xref ref-type="bibr" rid="jpi0187bib20">20</xref>]/2019</th>
                        <th colspan="2" align="center">[<xref ref-type="bibr" rid="jpi0187bib1">1</xref>]/2023</th>
                     </tr>
                     <tr>
                        <th align="center"> Rating type</th>
                        <th colspan="2" align="center">ACR</th>
                        <th colspan="2" align="center">DCR</th>
                        <th colspan="2" align="center">VAS</th>
                     </tr>
                     <tr>
                        <th align="center"></th>
                        <th align="center">Full set</th>
                        <th align="center">Subset</th>
                        <th align="center">Full set</th>
                        <th align="center">Subset</th>
                        <th align="center">Full set</th>
                        <th align="center">Subset</th>
                     </tr>
                  </thead>
                  <tbody>
                     <tr>
                        <td align="center">Images or videos</td>
                        <td align="center">10076</td>
                        <td align="center">168</td>
                        <td align="center">11085</td>
                        <td align="center">89</td>
                        <td align="center">1860</td>
                        <td align="center">1488</td>
                     </tr>
                     <tr>
                        <td align="center">Subjects</td>
                        <td align="center">1261</td>
                        <td align="center">351</td>
                        <td align="center">2212</td>
                        <td align="center">92</td>
                        <td align="center">12812</td>
                        <td align="center">12812</td>
                     </tr>
                     <tr>
                        <td align="center">Countries</td>
                        <td align="center">75</td>
                        <td align="center">2</td>
                        <td align="center">72</td>
                        <td align="center">2</td>
                        <td align="center">4</td>
                        <td align="center">4</td>
                     </tr>
                     <tr>
                        <td align="center">Ratings/stimulus</td>
                        <td align="center">107.0</td>
                        <td align="center">45.4</td>
                        <td align="center">35.3</td>
                        <td align="center">23.2</td>
                        <td align="center">265.3</td>
                        <td align="center">302.5</td>
                     </tr>
                     <tr>
                        <td align="center">Ratings/subject</td>
                        <td align="center">854.8</td>
                        <td align="center">21.7</td>
                        <td align="center">176.9</td>
                        <td align="center">22.4</td>
                        <td align="center">38.5</td>
                        <td align="center">35.1</td>
                     </tr>
                     <tr>
                        <td align="center">Ratings/country</td>
                        <td align="center">14372.8</td>
                        <td align="center">3810.5</td>
                        <td align="center">5435.8</td>
                        <td align="center">1031.5</td>
                        <td align="center">123368</td>
                        <td align="center">112543</td>
                     </tr>
                     <tr>
                        <td align="center">Ratings total</td>
                        <td align="center">1077960</td>
                        <td align="center">7621</td>
                        <td align="center">391376</td>
                        <td align="center">2063</td>
                        <td align="center">493472</td>
                        <td align="center">450172</td>
                     </tr>
                  </tbody>
               </table>
            </table-wrap><p>Table&#x00A0;<xref ref-type="table" rid="jpi0187tabIV">III</xref> provides a more detailed breakdown of the major contributing countries for each dataset. The &#x2018;Other&#x2019; category in this table represents the combined contributions from the remaining countries, which include various European nations, South American countries, and other regions of East Asia.</p>
            <table-wrap id="jpi0187tabIV">
               <label>Table&#x00A0;III.</label>
               <caption id="jpi0187tcIV">
                  <p>The countries with most ratings per dataset.</p>
               </caption>
               <table frame="void">
                  <colgroup>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="right"/>
                     <col align="right"/>
                     <col align="right"/>
                  </colgroup>
                  <thead>
                     <tr>
                        <th align="center">Dataset</th>
                        <th align="center">Country</th>
                        <th align="right">Subjects</th>
                        <th align="right">Stimuli</th>
                        <th align="right">Ratings</th>
                     </tr>
                  </thead>
                  <tbody>
                     <tr>
                        <td align="center"/>
                        <td align="center">  India</td>
                        <td align="right">359</td>
                        <td align="right">10074</td>
                        <td align="right">423400</td>
                     </tr>
                     <tr>
                        <td align="center">KonIQ-10k</td>
                        <td align="center">Venezuela</td>
                        <td align="right">212</td>
                        <td align="right">10074</td>
                        <td align="right">129236</td>
                     </tr>
                     <tr>
                        <td align="center">Full set</td>
                        <td align="center">Russia</td>
                        <td align="right">66</td>
                        <td align="right">9871</td>
                        <td align="right">62077</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  Serbia</td>
                        <td align="right">62</td>
                        <td align="right">9884</td>
                        <td align="right">49428</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  Other</td>
                        <td align="right">563</td>
                        <td align="right">10076</td>
                        <td align="right">413819</td>
                     </tr>
                     <tr>
                        <td align="center">KonIQ-10k</td>
                        <td align="center">India</td>
                        <td align="right">213</td>
                        <td align="right">168</td>
                        <td align="right">3940</td>
                     </tr>
                     <tr>
                        <td align="center">Subset</td>
                        <td align="center">Venezuela</td>
                        <td align="right">138</td>
                        <td align="right">168</td>
                        <td align="right">3681</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">Venezuela</td>
                        <td align="right">1332</td>
                        <td align="right">11085</td>
                        <td align="right">269923</td>
                     </tr>
                     <tr>
                        <td align="center">KADID-10k</td>
                        <td align="center">Egypt</td>
                        <td align="right">97</td>
                        <td align="right">5980</td>
                        <td align="right">17326</td>
                     </tr>
                     <tr>
                        <td align="center">Full set</td>
                        <td align="center">India</td>
                        <td align="right">83</td>
                        <td align="right">5854</td>
                        <td align="right">11784</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center"> Russia</td>
                        <td align="right">48</td>
                        <td align="right">5122</td>
                        <td align="right">9797</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  Other</td>
                        <td align="right">652</td>
                        <td align="right">11070</td>
                        <td align="right">82636</td>
                     </tr>
                     <tr>
                        <td align="center">KADID-10k</td>
                        <td align="center">Venezuela</td>
                        <td align="right">68</td>
                        <td align="right">89</td>
                        <td align="right">1271</td>
                     </tr>
                     <tr>
                        <td align="center">Subset</td>
                        <td align="center">Egypt</td>
                        <td align="right">24</td>
                        <td align="right">89</td>
                        <td align="right">792</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  Japan</td>
                        <td align="right">3298</td>
                        <td align="right">1860</td>
                        <td align="right">129244</td>
                     </tr>
                     <tr>
                        <td align="center">NIVD</td>
                        <td align="center">Brazil</td>
                        <td align="right">3264</td>
                        <td align="right">1860</td>
                        <td align="right">127720</td>
                     </tr>
                     <tr>
                        <td align="center">Full set</td>
                        <td align="center">US</td>
                        <td align="right">3287</td>
                        <td align="right">1860</td>
                        <td align="right">124308</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  India</td>
                        <td align="right">2963</td>
                        <td align="right">1860</td>
                        <td align="right">112200</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  Japan</td>
                        <td align="right">3298</td>
                        <td align="right">1488</td>
                        <td align="right">108164</td>
                     </tr>
                     <tr>
                        <td align="center">NIVD</td>
                        <td align="center">Brazil</td>
                        <td align="right">3264</td>
                        <td align="right">1488</td>
                        <td align="right">121620</td>
                     </tr>
                     <tr>
                        <td align="center">Subset</td>
                        <td align="center">US</td>
                        <td align="right">3287</td>
                        <td align="right">1488</td>
                        <td align="right">111328</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  India</td>
                        <td align="right">2963</td>
                        <td align="right">1488</td>
                        <td align="right">109060</td>
                     </tr>
                  </tbody>
               </table>
            </table-wrap><p>KonIQ-10k and KADID-10k were collected by crowdsourcing without restrictions. This means that subjects from any country were accepted as long as they met the qualification requirements. For both datasets, subjects from over 70 countries contributed. For many of these countries, only very few subjects are included in the dataset. In addition, there was no fixed number of stimuli that a respondent could rate. Therefore, the resulting ratings are not evenly distributed across the test subjects and countries.</p>
            <p>A key challenge in analyzing large-scale crowdsourced datasets like KonIQ-10k and KADID-10k is the sparse and uneven distribution of ratings. A single image may have received numerous ratings from one country but none from another, hindering reliable estimation of country-specific effects. To address this, we created balanced subsets focusing on a smaller set of images with more comparable numbers of ratings across selected countries. This balancing improves the statistical power for estimating country-specific parameters, enabling more robust cross-cultural comparisons.</p>
             <p>For this reason, we considered two approaches in our analysis of KonIQ-10k and KADID-10k, which differ in the scope and balance of the ratings between countries and images. In the first approach, we considered all available ratings. However, we focused on the four countries that provided the most ratings and grouped the remaining ratings into a fifth category labeled &#x2018;Other&#x2019;. Table&#x00A0;<xref ref-type="table" rid="jpi0187tabIV">III</xref> summarizes the resulting breakdown into five categories and shows that, even among the four countries with the most subjects and ratings, there are considerable differences in the numbers of ratings.</p>
             <p>Therefore, in our second approach, we limited the dataset to only two countries each for KonIQ-10k and KADID-10k in order to obtain a more balanced, albeit much smaller, subset. To this end, we applied the following criteria to the first dataset, KonIQ-10k.</p>
            <list list-type="arabic">
               <list-item>
                  <label>(1)</label>
                   <p><bold>Selection of countries.</bold> We identified the two countries with the most ratings: India and Venezuela. To ensure a balanced representation, we first selected the 4500 images with the most ratings from the country with the second-highest number of ratings (Venezuela). We then extracted the ratings for the same images from the country with the most ratings (India). In this way, we obtained a similar number of ratings for both countries.</p>
               </list-item>
               <list-item>
                  <label>(2)</label>
                   <p><bold>Balance.</bold> To create a balanced dataset, we attempted to source an equal proportion of ratings from India and Venezuela for each image. For each image, we calculated the total number of ratings and the number of ratings from India, and from these the proportion of ratings from India.</p>
               </list-item>
               <list-item>
                  <label>(3)</label>
                   <p><bold>Optimization.</bold> We defined an objective function as the absolute difference between the mean proportion of Indian ratings and 0.5 (the target value for a perfectly balanced dataset). Using generalized simulated annealing (package GenSA&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib33">33</xref>]), we optimized the selection of images to minimize this objective function. The optimization aimed for 200 images, but the result was a subset of 168 images with a mean proportion of Indian ratings close to 0.5.</p>
               </list-item>
               <list-item>
                  <label>(4)</label>
                  <p><bold>Final dataset.</bold> The optimized subset of 168 images, along with their respective ratings from India and Venezuela, formed the final balanced dataset for analysis.</p>
               </list-item>
            </list>
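The balancing objective from step (3) can be sketched as follows. The per-image rating counts are invented, and a brute-force search over small subsets stands in for the stochastic optimization (GenSA) used in the study.

```python
from itertools import combinations

def imbalance(subset, india, total):
    """Absolute deviation of the mean per-image proportion of Indian
    ratings from the balanced target 0.5 (the step-(3) objective)."""
    props = [india[i] / total[i] for i in subset]
    return abs(sum(props) / len(props) - 0.5)

# invented per-image rating counts for four candidate images
india = {0: 6, 1: 2, 2: 5, 3: 9}
total = {0: 10, 1: 10, 2: 10, 3: 10}

# brute-force search for the best 2-image subset
best = min(combinations(india, 2), key=lambda s: imbalance(s, india, total))
```

For realistically sized pools, enumerating all subsets is infeasible, which is why a stochastic optimizer over subsets was used instead.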
            <p>For the KADID-10k dataset, we proceeded similarly but achieved less balance between the countries. The resulting balanced subsets are also listed in Table&#x00A0;<xref ref-type="table" rid="jpi0187tabIV">III</xref>.</p>
             <p>In contrast to the first two datasets, the Netflix International Video Dataset (NIVD) was developed to capture country-specific differences by collecting an almost equal number of ratings from only four selected countries. The ratings in NIVD were acquired using the SAMVIQ scheme, i.e., a visual analog scale together with tick marks and the descriptive ACR labels positioned at 0, 25, 50, 75, and 100% of the interval scale.</p>
            <p>However, despite the continuous nature of the data collection on an interval scale, the resulting score distributions could not be considered normally distributed. This is evident from the overall histogram of all ratings together, which is shown in Figure&#x00A0;<xref ref-type="fig" rid="jpi0187fig3">3</xref>. This histogram shows pronounced peaks at positions 0, 25, 50, 75 and 100 percent of the VAS scale, indicating that the subjects generally preferred the ACR labels that were printed at these positions and gave a discrete ACR scale rating instead of a continuous interval scale rating.</p>
            <fig id="jpi0187fig3"><label>Figure&#x00A0;3.</label>
               <caption id="jpi0187fc3">
                  <p>Histogram of ratings in NIVD, showing the quantization of the percentages of the VAS to the five ACR categories.</p>
               </caption>
               <graphic id="jpi0187f3_online" content-type="online" xlink:href="jpi0187f3_online.jpg"/>
            </fig><p>Therefore, we quantized the continuous VAS scores into integer ACR scores, as shown in Fig.&#x00A0;<xref ref-type="fig" rid="jpi0187fig3">3</xref>, using thresholds midway between the tick marks, i.e., at 12.5, 37.5, 62.5 and 87.5 percent of the scale. We then applied the same methods of discrete data analysis as for the other two datasets.</p>
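The quantization step can be expressed compactly. This sketch assumes VAS scores in [0, 100] and assigns a score lying exactly on a threshold to the higher category; the text above does not specify the tie-breaking rule.

```python
import bisect

THRESHOLDS = [12.5, 37.5, 62.5, 87.5]  # midway between the VAS tick marks

def vas_to_acr(score):
    """Map a VAS score in [0, 100] to an ACR category in 1..5."""
    return bisect.bisect_right(THRESHOLDS, score) + 1

acr = [vas_to_acr(s) for s in (0, 25, 50, 75, 100)]  # the tick-mark positions
```

The five tick-mark positions map to the five ACR categories, and every other score falls into the category of its nearest tick mark.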
            <p>The NIVD dataset showed an excellent balance between the countries and the video stimuli. However, there were several videos with fewer ratings. Removing these stimuli and only keeping those with over 200 ratings created the balanced NIVD subset summarized in Table <xref ref-type="table" rid="jpi0187tabIV">III</xref>.</p>
            <p>To summarize, we compiled three large datasets, each in two versions. The first version consisted of the full datasets with grouped countries that submitted fewer ratings than the four most common ones. The second set consisted of subsets that were more balanced but much smaller. (The anonymized datasets are available, with annotations by subjects and their nationalities, at
<uri xlink:href="database.mmsp-kn.de/vqacountry-database.html">database.mmsp-kn.de/vqacountry-database.html</uri>.)</p>
             <p>For the analysis of ACR/DCR data, we applied MLE of the model parameters to the larger versions of the datasets. For the smaller, more balanced subsets, we applied CLMMs with Bayesian parameter estimation. Bayesian estimation for the models of the complete datasets, with more than 10,000 parameters, would have been too computationally intensive.</p>
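As a minimal sketch of MLE for a single stimulus under the quantized metric model with a lapse rate, the following mixes each category probability with a uniform lapse term before taking logs. The rating counts, thresholds, and lapse value are invented, and a coarse grid search stands in for a proper numerical optimizer.

```python
import math
from statistics import NormalDist

def neg_log_lik(psi, sigma, counts, thresholds, lapse=0.01):
    """Negative log-likelihood of ACR counts for one stimulus, with each
    category probability mixed with a uniform lapse term lambda/K."""
    K = len(counts)
    cdf = [NormalDist(psi, sigma).cdf(t) for t in thresholds]
    bounds = [0.0] + cdf + [1.0]
    return -sum(counts[k] * math.log((1 - lapse) * (bounds[k + 1] - bounds[k]) + lapse / K)
                for k in range(K))

counts = [1, 4, 20, 40, 15]        # invented ratings for one stimulus
thresholds = [1.5, 2.5, 3.5, 4.5]

# coarse grid search over (psi, sigma) in place of a real optimizer
grid = [(3.0 + 0.05 * i, 0.4 + 0.05 * j) for i in range(40) for j in range(20)]
psi_hat, sigma_hat = min(grid, key=lambda ps: neg_log_lik(*ps, counts, thresholds))
```

The lapse term keeps every category probability bounded away from zero, so a single stray &#x2018;bad&#x2019; rating does not dominate the log-likelihood.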
         </sec>
         <sec id="jpi0187us4-2">
            <label>4.2</label>
            <title>Adaptive Quantized Metric Model with Lapse Rate</title>
            <p>The common statistical models for the perceived quality of sensory stimuli assume a one-dimensional latent quality scale of real numbers that is shared by all subjects, but not directly observable. The actual responses in a subjective experiment are also influenced by the decisional process that is modulated by personal and cultural influence. In addition, a third layer given by errors in the physical action of communicating the decision by, e.g., a mouse click, may distort the decided rating (so-called finger errors or lapses).</p>
             <p>A stimulus <italic>j</italic> corresponds to a particular value <inline-formula><mml:math><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x03C8;</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:math></inline-formula> on the real latent quality scale. The quality as perceived by a subject is modeled by a random variable <italic>U</italic><sub><italic>j</italic></sub>. In the most basic model, <italic>U</italic><sub><italic>j</italic></sub> is chosen with a normal distribution centered at the latent quality value <italic>&#x03C8;</italic><sub><italic>j</italic></sub> and with a global variance <italic>&#x03C3;</italic><sup>2</sup> that applies to all stimuli. With this setting, we have <disp-formula id="jpi0187eqn1"><label>(1)</label><mml:math><mml:mrow><mml:msub><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x03C8;</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mi>W</mml:mi></mml:mrow></mml:math></disp-formula> where <italic>U</italic><sub><italic>j</italic></sub> is the random variable producing the observed opinion score for stimulus <italic>j</italic>, <italic>W</italic> &#x223C; <italic>N</italic>(0,1) is a standard Gaussian random variable, and <italic>&#x03C3;</italic> &#x003E; 0 is the standard deviation of <italic>U</italic><sub><italic>j</italic></sub>, determining its spread.</p>
            <p>To account for the finite discrete nature of ACR-type data with <italic>K</italic> = 5 categories, we sort the real values of <italic>U</italic><sub><italic>j</italic></sub> into <italic>K</italic> successive intervals. For this purpose, we introduce a monotonic sequence of thresholds <italic>&#x03C4;</italic> = (<italic>&#x03C4;</italic><sub>0</sub>, &#x2026;, <italic>&#x03C4;</italic><sub><italic>K</italic></sub>), <disp-formula id="jpi0187eqn2"><label>(2)</label><mml:math><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>&#x221E;</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x03C4;</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>&#x003C;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x03C4;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x003C;</mml:mo><mml:mo>&#x22EF;</mml:mo><mml:mo>&#x003C;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x03C4;</mml:mi></mml:mrow><mml:mrow><mml:mi>K</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x003C;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x03C4;</mml:mi></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>&#x221E;</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula> and define the quantization function <inline-formula><mml:math><mml:mrow><mml:msub><mml:mrow><mml:mi>Q</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x03C4;</mml:mi></mml:mrow></mml:msub><mml:mo>:</mml:mo><mml:mspace width="1em"/><mml:mi mathvariant="double-struck">R</mml:mi><mml:mo>&#x2192;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mi>K</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> by <disp-formula 
id="jpi0187eqn3"><label>(3)</label><mml:math><mml:mrow><mml:msub><mml:mrow><mml:mi>Q</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x03C4;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>k</mml:mi><mml:mspace width="1em"/><mml:mo>&#x21D4;</mml:mo><mml:mspace width="1em"/><mml:msub><mml:mrow><mml:mi>&#x03C4;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2264;</mml:mo><mml:mi>u</mml:mi><mml:mo>&#x003C;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x03C4;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula> Given a metric model for the <italic>j</italic>-th stimulus in the form of a continuous random variable <italic>U</italic><sub><italic>j</italic></sub>, we define the corresponding quantized metric model by the discrete random variable <italic>Q</italic><sub><italic>&#x03C4;</italic></sub>(<italic>U</italic><sub><italic>j</italic></sub>). In addition to the quantization, we take into account a small lapse rate 0 &#x2264; <italic>&#x03BB;</italic> &#x226A; 1. 
This yields a discrete random variable <italic>V</italic><sub><italic>j</italic></sub> that determines the probability of a rating for category <italic>k</italic> as <disp-formula id="jpi0187eqn4"><label>(4)</label><mml:math><mml:mrow><mml:mo>Pr</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03BB;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x03C8;</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>&#x03C3;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x03C4;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x03C8;</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>&#x03C3;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x03C4;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x03BB;</mml:mi></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula> where <italic>G</italic><sub><italic>&#x03C8;</italic><sub><italic>j</italic></sub>, <italic>&#x03C3;</italic></sub> denotes the Gaussian cumulative distribution function with mean
<italic>&#x03C8;</italic><sub><italic>j</italic></sub> and variance <italic>&#x03C3;</italic><sup>2</sup>. Thus, with zero lapse rate, Pr[<italic>V</italic><sub><italic>j</italic></sub> = <italic>k</italic>] is just the area under the Gaussian between the thresholds <italic>&#x03C4;</italic><sub><italic>k</italic>&#x2212;1</sub> and <italic>&#x03C4;</italic><sub><italic>k</italic></sub>, as shown in Fig.&#x00A0;<xref ref-type="fig" rid="jpi0187fig1">1</xref>.</p>
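<p>As an illustration, the category probabilities of Equation (4) can be computed directly from the model parameters. The following Python sketch is not part of the original implementation (the paper used MATLAB); it merely restates the formula, using the error function for the Gaussian CDF.</p>

```python
import math

def gauss_cdf(x, mu, sigma):
    """Gaussian cumulative distribution function G_{mu,sigma}(x)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def category_probs(psi, sigma, tau, lam, K=5):
    """Pr[V_j = k] for k = 1..K under the quantized metric model with lapse rate.

    tau lists the K-1 finite thresholds tau_1 < ... < tau_{K-1};
    tau_0 = -inf and tau_K = +inf are added implicitly.
    """
    bounds = [-math.inf, *tau, math.inf]
    return [(1.0 - lam) * (gauss_cdf(bounds[k], psi, sigma)
                           - gauss_cdf(bounds[k - 1], psi, sigma)) + lam / K
            for k in range(1, K + 1)]
```

<p>By construction the five probabilities sum to one: the Gaussian areas sum to 1 &#x2212; <italic>&#x03BB;</italic> after scaling, and the lapse term contributes <italic>&#x03BB;</italic>/<italic>K</italic> per category.</p>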
            <p>The above model cannot yet distinguish between ratings from different nationalities. To achieve this, we estimated the following parameters separately for each country: the rating spread <italic>&#x03C3;</italic>, the lapse rate <italic>&#x03BB;</italic>, and the category thresholds <italic>&#x03C4;</italic><sub>1</sub>, &#x2026;, <italic>&#x03C4;</italic><sub>4</sub>. Thus, the total number of parameters was equal to the number of stimuli plus six times the number of countries.</p>
            <p>For optimization, we applied an interior point algorithm, implemented in the MATLAB function <italic>fmincon</italic>.</p>
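<p>For readers working outside MATLAB, the objective that <italic>fmincon</italic> minimizes can be sketched as the joint negative log-likelihood over all ratings, with the spread, lapse rate, and thresholds indexed by country. This Python fragment is a hypothetical analogue, not the authors' code; the parameter names and data layout are assumptions. In practice it would be handed to a bound-constrained optimizer together with monotonicity constraints on the thresholds.</p>

```python
import math

def gauss_cdf(x, mu, sigma):
    # Gaussian CDF via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def neg_log_likelihood(psi, country_params, ratings, K=5):
    """Joint negative log-likelihood of the country-adapted model.

    psi            : dict stimulus -> latent quality value
    country_params : dict country -> (sigma, lam, (tau_1, ..., tau_4))
    ratings        : iterable of (stimulus, country, category k in 1..K)
    """
    nll = 0.0
    for j, c, k in ratings:
        sigma, lam, tau = country_params[c]
        bounds = [-math.inf, *tau, math.inf]
        area = (gauss_cdf(bounds[k], psi[j], sigma)
                - gauss_cdf(bounds[k - 1], psi[j], sigma))
        nll -= math.log((1.0 - lam) * area + lam / K)
    return nll
```

<p>Each observed rating contributes one log-probability term; the optimizer varies the stimulus scale values and the six per-country parameters simultaneously.</p>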
         </sec>
         <sec id="jpi0187us4-3">
            <label>4.3</label>
            <title>Cumulative Link Mixed Effects Models</title>
            <p>For the larger datasets KonIQ-10k and KADID-10k, we had over 10,000 parameters to estimate from roughly half a million to a million ratings. For problems of this size, Bayesian estimation takes a very long time (several days on a personal computer or laptop). Therefore, we carried out the Bayesian estimation only on the smaller balanced subsets.</p>
            <p>Cumulative ordinal models are designed for ordered categorical data like ratings, where the intervals between categories may not be equal. They model the cumulative probability of a response being at or below a certain category, e.g., the probability of a rating being &#x2018;poor&#x2019; (2) or &#x2018;bad&#x2019; (1). This approach respects the ordered nature of the data without assuming equal spacing between categories, unlike metric models. In our Bayesian framework, we used these models to estimate the probabilities of each rating category and the thresholds separating them on the underlying latent scale.</p>
            <p>The hierarchical CLMM also evaluates group-level effects, which include random intercepts for items, permitting each item to have its own distribution along the latent dimension. Additionally, it can incorporate random intercepts for raters, addressing individual biases in how raters map their assessments onto the ordinal scale.</p>
            <p>By modeling these group-level effects, the hierarchical CLMM accounts for variability from the stimuli and raters, enhancing the accuracy of threshold estimates. This facilitates reliable inferences about differences in how the ordinal rating scale is interpreted across groups.</p>
            <p>We utilized the Bayesian BRMS&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib2">2</xref>] package for R to fit CLMMs to the ordinal rating data of the smaller data subsets. CLMMs are hierarchical and can account for dependencies and variability in the data due to clustered observations, such as multiple ratings from the same subject, the same country, or for the same image/video.</p>
            <p>For the balanced KonIQ-10k and KADID-10k subsets, the models were defined as: <disp-formula id="jpi0187ueqn1"><mml:math><mml:mrow><mml:mtext>rate&#x00A0;</mml:mtext><mml:mo>|</mml:mo><mml:mtext>thres&#x00A0;</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>4</mml:mn><mml:mo>,</mml:mo><mml:mtext>gr&#x00A0;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>country</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x223C;</mml:mo><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>|</mml:mo><mml:mtext>image</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula> These models estimated four category thresholds per country and incorporated random intercepts for each image, accounting for variations in the perceived quality of different images. We estimated random effects only for images because each individual rater contributed only a few ratings.</p>
            <p>For the balanced NIVD subset, the CLMM was defined as: <disp-formula id="jpi0187ueqn2"><mml:math><mml:mrow><mml:mtext>rate&#x00A0;</mml:mtext><mml:mo>|</mml:mo><mml:mtext>thres&#x00A0;</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>4</mml:mn><mml:mo>,</mml:mo><mml:mtext>gr=country</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x223C;</mml:mo><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>|</mml:mo><mml:mtext>video</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>|</mml:mo><mml:mtext>subject</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula> This model estimated four category thresholds (separating the five rating levels) for each country. It also included random intercepts for both videos and raters, allowing for variation in rating tendencies across different videos and individual raters. Note that the total number of parameters, even for this slightly smaller balanced subset, is larger than 14,000. Therefore, the computations took exceptionally long, nearly two days.</p>
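<p>To make the cumulative model concrete: with its default logit link, brms turns the country-specific thresholds and the random intercepts into category probabilities. The Python sketch below only illustrates that likelihood; it does not call brms, and the logit link is an assumption about the default ordinal family.</p>

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def clmm_category_probs(thresholds, eta):
    """Category probabilities of a cumulative logit model.

    thresholds : country-specific tau_1 < ... < tau_4 on the latent scale
    eta        : linear predictor, e.g. the sum of the random
                 intercepts for the video and the subject
    """
    cum = [inv_logit(t - eta) for t in thresholds] + [1.0]  # Pr[rate <= k]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, 5)]
```

<p>In this parameterization, Pr[rate = 5] exceeds 50% exactly when the latent quality <italic>&#x03B7;</italic> exceeds the top threshold <italic>&#x03C4;</italic><sub>4</sub>, which is how the country-specific threshold estimates can be read.</p>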
         </sec>
      </sec>
      <sec id="jpi0187us5">
         <label>5.</label>
         <title>Results</title>
         <sec id="jpi0187us5-1">
            <label>5.1</label>
            <title>Data Analysis of the Original Datasets using Maximum Likelihood Estimation</title>
            <p>The results of the data analysis using the quantized metric models with successive intervals are shown in Table&#x00A0;<xref ref-type="table" rid="jpi0187tabV">IV</xref> and Figure&#x00A0;<xref ref-type="fig" rid="jpi0187fig4">4</xref>. The scale values for the stimuli were also estimated, but are not shown here to keep the focus on the country-specific differences.</p>
            <table-wrap id="jpi0187tabV">
               <label>Table&#x00A0;IV.</label>
               <caption id="jpi0187tcV">
                  <p>Results for the full datasets with 95% confidence intervals, compare with Figure&#x00A0;<xref ref-type="fig" rid="jpi0187fig4">4</xref>. The most important results are the category thresholds that define the intervals on the latent quality scale corresponding to the five categories.</p>
               </caption>
               <table frame="void">
                  <colgroup>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="center"/>
                  </colgroup>
                  <thead>
                     <tr>
                        <th align="center">Dataset</th>
                        <th align="center">Country</th>
                        <th align="center">Std deviation</th>
                        <th align="center">Lapse rate</th>
                        <th colspan="4" align="center">Category thresholds</th>
                     </tr>
                     <tr>
                        <th align="center"/>
                        <th align="center"/>
                        <th align="center"><italic>&#x03C3;</italic></th>
                        <th align="center"><italic>&#x03BB;</italic></th>
                        <th align="center"><italic>&#x03C4;</italic><sub>1</sub></th>
                        <th align="center"><italic>&#x03C4;</italic><sub>2</sub></th>
                        <th align="center"><italic>&#x03C4;</italic><sub>3</sub></th>
                        <th align="center"><italic>&#x03C4;</italic><sub>4</sub></th>
                     </tr>
                  </thead>
                  <tbody>
                     <tr>
                        <td align="center"/>
                        <td align="center">  India</td>
                        <td align="center">0.5050&#x2009;&#x00B1;&#x2009;0.0016</td>
                        <td align="center">0.0039&#x2009;&#x00B1;&#x2009;0.0004</td>
                        <td align="center">1.3867&#x2009;&#x00B1;&#x2009;0.0071</td>
                        <td align="center">2.3608&#x2009;&#x00B1;&#x2009;0.0028</td>
                        <td align="center">3.4061&#x2009;&#x00B1;&#x2009;0.0022</td>
                        <td align="center">4.6590&#x2009;&#x00B1;&#x2009;0.0087</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">Venezuela</td>
                        <td align="center">0.4179&#x2009;&#x00B1;&#x2009;0.0022</td>
                        <td align="center">0.0078&#x2009;&#x00B1;&#x2009;0.0011</td>
                        <td align="center">1.6998&#x2009;&#x00B1;&#x2009;0.0086</td>
                        <td align="center">2.5069&#x2009;&#x00B1;&#x2009;0.0042</td>
                        <td align="center">3.2330&#x2009;&#x00B1;&#x2009;0.0033</td>
                        <td align="center">4.1030&#x2009;&#x00B1;&#x2009;0.0064</td>
                     </tr>
                     <tr>
                        <td align="center">KonIQ-10k</td>
                        <td align="center">Russia</td>
                        <td align="center">0.3813&#x2009;&#x00B1;&#x2009;0.0030</td>
                        <td align="center">0.0038&#x2009;&#x00B1;&#x2009;0.0011</td>
                        <td align="center">1.7161&#x2009;&#x00B1;&#x2009;0.0116</td>
                        <td align="center">2.5190&#x2009;&#x00B1;&#x2009;0.0058</td>
                        <td align="center">3.2646&#x2009;&#x00B1;&#x2009;0.0045</td>
                        <td align="center">4.2292&#x2009;&#x00B1;&#x2009;0.0119</td>
                     </tr>
                     <tr>
                        <td align="center">Images, ACR</td>
                        <td align="center">Serbia</td>
                        <td align="center">0.3811&#x2009;&#x00B1;&#x2009;0.0035</td>
                        <td align="center">0.0087&#x2009;&#x00B1;&#x2009;0.0018</td>
                        <td align="center">1.7089&#x2009;&#x00B1;&#x2009;0.0138</td>
                        <td align="center">2.5043&#x2009;&#x00B1;&#x2009;0.0066</td>
                        <td align="center">3.2889&#x2009;&#x00B1;&#x2009;0.0051</td>
                        <td align="center">4.1533&#x2009;&#x00B1;&#x2009;0.0116</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  Other</td>
                        <td align="center">0.4132&#x2009;&#x00B1;&#x2009;0.0012</td>
                        <td align="center">0.0053&#x2009;&#x00B1;&#x2009;0.0005</td>
                        <td align="center">1.6536&#x2009;&#x00B1;&#x2009;0.0050</td>
                        <td align="center">2.5007&#x2009;&#x00B1;&#x2009;0.0023</td>
                        <td align="center">3.2752&#x2009;&#x00B1;&#x2009;0.0019</td>
                        <td align="center">4.2205&#x2009;&#x00B1;&#x2009;0.0044</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">Venezuela</td>
                        <td align="center">0.6372&#x2009;&#x00B1;&#x2009;0.0024</td>
                        <td align="center">0.0065&#x2009;&#x00B1;&#x2009;0.0009</td>
                        <td align="center">1.7941&#x2009;&#x00B1;&#x2009;0.0047</td>
                        <td align="center">2.7047&#x2009;&#x00B1;&#x2009;0.0036</td>
                        <td align="center">3.2799&#x2009;&#x00B1;&#x2009;0.0036</td>
                        <td align="center">4.1751&#x2009;&#x00B1;&#x2009;0.0045</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  Egypt</td>
                        <td align="center">0.6910&#x2009;&#x00B1;&#x2009;0.0104</td>
                        <td align="center">0.0105&#x2009;&#x00B1;&#x2009;0.0042</td>
                        <td align="center">1.5611&#x2009;&#x00B1;&#x2009;0.0218</td>
                        <td align="center">2.7732&#x2009;&#x00B1;&#x2009;0.0147</td>
                        <td align="center">3.2528&#x2009;&#x00B1;&#x2009;0.0147</td>
                        <td align="center">4.4835&#x2009;&#x00B1;&#x2009;0.0211</td>
                     </tr>
                     <tr>
                        <td align="center">KADID-10k</td>
                        <td align="center">India</td>
                        <td align="center">0.6442&#x2009;&#x00B1;&#x2009;0.0120</td>
                        <td align="center">0.0144&#x2009;&#x00B1;&#x2009;0.0056</td>
                        <td align="center">1.7302&#x2009;&#x00B1;&#x2009;0.0240</td>
                        <td align="center">2.8174&#x2009;&#x00B1;&#x2009;0.0178</td>
                        <td align="center">3.3726&#x2009;&#x00B1;&#x2009;0.0179</td>
                        <td align="center">4.4110&#x2009;&#x00B1;&#x2009;0.0240</td>
                     </tr>
                     <tr>
                        <td align="center">Images, DCR</td>
                        <td align="center">Russia</td>
                        <td align="center">0.5403&#x2009;&#x00B1;&#x2009;0.0111</td>
                        <td align="center">0.0058&#x2009;&#x00B1;&#x2009;0.0038</td>
                        <td align="center">1.8995&#x2009;&#x00B1;&#x2009;0.0221</td>
                        <td align="center">2.7440&#x2009;&#x00B1;&#x2009;0.0185</td>
                        <td align="center">3.3349&#x2009;&#x00B1;&#x2009;0.0183</td>
                        <td align="center">4.1006&#x2009;&#x00B1;&#x2009;0.0208</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  Other</td>
                        <td align="center">0.6013&#x2009;&#x00B1;&#x2009;0.0043</td>
                        <td align="center">0.0125&#x2009;&#x00B1;&#x2009;0.0019</td>
                        <td align="center">1.8659&#x2009;&#x00B1;&#x2009;0.0082</td>
                        <td align="center">2.7664&#x2009;&#x00B1;&#x2009;0.0065</td>
                        <td align="center">3.3550&#x2009;&#x00B1;&#x2009;0.0065</td>
                        <td align="center">4.1922&#x2009;&#x00B1;&#x2009;0.0080</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  Japan</td>
                        <td align="center">0.7028&#x2009;&#x00B1;&#x2009;0.0038</td>
                        <td align="center">0.0356&#x2009;&#x00B1;&#x2009;0.0026</td>
                        <td align="center">1.8249&#x2009;&#x00B1;&#x2009;0.0079</td>
                        <td align="center">2.8243&#x2009;&#x00B1;&#x2009;0.0054</td>
                        <td align="center">3.7092&#x2009;&#x00B1;&#x2009;0.0056</td>
                        <td align="center">4.5132&#x2009;&#x00B1;&#x2009;0.0084</td>
                     </tr>
                     <tr>
                        <td align="center">NIVD</td>
                        <td align="center">Brazil</td>
                        <td align="center">0.6343&#x2009;&#x00B1;&#x2009;0.0035</td>
                        <td align="center">0.0353&#x2009;&#x00B1;&#x2009;0.0027</td>
                        <td align="center">1.8820&#x2009;&#x00B1;&#x2009;0.0071</td>
                        <td align="center">2.6355&#x2009;&#x00B1;&#x2009;0.0049</td>
                        <td align="center">3.3261&#x2009;&#x00B1;&#x2009;0.0049</td>
                        <td align="center">4.1522&#x2009;&#x00B1;&#x2009;0.0068</td>
                     </tr>
                     <tr>
                        <td align="center">Videos, ACR/VAS</td>
                        <td align="center">US</td>
                        <td align="center">0.7603&#x2009;&#x00B1;&#x2009;0.0044</td>
                        <td align="center">0.0543&#x2009;&#x00B1;&#x2009;0.0036</td>
                        <td align="center">1.6418&#x2009;&#x00B1;&#x2009;0.0091</td>
                        <td align="center">2.4355&#x2009;&#x00B1;&#x2009;0.0059</td>
                        <td align="center">3.1706&#x2009;&#x00B1;&#x2009;0.0055</td>
                        <td align="center">4.1098&#x2009;&#x00B1;&#x2009;0.0075</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  India</td>
                        <td align="center">0.7467&#x2009;&#x00B1;&#x2009;0.0044</td>
                        <td align="center">0.0416&#x2009;&#x00B1;&#x2009;0.0033</td>
                        <td align="center">1.5897&#x2009;&#x00B1;&#x2009;0.0099</td>
                        <td align="center">2.4910&#x2009;&#x00B1;&#x2009;0.0061</td>
                        <td align="center">3.2721&#x2009;&#x00B1;&#x2009;0.0058</td>
                        <td align="center">4.2185&#x2009;&#x00B1;&#x2009;0.0082</td>
                     </tr>
                  </tbody>
               </table>
            </table-wrap><fig id="jpi0187fig4"><label>Figure&#x00A0;4.</label>
               <caption id="jpi0187fc4">
                  <p>Country-specific thresholds estimated with maximum likelihood estimation (MLE). For the numerical values and confidence intervals, see Table&#x00A0;<xref ref-type="table" rid="jpi0187tabV">IV</xref>. Direct comparisons of countries between experiments are not recommended due to variations in experimental design, including differences in stimuli (videos versus images) and task formats (ACR versus DCR).</p>
               </caption>
               <graphic id="jpi0187f4_online" content-type="online" xlink:href="jpi0187f4_online.jpg"/>
            </fig><p>Clearly, most thresholds <italic>&#x03C4;</italic><sub><italic>k</italic></sub>, and also the standard deviations and lapse rates, significantly differ between countries. For example, in the first two rows of the table for the KonIQ-10k ratings of India and Venezuela, all parameters differ between the countries without overlap of 95% confidence intervals.</p>
             <p>These results are elucidated by considering an example in detail (Figure&#x00A0;<xref ref-type="fig" rid="jpi0187fig5">5</xref>). In NIVD, video 964 was scaled by the statistical model at quality <italic>&#x03BC;</italic> = 4.360. The distributions of the latent perceived video quality correspond to the model parameters for Japan and the US (lines 11 and 13 in Table&#x00A0;<xref ref-type="table" rid="jpi0187tabV">IV</xref>). Under the assumption of a globally unique perceived quality, the mean of the distribution is at <italic>&#x03BC;</italic> = 4.360 for all countries. The dispersion of the qualities, the lapse rates, and the ACR category thresholds differ between countries, though, which implies that the probabilities for the ACR categories also differ. These are shown in the table included in Figure&#x00A0;<xref ref-type="fig" rid="jpi0187fig5">5</xref>.</p>
            <fig id="jpi0187fig5"><label>Figure&#x00A0;5.</label>
               <caption id="jpi0187fc5">
                  <p>Results of the model with successive intervals for the video stimulus numbered 964 in the Netflix International Video Dataset, shown for Japan and US. The category thresholds <italic>&#x03C4;</italic><sub>2</sub>, <italic>&#x03C4;</italic><sub>3</sub>, and <italic>&#x03C4;</italic><sub>4</sub> for the subjective ratings of the perceived quality in Japan are larger than those in the US. In effect, according to the statistical model, the sampled US population generally preferred higher ACR ratings for the video stimuli in NIVD. The table shows the numerical values of the resulting probabilities of the ACR categories for this example. Each of the model probabilities is the corresponding area under the curves plus 1/5 of the lapse rate (0.0356 for Japan and 0.0543 for the US), see Equation (<xref ref-type="disp-formula" rid="jpi0187eqn4">4</xref>). For comparison, the fractions of the collected VAS ratings that were quantized to ACR for this study are shown.</p>
               </caption>
               <graphic id="jpi0187f5_online" content-type="online" xlink:href="jpi0187f5_online.jpg"/>
             </fig><p>The table also confirms that, for this example, the model provides an accurate fit to the collected ratings. The model probabilities and the empirical fractions for the five categories are close to each other; the measured MOS from the collected ratings differs from the MOS predicted by the model by only about 0.5%.</p>
             <p>The estimated lapse rates are generally very small, around 1% for the assessment of the two image datasets, and 3 to 5% for the video dataset. The larger values for NIVD could be attributed to the more complicated SAMVIQ user interface that was applied for this dataset&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib1">1</xref>]. Participants evaluated the videos in groups of five by interactively selecting which video to play and rated the visual quality using four sliders. Moreover, they were allowed to modify their votes as many times as they wished.</p>
             <p>The country-specific differences are even smaller and probably not influential, even though statistically significant in some cases. Wichmann and Hill&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib32">32</xref>] have cautioned that the lapse parameter is, in general, not a very good estimator of the subjects&#x2019; true lapse rate. Thus, we hesitate to interpret these differences and recommend that future studies use only a single global lapse rate for each dataset.</p>
         </sec>
         <sec id="jpi0187us5-2">
            <label>5.2</label>
            <title>Data Analysis of the Balanced Datasets using Bayesian Estimation</title>
             <p>The results from the CLMMs are shown in Table&#x00A0;<xref ref-type="table" rid="jpi0187tabVI">V</xref> and Figure&#x00A0;<xref ref-type="fig" rid="jpi0187fig6">6</xref>. They confirm that for the smaller balanced data subsets, the estimated thresholds, which demarcate the boundaries between successive ordinal rating categories, vary by country. For example, consider the quality of a video stimulus from the NIVD dataset for which the probability is at least 50% to obtain a rating of &#x2018;excellent&#x2019;. For observers from the US, a video quality of only 1.67 on the CLMM scale was sufficient, while for Japanese viewers, the video quality had to be at least 2.45. This is a significant difference, corresponding to roughly half a category on the 5-level ACR scale.</p>
            <fig id="jpi0187fig6"><label>Figure&#x00A0;6.</label>
               <caption id="jpi0187fc6">
                  <p>Country-specific thresholds estimated with CLMMs. This figure displays the threshold estimates of image ratings for six countries, derived from balancing three quality databases. Each point represents an estimate, with horizontal lines indicating the 95% confidence intervals. The model accounts for variability in rating tendencies across images for all datasets and additionally, raters for only the NIVD dataset, estimating cultural differences in rating scale usage. We again discourage direct comparisons across experiments due to different designs.</p>
               </caption>
               <graphic id="jpi0187f6_online" content-type="online" xlink:href="jpi0187f6_online.jpg"/>
            </fig><table-wrap id="jpi0187tabVI">
               <label>Table&#x00A0;V.</label>
               <caption id="jpi0187tcVI">
                  <p>Results for the reduced, balanced data subsets from the CLMM with 95% confidence intervals.</p>
               </caption>
               <table frame="void">
                  <colgroup>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="center"/>
                  </colgroup>
                  <thead>
                     <tr>
                        <th align="center">Dataset</th>
                        <th align="center">Country</th>
                        <th colspan="4" align="center">Intercepts for thresholds</th>
                     </tr>
                     <tr>
                        <th align="center"/>
                        <th align="center"/>
                        <th align="center"><italic>&#x03C4;</italic><sub>1</sub></th>
                        <th align="center"><italic>&#x03C4;</italic><sub>2</sub></th>
                        <th align="center"><italic>&#x03C4;</italic><sub>3</sub></th>
                        <th align="center"><italic>&#x03C4;</italic><sub>4</sub></th>
                     </tr>
                  </thead>
                  <tbody>
                     <tr>
                        <td align="center">KonIQ-10k</td>
                        <td align="center">India</td>
                        <td align="center"> &#x2212; 3.36 &#x00B1; 0.22</td>
                        <td align="center"> &#x2212; 1.45 &#x00B1; 0.17</td>
                        <td align="center">0.69 &#x00B1; 0.16</td>
                        <td align="center">3.09 &#x00B1; 0.21</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">Venezuela</td>
                        <td align="center"> &#x2212; 2.97 &#x00B1; 0.21</td>
                        <td align="center"> &#x2212; 1.29 &#x00B1; 0.17</td>
                        <td align="center">0.35 &#x00B1; 0.16</td>
                        <td align="center">2.34 &#x00B1; 0.18</td>
                     </tr>
                     <tr>
                        <td align="center">KADID-10k</td>
                        <td align="center">Venezuela</td>
                        <td align="center"> &#x2212; 1.89 &#x00B1; 0.31</td>
                        <td align="center"> &#x2212; 0.51 &#x00B1; 0.30</td>
                        <td align="center">0.38 &#x00B1; 0.29</td>
                        <td align="center">1.71 &#x00B1; 0.30</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  Egypt</td>
                        <td align="center"> &#x2212; 2.04 &#x00B1; 0.31</td>
                        <td align="center"> &#x2212; 0.30 &#x00B1; 0.29</td>
                        <td align="center">0.29 &#x00B1; 0.30</td>
                        <td align="center">2.20 &#x00B1; 0.31</td>
                     </tr>
                     <tr>
                        <td align="center">NIVD</td>
                        <td align="center">Japan</td>
                        <td align="center"> &#x2212; 2.15 &#x00B1; 0.08</td>
                        <td align="center"> &#x2212; 0.45 &#x00B1; 0.09</td>
                        <td align="center">1.09 &#x00B1; 0.08</td>
                        <td align="center">2.45 &#x00B1; 0.08</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  Brazil</td>
                        <td align="center"> &#x2212; 2.20 &#x00B1; 0.08</td>
                        <td align="center"> &#x2212; 0.83 &#x00B1; 0.08</td>
                        <td align="center">0.47 &#x00B1; 0.08</td>
                        <td align="center">1.95 &#x00B1; 0.08</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">   US</td>
                        <td align="center"> &#x2212; 2.35 &#x00B1; 0.08</td>
                        <td align="center"> &#x2212; 1.09 &#x00B1; 0.08</td>
                        <td align="center">0.15 &#x00B1; 0.08</td>
                        <td align="center">1.67 &#x00B1; 0.08</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  India</td>
                        <td align="center"> &#x2212; 2.56 &#x00B1; 0.08</td>
                        <td align="center"> &#x2212; 1.04 &#x00B1; 0.08</td>
                        <td align="center">0.34 &#x00B1; 0.08</td>
                        <td align="center">1.93 &#x00B1; 0.08</td>
                     </tr>
                  </tbody>
               </table>
            </table-wrap><p>Characteristics of the data, such as the number of observations and the balance of ratings across categories, influenced the precision of these estimates. Notably, the NIVD dataset, which is well balanced and has a substantially larger number of ratings in its balanced subset than the other datasets, yielded the most precise estimates.</p>
            <p>For the NIVD and KonIQ-10k data subsets, the 95% CIs of the estimates did not overlap in a few cases, indicating discernible differences between the countries. For the KADID-10k data subset, however, which had the smallest number of images and ratings, the 95% CIs were wider and overlapped, indicating less precise estimates.</p>
            <p>To study country-specific differences in extreme ratings, we quantified their occurrence in three ways: (a) the probability Pr[<italic>V</italic> <sub><italic>j</italic></sub> &#x2208;{1,5}] from the Thurstonian model (<xref ref-type="disp-formula" rid="jpi0187eqn4">4</xref>), averaged over all stimuli <italic>j</italic> per country; (b) the corresponding averages derived from the CLMM model applied to the balanced subsets of the full datasets; and (c) the sum of the empirical proportions of ratings at ACR levels 1 and 5. Table <xref ref-type="table" rid="jpi0187tabVII">VI</xref> summarizes the results. Clearly, there are significant differences between countries. The largest differences were found for the ACR modality in KonIQ-10k, in which extreme ratings from Venezuela were nearly three times more likely than those from India.</p>
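            <p>Step (a) above can be sketched in code. The following is a minimal illustration, not the fitted model: it assumes Gaussian rating noise around a latent quality, so that the probability of an extreme rating is the mass below the first category threshold plus the mass above the last. The threshold and quality values used here are placeholders, not the estimates from the paper.</p>

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def prob_extreme(psi, tau, sigma=1.0):
    """Pr[V in {1,5}] for one stimulus under a quantized metric model.

    psi   -- latent quality of the stimulus
    tau   -- the four category thresholds (tau[0] < ... < tau[3])
    sigma -- standard deviation of the Gaussian rating noise
    """
    p1 = norm_cdf((tau[0] - psi) / sigma)        # Pr[V = 1]
    p5 = 1.0 - norm_cdf((tau[3] - psi) / sigma)  # Pr[V = 5]
    return p1 + p5

# Hypothetical thresholds and stimulus qualities (illustration only):
tau = [-2.0, -0.5, 0.5, 2.0]
qualities = [-1.0, 0.0, 1.0]

# Averaging over stimuli j, as in step (a) of the text:
avg = sum(prob_extreme(psi, tau) for psi in qualities) / len(qualities)
```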
            <table-wrap id="jpi0187tabVII">
               <label>Table&#x00A0;VI.</label>
               <caption id="jpi0187tcVII">
                  <p>Probabilities of extreme ratings. Results of the Thurstonian quantized metric model for the full datasets, the CLMM model for the balanced data subsets, and the empirical proportions of ratings in extreme categories 1 and 5 together. Rows are sorted according to their magnitudes in the full dataset.</p>
               </caption>
               <table frame="void">
                  <colgroup>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="center"/>
                     <col align="center"/>
                  </colgroup>
                  <thead>
                     <tr>
                        <th align="center"/>
                        <th align="center"/>
                        <th colspan="2" align="center">Full dataset</th>
                        <th colspan="2" align="center">Balanced subset</th>
                     </tr>
                     <tr>
                        <th align="center"> Dataset</th>
                        <th align="center">Country</th>
                        <th align="center">Prob</th>
                        <th align="center">ACR Prop</th>
                        <th align="center">Prob</th>
                        <th align="center">ACR Prop</th>
                     </tr>
                  </thead>
                  <tbody>
                     <tr>
                        <td align="center"/>
                        <td align="center">Venezuela</td>
                        <td align="center">0.0662</td>
                        <td align="center">0.0674</td>
                        <td align="center">0.0723</td>
                        <td align="center">0.0736</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  Serbia</td>
                        <td align="center">0.0508</td>
                        <td align="center">0.0505</td>
                        <td align="center">&#x2013;</td>
                        <td align="center">&#x2013;</td>
                     </tr>
                     <tr>
                        <td align="center">KonIQ</td>
                        <td align="center">Other</td>
                        <td align="center">0.0462</td>
                        <td align="center">0.0479</td>
                        <td align="center">&#x2013;</td>
                        <td align="center">&#x2013;</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center"> Russia</td>
                        <td align="center">0.0428</td>
                        <td align="center">0.0462</td>
                        <td align="center">&#x2013;</td>
                        <td align="center">&#x2013;</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  India</td>
                        <td align="center">0.0220</td>
                        <td align="center">0.0201</td>
                        <td align="center">0.0254</td>
                        <td align="center">0.0244</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center"> Russia</td>
                        <td align="center">0.346</td>
                        <td align="center">0.375</td>
                        <td align="center">&#x2013;</td>
                        <td align="center">&#x2013;</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  Other</td>
                        <td align="center">0.327</td>
                        <td align="center">0.347</td>
                        <td align="center">&#x2013;</td>
                        <td align="center">&#x2013;</td>
                     </tr>
                     <tr>
                        <td align="center">KADID</td>
                        <td align="center">Venezuela</td>
                        <td align="center">0.322</td>
                        <td align="center">0.337</td>
                        <td align="center">0.304</td>
                        <td align="center">0.307</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  India</td>
                        <td align="center">0.260</td>
                        <td align="center">0.279</td>
                        <td align="center">&#x2013;</td>
                        <td align="center">&#x2013;</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  Egypt</td>
                        <td align="center">0.225</td>
                        <td align="center">0.240</td>
                        <td align="center">0.215</td>
                        <td align="center">0.213</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">   US</td>
                        <td align="center">0.260</td>
                        <td align="center">0.255</td>
                        <td align="center">0.255</td>
                        <td align="center">0.251</td>
                     </tr>
                     <tr>
                        <td align="center">NIVD</td>
                        <td align="center">Brazil</td>
                        <td align="center">0.249</td>
                        <td align="center">0.238</td>
                        <td align="center">0.228</td>
                        <td align="center">0.235</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  India</td>
                        <td align="center">0.223</td>
                        <td align="center">0.213</td>
                        <td align="center">0.209</td>
                        <td align="center">0.211</td>
                     </tr>
                     <tr>
                        <td align="center"/>
                        <td align="center">  Japan</td>
                        <td align="center">0.191</td>
                        <td align="center">0.189</td>
                        <td align="center">0.184</td>
                        <td align="center">0.183</td>
                     </tr>
                  </tbody>
               </table>
            </table-wrap><p>Focusing on the Japanese and US subsamples, which were balanced in our NIVD dataset, we observed clear national differences in rating patterns in Table&#x00A0;<xref ref-type="table" rid="jpi0187tabVII">VI</xref>. The combined proportion of extreme ratings was about 25% for the US group versus only about 19% for the Japanese raters.</p>
            <p>Our analysis revealed systematic differences in how US and Japanese participants utilize the rating scales. The observed shift in category thresholds suggests that a video typically judged as &#x2018;good&#x2019; by Japanese viewers might typically be judged as &#x2018;excellent&#x2019; by US viewers.</p>
            <p>We note that all three methods for assessing the occurrence of extreme ratings agree on the ranking of the countries by frequency of extreme ratings. Moreover, the Thurstonian and CLMM probabilities were very close to the empirically measured frequencies.</p>
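            <p>For comparison, step (c), the empirical proportion of extreme ratings per country, amounts to a simple tally. A minimal sketch follows; the (country, ACR level) pair layout and the toy data are assumptions for illustration, not the format of the actual datasets.</p>

```python
from collections import Counter

def extreme_proportions(ratings):
    """Empirical proportion of ACR ratings at levels 1 or 5, per country.

    ratings -- iterable of (country, acr_level) pairs.
    """
    totals = Counter()
    extremes = Counter()
    for country, level in ratings:
        totals[country] += 1
        if level in (1, 5):
            extremes[country] += 1
    return {c: extremes[c] / totals[c] for c in totals}

# Toy data (hypothetical ratings, not from NIVD):
data = [("US", 5), ("US", 3), ("US", 1), ("US", 4),
        ("Japan", 3), ("Japan", 4), ("Japan", 5), ("Japan", 2)]
props = extreme_proportions(data)  # {'US': 0.5, 'Japan': 0.25}
```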
         </sec>
         <sec id="jpi0187us5-3">
            <label>5.3</label>
            <title>Comparison</title>
            <p>When comparing the results of this analysis of the balanced datasets using CLMMs (Table&#x00A0;<xref ref-type="table" rid="jpi0187tabVI">V</xref> and Fig.&#x00A0;<xref ref-type="fig" rid="jpi0187fig6">6</xref>) with the previous ones for the original, large datasets (Table&#x00A0;<xref ref-type="table" rid="jpi0187tabV">IV</xref> and Fig.&#x00A0;<xref ref-type="fig" rid="jpi0187fig4">4</xref>), the varying conditions used to derive these estimates have to be accounted for. Besides the differences in the dataset sizes, the balancing, and the computational methods, the mathematical models are distinct. With MLE, we included adaptive standard deviations and lapse rates, while for the CLMM model we used a subject model for the case of NIVD. However, the scatter plot of the thresholds in Figure&#x00A0;<xref ref-type="fig" rid="jpi0187fig7">7</xref> confirms that the results are similar. In particular, they indicate that the country-specific differences in the thresholds as derived from the original large datasets cannot be attributed to the differing numbers of subjects from the countries.</p>
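            <p>The agreement between the MLE and CLMM threshold estimates seen in the scatter plot can also be quantified with a correlation coefficient. A stdlib-only sketch of a Pearson correlation follows; the paired threshold values are hypothetical, chosen only to illustrate near-perfect agreement between the two estimation methods.</p>

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical paired threshold estimates for one country
# (MLE on the full dataset vs. CLMM on the balanced subset):
mle_thresholds = [-2.35, -1.09, 0.15, 1.67]
clmm_thresholds = [-2.30, -1.05, 0.20, 1.60]
r = pearson(mle_thresholds, clmm_thresholds)  # close to 1: the estimates agree
```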
            <fig id="jpi0187fig7"><label>Figure&#x00A0;7.</label>
               <caption id="jpi0187fc7">
                  <p>Scatter plot of the four thresholds of the ACR categories, estimated in the large datasets by MLE and the balanced datasets by CLMM.</p>
               </caption>
               <graphic id="jpi0187f7_online" content-type="online" xlink:href="jpi0187f7_online.jpg"/>
            </fig></sec>
      </sec>
      <sec id="jpi0187us6">
         <label>6.</label>
         <title>Limitations</title>
         <p>This study leveraged large-scale, cross-cultural datasets, but limitations related to sampling and demographic information require careful consideration. Addressing these limitations is crucial for appropriately interpreting our findings and for guiding future research in the field.</p>
          <p>Regarding sample size and representativeness, the NIVD dataset, with 14,450 participants before outlier removal, represents a significant advancement in multimedia quality assessment research, exceeding typical sample sizes by an order of magnitude; to our knowledge, it is the largest publicly available cross-cultural video quality study. Although this large sample size contributes to the statistical power of our analyses, it is important to acknowledge that NIVD, while designed to be representative of the targeted age ranges (18&#x2013;30, 31&#x2013;44, and 45&#x2013;65) and gender within the US, Japan, India, and Brazil, does not encompass the full diversity of global populations. Furthermore, the specific sampling methodology employed by Survey Sampling International (SSI) is not publicly disclosed, which limits a more precise evaluation of the sample&#x2019;s representativeness. Future research aimed at generalizing findings to broader populations should prioritize even wider cultural representation and transparently report sampling methodologies.</p>
          <p>Another limitation was the use of country of residence as the sole proxy for cultural background. While providing a useful starting point, this approach may not fully capture the nuances of cultural influences on response styles, as it overlooks within-country variations, such as regional, ethnic, or linguistic differences. Furthermore, individual factors such as age, gender, education, and personality may interact with cultural factors to influence how people perceive and rate image quality. Critically, none of the datasets used in this study (NIVD, KonIQ-10k, and KADID-10k) made detailed demographic data readily available, precluding a more thorough investigation of these potentially confounding factors. Future studies should incorporate more detailed and multidimensional measures of both cultural background and individual differences&#x2014;and ensure the public availability of such data&#x2014;to better understand these complex interactions and their impact on response styles.</p>
         <p>Despite these limitations, the scale and scope of the datasets employed, particularly NIVD&#x2019;s unique size and cross-cultural design, provide valuable insights into the complex relationship between culture and subjective quality perception, laying a strong foundation for future work.</p>
         <p>We introduced the lapse rate in the statistical model for ACR/DCR quality assessment. A general analysis of the advantages and limitations of lapse rates in quantized metric models is worthwhile, but is beyond the scope of this study.</p>
         <p>Though this study focused on cross-cultural variations in rating scale usage, future research could explore the relationship between our model&#x2019;s predictions and traditional MOS values. Such a comparison could provide further insights into the practical implications of our findings for established practices in image quality assessment.</p>
          <p>One limitation is the long runtime for calculating the parameters of quantized metric models when the dataset is very large. For example, calculating the 10,092 parameters for KonIQ-10k, even with MLE, took 13 hours using Matlab on a MacBook Pro (2.6&#x2009;GHz 6-core Intel Core i7 processor). In contrast, the MLE for NIVD, with 1,884 parameters, took less than 30 minutes. We did not perform any code optimization and did not try alternative solvers such as ADAM&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib17">17</xref>].</p>
      </sec>
      <sec id="jpi0187us7">
         <label>7.</label>
         <title>Conclusion: Navigating Cultural Nuances in Image Quality Assessment</title>
         <p>Our study explored the impact of cultural factors on image quality assessment by adapting statistical models to include country-specific components. Across three large-scale datasets (KonIQ-10k, KADID-10k, NIVD) containing subjective image and video quality ratings from several countries, we found significant nation-based differences in extreme response styles. Notably, our findings indicate that US observers exhibited a higher propensity to provide extreme ratings compared to Japanese observers when evaluating the same video stimuli. We estimated that US observers employ extreme ratings 35&#x2013;39% more frequently than their Japanese counterparts (Table <xref ref-type="table" rid="jpi0187tabVII">VI</xref>). Remarkably, this observed discrepancy aligns closely with the 41% higher likelihood reported over five decades ago&#x00A0;[<xref ref-type="bibr" rid="jpi0187bib34">34</xref>], reinforcing long-standing cross-cultural research on systematic differences in extreme response tendencies between individualistic and collectivistic cultures like the US and Japan.</p>
         <p>These results underscore the importance of considering cultural factors when designing and interpreting subjective quality assessments. Failing to account for these differences could lead to biased or inaccurate conclusions about user experience across different cultural groups.</p>
          <p>A key strength of this study was the utilization of quantized metric models as a unified statistical framework. Parameters were computed by maximum likelihood estimation for the very large datasets and by Bayesian estimation using cumulative link mixed effects models (CLMMs) for the smaller ones. Our models explicitly capture the ordinal rating process without assuming equal category spacing. Furthermore, by incorporating random effects, CLMMs disentangle stimulus quality estimates from overall rater biases and response patterns. Their hierarchical structure facilitated quantifying culture-specific effects such as divergent rating thresholds and extreme tendencies, while simultaneously yielding posterior distributions for the latent quality of each image/video.</p>
         <p>This approach represents a significant methodological contribution, merging cross-cultural psychological inquiry with applied multimedia quality assessment aims, and provides a rigorous psychometric technique for disentangling cultural influences from true quality perceptions. As the field increasingly relies on crowdsourced remote data collection, such principled methods are crucial for reliable cross-population comparisons and quality predictions.</p>
         <p>Our results highlight the importance of considering cultural nuances in image quality assessment to avoid distorted interpretations. Accounting for differences in response styles is vital for meaningful cross-national comparisons of subjective rating data. These findings contribute to a more comprehensive global understanding of image quality perceptions and have implications for the collection and analysis of current and future datasets.</p>
         <p>To further refine this understanding, we recommend exploring the specific cultural factors driving the observed response style variations. Potential influences include individualism/collectivism, values of moderation/expressiveness, and preferences for direct/indirect communication. Understanding these roots can guide designing more culturally appropriate assessment surveys that minimize the biasing effects of extreme response tendencies. While we have shown that datasets can be balanced after data collection, we also advocate for the proactive balancing of nationalities in these datasets, as exemplified by the NIVD dataset, when possible. Ultimately, such adjustments will ensure more accurate cross-cultural comparisons of perceived quality in our increasingly globalized multimedia landscape. Additionally, it may aid in creating more culturally relevant and effective surveys and interventions.</p>
      </sec>
   </body>
   <back>
      <ack>
         <title>Acknowledgment</title>
          <p>Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) &#x2013; DFG Project ID 251654672 &#x2013; TRR 161 and the Research Council of Norway, grant number 324663. We thank Vlad Hosu and Mirko Dulfer for assistance during curation of the raw KonIQ-10k dataset, and Zhi Li, Christos Bampis, and Shaolin Su for assistance with the NIVD dataset.</p>
      </ack>
      <ref-list content-type="numerical">
         <title>References</title>
         <ref id="jpi0187bib1">
            <label>1</label>
            <element-citation publication-type="book"><person-group person-group-type="author">
                  <name>
                     <surname>Bampis</surname>
                     <given-names>C.&#x00A0;G.</given-names>
                  </name>
                  <name>
                     <surname>Krasula</surname>
                     <given-names>L.</given-names>
                  </name>
                  <name>
                     <surname>Li</surname>
                     <given-names>Z.</given-names>
                  </name>
                  <name>
                     <surname>Akhtar</surname>
                     <given-names>O.</given-names>
                  </name>
               </person-group>
               <year>2023</year>
               <article-title>Measuring and predicting perceptions of video quality across screen sizes with crowdsourcing</article-title>
               <source>15th Int&#x2019;l. Conf. Quality of Multimedia Experience (QoMEX)</source>
               <fpage>13</fpage>
               <lpage>18</lpage>
               <page-range>13&#x2013;8</page-range>
               <publisher-name>IEEE</publisher-name>
               <publisher-loc>Piscataway, NJ</publisher-loc>
               <pub-id pub-id-type="doi">10.1109/QoMEX58391.2023.10178501</pub-id>
            </element-citation></ref>
         <ref id="jpi0187bib2">
            <label>2</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>B&#x00FC;rkner</surname>
                     <given-names>P.-C.</given-names>
                  </name>
               </person-group>
               <year>2017</year>
               <article-title>BRMS: an R package for Bayesian multilevel models using Stan</article-title>
               <source>J. Stat. Software</source>
               <volume>80</volume>
               <fpage>1</fpage>
               <lpage>28</lpage>
               <page-range>1&#x2013;28</page-range>
            </element-citation>
         </ref>
         <ref id="jpi0187bib3">
            <label>3</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>B&#x00FC;rkner</surname>
                     <given-names>P.-C.</given-names>
                  </name>
                  <name>
                     <surname>Vuorre</surname>
                     <given-names>M.</given-names>
                  </name>
               </person-group>
               <year>2019</year>
               <article-title>Ordinal regression models in psychology: a tutorial</article-title>
               <source>Adv. Methods Pract. Psychol. Sci.</source>
               <volume>2</volume>
               <fpage>77</fpage>
               <lpage>101</lpage>
               <page-range>77&#x2013;101</page-range>
            </element-citation>
         </ref>
         <ref id="jpi0187bib4">
            <label>4</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Chen</surname>
                     <given-names>C.</given-names>
                  </name>
                  <name>
                     <surname>Lee</surname>
                     <given-names>S.-Y.</given-names>
                  </name>
                  <name>
                     <surname>Stevenson</surname>
                     <given-names>H.&#x00A0;W.</given-names>
                  </name>
               </person-group>
               <year>1995</year>
               <article-title>Response style and cross-cultural comparisons of rating scales among East Asian and North American students</article-title>
               <source>Psychol. Sci.</source>
               <volume>6</volume>
               <fpage>170</fpage>
               <lpage>175</lpage>
               <page-range>170&#x2013;5</page-range>
               <pub-id pub-id-type="doi">10.1111/j.1467-9280.1995.tb00327.x</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib5">
            <label>5</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Chun</surname>
                     <given-names>K.-T.</given-names>
                  </name>
                  <name>
                     <surname>Campbell</surname>
                     <given-names>J.&#x00A0;B.</given-names>
                  </name>
                  <name>
                     <surname>Yoo</surname>
                     <given-names>J.&#x00A0;H.</given-names>
                  </name>
               </person-group>
               <year>1974</year>
               <article-title>Extreme response style in cross-cultural research: a reminder</article-title>
               <source>J. Cross-Cultural Psychol.</source>
               <volume>5</volume>
               <fpage>465</fpage>
               <lpage>480</lpage>
               <page-range>465&#x2013;80</page-range>
               <pub-id pub-id-type="doi">10.1177/002202217400500407</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib6">
            <label>6</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Clarke</surname>
                     <given-names>I.</given-names>
                     <suffix>III</suffix>
                  </name>
               </person-group>
               <year>2000</year>
               <article-title>Extreme response style in cross-cultural research: an empirical investigation</article-title>
               <source>J. Soc. Behav. Personality</source>
               <volume>15</volume>
               <fpage>137</fpage>
               <lpage>152</lpage>
               <page-range>137&#x2013;52</page-range>
            </element-citation></ref>
         <ref id="jpi0187bib7">
            <label>7</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>De&#x00A0;Jong</surname>
                     <given-names>M.&#x00A0;G.</given-names>
                  </name>
                  <name>
                     <surname>Steenkamp</surname>
                     <given-names>J.-B.&#x00A0;E.</given-names>
                  </name>
                  <name>
                     <surname>Fox</surname>
                     <given-names>J.-P.</given-names>
                  </name>
                  <name>
                     <surname>Baumgartner</surname>
                     <given-names>H.</given-names>
                  </name>
               </person-group>
               <year>2008</year>
               <article-title>Using item response theory to measure extreme response style in marketing research: a global investigation</article-title>
               <source>J. Mark. Res.</source>
               <volume>45</volume>
               <fpage>104</fpage>
               <lpage>115</lpage>
               <page-range>104&#x2013;15</page-range>
               <pub-id pub-id-type="doi">10.1509/jmkr.45.1.104</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib8">
            <label>8</label>
            <element-citation publication-type="other"><person-group person-group-type="author">
                  <name>
                     <surname>Del&#x00A0;Pin</surname>
                     <given-names>S.&#x00A0;H.</given-names>
                  </name>
                  <name>
                     <surname>Amirshahi</surname>
                     <given-names>S.&#x00A0;A.</given-names>
                  </name>
               </person-group>
               <comment>&#x201C;Subjective quality evaluation: what can be learnt from cognitive science?,&#x201D; <italic><uri xlink:href="https://api.semanticscholar.org/CorpusID:253673139">11th&#x00A0;Colour&#x00A0;and&#x00A0;Visual&#x00A0;Comput. Symp.&#x00A0;(CVCS)</uri></italic> (CEUR-WS.org, Aachen, Germany, 2022)</comment>
            </element-citation></ref>
         <ref id="jpi0187bib9">
            <label>9</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Greenleaf</surname>
                     <given-names>E.&#x00A0;A.</given-names>
                  </name>
               </person-group>
               <year>1992</year>
               <article-title>Measuring extreme response style</article-title>
               <source>Publ. Opinion Quart.</source>
               <volume>56</volume>
               <fpage>328</fpage>
               <lpage>351</lpage>
               <page-range>328&#x2013;51</page-range>
               <pub-id pub-id-type="doi">10.1086/269326</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib10">
            <label>10</label>
            <element-citation publication-type="other"><person-group person-group-type="author">
                  <name>
                     <surname>Ho</surname>
                     <given-names>L.&#x00A0;L.</given-names>
                  </name>
                  <name>
                     <surname>Loh</surname>
                     <given-names>P.&#x00A0;C.</given-names>
                  </name>
                  <name>
                     <surname>Quah</surname>
                     <given-names>A.&#x00A0;L.</given-names>
                  </name>
               </person-group>
               <comment>&#x201C;<italic><uri xlink:href="https://doi.org/10.1109/QoMEX.2019.8743252">A&#x00A0;Cross-Cultural,&#x00A0;Between-Gender&#x00A0;Study&#x00A0;of&#x00A0;Extreme&#x00A0;Response&#x00A0;Style</uri></italic>,&#x201D; (Nanyang Technological University, Singapore, 1995)</comment>
            </element-citation></ref>
         <ref id="jpi0187bib11">
            <label>11</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Holbrook</surname>
                     <given-names>A.&#x00A0;L.</given-names>
                  </name>
                  <name>
                     <surname>Green</surname>
                     <given-names>M.&#x00A0;C.</given-names>
                  </name>
                  <name>
                     <surname>Krosnick</surname>
                     <given-names>J.&#x00A0;A.</given-names>
                  </name>
               </person-group>
               <year>2003</year>
               <article-title>Telephone versus face-to-face interviewing of national probability samples with long questionnaires: comparisons of respondent satisficing and social desirability response bias</article-title>
               <source>Publ. Opinion Quart.</source>
               <volume>67</volume>
               <fpage>79</fpage>
               <lpage>125</lpage>
               <page-range>79&#x2013;125</page-range>
               <pub-id pub-id-type="doi">10.1086/346010</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib12">
            <label>12</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Hosu</surname>
                     <given-names>V.</given-names>
                  </name>
                  <name>
                     <surname>Lin</surname>
                     <given-names>H.</given-names>
                  </name>
                  <name>
                     <surname>Sziranyi</surname>
                     <given-names>T.</given-names>
                  </name>
                  <name>
                     <surname>Saupe</surname>
                     <given-names>D.</given-names>
                  </name>
               </person-group>
               <year>2020</year>
               <article-title>KonIQ-10k: an ecologically valid database for deep learning of blind image quality assessment</article-title>
               <source>IEEE Trans. Image Process.</source>
               <volume>29</volume>
               <fpage>4041</fpage>
               <lpage>4056</lpage>
               <page-range>4041&#x2013;56</page-range>
               <pub-id pub-id-type="doi">10.1109/TIP.2020.2967829</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib13">
            <label>13</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Hui</surname>
                     <given-names>C.&#x00A0;H.</given-names>
                  </name>
                  <name>
                     <surname>Triandis</surname>
                     <given-names>H.&#x00A0;C.</given-names>
                  </name>
               </person-group>
               <year>1989</year>
               <article-title>Effects of culture and response format on extreme response style</article-title>
               <source>J. Cross-Cultural Psychol.</source>
               <volume>20</volume>
               <fpage>296</fpage>
               <lpage>309</lpage>
               <page-range>296&#x2013;309</page-range>
               <pub-id pub-id-type="doi">10.1177/0022022189203004</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib14">
            <label>14</label>
            <element-citation publication-type="book"><person-group person-group-type="author">
                  <collab>International Telecommunication Union</collab>
               </person-group>
               <article-title>Recommendation ITU-R BT.500-15 (05/2023), Methodology for the subjective assessment of the quality of television pictures</article-title>
               <year>2023</year>
               <publisher-name>ITU Publications</publisher-name>
            </element-citation>
         </ref>
         <ref id="jpi0187bib15">
            <label>15</label>
            <element-citation publication-type="book"><person-group person-group-type="author">
                  <name>
                     <surname>Janowski</surname>
                     <given-names>L.</given-names>
                  </name>
                  <name>
                     <surname>Papir</surname>
                     <given-names>Z.</given-names>
                  </name>
               </person-group>
               <year>2009</year>
               <article-title>Modeling subjective tests of quality of experience with a generalized linear model</article-title>
               <source>First Int&#x2019;l. Workshop on Quality of Multimedia Experience (QoMEX)</source>
               <fpage>35</fpage>
               <lpage>40</lpage>
               <page-range>35&#x2013;40</page-range>
               <publisher-name>IEEE</publisher-name>
               <publisher-loc>Piscataway, NJ</publisher-loc>
               <pub-id pub-id-type="doi">10.1109/QOMEX.2009.5246979</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib16">
            <label>16</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Jones</surname>
                     <given-names>B.&#x00A0;L.</given-names>
                  </name>
                  <name>
                     <surname>McManus</surname>
                     <given-names>P.&#x00A0;R.</given-names>
                  </name>
               </person-group>
               <year>1986</year>
               <article-title>Graphic scaling of qualitative terms</article-title>
               <source>SMPTE J.</source>
               <volume>95</volume>
               <fpage>1166</fpage>
               <lpage>1171</lpage>
               <page-range>1166&#x2013;71</page-range>
               <pub-id pub-id-type="doi">10.5594/J04083</pub-id>
            </element-citation></ref>
         <ref id="jpi0187bib17">
            <label>17</label>
            <element-citation publication-type="other"><person-group person-group-type="author">
                  <name>
                     <surname>Kingma</surname>
                     <given-names>D.&#x00A0;P.</given-names>
                  </name>
                  <name>
                     <surname>Ba</surname>
                     <given-names>J.</given-names>
                  </name>
               </person-group>
               <comment>&#x201C;Adam: a method for stochastic optimization,&#x201D; Preprint, arXiv:<ext-link ext-link-type="arxiv" xlink:href="http://arxiv.org/abs/1412.6980">1412.6980</ext-link> (2014)</comment>
            </element-citation></ref>
         <ref id="jpi0187bib18">
            <label>18</label>
            <element-citation publication-type="book"><person-group person-group-type="author">
                  <name>
                     <surname>Li</surname>
                     <given-names>Z.</given-names>
                  </name>
                  <name>
                     <surname>Bampis</surname>
                     <given-names>C.&#x00A0;G.</given-names>
                  </name>
                  <name>
                     <surname>Krasula</surname>
                     <given-names>L.</given-names>
                  </name>
                  <name>
                     <surname>Janowski</surname>
                     <given-names>L.</given-names>
                  </name>
                  <name>
                     <surname>Katsavounidis</surname>
                     <given-names>I.</given-names>
                  </name>
               </person-group>
               <year>2020</year>
               <article-title>A simple model for subject behavior in subjective experiments</article-title>
               <source>IS&#x0026;T Int&#x2019;l. Symp. Electronic Imaging</source>
               <publisher-name>IS&#x0026;T</publisher-name>
               <publisher-loc>Springfield, VA</publisher-loc>
               <pub-id pub-id-type="doi">10.2352/ISSN.2470-1173.2020.11.HVEI-131</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib19">
            <label>19</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Liddell</surname>
                     <given-names>T.&#x00A0;M.</given-names>
                  </name>
                  <name>
                     <surname>Kruschke</surname>
                     <given-names>J.&#x00A0;K.</given-names>
                  </name>
               </person-group>
               <year>2018</year>
               <article-title>Analyzing ordinal data with metric models: what could possibly go wrong?</article-title>
               <source>J. Exp. Soc. Psychol.</source>
               <volume>79</volume>
               <fpage>328</fpage>
               <lpage>348</lpage>
               <page-range>328&#x2013;48</page-range>
               <pub-id pub-id-type="doi">10.1016/j.jesp.2018.08.009</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib20">
            <label>20</label>
            <element-citation publication-type="book"><person-group person-group-type="author">
                  <name>
                     <surname>Lin</surname>
                     <given-names>H.</given-names>
                  </name>
                  <name>
                     <surname>Hosu</surname>
                     <given-names>V.</given-names>
                  </name>
                  <name>
                     <surname>Saupe</surname>
                     <given-names>D.</given-names>
                  </name>
               </person-group>
               <year>2019</year>
               <article-title>KADID-10k: a large-scale artificially distorted IQA database</article-title>
               <source>Eleventh Int&#x2019;l. Conf. Quality of Multimedia Experience (QoMEX)</source>
               <fpage>1</fpage>
               <lpage>3</lpage>
               <page-range>1&#x2013;3</page-range>
               <publisher-name>IEEE</publisher-name>
               <publisher-loc>Piscataway, NJ</publisher-loc>
               <pub-id pub-id-type="doi">10.1109/QoMEX.2019.8743252</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib21">
            <label>21</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Pinson</surname>
                     <given-names>M.&#x00A0;H.</given-names>
                  </name>
                  <name>
                     <surname>Janowski</surname>
                     <given-names>L.</given-names>
                  </name>
                  <name>
                     <surname>P&#x00E9;pion</surname>
                     <given-names>R.</given-names>
                  </name>
                  <name>
                     <surname>Huynh-Thu</surname>
                     <given-names>Q.</given-names>
                  </name>
                  <name>
                     <surname>Schmidmer</surname>
                     <given-names>C.</given-names>
                  </name>
                  <name>
                     <surname>Corriveau</surname>
                     <given-names>P.</given-names>
                  </name>
                  <name>
                     <surname>Younkin</surname>
                     <given-names>A.</given-names>
                  </name>
                  <name>
                     <surname>Le&#x00A0;Callet</surname>
                     <given-names>P.</given-names>
                  </name>
                  <name>
                     <surname>Barkowsky</surname>
                     <given-names>M.</given-names>
                  </name>
                  <name>
                     <surname>Ingram</surname>
                     <given-names>W.</given-names>
                  </name>
               </person-group>
               <year>2012</year>
               <article-title>The influence of subjects and environment on audiovisual subjective tests: an international study</article-title>
               <source>IEEE J. Sel. Top. Signal Process.</source>
               <volume>6</volume>
               <fpage>640</fpage>
               <lpage>651</lpage>
               <page-range>640&#x2013;51</page-range>
               <pub-id pub-id-type="doi">10.1109/JSTSP.2012.2215306</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib22">
            <label>22</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Saffir</surname>
                     <given-names>M.&#x00A0;A.</given-names>
                  </name>
               </person-group>
               <year>1937</year>
               <article-title>A comparative study of scales constructed by three psychophysical methods</article-title>
               <source>Psychometrika</source>
               <volume>2</volume>
               <fpage>179</fpage>
               <lpage>198</lpage>
               <page-range>179&#x2013;98</page-range>
               <pub-id pub-id-type="doi">10.1007/BF02288395</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib23">
            <label>23</label>
            <element-citation publication-type="book"><person-group person-group-type="author">
                  <name>
                     <surname>Saupe</surname>
                     <given-names>D.</given-names>
                  </name>
                  <name>
                     <surname>Del&#x00A0;Pin</surname>
                     <given-names>S.&#x00A0;H.</given-names>
                  </name>
               </person-group>
               <year>2024</year>
               <article-title>National differences in image quality assessment: an investigation on three large-scale IQA datasets</article-title>
               <source>16th Int&#x2019;l. Conf. Quality of Multimedia Experience (QoMEX)</source>
               <fpage>214</fpage>
               <lpage>220</lpage>
               <page-range>214&#x2013;20</page-range>
               <publisher-name>IEEE</publisher-name>
               <publisher-loc>Piscataway, NJ</publisher-loc>
               <pub-id pub-id-type="doi">10.1109/QoMEX61742.2024.10598250</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib24">
            <label>24</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Sch&#x00F6;nemann</surname>
                     <given-names>P.&#x00A0;H.</given-names>
                  </name>
                  <name>
                     <surname>Tucker</surname>
                     <given-names>L.&#x00A0;R.</given-names>
                  </name>
               </person-group>
               <year>1967</year>
               <article-title>A maximum likelihood solution for the method of successive intervals allowing for unequal stimulus dispersions</article-title>
               <source>Psychometrika</source>
               <volume>32</volume>
               <fpage>403</fpage>
               <lpage>417</lpage>
               <page-range>403&#x2013;17</page-range>
               <pub-id pub-id-type="doi">10.1007/BF02289654</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib25">
            <label>25</label>
            <element-citation publication-type="book"><person-group person-group-type="author">
                  <name>
                     <surname>Scott</surname>
                     <given-names>M.&#x00A0;J.</given-names>
                  </name>
                  <name>
                     <surname>Guntuku</surname>
                     <given-names>S.&#x00A0;C.</given-names>
                  </name>
                  <name>
                     <surname>Huan</surname>
                     <given-names>Y.</given-names>
                  </name>
                  <name>
                     <surname>Lin</surname>
                     <given-names>W.</given-names>
                  </name>
                  <name>
                     <surname>Ghinea</surname>
                     <given-names>G.</given-names>
                  </name>
               </person-group>
               <year>2015</year>
               <article-title>Modelling human factors in perceptual multimedia quality: on the role of personality and culture</article-title>
               <source>Proc. 23rd ACM Int&#x2019;l. Conf. Multimedia</source>
               <fpage>481</fpage>
               <lpage>490</lpage>
               <page-range>481&#x2013;90</page-range>
               <publisher-name>ACM Press</publisher-name>
               <publisher-loc>New York, NY</publisher-loc>
               <pub-id pub-id-type="doi">10.1145/2733373.2806254</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib26">
            <label>26</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Seufert</surname>
                     <given-names>M.</given-names>
                  </name>
               </person-group>
               <year>2021</year>
               <article-title>Statistical methods and models based on quality of experience distributions</article-title>
               <source>Qual. User Exp.</source>
               <volume>6</volume>
               <fpage>3</fpage>
               <pub-id pub-id-type="doi">10.1007/s41233-020-00044-z</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib27">
            <label>27</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Tasaka</surname>
                     <given-names>S.</given-names>
                  </name>
               </person-group>
               <year>2017</year>
               <article-title>Bayesian hierarchical regression models for QoE estimation and prediction in audiovisual communications</article-title>
               <source>IEEE Trans. Multimedia</source>
               <volume>19</volume>
               <fpage>1195</fpage>
               <lpage>1208</lpage>
               <page-range>1195&#x2013;208</page-range>
               <pub-id pub-id-type="doi">10.1109/TMM.2017.2652064</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib28">
            <label>28</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Taylor</surname>
                     <given-names>J.&#x00A0;E.</given-names>
                  </name>
                  <name>
                     <surname>Rousselet</surname>
                     <given-names>G.&#x00A0;A.</given-names>
                  </name>
                  <name>
                     <surname>Scheepers</surname>
                     <given-names>C.</given-names>
                  </name>
                  <name>
                     <surname>Sereno</surname>
                     <given-names>S.&#x00A0;C.</given-names>
                  </name>
               </person-group>
               <year>2023</year>
               <article-title>Rating norms should be calculated from cumulative link mixed effects models</article-title>
               <source>Behav. Res. Methods</source>
               <volume>55</volume>
               <fpage>2175</fpage>
               <lpage>2196</lpage>
               <page-range>2175&#x2013;96</page-range>
               <pub-id pub-id-type="doi">10.3758/s13428-022-01814-7</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib29">
            <label>29</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Teunissen</surname>
                     <given-names>K.</given-names>
                  </name>
               </person-group>
               <year>1996</year>
               <article-title>The validity of CCIR quality indicators along a graphical scale</article-title>
               <source>SMPTE J.</source>
               <volume>105</volume>
               <fpage>144</fpage>
               <lpage>149</lpage>
               <page-range>144&#x2013;9</page-range>
               <pub-id pub-id-type="doi">10.5594/J04650</pub-id>
            </element-citation></ref>
         <ref id="jpi0187bib30">
            <label>30</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Thurstone</surname>
                     <given-names>L.&#x00A0;L.</given-names>
                  </name>
               </person-group>
               <year>1927</year>
               <article-title>A law of comparative judgment</article-title>
               <source>Psychol. Rev.</source>
               <volume>34</volume>
               <fpage>273</fpage>
               <lpage>286</lpage>
               <page-range>273&#x2013;86</page-range>
               <pub-id pub-id-type="doi">10.1037/h0070288</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib31">
            <label>31</label>
            <element-citation publication-type="book"><person-group person-group-type="author">
                  <name>
                     <surname>Torgerson</surname>
                     <given-names>W.&#x00A0;S.</given-names>
                  </name>
               </person-group>
               <source>Theory and Methods of Scaling</source>
               <year>1958</year>
               <publisher-name>Wiley</publisher-name>
               <publisher-loc>New York, NY</publisher-loc>
            </element-citation>
         </ref>
         <ref id="jpi0187bib32">
            <label>32</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Wichmann</surname>
                     <given-names>F.&#x00A0;A.</given-names>
                  </name>
                  <name>
                     <surname>Hill</surname>
                     <given-names>N.&#x00A0;J.</given-names>
                  </name>
               </person-group>
               <year>2001</year>
               <article-title>The psychometric function: I. Fitting, sampling, and goodness of fit</article-title>
               <source>Percept. Psychophys.</source>
               <volume>63</volume>
               <fpage>1293</fpage>
               <lpage>1313</lpage>
               <page-range>1293&#x2013;313</page-range>
               <pub-id pub-id-type="doi">10.3758/BF03194544</pub-id>
            </element-citation>
         </ref>
         <ref id="jpi0187bib33">
            <label>33</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Xiang</surname>
                     <given-names>Y.</given-names>
                  </name>
                  <name>
                     <surname>Gubian</surname>
                     <given-names>S.</given-names>
                  </name>
                  <name>
                     <surname>Suomela</surname>
                     <given-names>B.</given-names>
                  </name>
                  <name>
                     <surname>Hoeng</surname>
                     <given-names>J.</given-names>
                  </name>
               </person-group>
               <year>2013</year>
               <article-title>Generalized simulated annealing for efficient global optimization: the GenSA package for R</article-title>
               <source>R J.</source>
               <volume>5</volume>
               <pub-id pub-id-type="doi">10.32614/RJ-2013-002</pub-id>
               <comment>[Online]. Available: <uri xlink:href="https://journal.r-project.org">https://journal.r-project.org</uri></comment>
            </element-citation>
         </ref>
         <ref id="jpi0187bib34">
            <label>34</label>
            <element-citation publication-type="journal"><person-group person-group-type="author">
                  <name>
                     <surname>Zax</surname>
                     <given-names>M.</given-names>
                  </name>
                  <name>
                     <surname>Takahashi</surname>
                     <given-names>S.</given-names>
                  </name>
               </person-group>
               <year>1967</year>
               <article-title>Cultural influences on response style: comparisons of Japanese and American college students</article-title>
               <source>J. Soc. Psychol.</source>
               <volume>71</volume>
               <fpage>3</fpage>
               <lpage>10</lpage>
               <page-range>3&#x2013;10</page-range>
               <pub-id pub-id-type="doi">10.1080/00224545.1967.9919760</pub-id>
            </element-citation>
         </ref>
      </ref-list>
   </back>
</article>