
Mixed reality (MR) integrates virtual content with the physical world, enabling users to place virtual objects in real environments and to interact with or observe them. As MR technologies advance, such experiences are becoming increasingly common. However, it remains unclear how the visual and interactive representation of virtual objects influences users, and few studies have examined users' behavioral responses to virtual objects. We investigated whether three representation factors (interactivity, transparency, and size) affect users' sense of presence and their behavior toward the object (e.g., avoiding or displacing it). Here, interactivity refers to whether users can touch the virtual object. In two experiments (desk-scale and room-scale), participants performed a reaching task toward a real target located behind a virtual object whose representation factors were manipulated. Presence and behavior were assessed using subjective ratings and objective measures derived from tracking data and video observations. Perceived presence varied with interactivity, transparency, and size, whereas avoidance and displacement behaviors showed no reliable differences across conditions. The results nonetheless suggest that such behavioral responses may emerge when interaction demands are stronger or the interaction takes place at a larger scale. Overall, representation affected perceived presence but did not reliably change avoidance or displacement behavior in this task.

We have developed "CVL Annotator", a semi-automatic annotation tool for generating bounding-box ground truth in videos. Our research is motivated in particular by the need for reference annotations of challenging nighttime traffic scenes, where lighting conditions are highly dynamic due to reflections, headlights, and halos from oncoming traffic. The tool incorporates several state-of-the-art tracking algorithms to minimize the amount of human input needed to generate high-quality ground truth data. Its user interface is designed to minimize user interaction and to present all relevant information at a glance. In a preliminary user study, we measure the time and number of clicks needed to produce ground truth annotations of video traffic scenes and evaluate the accuracy of the final annotations.