
Emotion recognition from physiological signals is often limited by unimodal analysis, which fails to capture interactions across physiological systems. This study proposes a multimodal framework that integrates heart rate (HR) and pupil diameter signals, with a particular focus on modeling cross-modal interactions. We introduce composite features that explicitly represent relationships between HR and pupil dynamics, and combine them with a two-step feature optimization strategy: correlation-based reduction followed by mutual information ranking. Experiments were conducted on an emotion-elicitation dataset with three emotional states (Joy, Neutral, Sad), using multiple classifiers and cross-validation schemes. The proposed method achieved a classification accuracy of 91.1%, significantly outperforming the HR-only (61.1%) and pupil-only (72.2%) approaches. Feature analysis revealed that cross-modal descriptors, particularly an entropy-based interaction feature, contributed most to the performance gain. These results demonstrate that explicitly modeling cross-modal physiological interactions is an effective strategy for improving emotion recognition accuracy.
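
To make the two-step feature optimization concrete, the following is a minimal sketch of correlation-based reduction followed by mutual information ranking. It is not the implementation used in this study. The feature names, toy values, correlation threshold (0.9), number of histogram bins (3), and number of retained features are all assumptions chosen for illustration, and the mutual information estimate is a simple equal-width histogram estimate.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(features, threshold=0.9):
    """Step 1: keep a feature only if its |r| with every feature
    already kept does not exceed the (assumed) threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def discretize(x, bins=3):
    """Map continuous values to equal-width bin indices 0..bins-1."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0] * len(x)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in x]


def mutual_information(x, labels, bins=3):
    """Histogram estimate of I(X; Y) in nats between a feature and class labels."""
    xb = discretize(x, bins)
    n = len(xb)
    joint = Counter(zip(xb, labels))
    px, py = Counter(xb), Counter(labels)
    return sum(
        (c / n) * math.log((c * n) / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def select_features(features, labels, threshold=0.9, top_k=2, bins=3):
    """Step 2: rank the surviving features by mutual information with the labels."""
    kept = drop_correlated(features, threshold)
    ranked = sorted(
        kept,
        key=lambda k: mutual_information(features[k], labels, bins),
        reverse=True,
    )
    return ranked[:top_k]


# Toy data (hypothetical values, not taken from the study's dataset).
labels = ["Joy", "Joy", "Neutral", "Neutral", "Sad", "Sad"]
hr_mean = [1.0, 1.1, 2.0, 2.1, 3.0, 3.1]
features = {
    "hr_mean": hr_mean,
    "hr_scaled": [2 * v for v in hr_mean],  # redundant copy, removed in step 1
    "pupil_mean": [3.0, 3.2, 3.1, 2.9, 4.0, 4.2],
    "noise": [5.0, 1.0, 4.0, 2.0, 3.0, 6.0],
}

selected = select_features(features, labels)
print(selected)
```

In this sketch the redundant `hr_scaled` feature is discarded during correlation-based reduction, and the remaining features are then ordered by their estimated mutual information with the emotion labels. In practice, the threshold and the number of retained features would be tuned within the cross-validation loop to avoid information leakage.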