MonoHybrid: Self-Supervised Monocular Depth Estimation with Hybrid Network

Wei Jiang; Bingfei Nan; Guofa Wang

doi:10.2352/J.ImagingSci.Technol.2026.70.4.040503

Regular Article FastTrack

Volume: 0 | Article ID: 040503

Keywords : monocular depth estimation; hybrid network; self-supervision

Abstract

Monocular depth estimation (MDE) is a widely used technique in autonomous driving and 3D reconstruction. However, inconsistent and fragmented depth outputs can significantly undermine the reliability of MDE applications in practice. To address this issue, the authors introduce MonoHybrid, a novel self-supervised hybrid network that effectively integrates Transformer and dilated convolutional architectures. This design enables the extraction of both global and local features, enhancing the receptive field and ensuring robust and continuous depth estimation. Additionally, the authors present a new Feature Fusion Module that fuses convolutional and Transformer features, resulting in improved depth estimation performance. Through comprehensive experiments, the proposed network demonstrates notable accuracy and generalization compared to other advanced methods in the field.
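
The abstract's claim that dilated convolutions enlarge the receptive field can be illustrated with a short calculation. This is a minimal sketch, not the authors' implementation: the layer configurations below (3x3 kernels, dilation rates 1, 2, 4, stride 1) are assumptions chosen for illustration and do not come from the article. It relies only on the standard relation that a kernel of size k with dilation d covers d(k - 1) + 1 input positions.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span of input positions covered by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def receptive_field(layers) -> int:
    """Receptive field of stacked stride-1 convolutions.

    layers: iterable of (kernel_size, dilation) pairs (hypothetical configs).
    """
    rf = 1
    for kernel_size, dilation in layers:
        # Each stride-1 layer widens the field by (effective kernel - 1).
        rf += effective_kernel(kernel_size, dilation) - 1
    return rf


# Illustrative stacks, not taken from the paper.
plain = receptive_field([(3, 1), (3, 1), (3, 1)])
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])
print(plain, dilated)
```

With the same number of layers and parameters, the dilated stack covers a wider input span than the plain one, which is the general motivation for pairing dilated convolutions (local detail over a wider context) with a Transformer (global context).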

Journal Title : Journal of Imaging Science and Technology

Publisher Name : Society for Imaging Science and Technology

Cite this article

Wei Jiang, Bingfei Nan, Guofa Wang, "MonoHybrid: Self-Supervised Monocular Depth Estimation with Hybrid Network" in Journal of Imaging Science and Technology, 2026, pp 1 - 12, https://doi.org/10.2352/J.ImagingSci.Technol.2026.70.4.040503

Article timeline

received March 2025
accepted January 2026
