
Monocular depth estimation (MDE) is widely used in autonomous driving and 3D reconstruction. However, inconsistent and fragmented depth outputs can significantly undermine the reliability of MDE in practice. To address this issue, the authors introduce MonoHybrid, a self-supervised hybrid network that integrates Transformer and dilated convolutional architectures. This design extracts both global and local features and enlarges the receptive field, supporting robust and continuous depth estimation. The authors also present a new Feature Fusion Module that fuses convolutional and Transformer features to improve depth estimation performance. In comprehensive experiments, the proposed network demonstrates notable accuracy and generalization compared to other advanced methods in the field.
Wei Jiang, Bingfei Nan, Guofa Wang, "MonoHybrid: Self-Supervised Monocular Depth Estimation with Hybrid Network," in Journal of Imaging Science and Technology, 2026, pp. 1-12, https://doi.org/10.2352/J.ImagingSci.Technol.2026.70.4.040503
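
The abstract's point that dilated convolutions enlarge the receptive field can be illustrated with a short calculation. The sketch below is not taken from the paper: the function name `receptive_field`, the kernel size of 3, and the dilation rates `[1, 2, 4]` are hypothetical values chosen for illustration. It assumes stride-1 convolutions stacked in sequence, for which each layer adds `(kernel_size - 1) * dilation` pixels to the receptive field along one axis.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stacked stride-1 convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1  # a single output pixel sees itself before any convolution
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Three ordinary 3x3 layers (dilation 1 everywhere).
plain = receptive_field(3, [1, 1, 1])

# Three 3x3 layers with hypothetical dilation rates 1, 2, 4 (an assumption,
# not the configuration reported by the authors).
dilated = receptive_field(3, [1, 2, 4])

print(plain, dilated)  # 7 15
```

With the same number of layers and parameters, the dilated stack covers roughly twice the spatial extent per axis, which is the general motivation for pairing dilated convolutions with a Transformer branch that supplies global context.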