Proceedings Paper
Volume: 37 | Article ID: HPCI-183
mTREE: Multi-level Text-guided Representation End-to-end Learning for Whole Slide Image Analysis
DOI: 10.2352/EI.2025.37.12.HPCI-183 | Published Online: February 2025
Abstract

Multi-modal learning integrates visual and textual data effectively, but applying it to histopathology images and text remains challenging, particularly for large, high-resolution images such as gigapixel Whole Slide Images (WSIs). Current methods typically rely on manual region labeling or multi-stage learning to assemble local representations (e.g., patch-level) into global features (e.g., slide-level). However, there is no effective way to integrate multi-scale image representations with text data in a seamless end-to-end process. In this study, we introduce Multi-Level Text-Guided Representation End-to-End Learning (mTREE), a text-guided approach that captures multi-scale WSI representations by leveraging the accompanying pathology text. mTREE combines two processes, the localization of key areas (“global-to-local”) and the construction of a WSI-level image-text representation (“local-to-global”), into a unified, end-to-end learning framework. In this model, textual information serves a dual purpose: first, it functions as an attention map that identifies key areas; second, it acts as a conduit for integrating textual features into the overall image representation. We demonstrate the effectiveness of mTREE through quantitative analyses on two image-related tasks, classification and survival prediction, in which it outperforms the baseline methods. Code and trained models are available at https://github.com/hrlblab/mTREE.
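The two stages described in the abstract can be illustrated with a minimal sketch. It assumes that patch embeddings and the pathology-text embedding already live in a shared D-dimensional space (random stand-ins here), uses cosine similarity followed by a softmax as the text-derived attention map ("global-to-local"), keeps the top-k patches, and then pools them and fuses the result with the text vector by concatenation ("local-to-global"). All function names, the temperature, k, and the concatenation-based fusion are hypothetical choices made for illustration; this is not the mTREE implementation (see the linked repository for that), and because it trains nothing it does not reproduce the end-to-end optimization.

```python
import numpy as np


def softmax(x, temperature=1.0):
    """Numerically stable softmax over a 1-D array."""
    z = x / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def text_guided_attention(patch_feats, text_feat, temperature=0.1):
    """Hypothetical global-to-local step: score every patch by cosine
    similarity to the text embedding, then normalize into an attention map."""
    p = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    return softmax(p @ t, temperature)


def select_key_patches(attn, k):
    """Indices of the k highest-attention patches, highest first."""
    return np.argsort(attn)[::-1][:k]


def slide_representation(patch_feats, text_feat, k=4):
    """Hypothetical local-to-global step: pool the selected patches with
    renormalized attention weights, then concatenate the text embedding
    to form one slide-level image-text vector of length 2 * D."""
    attn = text_guided_attention(patch_feats, text_feat)
    idx = select_key_patches(attn, k)
    w = attn[idx] / attn[idx].sum()      # weights over the k kept patches
    slide_feat = w @ patch_feats[idx]    # shape (D,)
    return np.concatenate([slide_feat, text_feat]), idx


# Stand-in data: N patch embeddings and one text embedding (assumed shapes).
rng = np.random.default_rng(0)
D, N = 8, 100
patches = rng.normal(size=(N, D))
text = rng.normal(size=D)

fused, key_idx = slide_representation(patches, text, k=4)
print(fused.shape, key_idx)
```

In a trained system the fused vector would feed a task head (e.g., a classifier or a survival model); hard top-k selection is not differentiable, so an end-to-end framework would need some other way to pass gradients to the attention, which this sketch does not attempt.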

  Cite this article 

Quan Liu, Ruining Deng, Can Cui, Tianyuan Yao, Yuechen Yang, Vishwesh Nath, Bingshan Li, You Chen, Yucheng Tang, Yuankai Huo, "mTREE: Multi-level Text-guided Representation End-to-end Learning for Whole Slide Image Analysis," in Electronic Imaging, 2025, pp. 183-1 - 183-7, https://doi.org/10.2352/EI.2025.37.12.HPCI-183

  Copyright statement 
Copyright © 2025 Society for Imaging Science and Technology
Electronic Imaging
ISSN: 2470-1173
Society for Imaging Science and Technology
IS&T 7003 Kilworth Lane, Springfield, VA 22151 USA