Proceedings Paper
Volume: 37 | Article ID: HPCI-183
mTREE: Multi-level Text-guided Representation End-to-end Learning for Whole Slide Image Analysis
DOI: 10.2352/EI.2025.37.12.HPCI-183 | Published Online: February 2025
Abstract

Multi-modal learning effectively integrates visual and textual data, but applying it to histopathology image and text analysis remains challenging, particularly for large, high-resolution images such as gigapixel Whole Slide Images (WSIs). Current methods typically rely on manual region labeling or multi-stage learning to assemble local representations (e.g., patch-level) into global features (e.g., slide-level). However, there is no effective way to integrate multi-scale image representations with text data in a seamless end-to-end process. In this study, we introduce Multi-Level Text-Guided Representation End-to-End Learning (mTREE). This novel text-guided approach captures multi-scale WSI representations by leveraging the accompanying textual pathology information. mTREE combines two processes, the localization of key areas (“global-to-local”) and the construction of a WSI-level image-text representation (“local-to-global”), into a unified, end-to-end learning framework. In this model, textual information serves a dual purpose: first, it functions as an attention map that identifies key areas; second, it acts as a conduit for integrating textual features into the overall image representation. We demonstrate the effectiveness of mTREE through quantitative analyses on two image-related tasks, classification and survival prediction, in which it outperforms the baselines. Code and trained models are available at https://github.com/hrlblab/mTREE.
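The abstract gives text two roles: an attention map over the slide that picks out key areas, and an input that is fused into the slide-level representation. The NumPy sketch below illustrates that general idea only, under assumptions that are not from the paper. Random vectors stand in for patch and text encoder outputs. Cosine similarity followed by a temperature softmax serves as the text-derived attention map. Top-k selection stands in for key-area localization, and plain concatenation stands in for image-text fusion. The function `text_guided_pooling` and its `top_k` and `temperature` parameters are hypothetical. This is not the mTREE architecture; the authors' implementation is in the linked repository.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Scale vectors along the last axis to unit length (eps avoids division by zero).
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()


def text_guided_pooling(patch_feats, text_feat, top_k=4, temperature=0.1):
    """Hypothetical sketch, not the mTREE implementation.

    patch_feats: (N, D) patch embeddings; text_feat: (D,) text embedding.
    Returns the attention map, the selected patch indices, and a fused
    slide-level vector of length 2 * D.
    """
    # Text as attention map: cosine similarity between the text and every patch.
    scores = l2_normalize(patch_feats) @ l2_normalize(text_feat)
    attn = softmax(scores / temperature)

    # "Global-to-local": keep the top_k patches the text attends to most.
    key_idx = np.argsort(-attn)[:top_k]

    # "Local-to-global": renormalize attention over the kept patches and pool them.
    w = attn[key_idx] / attn[key_idx].sum()
    slide_img = w @ patch_feats[key_idx]

    # Text as conduit: fuse the pooled image feature with the text feature.
    slide_repr = np.concatenate([slide_img, text_feat])
    return attn, key_idx, slide_repr


# Toy usage with random stand-ins for encoder outputs.
rng = np.random.default_rng(0)
patches = rng.normal(size=(100, 16))
text = rng.normal(size=16)
patches[7] = 5.0 * text  # plant one patch aligned with the text
attn, key_idx, slide_repr = text_guided_pooling(patches, text)
print(key_idx[0], slide_repr.shape)
```

In this toy run, the planted patch 7 has cosine similarity 1 with the text, so it ranks first among the selected patches. The pooled feature is weighted by the attention values, so a model built this way could still pass gradients back to the text-patch similarity scores. The hard top-k selection itself, however, is not differentiable.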

  Cite this article 

Quan Liu, Ruining Deng, Can Cui, Tianyuan Yao, Yuechen Yang, Vishwesh Nath, Bingshan Li, You Chen, Yucheng Tang, Yuankai Huo, "mTREE: Multi-level Text-guided Representation End-to-end Learning for Whole Slide Image Analysis," in Electronic Imaging, 2025, pp. 183-1–183-7, https://doi.org/10.2352/EI.2025.37.12.HPCI-183

Copyright © 2025 Society for Imaging Science and Technology
Electronic Imaging
ISSN: 2470-1173
Society for Imaging Science and Technology
IS&T 7003 Kilworth Lane, Springfield, VA 22151 USA