AROGD: A Benchmark Dataset for Automatic Reading-order Generation of Document Collections

Md. Manzoor Murshed; Steven J. Simske

doi:10.2352/EI.2026.38.3.MOBMU-316

Abstract

Automatic reading-order generation aims to produce an effective sequence of related documents, such as textbook chapters, course lectures, and search results, using only their textual content. This capability is critical for applications such as automatic curriculum sequencing, adaptive reading-order construction for improved comprehension, and relevance-based search-result ranking. Despite its potential importance, the field is still relatively new: no existing system matches the sequencing quality of expert human curation, and no publicly available benchmark datasets exist for rigorous evaluation and research. To address this gap, we survey recent advances reported in leading journals and conferences, propose several algorithmic techniques based on that survey, and introduce AROGD (Automatic Reading Order Generation Dataset), the first publicly available dataset designed for automatic reading-order generation across multiple collections of related documents. AROGD provides carefully curated document sets with expert-annotated orderings, enabling reproducible experimentation and standardized benchmarking of automatic sequence-generation models. This contribution establishes a foundation for future research on text-driven reading-order generation and supports the development of robust, human-competitive sequencing algorithms. The dataset, benchmark, and leaderboard will be available at https://github.com/murshedm/AROGD-Dataset.
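Benchmarking a sequencing model against expert-annotated orderings requires some measure of agreement between two orderings of the same documents. The abstract does not name the metric AROGD uses, so the following is only a minimal sketch that assumes Kendall's tau as one common choice for comparing rankings. The function name `kendall_tau` and the document IDs are hypothetical placeholders, not part of the dataset's published format.

```python
from itertools import combinations


def kendall_tau(expert_order, predicted_order):
    """Kendall's tau between an expert ordering and a predicted ordering.

    Assumption: Kendall's tau is used here for illustration only; it is not
    stated to be the AROGD evaluation metric. Both inputs must be orderings
    of the same distinct document IDs. Returns +1.0 when the orderings agree
    completely and -1.0 when one is the exact reverse of the other.
    """
    if len(set(expert_order)) != len(expert_order):
        raise ValueError("expert ordering contains duplicate documents")
    if set(expert_order) != set(predicted_order) or len(expert_order) != len(predicted_order):
        raise ValueError("orderings must contain the same documents")

    n = len(expert_order)
    if n < 2:
        return 1.0  # a single document trivially agrees with itself

    # Position of each document in the predicted sequence.
    pos = {doc: i for i, doc in enumerate(predicted_order)}

    concordant = discordant = 0
    # combinations() yields pairs in expert order, so the expert places a before b.
    for a, b in combinations(expert_order, 2):
        if pos[a] < pos[b]:
            concordant += 1  # the model agrees on this pair's relative order
        else:
            discordant += 1  # the model swaps this pair

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Hypothetical chapter IDs: the model swaps only chapters 2 and 3.
expert = ["ch1", "ch2", "ch3", "ch4"]
predicted = ["ch1", "ch3", "ch2", "ch4"]
print(round(kendall_tau(expert, predicted), 4))
```

In this example, five of the six chapter pairs keep the expert's relative order and one is swapped, so the score is (5 - 1) / 6, roughly 0.67. A pairwise rank measure like this rewards a model for getting most precedence relations right even when the sequence is not an exact match.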
