IS&T | Library

Enabling Factors of International IP Treaties for Unlocking Scientific Publication Image Capture for the Visually Impaired

Legality of Image-to-Text Transformations
Marrakesh Treaty
Accessible Format Copies
Text-to-Data Mining Opt-Out
AI Training Legality
Perceivability of Scientific Imagery to Blind
Visually Impaired Accessibility
Visual Impairment Inclusion

Frank Wittig, Ruthra Bellan

DOI

10.2352/EI.2026.38.3.MOBMU-314

Volume 38

Issue 3

Abstract

With 126 signatories as of 2025, the Marrakesh VIP Treaty is one of the most far-reaching international legal treaties. It gives blind and visually impaired people, and authorized entities acting on their behalf, lawful access to copyrighted intellectual property by legitimizing the creation of so-called accessible format copies. However, the treaty is silent on the explicit topics of image-to-text transformation and text-and-data mining, both now made possible by advanced vision-language models (such as BLIP-2, LLaVA-1.5-7B, Moondream2, Qwen2-VL-2B, and Idefics3-8B). This work provides a comparative analysis of developments in the tangential legal frameworks of several countries, as well as in various conference and scientific journal publishing frameworks, to clarify what visually impaired users of scientific literature are permitted to do to overcome the “book famine” they face. These legal challenges and opportunities are illustrated for the various stakeholders in the scientific publishing domain. Furthermore, the paper explores potential conflicts with data-mining restriction laws and sui generis database rights. The paper ends with an outlook assessing how mature the current conference proceedings landscape is for enabling legally compliant access for the visually impaired.

Digital Library: EI

Published Online: March 2026

Explainable AI-powered Compliance Audit System: Real-time Multi-framework Security Monitoring with Transparent Decision Tracing

Explainable AI
Compliance Monitoring
Security Information and Event Management
Knowledge Graphs
Natural Language Generation
GDPR
ISO 27001
Regulatory Technology

Sam Soney Chemparathy, Mahipal, Reiner Creutzburg

DOI

10.2352/EI.2026.38.MOBMU-315

Volume 38

Issue 3

Abstract

Compliance monitoring in modern enterprises requires simultaneous adherence to multiple regulatory frameworks (ISO 27001, GDPR, SOC 2, HIPAA, PCI-DSS), yet existing Security Information and Event Management (SIEM) systems lack cross-framework mapping capabilities, explainability, and real-time processing. We present an explainable AI-powered compliance audit system that addresses these limitations by employing a novel architecture that combines knowledge graphs, transformer-based natural language models, and cryptographically signed audit trails. Our system provides real-time monitoring across five compliance frameworks, generates multi-audience explanations (technical, legal, executive), and maintains complete traceability of decisions. Evaluation on 100,000+ events from the LANL Unified Host and Network Dataset demonstrates 89% accuracy in compliance evaluation, 87% accuracy in control mapping, and end-to-end latency of 387ms (p50) on dedicated hardware (Intel Core i7-11700K, 32GB RAM). The system detected 1,247 compliance violations during a 30-day deployment, with 91.5% precision and an 8.5% false-positive rate. Our cross-framework knowledge graph reveals that 34% of applicable controls would be missed without unified mapping. We note that the LANL dataset evaluates the pipeline’s event-processing and rule-evaluation mechanics on authentication telemetry; validation on data-rich environments containing PII, PHI, or payment-card data remains necessary to confirm framework-specific detection accuracy. This work enables proactive compliance management and demonstrates a practical architecture for multi-regulatory compliance monitoring with built-in explainability.
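The cross-framework mapping idea in this abstract can be illustrated with a minimal sketch. It is not the authors' architecture: the event types, the edge list, and every control identifier below are hypothetical placeholders (not real clause numbers), and a plain dictionary stands in for the knowledge graph.

```python
from collections import defaultdict

# Hypothetical (event type, framework, control) edges. The control IDs are
# placeholders for illustration, not real clauses of any framework.
EDGES = [
    ("failed_login_burst", "ISO27001", "ISO-CTRL-1"),
    ("failed_login_burst", "SOC2", "SOC2-CTRL-4"),
    ("failed_login_burst", "PCI-DSS", "PCI-CTRL-7"),
    ("unencrypted_export", "GDPR", "GDPR-CTRL-2"),
    ("unencrypted_export", "HIPAA", "HIPAA-CTRL-3"),
]


def build_graph(edges):
    """Index controls by event type, then by framework."""
    graph = defaultdict(lambda: defaultdict(set))
    for event_type, framework, control in edges:
        graph[event_type][framework].add(control)
    return graph


def applicable_controls(graph, event_type, frameworks=None):
    """Return the controls an event touches, optionally limited to some frameworks."""
    by_framework = graph.get(event_type, {})
    return {
        fw: sorted(controls)
        for fw, controls in by_framework.items()
        if frameworks is None or fw in frameworks
    }


def missed_fraction(graph, event_type, single_framework):
    """Fraction of applicable controls not seen when only one framework is checked."""
    unified = applicable_controls(graph, event_type)
    total = sum(len(controls) for controls in unified.values())
    seen = len(unified.get(single_framework, []))
    return (total - seen) / total if total else 0.0


graph = build_graph(EDGES)
print(applicable_controls(graph, "failed_login_burst"))
print(missed_fraction(graph, "failed_login_burst", "ISO27001"))
```

In this toy graph, an auditor who checks only one framework sees one of the three controls that a failed-login burst touches; the unified lookup returns all three.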

Digital Library: EI

Published Online: March 2026

AROGD: A Benchmark Dataset for Automatic Reading-order Generation of Document Collections

automatic curriculum generation dataset
automatic document sequencing
Document Reading Order generation
Sequence Reading order

Md. Manzoor Murshed, Steven J. Simske

DOI

10.2352/EI.2026.38.3.MOBMU-316

Volume 38

Issue 3

Abstract

Automatic reading-order generation aims to produce an effective sequence of related documents, such as textbook chapters, course lectures, and search results, using only their textual content. This capability is critical for applications such as automatic curriculum sequencing, adaptive reading-order construction for improved comprehension, and relevant search-result ranking. Despite its potential importance, the field is still relatively new: no existing system matches the sequencing quality of expert human curation, and there are no publicly available benchmark datasets for rigorous evaluation and research. To address this gap, we propose algorithmic techniques informed by a survey of recent advances reported in leading journals and conferences, and we introduce AROGD (Automatic Reading Order Generation Dataset), the first publicly available dataset designed for automatic reading-order generation across multiple related document collections. AROGD provides carefully curated document sets with expert-annotated orderings, enabling reproducible experimentation and standardized benchmarking of automatic sequence-generation models. This contribution establishes a foundation for future research on text-driven reading-order generation and supports the development of robust, human-competitive sequencing algorithms. The dataset, benchmark, and leaderboard will be available at https://github.com/murshedm/AROGD-Dataset.
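One plausible way to score a generated ordering against an expert-annotated one is a rank correlation such as Kendall's tau. The abstract does not state which metric the AROGD benchmark uses, so the sketch below is an assumption for illustration only; the function name and the chapter labels are hypothetical.

```python
from itertools import combinations


def kendall_tau(predicted, reference):
    """Kendall's tau between two orderings of the same distinct documents.

    Returns 1.0 for identical orderings and -1.0 for fully reversed ones.
    """
    if len(set(reference)) != len(reference) or sorted(predicted) != sorted(reference):
        raise ValueError("orderings must contain the same distinct documents")
    n = len(reference)
    if n < 2:
        return 1.0
    # Position of each document in the predicted ordering.
    rank = {doc: i for i, doc in enumerate(predicted)}
    concordant = discordant = 0
    # combinations() yields pairs (a, b) where a precedes b in the reference.
    for a, b in combinations(reference, 2):
        if rank[a] < rank[b]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


expert = ["ch1", "ch2", "ch3", "ch4"]
model = ["ch1", "ch3", "ch2", "ch4"]  # one adjacent pair swapped
print(kendall_tau(model, expert))
```

A pairwise measure like this gives partial credit to an ordering that is mostly right, which exact-match accuracy would not.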

Digital Library: EI

Published Online: March 2026

Development of a Mobile Application for the Community-focused Microclimate-informed Indoor Heat Emergency Alert (CommHEAT) System

Community-focused
Microclimate-informed
Mobile application
Unity
MySQL
PHP
Indoor Temperature
Heat Index

Alex Raymond Renner, Samantha Edwards, Joel McCleary, Adam Kohl, Kexin Wang, Eliot Winer

DOI

10.2352/EI.2026.38.3.MOBMU-317

Volume 38

Issue 3

Abstract

The mobile application for the Community-Focused Microclimate-Informed Indoor Heat Emergency Alert (CommHEAT) system forecasts indoor temperatures and heat indices for the next seven days for users and their “community” friends. The primary challenge in the mobile application development was creating a system that connected external weather data sources, an indoor temperature simulation (EnergyPlus) that requires High Performance Computing (HPC), and a web-based database. Python scripts were used to retrieve input data from external weather websites (e.g., Mesonet and the National Weather Service) and empirical data for dwelling “archetypes” from spreadsheets, run simulations, and export the results to a MySQL database created for the system. The mobile application was created with Unity and forecast information was extracted from the database using C# and web-based PHP scripts for security. Interdisciplinary rhetoric challenges and mid-development requests for additional application features caused numerous modifications and replacements of methods to create the required mobile application features. Despite these challenges, the development team successfully connected each component of the CommHEAT system and delivered a fully functional application that was deployed to iOS and Android mobile phones for a user study conducted in the summer of 2025.
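The abstract does not say how CommHEAT derives a heat index from simulated indoor temperature. As a sketch under that caveat, the widely used Rothfusz regression published by the US National Weather Service computes a heat index from air temperature (°F) and relative humidity (%). The version below omits the NWS low- and high-humidity adjustments and is only meant for conditions where the heat index is roughly 80 °F or higher. The function names and the sample forecast are hypothetical.

```python
def heat_index_f(temp_f, rel_humidity_pct):
    """Rothfusz regression for heat index in degrees Fahrenheit (no adjustments)."""
    t, rh = temp_f, rel_humidity_pct
    return (
        -42.379
        + 2.04901523 * t
        + 10.14333127 * rh
        - 0.22475541 * t * rh
        - 0.00683783 * t * t
        - 0.05481717 * rh * rh
        + 0.00122874 * t * t * rh
        + 0.00085282 * t * rh * rh
        - 0.00000199 * t * t * rh * rh
    )


def forecast_heat_indices(daily_forecast):
    """Map (temperature °F, relative humidity %) pairs to heat indices."""
    return [heat_index_f(t, rh) for t, rh in daily_forecast]


# Made-up seven-day indoor forecast, for illustration only.
week = [(88, 60), (90, 70), (92, 65), (95, 55), (91, 60), (89, 70), (86, 75)]
for day, hi in enumerate(forecast_heat_indices(week), start=1):
    print(f"day {day}: heat index {hi:.1f} F")
```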

Digital Library: EI

Published Online: March 2026

Sustainable Framework for Computational Resource-optimized 3D Photogrammetry

Sustainable Imaging
3D Photogrammetry
Reinforcement Learning

Julia Schnitzer, Zohair Al-Ameen

DOI

10.2352/EI.2026.38.3.MOBMU-318

Volume 38

Issue 3

Abstract

The computational footprint of 3D photogrammetry is a growing concern. Standard workflows often need hundreds or thousands of high-resolution images to achieve high-fidelity results, which places a significant energy burden on processing hardware and increases both cost and environmental impact. In this proposal, EcoScan is presented as a novel, sustainable photogrammetry workflow that minimizes computational resource consumption. EcoScan uses an on-device Reinforcement Learning (RL) agent that functions as an intelligent photographer: it decides in real time which frames to capture and suggests camera movements that maximize information gain per pixel. This yields a minimal yet sufficient image dataset that should be efficient for downstream processing. The proposed approach reformulates the capture process as a Markov Decision Process (MDP) with a reward function that balances reconstruction quality against computational energy cost. Results show that EcoScan reduces the number of required input images by a factor of 3 to 5 compared to conventional methods while achieving equivalent reconstruction accuracy. This translates to a 60-70% reduction in total energy consumption during the structure-from-motion (SfM) and multi-view stereo (MVS) processing phases. The EcoScan framework provides a pathway towards sustainable 3D digitization without compromising quality.
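The abstract describes a reward that trades reconstruction quality against energy cost but gives no formula. The sketch below assumes the simplest linear form, information gain minus a weighted energy term, and applies it with a greedy threshold rule rather than a trained RL policy. The weight, the units, and the frame values are all hypothetical.

```python
def capture_reward(info_gain, energy_joules, energy_weight=0.01):
    """Assumed linear trade-off: information gained minus weighted energy spent."""
    return info_gain - energy_weight * energy_joules


def select_frames(candidates, energy_weight=0.01):
    """Greedy stand-in for a learned policy: keep frames with positive reward.

    `candidates` is a list of (info_gain, energy_joules) pairs, one per frame.
    Returns the indices of the frames that would be captured.
    """
    return [
        i
        for i, (gain, energy) in enumerate(candidates)
        if capture_reward(gain, energy, energy_weight) > 0.0
    ]


# Made-up candidate frames: (estimated information gain, processing energy in joules).
frames = [(0.9, 20.0), (0.1, 20.0), (0.5, 20.0), (0.15, 20.0)]
kept = select_frames(frames)
print(f"kept {len(kept)} of {len(frames)} frames: {kept}")
```

Raising `energy_weight` makes the rule more frugal. An RL agent would instead learn which captures pay off over a whole scanning episode rather than judging each frame in isolation.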

Digital Library: EI

Published Online: March 2026

Measuring the Complexity of Image Enhancement and Restoration Algorithms Using a Logarithmic Model

Complexity measure
Logarithmic model
Image processing
Brain-computer interaction

Julia Schnitzer, Zohair Al-Ameen, Basim Mahmood

DOI

10.2352/EI.2026.38.3.MOBMU-319

Volume 38

Issue 3

Abstract

Digital Library: EI

Published Online: March 2026

Protocol Translation Vulnerabilities in LLM Agent Communication Stacks

LLM agent security
protocol translation
vulnerability taxonomy
stacking ensemble
formal verification
MCP
A2A
ACP
FIPA-ACL
cross-protocol detection
real-time threat analysis

Mahipal

DOI

10.2352/EI.2026.38.MOBMU-320

Volume 38

Issue 3

Abstract

As autonomous LLM agents increasingly operate across heterogeneous communication stacks (MCP, A2A, ACP, and FIPA-ACL), the translation gateways that bridge these protocols constitute a critical yet largely unstudied attack surface. Semantic mismatches in authentication, state management, and content encoding at translation boundaries can enable vulnerabilities that are invisible to single-protocol defenses. We introduce the Translation Security Analysis Framework (TSAF), which makes three contributions: (1) a six-category vulnerability taxonomy (ISV, PIV, SCV, CPRV, TIV, CEV) grounded in category theory; (2) a three-layer detection pipeline combining static rules, behavioral anomaly detection, and a stacking ensemble classifier (XGBoost, LightGBM, HistGBM) trained on 2.5 million real network traffic samples from CICIDS2017 and UNSW-NB15; and (3) formal verification of 16 security properties via ProVerif, Tamarin Prover, and TLA+, all of which were proved to hold. Evaluation yields a 96.3% true positive rate, a 3.8% false positive rate, and 147 ms p95 latency, with statistically significant improvements over all baselines. TSAF provides the first systematic treatment of translation-boundary attacks in multi-agent AI.
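For readers less familiar with the reported metrics, the sketch below shows how a true positive rate, a false positive rate, and a p95 latency are conventionally computed. The counts and latencies are toy values, not the paper's data, and the nearest-rank percentile is one of several common conventions; the abstract does not say which one was used.

```python
def detection_rates(tp, fn, fp, tn):
    """Return (true positive rate, false positive rate) from confusion counts."""
    tpr = tp / (tp + fn)  # share of real attacks that were flagged
    fpr = fp / (fp + tn)  # share of benign samples that were flagged
    return tpr, fpr


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Toy numbers for illustration only.
tpr, fpr = detection_rates(tp=90, fn=10, fp=5, tn=95)
latencies_ms = list(range(1, 101))
print(f"TPR={tpr:.2f} FPR={fpr:.2f} p95={percentile_nearest_rank(latencies_ms, 95)} ms")
```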

Digital Library: EI

Published Online: March 2026

Capabilities of Image-to-text Transformation Models for Enabling Visually Impaired to Perceive Complex Imaging Visuals at Conferences and Scientific Journals

vision-language models
accessibility
scientific figures
automated captioning
RefCLIPScore
visual impairment
blind

Ruthra Bellan, Frank Wittig, Reiner Creutzburg

DOI

10.2352/EI.2026.38.3.MOBMU-322

Volume 38

Issue 3

Abstract

Scientific figures (charts, composite panels, and data visualizations) are routinely inaccessible to visually impaired readers because screen readers cannot interpret visual content and published captions are often too brief or domain-specific to convey what the figure shows. Vision-language models (VLMs) offer a potential route to automated, accessible image description at scale. In this study, we evaluate five open-source, instruction-tuned VLMs (BLIP-2, LLaVA-1.5-7B, Moondream2, Qwen2-VL-2B, and Idefics3-8B) on a dataset of 245 scientific figures drawn from 32 papers presented at Electronic Imaging 2025. Generated captions are scored against author-provided ground-truth captions using four complementary metrics: BLEU, ROUGE-L, Sentence-BERT cosine similarity (SBERT), and RefCLIPScore. Moondream2 achieves the highest performance across all semantic metrics (RefCLIPScore = 1.025, SBERT = 0.392) despite being one of the smallest models evaluated (~1.86B parameters), offering the best balance of quality and speed (8.7 s per image). The four metrics tell a consistent story: Moondream2 scores low on lexical match but high on semantic similarity and image alignment, which is the expected pattern when detailed visual descriptions are compared against brief author captions. These findings are broadly paralleled in an evaluation of VLM-generated captions performed by a small sample of actual publication authors. Besides highlighting the suitability of the aforementioned VLMs in aiding visually impaired individuals, the explored approaches may also serve as orientation for familiarizing authors and publishers of scientific articles with the needs of assistive tech and the increasing expectations in accessibility regulations.
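The abstract's point that detailed descriptions score low on lexical match against brief author captions can be seen with a small ROUGE-L computation. The sketch below is a simplified version, assuming lowercase whitespace tokenization and a balanced F1; the study's exact ROUGE-L configuration is not given in the abstract, and the example captions are made up.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Simplified ROUGE-L: F1 of LCS-based precision and recall."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


author_caption = "bar chart showing accuracy"
short_caption = "a bar chart of accuracy"
detailed_caption = (
    "a bar chart with five blue bars showing accuracy "
    "for each model on the test set"
)
print(rouge_l_f1(short_caption, author_caption))
print(rouge_l_f1(detailed_caption, author_caption))
```

The detailed caption contains every word of the author caption in order, yet it scores lower than the short one because its extra detail reduces precision. This is why an embedding-based or image-grounded metric is a useful complement when the goal is rich description for accessibility.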

Digital Library: EI

Published Online: March 2026

Overview and State-of-the-Art in Printer Forensics

Printer Forensics
Document Examination
Printing Technology
Digital Forensics
Source Attribution
Counterfeit Detection
Steganography

Nikola Nachevski, Rifqi Ardia Ramadhan, Panharith An, Rana Shafi, Reiner Creutzburg

DOI

10.2352/EI.2026.38.3.MOBMU-324

Volume 38

Issue 3

Abstract

Printer forensics is a specialized field within digital and document forensics that focuses on identifying the source printer of a printed document through intrinsic and extrinsic characteristics. Because printers play a crucial role in both legitimate and malicious activities, ranging from document authentication to the dissemination of forged or anonymous materials, the need for robust forensic techniques has become increasingly important. This paper provides a comprehensive overview of the current landscape in printer forensics, including a classification of the methods used for source identification, such as mechanical defect analysis, texture pattern recognition, and embedded code detection. Both traditional image processing techniques and recent advances leveraging machine learning and deep neural networks are examined. Additionally, we explore the challenges associated with dataset availability, print-scan noise, and cross-model generalization. By surveying existing methodologies and the publicly documented limitations of current approaches, we identify emerging trends and propose potential directions for future research in the field.

Digital Library: EI

Published Online: March 2026

Deep Learning Based Vehicle Classification: Detecting EVs and Gasoline Cars in Berlin using Convolutional Neural Networks

Deep Learning
Vehicle Classification
Electric Vehicles (EVs)
Convolutional Neural Networks
Object Detection
Smart Parking
Edge Computing
Privacy-Aware Systems

Raghav Tandon, Hamid Mostofi, Navaneeth Shivananjappa, Reiner Creutzburg

DOI

10.2352/EI.2026.38.3.MOBMU-327

Volume 38

Issue 3