
With 126 signatories as of 2025, the Marrakesh VIP Treaty is one of the most far-reaching international treaties. It gives blind and visually impaired people, and authorized entities acting on their behalf, lawful access to copyrighted works by legitimizing the creation of so-called accessible format copies. However, the treaty is silent on image-to-text transformation and on the text and data mining now made possible by advanced vision-language models (such as BLIP-2, LLaVA-1.5-7B, Moondream2, Qwen2-VL-2B, and Idefics3-8B). This work provides a comparative analysis of developments in the related legal frameworks of several countries, and in the publishing frameworks of various conferences and scientific journals, to highlight the extent to which visually impaired users of scientific literature are permitted to overcome the “book famine” they face. The resulting legal challenges and opportunities are illustrated for the various stakeholders in scientific publishing. Furthermore, the paper explores potential conflicts with data-mining restriction laws and sui generis database rights. It ends with an outlook assessing how mature the current conference proceedings landscape is for enabling legally compliant access for the visually impaired.

Automatic reading-order generation aims to produce an effective sequence of related documents, such as textbook chapters, course lectures, and search results, using only their textual content. This capability is critical for applications such as automatic curriculum sequencing, adaptive reading-order construction for improved comprehension, and relevant search-result ranking. Despite its potential importance, the field is still relatively new: no existing system matches the sequencing quality of expert human curation, and there are no publicly available benchmark datasets for rigorous evaluation and research. To address this gap, we propose algorithmic techniques informed by a survey of recent advances reported in leading journals and conferences, and we introduce AROGD (Automatic Reading Order Generation Dataset), the first publicly available dataset designed for automatic reading-order generation across multiple related document collections. AROGD provides carefully curated document sets with expert-annotated orderings, enabling reproducible experimentation and standardized benchmarking of automatic sequence-generation models. This contribution establishes a foundation for future research on text-driven reading-order generation and supports the development of robust, human-competitive sequencing algorithms. The dataset, benchmark, and leaderboard will be available at https://github.com/murshedm/AROGD-Dataset.

The mobile application for the Community-Focused Microclimate-Informed Indoor Heat Emergency Alert (CommHEAT) system forecasts indoor temperatures and heat indices for the next seven days for users and their “community” friends. The primary challenge in developing the application was creating a system that connected external weather data sources, an indoor temperature simulation (EnergyPlus) that requires High Performance Computing (HPC), and a web-based database. Python scripts were used to retrieve input data from external weather websites (e.g., Mesonet and the National Weather Service) and empirical data for dwelling “archetypes” from spreadsheets, run the simulations, and export the results to a MySQL database created for the system. The mobile application was created with Unity; for security, forecast information was retrieved from the database using C# and web-based PHP scripts. Interdisciplinary rhetoric challenges and mid-development requests for additional features meant that many methods had to be modified or replaced to deliver the required application features. Despite these challenges, the development team successfully connected each component of the CommHEAT system and delivered a fully functional application that was deployed to iOS and Android mobile phones for a user study conducted in the summer of 2025.
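As an illustration of the heat index quantity mentioned above, the following minimal sketch computes a heat index from air temperature and relative humidity. The abstract does not state which formula CommHEAT uses; this sketch assumes the Rothfusz regression published by the US National Weather Service, omits the adjustments the NWS applies at very low or very high humidity, and is only meaningful for hot conditions (roughly 80 °F and above). The function names and the shape of the `forecast` input are hypothetical.

```python
def heat_index_f(temp_f: float, rh_percent: float) -> float:
    """Heat index in degrees Fahrenheit via the Rothfusz regression.

    Assumption: this is one widely used formula, not necessarily the one
    used by the CommHEAT system. Humidity adjustments are omitted.
    """
    t, rh = temp_f, rh_percent
    return (
        -42.379
        + 2.04901523 * t
        + 10.14333127 * rh
        - 0.22475541 * t * rh
        - 6.83783e-3 * t * t
        - 5.481717e-2 * rh * rh
        + 1.22874e-3 * t * t * rh
        + 8.5282e-4 * t * rh * rh
        - 1.99e-6 * t * t * rh * rh
    )


def daily_peak_heat_index(forecast):
    """Return the highest heat index in one day's forecast.

    `forecast` is a hypothetical list of (temp_f, rh_percent) tuples,
    for example one tuple per simulated hour.
    """
    return max(heat_index_f(t, rh) for t, rh in forecast)
```

For example, `heat_index_f(90.0, 50.0)` evaluates to roughly 94.6 °F, which shows how humidity raises the apparent temperature above the air temperature.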

The computational footprint of 3D photogrammetry is a growing concern because standard workflows often need hundreds or thousands of high-resolution images to achieve high-fidelity results. This places a significant energy burden on processing hardware, thereby increasing costs and environmental impact. In this proposal, EcoScan is presented as a novel, sustainable photogrammetry workflow that minimizes computational resource consumption. EcoScan uses an on-device Reinforcement Learning (RL) agent that functions as an intelligent photographer: it decides in real time which frames to capture and suggests camera movements that maximize information gain per pixel. This yields a minimal yet sufficient image dataset that should be efficient for downstream processing. The proposed approach reformulates the capture process as a Markov Decision Process (MDP) with a reward function that balances reconstruction quality against computational energy cost. Results show that EcoScan reduces the number of required input images by a factor of 3 to 5 compared to conventional methods while achieving equivalent reconstruction accuracy. This translates to a 60-70% reduction in total energy consumption during the structure-from-motion (SfM) and multi-view stereo (MVS) processing phases. The EcoScan framework provides a pathway towards sustainable 3D digitization without compromising quality.
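The trade-off in the reward function can be sketched as follows. This is not EcoScan's actual reward, which the abstract does not specify; it assumes a simple linear form in which each candidate frame has an estimated quality gain and an estimated downstream energy cost. The names `FrameOutcome`, `reward`, `frames_worth_capturing`, and the weight `energy_weight` are all hypothetical, and the selection shown is a plain threshold rule rather than a trained RL policy.

```python
from dataclasses import dataclass


@dataclass
class FrameOutcome:
    quality_gain: float   # estimated reconstruction-quality improvement (hypothetical scale)
    energy_cost_j: float  # estimated downstream processing energy in joules (hypothetical)


def reward(outcome: FrameOutcome, energy_weight: float = 0.01) -> float:
    """Assumed linear reward: quality gained minus weighted energy spent."""
    return outcome.quality_gain - energy_weight * outcome.energy_cost_j


def frames_worth_capturing(candidates, energy_weight: float = 0.01):
    """Indices of candidate frames whose assumed reward is positive."""
    return [i for i, c in enumerate(candidates) if reward(c, energy_weight) > 0]
```

With candidates `[FrameOutcome(0.30, 10.0), FrameOutcome(0.02, 10.0), FrameOutcome(0.15, 5.0)]`, the second frame adds too little quality for its energy cost and is skipped. Raising `energy_weight` makes the rule more frugal, which is the kind of balance an MDP reward of this type would encode.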


Scientific figures (charts, composite panels, and data visualizations) are routinely inaccessible to visually impaired readers because screen readers cannot interpret visual content and published captions are often too brief or domain-specific to convey what the figure shows. Vision-language models (VLMs) offer a potential route to automated, accessible image description at scale. In this study, we evaluate five open-source, instruction-tuned VLMs (BLIP-2, LLaVA-1.5-7B, Moondream2, Qwen2-VL-2B, and Idefics3-8B) on a dataset of 245 scientific figures drawn from 32 papers presented at Electronic Imaging 2025. Generated captions are scored against author-provided ground-truth captions using four complementary metrics: BLEU, ROUGE-L, Sentence-BERT cosine similarity (SBERT), and RefCLIPScore. Moondream2 achieves the highest performance across all semantic metrics (RefCLIPScore = 1.025, SBERT = 0.392) despite being one of the smallest models evaluated (~1.86B parameters), offering the best balance of quality and speed (8.7 s per image). The four metrics tell a consistent story: Moondream2 scores low on lexical match but high on semantic similarity and image alignment, which is the expected pattern when detailed visual descriptions are compared against brief author captions. These findings are broadly paralleled in an evaluation of VLM-generated captions performed by a small sample of actual publication authors. Besides highlighting the suitability of these VLMs for aiding visually impaired individuals, the explored approaches may also help familiarize authors and publishers of scientific articles with the needs of assistive technology and the increasing expectations set by accessibility regulations.
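The gap between lexical and semantic metrics can be made concrete with a small sketch. It implements a ROUGE-L F1 score based on the longest common subsequence of whitespace tokens, plus a plain cosine similarity of the kind applied to sentence embeddings. It assumes lowercased whitespace tokenization and the F1 variant of ROUGE-L; the study's actual tokenization, ROUGE-L variant, and embedding model are not given in the abstract, and the function names are illustrative.

```python
import math


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 over lowercased whitespace tokens (assumed variant)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(u, v) -> float:
    """Cosine similarity of two equal-length vectors, e.g. sentence embeddings."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0
```

For instance, scoring the generated caption "a bar chart comparing the accuracy of five models" against the author caption "bar chart of accuracy by model" gives a ROUGE-L F1 of about 0.4: only three tokens align in order, and the longer candidate lowers precision. An embedding-based score for the same pair could still be high, because it does not depend on exact token overlap.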

Printer forensics is a specialized field within digital and document forensics that focuses on identifying the source printer of a printed document through intrinsic and extrinsic characteristics. As printers play a crucial role in both legitimate and malicious activities, ranging from document authentication to the dissemination of forged or anonymous materials, the need for robust forensic techniques has become increasingly important. This paper provides a comprehensive overview of the current landscape in printer forensics, including a classification of the methods used for source identification, such as mechanical defect analysis, texture pattern recognition, and embedded code detection. Both traditional image processing techniques and recent advances leveraging machine learning and deep neural networks are examined. Additionally, we explore the challenges associated with dataset availability, print-scan noise, and cross-model generalization. By surveying existing methodologies and the limitations of current approaches, we identify emerging trends and propose potential directions for future research in the field.

The rapid growth of electric vehicles (EVs) has introduced new challenges for urban parking management, particularly in enforcing EV-designated parking spaces without intrusive infrastructure. This paper presents a deep-learning-based vision system for the automated classification of electric and gasoline vehicles in urban parking environments, using convolutional neural networks trained on real-world data from Berlin, Germany. A YOLO-based object detection model is employed to identify visually distinctive EV-specific features in rear-view vehicle images while preserving privacy by anonymizing license plates. The proposed approach relies solely on visual cues, eliminating the need for vehicle metadata, sensors, or network connectivity. Experimental results demonstrate robust classification performance, with high detection accuracy and consistent results across desktop and edge computing platforms. To validate real-world applicability, the trained model is deployed on both a mobile device and a low-cost Raspberry Pi-based edge system, enabling fully offline operation. These results indicate that deep-learning-based visual classification can provide a scalable, privacy-aware solution for smart parking systems and urban mobility applications, supporting the effective management of EV infrastructure in modern cities.

Email phishing remains a prevalent cyber threat, targeting victims to extract sensitive information or deploy malicious software. This paper explores the integration of open-source intelligence (OSINT) tools and machine learning (ML) models to enhance phishing detection across multilingual datasets. Using Nmap and theHarvester, this study extracted 17 features, including domain names, IP addresses, and open ports, to improve detection accuracy. Multilingual email datasets, including English and Arabic, were analyzed to address limitations of ML models trained predominantly on English-language data. Experiments with five classification algorithms (Decision Tree, Random Forest, Support Vector Machine, XGBoost, and Multinomial Naïve Bayes) revealed that Random Forest achieved the highest performance, with an accuracy of 97.37% on both the English and Arabic datasets. On the OSINT-enhanced datasets, the model achieved higher accuracy than baseline models without OSINT features. These findings highlight the potential of combining OSINT tools with advanced ML models to detect phishing emails more effectively across diverse languages and contexts. This study contributes an approach to phishing detection that incorporates OSINT features and evaluates their impact on multilingual datasets, addressing a critical gap in cybersecurity research.

Stereotaxic neurosurgery in small-animal neuroscience remains largely manual and operator-dependent, introducing variability that compromises experimental reproducibility. This paper presents a modular retrofit system that motorises a conventional rodent stereotaxic frame and integrates it with Pinpoint for neuroanatomic positioning via a custom Ephys Link binding, enabling direct software-to-hardware coordinate translation from neuroanatomical atlas-based planning to physical needle insertion. The system uses NEMA 17 stepper motors with a 14:1 planetary gearbox, a Duet 3 Mini 5+ motion controller originally developed for open-source 3D printing and here adapted for neuroscience through developer collaboration, and RepRapFirmware configured in CNC mode. A custom firmware parameter (M203 I0.1) unlocks a minimum feed rate of 0.1 mm/min for tissue-safe brain insertion. The microinjection axis is fully automated for suction and retraction. A custom housing provides structural rigidity, vibration damping, and cable management without modifying the original frame. The system achieves positional accuracy better than 0.1 mm, with a nominal microstep increment of 0.09 μm, validated using phantoms and ex vivo specimens.
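The relationship between the drivetrain and the nominal microstep increment can be sketched with simple arithmetic. Only the 14:1 gear ratio and the 0.1 mm/min feed rate come from the abstract. The motor step count (200 full steps per revolution), the microstepping factor (16), and the 4 mm of linear travel per output revolution are assumptions chosen for illustration; they are one combination that lands near the quoted 0.09 μm, not a statement of the actual hardware configuration.

```python
def microstep_increment_um(
    lead_mm_per_rev: float,
    full_steps_per_rev: int = 200,  # assumption: typical 1.8 degree stepper
    microsteps: int = 16,           # assumption: driver microstepping factor
    gear_ratio: float = 14.0,       # from the abstract: 14:1 planetary gearbox
) -> float:
    """Nominal linear travel per microstep, in micrometres."""
    microsteps_per_output_rev = full_steps_per_rev * microsteps * gear_ratio
    return lead_mm_per_rev * 1000.0 / microsteps_per_output_rev


def insertion_time_min(depth_mm: float, feed_mm_per_min: float = 0.1) -> float:
    """Time to travel `depth_mm` at a constant feed rate, in minutes."""
    return depth_mm / feed_mm_per_min
```

Under these assumptions, `microstep_increment_um(4.0)` is about 0.089 μm, and a hypothetical 3 mm insertion at the minimum feed rate of 0.1 mm/min takes about 30 minutes. The nominal increment is a resolution figure; achieved positional accuracy also depends on backlash, microstep linearity, and frame stiffness, which this calculation does not model.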