In recent years, the rapid development of imaging systems and the growth of compute-intensive imaging algorithms have created strong demand for High Performance Computing (HPC) in efficient image processing. However, the two communities, imaging and HPC, have largely remained separate, with little synergy. This conference focuses on research that brings HPC and imaging together, with an emphasis on advanced HPC facilities and techniques for imaging systems, algorithms, and applications. In addition, the conference provides a unique platform that brings imaging and HPC researchers together to discuss emerging research topics and techniques that benefit both communities. Papers are solicited on all aspects of research, development, and application of high-performance computing or efficient computing algorithms and systems for imaging applications.
We introduce a physics-guided, data-driven method for image-based multi-material decomposition of dual-energy computed tomography (CT) scans, demonstrated on CT scans of virtual human phantoms containing more than two tissue types. The method is a physics-driven supervised learning technique. Exploiting the much higher mass attenuation coefficient of dense materials relative to muscle tissue, we first perform a preliminary, unsupervised extraction of the dense material from the images. We then apply supervised deep learning to the images processed with the extracted dense material to obtain the final multi-material tissue map. The method is demonstrated on simulated breast models with calcifications as the dense material placed amongst the muscle tissues. The physics-guided machine learning method accurately decomposes the various tissues from the input images, achieving a normalized root-mean-squared error of 2.75%.
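A minimal sketch of the two-stage idea described above, assuming the dual-energy slices are available as NumPy arrays; the attenuation threshold, network architecture, and training procedure are illustrative stand-ins, not the authors' implementation.

```python
# Illustrative two-stage decomposition sketch (not the paper's exact code).
# Stage 1: unsupervised extraction of dense material (e.g., calcifications),
# exploiting its much higher mass attenuation than soft tissue.
# Stage 2: a supervised network maps the dual-energy images plus the
# dense-material mask to a per-voxel tissue label map.
import numpy as np
import torch
import torch.nn as nn

def extract_dense(img_low, img_high, thresh=2.0):
    """Joint threshold on both energy channels; `thresh` is illustrative."""
    return ((img_low > thresh) & (img_high > thresh)).astype(np.float32)

class TissueNet(nn.Module):
    """Toy stand-in for the supervised model: 3 input channels ->
    n_tissues per-pixel class scores."""
    def __init__(self, n_tissues=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_tissues, 1),
        )
    def forward(self, x):
        return self.net(x)

# Placeholder data; real inputs would be co-registered dual-energy CT slices.
img_low = np.random.rand(128, 128).astype(np.float32)
img_high = np.random.rand(128, 128).astype(np.float32)
dense = extract_dense(img_low, img_high)
x = torch.from_numpy(np.stack([img_low, img_high, dense]))[None]  # (1,3,H,W)
tissue_map = TissueNet()(x).argmax(dim=1)  # per-pixel tissue labels
```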
The COVID-19 pandemic has been a significant healthcare challenge in the United States. COVID-19 is transmitted predominantly by respiratory droplets generated when people breathe, talk, cough, or sneeze. Wearing a mask is a primary, effective, and convenient method of blocking respiratory droplets, reported to block 80% of respiratory infections. Therefore, many face mask detection systems have been developed to monitor hospitals, airports, public transportation, sports venues, and retail locations. However, current commercial solutions are typically bundled with specific software or hardware, impeding public accessibility. In this paper, we propose an in-browser, serverless, edge-computing-based face mask detection solution, called Web-based efficient AI recognition of masks (WearMask), which can be deployed on common devices (e.g., cell phones, tablets, computers) with an internet connection using a web browser. The serverless edge-computing design minimizes hardware costs (e.g., dedicated devices or cloud computing servers). It provides a holistic edge-computing framework integrating (1) a deep learning model (YOLO), (2) a high-performance neural network inference computing framework (NCNN), and (3) a stack-based virtual machine (WebAssembly). For end-users, our solution offers (1) a serverless edge-computing design with minimal device limitations and privacy risk, (2) installation-free deployment, (3) low computing requirements, and (4) high detection speed. Our application has been launched with public access at facemask-detection.com.
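The in-browser inference itself runs in JavaScript/WebAssembly, but a common path for getting a PyTorch YOLO-style detector into an NCNN/WebAssembly deployment starts with an ONNX export. The sketch below shows that export step with a hypothetical placeholder model (`TinyDetector` is not the WearMask network) and summarizes the remaining, non-Python steps as comments.

```python
# Illustrative model-export step for an NCNN/WebAssembly deployment.
import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    """Hypothetical stand-in for a trained YOLO-style mask detector."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        # e.g., 3 anchors x (4 box coords + 1 objectness + 1 class) = 18
        self.head = nn.Conv2d(32, 18, 1)
    def forward(self, x):
        return self.head(self.backbone(x))

model = TinyDetector().eval()
dummy = torch.zeros(1, 3, 320, 320)  # fixed input size for the web model
torch.onnx.export(model, dummy, "wearmask.onnx", opset_version=11)

# Remaining (non-Python) steps, typically:
#   onnx2ncnn wearmask.onnx wearmask.param wearmask.bin   # ncnn's converter
#   compile the ncnn inference code to WebAssembly with Emscripten and load
#   it from the browser, so detection runs entirely on the user's device.
```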
We present an end-to-end automated workflow that uses large-scale remote compute resources and an embedded GPU platform at the edge to enable AI/ML-accelerated real-time analysis of x-ray ptychography data. Ptychography is a lensless imaging method that reconstructs a sample through simultaneous numerical inversion of a large number of diffraction patterns from adjacent, overlapping scan positions. It can enable nanoscale imaging with x-rays and electrons, but it often requires very large experimental datasets and commensurately long turnaround times, which can limit experimental capabilities such as real-time steering and low-latency monitoring. In this work, we introduce a software system that automates ptychography data analysis tasks. We accelerate the analysis pipeline with a modified version of PtychoNN, an ML-based approach to the phase retrieval problem that is two orders of magnitude faster than traditional iterative methods. Further, our system coordinates and overlaps different data analysis tasks to minimize synchronization overhead between stages of the workflow. We evaluate the workflow system with real-world experimental workloads from the 26-ID beamline at the Advanced Photon Source and the ThetaGPU cluster at the Argonne Leadership Computing Facility.
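A minimal sketch of the stage-overlap idea, assuming a queue-based pipeline in which acquisition and ML inference run concurrently so no stage waits for the full scan; the model below is a dummy stand-in for PtychoNN (which maps diffraction patterns to real-space images), and all sizes are illustrative.

```python
# Overlapping pipeline stages with queues (illustrative, not the paper's code).
import queue
import threading
import numpy as np
import torch
import torch.nn as nn

model = nn.Sequential(  # dummy "PtychoNN": 128x128 pattern -> 128x128 phase
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 1, 3, padding=1)
).eval()

q_raw, q_phase = queue.Queue(maxsize=8), queue.Queue(maxsize=8)

def acquire(n_positions=32):
    """Stands in for the detector streaming diffraction patterns."""
    for _ in range(n_positions):
        q_raw.put(np.random.rand(1, 1, 128, 128).astype(np.float32))
    q_raw.put(None)  # end-of-scan sentinel

def infer():
    """Consumes patterns as they arrive; overlaps with acquisition."""
    with torch.no_grad():
        while (pattern := q_raw.get()) is not None:
            q_phase.put(model(torch.from_numpy(pattern)).numpy())
    q_phase.put(None)

threads = [threading.Thread(target=acquire), threading.Thread(target=infer)]
for t in threads:
    t.start()
results = []
while (phase := q_phase.get()) is not None:
    results.append(phase)  # live stitching/monitoring would consume these
for t in threads:
    t.join()
```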
Diffusion tensor imaging (DTI) is a non-invasive magnetic resonance imaging (MRI) modality used to map white matter fiber tracts for a variety of clinical applications, one of which is aiding preoperative assessment for tumor patients. DTI requires numerical computation on multiple diffusion-weighted images to calculate a diffusion tensor at each voxel, followed by probabilistic tracking [1] to construct fiber tracts, or tractography. Greater accuracy in tractography is possible with larger scans and more advanced imaging and reconstruction algorithms, but these are often computationally intensive: the post-processing pipeline requires significant computational resources and up to 40 minutes of computation time on state-of-the-art hardware. Parallel GPU computation can substantially reduce the time required for this resource-intensive tractography. A collaborative team from DIPY, NVIDIA, and UCSF recently developed GPUStreamlines, a tool for GPU-enabled tractography [2], which has been expanded to support the constant solid angle (CSA) reconstruction algorithm [3]. This GPU-enabled tractography was applied to MRIs of brains with and without lesions, with substantial increases in processing speed. We demonstrate that CSA GPU-enabled tractography in normal controls and patients is comparable to the existing gold-standard tractography currently in place at UCSF.
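For reference, a CPU-side sketch of CSA reconstruction plus probabilistic tracking using DIPY's public API (closely following DIPY's tracking tutorial); GPUStreamlines replaces the tracking step with a CUDA implementation, whose interface is not shown here. The public sample dataset and parameter values are illustrative, not the UCSF clinical protocol.

```python
# CSA ODF reconstruction + probabilistic tractography with DIPY (CPU sketch).
from dipy.core.gradients import gradient_table
from dipy.data import default_sphere, get_fnames
from dipy.direction import ProbabilisticDirectionGetter
from dipy.io.gradients import read_bvals_bvecs
from dipy.io.image import load_nifti
from dipy.reconst.shm import CsaOdfModel
from dipy.segment.mask import median_otsu
from dipy.tracking import utils
from dipy.tracking.local_tracking import LocalTracking
from dipy.tracking.stopping_criterion import ThresholdStoppingCriterion

hardi_f, bval_f, bvec_f = get_fnames('stanford_hardi')  # public sample data
data, affine = load_nifti(hardi_f)
bvals, bvecs = read_bvals_bvecs(bval_f, bvec_f)
gtab = gradient_table(bvals, bvecs)

_, mask = median_otsu(data, vol_idx=[0])              # crude brain mask
csa = CsaOdfModel(gtab, sh_order=6).fit(data, mask=mask)  # CSA ODFs
stopping = ThresholdStoppingCriterion(csa.gfa, 0.25)  # stop in low-GFA voxels
seeds = utils.seeds_from_mask(mask, affine, density=1)
dg = ProbabilisticDirectionGetter.from_shcoeff(
    csa.shm_coeff, max_angle=30.0, sphere=default_sphere)
streamlines = list(LocalTracking(dg, stopping, seeds, affine, step_size=0.5))
```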
Recently, a new deep learning architecture, the Vision Transformer, has emerged as the new standard for image classification tasks, overtaking conventional Convolutional Neural Network (CNN) models. However, these state-of-the-art models require large amounts of data, typically over 100 million images, to achieve optimal performance through transfer learning. This requirement is usually met with proprietary datasets such as JFT-300M or JFT-3B, which are not publicly available. To overcome these challenges and address privacy concerns, Formula-Driven Supervised Learning (FDSL) has been introduced. FDSL trains deep learning models on synthetic images generated from mathematical formulas, such as fractal and radial contour images. The main objective of this approach is to reduce the I/O bottleneck that occurs when training on large datasets. Our implementation of FDSL generates instances in real time during training, using a custom data loader based on EGL (the Native Platform Graphics Interface) for fast rendering via shaders. Evaluating our custom data loader on the FractalDB-100k dataset, comprising 100 million images, showed a loading time three times faster than the PyTorch Vision loader.
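A minimal CPU sketch of formula-driven instance generation, assuming each sample is a fractal image rendered on the fly from a random iterated function system (IFS); the loader described above instead renders via EGL/OpenGL shaders for speed, so this NumPy version only illustrates the "no stored dataset" idea.

```python
# On-the-fly fractal image generation inside a PyTorch IterableDataset.
import numpy as np
import torch
from torch.utils.data import DataLoader, IterableDataset

class FractalStream(IterableDataset):
    def __init__(self, size=64, n_points=20000, n_maps=4):
        self.size, self.n_points, self.n_maps = size, n_points, n_maps

    def __iter__(self):
        rng = np.random.default_rng()
        while True:
            # Random affine IFS: x <- A_k x + b_k, with k chosen uniformly.
            # Entries in [-0.5, 0.5] keep the maps contractive, so the
            # iterated point stays bounded.
            A = rng.uniform(-0.5, 0.5, (self.n_maps, 2, 2))
            b = rng.uniform(-1.0, 1.0, (self.n_maps, 2))
            pt = np.zeros(2)
            img = np.zeros((self.size, self.size), np.float32)
            for _ in range(self.n_points):
                k = rng.integers(self.n_maps)
                pt = A[k] @ pt + b[k]
                ij = np.clip(((pt + 2.0) / 4.0 * self.size).astype(int),
                             0, self.size - 1)
                img[ij[0], ij[1]] = 1.0  # rasterize the visited point
            yield torch.from_numpy(img)[None], 0  # (image, dummy class label)

# Instances are generated during training and never touch disk.
loader = DataLoader(FractalStream(), batch_size=8, num_workers=2)
images, labels = next(iter(loader))
```

In the real loader, the per-point rasterization loop is what moves to GPU shaders via EGL; the dataset interface seen by the training loop stays the same.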