IS&T | Library

Keywords

ADMM

Conjugate Gradient

Deep Learning
Distributed Training

Fast data retrieval

High-Performance
High Performance Computing
Heterogeneous Computing

Laminography
Large Datasets

Segmentation

Unequally-Spaced Fast Fourier Transform

Proceedings

Accelerated Laminographic Image Reconstruction Using GPUs

ADMM
Conjugate Gradient
Heterogeneous Computing
Laminography
Unequally-Spaced Fast Fourier Transform

Bin Ma, Viktor Nikitin, Dong Li, Tekin Bicer

Pages 188-1 to 188-6, January 2024. This work is a U.S. Government work not subject to copyright in the United States (17 U.S.C. §105). The work is also available for worldwide use and reuse under CC0 1.0 Universal.

DOI

10.2352/EI.2024.36.12.HPCI-188

Volume 36

Issue 12

Abstract

Laminography is a specialized 3D imaging technique optimized for examining flat, elongated structures. Laminographic reconstruction is the process of generating a 3D volume from a set of 2D projections collected during a laminography experiment. Iterative reconstruction techniques are typically the preferred computational method for generating high-quality 3D volumes; however, these methods are computationally demanding and can therefore be infeasible to apply to large datasets. Addressing these challenges requires state-of-the-art computational methods that can efficiently utilize high-performance computing resources such as GPUs. In this work, we investigate the integration of the Unequally Spaced Fast Fourier Transform (USFFT) with two optimization methods: the Alternating Direction Method of Multipliers (ADMM) and the Conjugate Gradient (CG) method. USFFT addresses the non-uniform sampling typical in laminography, while the combination of ADMM and CG introduces robust regularization techniques that enhance image quality by preserving edges and reducing noise. We further accelerated the iterative USFFT algorithm by preprocessing the image into the frequency domain; compared to the original algorithm, the optimized USFFT method achieved a 1.82x speedup. By harnessing heterogeneous and parallel computing on both CPUs and GPUs, our approach significantly accelerates the reconstruction process while preserving the quality of the generated images. We evaluate the performance of our methods using real-world datasets collected at the 32-ID beamline of the Advanced Photon Source, using Argonne Leadership Computing resources.
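The abstract names Conjugate Gradient as one of the optimization methods. As a generic illustration only, and not the authors' GPU/USFFT implementation, the sketch below applies textbook CG to the normal equations L^T L u = L^T d of a least-squares reconstruction problem min ||L u - d||^2. The dense random matrix `L` is a hypothetical stand-in for the laminographic forward operator (which in the paper is built on USFFT), and the ADMM regularization step is not shown.

```python
import numpy as np


def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite matrix A (textbook CG)."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs_old = r @ r
    if max_iter is None:
        max_iter = n       # exact arithmetic converges in at most n steps
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)         # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p     # new A-conjugate direction
        rs_old = rs_new
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L = rng.standard_normal((40, 20))   # hypothetical forward operator
    u_true = rng.standard_normal(20)    # hypothetical "volume"
    d = L @ u_true                      # noise-free synthetic measurements
    u = conjugate_gradient(L.T @ L, L.T @ d, max_iter=100)
    print("reconstruction error:", np.linalg.norm(u - u_true))
```

In a real reconstruction the matrix products would be replaced by operator applications (forward projection and its adjoint), which is what makes fast transforms such as USFFT and GPU execution matter.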

Digital Library: EI

Published Online: January 2024

Proceedings

High-Performance Data Loader for Large-Scale Data Processing

Deep Learning
Fast data retrieval
High-Performance
Large Datasets

Edgar Josafat Martinez-Noriega, Chen Peng, Rio Yokota

DOI

10.2352/EI.2024.36.12.HPCI-196

Volume 36

Issue 12

Abstract

The use of supercomputers and large clusters for big-data processing has recently gained immense popularity, primarily due to the widespread adoption of Graphics Processing Units (GPUs) to execute iterative algorithms such as Deep Learning and 2D/3D imaging applications. This trend is especially prominent for large-scale datasets, which can range from hundreds of gigabytes to several terabytes in size. Like Deep Learning, which deals with datasets of comparable or even greater size (e.g., LION-3B), these efforts encounter complex challenges related to data storage, retrieval, and efficient GPU utilization. In this work, we benchmarked a collection of high-performance general dataloaders used in Deep Learning, with a dual focus on user-friendliness (Pythonic interfaces) and high-performance execution. These dataloaders have become crucial tools: advanced dataloading solutions such as WebDataset, FFCV, and DALI have demonstrated significantly better performance than traditional general-purpose PyTorch data loaders. This work provides a comprehensive benchmarking analysis of high-performance general dataloaders for handling extensive datasets within supercomputer environments. Our findings indicate that DALI outperforms our baseline PyTorch dataloader by up to 3.4x in loading time for datasets comprising one million images.
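One idea that high-performance dataloaders rely on is overlapping data loading with consumption by prefetching samples in background workers. The sketch below illustrates only that idea with the Python standard library; it is not the API of WebDataset, FFCV, DALI, or PyTorch, and `slow_load` is a hypothetical loader that simulates I/O latency with `time.sleep`. A small timing harness compares a serial loader with the prefetching one, in the spirit of a loading-time benchmark.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor

_SENTINEL = object()


def slow_load(i, delay=0.01):
    """Hypothetical sample loader: sleeps to simulate I/O, then returns a value."""
    time.sleep(delay)
    return i * i


def serial_loader(indices, load):
    """Baseline: load each sample only when the consumer asks for it."""
    for i in indices:
        yield load(i)


def prefetch_loader(indices, load, num_workers=4, prefetch=8):
    """Keep roughly `prefetch` loads in flight on a thread pool; yield in order."""
    it = iter(indices)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        pending = deque()
        # Fill the prefetch window with the first requests.
        for _ in range(prefetch):
            i = next(it, _SENTINEL)
            if i is _SENTINEL:
                break
            pending.append(pool.submit(load, i))
        while pending:
            # Top up the window, then wait for the oldest request (preserves order).
            i = next(it, _SENTINEL)
            if i is not _SENTINEL:
                pending.append(pool.submit(load, i))
            yield pending.popleft().result()


def time_loader(loader):
    """Drain a loader and return (samples, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    samples = list(loader)
    return samples, time.perf_counter() - start


if __name__ == "__main__":
    idx = range(64)
    serial_out, t_serial = time_loader(serial_loader(idx, slow_load))
    fast_out, t_fast = time_loader(prefetch_loader(idx, slow_load))
    assert serial_out == fast_out
    print(f"serial {t_serial:.2f}s, prefetch {t_fast:.2f}s")
```

Threads suit the I/O-bound case simulated here; for CPU-heavy decoding and augmentation, production loaders typically use worker processes or offload work to the GPU.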

Digital Library: EI

Published Online: January 2024

Proceedings

Efficient Distributed Sequence Parallelism for Transformer-Based Image Segmentation

Deep Learning
Distributed Training
High Performance Computing
Segmentation

Isaac Lyngaas, Murali Gopalakrishnan Meena, Evan Calabrese, Mohamed Wahib, Peng Chen, Jun Igarashi, Yuankai Huo, Xiao Wang

DOI

10.2352/EI.2024.36.12.HPCI-199

Volume 36

Issue 12

Abstract

We introduce an efficient distributed sequence parallel approach for training transformer-based deep learning image segmentation models. The neural network models combine a Vision Transformer encoder with a convolutional decoder to produce image segmentation maps. The distributed sequence parallel approach is especially useful when the tokenized embedding representation of the image data is too large to fit into the memory of standard computing hardware. To compare the performance and characteristics of models trained in sequence parallel fashion with those of standard models, we evaluate our approach on a 3D MRI brain tumor segmentation dataset. We show that training with a sequence parallel approach can match standard sequential model training in terms of convergence. Furthermore, we show that our sequence parallel approach can support training of models that would not be possible on standard computing resources.
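The abstract does not spell out the parallelization scheme, so the following is only a generic sketch of one common form of sequence parallelism for self-attention, simulated on a single process with NumPy. The token sequence is split into contiguous shards (one per hypothetical worker); each worker projects its own tokens, the key and value shards are concatenated to stand in for an all-gather, and each worker computes attention for its own queries. The function names and the single-head, unmasked attention are assumptions for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Single-head scaled dot-product attention (no masking)."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v


def sequence_parallel_attention(x, wq, wk, wv, num_workers):
    """Simulate sequence-parallel self-attention over `num_workers` token shards."""
    # Each simulated worker holds a contiguous slice of the token sequence.
    shards = np.array_split(x, num_workers, axis=0)
    q_shards = [s @ wq for s in shards]
    # Stand-in for an all-gather: every worker sees all keys and values.
    k_full = np.concatenate([s @ wk for s in shards], axis=0)
    v_full = np.concatenate([s @ wv for s in shards], axis=0)
    # Each worker attends with its local queries against the full key/value set.
    outputs = [attention(q, k_full, v_full) for q in q_shards]
    return np.concatenate(outputs, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal((10, 8))    # 10 tokens, embedding dim 8
    wq, wk, wv = (rng.standard_normal((8, 4)) for _ in range(3))
    ref = attention(x @ wq, x @ wk, x @ wv)
    out = sequence_parallel_attention(x, wq, wk, wv, num_workers=3)
    print("max abs difference:", np.abs(out - ref).max())
```

Because each query row's attention depends only on that query and the full key/value set, the sharded result matches unsharded attention, which is consistent with the claim that sequence-parallel training can match standard training. Other schemes, such as ring-style exchange of key/value blocks, avoid materializing the full key/value set on every worker.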

Digital Library: EI

Published Online: January 2024
