Proceedings
Volume: 36 | Article ID: HPCI-196
High-Performance Data Loader for Large-Scale Data Processing
DOI: 10.2352/EI.2024.36.12.HPCI-196 | Published Online: January 2024
Abstract

The use of supercomputers and large clusters for big-data processing has recently grown rapidly, driven largely by the widespread adoption of Graphics Processing Units (GPUs) to run iterative algorithms such as Deep Learning and 2D/3D imaging applications. This trend is especially prominent for large-scale datasets, which range from hundreds of gigabytes to several terabytes. As in Deep Learning, which works with datasets of comparable or even greater size (e.g., LION-3B), these workloads face complex challenges in data storage, retrieval, and efficient GPU utilization. In this work, we benchmark a collection of high-performance general-purpose dataloaders used in Deep Learning, with a dual focus on user-friendliness (a Pythonic interface) and high-performance execution. Such dataloaders have become crucial tools: advanced solutions such as WebDataset, FFCV, and DALI have shown significantly better performance than the standard PyTorch DataLoader. This work provides a comprehensive benchmarking analysis of these dataloaders for handling large datasets in supercomputer environments. Our results show that DALI loads a dataset of one million images up to 3.4x faster than our baseline PyTorch dataloader.
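The abstract compares dataloaders by loading time and relative speedup. As a minimal sketch of how such a comparison can be measured, the Python below times a sequential baseline loader against one that overlaps reads on a thread pool, then reports throughput (images per second) and the speedup ratio, i.e. baseline time divided by candidate time. The names `fake_read`, `sequential_loader`, `threaded_loader`, and `time_loader` are hypothetical stand-ins, not the API of PyTorch, WebDataset, FFCV, or DALI, and `time.sleep` only simulates per-image I/O latency; the numbers it prints are illustrative and unrelated to the paper's results.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, Tuple


def fake_read(index: int, latency_s: float = 0.001) -> bytes:
    """Hypothetical stand-in for reading one image from storage.

    time.sleep models I/O latency and releases the GIL, so threads can overlap it.
    """
    time.sleep(latency_s)
    return index.to_bytes(4, "little")


def sequential_loader(indices: Iterable[int], batch_size: int) -> Iterator[List[bytes]]:
    """Baseline: read samples one at a time in the calling thread."""
    batch: List[bytes] = []
    for i in indices:
        batch.append(fake_read(i))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final partial batch
        yield batch


def threaded_loader(
    indices: Iterable[int], batch_size: int, workers: int = 8
) -> Iterator[List[bytes]]:
    """Overlap reads on a thread pool; samples still arrive in index order.

    Note: pool.map submits every index up front. A loader for millions of
    images would bound the number of in-flight reads instead.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batch: List[bytes] = []
        for sample in pool.map(fake_read, indices):
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch


def time_loader(make_loader: Callable[[], Iterator[List[bytes]]]) -> Tuple[float, int]:
    """Return (wall-clock seconds, samples seen) for one full pass."""
    start = time.perf_counter()
    n = sum(len(batch) for batch in make_loader())
    return time.perf_counter() - start, n


if __name__ == "__main__":
    indices = range(500)  # hypothetical dataset size, kept small for a quick run
    t_base, n = time_loader(lambda: sequential_loader(indices, 64))
    t_fast, _ = time_loader(lambda: threaded_loader(indices, 64))
    print(f"baseline: {n / t_base:.0f} img/s, threaded: {n / t_fast:.0f} img/s")
    print(f"speedup: {t_base / t_fast:.1f}x")
```

Threads help here only because the simulated latency releases the GIL. With real images, decoding is largely CPU-bound, which is one reason PyTorch's DataLoader offers worker processes (`num_workers`) and why libraries such as DALI can move decoding onto the GPU.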

  Cite this article 

Edgar Josafat Martinez-Noriega, Chen Peng, Rio Yokota, "High-Performance Data Loader for Large-Scale Data Processing," in Electronic Imaging, 2024, pp. 196-1–196-6, https://doi.org/10.2352/EI.2024.36.12.HPCI-196

  Copyright statement 
Copyright © 2024, Society for Imaging Science and Technology
Electronic Imaging (ISSN 2470-1173)
Society for Imaging Science and Technology
IS&T, 7003 Kilworth Lane, Springfield, VA 22151 USA