Parallel processing using GPUs provides substantial increases in algorithm performance across many disciplines. As a result serial algorithms are commonly translated to parallel algorithms written in CUDA or OpenCL. To perform this translation a user must first overcome various barriers to entry. These obstacles change depending on the user but in general may include learning to program using the chosen API, understanding the intricacies of parallel processing and optimization, and other issues such as the upkeep of two sets of code. Such barriers are experienced by both experts and novices alike. Leveraging the unique source to source transformation tools provided by Clang/LLVM we have created a tool to generate CUDA from C++. Such transformations reduce obstacles experienced in developing GPU software and can increase efficiency and revision speed regardless of experience. This manuscript details a new open source, cross platform tool, togpu, which performs source to source transformations from C++ to CUDA. We present experimentation results using common image processing algorithms. The tool lowers entrance barriers while preserving a singular code base and readability. Enhancing the GPU developer workflow through providing core tooling affords users immediate benefits — and facilitates further developments — to improve high performance, parallel computing.
Matthew Marangoni, Thomas Wischgoll, "Paper: Togpu: Automatic Source Transformation from C++ to CUDA using Clang/LLVM" in Proc. IS&T Int’l. Symp. on Electronic Imaging: Visualization and Data Analysis, 2016, https://doi.org/10.2352/ISSN.2470-1173.2016.1.VDA-487