Benchmarking OpenCL for High-Performance Scientific Computing
High-performance computing (HPC) systems are becoming increasingly heterogeneous: current nodes combine CPUs with one or more GPUs, and FPGAs are expected to join the mix soon. The performance of HPC systems is often evaluated using the HPL benchmark, which measures the time to solve a dense system of linear equations. However, this benchmark is limited, since linear algebra represents only one of many critical types of workload for HPC systems.
The Extended OpenDwarfs (EOD) benchmark suite aims to encompass a wider range of the computation and communication patterns found in scientific computing workloads. EOD comprises a set of realistic scientific codes written in OpenCL so as to be portable across modern accelerator architectures, including CPU, GPU and many-integrated-core (MIC) devices. So far, EOD includes 13 benchmarks, which have been evaluated on 15 different accelerator devices. High-resolution, low-overhead timers built into each region of every EOD benchmark measure the performance of individual accelerators. The performance of certain accelerators is highly influenced by problem size, so a representative benchmark suite should be flexible with regard to problem size selection. To this end, a major focus of EOD has been enabling different problem sizes for each benchmark code, where problem size selection is based on the working memory footprint. So far, not all of the benchmark codes have been extended to support multiple problem sizes -- which is where you come in!
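To make the idea of footprint-based problem sizing concrete, here is a minimal C++ sketch. It is not taken from the EOD sources: the helper names (`largest_square_dim`, `footprint_bytes`, `time_region_ns`), the assumption that a kernel works on a fixed number of square arrays, and the use of `std::chrono::steady_clock` as a stand-in for the suite's own timers are all illustrative assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of EOD): the largest n such that `arrays`
// square n x n arrays of `elem_bytes`-sized elements fit in `budget_bytes`.
// The budget would typically be the capacity of one level of the memory
// hierarchy, e.g. an assumed 32 KiB L1 cache.
std::size_t largest_square_dim(std::size_t budget_bytes,
                               std::size_t arrays,
                               std::size_t elem_bytes) {
    assert(arrays > 0 && elem_bytes > 0);
    const std::size_t max_elems = budget_bytes / (arrays * elem_bytes);
    std::size_t n = static_cast<std::size_t>(
        std::sqrt(static_cast<double>(max_elems)));
    // Correct for floating-point rounding in either direction.
    while (n * n > max_elems) --n;
    while ((n + 1) * (n + 1) <= max_elems) ++n;
    return n;
}

// Working memory footprint in bytes for the same assumed layout.
std::size_t footprint_bytes(std::size_t n,
                            std::size_t arrays,
                            std::size_t elem_bytes) {
    return n * n * arrays * elem_bytes;
}

// Times a single region of code with a monotonic clock and returns the
// elapsed time in nanoseconds. This stands in for whatever timer a real
// benchmark would use around its kernel or transfer regions.
template <typename F>
std::int64_t time_region_ns(F&& region) {
    const auto start = std::chrono::steady_clock::now();
    region();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}
```

For example, with an assumed 32 KiB budget and three single-precision arrays, `largest_square_dim(32 * 1024, 3, sizeof(float))` gives the dimension to pass to the benchmark, and wrapping the kernel call in `time_region_ns` yields one timing sample for that size. Real benchmarks rarely have such a simple layout, so each code needs its own footprint formula.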
We're looking for a motivated research student to assist us in:
- applying the same methodology of problem size selection to extend the remaining benchmark codes. One of these benchmarks -- computational fluid dynamics -- will require cutting through a jungle of legacy code to generate suitably sized datasets. In particular, the wing meshes on which it operates were developed by some mysterious means, possibly the dark arts, but probably Fortran codes which no longer exist. They are stored in an undocumented data structure which you will need to decipher. The work for the other two benchmarks should be easier: there are memory leaks and a buffer overflow on the larger problem sizes, and fixing these would be a good place to start.
- improving the documentation of the build/install process and documenting the design decisions made during your research.
- examining how OpenCL-specific optimizations impact the performance of certain EOD kernels on accelerators -- potentially an FPGA could be involved.
Enhance the Extended OpenDwarfs benchmark suite to allow exploration of problem size effects on a wider range of benchmarks
Improve the applicability and usability of the EOD benchmark suite
- Solid programming skills;
- Familiarity, or at least a passing acquaintance, with GDB and Valgrind. Most of the benchmarks are written in C++, and some in C, so knowing these languages couldn't hurt. That said, if you show an abundance of Requirements 1-3, we can help you learn the tools and languages as you progress.
 B. Johnston and J. Milthorpe (2018) Dwarfs on Accelerators: Enhancing OpenCL Benchmarking for Heterogeneous Computing Architectures
 Khronos Group OpenCL
Learning objectives for this project:
- Demonstrate understanding of performance characteristics of scientific computing application codes on heterogeneous computing hardware
- Perform performance evaluations of benchmark codes
- Apply good benchmarking practice to enhance and extend an existing OpenCL benchmark suite
- Communicate performance results to a research audience