Guiding Device Specific Optimization using Architecture-Independent Metrics
People
Supervisor
Description
Measuring the performance-critical characteristics of application workloads is important both for developers, who must understand and optimize the performance of their codes, and for designers and integrators of HPC systems, who must ensure that accelerator architectures are suitable for the intended workloads. However, if these workload characteristics are tied to architectural features that are specific to a particular system, they may not generalize well to alternative or future systems.
An architecture-independent method ensures an accurate characterization of inherent program behaviour, without bias due to architecture-dependent features that vary widely between different types of accelerators. The Architecture-Independent Workload Characterization (AIWC) tool [1] collects a set of metrics which determine the suitability and performance of an application on any parallel HPC architecture. These metrics were initially collected primarily to predict execution times, but because they represent structural characteristics of the underlying program and are free from architectural traits, they can also be used to identify the performance limitations of a given implementation and the associated penalties on a target device.
Goals
This project will develop a methodology to examine a kernel's suitability for a given accelerator, and its potential for optimization, by measuring its AIWC features. The ultimate aim of this work is to identify common trends which could be used to inform or guide the optimization efforts of the developer. Best-practice optimizations for CPU and GPU (e.g. [2,3]) will be applied to selected portable OpenCL codes, and the changes in AIWC features due to each optimization will be measured.
In addition to identifying which AIWC features map most closely to architecture-specific optimizations, the selected codes may also be used to augment the Extended OpenDwarfs benchmark suite of portable OpenCL scientific application kernels [4].
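As a rough illustration of the before-and-after comparison described in these goals, the following minimal Python sketch computes the change in each feature between a baseline kernel and an optimized variant, then ranks features by how strongly they moved. It assumes the AIWC features of each run have already been loaded into a plain dictionary mapping a feature name to a number. The function names, the feature names, and all values below are hypothetical placeholders chosen for illustration; they are not output from AIWC or part of its interface.

```python
from typing import Dict, List, Optional, Tuple

# (absolute change, relative change or None when the baseline value is zero)
Delta = Tuple[float, Optional[float]]


def feature_deltas(baseline: Dict[str, float],
                   optimized: Dict[str, float]) -> Dict[str, Delta]:
    """Compare two feature sets, using only features present in both."""
    deltas: Dict[str, Delta] = {}
    for name in sorted(baseline.keys() & optimized.keys()):
        before, after = baseline[name], optimized[name]
        absolute = after - before
        # Relative change is undefined for a zero baseline, so report None.
        relative = absolute / before if before != 0 else None
        deltas[name] = (absolute, relative)
    return deltas


def rank_by_relative_change(deltas: Dict[str, Delta]) -> List[Tuple[str, float]]:
    """Order features by the magnitude of their relative change, largest first."""
    ranked = [(name, rel) for name, (_, rel) in deltas.items() if rel is not None]
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked


# Hypothetical feature values for one kernel before and after an optimization.
baseline = {
    "total_instruction_count": 1000.0,
    "branch_entropy": 0.5,
    "barriers_per_instruction": 0.0,
}
optimized = {
    "total_instruction_count": 800.0,
    "branch_entropy": 0.25,
    "barriers_per_instruction": 0.01,
}

deltas = feature_deltas(baseline, optimized)
ranking = rank_by_relative_change(deltas)

for name, rel in ranking:
    print(f"{name}: {rel:+.1%}")
```

Repeating such a comparison across several kernels and optimizations is one possible way to look for the common trends mentioned above. Features with a zero baseline are left out of the ranking but remain available in `deltas` with their absolute change.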
Requirements
Background Literature
[1] Johnston, B. and Milthorpe, J. (2018) AIWC: OpenCL-Based Architecture-Independent Workload Characterization
[2] Intel (2016) Intel 64 and IA-32 Architectures Optimization Reference Manual
[3] NVIDIA (2019) CUDA C Best Practices Guide
[4] Johnston, B. and Milthorpe, J. (2018) Dwarfs on Accelerators: Enhancing OpenCL Benchmarking for Heterogeneous Computing Architectures