A Coarse Grain Reconfigurable Array (CGRA) for Statically Scheduled Data Flow Computing

White Paper Published By: Wave Computing
Published: Jul 06, 2018
Type: White Paper
Length: 9 pages

This paper makes the case for using coarse-grained reconfigurable array (CGRA) architectures to efficiently accelerate the data flow computations used in deep neural network training and inference. It discusses the problems with other parallel acceleration systems, such as massively parallel processor arrays (MPPAs) and heterogeneous systems based on CUDA and OpenCL, and proposes that CGRAs with autonomous computing features deliver improved performance and computational efficiency. The machine learning compute appliance that Wave Computing is developing executes data flow graphs using multiple clockless, CGRA-based systems on chip (SoCs), each containing 16,000 processing elements (PEs). The paper describes the tools needed to compile data flow graphs efficiently to the CGRA architecture, and outlines Wave Computing’s WaveFlow software (SW) framework for the online mapping of models from popular workflows such as TensorFlow, MXNet and Caffe.

