Abstract

The demand for high-performance and energy-efficient neural network (NN) applications is growing. Due to their massively parallel architecture, GPUs offer a significant performance improvement over CPUs for these applications. Nevertheless, they are limited by their fixed architecture. Field-programmable gate arrays (FPGAs) offer more flexibility: their architecture can be tailored to the specific neural network application in order to achieve the best performance. Furthermore, they are highly energy-efficient and therefore well suited for running NNs in embedded systems or large-scale server clusters.

However, deploying neural networks on FPGAs with a focus on high performance is, like programming parallel accelerators in general, a complex task. Due to their flexible architecture, FPGAs offer many options for tuning performance, but in return they require substantial hardware-specific expertise and costly development time to be configured properly. Furthermore, developers do not want to manually adapt their implementations to each accelerator (e.g. CPU, GPU, FPGA); instead, performance portability is desirable.

Lift addresses these challenges by offering a high-level, functional, data-parallel language that allows the user to develop an application efficiently and independently of the target hardware platform. Rewrite rules in the Lift compiler then open up a vast design space of possible implementations for this abstract specification. This design space is explored to find a suitable implementation that satisfies the performance and energy requirements. In order to exploit the parallel structure of NNs, the compiler employs pipelining mechanisms and allocates distributed on-chip memory on the FPGA. Timing behaviour and scheduling are introduced until finally hardware description language (HDL) code is emitted, which can be used to generate the bitstream for the FPGA.