Enhancing design space exploration by extending CPU/GPU specifications onto FPGAs
Owaida, M. O.; Falcão, G.; Andrade, J.; Antonopoulos, C. D. A.; Bellas, N. B.; Purnaprajna, M. P.; Novo, D. N.; Karakonstantis, G. K.; Burg, A. B.; Ienne, P. I.
Transactions on Embedded Computing Systems Vol. 14, Nº 2, pp. 1 - 23, March, 2015.
ISSN (print): 1539-9087
ISSN (online):
Scimago Journal Ranking: 0.35 (in 2015)
Digital Object Identifier: 10.1145/2656207
Abstract
The design cycle for complex special-purpose computing systems is extremely costly and time-consuming. It involves a multi-parametric design space exploration for optimization, followed by design verification. Designers of special-purpose VLSI implementations often need to explore parameters, such as optimal bitwidth and data representation, through time-consuming Monte-Carlo simulations. A prominent example of this simulation-based exploration process is the design of decoders for error-correcting systems, such as the Low-Density Parity-Check (LDPC) codes adopted by modern communication standards, which involves thousands of Monte-Carlo runs for each design point. Currently, high-performance computing offers a wide set of acceleration options, ranging from multicore CPUs to graphics processing units (GPUs) and FPGAs. Exploiting diverse target architectures is typically associated with developing multiple code versions, often using distinct programming paradigms. In this context, we evaluate the concept of retargeting a single OpenCL program to multiple platforms, thereby significantly reducing design time. A single OpenCL-based parallel kernel is used without modification or code tuning on multicore CPUs, GPUs, and FPGAs. We use SOpenCL (Silicon to OpenCL), a tool that automatically converts OpenCL kernels to RTL, to introduce FPGAs as a potential platform for efficiently executing simulations coded in OpenCL. We use LDPC decoding simulations as a case study. Experimental results were obtained by testing a variety of regular and irregular LDPC codes, ranging from short/medium-length (e.g., 8000-bit) to large-length (e.g., 64800-bit) DVB-S2 codes. We observe that, depending on the design parameters to be simulated and on the dimension and phase of the design, the GPU or the FPGA may suit different purposes more conveniently, providing different acceleration factors over conventional multicore CPUs.
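The simulation-based exploration described above can be illustrated with a minimal Python/NumPy sketch. It is not the authors' OpenCL kernel. It assumes a toy (7,4) Hamming parity-check matrix in place of a real LDPC code, BPSK over an AWGN channel, all-zero codeword transmission, and a flooding min-sum decoder whose messages are saturated to a signed fixed-point range. The names `min_sum_decode` and `simulate_ber`, the quantization `step`, and the iteration count are hypothetical. The sketch shows how a single design point (an Eb/N0 value and a bitwidth) expands into many independent Monte-Carlo decoding runs.

```python
import numpy as np

# Toy parity-check matrix (the (7,4) Hamming code), for illustration only;
# the DVB-S2 LDPC codes in the study are far larger (up to n = 64800).
H = np.array([
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
], dtype=np.uint8)
ROWS = [np.flatnonzero(row) for row in H]  # variable nodes attached to each check


def min_sum_decode(q_llr, max_iter, limit):
    """Flooding min-sum decoding on saturated integer LLRs (positive => bit 0)."""
    c2v = np.zeros(H.shape, dtype=np.int64)  # check-to-variable messages (dense)
    for _ in range(max_iter):
        total = q_llr + c2v.sum(axis=0)
        hard = (total < 0).astype(np.uint8)
        if not (H @ hard % 2).any():  # all parity checks satisfied: stop early
            return hard
        for i, cols in enumerate(ROWS):
            # variable-to-check messages, saturated to the fixed-point range
            v2c = np.clip(total[cols] - c2v[i, cols], -limit, limit)
            signs = np.where(v2c < 0, -1, 1)
            mags = np.abs(v2c)
            order = np.argsort(mags)
            min1, min2 = mags[order[0]], mags[order[1]]
            # each edge gets the minimum magnitude over the *other* edges
            excl = np.where(np.arange(len(cols)) == order[0], min2, min1)
            # signs are +/-1, so multiplying by an edge's own sign removes it
            c2v[i, cols] = signs.prod() * signs * excl
    total = q_llr + c2v.sum(axis=0)
    return (total < 0).astype(np.uint8)


def simulate_ber(ebn0_db, bits, frames, rng, step=0.5, max_iter=10):
    """Monte-Carlo bit error rate for one design point (Eb/N0, bitwidth)."""
    m, n = H.shape
    rate = (n - m) / n
    sigma = np.sqrt(1.0 / (2.0 * rate * 10.0 ** (ebn0_db / 10.0)))
    limit = 2 ** (bits - 1) - 1  # symmetric signed range, e.g. [-7, 7] for 4 bits
    errors = 0
    for _ in range(frames):
        # all-zero codeword; BPSK maps bit 0 to +1; AWGN noise
        y = 1.0 + sigma * rng.standard_normal(n)
        llr = 2.0 * y / sigma**2
        q_llr = np.clip(np.round(llr / step), -limit, limit).astype(np.int64)
        errors += int(min_sum_decode(q_llr, max_iter, limit).sum())
    return errors / (frames * n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for bits in (3, 4, 6):  # sweep the bitwidth design parameter
        print(bits, simulate_ber(2.0, bits, 2000, rng))
```

Every frame is decoded independently of the others, which is why such simulations map naturally onto data-parallel targets such as multicore CPUs, GPUs, and FPGAs.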