Reconfigurable computing devices such as field programmable gate arrays (FPGAs) have demonstrated 10x-100x gains in performance and functional density over microprocessors for a variety of applications. Yet their commercial use is limited primarily to serving as single-task ASIC replacements, a role that largely ignores their programmability and severely limits their applicability. SCORE (Stream Computations Organized for Reconfigurable Execution) addresses this underutilization of reconfigurable technology by introducing a compute model rooted in paged virtual hardware, analogous to virtual memory. The paged model provides a framework for device size abstraction, automatic dynamic reconfiguration, binary compatibility among page-compatible devices, and automatic performance scaling on larger devices, all without recompilation.
A key problem in compiling for SCORE is the partitioning of programs into communicating, fixed-size hardware pages. The partitioning must be sensitive to inter-page communication, which in a virtualized model has unknown delay and may require run-time buffering memory. Existing heuristics for circuit partitioning (wire min-cut, FM, spectral, etc.) are not sufficient because they do not fully address the impact of communication on run-time performance and buffering. In this paper, we propose performance-oriented techniques for automatically synthesizing and partitioning SCORE computations into pages. We formulate the problem as a transformation on streaming state machines that uses a variety of high-level, functional information from the unpartitioned program. We propose a methodology for evaluating partitioning techniques in terms of overhead on circuit area and performance, and we show preliminary results for parts of the partitioning methodology. Development and experimentation are done within the existing SCORE software infrastructure, which is under continuing support and development by the Berkeley BRASS group.