A hybrid design-time/run-time scheduling flow to minimise the reconfiguration overhead of FPGAs

Hardware Task Scheduling for Partially Reconfigurable FPGAs

Partial reconfiguration (PR) of FPGAs can be used to dynamically extend and adapt the functionality of computing systems by swapping HW tasks in and out. To coordinate on-demand task execution, we propose and implement a Run-Time System Manager (RTSM) for scheduling software (SW) tasks on the available processor(s) and hardware (HW) tasks on any number of reconfigurable regions (RRs) of a partially reconfigurable FPGA. Fed with the initial partitioning of the application into tasks, the corresponding task graph, and the available task mappings, the RTSM considers the runtime status of each task and region (e.g. busy, idle, scheduled for reconfiguration/execution, etc.) to execute tasks. Our RTSM supports task reuse and configuration prefetching to minimize reconfigurations, task movement among regions to manage the FPGA area efficiently, and RR reservation for future reconfiguration and execution. We validate its correctness by using our RTSM to execute an image processing application on a ZedBoard platform. We also evaluate its features within a simulation framework, and find that, despite the technology limitations, our approach can give promising results in terms of scheduling quality.

1 Introduction

Reconfiguration can dynamically adapt the functionality of hardware systems by swapping HW tasks in and out. To select the proper resource for loading HW tasks and triggering their reconfiguration and execution in partially reconfigurable systems with FPGAs, efficient and flexible runtime system support is needed [6]. In this paper we propose and implement a Run-Time System Manager (RTSM) incorporating efficient scheduling mechanisms that effectively balance the execution of HW and SW tasks and the use of physical resources. We aim to execute a given application as fast as possible without exhausting the physical resources.
Our motivation during the development of the RTSM was to find ways to overcome the strict technology restrictions imposed by the Xilinx PR flow [8]:
- Static partitioning of the reconfigurable surface into reconfigurable regions (RRs).

Dynamic scheduling of tasks on partially reconfigurable FPGAs

IEE Proceedings - Computers and Digital Techniques, 2000

Field-Programmable Gate Arrays (FPGAs) that allow partial reconfiguration at run-time can be shared among multiple independent tasks. When the sequence of tasks to be performed is unpredictable, the FPGA controller needs to make allocation decisions on-line. Since on-line allocation suffers from fragmentation, tasks can end up waiting despite there being sufficient, albeit non-contiguous, resources available to service them. The time to complete tasks is consequently longer and the utilization of the FPGA is lower than it could be.
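The fragmentation effect described in this abstract can be illustrated with a minimal sketch. It assumes a one-dimensional fabric modelled as a list of columns and a simple first-fit placer; the names `first_fit`, `free_count` and `fabric` are hypothetical and are not taken from the paper.

```python
from typing import List, Optional


def first_fit(columns: List[bool], width: int) -> Optional[int]:
    """Return the start index of the first run of `width` free columns.

    `columns[i]` is True when column i is occupied. Returns None when no
    contiguous run of the requested width exists.
    """
    run = 0
    for i, busy in enumerate(columns):
        run = 0 if busy else run + 1
        if run == width:
            return i - width + 1
    return None


def free_count(columns: List[bool]) -> int:
    """Total number of free columns, contiguous or not."""
    return sum(not busy for busy in columns)


# Hypothetical 8-column fabric: two occupied columns split the free area.
fabric = [False, False, True, False, False, True, False, False]

print(free_count(fabric))    # 6 free columns in total
print(first_fit(fabric, 3))  # None: no 3 contiguous columns, the task must wait
print(first_fit(fabric, 2))  # 0: a 2-column task fits at the left edge
```

Six columns are free, yet a three-column task cannot be placed, which is the situation in which a task waits despite sufficient non-contiguous resources.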

Hardware task scheduling optimizations for reconfigurable computing

2008 Second International Workshop on High-Performance Reconfigurable Computing Technology and Applications, 2008

Reconfigurable Computers (RCs) can provide significant performance improvement for domain applications. However, wide acceptance of today's RCs among domain scientists is hindered by the complexity of design tools and the required hardware design experience. Recent developments in hardware/software co-design methodologies for these systems provide ease of use, but they are not comparable in performance to manual co-design. This paper aims at improving the overall performance of hardware tasks assigned to the FPGA. In particular, the analysis of inter-task communication and of data dependencies among tasks is used to reduce the number of configurations and to minimize the communication overhead and task processing time. This work leverages algorithms developed in the RC and Reconfigurable Hardware (RH) domains for the efficient use of hardware resources to propose two algorithms, Weight-Based Scheduling (WBS) and Highest Priority First-Next Fit (HPF-NF). However, traditional resource-based scheduling alone is not sufficient to reduce the performance bottleneck, so a more comprehensive algorithm is necessary. The Reduced Data Movement Scheduling (RDMS) algorithm is proposed to address dependency analysis and inter-task communication optimizations. Simulation shows that, compared to WBS and HPF-NF, RDMS reduces the number of FPGA configurations needed to schedule randomly generated graphs with heavy-weight nodes by 30% and 11% respectively. Additionally, the proof-of-concept implementation of a complex 13-node example task graph on the SGI RC100 reconfigurable computer shows that RDMS not only trims the number of necessary configurations from 6 to 4 but also reduces communication overhead by 48% and hardware processing time by 33%.

* One instantiation of an FPGA configuration denotes the process of loading the corresponding bitstream into the device, configuring it, executing the tasks in the configuration, and then releasing the device.

Methodology and reconfigurable architecture for effective placement of variable-size hardware tasks

2013 NASA/ESA Conference on Adaptive Hardware and Systems (AHS-2013), 2013

Dynamic partial reconfiguration (DPR) of FPGA-based architectures offers a high degree of flexibility and is often an appropriate solution for applications needing dynamically changing contexts. The standard design flow used to design these architectures still suffers from a lack of adaptability when confronted with applications consisting of variable-size hardware tasks or IP (Intellectual Property) cores. The resulting heterogeneity may cause poor placement of hardware tasks (IPs) on a chip, leading to sub-optimal use of the available hardware resources and therefore to a decrease in system performance. This paper addresses the effective design of reconfigurable regions on the FPGA device with regard to the hardware resources needed by a given application. We propose a methodology allowing effective placement of variable-size IPs on reconfigurable regions which are sized to the smallest IP of a given application. Its validation and benefits are shown on the example of video transcoding from an MPEG2 to an H.264 video stream, especially in the case of reconfigurable region partitioning used to implement hardware tasks for entropy encoding (CAVLC / VLC). The obtained results show a gain in hardware resource utilization of up to 40% for the given hardware tasks and a shorter context-switching time (up to 2 times faster), which is driven by the size of the reconfigurable region used for task implementation.

Run-time scheduling for multimedia applications on dynamically reconfigurable systems

2003

Current multimedia applications are characterized by highly dynamic and non-deterministic behavior as well as high-performance requirements. In addition, portable devices also demand low energy consumption. Potentially, Dynamically Reconfigurable Hardware (DRHW) resources present the ideal features to fulfill these requirements, since they can be reconfigured at run-time to match the performance and energy consumption requirements. However, the lack of programming support for dynamic task placement and the large configuration overhead have prevented a broader use of DRHW resources in embedded system design. To cope with these two problems, we have adopted a DRHW model with specific support for task migration and inter-task communication.

Online Task Scheduling for the FPGA-Based Partially Reconfigurable Systems

2009

In FPGA-based partially reconfigurable systems, hardware tasks can be configured into (or removed from) the FPGA fabric without interfering with other tasks running on the same device. In such systems, the efficiency of task scheduling algorithms directly impacts the overall system performance. Using the previously proposed 2D scheduling model, existing algorithms could not provide an efficient way to find all suitable allocations. In addition, most of them ignored the single reconfiguration port constraint and inter-task dependencies. Furthermore, to the best of our knowledge, no previous work investigates the impact on the scheduling result of reusing already placed tasks. In this paper, we focus on online task scheduling and propose a task scheduling solution that takes the ignored constraints into account. In addition, a novel "reuse and partial reuse" approach is proposed. The simulation results show that our proposed solution achieves up to 43.9% shorter application completion time and up to 63.8% faster single-task response time compared to the previously proposed stuffing algorithm.
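The effect of the single reconfiguration port constraint, and of reusing an already placed task, can be sketched as follows. This is an illustration under simplifying assumptions (all tasks ready at time 0, one free region per task, bitstreams loaded in list order, a reused task modelled as having zero reconfiguration time); `completion_times` is a hypothetical helper, not the algorithm proposed in the paper.

```python
from typing import List, Tuple


def completion_times(tasks: List[Tuple[int, int]]) -> List[int]:
    """Finish time of each task under a single reconfiguration port.

    Each task is (reconfiguration_time, execution_time). A reconfiguration
    time of 0 models a task that is already placed and can be reused, so it
    does not need the port at all.
    """
    port_free_at = 0
    finished = []
    for reconfig, execute in tasks:
        if reconfig == 0:
            # Reused task: starts immediately, bypassing the port.
            finished.append(execute)
            continue
        load_end = port_free_at + reconfig  # wait for the port, then load
        port_free_at = load_end
        finished.append(load_end + execute)
    return finished


# Three identical tasks: loads are serialized, so finish times stagger.
print(completion_times([(4, 10), (4, 10), (4, 10)]))  # [14, 18, 22]
# Same workload, but the second task is already on the fabric.
print(completion_times([(4, 10), (0, 10), (4, 10)]))  # [14, 10, 18]
```

A scheduler that ignores the port would predict a finish time of 14 for all three tasks in the first case, which is why the abstract treats the constraint as significant.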

HTR: On-Chip Hardware Task Relocation for Partially Reconfigurable FPGAs

2013

Partial reconfiguration (PR) enables shared FPGA systems to nonintrusively time-multiplex hardware tasks in partially reconfigurable regions (PRRs). To fully exploit PR, higher priority tasks should preempt lower priority tasks, and preempted tasks should resume execution in any PRR. This preemption/resumption requires saving/restoring the preempted task's execution context and relocating the task to another PRR; however, prior works only provide partial solutions and impose limitations and/or overheads. We propose on-chip hardware task relocation (HTR) software, which enables a task's execution state to be saved, relocated to, and restored in any PRR with sufficient resources. The HTR software executes on a soft-core processor in the FPGA's static region, and is thus portable across any system/application. Experimental results evaluate HTR execution times, enabling designers to trade off task/PRR granularity and HTR execution times based on application requirements.

Run-time management of systems with partially reconfigurable FPGAs

Integration, the VLSI Journal, 2017

Partial reconfiguration (PR) of FPGAs can be used to dynamically extend and adapt the functionality of computing systems by swapping HW tasks in and out. To coordinate on-demand task execution, we propose and implement a Run-Time System Manager (RTSM) for scheduling software (SW) tasks on the available processor(s) and hardware (HW) tasks on any number of reconfigurable regions (RRs) of a partially reconfigurable FPGA. Fed with the initial partitioning of the application into tasks, the corresponding task graph, and the available task mappings, the RTSM controls system operation considering the status of each task and region (e.g. busy, idle, scheduled for reconfiguration/execution, etc.). Our RTSM supports task reuse and configuration prefetching to minimize reconfigurations, task movement among regions to manage the FPGA area efficiently, and region reservation for future reconfiguration and execution. We validate the correctness and portability of our RTSM by executing an image processing application on two Xilinx-based platforms: ZedBoard and XUPV5. We also perform a more extensive evaluation of its features using a simulation framework, and find that, despite the technology limitations, our approach can give promising results in terms of scheduling quality. Since our RTSM also supports the scheduling of parallel SW tasks, we use it to manage the execution of the entire parallel Edge Detection application on a desktop; we compare the application execution time with that obtained using the OpenMP framework and find that execution with our RTSM is 2.4 times faster than the unoptimized OpenMP version. When processor affinity optimization is enabled for OpenMP, our RTSM and OpenMP are on par, indicating that the scheduling efficiency of our RTSM is competitive with this state-of-the-art scheduler, while additionally supporting the management of HW tasks.
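The idea of consulting region status and preferring task reuse can be sketched as below. This is only an illustrative selection policy under assumed data structures; the `Status` values, the `Region` class and `select_region` are hypothetical and do not reproduce the RTSM's actual implementation, which also covers prefetching, task movement and reservation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Status(Enum):
    IDLE = "idle"
    BUSY = "busy"
    RESERVED = "reserved"


@dataclass
class Region:
    name: str
    status: Status = Status.IDLE
    loaded_task: Optional[str] = None  # task whose bitstream is configured


def select_region(task: str, regions: List[Region]) -> Tuple[Optional[Region], bool]:
    """Pick a region for `task`; return (region, needs_reconfiguration).

    Preference order: an idle region already holding the task (reuse),
    then an empty idle region, then any idle region. Returns (None, False)
    when no region is idle.
    """
    idle = [r for r in regions if r.status is Status.IDLE]
    for r in idle:
        if r.loaded_task == task:
            return r, False  # reuse: no reconfiguration needed
    for r in idle:
        if r.loaded_task is None:
            return r, True
    if idle:
        return idle[0], True  # overwrite whatever is loaded
    return None, False


regions = [
    Region("RR0", Status.BUSY, "sobel"),
    Region("RR1", Status.IDLE, "gauss"),
    Region("RR2", Status.IDLE, None),
]
region, reconfigure = select_region("gauss", regions)
print(region.name, reconfigure)  # RR1 False
region, reconfigure = select_region("sobel", regions)
print(region.name, reconfigure)  # RR2 True
```

Choosing the empty region before overwriting a configured one keeps loaded tasks available for later reuse, which is one plausible way to reduce the number of reconfigurations.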

Configuration-Sensitive Process Scheduling for FPGA-Based Computing Platforms

2004

Reconfigurable computing has become an important part of research in software systems and computer architecture. While prior research on reconfigurable computing has addressed architectural and compilation/programming aspects to some extent, there is still not much consensus on what kind of operating system (OS) support should be provided. In this paper, we focus on the OS process scheduler and demonstrate how it can be customized to the needs of reconfigurable hardware. Our process scheduler is configuration sensitive; that is, it reuses the current FPGA configuration as much as possible. Our extensive experimental results show that the proposed scheduler is superior to classical scheduling algorithms such as First-Come-First-Serve (FCFS) and Shortest Job First (SJF).
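A toy comparison shows why reusing the current configuration helps. The greedy policy below is an assumption made for illustration (it picks the earliest pending job that matches the loaded configuration, and otherwise the head of the queue); it is not the scheduler evaluated in the paper, and `count_reconfigs` and `config_sensitive_order` are hypothetical names.

```python
from typing import List, Optional, Tuple

Job = Tuple[str, str]  # (job name, required FPGA configuration)


def count_reconfigs(order: List[Job]) -> int:
    """Number of configuration loads needed to run jobs in this order."""
    current: Optional[str] = None
    loads = 0
    for _, config in order:
        if config != current:
            loads += 1
            current = config
    return loads


def config_sensitive_order(queue: List[Job]) -> List[Job]:
    """Greedy order that keeps the current configuration as long as possible."""
    pending = list(queue)
    order: List[Job] = []
    current: Optional[str] = None
    while pending:
        # Earliest job matching the loaded configuration, else the queue head.
        nxt = next((j for j in pending if j[1] == current), pending[0])
        pending.remove(nxt)
        order.append(nxt)
        current = nxt[1]
    return order


queue = [("j1", "A"), ("j2", "B"), ("j3", "A"), ("j4", "B")]
print(count_reconfigs(queue))                          # 4 loads under FCFS
print(count_reconfigs(config_sensitive_order(queue)))  # 2 loads when reusing
```

A purely greedy policy like this one can starve jobs that need a different configuration, so a practical scheduler would need some fairness bound in addition.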

An efficient Resource Management to optimize the placement of hardware task on FPGA in the RVC framework

Design Automation for Embedded Systems, 2012

Dynamic partial reconfiguration (DPR) allows multi-task applications to be implemented by exchanging tasks in a design at run-time. It is a promising solution for enhancing system performance, but the effective use of DPR is often hampered by the complexity it adds to the system design process. In this paper, we investigate the implementation of multi-task applications using DPR in the RVC framework. We present a resource management method comprising three steps: partitioning the application into HW/SW tasks, dividing the FPGA into static and dynamic regions, and placing the tasks on the FPGA. The proposed method uses a linear programming strategy to find the optimal placement of hardware tasks, and takes into account the heterogeneity of the device. The goal is to minimize resource utilization and fragmentation. We use RVC technology, which is based on a specific language for writing dataflow models called RVC-CAL. This language describes the application as a set of blocks, called actors, connected through a network. To test the efficiency of our approach, we use the MPEG-4 SP decoder described in RVC-CAL. We measure the quality of placement in terms of task rejection, execution time and resource wastage. Experiments with different data combinations and a comparison with the state-of-the-art method show the high performance of the proposed approach.