A Methodology for Automating Co-Scheduling for Reconfigurable Computing Systems
Related papers
Hardware Task Scheduling for Partially Reconfigurable FPGAs
Partial reconfiguration (PR) of FPGAs can be used to dynamically extend and adapt the functionality of computing systems by swapping HW tasks in and out. To coordinate on-demand task execution, we propose and implement a Run-Time System Manager (RTSM) that schedules software (SW) tasks on the available processor(s) and hardware (HW) tasks on any number of reconfigurable regions (RRs) of a partially reconfigurable FPGA. Given the initial partitioning of the application into tasks, the corresponding task graph, and the available task mappings, the RTSM considers the runtime status of each task and region (e.g. busy, idle, scheduled for reconfiguration or execution) when deciding which tasks to execute. Our RTSM supports task reuse and configuration prefetching to minimize reconfigurations, task movement among regions to manage the FPGA area efficiently, and RR reservation for future reconfiguration and execution. We validate its correctness by using the RTSM to execute an image processing application on a ZedBoard platform. We also evaluate its features within a simulation framework and find that, despite the technology limitations, our approach gives promising results in terms of scheduling quality.
1 Introduction
Reconfiguration can dynamically adapt the functionality of hardware systems by swapping HW tasks in and out. To select the proper resource for loading a HW task and to trigger its reconfiguration and execution in partially reconfigurable systems with FPGAs, efficient and flexible runtime system support is needed [6]. In this paper we propose and implement a Run-Time System Manager (RTSM) incorporating efficient scheduling mechanisms that effectively balance the execution of HW and SW tasks against the use of physical resources. We aim to execute a given application as fast as possible without exhausting the physical resources.
Our motivation during the development of RTSM was to find ways to overcome the strict technology restrictions imposed by the Xilinx PR flow [8]: Static partitioning of the reconfigurable surface in reconfigurable regions (RR).
Online Task Scheduling for the FPGA-Based Partially Reconfigurable Systems
2009
In FPGA-based partially reconfigurable systems, hardware tasks can be configured into (or removed from) the FPGA fabric without interfering with other tasks running on the same device. In such systems, the efficiency of the task scheduling algorithm directly impacts overall system performance. Using the previously proposed 2D scheduling model, existing algorithms cannot efficiently find all suitable allocations. In addition, most of them ignore the single reconfiguration port constraint and inter-task dependencies. Furthermore, to the best of our knowledge, no previous work has investigated how reusing already placed tasks affects the scheduling result. In this paper, we focus on online task scheduling and propose a task scheduling solution that takes the ignored constraints into account. In addition, a novel "reuse and partial reuse" approach is proposed. The simulation results show that, compared to the previously proposed stuffing algorithm, our solution shortens application completion time by up to 43.9% and single-task response time by up to 63.8%.
Hardware task scheduling optimizations for reconfigurable computing
2008 Second International Workshop on High-Performance Reconfigurable Computing Technology and Applications, 2008
Reconfigurable Computers (RCs) can provide significant performance improvement for domain applications. However, wide acceptance of today's RCs among domain scientists is hindered by the complexity of design tools and the required hardware design experience. Recent developments in hardware/software co-design methodologies for these systems provide ease of use, but they are not comparable in performance to manual co-design. This paper aims at improving the overall performance of hardware tasks assigned to the FPGA. In particular, the analysis of inter-task communication and of data dependencies among tasks is used to reduce the number of configurations and to minimize communication overhead and task processing time. This work leverages algorithms developed in the RC and Reconfigurable Hardware (RH) domains for the efficient use of hardware resources to propose two algorithms, Weight-Based Scheduling (WBS) and Highest Priority First-Next Fit (HPF-NF). However, traditional resource-based scheduling alone is not sufficient to reduce the performance bottleneck, so a more comprehensive algorithm is necessary. The Reduced Data Movement Scheduling (RDMS) algorithm is proposed to address dependency analysis and inter-task communication optimizations. Simulation shows that, compared to WBS and HPF-NF, RDMS reduces the number of FPGA configurations* needed to schedule randomly generated graphs with heavy-weight nodes by 30% and 11%, respectively. Additionally, the proof-of-concept implementation of a complex 13-node example task graph on the SGI RC100 reconfigurable computer shows that RDMS not only trims the number of necessary configurations from 6 to 4 but also reduces communication overhead by 48% and hardware processing time by 33%.
* One instantiation of an FPGA configuration denotes the process of loading the corresponding bitstream into the device, configuring it, executing the tasks in the configuration, and then releasing the device.
Scheduling Temporal Partitions in a Multiprocessing Paradigm for Reconfigurable Architectures
2009
In this paper we describe a mapping methodology for heterogeneous reconfigurable architectures consisting of one or more SW processors and one or more reconfigurable units (FPGAs). The mapping methodology consists of separate tracks for a) the generation of the configurations for the FPGA by level-based and clustering-based temporal partitioning, and b) the scheduling of those configurations as well as the software tasks, based on two multiprocessor scheduling algorithms: a simple list-based scheduler and the more complex extended dynamic level scheduling algorithm. The mapping methodology is benchmarked by means of randomly created task graphs on an architecture of one SW processor and one FPGA. The results are compared to a 0-1 integer linear programming solution in terms of exploration time as well as the finish time of all tasks of the application. The results show that, in 90% of the investigated cases, the combination of level-based temporal partitioning and extended dynamic level scheduling gives the best performance in terms of the finish time of the full task set.
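To make the idea of a "simple list-based scheduler" concrete, the following is a minimal sketch of a generic earliest-finish-time list scheduler for one SW processor and one FPGA. It is not the algorithm from the paper: the function name `list_schedule`, the task names, and the cost figures are hypothetical, and the sketch assumes the priority list is already a valid topological order and ignores reconfiguration and communication time.

```python
from typing import Dict, List, Tuple

Placement = Tuple[str, int, int]  # (resource, start time, finish time)


def list_schedule(
    tasks: List[str],
    deps: Dict[str, List[str]],
    cost: Dict[str, Dict[str, int]],
) -> Dict[str, Placement]:
    """Place each task, in list order, on the resource where it finishes first.

    tasks: priority list, assumed to be a valid topological order.
    deps:  task -> list of predecessor tasks.
    cost:  task -> {resource name: execution time on that resource}.
    """
    resource_free: Dict[str, int] = {}  # resource -> time it becomes idle
    placed: Dict[str, Placement] = {}
    for task in tasks:
        # A task is ready once all of its predecessors have finished.
        ready = max((placed[p][2] for p in deps.get(task, [])), default=0)
        best = None
        for resource, duration in cost[task].items():
            start = max(ready, resource_free.get(resource, 0))
            candidate = (start + duration, resource, start)
            if best is None or candidate < best:
                best = candidate
        finish, resource, start = best
        placed[task] = (resource, start, finish)
        resource_free[resource] = finish
    return placed


# Hypothetical four-task graph: c depends on a, d depends on b and c.
tasks = ["a", "b", "c", "d"]
deps = {"c": ["a"], "d": ["b", "c"]}
cost = {
    "a": {"cpu": 4, "fpga": 2},
    "b": {"cpu": 3, "fpga": 3},
    "c": {"cpu": 5, "fpga": 2},
    "d": {"cpu": 2, "fpga": 6},
}
schedule = list_schedule(tasks, deps, cost)
makespan = max(finish for _, _, finish in schedule.values())
```

A dynamic-level scheduler would additionally re-rank the ready tasks at every step instead of following a fixed list; that refinement is left out here.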
Dynamic scheduling of tasks on partially reconfigurable FPGAs
IEE Proceedings - Computers and Digital Techniques, 2000
Field-Programmable Gate Arrays (FPGAs) that allow partial reconfiguration at run-time can be shared among multiple independent tasks. When the sequence of tasks to be performed is unpredictable, the FPGA controller needs to make allocation decisions on-line. Since on-line allocation suffers from fragmentation, tasks can end up waiting despite there being sufficient, albeit non-contiguous, resources available to service them. The time to complete tasks is consequently longer and the utilization of the FPGA is lower than it could be.
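The fragmentation effect described above can be illustrated with a small sketch. It assumes a simplified one-dimensional model in which the fabric is a row of columns and a task needs a contiguous run of free columns; the model, the function names, and the example occupancy are assumptions made for illustration and are not taken from the paper.

```python
from typing import List


def total_free(columns: List[bool]) -> int:
    """Count free columns (True means occupied, False means free)."""
    return sum(1 for occupied in columns if not occupied)


def largest_free_run(columns: List[bool]) -> int:
    """Length of the longest contiguous run of free columns."""
    best = run = 0
    for occupied in columns:
        run = 0 if occupied else run + 1
        best = max(best, run)
    return best


def can_place(columns: List[bool], width: int) -> bool:
    """A task fits only if some contiguous free run is at least `width` wide."""
    return largest_free_run(columns) >= width


# Hypothetical occupancy: free runs of 2, 2, and 3 columns.
fabric = [False, False, True, True, False, False, True, False, False, False]

free_columns = total_free(fabric)       # 7 columns are free in total
widest_gap = largest_free_run(fabric)   # but the widest gap is only 3
must_wait = not can_place(fabric, 4)    # so a 4-column task has to wait
```

Here seven columns are free, yet a task needing four contiguous columns cannot be placed until running tasks finish or are rearranged.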
Efficient On-line Hardware/Software Task Scheduling for Dynamic Run-time Reconfigurable Systems
2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012
Modern reconfigurable devices such as FPGAs can be reconfigured at run time. Some of them can be dynamically partially reconfigured, which means part of the FPGA is changed without interrupting other parts. This feature adds tremendous flexibility to the Reconfigurable Computing (RC) field but also introduces challenges. Reconfigurable operating systems tend to ease application development and, most importantly, application verification and maintenance. In this paper we propose novel scheduling ...
2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2010
In this paper, we propose an efficient online task scheduling algorithm that targets the 2D FPGA area partitioning model and takes into account the data dependencies and data communication 1) among hardware tasks and 2) between hardware tasks and external devices, which have not been explicitly investigated in previous work. In an experiment with 10000 workloads, the evaluation shows that our proposed scheduling algorithm is about 20x faster than the comparable approach.
Applications of Heterogeneous Computing in Hardware/Software Co-Scheduling
2007 IEEE/ACS International Conference on Computer Systems and Applications, 2007
Current work on automatic task partitioning and scheduling for reconfigurable computing (RC) systems strictly addresses the field programmable gate array (FPGA) hardware, and does not take advantage of the synergy between the microprocessor and the FPGA. Partitioning between the microprocessor and the FPGA is often a manual and laborious effort, as a formal methodology for automatic hardware-software partitioning for RC systems has not yet been established. Related fields such as heterogeneous computing (HC) and embedded computing (EC) have an extensive body of work on scheduling for heterogeneous processors. In this work, we adapt HC scheduling algorithms for RC systems, and show that simply adapting the algorithms is not sufficient to take advantage of the reconfigurable hardware. In many cases, the HC heuristic algorithms do not generate the efficient schedules necessary to take advantage of the synergy between the microprocessor and the FPGA. We introduce new heuristic algorithms based on HC scheduling algorithms and show that they provide up to an order of magnitude improvement in execution time.
Configuration-Sensitive Process Scheduling for FPGA-Based Computing Platforms
2004
Reconfigurable computing has become an important part of research in software systems and computer architecture. While prior research on reconfigurable computing has addressed architectural and compilation/programming aspects to some extent, there is still not much consensus on what kind of operating system (OS) support should be provided. In this paper, we focus on the OS process scheduler, and demonstrate how it can be customized considering the needs of reconfigurable hardware. Our process scheduler is configuration sensitive, that is, it reuses the current FPGA configuration as much as possible. Our extensive experimental results show that the proposed scheduler is superior to classical scheduling algorithms such as First-Come-First-Serve (FCFS) and Shortest Job First (SJF).
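The benefit of reusing the current configuration can be sketched by counting reconfigurations under FCFS and under a simple configuration-preferring policy. This is an illustrative sketch, not the scheduler from the paper: it assumes all processes are already queued, that each one needs exactly one named configuration, and that the policy picks the earliest waiting process matching the loaded configuration; the function names and the example queue are hypothetical.

```python
from typing import List, Optional


def count_reconfigs(order: List[str]) -> int:
    """Count how often the FPGA must be reconfigured for a service order."""
    loaded: Optional[str] = None
    reconfigs = 0
    for config in order:
        if config != loaded:
            reconfigs += 1
            loaded = config
    return reconfigs


def fcfs_order(queue: List[str]) -> List[str]:
    """First-Come-First-Serve: serve processes strictly in arrival order."""
    return list(queue)


def config_sensitive_order(queue: List[str]) -> List[str]:
    """Prefer the earliest waiting process that needs the loaded configuration;
    fall back to the head of the queue when none matches."""
    pending = list(queue)
    order: List[str] = []
    loaded: Optional[str] = None
    while pending:
        index = next((i for i, c in enumerate(pending) if c == loaded), 0)
        loaded = pending.pop(index)
        order.append(loaded)
    return order


# Hypothetical queue: each entry is the configuration a process requires.
queue = ["A", "B", "A", "C", "B", "A"]

fcfs_reconfigs = count_reconfigs(fcfs_order(queue))
sensitive_reconfigs = count_reconfigs(config_sensitive_order(queue))
```

For this queue, FCFS reconfigures before every process, while the configuration-preferring order groups processes that share a configuration. The sketch makes no attempt to bound waiting time, so a practical policy would also need some guard against starvation.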
Computers & Electrical Engineering, 2009
Keywords: Hardware resource management; Task placement; Partitioning and scheduling algorithm
Abstract: There are many design challenges in the hardware-software co-design approach for performance improvement of data-intensive streaming applications with a general-purpose microprocessor and a hardware accelerator. These design challenges are mainly to prevent hardware area fragmentation to increase resource utilization, to reduce hardware reconfiguration cost and to partition and schedule the tasks between the microprocessor and the hardware accelerator efficiently for performance improvement and power savings of the applications.