Dr. Eng. Nabil Hasasneh | Hebron University
Papers by Dr. Eng. Nabil Hasasneh
2008 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, 2008
This paper presents a concurrent execution model and its micro-architecture based on in-order RISC processors, which schedules instructions from large pools of contextualised threads. The model admits a strategy for programming chip multiprocessors using parallelising compilers based on existing languages. The model is supported in the ISA by a number of instructions to create and manage abstract concurrency. The paper estimates the cost of supporting these instructions in silicon. The model and its implementation use dynamic parameterisation of concurrency creation, where a single instruction captures asynchronous remote function execution, mutual exclusion and the execution of a general concurrent loop structure and all associated communication. Concurrent loops may be dependent or independent, bounded or unbounded and may be nested arbitrarily. Hierarchical concurrency allows compilers to restructure and parallelise sequential code to meet the strict constraints on the model, which provide its freedom from deadlock and locality of communication. Communication is implicit in both the model and micro-architecture, due to the dynamic distribution of concurrency. The result is location-independent binary code that may execute on any number of processors. Simulation and analysis of the micro-architecture indicate that the model is a strong candidate for the exploitation of many-core processors. The results show near-linear speedup over two orders of magnitude of processor scaling, good energy efficiency and tolerance to large latencies in asynchronous operations. This is true for both independent threads and reductions.
Lecture Notes in Computer Science, 2004
This paper presents a model for instruction-level distributed computing that allows the implementation of scalable chip multiprocessors. Based on explicit microthreading, the model serves as a replacement for out-of-order instruction issue; the paper defines the model and explores implementation issues. The model results in a fully distributed implementation in which data is distributed to one register file per processor, which is scalable as the number of ports in each register file is constant. The only component with less than ideal scaling properties is the switching network between processors.
2013 Fourth International Conference on e-Learning "Best Practices in Management, Design and Development of e-Courses: Standards of Excellence and Creativity", 2013
Over the last few years, we have been building up a set of case studies illustrating different ways in which learning technologies have been implemented across the University. This paper aims to present and analyze the e-learning experience at HU, the main e-course activities that should be used to encourage students to use e-class, and how to use and apply e-learning in schools. We have consistently found high levels of student and staff satisfaction through learning outcomes.
ACM International Conference Proceeding Series, 2002
The micro-threaded microprocessor is a chip multi-processor that uses a multi-threaded approach, where the threads are obtained from within a single context and exploit both vector and instruction level parallelism (ILP). This approach employs vertical and horizontal transfer in a simple pipeline. Horizontal transfer refers to the normal scalar pipeline processing used in most microprocessors. Vertical transfer
Parallel processing letters, 2006
This paper analyses the micro-threaded model of concurrency, making comparisons with both data and instruction-level concurrency. The model is fine-grained and provides synchronisation in a distributed register file, making it a promising candidate for scalable chip-multiprocessors. The micro-threaded model was first proposed in 1996 as a means to tolerate high latencies in data-parallel, distributed-memory multi-processors. This paper explores the model's opportunity to provide the simultaneous issue of instructions, required for chip multiprocessors, and discusses the issues of scalability with regard to support structures implementing the model and communication in supporting it. The model supports deterministic distribution of code fragments and dynamic scheduling of instructions from within those fragments. The hardware also recognises different classes of variables from the register specifiers, which allows the hardware to manage locality and optimise communication so that it is both efficient and scalable.
The Computer Journal, Jan 1, 2006
Most microprocessor chips today use an out-of-order instruction execution mechanism. This mechanism allows superscalar processors to extract reasonably high levels of instruction level parallelism (ILP). The most significant problem with this approach is a large instruction window and the logic to support instruction issue from it. This includes generating wake-up signals to waiting instructions and a selection mechanism for issuing them. Wide issue width also requires a large multi-ported register file, so that each instruction can read and write its operands simultaneously. Neither structure scales well with issue width, leading to poor performance relative to the gates used. Furthermore, to obtain this ILP, the execution of instructions must proceed speculatively. An alternative, which avoids this complexity in instruction issue and eliminates speculative execution, is the microthreaded model. This model fragments sequential code at compile time and executes the fragments out of order while maintaining in-order execution within the fragments. The only constraints on the execution of fragments are the dependencies between them, which are managed in a distributed and scalable manner using synchronizing registers. The fragments of code are called microthreads and they capture ILP and loop concurrency. Fragments can be interleaved on a single processor to give tolerance to latency in operands or distributed to many processors to achieve speedup. The implementation of this model is fully scalable. It supports distributed instruction issue and a fully scalable register file, which implements a distributed, shared-register model of communication and synchronization between multiple processors on a single chip. This paper introduces the model, compares it with current approaches and presents an analysis of some of the implementation issues.
It also presents results showing scalable performance with issue width over several orders of magnitude, from the same binary code.
International Journal of Parallel …, Jan 1, 2006
Chip multiprocessors hold great promise for achieving scalability in future systems.
Computer Systems and …
High Level Modelling and Design For a Microthreaded Scheduler to Support Microgrids. Nabil Hasasneh, Institute for Informatics, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, NL, Nabilh@hebron.edu; Ian Bell, Engineering Dept. ...
Chip multiprocessors hold great promise for achieving scalability in future systems. Microthreaded chip multiprocessors add a means of exploiting legacy code in such systems. Using this model, compilers generate parametric concurrency from sequential source code, which ...
Proceedings 1st MicroGrid …, Jan 1, 2005
Microgrids and Micro-contexts: Support Structures for Microthread Scheduling and Synchronisation. ...
Journal of Systems Architecture, Jan 1, 2007
This paper presents a scalable and partitionable asynchronous bus arbiter for use with chip multiprocessors (CMP) and its corresponding pre-layout simulation results using VHDL. The arbiter takes advantage of a concurrency control instruction (Brk) provided by the micro-threaded microprocessor model to set the priority processor and move the circulating arbitration token to the processor most likely to issue the create instruction. This mechanism provides latency hiding during token circulation by decoupling the micro-threaded processor from the ring's timing. The arbiter provides a very simple arbitration mechanism and can be used for chip multiprocessor arbitration purposes.
Architecture of Computing Systems- …, Jan 1, 2006
This paper presents a scalable and partitionable asynchronous bus arbiter for use with chip multiprocessors (CMP) and its corresponding pre-layout simulation results using VHDL. The arbiter takes advantage of a concurrency control instruction (Brk) provided by the micro-threaded microprocessor model to set the priority processor and move the circulating arbitration token to the processor most likely to issue the create instruction. This mechanism provides latency hiding during token circulation by decoupling the micro-threaded processor from the ring's timing. It is shown that this arbiter can be extended easily to support large numbers of processors and can be used for chip multiprocessor arbitration purposes.