Mohsen Ghasempour - Profile on Academia.edu
Papers by Mohsen Ghasempour
Simulation Modelling Practice and Theory, Sep 1, 2020
The design of new computer architectures relies heavily on simulation. New architectures that incorporate unconventional features or novel designs cannot usually use established simulators and, therefore, designers have to adapt an existing one or develop their own from scratch. Traditionally, software-based simulators have been the main platform for architectural exploration. However, the introduction of high-level hardware description languages, such as Bluespec, together with improvements in reconfigurable hardware platforms, provides an opportunity to challenge the traditional notion and consider hardware-accelerated simulators for this purpose. This paper presents a comprehensive analysis of three simulators: a hardware-accelerated one, a high-fidelity software-based one, and a mature, generic software one, each developed to evaluate different aspects of a novel, unconventional architecture: the SpiNNaker massively-parallel computer. The analysis includes a discussion of the different modelling and implementation trade-offs and a comparison with the real system.
Building large computing systems requires modelling them first. Modern hardware systems are so complex that software models at the desired level of detail may be too slow, so abstract hardware modelling can be appropriate. This paper presents an example software/hardware model built using the Bluespec SystemVerilog (BSV) design flow to give rapid simulation of a hardware system. The chosen example is a hardware model of the on-chip router and the on-chip and off-chip networks of SpiNNaker, used to understand the behaviour of traffic in the system. A model of a 5×5 SpiNNaker topology was designed on a Virtex-5 FPGA using BSV, and a Graphical User Interface (GUI) was developed in LabVIEW for graphical presentation of the results.
Analysis of FPGA and software approaches to simulate unconventional computer architectures
2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig), 2015
The design of new computer architectures relies heavily on simulation. New architectures that incorporate unconventional features or novel designs cannot usually use established simulators and, therefore, designers have to develop their own. Traditionally, software simulators have been the main platform for architectural design, based on the conventional wisdom that software is flexible and easy to program, albeit slow, while hardware is fast but difficult to develop. The introduction of high-level hardware description languages (HDLs), such as Bluespec, together with improvements in FPGAs, provides an opportunity to challenge the traditional notion and consider hardware simulators for this purpose. This paper presents a comprehensive analysis of the performance and implementation effort of two simulators, one FPGA-based and one software-based, developed to simulate a novel, unconventional architecture. The analysis uses the interconnection network of the SpiNNaker massively-parallel computer as a case study, which allows a comparison with the real system.
2014 24th International Conference on Field Programmable Logic and Applications (FPL), 2014
High-Level Synthesis (HLS) languages and tools are emerging as the most promising technique to make FPGAs more accessible to software developers. Nevertheless, picking the most suitable HLS tool for a certain class of algorithms depends on requirements such as area and throughput, as well as on programmer experience. In this paper, we explore the different trade-offs present when using a representative set of HLS tools in the context of Database Management System (DBMS) acceleration. More specifically, we conduct an empirical analysis of four representative frameworks (Bluespec SystemVerilog, Altera OpenCL, LegUp and Chisel) that we use to accelerate commonly-used database algorithms such as sorting, the median operator, and hash joins. Through our implementation experience and empirical results for database acceleration, we conclude that the selection of the most suitable HLS tool depends on a set of orthogonal characteristics, which we highlight for each framework.
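Of the database kernels named above, the hash join is the one whose structure is least obvious from its name. As a point of reference only, here is a minimal Python sketch of the classic in-memory equi hash join (build a hash table on one relation, probe it with the other); it is an illustration of the general algorithm, not code from the paper or from any of the HLS frameworks evaluated.

```python
# Minimal sketch of an in-memory equi hash join over lists of
# (key, value) tuples. Illustrative only; not the paper's code.
def hash_join(build_side, probe_side):
    table = {}
    for key, value in build_side:            # build phase
        table.setdefault(key, []).append(value)
    result = []
    for key, value in probe_side:            # probe phase
        for match in table.get(key, []):
            result.append((key, match, value))
    return result
```

The build/probe split is also what makes the kernel attractive for FPGA acceleration: the probe phase is highly parallel once the table is built.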
arXiv (Cornell University), Sep 12, 2015
The initial location of data in DRAMs is determined and controlled by the 'address mapping', and even modern memory controllers use a fixed, run-time-agnostic address mapping. On the other hand, the memory access pattern seen at the memory interface changes dynamically at run time. This dynamic nature of the access pattern, combined with the fixed behaviour of the address-mapping process in DRAM controllers, means that DRAM performance cannot be exploited efficiently. DReAM is a novel hardware technique that detects a workload-specific address mapping at run time, based on the application's access pattern, and thereby improves DRAM performance. The experimental results show that DReAM outperforms the best evaluated address mapping on average by 9% for mapping-sensitive workloads, by 2% for mapping-insensitive workloads, and by up to 28% across all workloads. DReAM can be seen as an insurance policy capable of detecting which scenarios are not well served by the predefined address mapping.
Proceedings of the Second International Symposium on Memory Systems, 2016
The initial location of data in DRAMs is determined and controlled by the 'address mapping', and even modern memory controllers use a fixed, run-time-agnostic address mapping. On the other hand, the memory access pattern seen at the memory interface changes dynamically at run time. This dynamic nature of the access pattern, combined with the fixed behaviour of the address-mapping process in DRAM controllers, means that DRAM performance cannot be exploited efficiently. DReAM is a novel hardware technique that detects a workload-specific address mapping at run time, based on the application's access pattern, and thereby improves DRAM performance. The experimental results show that DReAM outperforms the best evaluated address mapping on average by 9% for mapping-sensitive workloads, by 2% for mapping-insensitive workloads, and by up to 28% across all workloads. DReAM can be seen as an insurance policy capable of detecting which scenarios are not well served by the predefined address mapping. (Figure: DRAM address bit-field layouts (Row, RA, Bank, CH, Column, Block Offset) for the evaluated schemes, including (a) Mapping 1: maximise row-buffer locality (baseline) and (b) Mapping 2: permutation-based page interleaving [6].)
(Figure: DRAM address bit-field layouts (Row, RA, Bank, CH, Column, Block Offset) for (a) Mapping 1: maximise row-buffer locality (baseline); (b) Mapping 2: permutation-based page interleaving [ZZZ00]; (c) Mapping 3: Minimalist Open-Page scheme [KSJ11]. Some address bits are used as a block offset, as shown in Figure 5.3.)
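The permutation-based page interleaving mentioned in the mapping figures can be illustrated in a few lines: the bank index is XORed with low-order row bits, so rows that would collide in one bank under the baseline mapping are spread across banks. The sketch below is a generic illustration of that XOR scheme; the field positions and widths are assumed for the example, not taken from the paper.

```python
# Illustrative sketch of permutation-based bank interleaving:
# new_bank = bank XOR (low-order row bits). Field positions are
# assumed values for this example, not the paper's layout.
BANK_BITS = 3
BANK_SHIFT = 13   # assumed position of the bank field
ROW_SHIFT = 16    # assumed position of the row field

def permuted_bank(addr):
    mask = (1 << BANK_BITS) - 1
    bank = (addr >> BANK_SHIFT) & mask
    row_low = (addr >> ROW_SHIFT) & mask
    return bank ^ row_low
```

Because XOR is its own inverse, the controller can still recover the original bank bits from the permuted index and the row bits, so the mapping remains one-to-one.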
Memory controllers have used static page-closure policies to decide whether a row should be left open (open-page policy) or closed immediately (close-page policy) after it has been accessed. The appropriate choice for a particular access can reduce the average memory latency. However, since application access patterns change at run time, static page policies cannot guarantee optimum execution time. Hybrid page policies have been investigated as a means of covering these dynamic scenarios and are now implemented in state-of-the-art processors. They switch between open-page and close-page policies while the application is running, by monitoring the pattern of row hits/conflicts and predicting future behaviour. Unfortunately, as the size of DRAM memory increases, fine-grain tracking and analysis of memory access patterns is no longer practical. We propose a compact memory-address-based encoding technique which can improve or maintain the perform...
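The hit/conflict monitoring that hybrid page policies rely on can be sketched as a saturating counter per bank: row hits push the counter up, conflicts push it down, and the row is left open only while hits dominate. This is a generic illustration of the idea, with assumed counter width and threshold; it is not the encoding technique the abstract proposes.

```python
# Generic sketch of a hybrid page policy predictor: a saturating
# counter of recent row hits vs. conflicts per bank. The counter
# range and threshold are illustrative assumptions.
class HybridPagePolicy:
    def __init__(self, max_count=7, threshold=4):
        self.counter = max_count // 2   # start undecided
        self.max_count = max_count
        self.threshold = threshold

    def record(self, row_hit):
        # Saturating update: hits increment, conflicts decrement.
        if row_hit and self.counter < self.max_count:
            self.counter += 1
        elif not row_hit and self.counter > 0:
            self.counter -= 1

    def keep_row_open(self):
        # Open-page behaviour only while hits dominate recent history.
        return self.counter >= self.threshold
```

The scaling problem the abstract raises is visible here: one such counter per bank (or per row region) is cheap for small memories, but fine-grained per-row tracking grows impractical as DRAM capacity increases.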
Computerised objective measurement of strain in voiced speech
2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2015
Objective Assessment of Asthenia using Energy and Low-to-High Spectral Ratio
Proceedings of the 12th International Conference on Signal Processing and Multimedia Applications, 2015
Accelerating Interconnect Analysis Using High-Level HDLs and FPGA, SpiNNaker as a Case Study
2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines, 2015
Ultra-low power transmitter
2012 IEEE International Symposium on Circuits and Systems, 2012
This paper presents the design of an ultra-low-power UWB transmitter based on 4th- and 5th-derivative Gaussian pulse shapes, implemented in UMC 90 nm CMOS technology. The simulations show a 119 mV peak-to-peak pulse amplitude and a pulse width of 240 ps for the 5th-derivative Gaussian pulse, and a 99.71 mV pulse amplitude and 190 ps pulse width for the 4th-derivative Gaussian pulse. The power consumption of the pulse generators is calculated as 30.11 µW and 21.5 µW for the 5th- and 4th-derivative Gaussian pulses respectively, at a 100 MHz pulse repetition frequency (PRF). Ultra-low-power radio transmission is important in application contexts such as wireless network nodes and sensors powered by energy harvesters.
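The 5th-derivative Gaussian pulse named above has a standard closed form: the nth derivative of exp(-t²/2σ²) is, up to sign, the probabilists' Hermite polynomial Heₙ(t/σ) times the Gaussian, scaled by σ⁻ⁿ. The sketch below evaluates that textbook waveform for an arbitrary σ; σ is a free shape parameter here, not the circuit parameters of the 90 nm design.

```python
import math

# Textbook 5th-derivative Gaussian pulse, a common UWB pulse shape.
# sigma sets the pulse width and is an illustrative free parameter,
# not a value from the paper.
def gaussian_5th_derivative(t, sigma):
    x = t / sigma
    # d^5/dt^5 exp(-t^2 / (2 sigma^2)) = -He_5(x) / sigma^5 * exp(-x^2/2),
    # with He_5(x) = x^5 - 10 x^3 + 15 x (probabilists' Hermite).
    he5 = x ** 5 - 10 * x ** 3 + 15 * x
    return -(he5 / sigma ** 5) * math.exp(-x * x / 2)
```

Because He₅ is an odd polynomial, the pulse is odd-symmetric and crosses zero at t = 0, which is one reason odd-order derivative pulses have no DC component and fit the UWB spectral mask well.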