Farhad Mehdipour - Academia.edu
Papers by Farhad Mehdipour
ACM Journal on Emerging Technologies in Computing Systems, Apr 27, 2015
2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC), 2013
Proceedings of the …, 2007
Most embedded systems rely on batteries as their source of energy, and hence low power consumption is inherently essential for them. In processor-based embedded systems, a large portion of power is consumed for accessing instruction memories (including ...
Fog and Edge Computing
Paradigm basis [8]. Many time-critical applications generate data continuously and expect the processed outcome on a real-time basis, such as stock market data processing.
11.2.1 Benefits. Big data analytics has the following benefits:
Improved business: big data analytics helps organizations harness their data and use it to identify new opportunities, which facilitates smarter business decisions, new revenue opportunities, more effective marketing, better customer service, improved operational efficiency, and higher profits.
Cost reduction: big data analytics can provide significant cost advantages when it comes to storing large amounts of data while doing business in more efficient ways.
Faster and better decision making: businesses are able to analyze information immediately, make decisions, and stay agile.
New products and services: with the ability to gauge customer needs and satisfaction through analytics comes the power to give customers what they want.
A large-scale reconfigurable data-path (LSRDP) processor based on single-flux quantum circuits is designed to overcome the issues originating from CMOS technology. The LSRDP micro-architecture design procedure and its outcome are presented in this paper.
Distributed Computing, 2008
Recently, the Large Scale Reconfigurable Data Path (LSRDP) processor has been proposed to reduce the memory bandwidth required in high-performance scientific computing. In this paper, performance evaluations of the LSRDP at various memory bandwidths are presented for the 1-dimensional heat and 2-dimensional Poisson partial differential equations, and for the Electron Repulsion Integral (ERI) calculation, as target benchmark applications. Execution times for the heat and ERI applications decreased compared to the original execution on a general-purpose processor. For the Poisson application, on the other hand, execution times increased, since the data sorting steps needed for burst transfer from main memory to the LSRDP consume a great amount of execution time.
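The abstract does not give the benchmark kernels themselves, but the 1-dimensional heat equation it references is typically solved with an explicit finite-difference stencil like the sketch below — the kind of regular, streaming computation a reconfigurable data-path targets. The grid size, time-step count, and diffusion coefficient are illustrative assumptions, not values from the paper.

```python
# Explicit finite-difference solver for the 1-D heat equation
# u_t = alpha * u_xx: a regular stencil a large reconfigurable
# data-path can stream without repeated main-memory traffic.
# All parameters are illustrative, not taken from the paper.

def heat_1d(u, alpha, dx, dt, steps):
    """Advance the temperature profile `u` by `steps` time steps."""
    r = alpha * dt / (dx * dx)          # stability requires r <= 0.5
    u = list(u)
    for _ in range(steps):
        nxt = u[:]                      # boundary values stay fixed
        for i in range(1, len(u) - 1):
            nxt[i] = u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1])
        u = nxt
    return u

# A hot spot in the middle of a cold rod diffuses outward symmetrically.
profile = [0.0] * 5 + [100.0] + [0.0] * 5
result = heat_1d(profile, alpha=1.0, dx=1.0, dt=0.25, steps=10)
```

Each time step touches every grid point once, so memory bandwidth, not arithmetic, dominates on a general-purpose processor — which is the bottleneck the LSRDP evaluation measures.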
2016 IEEE 14th Intl Conf on Dependable, Autonomic and Secure Computing, 14th Intl Conf on Pervasive Intelligence and Computing, 2nd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), 2016
Existing platforms fall short of providing effective solutions for big data analytics, while the demand for processing large quantities of data in real time is increasing. Moving data analytics towards where the data is generated and stored could be a solution to this issue. In this paper, we propose a solution referred to as FOG-engine, which is integrated into IoT devices near the ground and facilitates data analytics before large amounts of data are offloaded to a central location. We introduce a model for data analytics using FOG-engines and discuss our plan for evaluating its efficacy in terms of several performance metrics, such as processing speed, network bandwidth, and data transfer size.
Journal of Parallel and Distributed Computing, 2016
"Physical-Aware Task Migration Algorithm for Dynamic Thermal Management of SMT Multi-core Processors". The details and the differences between the two versions are explained in the cover letter.
Advances in Computers, 2016
The volume of generated data increases with the rapid growth of the Internet of Things (IoT), leading to the proliferation of big data and more opportunities for data centers. Highly virtualized cloud-based datacenters are currently considered for big data analytics. However, big data requires datacenters with enhanced infrastructure capable of undertaking more responsibilities for handling and analyzing data. Also, as the scale of datacenters keeps expanding, minimizing energy consumption and operational cost is a vital concern. Future datacenter infrastructure, including the interconnection network, storage, and servers, should be able to handle big data applications in an energy-efficient way. In this chapter, we explore different aspects of cloud-based datacenters for big data analytics. First, the datacenter architecture, including computing and networking technologies as well as data centers for cloud-based services, is illustrated. Then the concepts of big data and cloud computing and some of the existing cloud-based datacenter platforms, including tools for big data analytics, are introduced. We later discuss techniques for improving energy efficiency in cloud-based datacenters for big data analytics. Finally, current and future trends for datacenters, in particular with respect to energy consumption to support big data analytics, are discussed.
2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010), 2010
International Journal of Big Data Intelligence, 2015
In today's commercial world, information is becoming a major economic resource, leading to the statement "information is wealth". Managing and analyzing the large volumes of data coming continuously from a variety of sources is a technical challenge for computer systems. Experts are moving towards alternative hardware platforms to achieve high-speed data processing and analysis, especially for streaming applications. In this paper, (a) existing trends in big data processing and the systems involved are studied through a survey of available platforms, and (b) recommended features and suitable hardware systems are proposed based on the operations involved in the processing. The investigation shows that, in combination with CPUs and GPUs, FPGAs are a possible alternative: they can be part of a heterogeneous platform featuring parallelism, pipelining, and high performance for the operations involved in big data processing.
2011 3rd Asia Symposium on Quality Electronic Design (ASQED), 2011
2011 24th Canadian Conference on Electrical and Computer Engineering (CCECE), 2011
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015
The memory architecture has a significant effect on the flexibility and performance of a coarse-grained reconfigurable array (CGRA), which can be constrained by configuration overhead and the large latency of data transmission. Multi-context structures and data preloading are widely used in popular CGRAs as solutions to the bandwidth bottlenecks of contexts and data. However, these two schemes cannot balance computing performance, area overhead, and flexibility. This paper proposes group-based context cache and multi-level data memory architectures to alleviate these bottlenecks. The group-based context cache dynamically transfers and buffers contexts inside the CGRA in order to relieve off-chip memory accesses for contexts at runtime. The multi-level data memory adds data memories to different CGRA hierarchies, which are used as buffers for reused input data and intermediate data. The proposed memory architectures are efficient and cost-effective, so performance improvement can be achieved at the cost of minor area overhead. Experiments on an H.264 video decoding program and the scale-invariant feature transform algorithm achieved performance improvements of 19% and 23%, respectively. Further, the complexity of the applications running on the CGRA is no longer restricted by the capacity of the on-chip context memory, enabling flexible configuration of the CGRA. The memory architectures proposed in this paper are based on a generic CGRA architecture derived from characteristics found in the majority of existing popular CGRAs; as such, they can be applied to universal CGRAs.
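The core idea of a group-based context cache — fetch configuration contexts from off-chip memory in whole groups, so any context in a cached group is served on-chip — can be sketched as follows. The group size, FIFO eviction, and integer context IDs are illustrative assumptions, not the paper's actual design.

```python
# Minimal sketch of a group-based context cache: contexts are fetched
# from off-chip memory a whole group at a time, so a later request for
# any context in a cached group avoids an off-chip access.
# Group size and FIFO eviction are illustrative assumptions.

from collections import OrderedDict

class GroupContextCache:
    def __init__(self, group_size, capacity_groups):
        self.group_size = group_size
        self.capacity = capacity_groups
        self.groups = OrderedDict()       # group id -> list of context ids
        self.offchip_fetches = 0          # counts costly off-chip accesses

    def load(self, ctx_id):
        """Make context `ctx_id` available on-chip; return True when done."""
        gid = ctx_id // self.group_size
        if gid not in self.groups:        # miss: fetch the whole group
            self.offchip_fetches += 1
            if len(self.groups) >= self.capacity:
                self.groups.popitem(last=False)   # evict oldest group (FIFO)
            self.groups[gid] = [gid * self.group_size + k
                                for k in range(self.group_size)]
        return ctx_id in self.groups[gid]

cache = GroupContextCache(group_size=4, capacity_groups=2)
hits = [cache.load(c) for c in [0, 1, 2, 3, 4, 5, 0]]
```

Seven context loads here cost only two off-chip fetches, which is the effect the paper's runtime context buffering aims for.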
Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, 2006
In reconfigurable systems, reconfiguration latency is a very important factor affecting system performance. In this paper, a framework is proposed that integrates the temporal partitioning and physical design phases to perform a static compilation process for reconfigurable computing systems. A temporal partitioning algorithm is proposed which attempts to decrease the reconfiguration time on partially reconfigurable hardware. The algorithm looks for similar single operations, or pairs of operations, between subsequent partitions; considering similar pairs instead of single nodes reduces the complexity of the routing process. With this technique a smaller reconfiguration bit-stream is obtained, which directly decreases the reconfiguration overhead at run-time. A complementary algorithm attempts to increase the similarity of subsequent partitions by searching for similar pairs and using a technique called dummy node insertion. An incremental physical design process based on the similar configurations produced in the partitioning stage improves the metrics over iterations.
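The similarity idea can be illustrated with plain set operations: operation pairs that appear in two subsequent temporal partitions keep their configuration across the reconfiguration, so only the differing pairs contribute to the bit-stream. The encoding of pairs as operation-name tuples is an illustrative assumption, not the paper's representation.

```python
# Sketch of inter-partition similarity: connected pairs of operations
# present in both subsequent partitions need no reconfiguration, so the
# reconfiguration bit-stream covers only the set difference.
# Pair encoding as (op, op) tuples is an illustrative assumption.

def common_pairs(partition_a, partition_b):
    """Operation pairs configured in both subsequent partitions."""
    return set(partition_a) & set(partition_b)

def reconfig_cost(partition_a, partition_b):
    """Number of pairs in partition_b that must actually be reconfigured."""
    return len(set(partition_b) - set(partition_a))

p1 = {("add", "mul"), ("mul", "sub"), ("sub", "add")}
p2 = {("add", "mul"), ("mul", "sub"), ("div", "add")}

shared = common_pairs(p1, p2)   # two pairs stay configured
cost = reconfig_cost(p1, p2)    # only one pair is rewritten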
Lecture Notes in Computer Science
2011 IEEE International 3D Systems Integration Conference (3DIC), 2012
Thermal management is one of the main concerns in three-dimensional integration due to the difficulty of dissipating heat through the stack of the integrated circuit. This paper targets peak temperature reduction in a 3D stack comprising a data-path accelerator, a base processor, and memory components. A mapping algorithm has been devised to distribute the operations of data flow graphs evenly over the processing elements of the target accelerator in two steps: thermal-aware partitioning of the input data flow graphs, and thermal-aware mapping of the partitions onto the processing elements. The efficiency of the proposed technique in reducing peak temperature is demonstrated through experiments.
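A simple way to picture the even-distribution goal is a greedy placement that always assigns the next operation to the currently coolest processing element. This is a sketch of the general idea only; the uniform heat-per-operation model and greedy policy are illustrative assumptions, not the paper's thermal model or algorithm.

```python
# Greedy sketch of thermal-aware mapping: each operation of a data flow
# graph partition goes to the currently coolest processing element (PE),
# spreading accumulated heat evenly across the array.
# The uniform heat-per-operation model is an illustrative assumption.

import heapq

def map_operations(ops, num_pes, heat_per_op=1.0):
    """Return {pe_id: [ops]} balancing accumulated heat across PEs."""
    heap = [(0.0, pe) for pe in range(num_pes)]   # (accumulated heat, PE id)
    heapq.heapify(heap)
    mapping = {pe: [] for pe in range(num_pes)}
    for op in ops:
        heat, pe = heapq.heappop(heap)            # coolest PE so far
        mapping[pe].append(op)
        heapq.heappush(heap, (heat + heat_per_op, pe))
    return mapping

mapping = map_operations([f"op{i}" for i in range(8)], num_pes=4)
loads = [len(v) for v in mapping.values()]        # 8 ops over 4 PEs
```

With identical per-operation heat this degenerates to load balancing; the paper's two-step partition-then-map flow additionally accounts for where hot spots form in the 3D stack.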
Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI, 2013
Designing for low power consumption demands power-efficient devices and good design practices that leverage architectural features without compromising performance. Power estimation at an early stage of the electronic design automation (EDA) flow is essential in order to handle design issues much earlier. In this paper, we propose a methodology for evaluating power in three-dimensional field-programmable gate arrays (3D FPGAs) at an early stage of the design cycle, namely at the partitioning step, making it a power-aware stage. As part of the work, we also estimate the routing resources needed for the power evaluation. Our estimated power values are compared against values obtained from a 3D place-and-route tool, TPR, with added power calculations, demonstrating acceptable accuracy. Our results show that a desired distribution of power among the layers can be achieved well before placement, with reasonable deviation in the estimates, and prove that our methodology provides an opportunity for power management at earlier stages of the design flow.