Prabhat Mishra - Academia.edu
Papers by Prabhat Mishra
2011 24th International Conference on VLSI Design, 2011
Post-silicon validation is an essential part of modern integrated circuit design, capturing bugs and design errors that escape the pre-silicon validation phase. A major problem in post-silicon debug is the limited observability of internal signals, since the chip has already been manufactured. Storage requirements limit the number of signals that can be traced; therefore, a major challenge is how to reconstruct the majority of the remaining signals from the traced values. Existing approaches select signals with an emphasis on partial restorability, which does not guarantee good signal restoration. We propose an approach that efficiently selects a set of signals based on total restorability criteria. Our experimental results demonstrate that our signal selection algorithm is computationally more efficient and can restore up to three times more signals than existing methods. This work was partially supported by NSF CAREER award 0746261. Note: the partial restorability of a signal refers to the probability that the signal value can be reconstructed using known values of some other traced signals.
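To make the idea of restoring untraced signals concrete, here is a minimal sketch of forward and backward value inference across a single 2-input AND gate, using `None` for an unknown value. This is a generic illustration, not the selection algorithm from the paper; the function names are hypothetical, and `restoration_ratio` assumes the commonly used definition (traced plus restored states, divided by traced states), which the abstract itself does not spell out.

```python
from typing import Optional, Tuple

Bit = Optional[int]  # 0, 1, or None when the value is unknown


def and_forward(a: Bit, b: Bit) -> Bit:
    """Infer the output of a 2-input AND gate from (possibly unknown) inputs."""
    if a == 0 or b == 0:
        return 0          # a single controlling 0 fixes the output
    if a == 1 and b == 1:
        return 1
    return None           # not enough information to restore the output


def and_backward(out: Bit, a: Bit, b: Bit) -> Tuple[Bit, Bit]:
    """Infer the inputs of a 2-input AND gate from its output and known inputs."""
    if out == 1:
        return 1, 1       # an output of 1 forces both inputs to 1
    if out == 0 and a == 1:
        return a, 0       # the other input must be the 0
    if out == 0 and b == 1:
        return 0, b
    return a, b           # nothing new can be inferred


def restoration_ratio(traced: int, restored: int) -> float:
    """Assumed metric: (traced + restored) states divided by traced states."""
    return (traced + restored) / traced
```

Tracing a signal whose values often act as controlling inputs tends to restore more of its neighbors, which is the intuition behind ranking candidate trace signals by restorability.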
Journal of Low Power Electronics, 2011
System optimization techniques based on efficient dynamic reconfiguration have been widely adopted in recent years. Cache reconfiguration is a promising optimization technique for reducing memory hierarchy energy consumption with little or no impact on overall system performance. While cache reconfiguration is successful in desktop-based and embedded systems, it is not directly applicable in real-time systems due to timing constraints. Existing scheduling-aware cache reconfiguration techniques consider only a one-level cache. It is a major challenge to dynamically tune multi-level caches since the exploration space is prohibitively large. This paper efficiently integrates cache reconfiguration in real-time systems with a unified two-level cache hierarchy. We propose a set of exploration heuristics for our static analysis that effectively reduce the exploration time while keeping the generated profile results useful at runtime. Our experimental results demonstrate 40-58% energy savings with minor impact on performance.
17th International Conference on VLSI Design. Proceedings.
Recent advances in language-based software toolkit generation enable performance-driven exploration of embedded systems by exploiting the application behavior. There is a need for automatic generation of hardware to determine the required silicon area, clock frequency, and power consumption of the candidate architectures. In this paper, we present a language-based exploration framework that automatically generates synthesizable RTL models for pipelined processors. Our framework allows varied microarchitectural modifications, such as the addition of pipeline stages, pipeline paths, opcodes, and new functional units. The generated RTL is synthesized to determine the area, power, and clock frequency of the modified architectures. Our exploration results demonstrate the power of reuse in composing heterogeneous architectures using functional abstraction primitives, reducing the time for specification and exploration by at least an order of magnitude.
Sustainable Computing: Informatics and Systems, 2012
Dynamic voltage scaling (DVS) has been a very effective technique for processor energy reduction. It adjusts the processor voltage and frequency level during runtime. In this article, we propose a general and flexible processor voltage scaling algorithm for real-time multitasking systems. Our approach focuses on exploiting dynamic slack that is created when a task finishes earlier than its estimated worst-case execution time (WCET). Our algorithm is efficient enough to execute at runtime and can be configured flexibly to make tradeoffs between running time and energy savings. By rescheduling tasks effectively, we can achieve almost as much energy savings as if there were no arrival time constraints. Furthermore, our approach can effectively incorporate both leakage power consumption and variable scaling overhead. It is also relatively independent of task characteristics and scheduling policy. Experimental results show that our technique can achieve significant energy savings at runtime over statically generated schedules and up to 12% more savings compared to state-of-the-art techniques.
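The basic arithmetic behind dynamic slack reclamation can be sketched as follows. This is a simplified textbook model, not the algorithm from the article: it assumes execution time scales as 1/f, that voltage scales linearly with frequency, and it ignores the leakage power and scaling overhead that the article does account for. The function names and parameters are hypothetical.

```python
def scaled_frequency(wcet_at_fmax: float, slack: float,
                     f_max: float, f_min: float) -> float:
    """Lowest frequency that still fits a task into its WCET budget plus slack.

    wcet_at_fmax: worst-case execution time when running at f_max
    slack: time left over by an earlier task that finished before its WCET
    """
    budget = wcet_at_fmax + slack
    f = f_max * wcet_at_fmax / budget   # assumes execution time scales as 1/f
    return max(f_min, min(f, f_max))    # clamp to the supported range


def dynamic_energy_ratio(f: float, f_max: float) -> float:
    """Dynamic energy relative to running at f_max.

    Assumes voltage scales linearly with frequency; energy per cycle is
    proportional to V^2, so the ratio is (f / f_max)^2.
    """
    return (f / f_max) ** 2
```

Under these assumptions, a task with a 10 ms WCET that inherits 10 ms of slack can run at half the maximum frequency, giving `dynamic_energy_ratio(0.5, 1.0) == 0.25`, that is, a quarter of the dynamic energy.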
International Journal of Parallel Programming, 2009
Proceedings of the Great Lakes Symposium on VLSI, 2012
Code encryption is a promising approach that encrypts the application binary to protect it from reverse engineering and tampering, and decrypts the instructions during runtime. A major challenge is to trade off the security level against the runtime decryption overhead. In this paper, we explore a synergistic combination of various code compression algorithms with code encryption techniques to reduce this overhead. Since decryption overhead (time) is linearly dependent on code size, it is promising to employ compression to reduce code size, and thereby achieve the advantages of both compression and encryption. Experimental results demonstrate that our proposed scheme can employ efficient encryption techniques while significantly improving performance by up to 2.3X (1.5X on average) and reducing energy consumption by up to 57% (26% on average), compared to using encryption alone.
Communications in Computer and Information Science, 2011
System-Level Validation, 2012
System-level specifications are widely used to capture a wide spectrum of SoC designs. To enable early-stage exploration, system-level specifications should have both formal (unambiguous) semantics and easy correlation with the architecture manual. However, most system-level specifications are still written in an informal manner. Since informal specifications are not amenable to automated analysis, there are possibilities of ambiguity, incompleteness, and contradiction, which can lead to different interpretations of ...
2007 Design, Automation & Test in Europe Conference & Exhibition, 2007
Memory plays a crucial role in designing embedded systems. A larger memory can accommodate more and larger applications but increases cost, area, and energy requirements. Code compression techniques address this problem by reducing the size of the applications. While early work on bitmask-based compression has proposed several promising ideas, many challenges remain in applying them to embedded system design. This paper makes two important contributions to address these challenges by developing application-specific bitmask selection and bitmask-aware dictionary selection techniques. We applied these techniques to code compression of TI and MediaBench applications to demonstrate the usefulness of our approach.
2006 IEEE/ACM International Conference on Computer Aided Design, 2006
Embedded systems are constrained by the available memory. Code compression techniques address this issue by reducing the code size of application programs. Dictionary-based code compression techniques are popular because they offer both a good compression ratio and a fast decompression scheme. Recently proposed techniques [8, 9] improve standard dictionary-based compression by considering mismatches. This paper makes two important contributions: i) it provides a cost-benefit analysis framework for improving the compression ratio by creating more matching patterns, and ii) it develops an efficient code compression technique using bitmasks to improve the compression ratio without introducing any decompression penalty. To demonstrate the usefulness of our approach, we have used applications from various domains, compiled for a wide variety of architectures. Our approach outperforms the existing dictionary-based techniques by an average of 15%, giving a compression ratio of 55%-65%.
[Figure 1: Traditional Code Compression Methodology]
The first code compression technique for embedded processors was proposed by Wolfe and Chanin [1]. The idea of using a dictionary to store the frequently occurring instruction sequences has been explored
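The bitmask idea can be illustrated with a minimal sketch: a word that differs from a dictionary entry only inside one small aligned window is encoded as the entry index plus the position and value of that window. This is a simplified illustration with a single fixed-width aligned mask, not the paper's encoding or its cost-benefit analysis; the function names and the 32-bit word size are assumptions.

```python
from typing import List, Optional, Tuple

WORD_BITS = 32  # assumed instruction width


def bitmask_match(word: int, dictionary: List[int], mask_bits: int = 4
                  ) -> Optional[Tuple[int, int, int]]:
    """Find a dictionary entry that equals `word` up to one aligned bitmask.

    Returns (entry_index, mask_position, mask_value); an exact match yields
    mask_value == 0. Returns None if the word must be stored uncompressed.
    """
    full = (1 << mask_bits) - 1
    for index, entry in enumerate(dictionary):
        diff = word ^ entry
        if diff == 0:
            return index, 0, 0
        for pos in range(WORD_BITS // mask_bits):
            shift = pos * mask_bits
            # All differing bits must fall inside this one aligned window.
            if (diff & ~(full << shift)) == 0:
                return index, pos, diff >> shift
    return None


def bitmask_decode(dictionary: List[int], index: int, pos: int, mask: int,
                   mask_bits: int = 4) -> int:
    """Rebuild the original word with one dictionary lookup and one XOR."""
    return dictionary[index] ^ (mask << (pos * mask_bits))
```

For example, with `dictionary = [0x12345678, 0xFFFF0000]`, the word `0x12345A78` differs from entry 0 only in bits 8 to 11, so `bitmask_match` returns `(0, 2, 0xC)` and `bitmask_decode` restores the original word. Decoding needs only a lookup and an XOR, which fits the abstract's claim of no added decompression penalty.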
System-Level Validation, 2012
Existing directed test generation approaches focus on knowledge forwarding between different bounds to reduce the test generation time. This chapter describes a test generation technique for multicore architectures that exploits the structural similarity within the same bound as well as between different bounds. It enables the reuse of the knowledge learned from one core to the remaining cores in multicore architectures. The experimental results demonstrate that this approach can significantly reduce overall test generation time ...
2012 IEEE International High Level Design Validation and Test Workshop (HLDVT), 2012
Limited signal observability is a major concern during post-silicon validation. On-chip trace buffers store a small number of signal states every cycle. Existing signal selection techniques are designed to select a set of signals based on the trace buffer width. In a real-life scenario, it is reasonable to assume that a designer has already determined some important signals that must be traced. In this paper, we study the constrained signal selection problem, where a set of trace signals is already provided by the designer and the remaining signals have to be determined to improve overall restoration performance. Our experimental results using ISCAS'89 benchmarks demonstrate that up to 5% improvement in restoration performance can be obtained compared to existing approaches.
2007 IEEE International High Level Design Validation and Test Workshop, 2007
SystemC Transaction Level Modeling (TLM) is widely used to reduce the overall design and validation effort of complex System-on-Chip (SOC) architectures. Due to the lack of efficient techniques, the amount of reuse between abstraction levels is limited in many scenarios, such as the reuse of TLM-level tests for RTL validation. This paper presents a top-down methodology for generating RTL tests from SystemC TLM specifications. It makes two important contributions: automatic test generation from a TLM specification using a transition-based coverage metric, and automatic translation of TLM tests into RTL tests using a set of transformation rules. Our initial results using a router design demonstrate the usefulness of our approach by capturing various functional errors as well as inconsistencies in the implementation.
Proceedings of the 18th ACM Great Lakes symposium on VLSI, 2008
2009 IEEE International High Level Design Validation and Test Workshop, 2009
Welcome to the 2009 IEEE International High Level Design Validation and Test Workshop, the 14th in a series of events that explores emerging trends, innovative research and scalable solutions in the areas of validation and test for electronic systems. The two-day technical program includes exciting sessions on topics such as design validation approaches at RTL and at system-level, high-level
Design, Automation, and Test in Europe, 2006
Functional validation is a major bottleneck in pipelined processor design. Simulation using functional test vectors is the most widely used form of processor validation. While existing model checking based approaches have proposed several promising ideas for efficient test generation, many challenges remain in applying them to realistic pipelined processors. The time and resources required for test generation using
IEEE Design & Test of Computers, 2004
IEEE Design & Test of Computers, 2011
Arxiv preprint arXiv:1109.6840, 2011
This article describes a comprehensive system for surveillance and monitoring applications. The development of an efficient real-time video motion detection system is motivated by its potential for deployment in areas where security is the main concern. The paper presents a platform for real-time video motion detection and subsequent generation of an alarm condition as set by the parameters of the control system. The prototype consists of a mobile platform mounted with an RF camera, which provides continuous feedback of the environment. The received visual information is then analyzed by the user to decide on the appropriate control action, thus enabling the user to operate the system from a remote location. The system is also equipped with the ability to process the image of an object and generate control signals, which are automatically transmitted to the mobile platform to track the object.
Design Automation for Embedded Systems, 2003
As embedded systems continue to face increasingly higher performance requirements, deeply pipelined processor architectures are being employed to meet desired system performance. A significant bottleneck in the validation of such systems is the lack of a golden reference model. Thus, many existing techniques employ a bottom-up approach to architecture validation, where the functionality of an existing pipelined architecture is, in essence, reverse-engineered from its implementation. Our validation technique is complementary to these bottom-up approaches. Our approach leverages the system architect's knowledge about the behavior of the pipelined architecture, through Architecture Description Language (ADL) constructs, and thus allows a powerful top-down approach to architecture validation. The most important requirement in the top-down validation process is to ensure that the specification (reference model) is golden. Earlier, we developed validation techniques to ensure that the static behavior of the pipeline is well-formed by analyzing the structural aspects of the specification using a graph-based model. In this paper, we verify the dynamic behavior by analyzing the instruction flow in the pipeline using a Finite State Machine (FSM) based model to validate several important architectural properties, such as determinism and in-order execution in the presence of hazards and multiple exceptions. We applied this methodology to the specification of a representative pipelined processor to demonstrate the usefulness of our approach.