Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review
Abstract
Technological evolution has significantly increased the number of transistors that fit in a given die area and raised switching speeds from a few MHz to the GHz range. This simultaneous decline in feature size and boost in performance demands a lower supply voltage and effective management of power dissipation in chips with millions of transistors. This has triggered a substantial amount of research into power reduction techniques for almost every aspect of the chip, particularly the processor cores it contains. This paper presents an overview of techniques for achieving power efficiency, mainly at the processor core level, but also visits related domains such as buses and memories. Various processor parameters and features, such as supply voltage, clock frequency, cache and pipelining, can be optimized to reduce the power consumption of the processor, and this paper discusses various ways in which they can be optimized. Emerging power-efficient processor architectures are also overviewed, and research activities are discussed that should help the reader identify how these factors contribute to a processor's power consumption. Some of these concepts are already established, whereas others are still active research areas.
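To make the role of supply voltage and clock frequency concrete, the following is a minimal sketch based on the standard first-order CMOS switching-power relation, P = α · C · V² · f. The abstract does not state this formula; it is brought in here as a well-known approximation that ignores leakage and short-circuit power. The function name and all numeric values (activity factor, switched capacitance, voltages, frequencies) are illustrative assumptions, not figures from the paper.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order CMOS switching power in watts: alpha * C * Vdd^2 * f.

    This approximation ignores leakage and short-circuit power.
    """
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical operating points, chosen only for illustration.
baseline = dynamic_power(activity=0.2, capacitance_f=1e-9, vdd_v=1.2, freq_hz=1.0e9)
scaled = dynamic_power(activity=0.2, capacitance_f=1e-9, vdd_v=0.9, freq_hz=0.75e9)

# Voltage enters quadratically, frequency linearly.
ratio = scaled / baseline
print(f"baseline: {baseline:.3f} W, scaled: {scaled:.3f} W, ratio: {ratio:.3f}")
```

Under these assumed numbers, lowering both voltage and frequency by 25% reduces switching power to roughly 42% of the baseline, because the voltage term is squared. The frequency reduction also lengthens execution time, so the energy saved per task is smaller than the power saving suggests.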