Hooman Jarollahi - Profile on Academia.edu

Papers by Hooman Jarollahi

Associative memories retrieve stored information given partial or erroneous input patterns. Recently, a new family of associative memories based on Clustered-Neural-Networks (CNNs) was introduced that can store many more messages than classical Hopfield Neural Networks (HNNs). In this paper, we propose hardware architectures of such memories for partial or erroneous inputs. The proposed architectures eliminate winner-take-all modules and thus reduce the hardware complexity, consuming 65% fewer FPGA lookup tables and increasing the operating frequency by approximately 1.9 times compared to previous work.

Hardware Implementation of Associative Memories Based on Multiple-Valued Sparse Clustered Networks

IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2016

The author has further granted permission to Simon Fraser University to keep or make a digital copy for use in its circulating collection (currently available to the public at the "Institutional Repository" link of the SFU Library website <www.lib.sfu.ca> at: <http://ir.lib.sfu.ca/handle/1892/112>) and, without changing the content, to translate the thesis/project or extended essays, if technically possible, to any medium or format for the purpose of preservation of the digital work. The author has further agreed that permission for multiple copying of this work for scholarly purposes may be granted by either the author or the Dean of Graduate Studies. It is understood that copying or publication of this work for financial gain shall not be allowed without the author's written permission. Permission for public performance, or limited permission for private scholarly use, of any multimedia materials forming part of this work, may have been granted by the author. This information may be found on the separately catalogued multimedia material and in the signed Partial Copyright Licence. While licensing SFU to permit the above uses, the author retains copyright in the thesis, project or extended essays, including the right to change the work for subsequent purposes, including editing and publishing the work in whole or in part, and licensing other parties, as the author may desire.

This paper presents a novel five-transistor (5T) CMOS SRAM design with high performance and reliability in 65 nm CMOS, and illustrates how it reduces the dynamic power consumption in comparison with the conventional and low-power 6T SRAM counterparts. This design can be used as cache memory in processors and low-power portable devices. The proposed SRAM cell achieves a ~13% area reduction compared to a conventional 6T cell, and uses a unique bit-line and negative supply voltage biasing methodology together with a ground control architecture to enhance performance and suppress standby leakage power.

Algorithm and implementation of an associative memory for oriented edge detection using improved clustered neural networks

2015 IEEE International Symposium on Circuits and Systems (ISCAS), 2015

SRAM cell with common bit line and source line standby voltage

Department of Electrical and Computer Engineering, McGill University, Montreal, Quebec H3A 0E9, Canada

... Christian B. Peel This dissertation has been read by each member of the following graduate committee and by majority vote has been found to be satisfactory. Date A. Lee Swindlehurst, Chair ... Date A. Lee Swindlehurst Chair, Graduate Committee Accepted for the Department ...

IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2014

This paper presents the algorithm, architecture, and fabrication results of a non-volatile context-driven search engine that reduces energy consumption as well as computational delay compared to classical hardware- and software-based approaches. The proposed architecture stores only associations between items from multiple search fields in the form of binary links, and merges repeated field items to reduce the memory requirements and accesses. The fabricated chip achieves a 13.6× memory reduction and an 89% energy saving compared to a classical field-based approach in hardware, based on content-addressable memory (CAM). Furthermore, it requires 8.6× fewer clock cycles to perform search operations than the CAM, and five orders of magnitude fewer clock cycles than a fabricated and measured ultra-low-power CPU-based counterpart running a classical search algorithm in software. The energy consumption of the proposed architecture is on average three orders of magnitude smaller than that of a software-based approach. A magnetic tunnel junction (MTJ)-based logic-in-memory architecture is presented that allows simple routing and eliminates leakage current in standby using 90 nm CMOS/MTJ-hybrid technologies.

2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors, 2013

A low-power Content-Addressable-Memory (CAM) is introduced employing a new mechanism for associativity between the input tags and the corresponding address of the output data. The proposed architecture is based on a recently developed clustered-sparse-network using binary-weighted connections that, on average, eliminates most of the parallel comparisons performed during a search. Therefore, the dynamic energy consumption of the proposed design is significantly lower than that of a conventional low-power CAM design. Given an input tag, the proposed architecture computes a few possibilities for the location of the matched tag and performs the comparisons on them to locate a single valid match. A 0.13 µm CMOS technology was used for simulation purposes. The energy consumption and the search delay of the proposed design are 9.5% and 30.4% of those of the conventional NAND architecture, respectively, with a 3.4% higher number of transistors.

2014 IEEE Workshop on Signal Processing Systems (SiPS), 2014

In this paper, a context-driven search engine is presented based on a new family of associative memories. It stores only the associations between items from multiple search fields in the form of binary links, and merges repeated field items to reduce the memory requirements. It achieves a 13.6× reduction in memory bits and accesses, and requires 8.6× fewer clock cycles per search operation than a classical field-based search structure using content-addressable memory. Furthermore, using parallel computational nodes in the proposed search engine, it requires five orders of magnitude fewer clock cycles than a CPU-based counterpart running a classical search algorithm in software.
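The storage idea in this abstract (keep each distinct field item once, and record only binary links between items of different fields) can be illustrated with a minimal software sketch. This is not the paper's architecture: the class name `LinkSearchEngine`, its methods, and the sample records are all hypothetical, and Python sets stand in for the hardware link memory.

```python
from itertools import combinations


class LinkSearchEngine:
    """Toy model: unique items per field plus binary links between fields."""

    def __init__(self, fields):
        self.fields = tuple(fields)
        # One set per field, so repeated field items are stored only once.
        self.items = {f: set() for f in self.fields}
        # A link is a pair of (field, value) nodes; presence means "associated".
        self.links = set()

    def add(self, record):
        nodes = [(f, record[f]) for f in self.fields]
        for f, v in nodes:
            self.items[f].add(v)
        self.links.update(combinations(nodes, 2))

    def unique_item_count(self):
        return sum(len(values) for values in self.items.values())

    def _linked(self, a, b):
        return (a, b) in self.links or (b, a) in self.links

    def search(self, query, target):
        """Return the values of `target` linked to every item in `query`."""
        known = list(query.items())
        return {
            v
            for v in self.items[target]
            if all(self._linked((target, v), k) for k in known)
        }


engine = LinkSearchEngine(("first", "last", "city"))
engine.add({"first": "Alice", "last": "Smith", "city": "Paris"})
engine.add({"first": "Bob", "last": "Smith", "city": "Rome"})
engine.add({"first": "Carol", "last": "Jones", "city": "Rome"})
print(engine.search({"first": "Alice", "last": "Smith"}, "city"))
```

In this toy data set the three records hold nine field entries but only seven distinct items. Because only pairwise links are stored, a query can in general return values that were never stored together with all query items; the sketch does not attempt to resolve such ambiguities.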

2014 IEEE 44th International Symposium on Multiple-Valued Logic, 2014

Associative memories are structures that store data patterns and retrieve them given partial inputs. Sparse Clustered Networks (SCNs) are recently introduced binary-weighted associative memories that significantly improve the storage and retrieval capabilities over the prior state of the art. However, deleting or updating the data patterns results in a significant increase in the data retrieval error probability. In this paper, we propose an algorithm to address this problem by incorporating multiple-valued weights for the interconnections used in the network. The proposed algorithm lowers the error rate by an order of magnitude for our sample network with 60% deleted contents. We then investigate the advantages of the proposed algorithm for hardware implementations.
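Why deletion hurts a binary-weighted network, and how multiple-valued weights can help, can be seen in a small sketch. It assumes a simple counting scheme in which each link stores how many messages use it, saturating at `max_weight`; this is an illustrative assumption rather than the algorithm proposed in the paper, and the class `CountingSCN` with its one-pass retrieval is hypothetical.

```python
from collections import Counter
from itertools import combinations


class CountingSCN:
    """Toy clustered network: one neuron per (cluster, symbol) pair."""

    def __init__(self, clusters, symbols, max_weight=1):
        self.clusters = clusters
        self.symbols = symbols
        self.max_weight = max_weight  # 1 models binary weights
        self.weights = Counter()      # link -> number of messages using it

    @staticmethod
    def _links(message):
        # A stored message forms a clique over one neuron per cluster.
        return combinations(list(enumerate(message)), 2)

    def store(self, message):
        for link in self._links(message):
            self.weights[link] = min(self.weights[link] + 1, self.max_weight)

    def delete(self, message):
        for link in self._links(message):
            if self.weights[link] > 0:
                self.weights[link] -= 1

    def _connected(self, a, b):
        if a > b:  # keys are ordered by cluster index
            a, b = b, a
        return self.weights[(a, b)] > 0

    def retrieve(self, partial):
        """One decoding pass; `None` marks an erased cluster.

        Returns one candidate set of symbols per cluster.
        """
        known = [(c, s) for c, s in enumerate(partial) if s is not None]
        result = []
        for c, s in enumerate(partial):
            if s is not None:
                result.append({s})
            else:
                result.append({
                    x for x in range(self.symbols)
                    if all(self._connected((c, x), k) for k in known)
                })
        return result


def after_deleting_overlap(max_weight):
    net = CountingSCN(clusters=3, symbols=4, max_weight=max_weight)
    net.store((0, 1, 2))
    net.store((0, 1, 3))   # shares the link between clusters 0 and 1
    net.delete((0, 1, 3))
    return net.retrieve([0, None, 2])


print(after_deleting_overlap(1), after_deleting_overlap(3))
```

With binary weights, deleting the second message also clears the link it shares with the first, so the erased cluster has no candidate left. With counting weights, the shared link survives the deletion and the first message is still recovered. Saturation at `max_weight` can still lose information once more messages share a link than the weight can count.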

Journal of Signal Processing Systems, 2014

Associative memories retrieve stored information given partial or erroneous input patterns. A new family of associative memories based on Sparse Clustered Networks (SCNs) has been recently introduced that can store many more messages than classical Hopfield Neural Networks (HNNs). In this paper, we propose fully-parallel hardware architectures of such memories for partial or erroneous inputs. The proposed architectures eliminate winner-take-all modules and thus reduce the hardware complexity, consuming 65% fewer FPGA lookup tables and increasing the operating frequency by approximately 1.9 times compared to previous work. Furthermore, the scaling behaviour of the implemented architectures for various design choices is investigated. We explore the effect of varying design

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2000

We propose a low-power Content-Addressable Memory (CAM) employing a new algorithm for associativity between the input tag and the corresponding address of the output data. The proposed architecture is based on a recently developed sparse clustered-network using binary connections that on average eliminates most of the parallel comparisons performed during a search. Therefore, the dynamic energy consumption of the proposed design is significantly lower than that of a conventional low-power CAM design. Given an input tag, the proposed architecture computes a few possibilities for the location of the matched tag and performs the comparisons on them to locate a single valid match. TSMC 65 nm CMOS technology was used for simulation purposes. Following a selection of design parameters such as the number of CAM entries, the energy consumption and the search delay of the proposed design are 8% and 26% of those of the conventional NAND architecture, respectively, with a 10% area overhead. A design methodology based on the silicon-area and power budgets, and performance requirements is discussed.
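The two-step search described here (first compute a few candidate locations, then compare only those entries) can be sketched in software as follows. The sketch substitutes a plain dictionary keyed on the low-order tag bits for the sparse clustered network, so it only illustrates the candidate-narrowing principle; `PrefilteredCAM`, `block_size`, and `key_bits` are hypothetical names, not parameters from the paper.

```python
class PrefilteredCAM:
    """Toy CAM that compares the tag only inside candidate sub-blocks."""

    def __init__(self, num_entries, block_size, key_bits):
        self.tags = [None] * num_entries
        self.block_size = block_size
        self.key_mask = (1 << key_bits) - 1
        # Reduced tag -> set of sub-block indices that may hold a match.
        # (Stand-in for the clustered network; stale entries after an
        # overwrite only cost extra comparisons, never a wrong result.)
        self.candidates = {}
        self.comparisons = 0  # comparisons made by the last search

    def write(self, address, tag):
        self.tags[address] = tag
        block = address // self.block_size
        self.candidates.setdefault(tag & self.key_mask, set()).add(block)

    def search(self, tag):
        """Return the address of the matching entry, or None."""
        self.comparisons = 0
        for block in sorted(self.candidates.get(tag & self.key_mask, ())):
            start = block * self.block_size
            stop = min(start + self.block_size, len(self.tags))
            for address in range(start, stop):
                self.comparisons += 1
                if self.tags[address] == tag:
                    return address
        return None


cam = PrefilteredCAM(num_entries=16, block_size=4, key_bits=3)
for address in range(16):
    cam.write(address, address * 37 + 5)  # arbitrary distinct tags
print(cam.search(9 * 37 + 5), cam.comparisons)
```

A conventional CAM would compare the input tag against all 16 entries in parallel; here only the entries in the candidate sub-blocks are compared, which is the source of the dynamic-energy saving the abstract refers to. The sketch searches sequentially for clarity, whereas the hardware compares the selected entries in parallel.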

This paper presents a novel five-transistor (5T) CMOS SRAM design with high performance and reliability in 65 nm CMOS, and illustrates how it reduces the dynamic power consumption in comparison with the conventional and low-power 6T SRAM counterparts. This design can be used as cache memory in processors and low-power portable devices. The proposed SRAM cell achieves a ~13% area reduction compared to a conventional 6T cell, and uses a unique bit-line and negative supply voltage biasing methodology together with a ground control architecture to enhance performance and suppress standby leakage power.

Associative memories are structures that can retrieve previously stored information given a partial input pattern instead of an explicit address as in indexed memories. A few hardware approaches have recently been introduced for a new family of associative memories based on Sparse-Clustered Networks (SCN) that show attractive features. These architectures are suitable for implementations with low retrieval latency, but are limited to small networks that store a few hundred data entries. In this paper, a new hardware architecture of SCNs is proposed that features a new data-storage technique as well as a method we refer to as Selective Decoding (SD-SCN). The SD-SCN has been implemented on an FPGA similar to the one used in the previous efforts and achieves two orders of magnitude higher capacity, with no error-performance penalty but at the cost of a few extra clock cycles per data access.