Paris Kitsos - Academia.edu

Papers by Paris Kitsos

Compact FPGA architectures for the two-band fast discrete Hartley transform

Microprocess. Microsystems, 2018

The discrete Hartley transform is a real-valued transform, similar to the complex Fourier transform, that finds numerous applications in a variety of fields including pattern recognition and signal and image processing. In this paper, we propose and study two compact and versatile hardware architectures for the computation of the 8-point, 16-point and 32-point Two-Band Fast Discrete Hartley Transform. These highly modular architectures have a symmetric and regular structure consisting of two blocks, a multiplication block and an addition/subtraction block. The first architecture utilizes 8 multipliers and 16 adders/subtractors, achieving a maximum clock frequency of 95 MHz. The second architecture utilizes only 4 multipliers and 8 adders/subtractors, achieving a maximum clock frequency of 100 MHz; however, it requires additional multiplexers and more clock cycles (from 1 to 58 clock cycles, depending on the number of points) for the computation. As a result, the proposed hardware architectures cons...
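For readers unfamiliar with the transform, a minimal software sketch of the direct O(N²) discrete Hartley transform follows. It is not the two-band fast algorithm or either hardware architecture from the paper; the function name `dht` and the pure-Python formulation are illustrative assumptions. It uses the standard definition H[k] = Σ x[n]·cas(2πnk/N), with cas(t) = cos(t) + sin(t).

```python
import math


def dht(x):
    """Direct O(N^2) discrete Hartley transform of a real sequence.

    Illustrative reference only; not the two-band fast algorithm.
    """
    n_points = len(x)
    out = []
    for k in range(n_points):
        acc = 0.0
        for n, sample in enumerate(x):
            angle = 2.0 * math.pi * n * k / n_points
            # cas(angle) = cos(angle) + sin(angle) is the Hartley kernel
            acc += sample * (math.cos(angle) + math.sin(angle))
        out.append(acc)
    return out


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]  # hypothetical 8-point input
    spectrum = dht(signal)
    # Applying the transform twice returns the input scaled by N,
    # so dividing by N recovers the original samples.
    recovered = [value / len(signal) for value in dht(spectrum)]
    print(recovered)
```

Because the kernel is real, the whole computation stays in real arithmetic, which is the property that makes the transform attractive for multiplier-and-adder datapaths.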

Algorithm 3 (GEA3)

(GEA3) for data encryption. In this paper, alternative hardware implementations of the GEA3 algorithm are described. The GEA3 algorithm is based on the KASUMI block cipher. Various KASUMI block cipher hardware implementations have been examined in order to provide information about the required silicon area and throughput. In order to achieve a significant performance improvement, a Double Edge Triggered pipeline technique is used. The S-BOXes, which are fundamental elements of the KASUMI cipher, have been implemented using combinational logic and ROM memories. The proposed GEA3 hardware implementation achieves a throughput of up to 837 Mbps, which is much faster than previous designs. The whole system is implemented and evaluated using Field Programmable Gate Array (FPGA) devices. Keywords: GPRS security; GEA3; KASUMI; stream cipher; block cipher; S-BOX; double edge triggered (DET) pipeline.

A Hardware Implementation of CURUPIRA Block Cipher for Wireless Sensors

2008 11th EUROMICRO Conference on Digital System Design Architectures, Methods and Tools, 2008

An architecture and VLSI implementation of a new block cipher called Curupira is presented in this paper. This cipher is suitable for wireless sensors and RFID applications. Our 0.13 μm implementation requires 9450 gate equivalents and is capable of encrypting a plaintext in 10 clock cycles. The cipher achieves a maximum throughput of 2361 Mbps at 246 MHz for encryption/decryption. When clocked at 100 kHz, a throughput of up to 960 kbps is achieved and an average power of 0.04 mW is drawn.
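The throughput figures quoted above follow from the usual relation throughput = block size × clock frequency / cycles per block. The sketch below checks them under the assumption that Curupira has a 96-bit block, a value that is not stated in the abstract; the helper name `throughput_bps` is hypothetical.

```python
def throughput_bps(block_bits, clock_hz, cycles_per_block):
    """Throughput in bits per second for an iterative (non-pipelined) cipher core."""
    return block_bits * clock_hz / cycles_per_block


# Assumption: block size taken from the cipher specification, not from the abstract.
CURUPIRA_BLOCK_BITS = 96

if __name__ == "__main__":
    fast = throughput_bps(CURUPIRA_BLOCK_BITS, 246e6, 10)
    slow = throughput_bps(CURUPIRA_BLOCK_BITS, 100e3, 10)
    print(f"{fast / 1e6:.1f} Mbps at 246 MHz")
    print(f"{slow / 1e3:.1f} kbps at 100 kHz")
```

With these inputs the relation gives 2361.6 Mbps and 960 kbps, in line with the reported 2361 Mbps and 960 kbps.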

System Design and FPGA Implementation for Cognitive Radio Wireless Devices

Lecture Notes in Electrical Engineering, 2012

The system design of devices that support cognitive radio for wireless communications is an area that has attracted a lot of attention in recent years. One of the main goals of designers of such devices is to develop systems that minimize interference among users. Spectrum, frequencies, user behavior, radio architecture and network state are some of the important parameters that cognitive radio designers need to take into consideration when defining a model for a communication system based on cognitive radio. ...

Low Power FPGA Implementations of 256-bit Luffa Hash Function

2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, 2010

Low power techniques in an FPGA implementation of the hash function called Luffa are presented in this paper. This hash function is under consideration for adoption as a standard. Two major gate-level techniques are introduced in order to reduce the power consumption, namely the pipeline technique (with some variants) and the use of embedded RAM blocks instead of general-purpose logic elements. The proposed techniques reduce power consumption by a factor of 1.2 to 8.7 compared with an implementation that uses no low-power technique.

Security and Cryptographic Engineering in Embedded Systems

Applications, Optimization, and Advanced Design

INTRODUCTION: The immense growth of portable and mobile systems (smart phones, tablets, netbooks) and the increasing integration of computational logic into all kinds of devices through initiatives like the future Internet and the Internet of Things have led embedded system technology to flourish, taking the creativity of the IT market to new levels. Wireless, mobile and portable devices are gradually replacing many traditional computer systems due to the increasing user need for mobility in high-end technology

FPGA-based Design Approaches of Keccak Hash Function

2012 15th Euromicro Conference on Digital System Design, 2012

The Keccak hash function was submitted to the SHA-3 competition and is one of the five finalist candidates. In this paper, FPGA implementations of the Keccak function are presented. The designs were coded in an HDL and, for the hardware implementation, a Xilinx Virtex-5 FPGA was used. Some of the proposed implementations use DSP48E blocks in order to accelerate execution of the designs. Comparisons between the proposed designs in terms of time performance and FPGA resources are therefore given in order to examine the efficiency of using DSP48E blocks. Comparisons with previously published works are also provided.

Hardware implementation of Bluetooth security

IEEE Pervasive Computing, 2003

On the Hardware Implementation of the MUGI Pseudorandom Number Generator

Fifth International Symposium on Communications Systems, Networks and Digital Signal Processing (CSNDSP’2006), Patras, Greece, Jul 1, 2006

MUGI pseudorandom number generator is presented in this paper. The MUGI generator is part of the ISO/IEC 18033-4:2005 standard and is expected to be used in many applications. The design has been coded in VHDL and FPGA devices have been used for its hardware implementation. A maximum throughput of 7 Gbps is achieved at a clock frequency of 110 MHz. As no other MUGI implementations exist, comparisons with previous keystream generator implementations, such as RC4, E0 and A5/1, are given. These ...

Best Practices for the Deployment of Edge Inference: The Conclusions to Start Designing

Electronics, 2021

The number of Artificial Intelligence (AI) and Machine Learning (ML) designs is rapidly increasing, and questions arise about how to start an AI design for edge systems, which steps to follow, and which pieces are critical for optimal performance. The complete development flow has two distinct phases: training and inference. During training, all the weights are calculated through optimization and back propagation of the network. The training phase is executed using 32-bit floating point arithmetic, as this is the convenient format for GPU platforms. The inference phase, on the other hand, uses a trained network with new data. The sensitive optimization and back propagation phases are removed and only forward propagation is used. A much lower bit-width and fixed point arithmetic are used, aiming for a good result with reduced footprint and power consumption. This study follows a survey-based process and aims to provide answers such as to...
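To make the move from 32-bit floating point to low bit-width fixed point concrete, here is a minimal sketch of symmetric 8-bit quantization of a list of weights. The scheme, the function names and the sample weights are illustrative assumptions and are not taken from the study.

```python
def quantize_symmetric(weights, bits=8):
    """Map float weights to signed integers using one shared scale factor."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed range.
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.42, -1.27, 0.05, 0.9]  # hypothetical trained weights
    q, scale = quantize_symmetric(weights)
    print(q, scale)
    print(dequantize(q, scale))
```

Each weight is reproduced to within half a quantization step, which is the accuracy cost traded for the smaller footprint of integer arithmetic.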

Importing Custom DNN Models on FPGAs

2021 10th Mediterranean Conference on Embedded Computing (MECO), 2021

With the recent advances in, and the proliferation of, low-power and resource-constrained embedded hardware deployed in the field, the edge computing paradigm has become more prominent. Nowadays, it is possible to apply Machine Learning (ML) algorithms and Deep Learning (DL) architectures, like Deep Neural Networks (DNNs), by exploiting these edge devices. There is also a plethora of software frameworks that transform the typically complex computations and facilitate the implementation process; for those targeting field-programmable gate arrays (FPGAs) specifically, quite a few challenges need to be addressed. Therefore, there is active research towards providing efficient techniques for applying DNN models to these devices. In this paper, we propose a methodology which combines two research tools for the generation of a register-transfer level (RTL) design using a hardware description language (HDL). We evaluate different implementations experimenting wi...

Configurable Hardware Implementations of Bulk Encryption Units for Wireless Communications

Int. Arab J. Inf. Technol., 2004

Hardware implementations of bulk encryption units for wireless communications are presented in this paper. These units are based on the Triple DES (TDES) block cipher. The hardware modules can be configured to implement either the TDES or the DES block cipher. Three different hardware implementations of TDES are proposed. The first two implementations are based on the pipeline design technique, while the third implementation uses the traditional feedback logic design technique (looping). In addition, the DES block cipher's S-BOXes have been implemented with Look Up Tables (LUTs) and/or ROM blocks. Compared with LUTs, the ROM block approach provides higher performance, but the LUT approach is used in cases where ROM blocks are not available. For high-speed applications, the loop unrolling architecture is selected. The proposed implementation of this architecture achieves 7.36 Gbps data throughput whilst the 16-stage pip...

Program Committee DSD 2014

A Hybrid FPGA Trojan Detection Technique Based-on Combinatorial Testing and On-chip Sensing

A hybrid Hardware Trojan detection technique is proposed in this paper that combines Combinatorial Testing, in order to consistently trigger the Hardware Trojan if one is present, with a grid of compact on-chip sensors, in order to detect alterations in the circuit of the FPGA. Each sensor mainly consists of a three-stage Ring Oscillator and a compact Residue Number System ring counter and requires just two FPGA slices, leading to a total overhead of less than 2% in hardware resources. The proposed technique was tested on a cryptographic module performing the AES cipher. To emulate the effects of a Hardware Trojan, we used a 64-bit Linear Feedback Shift Register. The experimental results prove that the proposed hybrid technique can detect the presence of a Hardware Trojan.
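As background for the Trojan emulation, a linear feedback shift register can be modelled in a few lines. The sketch below is a generic Fibonacci LFSR; the abstract does not give the feedback taps of the 64-bit register, so the tap positions are a parameter, and the 4-bit configuration in the demonstration (taps at bit positions 4 and 3) is an assumption chosen only because its 15-state cycle is easy to check by hand.

```python
def lfsr_step(state, width, taps):
    """Advance a Fibonacci LFSR by one clock.

    taps are 1-based bit positions (1 = least significant bit) whose values
    are XORed together to form the bit shifted in at position 1.
    """
    feedback = 0
    for tap in taps:
        feedback ^= (state >> (tap - 1)) & 1
    mask = (1 << width) - 1
    return ((state << 1) | feedback) & mask


def lfsr_period(seed, width, taps):
    """Count clocks until the register returns to its seed.

    Terminates when the highest bit position is among the taps, because the
    update is then invertible and every state lies on a cycle.
    """
    state = lfsr_step(seed, width, taps)
    steps = 1
    while state != seed:
        state = lfsr_step(state, width, taps)
        steps += 1
    return steps


if __name__ == "__main__":
    # Hypothetical 4-bit demonstration; visits all 15 non-zero states.
    print(lfsr_period(0b0001, 4, (4, 3)))  # 15
```

The same functions accept `width=64` with any tap tuple, although counting the period of a maximal-length 64-bit register this way is not practical.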

On the Hardware Implementation Efficiency of CAESAR Authentication Ciphers for FPGA Devices

Ciphers, also known as authenticated encryption methods, are the outcome of the marriage between the fields of mathematics and logic. The association of ciphers and data comprises Cryptography, the science that assures security and secrecy during a discussion between two parties in the presence of a third one; authentication, integrity and confidentiality are the values of the field in which we trust. In this work, four CAESAR Round-Two variants are developed at the register-transfer level (RTL) of abstraction, described in a hardware description language (HDL), simulated and implemented on Xilinx FPGAs. The COLM, SCREAM, POET and Minalpher variants of the contest all follow the same process to ensure the accuracy of the results, and compete with each other in terms of throughput, area, and throughput-to-area (T/A) ratio. Results are presented and discussed with respect to these aspects. Keywords: RTL Implementations; field programmable gate arrays; cryptography; authenticated ciphers; CAE...

On the Hardware Implementation of the MICKEY-128 Stream Cipher

IACR Cryptol. ePrint Arch., 2005

Digital Systems and Media Computing Laboratory, School of Science & Technology, Hellenic Open University, Patras, Greece. E-mail: pkitsos@ieee.org

ABSTRACT: Encryption algorithms are becoming more necessary to ensure the secure transmission of data over insecure communication channels. MICKEY-128 is a recently developed stream cipher with two major advantages: (i) low hardware complexity, which results in small area, and (ii) a high level of security. An FPGA device was used for the performance demonstration. Some of the first results of implementing the stream cipher on an FPGA are reported. A maximum throughput of 170 Mbps can be achieved at a clock frequency of 170 MHz.

Compact Hardware Architectures of Enocoro-128v2 Stream Cipher for Constrained Embedded Devices

Electronics

Lightweight cryptography is a vital and fast-growing field in today’s world, where billions of constrained devices interact with each other. In this paper, two novel compact architectures of the Enocoro-128v2 stream cipher are presented. The Enocoro-128v2 is part of the ISO/IEC 29192-3 standard. The first architecture has an 8-bit datapath while the second one has a 4-bit datapath. The proposed architectures were implemented on the BASYS3 board (Artix 7 XC7A35T) using the Verilog hardware description language. The hardware implementation of the proposed 8-bit architecture runs at a 189 MHz clock and reaches a throughput of 302 Mbps, while utilizing only 254 Look-up Tables (LUTs) and 330 Flip-flops (FFs). Each round of computations requires 5 clock cycles. The 4-bit implementation has an operating frequency of 204 MHz and reaches a throughput of 181 Mbps, with each round requiring 9 clock cycles. The 4-bit implementation utilizes 249 LUTs and 343 FFs. T...

An 8-bit Serialized Architecture of SEED Block Cipher for Constrained Devices

IET Circuits, Devices & Systems

This study presents an 8-bit serialised architecture of the SEED block cipher for constrained devices. The circuit utilises 356 FPGA slices and 447 1-bit flip-flops (FFs) on the BASYS3 board, operates with an 8-bit datapath and is intended for area-constrained devices. In order to keep the usage of hardware resources to a minimum but, at the same time, achieve a high level of security, the key generation process of SEED is implemented through an on-the-fly procedure. In addition, the necessary S-boxes are implemented using composite field arithmetic without using any block RAMs, resulting in a very compact implementation. The proposed architecture achieves a maximum frequency of 125 MHz with a total latency of 280 clock cycles and a throughput of up to 57.1 Mbps for encryption or decryption.

An FPGA design for the Two-Band Fast Discrete Hartley Transform

2016 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), 2016

The discrete Hartley transform finds numerous applications in signal and image processing. An efficient Field Programmable Gate Array implementation of the 64-point Two-Band Fast Discrete Hartley Transform is proposed in this communication. The architecture requires 57 clock cycles to compute the 64-point Two-Band Fast Discrete Hartley Transform and reaches a rate of up to 103.82 million samples per second at a 92 MHz clock frequency. The architecture has been implemented in VHDL and realized on an Altera Cyclone IV FPGA.

MICPRO DSD 2015 special issue

Microprocessors and Microsystems, 2017

Porto with the collaboration of the University of Madeira. The conference addressed all aspects of digital and mixed hardware/software systems from high-level design down to microarchitectures, digital circuits, and VLSI techniques. The 2015 final program covered a wide variety of topics in the field of digital systems design by providing a set of coherent technical sessions in the conference's main track together with a strong set of Special Sessions. It is a pleasure to express our gratitude to the Special Sessions Chairs for being so active and successful in attracting submissions in new areas, and for managing the review process with care and competence. We were extremely fortunate to count on an exceptional Program Committee composed of active and highly regarded actors in all fields of digital system design. Our thanks to all of them and to the additional reviewers invited to help with this task. The DSD 2015 conference had 165 paper submissions with authors from 39 countries. From these, 72 were selected for oral presentation. All papers were subject to a rigorous blind review process that averaged more than three reviews per paper. The extended papers from DSD 2015 in this special issue were chosen from the set of submissions that obtained the highest scores in the conference review process. The extended versions were handled according to the regular journal review process. The diversity of domains represented in this selection clearly shows the breadth of coverage of the conference. The paper by Skelin et al. analyzes worst-case performance metrics of parameterized synchronous dataflow models of computation for streaming applications and shows that in many cases the proposed approach enables the derivation of tighter conservative worst-case throughput and latency bounds than nonparametric methods. Their method can also be used to improve the scalability of enumerative analysis techniques. The use of solid-state storage in high-reliability embedded systems has many advantages but introduces issues of wear-out. The paper by McEwan and Komsul addresses techniques for replacing aged solid-state storage devices in RAID systems so that continuous system reliability is ensured while reducing the performance overhead of the reconstruction process. Data from trace-driven simulations show significant improvements in I/O response time.

Research paper thumbnail of Compact FPGA architectures for the two-band fast discrete Hartley transform

Microprocess. Microsystems, 2018

The discrete Hartley transform is a real valued transform similar to the complex Fourier transfor... more The discrete Hartley transform is a real valued transform similar to the complex Fourier transform that finds numerous applications in a variety of fields including pattern recognition and signal and image processing. In this paper, we propose and study two compact and versatile hardware architectures for the computation of the 8-point, 16-point and 32-point Two-Band Fast Discrete Hartley Transform. These highly modular architectures have a symmetric and regular structure consisting of two blocks, a multiplication block and an addition/subtraction block. The first architecture utilizes 8 multipliers and 16 adders/subtractors, achieving a maximum clock frequency of 95 MHz. The second architecture utilizes only 4 multipliers and 8 adders/subtractors, achieving a maximum clock frequency of 100 MHz; however it requires additional multiplexers and more clock cycles (from 1 to 58 clock cycles depends on the points) for the computation. As a result, the proposed hardware architectures cons...

Research paper thumbnail of Algorithm 3 (GEA3)

(GEA3) for data encryption. In this paper, alternative hardware implementations of the GEA3 algor... more (GEA3) for data encryption. In this paper, alternative hardware implementations of the GEA3 algorithm are described. GEA3 algorithm is based on the KASUMI block cipher. Various KASUMI block cipher hardware implementations have been examined in order to provide information about the required silicon area and throughput. In order to achieve a significant performance improvement, Double Edge Triggered pipeline technique is used. The S-BOXes, which are fundamental elements of the KASUMI cipher, have been implemented by using combinational logic and ROM memories. The proposed GEA3 algorithm hardware implementation achieves throughput up to 837 Mbps, which is much faster comparing to the previous designs. The whole system is implemented and evaluated by using Field Programmable Gate Array (FPGA) devices. Keywords: GPRS security; GEA3; KASUMI; stream cipher; block cipher; S-BOX; double edge triggered (DET) pipeline.

Research paper thumbnail of A Hardware Implementation of CURUPIRA Block Cipher for Wireless Sensors

2008 11th EUROMICRO Conference on Digital System Design Architectures, Methods and Tools, 2008

An architecture and VLSI implementation of a new block cipher called Curupira is presented in thi... more An architecture and VLSI implementation of a new block cipher called Curupira is presented in this paper. This cipher is suitable for wireless sensors and RFID applications. Our 0.13 μm implementation requires resources of 9450 gate equivalences and is capable to encrypt a plaintext in 10 clock cycles. The cipher achieves a maximum throughput up to 2361 Mbps at 246 MHz for encrypting/decrypting. When clocked at 100 KHz a throughput of up to 960 Kbps is achieved and an average power of 0.04 mW is drawn.

Research paper thumbnail of System Design and FPGA Implementation for Cognitive Radio Wireless Devices

Lecture Notes in Electrical Engineering, 2012

System design of devices which support cognitive radio enabled for wireless communications is an ... more System design of devices which support cognitive radio enabled for wireless communications is an area that has attracted a lot of attention in recent years. One of the main goals of designers of such devices is to develop systems that minimize interferences among users. Spectrum, frequencies, user behavior, radio architecture and network state are some of the important parameters that cognitive radio designers need to take into consideration when defining a model for communication system based on cognitive radio. ...

Research paper thumbnail of Low Power FPGA Implementations of 256-bit Luffa Hash Function

2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, 2010

Low power techniques in a FPGA implementation of the hash function called Luffa are presented in ... more Low power techniques in a FPGA implementation of the hash function called Luffa are presented in this paper. This hash function is under consideration for adoption as standard. Two major gate level techniques are introduced in order to reduce the power consumption, namely the pipeline technique (with some variants) and the use of embedded RAM blocks instead of general purpose logic elements. Power consumption reduction from 1.2 to 8.7 times is achieved by means of the proposed techniques compared with the implementation without any low power issue.

Research paper thumbnail of Security and Cryptographic Engineering in Embedded Systems

Applications, Optimization, and Advanced Design

INTRODUCTION The immense growth of portable and mobile systems (smart phones, tablets, netbooks) ... more INTRODUCTION The immense growth of portable and mobile systems (smart phones, tablets, netbooks) and the increasing integration of computational logic into any devices through initiatives like future Internet and Internet of things have led to a flourish in embedded system technology stemming the creativity of the IT market to new levels. Wireless, mobile and portable devices are gradually replacing many traditional computer systems due to the increasing user need for mobility in high-end technology

Research paper thumbnail of FPGA-based Design Approaches of Keccak Hash Function

2012 15th Euromicro Conference on Digital System Design, 2012

Keccak hash function has been submitted to SHA-3 competition and it belongs to the final five can... more Keccak hash function has been submitted to SHA-3 competition and it belongs to the final five candidate functions. In this paper FPGA implementations of Keccak function are presented. The designs were coded using HDL language and for the hardware implementation, a XILINX Virtex-5 FPGA was used. Some of the proposed implementations use DSP48E blocks in order to accelerate the designs execution. So, comparisons between the proposed designs in terms of time performance and FPGA resources are given in order to examine the efficiency of the using DSP48E blocks. Also, comparisons with previous published works are provided.

Research paper thumbnail of Hardware implementation of bluetooth security

IEEE Pervasive Computing, 2003

Research paper thumbnail of On the Hardware Implementation of the MUGI Pseudorandom Number Generator

Fifth International Symposium on Communications Systems, Networks and Digital Signal Processing,(CSNDSP’2006), Patras, Greece, Jul 1, 2006

MUGI pseudorandom number generator is presented in this paper. The MUGI generator is part of the ... more MUGI pseudorandom number generator is presented in this paper. The MUGI generator is part of the ISO/IEC 18033-4: 2005 standard and it is expected to be used in many applications. The design has been coded in VHDL and FPGA devices have been used for its hardware implementation. A maximum throughput equal to 7 Gbps is achieved for a clock frequency of 110 MHz. As no other MUGI implementations do exist, the comparison with previous keystream generator implementations such as RC4, E0, A5/1, are given. These ...

Research paper thumbnail of Best Practices for the Deployment of Edge Inference: The Conclusions to Start Designing

Electronics, 2021

The number of Artificial Intelligence (AI) and Machine Learning (ML) designs is rapidly increasin... more The number of Artificial Intelligence (AI) and Machine Learning (ML) designs is rapidly increasing and certain concerns are raised on how to start an AI design for edge systems, what are the steps to follow and what are the critical pieces towards the most optimal performance. The complete development flow undergoes two distinct phases; training and inference. During training, all the weights are calculated through optimization and back propagation of the network. The training phase is executed with the use of 32-bit floating point arithmetic as this is the convenient format for GPU platforms. The inference phase on the other hand, uses a trained network with new data. The sensitive optimization and back propagation phases are removed and forward propagation is only used. A much lower bit-width and fixed point arithmetic is used aiming a good result with reduced footprint and power consumption. This study follows the survey based process and it is aimed to provide answers such as to...

Research paper thumbnail of Importing Custom DNN Models on FPGAs

2021 10th Mediterranean Conference on Embedded Computing (MECO), 2021

With the recent advances and the proliferation of low-power and resource constraint embedded hard... more With the recent advances and the proliferation of low-power and resource constraint embedded hardware being deployed in the field, the edge computing paradigm has become more prominent. Nowadays, it is possible to apply Machine Learning (ML) algorithms and Deep learning (DL) architectures, like Deep Neural Networks (DNNs), by exploiting these edge devices. There is also a plethora of software frameworks to transform the typically complex computations and accommodate the implementation process and specifically for those who are targeted to field-programmable gate arrays (FPGAs), quite a few challenges need to be addressed. Therefore, there is an active research development towards providing efficient techniques for applying DNNs models to these devices. In this paper, we propose a methodology which combines two research tools for the generation of a register-transfer Level (RTL) design using a hardware description language (HDL). We evaluate different implementations experimenting wi...

Research paper thumbnail of Configurable Hardware Implementations of Bulk Encryption Units for Wireless Communications

Int. Arab J. Inf. Technol., 2004

Hardware implementations of bulk encryption units for wireless communications are presented in th... more Hardware implementations of bulk encryption units for wireless communications are presented in this paper. These units are based on the Triple DES (TDES) block cipher. The hardware modules can be configured in order to implement either the TDES or the DES block cipher. Three different hardware implementation s of TDES are proposed. The first two implementations are based on the pipeline design technique, while the third implementation uses the traditional feedback logic design technique (looping). In addition, the DES block cipher's S-BOXes have been implemented by Look Up Tables (LUTs) and/or ROM blocks. Comparing with the LUTs, the ROM blocks implementation approach provides higher performance. But, the LUTs implementation approach is used in cases where the ROM blocks are not available. For high-speed performance applications the loop unrolling architecture is selected. The proposed implementation of this architecture achieves 7.36 Gbps data throughput whilst the 16-stage pip...

Research paper thumbnail of Program Committee DSD 2014

Research paper thumbnail of A Hybrid FPGA Trojan Detection Technique Based-on Combinatorial Testing and On-chip Sensing

A hybrid Hardware Trojan detection technique is proposed in this paper that combines Combinatoria... more A hybrid Hardware Trojan detection technique is proposed in this paper that combines Combinatorial Testing in order to consistently trigger the Hardware Trojan, if one is present, and a grid of compact on-chip sensors in order to detect differentiations in the circuit of the FPGA. Each sensor mainly consist of a three stage Ring Oscillator and a compact Residue Number System ring counter and requires just two FPGA slices, leading to a total overhead of less than 2% in hardware resources. The proposed technique was tested on a cryptographic module performing AES cipher. To emulate the effects of a Hardware Trojan, we used a 64-bit Linear Feedback Shift Register. The experimental results prove that the proposed hybrid technique can detect the presence of a Hardware Trojan.

On the Hardware Implementation Efficiency of CAESAR Authentication Ciphers for FPGA Devices

Ciphers, also known as authenticated encryption methods, are the outcome of the marriage between the fields of mathematics and logic. The association of ciphers and data comprises Cryptography, the science that assures security and secrecy during a discussion of two parties in the presence of a third one; authentication, integrity and confidentiality are the values of the field in which we trust. In this work, four CAESAR Round-Two variants are developed at the register-transfer level (RTL) of abstraction, described in a hardware description language (HDL), simulated and implemented on Xilinx FPGAs. The COLM, SCREAM, POET and Minalpher variants of the contest all follow the same process to ensure the accuracy of the outcome, competing with each other in terms of throughput, area, and throughput-to-area (T/A) ratio. Results are presented and discussed with respect to these aspects. Keywords: RTL Implementations; field programmable gate arrays; cryptography; authenticated ciphers; CAE...

On the Hardware Implementation of the MICKEY-128 Stream Cipher

IACR Cryptol. ePrint Arch., 2005

Digital Systems and Media Computing Laboratory, School of Science & Technology, Hellenic Open University, Patras, Greece, e-mail: pkitsos@ieee.org. Abstract: Encryption algorithms are becoming more necessary to ensure that data are transmitted securely over insecure communication channels. MICKEY-128 is a recently developed stream cipher with two major advantages: (i) low hardware complexity, which results in small area, and (ii) a high level of security. An FPGA device was used for the performance demonstration. Some of the first results of implementing the stream cipher on an FPGA are reported. A maximum throughput of 170 Mbps can be achieved at a clock frequency of 170 MHz.

Compact Hardware Architectures of Enocoro-128v2 Stream Cipher for Constrained Embedded Devices

Electronics

Lightweight cryptography is a vital and fast-growing field in today’s world, where billions of constrained devices interact with each other. In this paper, two novel compact architectures of the Enocoro-128v2 stream cipher are presented. The Enocoro-128v2 is part of the ISO/IEC 29192-3 standard. The first architecture has an 8-bit datapath while the second one has a 4-bit datapath. The proposed architectures were implemented on the BASYS3 board (Artix-7 XC7A35T) using the Verilog hardware description language. The hardware implementation of the proposed 8-bit architecture runs at a 189 MHz clock and reaches a throughput of 302 Mbps, while utilizing only 254 Look-up Tables (LUTs) and 330 Flip-flops (FFs). Each round of computations requires 5 clock cycles. The 4-bit implementation has an operating frequency of 204 MHz and reaches a throughput of 181 Mbps, with each round requiring 9 clock cycles. The 4-bit implementation utilizes 249 LUTs and 343 FFs. T...
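The throughput figures quoted above are consistent with a simple relation: clock frequency times bits produced per round, divided by clock cycles per round. The sketch below checks this under the assumption that each round yields 8 keystream bits; that value is not stated in the abstract and is used here because it reproduces the reported numbers. The function name `throughput_mbps` is illustrative.

```python
def throughput_mbps(clock_mhz, cycles_per_round, bits_per_round=8):
    """Throughput in Mbps when bits_per_round bits are emitted every cycles_per_round clocks.

    bits_per_round=8 is an assumption for this sketch, not a figure from the abstract.
    """
    return clock_mhz * bits_per_round / cycles_per_round

eight_bit = throughput_mbps(189, 5)  # 8-bit datapath: 189 MHz, 5 cycles per round
four_bit = throughput_mbps(204, 9)   # 4-bit datapath: 204 MHz, 9 cycles per round

print(f"8-bit datapath: {eight_bit:.1f} Mbps")  # 302.4
print(f"4-bit datapath: {four_bit:.1f} Mbps")   # 181.3
```

The comparison shows the trade-off described in the abstract: the 4-bit datapath clocks faster but needs almost twice as many cycles per round, so its throughput is lower.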

An 8-bit Serialized Architecture of SEED Block Cipher for Constrained Devices

IET Circuits, Devices & Systems

This study presents an 8-bit serialised architecture of the SEED block cipher for constrained devices. The circuit utilises 356 FPGA slices and 447 1-bit flip-flops (FFs) on the BASYS3 board, operates with an 8-bit datapath and is intended for use on area-constrained devices. In order to keep the usage of hardware resources to a minimum but, at the same time, achieve a high level of security, the key generation process of SEED is implemented through an on-the-fly procedure. In addition, the necessary S-boxes are implemented using composite field arithmetic without using any block RAMs, resulting in a very compact implementation. The proposed architecture achieves a maximum frequency of 125 MHz with a total latency of 280 clock cycles and a throughput of up to 57.1 Mbps for encryption or decryption.
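The reported throughput follows from the latency and the clock frequency if one 128-bit block (the SEED block size) is processed per 280-cycle pass. A worked check under that assumption:

```latex
\[
t_{\text{block}} = \frac{280\ \text{cycles}}{125\ \text{MHz}} = 2.24\ \mu\text{s},
\qquad
\text{Throughput} = \frac{128\ \text{bits}}{2.24\ \mu\text{s}} \approx 57.1\ \text{Mbps}.
\]
```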

An FPGA design for the Two-Band Fast Discrete Hartley Transform

2016 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), 2016

The discrete Hartley transform finds numerous applications in signal and image processing. An efficient Field Programmable Gate Array implementation for the 64-point Two-Band Fast Discrete Hartley Transform is proposed in this communication. The architecture requires 57 clock cycles to compute the 64-point Two-Band Fast Discrete Hartley Transform and reaches a rate of up to 103.82 million samples per second at a 92 MHz clock frequency. The architecture has been implemented using VHDL and realized on an Altera Cyclone IV FPGA.
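For readers unfamiliar with the transform itself, the sketch below computes a 64-point discrete Hartley transform directly from its definition, H[k] = Σ x[n]·cas(2πnk/N) with cas(θ) = cos(θ) + sin(θ). This is the plain O(N²) definition, not the Two-Band fast algorithm or the FPGA architecture of the paper; the function name `dht` and the test signal are illustrative.

```python
import math

def dht(x):
    """Direct O(N^2) discrete Hartley transform of a real sequence."""
    n_points = len(x)
    out = []
    for k in range(n_points):
        acc = 0.0
        for n, sample in enumerate(x):
            angle = 2.0 * math.pi * n * k / n_points
            acc += sample * (math.cos(angle) + math.sin(angle))  # cas = cos + sin
        out.append(acc)
    return out

signal = [float(i % 5) for i in range(64)]  # arbitrary real-valued 64-point input
spectrum = dht(signal)

# The DHT is its own inverse up to a factor of N, so applying it twice
# and dividing by N should return the original samples.
recovered = [value / len(signal) for value in dht(spectrum)]
max_error = max(abs(a - b) for a, b in zip(signal, recovered))
print(f"max round-trip error: {max_error:.2e}")
```

Because the transform is real-valued and self-inverse, the same datapath can serve for both the forward and the inverse computation, which is part of its appeal for hardware.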

MICPRO DSD 2015 special issue

Microprocessors and Microsystems, 2017

Porto with the collaboration of the University of Madeira. The conference addressed all aspects of digital and mixed hardware/software systems from high-level design down to microarchitectures, digital circuits, and VLSI techniques. The 2015 final program covered a wide variety of topics in the field of digital systems design by providing a set of coherent technical sessions in the conference's main track together with a strong set of Special Sessions. It is a pleasure to express our gratitude to the Special Sessions Chairs for being so active and successful in attracting submissions in new areas, and for managing the review process with care and competence. We were extremely fortunate to count on an exceptional Program Committee composed of active and highly regarded actors in all fields of digital system design. Our thanks to all of them and to the additional reviewers invited to help with this task. The DSD 2015 conference had 165 paper submissions with authors from 39 countries. From these, 72 were selected for oral presentation. All papers were subject to a rigorous blind review process that averaged more than three reviews per paper. The extended papers from DSD 2015 in this special issue were chosen from the set of submissions that obtained the highest scores in the conference review process. The extended versions were handled according to the regular journal review process. The diversity of domains represented in this selection clearly shows the breadth of coverage of the conference. The paper by Skelin et al. analyzes worst-case performance metrics of parameterized synchronous dataflow models of computation for streaming applications and shows that in many cases the proposed approach enables the derivation of tighter conservative worst-case throughput and latency bounds than nonparametric methods. Their method can also be used to improve the scalability of enumerative analysis techniques. The use of solid-state storage in high-reliability embedded systems has many advantages but introduces issues of wear-out. The paper by McEwan and Komsul addresses techniques for replacing aged solid-state storage devices in RAID systems so that continuous system reliability is ensured while reducing the performance overhead of the reconstruction process. Data from trace-driven simulations show significant improvements in I/O response time.