Chokri Souani - Academia.edu
Papers by Chokri Souani
Many flow control and resource management algorithms for ATM networks have been selected by the ATM Forum, but their complexity makes integration into application-specific circuits expensive and risky. In this paper we present an architecture that provides flow control and resource management in order to prevent network congestion. The idea is to design an adaptive flow control algorithm based on two kinds of cells (data and control) that ensure a continuous dialogue between the transmitter and the receiver. Memory resource management uses a linked list, allowing optimal use of memory in a circular way. The system is designed using a methodology based on VHDL as the hardware description language. All steps, from specification to circuit layout, were carried out in a 0.6 µm CMOS technology.
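The linked-list memory management mentioned in this abstract can be illustrated with a small free-list pool. This is only a sketch under stated assumptions: the class name `CellBufferPool`, the slot indexing, and the FIFO reuse order are hypothetical and are not taken from the paper, which describes a VHDL hardware design.

```python
class CellBufferPool:
    """Hypothetical fixed-size pool of cell buffers tracked by a linked free list."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        # _next[i] is the index of the free slot that follows slot i, or -1.
        self._next = list(range(1, size)) + [-1]
        self._head = 0          # first free slot
        self._tail = size - 1   # last free slot
        self._free = size

    def allocate(self):
        """Take the slot at the head of the free list; None if the pool is full."""
        if self._free == 0:
            return None
        slot = self._head
        self._head = self._next[slot]
        self._next[slot] = -1
        self._free -= 1
        if self._free == 0:
            self._tail = -1
        return slot

    def release(self, slot: int) -> None:
        """Append a slot to the tail, so slots are reused in circular FIFO order.

        The sketch does not guard against releasing the same slot twice.
        """
        self._next[slot] = -1
        if self._free == 0:
            self._head = slot
        else:
            self._next[self._tail] = slot
        self._tail = slot
        self._free += 1
```

Releasing to the tail rather than the head spreads reuse evenly over all slots, which is one possible reading of "in a circular way".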
This paper presents an interactive communication synthesis approach for distributed systems. The proposed approach aims to map a high-level specification onto a modular and flexible target architecture. The input specification is composed of a set of finite state machines that communicate via a procedure call mechanism. Assuming that the critical part of the communication goes through a shared memory and that the partitioning step has already been performed, this approach refines the communication structures (interfaces, controllers) and yields an operational model that is easily mapped onto the target architecture. This approach is validated through the design of a communication controlle
Journal of Computer Science, 2006
Neural networks have been widely used for many applications in digital communications. They can solve complex problems thanks to their nonlinear processing and their learning and generalization capabilities. Neural networks are one of the key technologies for the communication domain, and accordingly special effort can be expected on real-time hardware implementation issues. In this study, we propose a digital hardware implementation of a neural system based on a multilayer perceptron (MLP). The neural system is used for the nonlinear adaptive prediction of nonstationary signals such as speech signals. The implemented MLP architecture is generated from a generic elementary neuron (EN). A polynomial approximation method is used to implement the sigmoidal activation function. The back-propagation algorithm is used to implement the prediction task. The circuit implementation architecture is detailed, with the aim of achieving real-time prediction of speech signals. The designed ASIC includes a neural network block, an on-chip learning block, and a memory that stores the synaptic weights for updating.
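As an illustration of why a polynomial sigmoid suits digital hardware, the sketch below uses a well-known piecewise second-order approximation. The abstract does not say which polynomial the authors used, so this particular formula, the saturation points at ±4, and the function names are assumptions.

```python
import math


def sigmoid_poly(x: float) -> float:
    """Piecewise second-order approximation of the logistic sigmoid.

    Assumed form (not taken from the paper): saturate outside [-4, 4] and use
    one quadratic per half-interval, which needs only a shift, an add and a
    multiply in fixed-point hardware.
    """
    if x <= -4.0:
        return 0.0
    if x >= 4.0:
        return 1.0
    if x < 0.0:
        return 0.5 * (1.0 + x / 4.0) ** 2
    return 1.0 - 0.5 * (1.0 - x / 4.0) ** 2


def sigmoid_exact(x: float) -> float:
    """Reference logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def max_abs_error(lo: float = -8.0, hi: float = 8.0, steps: int = 1601) -> float:
    """Largest absolute deviation from the exact sigmoid on a uniform grid."""
    worst = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        worst = max(worst, abs(sigmoid_poly(x) - sigmoid_exact(x)))
    return worst
```

Calling `max_abs_error()` gives a quick feel for the accuracy cost of replacing the exponential with a quadratic.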
Wavelet transform coding has been drawing much attention because of its ability to decompose images into a hierarchical structure that is suitable for adaptive processing in the transform domain. This paper presents two efficient and optimized VLSI designs of a one-dimensional direct discrete wavelet transform (DWT) processor. The proposed architectures compute three DWT stages and use four parallel filters. The architectures are simple and offer 16-bit precision on input and output data. No memory or registers are used for storing intermediate results. Furthermore, data scheduling and memory management remain very simple. The end result is two efficient VLSI implementations (using 0.6 µm CMOS technology).
Real-Time Imaging, 2000
This paper presents a VLSI implementation of the one-dimensional direct discrete wavelet transform (1-D DWT). The DDWT can be viewed as a multi-resolution decomposition of a signal: it decomposes a signal into its components in different frequency bands (octave bands). We propose a new architecture using parallel filters. We consider the implementation of a three-level 1-D DWT. The proposed architecture is simple and offers 16-bit precision on input and output data. It consists of three basic units: one register bank, four filters, and a control unit. The filters are of different lengths, with new coefficients derived from the Daubechies filter coefficients. The designed processor architecture requires no interface circuitry for interconnection to a standard communication bus. The architecture can compute the DWT at a data rate of 12 × 10^6 samples/s, corresponding to a typical clock speed of 12 MHz. The architecture is simulated at the gate level in VLSI.
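A software reference for the three-level decomposition described above can be sketched as follows. It is a cascade of one-level stages using the standard 4-tap Daubechies filter pair with periodic extension; the paper's own derived coefficients, filter lengths, and parallel four-filter arrangement are not given in the abstract, so everything here is an assumed textbook stand-in.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# Standard 4-tap Daubechies low-pass filter (assumed, not the paper's coefficients).
LOW = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
# Matching high-pass filter obtained by the quadrature mirror relation.
HIGH = [((-1) ** k) * LOW[3 - k] for k in range(4)]


def dwt_stage(x):
    """One DWT level: filter with periodic extension, then keep every second output."""
    n = len(x)
    approx, detail = [], []
    for i in range(0, n, 2):
        approx.append(sum(LOW[k] * x[(i + k) % n] for k in range(4)))
        detail.append(sum(HIGH[k] * x[(i + k) % n] for k in range(4)))
    return approx, detail


def dwt_three_levels(x):
    """Return (a3, d3, d2, d1): the level-3 approximation and the three detail bands."""
    if len(x) == 0 or len(x) % 8 != 0:
        raise ValueError("input length must be a positive multiple of 8")
    a1, d1 = dwt_stage(x)
    a2, d2 = dwt_stage(a1)
    a3, d3 = dwt_stage(a2)
    return a3, d3, d2, d1
```

Each detail band covers one octave, and because the filter pair is orthonormal the total energy of the four output bands equals the energy of the input, which makes a convenient check when validating a hardware model against software.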
Intelligent Techniques in Signal Processing for Multimedia Security, 2016
Representing and extracting good-quality facial features is an essential step in many applications, such as face recognition, pose normalization, expression recognition, human–computer interaction and face tracking. We are interested in extracting the pertinent features of a 3D face. In this paper, we propose an improved algorithm for 3D face characterization. We propose novel characteristics based on seven salient points of the 3D face, using the Euclidean distances and the angles between these points. This step is highly important in 3D face recognition. Our technique allows fully automated processing and handles incomplete and noisy input data. In addition, it is robust against holes in a meshed image and insensitive to facial expressions. Moreover, it is suitable for different image resolutions. All experiments were performed on the FRAV3D and GAVAB databases.
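The geometric features named in this abstract, Euclidean distances and angles between 3D landmarks, can be computed as in the sketch below. Which seven points are used and which distances and angles enter the final descriptor are not stated in the abstract, so the function names and the choice of all pairwise distances are assumptions for illustration.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of 3D points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def angle_at(a, vertex, c):
    """Angle in degrees at `vertex` between the rays towards `a` and `c`."""
    va = [a[i] - vertex[i] for i in range(3)]
    vc = [c[i] - vertex[i] for i in range(3)]
    norm_a = math.sqrt(sum(v * v for v in va))
    norm_c = math.sqrt(sum(v * v for v in vc))
    if norm_a == 0.0 or norm_c == 0.0:
        raise ValueError("angle undefined for coincident points")
    cos_theta = sum(x * y for x, y in zip(va, vc)) / (norm_a * norm_c)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
```

Seven landmarks yield 21 pairwise distances. Distances and angles are unchanged by rigid rotation and translation of the face, which is one reason such features are attractive for pose-independent description.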
Proceedings. The 16th International Conference on Microelectronics, 2004. ICM 2004.
This paper presents a VLSI implementation of the one-dimensional direct discrete wavelet transform. We propose a new architecture using parallel filters. We consider the implementation of a three-level 1-D DWT. The proposed architecture is simple and offers 16-bit precision on input and output data. No memory or registers are used for storing intermediate results. The end result is an efficient
Several flow control and resource management algorithms for ATM networks have been adopted by the ATM Forum. The increased complexity of these algorithms makes their integration into application-specific circuits difficult. To relieve congestion in high-speed networks, we present in this paper an architecture providing flow control and resource management. The idea is to design an adaptive flow control algorithm based on data cells and control cells that maintain a dialogue between the transmitter and the receiver. Memory resources are managed dynamically using a linked-list technique, which optimizes the use of memory space. The system design methodology is based on the VHDL hardware description language. We carried out all steps, from specification to placement and routing of the circuit, in a 0.6 µm CMOS technology.
International Journal of Vehicle Design
2017 International Conference on Control, Automation and Diagnosis (ICCAD)
Electronics
Convolutional Neural Networks (CNNs) continue to dominate research on hardware acceleration using Field Programmable Gate Arrays (FPGAs), proving their effectiveness in a variety of computer vision applications such as object segmentation, image classification, face detection, and traffic sign recognition, among others. However, there are numerous constraints on deploying CNNs on FPGAs, including limited on-chip memory, CNN size, and configuration parameters. This paper introduces Ad-MobileNet, an advanced CNN model inspired by the baseline MobileNet model. The proposed model uses an Ad-depth engine, which is an improved version of the depth-wise separable convolution unit. Moreover, we propose an FPGA-based implementation model that supports the Mish, TanhExp, and ReLU activation functions. The experimental results using the CIFAR-10 dataset show that our Ad-MobileNet has a classification accuracy of 88.76% while requiring few computational hardware resources. Compared ...
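For reference, the three activation functions named in this abstract have simple closed forms, sketched here in floating point. The numerically stable softplus and the large-input shortcut are implementation choices of this sketch; how the paper realizes these functions on the FPGA is not described in the abstract.

```python
import math


def relu(x: float) -> float:
    """ReLU: max(0, x)."""
    return x if x > 0.0 else 0.0


def softplus(x: float) -> float:
    """ln(1 + e^x), written so that exp never overflows."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def tanhexp(x: float) -> float:
    """TanhExp: x * tanh(e^x)."""
    if x > 20.0:
        # tanh(e^x) is 1.0 to double precision here; skipping exp avoids overflow.
        return x
    return x * math.tanh(math.exp(x))
```

Unlike ReLU, Mish and TanhExp are smooth and let small negative values pass, at the price of evaluating `exp` and `tanh`, which is what makes their hardware support non-trivial.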
2021 18th International Multi-Conference on Systems, Signals & Devices (SSD)
2021 18th International Multi-Conference on Systems, Signals & Devices (SSD)
IEEE Access
Deep Learning techniques have been successfully applied to many Artificial Intelligence (AI) application problems. However, owing to topologies with many hidden layers, Deep Neural Networks (DNNs) have high computational complexity, which makes their deployment difficult in contexts highly constrained by requirements such as performance, real-time processing, or energy efficiency. Numerous hardware/software optimization techniques using GPUs, ASICs, and reconfigurable computing (i.e., FPGAs) have been proposed in the literature. With FPGAs, very specialized architectures have been developed to provide an optimal balance between high speed and low power. However, when targeting edge computing, user requirements and hardware constraints must be efficiently met. Therefore, in this work, we focus only on reconfigurable embedded systems based on the Xilinx ZYNQ SoC and on popular DNNs that can be implemented on the Embedded Edge, improving performance per watt while maintaining accuracy. In this context, we propose an automated framework for the implementation of hardware-accelerated DNN architectures. This framework provides an end-to-end solution that facilitates the efficient deployment of topologies on FPGAs by combining custom hardware scalability with optimization strategies. Cutting-edge comparisons and experimental results demonstrate that the architectures developed by our framework offer the best compromise between performance, energy consumption, and system costs. For instance, the low-power (0.266 W) DNN topologies generated for the MNIST database achieved a high throughput of 3,626 FPS. Index terms: deep learning, electronic design automation, edge computing, FPGA, low power systems.
Sensors
Deploying Deep Neural Networks (DNNs) for IoT Edge applications requires strong skills in hardware and software. In this paper, a novel, fully automated design framework for Edge applications is proposed to perform such a deployment on Systems-on-Chip. Based on a high-level Python interface that mimics the leading Deep Learning software frameworks, it offers an easy way to implement a hardware-accelerated DNN on an FPGA. To do this, our design methodology covers three main phases: (a) customization, where the user specifies the optimizations needed on each DNN layer; (b) generation, where the framework generates on the Cloud the necessary binaries for both the FPGA and the software parts; and (c) deployment, where the SoC on the Edge receives the resulting files used to program the FPGA, together with related Python libraries for user applications. Among the case studies, an optimized DNN for the MNIST database runs more than 60× faster than a software version on the ZYNQ 7020 SoC and still consumes less than 0....
Electronics
Motion estimation has become one of the most important techniques used in real-time computer vision applications. There are several algorithms for estimating object motion. One of the most widespread techniques consists of calculating the apparent velocity field observed between two successive images of the same scene, known as the optical flow. However, the high accuracy of dense optical flow estimation is costly in run time. In this context, we designed an accurate motion estimation system based on calculating the optical flow of a moving object using the Lucas–Kanade algorithm. Our approach was applied to a local processing region and implemented on a Raspberry Pi 4, with several improvements. The efficiency of our accurate real-time implementation was demonstrated by the experimental results, which show better performance than the conventional calculation.
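The core of the Lucas–Kanade method is a small least-squares problem solved over a local window, which the following sketch makes concrete. It is the basic single-window, single-scale textbook formulation; the improvements and the region selection described in the abstract are not reproduced, and the synthetic test pattern in `make_frame` is purely hypothetical.

```python
import math


def lucas_kanade_window(img1, img2):
    """Estimate one (u, v) displacement, in pixels, for a whole window.

    img1 and img2 are equal-sized lists of rows (indexed [y][x]). Spatial
    gradients use central differences on img1; the temporal gradient is the
    frame difference. Border pixels are skipped.
    """
    h, w = len(img1), len(img1[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (img1[y][x + 1] - img1[y][x - 1]) / 2.0
            iy = (img1[y + 1][x] - img1[y - 1][x]) / 2.0
            it = img2[y][x] - img1[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Normal equations: [[sxx, sxy], [sxy, syy]] [u, v]^T = -[sxt, syt]^T
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture to estimate motion")
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


def make_frame(dx=0.0, dy=0.0, size=20):
    """Hypothetical smooth test pattern, shifted by (dx, dy) pixels."""
    return [[math.sin(0.3 * (x - dx)) + math.cos(0.2 * (y - dy))
             for x in range(size)] for y in range(size)]
```

For example, `lucas_kanade_window(make_frame(), make_frame(0.2, -0.1))` recovers a displacement close to (0.2, -0.1). Restricting the computation to a window around the moving object, instead of the whole frame, is what keeps the cost low compared with dense optical flow.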
International Journal of Applied Metaheuristic Computing
This article proposes the design of a novel embedded hardware system for automatic real-time road sign recognition. The algorithm is implemented in two main steps. The first step, which detects the road signs, is performed by the maximally stable extremal regions (MSER) method in the HSV color space. The second step recognizes the detected signs using the Oriented FAST and Rotated BRIEF (ORB) features method. The novelty of the embedded hardware system, built on an ARM processor, lies in enabling a real-time implementation of ADAS applications. The proposed system was tested on the Belgium Traffic Sign Detection and Recognition Benchmark and on the German Traffic Signs Datasets. The proposed approach attained high detection and recognition rates in real-world situations. The achieved results are acceptable when compared to state-of-the-art systems.
Journal of Ambient Intelligence and Humanized Computing
This paper proposes a method for estimating three parameters by Ground Penetrating Radar (GPR): the radius and the depth of buried empty cylindrical tubes, and the dielectric constant of the surrounding medium. These parameters are detected and characterized from radargrams, which contain a parabolic shape that indicates the presence of a target. The estimation is carried out in two major phases: first, processing of the electromagnetic (EM) signals received by the GPR; second, curve fitting of the parabolic shape that appears in the radargram. The results indicate that this method estimates the depth with a mean error rate of 1.66%, the relative permittivity of the emulsion with a mean error rate of 3.41%, and the radius with a mean error rate of 29.52%, which justifies and validates the model used.
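The curve-fitting phase can be illustrated with a deliberately simplified model. The sketch below treats the target as a point reflector (radius zero) with a known apex position, in which case the squared two-way time is linear in the squared horizontal offset and an ordinary least-squares line recovers velocity, depth and relative permittivity. This simplification, the function name, and the units are assumptions; the paper's model also estimates the tube radius, which this sketch does not.

```python
import math

C_M_PER_NS = 0.299792458  # speed of light in vacuum, metres per nanosecond


def fit_point_hyperbola(positions_m, times_ns, apex_x_m):
    """Fit t^2 = t0^2 + (4 / v^2) * (x - x0)^2 by linear least squares.

    Returns (depth_m, relative_permittivity). Assumes a point target and a
    known apex position x0; both are simplifications of the paper's setting.
    """
    xs = [(x - apex_x_m) ** 2 for x in positions_m]
    ys = [t * t for t in times_ns]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0.0:
        raise ValueError("need measurements at different offsets from the apex")
    slope = sxy / sxx                     # equals 4 / v^2
    intercept = mean_y - slope * mean_x   # equals t0^2
    if slope <= 0.0 or intercept <= 0.0:
        raise ValueError("data do not follow a reflection hyperbola")
    velocity = 2.0 / math.sqrt(slope)     # metres per nanosecond
    depth = velocity * math.sqrt(intercept) / 2.0
    permittivity = (C_M_PER_NS / velocity) ** 2
    return depth, permittivity
```

In this simplified model the apex time fixes the depth and the opening of the curve fixes the wave velocity, hence the permittivity of the surrounding medium.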
Ground penetrating radar (GPR) is a technique based on sending electromagnetic (EM) waves through a structure and then recording the reflected signals, which arise at dielectric discontinuities within the structure. Peak detection in GPR data is important for the diagnosis of ground infrastructures. This paper describes the application of the lifting wavelet transform (LWT) to detect peaks in GPR signals. These peaks are used to calculate the two-way travel times ti of the EM waves between the antenna and the layers i. The distances di are then calculated from these times. The measured values (distances) are compared with manually determined values. The performance of peak detection with the lifting wavelet transform was evaluated by computing the mean absolute percentage error (MAPE). The results indicate that this method estimates thickness with a mean error rate of 14.27%, which justifies and validates the model used.
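The step from travel times to distances, and the MAPE used for evaluation, can be written out as below. The conversion uses the standard relation between wave velocity and relative permittivity; the per-layer permittivity input and the function names are assumptions of this sketch, and the lifting-wavelet peak detection itself is not reproduced.

```python
import math

C_M_PER_NS = 0.299792458  # speed of light in vacuum, metres per nanosecond


def layer_thicknesses(two_way_times_ns, rel_permittivities):
    """Thickness of each layer from cumulative two-way travel times.

    two_way_times_ns[i] is the two-way time to the bottom of layer i;
    rel_permittivities[i] is the (assumed known) relative permittivity of
    layer i. Uses v = c / sqrt(eps_r) and thickness = v * delta_t / 2.
    """
    thicknesses = []
    previous_t = 0.0
    for t, eps in zip(two_way_times_ns, rel_permittivities):
        velocity = C_M_PER_NS / math.sqrt(eps)
        thicknesses.append(velocity * (t - previous_t) / 2.0)
        previous_t = t
    return thicknesses


def mape(estimated, reference):
    """Mean absolute percentage error between two equal-length sequences."""
    if len(estimated) != len(reference) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum(abs(e - r) / abs(r) for e, r in zip(estimated, reference))
    return 100.0 * total / len(reference)
```

The division by two accounts for the wave travelling down to the interface and back up to the antenna.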
Ground penetrating radar (GPR) is a method based on electromagnetic waves that are transmitted from an antenna and reflect off layers or objects in the earth. GPR is used to detect objects buried in the ground, such as landmines, pipes, cables or any other dielectric material, and to predict their depths. GPR gives information about ground infrastructures by scanning the surface to be examined. In this paper, GPR is used to determine the depth and diameter of a buried tube. The method is based on the analytic signal, which is used to extract the signal envelope of the GPR data and from it estimate the depth and diameter of the underground tube. The measured values are approximately consistent with the real values, and the proposed method achieves good results.
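Envelope extraction through the analytic signal can be sketched in a few lines: suppress the negative-frequency half of the spectrum, double the positive half, transform back, and take the magnitude. The naive DFT below is chosen only to keep the example dependency-free, and the function names are hypothetical; the subsequent depth and diameter estimation from the envelope is not shown because the abstract does not detail it.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short traces)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
        out.append(acc / n if inverse else acc)
    return out


def envelope(trace):
    """Magnitude of the analytic signal of a real-valued trace."""
    n = len(trace)
    spectrum = dft(trace)
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            gain = 1.0   # DC and Nyquist bins are kept unchanged
        elif k <= (n - 1) // 2:
            gain = 2.0   # positive frequencies are doubled
        else:
            gain = 0.0   # negative frequencies are removed
        spectrum[k] *= gain
    return [abs(v) for v in dft(spectrum, inverse=True)]
```

The envelope removes the carrier oscillation of each reflected wavelet, so an echo shows up as a single smooth bump whose position in time is easier to read off than that of the raw oscillating trace.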