Edna Barros - Academia.edu
Papers by Edna Barros
Learning and Design of Computational Systems: Integrating the CompSim Simulator to FPGAs
2022 Congreso de Tecnología, Aprendizaje y Enseñanza de la Electrónica (XV Technologies Applied to Electronics Teaching Conference), Jun 29, 2022
Sensors
Pedestrian detection (PD) systems capable of locating pedestrians over large distances and locating them faster are needed in Pedestrian Collision Prediction (PCP) systems to increase the decision-making distance. This paper proposes a performance-optimized FPGA implementation of a HOG-SVM-based PD system with support for image pyramids and detection windows of different sizes to locate near and far pedestrians. The proposed hardware architecture can process one pixel per clock cycle by exploiting data and temporal parallelism through techniques such as pipelining and spatial division of data between parallel processing units. The proposed architecture for the PD module was validated on an FPGA and integrated with the stereo semi-global matching (SGM) module, also prototyped on an FPGA. Processing two windows of different dimensions permitted a reduction in miss rate of at least 6% compared to a single-size window detector. The performances achieved by the PD system and the PCP...
An FPGA-Based RFID Baseband Processor Using a RISC-V Platform
2018 31st Symposium on Integrated Circuits and Systems Design (SBCCI), 2018
Modern applications involving communication systems must meet several requirements regarding energy consumption, physical area, operating frequency, production cost, and others. Among the technologies used for communication, RFID (Radio Frequency Identification) is a significant and recurring approach to identification and control in industrial and commercial applications. The processor architecture chosen for such systems must be capable of handling the real-time computing effort needed for the digital signal processing tasks that RFID applications require. This work proposes a combined platform composed of a processor based on the RISC-V open instruction set architecture and dedicated hardware that performs the baseband signal processing tasks required for RFID transmission. The RTL system specification was described in VHDL and SystemVerilog and prototyped on a Cyclone III FPGA device.
A hardware accelerator for the alignment of multiple DNA sequences in forensic identification
2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2016
The comparison of DNA sequences is a classic problem in molecular biology. Forensic applications use this comparison for personal identification. For instance, in the USA, the CODIS system today has 14.9 million DNA profiles stored in its database. To accelerate the recurrent task of querying such databases, this work presents a hardware accelerator for the parallel alignment of multiple DNA sequences, aiming for maximum throughput. The proposed accelerator architecture optimizes the use of hardware resources, the data access strategy and, as a result, memory bandwidth. The experiments were conducted using a DNA database with 8 million individuals, in which each individual is represented by a set of 15 sequences with a length of 256 nucleotides. In this case study, a prototype of the proposed hardware accelerator using a single Stratix IV FPGA running at 250 MHz outperforms, by tens of times, established software applications such as SWIPE and FASTA running on a GPP platform, as well as an optimized GPU implementation in OpenCL.
Towards better generalization in WLAN positioning systems with genetic algorithms and neural networks
Proceedings of the Genetic and Evolutionary Computation Conference, 2019
The most widely used positioning system today is the GPS (Global Positioning System), which has many commercial, civil and military applications and is present in most smartphones. However, this system does not perform well in indoor locations, which constrains the positioning task in environments like shopping malls, office buildings, and other public places. In this context, WLAN positioning systems based on fingerprinting have attracted a lot of attention as a promising approach for indoor localization that uses the existing infrastructure. This paper contributes to this field by presenting a methodology for developing WLAN positioning systems using genetic algorithms and neural networks. The fitness function of the genetic algorithm is based on the generalization capabilities of the network for test points that are not included in the training set. By using this approach, we have achieved state-of-the-art results with few parameters, and our method has been shown to be less prone to overfitting than other techniques in the literature, showing better generalization at points that are not recorded on the radio map.
Porosity features extraction based on image segmentation technique applying k-means clustering algorithm
Occupancy Grid Map Estimation Based on Visual SLAM and Ground Segmentation
2021 Latin American Robotics Symposium (LARS), 2021 Brazilian Symposium on Robotics (SBR), and 2021 Workshop on Robotics in Education (WRE), 2021
Feature-based SLAM is efficient, fast, and can offer an accurate localization system; on the other hand, the map produced is a sparse representation of the environment, limiting path planning activities and reducing robotic autonomy. We extend this mapping stage to build an occupancy grid map given the sparse point cloud. Our method uses the pose estimation from the SLAM system, its sparse map, and an image segmentation technique. Tests made in synthetic and real-world environments demonstrate maps with high precision and excellent coverage. Furthermore, the application can run in conjunction with the SLAM system in real time while requiring a low memory footprint. Finally, the map generated represents high-level information that allows a link between a feature-based SLAM and navigation tasks.
Design Automation for Embedded Systems, 2019
A system to automatically recognize vehicle license plates is a growing need to improve safety and traffic control, specifically in major urban centers. However, the license plate recognition task is generally computationally intensive: the entire input image frame is scanned, the plates found are segmented, and character recognition is then performed for each segmented character. This paper presents a methodology for engineering a system to detect and recognize Brazilian license plates using convolutional neural networks (CNNs) that is suitable for embedded systems. The resulting system detects license plates in the captured image using the Tiny YOLOv3 architecture and identifies their characters using a second convolutional network trained on synthetic images and fine-tuned with real license plate images. The proposed architecture has proven robust to angle, lighting, and noise variations while requiring a single forward pass for each network, therefore allowing faster processing compared to other deep learning approaches. Our methodology was validated using real license plate images under different environmental conditions, reaching a detection rate of 99.37% and an overall recognition rate of 98.43% with an average time of 2.70 s to process 1024 × 768 images with a single license plate on a Raspberry Pi 3 (ARM Cortex-A53 CPU). To improve the recognition accuracy, an ensemble of CNN models was tested instead of a single CNN model, which increased the average processing time to 4.88 s per image while increasing the recognition rate to 99.53%. Finally, we discuss the impact of using an ensemble of CNNs considering the accuracy-performance trade-off when engineering embedded systems for license plate recognition.
Journal of Real-Time Image Processing, 2019
Stereo matching approaches are an appealing choice for acquiring depth information in a number of video processing applications. It is desirable that these solutions generate dense, robust disparity maps in real time. However, occlusion regions may disturb the applications that need these maps. Among the best of these approaches is the semi-global matching (SGM) technique. This paper presents an FPGA-based stereo vision system based on SGM. The system computes disparity maps in a streaming fashion and scales to several resolutions and disparity ranges. To increase the robustness of the SGM technique even further, the present work implements a combination of the gradient filter and the sampling-insensitive absolute difference in the pre-processing phase. Furthermore, as a post-processing step, this paper proposes a novel streaming architecture to detect noisy and occluded regions. The FPGA-based implementations of the proposed stereo matching system in two distinct heterogeneous architectures (GPP, a general-purpose processor, and FPGA) were evaluated using the Middlebury stereo vision benchmark. The results show a frame rate of 25 FPS for disparity map processing in HD resolution (1024 × 768 pixels), with 256 disparity levels. They also demonstrate that the memory utilization, processing performance, and accuracy are among the best of FPGA-based stereo vision systems.
IFAC Proceedings Volumes, 1998
This work presents a method to compute the mutual exclusion degree of processes and of pairs of processes in order to perform hardware/software partitioning. Considering that our partitioning approach allows for the use of multiple software components, several aspects of multiprocessor systems have to be considered. One of the main problems in systems with multiple processors is the throughput degradation caused by saturation effects. Most allocation approaches take into account criteria such as interprocessor communication and workload between processors. Another very important aspect which should be considered is the mutual exclusion degree between processes. Mutually exclusive processes are less likely to reduce performance when assigned to one single processor. However, even mutually exclusive processes could allow for distinct levels of concurrency.
Oolong: A Baseband processor extension to the RISC-V ISA
2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2016
RISC-V is an open-source instruction set architecture designed to support customized extensions and architectures. This paper presents an instruction-set extension to the RISC-V ISA, conceived for software-defined radio applications. The custom instructions perform complex-number arithmetic, tailored for complex or quadrature modulation and baseband processing, and can perform one complex multiply-accumulate per cycle. The proposed system architecture includes the processor core, a WISHBONE bus interconnection, IO and peripherals, and was targeted to an Altera Cyclone III FPGA, achieving 0.9 DMIPS/MHz without the use of any compiler optimizations.
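As a rough illustration of the operation the abstract refers to, a complex multiply-accumulate can be sketched in software as below. This is only a sketch under stated assumptions: the function names `complex_mac` and `complex_dot` and the (real, imag) integer-pair representation are hypothetical and are not taken from the paper, which defines hardware instructions rather than a Python API.

```python
def complex_mac(acc, a, b):
    """Return acc + a * b for complex values given as (real, imag) integer pairs.

    Hypothetical helper: the pair representation is an assumption made for
    illustration, not the encoding used by the Oolong instructions.
    """
    acc_re, acc_im = acc
    a_re, a_im = a
    b_re, b_im = b
    # (a_re + j*a_im) * (b_re + j*b_im), expanded into real and imaginary parts
    prod_re = a_re * b_re - a_im * b_im
    prod_im = a_re * b_im + a_im * b_re
    return (acc_re + prod_re, acc_im + prod_im)


def complex_dot(xs, ys):
    """Accumulate the products of two equal-length sequences of complex pairs."""
    acc = (0, 0)
    for x, y in zip(xs, ys):
        acc = complex_mac(acc, x, y)  # one multiply-accumulate per sample
    return acc
```

For example, `complex_mac((0, 0), (1, 2), (3, 4))` yields `(-5, 10)`, matching (1 + 2j)(3 + 4j) = -5 + 10j. A loop such as `complex_dot` is the kind of kernel (correlation, filtering, mixing) where a single-cycle complex multiply-accumulate would replace four real multiplications and several additions.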
Extreme Value Theory for Estimating Task Execution Time Bounds: A Careful Look
2016 28th Euromicro Conference on Real-Time Systems (ECRTS), 2016
Extreme Value Theory (EVT) is a powerful statistical framework for estimating maximum values of random variables and has recently been applied for deriving probabilistic bounds on task execution times (pWCET). Task execution time data are collected from measurements and the maximum measured values are fit to an extreme value model. In this paper we provide a careful study on the applicability and effectiveness of EVT in this application field. The study is based on extensive experiments for which we have designed an embedded platform equipped with a random cache of configurable size. Based on evidence from the experiments, we provide the following contributions: we give a new definition of pWCET that conforms with the fact that pWCET estimates depend on the input data distribution used during analysis; we show that using the Generalized Extreme Value (GEV) distribution is necessary, since the more restrictive modeling based on the Gumbel distribution may yield unsafe or over-estimated values of pWCET; and we confirm that hardware randomization favors the applicability of EVT, although it does not ensure it, since the distribution of maxima for execution time data is not guaranteed to be analyzable via EVT.
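The measurement-based workflow the abstract describes (collect execution times, take maxima, fit an extreme value model, read off a bound) can be sketched as follows. This is a minimal sketch under loud assumptions: it uses a Gumbel model fitted by the method of moments purely because that fit has a closed form in the Python standard library, whereas the abstract argues that the more general GEV distribution is necessary and that Gumbel-only modeling may be unsafe. The function names and the block size are hypothetical and do not come from the paper.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def block_maxima(samples, block_size):
    """Split samples into consecutive full blocks and keep each block's maximum.

    Trailing samples that do not fill a block are dropped.
    """
    n_blocks = len(samples) // block_size
    return [max(samples[i * block_size:(i + 1) * block_size])
            for i in range(n_blocks)]


def fit_gumbel_moments(maxima):
    """Method-of-moments Gumbel fit (illustrative stand-in for a GEV fit).

    Uses mean = mu + gamma * beta and variance = (pi * beta)**2 / 6.
    """
    beta = statistics.stdev(maxima) * math.sqrt(6) / math.pi
    mu = statistics.mean(maxima) - EULER_GAMMA * beta
    return mu, beta


def gumbel_quantile(mu, beta, exceedance_prob):
    """Value exceeded with the given probability under a Gumbel(mu, beta) model."""
    # Inverse of F(x) = exp(-exp(-(x - mu) / beta)) evaluated at 1 - p
    return mu - beta * math.log(-math.log1p(-exceedance_prob))
```

A typical use would be `mu, beta = fit_gumbel_moments(block_maxima(times, 50))` followed by `gumbel_quantile(mu, beta, 1e-9)`, where `times` is a list of measured execution times. Any bound obtained this way is only as trustworthy as the model fit, which is the concern the abstract raises.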
A Method for Partitioning UNITY
In this paper we introduce a method to partition UNITY system specifications into software and hardware parts. This method considers different design possibilities and defines cost functions to find the most suitable one under given design constraints in terms of the hardware-software trade-off.
Methods based on Petri net for resource sharing estimation
Proceedings 13th Symposium on Integrated Circuits and Systems Design (Cat. No.PR00843), 2000
This work presents two approaches for computing the number of functional units in a hardware/software codesign context. The proposed hardware/software codesign framework uses Petri nets as a common formalism for performing quantitative and qualitative analysis. ...
A Petri net based approach for performing the initial allocation in hardware/software codesign
SMC'98 Conference Proceedings. 1998 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.98CH36218)
This work presents a method of hardware/software partitioning considering multiple software components. The proposed method uses Petri nets as a common formalism to perform quantitative and qualitative analysis. The use of Petri net permits one to use a specification non-...
Balanced Prefetching Aggressiveness Controller for NoC-based Multiprocessor
Proceedings of the 27th Symposium on Integrated Circuits and Systems Design, 2014
The performance gap between the memory hierarchy and the processor is a well-known issue, and prefetching is often used to minimize this problem. This technique fetches data from memory and makes it available in the private cache before it is requested. Thus, as more prefetching transactions are performed (that is, as prefetching becomes more aggressive), the miss rate in the first levels of cache tends to be lower. However, very aggressive prefetching can cause cache pollution, increase network traffic, and thereby degrade system performance. In a multiprocessor platform, the prefetching of one core can interfere with the operation of other cores since they share resources such as memory and network bandwidth. Very aggressive prefetching by one core can overload the network connection, increasing communication, which delays network requests and increases the penalty in the processor. In this context, this paper presents a Balanced Prefetching Aggressiveness Controller for a multiprocessor platform that minimizes the processor penalty. We tested the proposed controller in a network-based multiprocessor based on the Sparc V8. Compared to a system with fixed prefetching aggressiveness, the results for concurrent applications show a reduction in processor penalty of up to 23% (7% on average), an average reduction of 34% in cache pollution, and a 30% increase in prefetching accuracy.
Towards more reliable embedded systems through a mechanism for monitoring driver devices communication
Fifteenth International Symposium on Quality Electronic Design, 2014
Embedded systems require ever more flexibility. Several systems permit on-the-market software updates. However, these updates must be reliable; otherwise, the results can be catastrophic. Device drivers may be updated and are very vulnerable to this problem, requiring mechanisms that are able to capture errors arising from updates at runtime. This work proposes an approach for checking runtime errors in the driver when it accesses the device, allowing bugs to be detected throughout the lifetime of the embedded system. The proposed mechanism for capturing errors is composed of two modules: the Monitor of Device/Driver Communication (MDDC) and a set of FSMs. Both modules can be synthesized from a device description at a high level of abstraction. When connected to a hardware platform in an FPGA, they are able to check whether the driver operations will lead to a correct state of operation. Thus, the designer can be sure that the updated device driver is reliable. To validate the technique, drivers for an Ethernet controller and a serial port were developed. Results show the effectiveness of the approach in finding device driver errors at runtime, as well as a low MDDC overhead of about 1.5% in terms of performance and footprint area of the system FPGA prototype.
Genetic Programming - New Approaches and Successful Applications, 2012
Brazilian symposium on computing system engineering
ACM SIGOPS Operating Systems Review, 2012
The paper proposes microsharding, a relational alternative for the recent procedural approaches with large-scale data stores to support OLTP workloads elastically. It employs a declarative specification, called transaction classes, of constraints applied ...
Aquarius
Proceedings of the 20th annual conference on Integrated circuits and systems design - SBCCI '07, 2007
This work presents the development of a dynamically reconfigurable computing platform targeting Virtex-II devices under control of a host system based on the Nios II soft-core processor from Altera. The platform, called Aquarius, controls the system by means of the μCLinux embedded Operating System (OS). Through this OS, an IP-SelectMAP core and its device driver, a Virtex-II FPGA device can
Learning and Design of Computational Systems: Integrating the CompSim Simulator to FPGAs
2022 Congreso de Tecnología, Aprendizaje y Enseñanza de la Electrónica (XV Technologies Applied to Electronics Teaching Conference), Jun 29, 2022
Sensors
Pedestrian detection (PD) systems capable of locating pedestrians over large distances and locati... more Pedestrian detection (PD) systems capable of locating pedestrians over large distances and locating them faster are needed in Pedestrian Collision Prediction (PCP) systems to increase the decision-making distance. This paper proposes a performance-optimized FPGA implementation of a HOG-SVM-based PD system with support for image pyramids and detection windows of different sizes to locate near and far pedestrians. This work proposes a hardware architecture that can process one pixel per clock cycle by exploring data and temporal parallelism using techniques such as pipeline and spatial division of data between parallel processing units. The proposed architecture for the PD module was validated in FPGA and integrated with the stereo semi-global matching (SGM) module, also prototyped in FPGA. Processing two windows of different dimensions permitted a reduction in miss rate of at least 6% compared to a uniquely sized window detector. The performances achieved by the PD system and the PCP...
An FPGA-Based RFID Baseband Processor Using a RISC-V Platform
2018 31st Symposium on Integrated Circuits and Systems Design (SBCCI), 2018
Modern applications involving communication systems require the fulfillment of several requiremen... more Modern applications involving communication systems require the fulfillment of several requirements regarding energy consumption, physical area, operational frequency, cost of production and others. Among the technologies used for communication, RFID (Radio Frequency Identification) is a significant and recurring approach to perform identification and control in industrial and commercial applications. The processor architecture chosen to compose such systems must be capable of handling with the real-time computing effort needed to perform the tasks of digital signal processing required to execute RFID applications. This work proposes a combined platform composed by a processor based on the RISC-V Open Instruction Set Architecture and dedicated hardware to perform the tasks of baseband signal processing required for RFID transmission. The RTL system specification was described in VHDL and SystemVerilog languages and prototyped in a Cyclone III FPGA device.
A hardware accelerator for the alignment of multiple DNA sequences in forensic identification
2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2016
The comparison of DNA sequences is a classic problem in molecular biology. Forensic applications ... more The comparison of DNA sequences is a classic problem in molecular biology. Forensic applications uses this comparison for personal identication. For instance, in the USA, the CODES system has today 14.9 million DNA proles stored on its database. To accelerate the recurrent task to query into similar databases, this work presents a hardware acclerator for the parallel alignment of multiple DNA sequences, aiming for the maximum throughput. The proposed accelerator architecture optimizes the use of hardware resources, the data access strategy and, as a result, memory bandwidth. The experiments were conducted using a DNA database with 8 million individuals, in which, each of them is represented using a set of 15 sequences with a length of 256 nucleotides. In this case study, a prototype of the proposed hardware accelerator using a single Stratix IV FPGA and running at the frequency of 250MHz outperforms by tens of times consolidated software applications like SWIPE and FASTA which are running in a GPP platform, as well as an optimized GPU implementation in OpenCL.
Towards better generalization in WLAN positioning systems with genetic algorithms and neural networks
Proceedings of the Genetic and Evolutionary Computation Conference, 2019
The most widely used positioning system today is the GPS (Global Positioning System), which has m... more The most widely used positioning system today is the GPS (Global Positioning System), which has many commercial, civil and military applications, being present in most smartphones. However, this system does not perform well in indoor locations, which poses a constraint for the positioning task on environments like shopping malls, office buildings, and other public places. In this context, WLAN positioning systems based on fingerprinting have attracted a lot of attention as a promising approach for indoor localization while using the existing infrastructure. This paper contributes to this field by presenting a methodology for developing WLAN positioning systems using genetic algorithms and neural networks. The fitness function of the genetic algorithm is based on the generalization capabilities of the network for test points that are not included in the training set. By using this approach, we have achieved state-of-the-art results with few parameters, and our method has shown to be less prone to overfitting than other techniques in the literature, showing better generalization in points that are not recorded on the radio map.
Porosity features extraction based on image segmentation technique applying k-means clustering algorithm
Occupancy Grid Map Estimation Based on Visual SLAM and Ground Segmentation
2021 Latin American Robotics Symposium (LARS), 2021 Brazilian Symposium on Robotics (SBR), and 2021 Workshop on Robotics in Education (WRE), 2021
Feature-based SLAM is efficient, fast, and can offer an accurate localization system; on the othe... more Feature-based SLAM is efficient, fast, and can offer an accurate localization system; on the other hand, the map produced is a sparse representation of the environment, limiting path planning activities and reducing robotic autonomy. We extend this mapping stage to build an occupancy grid map given the sparse point cloud. Our method uses the pose estimation from the SLAM system, its sparse map, and an image segmentation technique. Tests made in synthetic and real-world environments demonstrate maps with high precision and excellent coverage. Furthermore, the application can run in conjunction with the SLAM system in real-time while requiring a low memory footprint. Finally, the map generated represents high-level information that allows a link between a feature-based SLAM and navigation tasks.
Design Automation for Embedded Systems, 2019
A system to automatically recognize vehicle license plates is a growing need to improve safety an... more A system to automatically recognize vehicle license plates is a growing need to improve safety and traffic control, specifically in major urban centers. However, the license plate recognition task is generally computationally intensive, where the entire input image frame is scanned, the found plates are segmented, and character recognition is then performed for each segmented character. This paper presents a methodology for engineering a system to detect and recognize Brazilian license plates using convolutional neural networks (CNN) that is suitable for embedded systems. The resulting system detects license plates in the captured image using Tiny YOLOv3 architecture and identifies its characters using a second convolutional network trained on synthetic images and fine-tuned with real license plate images. The proposed architecture has demonstrated to be robust to angle, lightning, and noise variations while requiring a single forward pass for each network, therefore allowing faster processing compared to other deep learning approaches. Our methodology was validated using real license plate images under different environmental conditions reached a detection rate of 99.37% and an overall recognition rate of 98.43% while showing an average time of 2.70 s to process 1024 × 768 images with a single license plate in a Raspberry Pi3 (ARM Cortex-A53 CPU). To improve the recognition accuracy, an ensemble of CNN models was tested instead of a single CNN model, which resulted in an increase in the average processing time to 4.88 s for each image while increasing the recognition rate to 99.53%. Finally, we discuss the impact of using an ensemble of CNNs considering the accuracy-performance trade-off when engineering embedded systems for license plate recognition.
Journal of Real-Time Image Processing, 2019
Stereo matching approaches are an appealing choice for acquiring depth information in a number of... more Stereo matching approaches are an appealing choice for acquiring depth information in a number of video processing applications. It is desirable that these solutions generate dense, robust disparity maps in real time. However, occlusion regions may disturb the applications that need these maps. Among the best of these approaches is the semi-global matching (SGM) technique. This paper presents an FPGA-based stereo vision system based on SGM. This system calculates disparity maps by streaming, which are scalable to several resolutions and disparity ranges. To increase the robustness of the SGM technique even further, the present work has implemented a combination of the gradient filter and the sampling-insensitive absolute difference in the pre-processing phase. Furthermore, as a post-processing step, this paper proposes a novel streaming architecture to detect noisy and occluded regions. The FPGA-based implementations of the proposed stereo matching system in two distinct heterogeneous architecture (GPP-general purpose processor, and FPGA) were evaluated using the Middlebury stereo vision benchmark. The achieved results reported a frame rate of 25 FPS for the disparity maps processing in HD resolution (1024 × 768 pixels), with 256 disparity levels. The results have demonstrated that the memory utilization, processing performance, and accuracy are among the best of FPGA-based stereo vision systems.
IFAC Proceedings Volumes, 1998
This work presents a method to compute the mutual exclusion degree of processes and of pairs of p... more This work presents a method to compute the mutual exclusion degree of processes and of pairs of processes in order to perform hardware/software partitioning. Considering that our partitioning approach allows for the use of multiple software components , several aspects of multiprocessors systems have to be considered. One of the main problems in systems with multiple processors is the throughput degradation caused by the saturation effects. Most allocation approaches take into account criteria such as interprocessor communication and workload between processors. Another very important aspect which should be considered is the mutual exclusion degree between processes. Mutually exclusive processes are less likely to reduce performance when assigned to one single processor. However, even mutually exclusive processes could allow for distinct levels of concurrency.
Oolong: A Baseband processor extension to the RISC-V ISA
2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2016
RISC-V is an open-source instruction set-architecture, designed to support customized extensions ... more RISC-V is an open-source instruction set-architecture, designed to support customized extensions and architectures. This paper presents an instruction-set extension to the RISC-V ISA, idealized for software-defined radio applications. The custom instructions perform complex-number arithmetic, tailored for complex or quadrature modulation and baseband processing, and can perform one complex multiply-accumulate per cycle. The proposed system architecture includes the processor core, a WISHBONE bus interconnection, IO and peripherals, and was targeted to an Altera Cyclone III FPGA, achieving 0.9 DMIPS/MHz without the use of any compiler optimizations.
Extreme Value Theory for Estimating Task Execution Time Bounds: A Careful Look
2016 28th Euromicro Conference on Real-Time Systems (ECRTS), 2016
Extreme Value Theory (EVT) is a powerful statistical framework for estimating maximum values of r... more Extreme Value Theory (EVT) is a powerful statistical framework for estimating maximum values of random variables and has recently been applied for deriving probabilistic bounds on task execution times (pWCET). Task execution time data are collected from measurements and the maximum measured values are fit to an extreme value model. In this paper we provide a careful study on the applicability and effectiveness of EVT in this application field. The study is based on extensive experiments for which we have designed an embedded platform equipped with random cache of configurable sizes. Based on evidences of the experiments, we provide the following contributions: we give a new definition of pWCET that conforms with the fact that pWCET estimates depend on input data distribution used during analysis, we show that using the Generalized Extreme Value (GEV) distribution is necessary since the more restrictive modeling, based on the Gumbel distribution, may yield unsafe or over-estimated values of pWCET, we confirm that hardware randomization favors the applicability of EVT, although it does not ensure it since the distribution of maxima for execution time data are not guaranteed to be analyzable via EVT.
A Method for Partitioning UNITY
In this paper we introduce a method to partition UNITY system specifications into software and hardware parts. This method considers different design possibilities and defines cost functions to find the most suitable one under given design constraints in terms of hardware-software trade-off.
Methods based on Petri net for resource sharing estimation
Proceedings 13th Symposium on Integrated Circuits and Systems Design (Cat. No.PR00843), 2000
This work presents two approaches for computing the number of functional units in a hardware/software codesign context. The proposed hardware/software codesign framework uses Petri nets as a common formalism for performing quantitative and qualitative analysis. ...
A Petri net based approach for performing the initial allocation in hardware/software codesign
SMC'98 Conference Proceedings. 1998 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.98CH36218)
This work presents a method of hardware/software partitioning considering multiple software components. The proposed method uses Petri nets as a common formalism to perform quantitative and qualitative analysis. The use of Petri nets permits one to use a specification non-...
Balanced Prefetching Aggressiveness Controller for NoC-based Multiprocessor
Proceedings of the 27th Symposium on Integrated Circuits and Systems Design, 2014
The performance gap between the memory hierarchy and the processor is a well-known issue, and prefetching is often used to minimize this problem. This technique fetches data from memory and makes it available in the private cache before it is requested. Thus, as more prefetching transactions are performed (that is, as prefetching becomes more aggressive), the miss rate in the first levels of cache tends to be lower. However, very aggressive prefetching can cause cache pollution, increase network traffic, and thereby degrade system performance. In a multiprocessor platform, the prefetching of one core can interfere with the operation of other cores, since they share resources such as memory and network bandwidth. Very aggressive prefetching by one core can overload the network connection, increasing communication, which delays network requests and increases the penalty in the processor. In this context, this paper presents a Balanced Prefetching Aggressiveness Controller for a multiprocessor platform that minimizes the processor penalty. We tested the proposed controller in a network-based multiprocessor based on the Sparc V8. The results show a reduction of up to 23% (7% on average) in the processor's penalty, an average reduction of 34% in cache pollution, and an increase of 30% in prefetching accuracy for concurrent applications when compared to a system with a fixed prefetching aggressiveness approach.
Towards more reliable embedded systems through a mechanism for monitoring driver devices communication
Fifteenth International Symposium on Quality Electronic Design, 2014
Embedded systems require ever more flexibility. Several systems permit software updates once they are on the market. However, these updates must be reliable; otherwise, the results can be catastrophic. Device drivers may receive such updates and are very vulnerable to this problem, requiring mechanisms that are able to capture errors arising from updates at runtime. This work proposes an approach for runtime error checking in the driver when accessing the device, allowing bugs to be detected throughout the lifetime of the embedded system. The proposed mechanism for capturing errors is composed of two modules: the Monitor of Device/Driver Communication (MDDC) and a set of FSMs. Both modules can be synthesized from a device description at a high level of abstraction. When connected to a hardware platform in an FPGA, they are able to check whether the driver operations will lead to a correct state of operation. Thus, the designer can be sure that the updated device driver is reliable. To validate the technique, drivers for an Ethernet controller and a serial port were developed. Results show the effectiveness in finding device driver errors during runtime, as well as a low MDDC overhead of about 1.5% in terms of performance and footprint area of the system's FPGA prototype.
Genetic Programming - New Approaches and Successful Applications, 2012
Brazilian symposium on computing system engineering
ACM SIGOPS Operating Systems Review, 2012
The paper proposes microsharding, a relational alternative for the recent procedural approaches with large-scale data stores to support OLTP workloads elastically. It employs a declarative specification, called transaction classes, of constraints applied ...
Aquarius
Proceedings of the 20th annual conference on Integrated circuits and systems design - SBCCI '07, 2007
This work presents the development of a dynamically reconfigurable computing platform targeting Virtex-II devices under control of a host system based on the Nios II soft-core processor from Altera. The platform, called Aquarius, controls the system by means of the μCLinux embedded Operating System (OS). Through this OS, an IP-SelectMAP core and its device driver, a Virtex-II FPGA device can