Daniel Mozos | Universidad Complutense de Madrid

Papers by Daniel Mozos

Parametric Pipelined k-Means Implementation for Hyperspectral Processing on Spacecraft Embedded FPGA

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024

k-means stands out as one of the most common clustering algorithms, widely employed for classification in hyperspectral imaging. In this context, large amounts of data are gathered by sensors embedded into satellites with strict constraints in terms of power consumption, weight, physical space, and radiation tolerance. Since communication bandwidth is also limited, data processing must be performed on board. However, meeting all those constraints entails a significant trade-off with computing performance. The aim of this work is to cluster hyperspectral images in real time. Custom hardware has been designed with the objective of reducing overhead and maximizing performance by exploiting several acceleration techniques. The implementation targets a space-grade Xilinx Kintex FPGA, which features low power consumption and is shielded against radiation. The design has a deeply pipelined architecture, able to process all bands of each hyperspectral pixel in parallel. As a consequence, it attains a throughput of 100M hyperspectral pixels per second, even with a modest use of FPGA resources. In addition, it is fully parametric, with on-the-fly adaptation to different kinds of images and clustering configurations. Compared to previous implementations, ours takes advantage of a fully RTL design that avoids CPU bottlenecks and HLS design overheads. It also has a fixed throughput regardless of image or clustering properties, while having lower FPGA resource usage than performance-wise equivalent implementations.
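As a rough software illustration of the step the pipeline accelerates, the toy function below assigns one hyperspectral pixel to its nearest centroid by squared Euclidean distance over all bands. Names and data are illustrative; the hardware computes all band differences in parallel, while this sketch folds them sequentially.

```python
# Minimal sketch of the k-means assignment step (illustrative, not the
# paper's RTL design): each band contributes to a squared Euclidean
# distance, and the nearest centroid wins.

def assign_cluster(pixel, centroids):
    """Return the index of the centroid closest to one hyperspectral pixel.

    pixel:     list of band values (e.g. 224 bands for an AVIRIS pixel)
    centroids: list of k centroids, each a list of band values
    """
    best, best_dist = 0, float("inf")
    for k, c in enumerate(centroids):
        # The FPGA processes all bands of a pixel in parallel; here the
        # per-band squared differences are accumulated one by one.
        d = sum((p - q) ** 2 for p, q in zip(pixel, c))
        if d < best_dist:
            best, best_dist = k, d
    return best

print(assign_cluster([10, 20, 30], [[0, 0, 0], [9, 21, 29]]))  # prints 1
```

The update step (recomputing centroids as cluster means) would follow the same per-band structure, which is what makes the algorithm amenable to a fixed-throughput pipeline.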

An Extremely Pipelined FPGA Implementation of a Lossy Hyperspectral Image Compression Algorithm

IEEE Transactions on Geoscience and Remote Sensing, Oct 1, 2020

Segmented and pipelined execution has been a staple of computing for the past decades: operations over different values can be carried out at the same time, speeding up computations. Hyperspectral image compression processes samples sequentially, exploiting local redundancies to generate a predictable data stream that can be compressed. In this article, we take advantage of a low-complexity predictive lossy compression algorithm that can be executed over an extremely long pipeline of hundreds of stages. We can avoid most stalls and maintain throughput close to the theoretical maximum. The different steps operate over integers with simple arithmetic operations, so they are especially well suited for our FPGA implementation. Results on a Virtex-7 show a maximum frequency of over 300 MHz for a throughput of over 290 MB/s, with a space-qualified Virtex-5 reaching 258 MHz, five times as fast as previous FPGA designs. This shows that a modular pipelined approach is beneficial for these kinds of compression algorithms.
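The integer-only, prediction-plus-quantization style of processing described above can be sketched with a toy codec. This is an assumption-laden stand-in, not the paper's actual predictor: each sample is predicted from the previous reconstructed value and the residual is uniformly quantized, so the reconstruction error stays bounded by half the quantization step.

```python
# Toy predictive lossy codec over integers (illustrative only). The
# encoder tracks the decoder's reconstruction so quantization error
# never accumulates.

def compress(samples, step):
    """Quantize prediction residuals; returns the quantized residuals."""
    out, prev = [], 0
    for s in samples:
        r = s - prev  # prediction residual (predictor = previous value)
        # Round-to-nearest uniform quantization with integer arithmetic.
        q = (r + step // 2) // step if r >= 0 else -((-r + step // 2) // step)
        out.append(q)
        prev += q * step  # mirror the decoder's reconstruction
    return out

def decompress(qs, step):
    out, prev = [], 0
    for q in qs:
        prev += q * step
        out.append(prev)
    return out

samples = [5, 9, 14, 12]
rec = decompress(compress(samples, 2), 2)
print(rec)  # each value within step/2 = 1 of the original
```

Because every stage is a simple integer operation with no data-dependent control flow, each stage maps naturally onto one segment of a long hardware pipeline.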

FPGA Implementation of the Pixel Purity Index Algorithm for Remotely Sensed Hyperspectral Image Analysis

EURASIP Journal on Advances in Signal Processing, Jun 14, 2010

Hyperspectral imaging is an emerging technology in remote sensing which generates hundreds of images, at different wavelength channels, for the same area on the surface of the Earth. In recent years, many algorithms have been developed with the purpose of finding endmembers, assumed to be pure spectral signatures, in remotely sensed hyperspectral data sets. One of the most popular techniques has been the pixel purity index (PPI), but this algorithm is very time-consuming. The reconfigurability, compact size, and high computational power of field-programmable gate arrays (FPGAs) make them particularly attractive for exploitation in remote sensing applications with (near) real-time requirements. In this paper, we present an FPGA design for the implementation of the PPI algorithm. Our systolic array design includes a DMA and implements a prefetching technique to reduce the penalties due to I/O communications. We have also included a hardware module for random number generation. The proposed method has been tested using real hyperspectral data collected by NASA's Airborne Visible/Infrared Imaging Spectrometer over the Cuprite mining district in Nevada. Experimental results reveal that the proposed hardware system is easily scalable and able to provide accurate results with compact size in (near) real time, which makes our reconfigurable system appealing for on-board hyperspectral data processing.
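The PPI inner loop (and the reason it needs hardware random number generation) can be sketched in a few lines: pixels are projected onto random unit directions, called skewers, and the pixels landing at the extremes of each projection accumulate purity scores. This is a hedged software sketch of the general algorithm, not the systolic-array datapath.

```python
# Hedged sketch of the pixel purity index (PPI) loop: the extremes of
# many random projections mark the purest pixels. Illustrative only.
import random

def ppi_scores(pixels, n_skewers, seed=0):
    """Return one purity score per pixel after n_skewers projections."""
    rng = random.Random(seed)
    scores = [0] * len(pixels)
    bands = len(pixels[0])
    for _ in range(n_skewers):
        # A random direction in spectral space (the "skewer").
        skewer = [rng.gauss(0, 1) for _ in range(bands)]
        # Project every pixel onto the skewer.
        proj = [sum(p * s for p, s in zip(px, skewer)) for px in pixels]
        # The two extreme pixels of this projection score a point each.
        scores[proj.index(max(proj))] += 1
        scores[proj.index(min(proj))] += 1
    return scores
```

Each skewer's projections are independent dot products, which is what makes the algorithm both very time-consuming in software and a good fit for a systolic array.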

Dimensionality reduction of hyperspectral images using reconfigurable hardware

Graph embedding (GE) methods have been widely applied for dimensionality reduction of hyperspectral imagery (HSI). However, a major challenge of GE is how to choose proper neighbors for graph construction and explore the spatial information of HSI data. In this paper, we propose an unsupervised dimensionality reduction algorithm called spatial-spectral manifold reconstruction preserving embedding (SSMRPE) for HSI classification. First, a weighted mean filter (WMF) is employed to preprocess the image, with the aim of reducing the influence of background noise. According to the spatial consistency property of HSI, SSMRPE utilizes a new spatial-spectral combined distance (SSCD) to fuse the spatial structure and spectral information for selecting effective spatial-spectral neighbors of HSI pixels. Then, it explores the spatial relationship between each point and its neighbors to adjust the reconstruction weights and improve the efficiency of manifold reconstruction. As a result, the proposed method can extract discriminant features and subsequently improve the classification performance of HSI. Experimental results on the PaviaU and Salinas hyperspectral data sets indicate that SSMRPE achieves better classification results than some state-of-the-art methods.

The Promise of Reconfigurable Computing for Hyperspectral Imaging Onboard Systems: A Review and Trends

Fast processing solutions for compression and/or interpretation of hyperspectral data onboard spacecraft imaging platforms are discussed in this paper, with the aim of enabling a more efficient exploitation of hyperspectral data sets in various applications.

The heterogeneous structure problem in hardware/software codesign

As codesign problems become more and more complex, characterizing the scheduling and allocation details of the tasks with macroscopic magnitudes that are easy to handle can help to solve them efficiently.

Design control in a high-level synthesis system

This article describes the design-control module of the FIDIAS synthesis system, the Design Expert. This module is a rule-based expert system that controls the rest of the algorithmic tools that make up the system. Its main features are reviewed, focusing on the control parameters, the characterization statistics, the objective evaluation functions, and the rules that compose the expert. Finally, the most significant results obtained are briefly reviewed.

FPGA Implementation of the N-FINDR Algorithm for Remotely Sensed Hyperspectral Image Analysis

IEEE Transactions on Geoscience and Remote Sensing, Feb 1, 2012

Hyperspectral remote sensing attempts to identify features on the surface of the Earth using sensors that generally provide large amounts of data. The data are usually collected by a satellite or an airborne instrument and sent to a ground station that processes them. The main bottleneck of this approach is the (often reduced) bandwidth of the connection between the satellite and the station, which drastically limits the information that can be sent and processed in real time. A possible way to overcome this problem is to include onboard computing resources able to preprocess the data, reducing its size by orders of magnitude. Reconfigurable field-programmable gate arrays (FPGAs) are a promising platform that allows hardware/software codesign and has the potential to provide powerful onboard computing capability and flexibility at the same time. Since FPGAs can implement custom hardware solutions, they can reach very high performance levels. Moreover, using run-time reconfiguration, the functionality of the FPGA can be updated at run time as many times as needed to perform different computations, so the FPGA can be reused for several applications, reducing the number of computing resources needed. One of the most popular and widely used techniques for analyzing hyperspectral data is linear spectral unmixing, which relies on the identification of pure spectral signatures via a so-called endmember extraction algorithm. In this paper, we present the first FPGA design for N-FINDR, a widely used endmember extraction algorithm in the literature. Our system includes a direct memory access module and implements a prefetching technique to hide the latency of the input/output communications. The proposed method has been implemented on a Virtex-4 XC4VFX60 FPGA (a model similar to radiation-hardened FPGAs certified for space operation) and tested using real hyperspectral data collected by NASA's Earth Observing-1 Hyperion satellite instrument and the Airborne Visible/Infrared Imaging Spectrometer over the Cuprite mining district in Nevada and the Jasper Ridge Biological Preserve in California. Experimental results demonstrate that our hardware version of the N-FINDR algorithm can significantly outperform an equivalent software version and is able to provide (near) real-time processing performance.
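As a rough illustration of the criterion N-FINDR optimizes, the toy below scores a candidate set of endmembers by the volume of the simplex they span, computed from a determinant. The helper names and the Gaussian-elimination determinant are illustrative assumptions, not the paper's hardware datapath.

```python
# Sketch of the N-FINDR volume criterion: keep the endmember set whose
# simplex has maximal volume. Toy 2-band / 3-endmember case.

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n, d = len(m), 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        if abs(m[p][i]) < 1e-12:
            return 0.0  # singular: degenerate simplex
        if p != i:
            m[i], m[p] = m[p], m[i]
            d = -d
        d *= m[i][i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= f * m[i][c]
    return d

def simplex_volume(endmembers):
    """Volume proportional to |det| of the augmented endmember matrix."""
    e = endmembers
    mat = [[1.0] * len(e)] + [[e[j][b] for j in range(len(e))]
                              for b in range(len(e[0]))]
    return abs(det(mat))
```

N-FINDR iterates over pixels, tentatively swapping each into the current endmember set and keeping the swap whenever this volume grows, which is why fast determinant evaluation dominates the cost.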

A real-time FPGA implementation of the CCSDS 123.0-B-2 standard

IEEE Transactions on Geoscience and Remote Sensing

Hyperspectral images are a useful remote sensing tool that often reaches hundreds of megabytes in size. CCSDS 123.0-B-2 is a recent algorithm that achieves lossless and near-lossless compression of hyperspectral images by introducing a configurable maximum error over its predecessor, CCSDS 123.0-B-1. In this article, a field-programmable gate array (FPGA) implementation of the revised standard that works in real time is presented. We have developed an extremely pipelined and fast core in VHDL, which is able to process one sample per cycle at over 250 MHz, working eight times faster than real time for the Airborne Visible/Infrared Imaging Spectrometer-Next Generation (AVIRIS-NG) sensor. New dependencies in the revised standard are avoided by using a novel sample ordering called frame interleaved by diagonal. The predictor stage has been designed to work in this order, and two reorder buffers encapsulate it to make it compliant with the band-interleaved-by-pixel order. Predictor data are encoded using a novel FPGA implementation of the CCSDS 123.0-B-2 hybrid coder. The modules are tested and verified on a Virtex-7 VC709 board. For medium (256 bands × 4096 frames × 512 samples) and large (512 × 4096 × 1024) images, the core occupies, respectively, 14% and 50% of an XQRKU060 FPGA.
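The idea of a diagonal ordering can be pictured with a small traversal sketch: samples of one frame are emitted along anti-diagonals of the (band, sample) grid, so a value a predictor needs from the previous band was issued several cycles earlier. This is an assumption of how such an order can be generated; the exact frame-interleaved-by-diagonal order defined in the paper may differ.

```python
# Hedged sketch of a diagonal sample ordering for one frame of a
# hyperspectral image: emit (band, x) pairs along anti-diagonals
# band + x = d, for d = 0, 1, 2, ...

def diagonal_order(bands, width):
    """Return the (band, x) visit order for a bands x width frame."""
    order = []
    for d in range(bands + width - 1):
        for z in range(bands):
            x = d - z
            if 0 <= x < width:
                order.append((z, x))
    return order

print(diagonal_order(2, 3))
```

Note how, in the output, (1, 0) appears after (0, 0) with an unrelated sample in between: the cross-band dependency is separated in time, which is what lets a deep pipeline avoid stalls.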

Integrating testability analysis and enhancement into a high-level synthesis tool

This paper presents a new method for synthesizing testable circuits which, by exploring a design space that simultaneously covers area, time, and testability, obtains testable designs with only small area increases over non-testable ones. This is achieved not only by adding BIST hardware but also through the high intrinsic testability of the generated structures. To support these two claims, the results obtained with our system are compared, very favorably, both with non-testable synthesis systems and with those that incorporate testability into their designs.

A global approach to the high-level synthesis problem

This work presents an algorithm that simultaneously performs two of the main tasks of high-level synthesis: operation scheduling and hardware allocation. The algorithm is based on a novel, exhaustive computation of the probabilities associated with the different scheduling and/or allocation alternatives. This computation takes into account the structure of the graph corresponding to the design to be synthesized (operation overlap, hardware reusability, etc.) and the module library used as input. The result is a simple and compact algorithm, with relatively low complexity, that generates satisfactory results.

Noise estimation for hyperspectral subspace identification on FPGAs

The Journal of Supercomputing, 2018

We present a reliable and efficient FPGA implementation of a procedure for the computation of the noise estimation matrix, a key stage for subspace identification of hyperspectral images. Our hardware realization is based on numerically stable orthogonal transformations, avoids the numerical difficulties of the normal equations methods for the solution of linear least squares (LLS) problems, and exploits the special relations between coupled LLS problems arising in the hyperspectral image. Our modular implementation decomposes the QR factorization that comprises a significant part of the cost into a sequence of suboperations, which can be efficiently computed on an FPGA.

FPGA implementation of the principal component analysis algorithm for dimensionality reduction of hyperspectral images

Journal of Real-Time Image Processing, 2016

Remotely sensed hyperspectral imaging is a very active research area, with numerous contributions in the recent scientific literature. The analysis of these images represents an extremely complex procedure from a computational point of view, mainly due to the high dimensionality of the data and the inherent complexity of the state-of-the-art algorithms for processing hyperspectral images. This computational cost represents a significant disadvantage in applications that require real-time response, such as fire tracing, prevention and monitoring of natural disasters, chemical spills, and other environmental pollution. Many of these algorithms include, as one of their fundamental stages, a dimensionality reduction that removes noise and redundant information from the hyperspectral images under analysis. This step makes it possible to significantly reduce the size of the images and, hence, alleviate data storage requirements. However, it is not exempt from computationally complex matrix operations, such as the computation of the eigenvalues and eigenvectors of large and dense matrices. Hence, for the aforementioned applications in which prompt replies are mandatory, this dimensionality reduction must be considerably accelerated, typically through the utilization of high-performance computing platforms. For this purpose, reconfigurable hardware solutions such as field-programmable gate arrays (FPGAs) have become consolidated in recent years as one of the standard choices for the fast processing of remotely sensed hyperspectral images, due to their smaller size, weight, and power consumption when compared with other high-performance computing systems.

In this paper, we propose a reconfigurable hardware implementation of the principal component analysis (PCA) algorithm to carry out the dimensionality reduction in hyperspectral images. Experimental results demonstrate that our hardware version of the PCA algorithm significantly outperforms a commercial software version, which makes our reconfigurable system appealing for onboard hyperspectral data processing. Furthermore, our implementation exhibits real-time performance with regard to the time that the targeted hyperspectral instrument takes to collect the image data.
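The core computation PCA rests on can be sketched compactly: center the data, build the covariance matrix of the bands, and extract the dominant eigenvector, here by power iteration. This is a minimal pure-Python toy under stated assumptions (small data, a single component), far from the FPGA datapath.

```python
# Minimal PCA sketch: covariance of (pixel x band) data plus power
# iteration for the dominant eigenvector. Illustrative only.

def dominant_component(data, iters=200):
    """Return the first principal direction of a list of band vectors."""
    n, b = len(data), len(data[0])
    # Center each band around its mean.
    means = [sum(row[j] for row in data) / n for j in range(b)]
    centered = [[row[j] - means[j] for j in range(b)] for row in data]
    # Sample covariance matrix (b x b).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(b)]
           for i in range(b)]
    # Power iteration: repeatedly apply cov and renormalize.
    v = [1.0] * b
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(b)) for i in range(b)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v
```

The covariance build and the repeated matrix-vector products are exactly the "large and dense matrix operations" the abstract identifies as the acceleration target.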

FPGA Implementation of the CCSDS 1.2.3 Standard for Real-Time Hyperspectral Lossless Compression

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2018

Hyperspectral images taken by satellites pose a challenge for data transmission: communication with Earth's antennas is usually time restricted and bandwidth is very limited. The CCSDS 1.2.3 algorithm mitigates this issue by defining a lossless compression standard for this kind of data, allowing more efficient usage of the transmission link. Reconfigurable field-programmable gate arrays (FPGAs) are promising platforms that provide powerful on-board computing capabilities and flexibility at the same time. In this paper, we present an FPGA implementation of the CCSDS 1.2.3 algorithm. The proposed method has been implemented on the Virtex-4 XC4VFX60 FPGA (the commercial equivalent of the space-qualified Virtex-4QV XQR4VFX60 FPGA) and on the Virtex-7 XC7VX690T, and tested using real hyperspectral data collected by NASA's Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and two procedurally generated synthetic images. Our design, occupying a mere third of the Virtex-4 XC4VFX60 FPGA, has very low power consumption and achieves real-time compression for hyperspectral imaging devices such as NASA's AVIRIS-NG. To this end, we use the board's memory as a cache for input data, which allows us to process images as streams of data, completely eliminating storage needs. All these factors make it a great option for on-satellite compression.

Run-Time Minimization of Reconfiguration Overhead in Dynamically Reconfigurable Systems

Lecture Notes in Computer Science, 2003

Dynamically reconfigurable hardware (DRHW) can take advantage of its reconfiguration capability to adapt its performance and energy consumption at run time. However, due to the lack of programming support for dynamic task placement on these platforms, little previous work has studied these run-time performance/power trade-offs. To cope with the task placement problem, we have adopted an interconnection-network-based DRHW model with specific support for reallocating tasks at run time. On top of it, we have applied an emerging task concurrency management (TCM) methodology previously applied to multiprocessor platforms. We have identified that the reconfiguration overhead can drastically affect both system performance and energy consumption. Hence, we have developed two new modules for the TCM run-time scheduler that minimize these effects. The first module reuses previously loaded configurations, whereas the second minimizes the impact of the reconfiguration latency by applying a configuration prefetching technique. With these techniques, the reconfiguration overhead is reduced by a factor of 4.
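The two overhead-reduction ideas can be sketched with a hypothetical, minimal scheduler model: prefer a unit that already holds the required configuration (reuse), and otherwise start loading it on the earliest-free unit so the reconfiguration latency overlaps the wait for the task's ready time (prefetching). All names and the single-task interface are assumptions for illustration, not the TCM scheduler's actual API.

```python
# Hypothetical sketch of configuration reuse + prefetching for one task.
# The model ignores unit bookkeeping after placement; it only shows how
# each technique removes or hides reconfiguration latency.

def schedule_load(loaded, unit_free_at, task, load_time):
    """Return (unit, start_of_execution) for one task.

    loaded:       dict unit -> configuration currently held by that unit
    unit_free_at: dict unit -> time at which the unit becomes idle
    task:         (name, configuration, ready_time)
    """
    name, config, ready = task
    # 1) Configuration reuse: a unit that already holds the bitstream
    #    skips the reconfiguration entirely.
    for u, cfg in loaded.items():
        if cfg == config:
            return u, max(ready, unit_free_at[u])
    # 2) Prefetching: begin loading on the earliest-free unit before the
    #    task is ready, hiding (part of) the reconfiguration latency.
    u = min(unit_free_at, key=unit_free_at.get)
    load_done = unit_free_at[u] + load_time
    loaded[u] = config
    return u, max(ready, load_done)
```

When the load finishes before the ready time, the reconfiguration latency is fully hidden and the task starts as if the overhead were zero.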

Application of task concurrency management on dynamically reconfigurable hardware platforms

11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2003)

Dynamically reconfigurable hardware (DRHW) can take advantage of its reconfiguration capability to adapt its performance and power consumption at run time. However, due to the lack of programming support for dynamic task placement on these platforms, no previous work has studied these performance/power trade-offs. To cope with the task placement problem in a straightforward way that allows us to go one step further, we have adopted an interconnection-network-based DRHW model, which includes operating system support to reallocate tasks at run time. On top of this model, we have applied an emerging task concurrency management (TCM) methodology initially developed for multiprocessor platforms, with promising results. Moreover, we have identified the next step needed to create specific TCM support for DRHW platforms.

A Hardware/Software Partitioning and Scheduling Approach for Embedded Systems with Low-Power and High Performance Requirements

Lecture Notes in Computer Science, 2003

Hardware/software (hw/sw) partitioning largely affects the system cost, performance, and power consumption. Most previous hw/sw partitioning approaches focus on optimising either the hw area or the performance, and thus ignore the influence of the partitioning process on the energy consumption. However, during this process the designer still has maximum flexibility; hence, it is clearly the best moment to analyse the energy consumption. We have developed a new hw/sw partitioning and scheduling tool that reduces the energy consumption of an embedded system while meeting high performance constraints. We have applied it to two current multimedia applications, saving up to 30% of the system energy without reducing the performance.

Efficient scheduling for mobile time-constrained environments

Electronics Letters, 2007

Data broadcasting has been recognised as a very effective data delivery mechanism in mobile computing environments. An efficient scheduling algorithm for the delivery of broadcast information is presented and evaluated, which performs well when time and bandwidth restrictions are present.

A vertex-list approach to 2D HW multitasking management in RTR FPGAs

This paper presents a novel approach to the management of run-time reconfigurable resources by an operating system with extended hardware multitasking functionality. Rectangular hardware tasks are placed at free locations in a two-dimensional reconfigurable resource. Area management is done with techniques derived from bin-packing heuristics. A structure consisting of a set of vertex lists, each one describing the contour of one unoccupied area fragment in the reconfigurable device, is presented. Some vertices of these structures may be used as candidate locations for the tasks, with bottom-left or top-right heuristics. We show that our approach has reasonable complexity and gives better results, in terms of device fragmentation, than similar approaches.
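The bottom-left placement heuristic mentioned above can be illustrated with a simple stand-in: instead of the paper's vertex-list contours, a toy occupancy grid is scanned for the lowest, leftmost spot that fits a rectangular task. The grid model is an assumption chosen for brevity, not the paper's data structure.

```python
# Toy bottom-left placement on an occupancy grid (0 = free, 1 = used).
# The real manager walks vertex lists describing free-area contours;
# a grid scan is a much simpler stand-in for the same heuristic.

def bottom_left_place(grid, w, h):
    """Place a w x h task; return its (row, col) or None if it won't fit."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows - h + 1):          # lowest row first
        for c in range(cols - w + 1):      # leftmost column first
            if all(grid[r + dr][c + dc] == 0
                   for dr in range(h) for dc in range(w)):
                for dr in range(h):        # mark the area as occupied
                    for dc in range(w):
                        grid[r + dr][c + dc] = 1
                return (r, c)
    return None
```

The vertex-list structure reaches the same candidate corners without scanning every cell, which is where the complexity advantage over a plain grid comes from.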

A hybrid design-time/run-time scheduling flow to minimise the reconfiguration overhead of FPGAs

Microprocessors and Microsystems, 2004

Current multimedia applications are characterized by highly dynamic and non-deterministic behaviour as well as high-performance requirements. Potentially, partially reconfigurable fine-grain configurable architectures like FPGAs can be reconfigured at run time to match this dynamic behaviour. However, the lack of programming support for dynamic task placement, as well as the large configuration overhead, has prevented their use for highly dynamic applications. To cope with these two problems, we have adopted an FPGA model with specific support for task allocation. On top of this model, we have applied an existing hybrid design-time/run-time scheduling flow initially developed for multiprocessor systems. Finally, we have extended this flow with specific modules that greatly reduce the reconfiguration overhead, making it affordable for current multimedia applications.

Research paper thumbnail of Parametric Pipelined <i>k</i>-Means Implementation for Hyperspectral Processing on Spacecraft Embedded FPGA

IEEE journal of selected topics in applied earth observations and remote sensing, 2024

k-means stands out as one of the most common clustering algorithms, widely employed for classific... more k-means stands out as one of the most common clustering algorithms, widely employed for classification in hyperspectral imaging. In this context, large amounts of data are gathered by sensors that are embedded into satellites with strict constraints in terms of power consumption, weight, physical space or radiation tolerance. Since communication bandwidth is also limited, data processing must be performed on board. However, meeting all those constraints also entails a significant trade-off with computing performance. The aim of this work is clustering hyperspectral images in real-time. Custom hardware has been designed with the objective of reducing overhead and maximizing performance, by exploiting several acceleration techniques. The implementation targets a space-grade Xilinx Kintex FPGA, that features low power consumption and is shielded against radiation. The design has a deep pipelined architecture, able to process all bands of each hyperspectral pixel in parallel. In consequence, it attains a throughput of 100M hyperspectral pixels per second, even with a discrete use of FPGA resources. In addition, it is also fully parametric, with on-the-fly adaptation to different kinds of images and clustering configurations. Compared to previous implementations, ours takes advantage of a fully RTL design that avoids CPU bottlenecks and HLS design overheads. It also has a fixed throughput regardless of image or clustering properties, while having lower FPGA resource usage than performance-wise equivalent implementations.

Research paper thumbnail of An Extremely Pipelined FPGA Implementation of a Lossy Hyperspectral Image Compression Algorithm

IEEE Transactions on Geoscience and Remote Sensing, Oct 1, 2020

Segmented and pipelined execution has been a staple of computing for the past decades. Operations... more Segmented and pipelined execution has been a staple of computing for the past decades. Operations over different values can be carried out at the same time speeding up computations. Hyperspectral image compression sequentially processes samples, exploiting local redundancies to generate a predictable data stream that can be compressed. In this article, we take advantage of a low complexity predictive lossy compression algorithm which can be executed over an extremely long pipeline of hundreds of stages. We can avoid most stalls and maintain throughput close to the theoretical maximum. The different steps operate over integers with simple arithmetic operations, so they are especially well-suited for our FPGA implementation. Results on a Virtex-7 show a maximum frequency of over 300 MHz for a throughput of over 290 MB/s, with a space-qualified Virtex-5 reaching 258 MHz, being five times as fast as the previous FPGA designs. This shows that a modular pipelined approach is beneficial for these kinds of compression algorithms.

Research paper thumbnail of FPGA Implementation of the Pixel Purity Index Algorithm for Remotely Sensed Hyperspectral Image Analysis

EURASIP Journal on Advances in Signal Processing, Jun 14, 2010

Hyperspectral imaging is a new emerging technology in remote sensing which generates hundreds of ... more Hyperspectral imaging is a new emerging technology in remote sensing which generates hundreds of images, at different wavelength channels, for the same area on the surface of the Earth. Over the last years, many algorithms have been developed with the purpose of finding endmembers, assumed to be pure spectral signatures in remotely sensed hyperspectral data sets. One of the most popular techniques has been the pixel purity index (PPI). This algorithm is very time-consuming. The reconfigurability, compact size, and high computational power of Field programmable gate arrays (FPGAs) make them particularly attractive for exploitation in remote sensing applications with (near) real-time requirements. In this paper, we present an FPGA design for implementation of the PPI algorithm. Our systolic array design includes a DMA and implements a prefetching technique to reduce the penalties due to the I/O communications. We have also included a hardware module for random number generation. The proposed method has been tested using real hyperspectral data collected by NASA's Airborne Visible Infrared Imaging Spectrometer over the Cuprite mining district in Nevada. Experimental results reveal that the proposed hardware system is easily scalable and able to provide accurate results with compact size in (near) real-time, which make our reconfigurable system appealing for on-board hyperspectral data processing.

Research paper thumbnail of Dimensionality reduction of hyperspectral images using reconfigurable hardware

The graph embedding (GE) methods have been widely applied for dimensionality reduction of hypersp... more The graph embedding (GE) methods have been widely applied for dimensionality reduction of hyperspectral imagery (HSI). However, a major challenge of GE is how to choose the proper neighbors for graph construction and explore the spatial information of HSI data. In this paper, we proposed an unsupervised dimensionality reduction algorithm called spatial-spectral manifold reconstruction preserving embedding (SSMRPE) for HSI classification. At first, a weighted mean filter (WMF) is employed to preprocess the image, which aims to reduce the influence of background noise. According to the spatial consistency property of HSI, SSMRPE utilizes a new spatial-spectral combined distance (SSCD) to fuse the spatial structure and spectral information for selecting effective spatial-spectral neighbors of HSI pixels. Then, it explores the spatial relationship between each point and its neighbors to adjust the reconstruction weights to improve the efficiency of manifold reconstruction. As a result, the proposed method can extract the discriminant features and subsequently improve the classification performance of HSI. The experimental results on the PaviaU and Salinas hyperspectral data sets indicate that SSMRPE can achieve better classification results in comparison with some state-of-the-art methods.

Research paper thumbnail of The Promise of Reconfigurable Computing for Hyperspectral Imaging Onboard Systems: A Review and Trends

Fast processing solutions for compression and/or interpretation of hyperspectral data onboard spacecraft imaging platforms are discussed in this paper with the purpose of giving a more efficient exploitation of hyperspectral data sets in various applications.

Research paper thumbnail of The heterogeneous structure problem in hardware/software codesign

As codesign problems become more and more complex, characterizing the scheduling and allocation details of the tasks with macroscopic, easy-to-handle magnitudes can help to solve them in an efficient way.

Research paper thumbnail of Control del diseño en un sistema de síntesis de alto nivel

This article describes the design-control module of the FIDIAS synthesis system: the Design Expert. This module is a rule-based expert system that controls the rest of the algorithmic tools that make up the system. Its main characteristics are reviewed, focusing chiefly on the control parameters, the characterization statistics, the objective evaluation functions, and the rules that make up the expert. Finally, the most significant results obtained are briefly reviewed.

Research paper thumbnail of FPGA Implementation of the N-FINDR Algorithm for Remotely Sensed Hyperspectral Image Analysis

IEEE Transactions on Geoscience and Remote Sensing, Feb 1, 2012

Hyperspectral remote sensing attempts to identify features in the surface of the Earth using sensors that generally provide large amounts of data. The data are usually collected by a satellite or an airborne instrument and sent to a ground station that processes it. The main bottleneck of this approach is the (often reduced) bandwidth connection between the satellite and the station, which drastically limits the information that can be sent and processed in real time. A possible way to overcome this problem is to include onboard computing resources able to preprocess the data, reducing its size by orders of magnitude. Reconfigurable field-programmable gate arrays (FPGAs) are a promising platform that allows hardware/software codesign and the potential to provide powerful onboard computing capability and flexibility at the same time. Since FPGAs can implement custom hardware solutions, they can reach very high performance levels. Moreover, using run-time reconfiguration, the functionality of the FPGA can be updated at run time as many times as needed to perform different computations. Hence, the FPGA can be reused for several applications reducing the number of computing resources needed. One of the most popular and widely used techniques for analyzing hyperspectral data is linear spectral unmixing, which relies on the identification of pure spectral signatures via a so-called endmember extraction algorithm. In this paper, we present the first FPGA design for N-FINDR, a widely used endmember extraction algorithm in the literature. Our system includes a direct memory access module and implements a prefetching technique to hide the latency of the input/output communications.
The proposed method has been implemented on a Virtex-4 XC4VFX60 FPGA (a model that is similar to radiation-hardened FPGAs certified for space operation) and tested using real hyperspectral data collected by NASA's Earth Observing-1 Hyperion (a satellite instrument) and the Airborne Visible Infra-Red Imaging Spectrometer over the Cuprite mining district in Nevada and the Jasper Ridge Biological Preserve in California. Experimental results demonstrate that our hardware version of the N-FINDR algorithm can significantly outperform an equivalent software version.
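The volume-maximization loop at the heart of N-FINDR can be sketched in a few lines. This is a software model under simplifying assumptions: pixels are taken to be already reduced to p-1 dimensions, and the plain swap-until-stable search stands in for the paper's pipelined hardware search.

```python
import numpy as np

def simplex_volume(vertices):
    """Proportional to the volume of the simplex spanned by p vertices
    given in (p-1)-dimensional (already reduced) space."""
    m = np.vstack([np.ones(len(vertices)), vertices.T])
    return abs(np.linalg.det(m))

def n_findr(pixels, p, seed=0):
    """Find p endmember candidates: repeatedly swap a selected pixel for
    any other pixel whenever the swap enlarges the simplex volume."""
    rng = np.random.default_rng(seed)
    idx = list(rng.choice(len(pixels), size=p, replace=False))
    improved = True
    while improved:
        improved = False
        for j in range(p):
            for i in range(len(pixels)):
                trial = idx[:j] + [i] + idx[j + 1:]
                if simplex_volume(pixels[trial]) > simplex_volume(pixels[idx]):
                    idx, improved = trial, True
    return sorted(idx)
```

The intuition is geometric: pure pixels sit at the corners of the data simplex, so the p pixels spanning the largest simplex are the endmember estimates.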

Research paper thumbnail of A real-time FPGA implementation of the CCSDS 123.0-B-2 standard

IEEE Transactions on Geoscience and Remote Sensing

Hyperspectral images are a useful remote sensing tool that often reach hundreds of megabytes in size. CCSDS 123.0-B-2 is a recent algorithm that achieves lossless and near-lossless compression of hyperspectral images by introducing a configurable maximum error over its predecessor, CCSDS 123.0-B-1. In this article, a field-programmable gate array (FPGA) implementation of the revised standard that works in real time is presented. We have developed an extremely pipelined and fast core in VHDL that is able to process a sample per cycle at over 250 MHz, working eight times faster than in real time for the Airborne Visible-Infrared Imaging Spectrometer-Next Generation (AVIRIS-NG) sensor. New dependencies in the revised standard are avoided by using a novel sample ordering called frame interleaved by diagonal. The predictor stage has been designed to work in this order, and two reorder buffers encapsulate it to be band-interleaved-by-pixel compliant. Predictor data are encoded using a novel FPGA implementation of the CCSDS 123.0-B-2 hybrid coder. The modules are tested and verified on a Virtex-7 VC709 board. For medium (256 bands × 4096 frames × 512 samples) and large (512 × 4096 × 1024) images, the core occupies, respectively, 14% and 50% of an XQRKU060 FPGA.
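The frame-interleaved-by-diagonal idea can be illustrated with a toy index generator: within a frame, samples whose band index z and spatial index x lie on the same anti-diagonal (z + x constant) have no prediction dependencies on each other, so they can enter the pipeline back to back. The traversal below is a simplified illustration of that principle, not the exact order produced by the core.

```python
def diagonal_order(bands, samples):
    """Yield (band, sample) index pairs one anti-diagonal at a time."""
    for d in range(bands + samples - 1):
        for z in range(bands):          # walk down the current diagonal
            x = d - z
            if 0 <= x < samples:
                yield (z, x)
```

In this ordering a sample at (z, x) is emitted only after (z, x-1) and (z-1, x), which both sit on the previous diagonal, so a one-sample-per-cycle pipeline never stalls waiting for a same-diagonal neighbour.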

Research paper thumbnail of Integración del análisis y mejora de la testabilidad en una herramienta de SAN

This work presents a new method for the synthesis of testable circuits which, by exploring a design space that includes area, time, and testability simultaneously, obtains testable results with small area increases over non-testable designs. This is achieved not only by adding BIST hardware but also through the high intrinsic testability of the generated structures. To support these two claims, the results obtained with our system are compared, very favourably, both with non-testable synthesis systems and with those that incorporate testability into their designs.

Research paper thumbnail of Un enfoque global al problema de la síntesis de alto nivel

This work presents an algorithm that simultaneously performs two of the main tasks of high-level synthesis: operation scheduling and hardware allocation. The algorithm is based on a novel, exhaustive computation of the probabilities associated with the different scheduling and/or allocation alternatives. This computation takes into account considerations related to the structure of the graph of the design to be synthesized (operation overlap, hardware reusability, etc.) and to the module library used as input. The result is a simple, compact algorithm with relatively low complexity that produces satisfactory results.

Research paper thumbnail of Noise estimation for hyperspectral subspace identification on FPGAs

The Journal of Supercomputing, 2018

We present a reliable and efficient FPGA implementation of a procedure for the computation of the noise estimation matrix, a key stage for subspace identification of hyperspectral images. Our hardware realization is based on numerically stable orthogonal transformations, avoids the numerical difficulties of the normal equations methods for the solution of linear least squares problems (LLS), and exploits the special relations between coupled LLS problems arising in the hyperspectral image. Our modular implementation decomposes the QR factorization that comprises a significant part of the cost into a sequence of suboperations, which can be efficiently computed on an FPGA.

Research paper thumbnail of FPGA implementation of the principal component analysis algorithm for dimensionality reduction of hyperspectral images

Journal of Real-Time Image Processing, 2016

Remotely sensed hyperspectral imaging is a very active research area, with numerous contributions in the recent scientific literature. The analysis of these images represents an extremely complex procedure from a computational point of view, mainly due to the high dimensionality of the data and the inherent complexity of the state-of-the-art algorithms for processing hyperspectral images. This computational cost represents a significant disadvantage in applications that require real-time response, such as fire tracing, prevention and monitoring of natural disasters, chemical spills, and other environmental pollution. Many of these algorithms consider, as one of their fundamental stages to fully process a hyperspectral image, a dimensionality reduction in order to remove noise and redundant information in the hyperspectral images under analysis. Therefore, it is possible to significantly reduce the size of the images and, hence, alleviate data storage requirements. However, this step is not exempt from computationally complex matrix operations, such as the computation of the eigenvalues and the eigenvectors of large and dense matrices. Hence, for the aforementioned applications in which prompt replies are mandatory, this dimensionality reduction must be considerably accelerated, typically through the utilization of high-performance computing platforms. For this purpose, reconfigurable hardware solutions such as field-programmable gate arrays have been consolidated during the last years as one of the standard choices for the fast processing of hyperspectral remotely sensed images due to their smaller size, weight and power consumption when compared with other high-performance computing systems.
In this paper, we propose the implementation in reconfigurable hardware of the principal component analysis (PCA) algorithm to carry out the dimensionality reduction in hyperspectral images. Experimental results demonstrate that our hardware version of the PCA algorithm significantly outperforms a commercial software version, which makes our reconfigurable system appealing for onboard hyperspectral data processing. Furthermore, our implementation exhibits real-time performance with regard to the time that the targeted hyperspectral instrument takes to collect the image data.
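The dimensionality-reduction step itself boils down to a handful of dense linear algebra operations, which a short software model makes concrete. This is a plain NumPy sketch of textbook PCA on a hyperspectral cube; the paper's hardware pipeline computes the same stages with its own internal organization.

```python
import numpy as np

def pca_reduce(cube, k):
    """Project a hyperspectral cube (rows, cols, bands) onto its k leading
    principal components via the band covariance matrix."""
    rows, cols, bands = cube.shape
    x = cube.reshape(-1, bands).astype(float)
    x -= x.mean(axis=0)                     # center each band
    cov = (x.T @ x) / (x.shape[0] - 1)      # bands x bands covariance
    _, vecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :k]              # k leading eigenvectors
    return (x @ top).reshape(rows, cols, k)
```

The covariance computation and the eigendecomposition of a dense bands × bands matrix are exactly the operations the abstract flags as the computational bottleneck for real-time use.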

Research paper thumbnail of FPGA Implementation of the CCSDS 1.2.3 Standard for Real-Time Hyperspectral Lossless Compression

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2018

Hyperspectral images taken by satellites pose a challenge for data transmission. Communication with Earth's antennas is usually time restricted and bandwidth is very limited. The CCSDS 1.2.3 algorithm mitigates this issue by defining a lossless compression standard for this kind of data, allowing more efficient usage of the transmission link. Reconfigurable field-programmable gate arrays (FPGAs) are promising platforms that provide powerful on-board computing capabilities and flexibility at the same time. In this paper, we present an FPGA implementation of the CCSDS 1.2.3 algorithm. The proposed method has been implemented on the Virtex-4 XC4VFX60 FPGA (the commercial equivalent of the space-qualified Virtex-4QV XQR4VFX60 FPGA) and on the Virtex-7 XC7VX690T, and tested using real hyperspectral data collected by NASA's Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS) and two procedurally generated synthetic images. Our design, occupying a mere third of the Virtex-4 XC4VFX60 FPGA, has a very low power consumption and achieves real-time compression for hyperspectral imaging devices such as NASA's AVIRIS-NG. For this, we use the board's memory as a cache for input data, which allows us to process images as streams of data, completely eliminating storage needs. All these factors make it a great option for on-satellite compression.

Research paper thumbnail of Run-Time Minimization of Reconfiguration Overhead in Dynamically Reconfigurable Systems

Lecture Notes in Computer Science, 2003

Dynamically Reconfigurable Hardware (DRHW) can take advantage of its reconfiguration capability to adapt at run-time its performance and its energy consumption. However, due to the lack of programming support for dynamic task placement on these platforms, little previous work has been presented studying these run-time performance/power trade-offs. To cope with the task placement problem we have adopted an interconnection-network-based DRHW model with specific support for reallocating tasks at run-time. On top of it, we have applied an emerging task concurrency management (TCM) methodology previously applied to multiprocessor platforms. We have identified that the reconfiguration overhead can drastically affect both the system performance and energy consumption. Hence, we have developed two new modules for the TCM run-time scheduler that minimize these effects. The first module reuses previously loaded configurations, whereas the second minimizes the impact of the reconfiguration latency by applying a configuration prefetching technique. With these techniques reconfiguration overhead is reduced by a factor of 4.
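The effect of the first module, configuration reuse, can be captured with a toy run-time model. The class name, the FIFO victim policy, and the fixed cost below are illustrative assumptions, not the scheduler's actual policy.

```python
class ReconfigManager:
    """Tracks which configuration each reconfigurable unit holds and
    charges a reconfiguration cost only when a task's configuration is
    not already loaded on some unit (configuration reuse)."""

    def __init__(self, n_units, reconfig_cost):
        self.units = [None] * n_units
        self.cost = reconfig_cost
        self.victim = 0                    # simple FIFO replacement

    def run(self, task):
        """Return the reconfiguration overhead paid to run `task`."""
        if task in self.units:             # reuse: already on chip
            return 0
        self.units[self.victim] = task     # load onto a victim unit
        self.victim = (self.victim + 1) % len(self.units)
        return self.cost

m = ReconfigManager(n_units=2, reconfig_cost=4)
overheads = [m.run(t) for t in ["A", "B", "A", "B", "A", "B"]]
# overheads == [4, 4, 0, 0, 0, 0]: only the first two runs pay the cost
```

In this alternating pattern reuse cuts the total overhead from 24 to 8; prefetching, the second module, further hides what remains by loading the next configuration while the current task executes.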

Research paper thumbnail of Application of task concurrency management on dynamically reconfigurable hardware platforms

11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2003. FCCM 2003.

Dynamically Reconfigurable Hardware (DRHW) can take advantage of its reconfiguration capability to adapt at run-time its performance and its power consumption. However, due to the lack of programming support for dynamic task placement on these platforms, no previous work has been presented studying the performance/power trade-offs. To cope with the task placement problem in a way that allows us to go one step further, we have adopted an interconnection-network-based DRHW model, which includes operating system support to reallocate tasks at run-time. On top of this model we have applied an emerging task concurrency management (TCM) methodology initially developed for multiprocessor platforms, with promising results. Moreover, we have identified the next step needed to create specific TCM support for DRHW platforms.

Research paper thumbnail of A Hardware/Software Partitioning and Scheduling Approach for Embedded Systems with Low-Power and High Performance Requirements

Lecture Notes in Computer Science, 2003

Hardware/software (hw/sw) partitioning largely affects the system cost, performance, and power consumption. Most of the previous hw/sw partitioning approaches are focused on optimising either the hw area or the performance. Thus, they ignore the influence of the partitioning process on the energy consumption. However, during this process the designer still has the maximum flexibility; hence, it is clearly the best moment to analyse the energy consumption. We have developed a new hw/sw partitioning and scheduling tool that reduces the energy consumption of an embedded system while meeting high performance constraints. We have applied it to two current multimedia applications, saving up to 30% of the system energy without reducing the performance.

Research paper thumbnail of Efficient scheduling for mobile time-constrained environments

Electronics Letters, 2007

Data broadcasting has been recognised as a very effective data delivery mechanism in mobile computing environments. A scheduling algorithm for the delivery of the broadcast information is presented and evaluated, and shown to perform efficiently when time and bandwidth restrictions are present.

Research paper thumbnail of A vertex-list approach to 2D HW multitasking management in RTR FPGAs

This paper presents a novel approach to the management of run-time reconfigurable resources by an operating system with extended hardware multitasking functionality. Rectangular hardware tasks are placed at free locations in a two-dimensional reconfigurable resource. Area management is done with techniques derived from bin-packing heuristics. A structure consisting of a set of vertex lists, each one describing the contour of each unoccupied area fragment in the reconfigurable device, is presented. Some vertices of such structures may be used as candidate locations for the tasks, with bottom-left or top-right heuristics. We show that our approach has a reasonable complexity and gives better results, in terms of device fragmentation, than similar approaches.
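The bottom-left placement rule that the vertex lists accelerate can be illustrated on a plain occupancy-grid model. The paper's contribution is precisely avoiding this exhaustive grid scan by walking the contours of the free fragments; the sketch below only shows the placement heuristic itself, on an assumed grid representation.

```python
import numpy as np

def bottom_left_place(grid, w, h):
    """Place a w x h task at the bottom-left-most free position of a 2-D
    reconfigurable area modelled as a boolean occupancy grid (row 0 is
    the bottom). Returns the (x, y) corner used, or None if nothing fits."""
    rows, cols = grid.shape
    for y in range(rows - h + 1):            # bottom rows first
        for x in range(cols - w + 1):        # then leftmost columns
            if not grid[y:y + h, x:x + w].any():
                grid[y:y + h, x:x + w] = True
                return (x, y)
    return None                              # fragmentation: no free slot
```

Scanning the whole grid per placement costs O(rows × cols × w × h); testing only the vertices of the free-space contours, as the paper proposes, restricts the candidates to a short list while preserving the bottom-left (or top-right) preference.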

Research paper thumbnail of A hybrid design-time/run-time scheduling flow to minimise the reconfiguration overhead of FPGAs

Microprocessors and Microsystems, 2004

Current multimedia applications are characterized by highly dynamic and non-deterministic behaviour as well as high-performance requirements. Potentially, partially reconfigurable fine-grain configurable architectures like FPGAs can be reconfigured at run-time to match the dynamic behaviour. However, the lack of programming support for dynamic task placement as well as the large configuration overhead has prevented their use for highly dynamic applications. To cope with these two problems, we have adopted an FPGA model with specific support for task allocation. On top of this model, we have applied an existing hybrid design-time/run-time scheduling flow initially developed for multiprocessor systems. Finally, we have extended this flow with specific modules that greatly reduce the reconfiguration overhead making it affordable for current multimedia applications.