Miriam Leeser | Northeastern University

Papers by Miriam Leeser

Verifying a logic synthesis tool in Nuprl

Computer Aided Verification, 1993


Effect of data truncation in an implementation of pixel clustering on a custom computing machine

Reconfigurable Technology: FPGAs for Computing and Applications II, 2000


Algorithmic transformations in the implementation of K-means clustering on reconfigurable hardware

Proceedings of the 2001 ACM/SIGDA ninth international symposium on Field programmable gate arrays - FPGA '01, 2001


Rothko: A three dimensional FPGA architecture, its fabrication, and design tools

Lecture Notes in Computer Science, 1997


From programs to transistors: Verifying hardware synthesis tools

Lecture Notes in Computer Science, 1990

We describe a project for synthesizing circuits from a high-level language description. The aims of this project are to guarantee the correctness of the resulting designs while allowing the designer flexibility in interacting with the system. In this paper we discuss two components of the project. The first starts with a state transition system and generates a specification of a datapath and an implementation of a controller as a microcode ROM. The second generates correct CMOS implementations of Boolean expressions; it produces highly optimized circuits that contain transmission gates as well as series and parallel networks of transistors. These two components are part of a larger goal: to go from programs to transistors with a flexible, yet guaranteed correct, system.


Efficient FPGA implementation of QR decomposition using a systolic array architecture

Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays - FPGA '08, 2008

QR decomposition is used in many signal processing applications. We have implemented a systolic array QR decomposition on a Xilinx Virtex-5 FPGA using the Givens rotation algorithm. It uses a truly two-dimensional systolic array architecture, so latency scales well for large matrices. To accommodate the dynamic range of the input data, we use floating-point arithmetic from the Northeastern University Variable Precision Floating-Point (VFloat) library, and we support any general floating-point format, including IEEE single precision. Our design uses straightforward floating-point divide and square root implementations, whereas prior work relies on special operations or formats such as CORDIC or the logarithmic number system (LNS). This makes our design more standard and portable across systems, and thus easier to fit into a larger design. We support square, tall, and short matrices, and the input matrix size can be configured at compile time to virtually any size, so the design can easily be scaled to future, larger FPGA devices or across multiple FPGAs. The QR module is fully pipelined, with a throughput of over 130 MHz for IEEE single-precision floating-point format. This implementation achieves a peak throughput of 35 GFLOPS for a 12 × 12 matrix.
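
As a rough software analogue of the Givens rotation algorithm named above, the sketch below computes a QR decomposition in plain Python double precision (not the VFloat library, and not a systolic array). Each rotation zeroes one subdiagonal entry using one square root and two divides, the kind of plain divide and square root operations the abstract says the design implements directly rather than through CORDIC or LNS. The function name `givens_qr` is hypothetical.

```python
import math


def givens_qr(a):
    """Factor an m x n matrix A (list of rows) as A = Q R using Givens rotations.

    Returns (q, r): q is m x m orthogonal, r is m x n upper triangular.
    """
    m, n = len(a), len(a[0])
    r = [list(map(float, row)) for row in a]  # working copy that becomes R
    # Q starts as the identity and accumulates the transposed rotations.
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(min(m - 1, n)):
        # Zero column j below the diagonal, working from the bottom row up.
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue  # already zero, no rotation needed
            h = math.hypot(x, y)  # one square root per rotation
            c, s = x / h, y / h   # two divides per rotation
            # Rotate rows i-1 and i of R so that r[i][j] becomes zero.
            for k in range(n):
                t1, t2 = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * t1 + s * t2
                r[i][k] = -s * t1 + c * t2
            # Apply the transposed rotation to columns i-1 and i of Q.
            for k in range(m):
                t1, t2 = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * t1 + s * t2
                q[k][i] = -s * t1 + c * t2
    return q, r
```

For example, `q, r = givens_qr([[12, -51, 4], [6, 167, -68], [-4, 24, -41]])` returns factors whose product reproduces the input up to rounding. In a typical systolic QR array, the cells on the diagonal compute `c` and `s` and pass them along to cells that apply the rotation; this sequential loop shows only the arithmetic, not that dataflow.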


Implementing a Highly Parameterized Digital PIV System on Reconfigurable Hardware

2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors, 2009


Accelerating phase unwrapping and affine transformations for optical quadrature microscopy using CUDA

Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units - GPGPU-2, 2009


A Methodology for Reusable Hardware Proofs

Higher Order Logic Theorem Proving and its Applications, 1993


Verifying a logic synthesis tool in Nuprl: A case study in software verification

Lecture Notes in Computer Science, 1993


Reasoning about pipelines with structural hazards

Lecture Notes in Computer Science, 1995

We have developed a formal definition of correctness for pipelines that ensures that transactions terminate and satisfy a functional specification. This definition separates the correctness criteria associated with the pipelining aspects of a design from the functional relationship between input and output transactions. Using this definition, we developed and formally verified a technique that divides the verification of a pipeline…


Automatic Sliding Window Operation Optimization for FPGA-Based

2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2006

FPGA-based computing boards are frequently used as hardware accelerators for image processing algorithms based on sliding window operations (SWOs). SWOs are both computationally intensive and data intensive, and they benefit from hardware acceleration with FPGAs, especially in delay-sensitive applications. The current design process requires that, for each specific application using SWOs with a different window size, image size, etc., a detail…
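
A sliding window operation applies the same function to every k × k neighborhood of an image. The sketch below is a generic software model, not the paper's optimization tool; it assumes a square window and a valid-only output region, reads the image one row at a time, and keeps only the last k rows, in the style of the line buffers commonly used in streaming FPGA designs. The names `sliding_window_stream` and `box_sum` are hypothetical.

```python
from collections import deque


def sliding_window_stream(image, k, op):
    """Apply op to every k x k window of image (valid region only).

    The image is consumed one row at a time and only the most recent k rows
    are kept, in the style of line buffers in a streaming hardware design.
    """
    rows = deque(maxlen=k)  # the oldest row is dropped automatically
    out = []
    for row in image:
        rows.append(row)
        if len(rows) < k:
            continue  # not enough rows buffered to form a window yet
        out_row = []
        for x in range(len(row) - k + 1):
            window = [r[x:x + k] for r in rows]  # k x k neighborhood
            out_row.append(op(window))
        out.append(out_row)
    return out


def box_sum(window):
    """Example window operation: sum of all pixels in the window."""
    return sum(sum(r) for r in window)
```

Here the window size `k`, the image dimensions, and the operation `op` are all parameters, which is the kind of per-application variation the abstract says forces a new detailed design each time.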


Design issues for hardware implementation of an algorithm for segmenting hyperspectral imagery

Imaging Spectrometry VI, 2000

Modern hyperspectral imagers can produce data cubes with hundreds of spectral channels and millions of pixels. One way to cope with this massive volume is to organize the data so that pixels with similar spectral content are clustered together in the same category. This provides both a compression of the data and a segmentation of the image that can be useful for other image processing tasks downstream. The classic approach for segmentation of multidimensional data is the k-means algorithm, an iterative method that produces successively better segmentations. It is a simple algorithm, but its computational expense can be considerable, particularly when clustering large hyperspectral images into many categories. The ASAPP (Accelerating Segmentation And Pixel Purity) project aims to relieve this processing bottleneck by putting the k-means algorithm into field-programmable gate array (FPGA) hardware. The standard software implementation of k-means uses floating-point arithmetic and…
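
The iterative k-means loop described above can be sketched in a few lines. This is plain Lloyd's algorithm on pixel spectra in floating point, not the ASAPP hardware design. The `dist` switch between squared Euclidean and Manhattan (L1) distance is an illustrative assumption, included because L1 distance needs no multipliers, which makes it attractive for FPGA datapaths; the function name `kmeans` is hypothetical.

```python
def kmeans(pixels, centers, iterations=10, dist="euclidean"):
    """Cluster pixel spectra with Lloyd's k-means.

    pixels:  list of spectra, one list of channel values per pixel
    centers: initial cluster centers (copied, not modified)
    dist:    "euclidean" (squared L2) or "manhattan" (L1)
    Returns (centers, labels), where labels[i] is the cluster of pixels[i].
    """
    def distance(p, c):
        if dist == "manhattan":
            return sum(abs(a - b) for a, b in zip(p, c))
        return sum((a - b) ** 2 for a, b in zip(p, c))

    centers = [list(c) for c in centers]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: distance(p, centers[j]))
            for p in pixels
        ]
        # Update step: each center moves to the mean spectrum of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(ch) / len(members) for ch in zip(*members)]
    return centers, labels
```

Each pass lowers (or leaves unchanged) the within-cluster distance, which is why the method produces successively better segmentations; the per-pixel distance computations in the assignment step dominate the cost for large images with many clusters.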


CUDA and OpenCL implementations of 3D CT reconstruction for biomedical imaging

2012 IEEE Conference on High Performance Extreme Computing, 2012


A Library of Parameterized Floating-Point Modules and Their Use

Lecture Notes in Computer Science, 2002


Enabling a Real-Time Solution for Neuron Detection with Reconfigurable Hardware (abstract only)

Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays - FPGA '05, 2005


Real-Time Particle Image Velocimetry for Feedback Loops Using FPGA Implementation

Journal of Aerospace Computing, Information, and Communication, 2006


Accelerating protein coordinate conversion using GPUs

2014 IEEE High Performance Extreme Computing Conference (HPEC), 2014

For modeling proteins in conformational states, two representations are used: internal coordinates and Cartesian coordinates. Each representation contains a large amount of structural and simulation information, and different processing steps require one or the other. Our goal is to translate rapidly between these coordinate spaces so that a scientist can choose whichever representation they prefer, independent of the representation a given step requires. An algorithm to convert Cartesian to internal coordinates is implemented that takes a protein structure file and the trajectories of the protein's atoms over a time frame, and then computes the bond distances, bond angles, and torsion angles of the atoms. This is implemented on two types of hardware: a CPU, and a heterogeneous system combining a CPU and a GPU. Sequential CPU code in MATLAB and C is compared with parallel versions using the MATLAB Parallel Computing Toolbox and OpenMP, and with GPU versions in CUDA-C and CUDA-MATLAB. Performance is evaluated on two different protein structure files and their trajectories. Our results show that this computation is well suited to the parallelism offered by modern graphics processing units (GPUs). We see many orders of magnitude improvement in speed over the original MATLAB code, bringing the computation time down from over an hour to tens of milliseconds.
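
The geometric core of the Cartesian-to-internal conversion, computing a bond distance, a bond angle, and a torsion (dihedral) angle from atom positions, can be sketched as below. This is a minimal plain-Python illustration, not the authors' MATLAB, C, or CUDA code; it assumes the atom pairs, triples, and quadruples that form bonds are already known from the structure file, and all function names are hypothetical. Torsion angles follow the IUPAC sign convention and are returned in degrees in (-180, 180].

```python
import math


def _sub(a, b):
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _norm(a):
    return math.sqrt(_dot(a, a))


def bond_length(p1, p2):
    """Distance between two bonded atoms."""
    return _norm(_sub(p2, p1))


def bond_angle(p1, p2, p3):
    """Angle in degrees at p2 between the bonds p2-p1 and p2-p3."""
    u, v = _sub(p1, p2), _sub(p3, p2)
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding


def torsion_angle(p1, p2, p3, p4):
    """Dihedral angle in degrees, in (-180, 180], about the p2-p3 bond."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    y = _norm(b2) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.degrees(math.atan2(y, x))
```

Each atom tuple in each trajectory frame is computed independently of the others, which is one plausible reason this computation maps well onto the data parallelism of a GPU.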


Toward a super duper hardware tactic

Lecture Notes in Computer Science, 1994


Heterogeneous tasks and conduits framework for rapid application portability and deployment

2012 Innovative Parallel Computing (InPar), 2012

Emerging heterogeneous and homogeneous processing architectures demonstrate significant increases in throughput for scientific applications over traditional single-core processors. These architectures vary widely in their processing capabilities, memory hierarchies, and programming models. Within this rapidly growing and evolving design space, determining the system architecture best suited to an application, or deploying an application that is portable across a number of different platforms, is increasingly complex and error-prone. Quickly and easily designing portable, high-performance applications that function correctly across these widely varied systems has become paramount, and new models and tools are needed to meet these programming challenges. One example is MIT Lincoln Laboratory's Parallel Vector Tile Optimizing Library (PVTOL), which simplifies the task of developing software in C++ for these complex systems. This work extends the Tasks and Conduits framework in PVTOL to support GPU architectures and other heterogeneous platforms supported by the NVIDIA CUDA and OpenCL programming models. This allows applications to be ported rapidly to a very wide range of architectures and clusters. Using this framework, porting an application from a single CPU core to a GPU requires changing only 5 source lines of code (SLOC) in addition to the CUDA or OpenCL kernel. Using GPU-PVTOL, we have achieved a 22x speedup in a Monte Carlo simulation of photon propagation through a biological medium and a 60x speedup of a 3D cone beam computed tomography (CT) image reconstruction algorithm.

