Lawrence Stewart | Massachusetts Institute of Technology (MIT)

Papers by Lawrence Stewart

An Extension to HTTP : Digest Access Authentication

HTTP Authentication: Basic and Digest Access Authentication

Particle Mesh Ewald for Molecular Dynamics in OpenCL on an FPGA Cluster

2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2021

An OpenCL 3D FFT for Molecular Dynamics Simulations on Multiple FPGAs

arXiv, 2020

3D FFTs are used to accelerate MD electrostatic forces computations but are difficult to parallelize due to communications requirements. We present a distributed OpenCL 3D FFT implementation on Intel Stratix 10 FPGAs for grids up to $128^3$. We use FPGA hardware features such as HBM2 memory and multiple 100 Gbps links to provide scalable memory accesses and communications. Our implementation outperforms GPUs for smaller FFTs, even without distribution. For $32^3$ we achieve 4.4 microseconds on a single FPGA, similar to Anton 1 on 512 nodes. For 8 parallel pipelines (hardware limited), we reach the same performance both locally and distributed, showing that communications are not limiting the performance. Our FFT implementation is designed to be part of the electrostatic force pipeline of a scalable MD engine.

The Alpha demonstration unit: a high-performance multiprocessor for software and chip development

Digital Technical Journal, 1992

The design of trellis waveform coders

IEEE Transactions on Communications, 1982

New algorithms for the design of trellis encoding data compression systems are described. The main algorithm uses a training sequence of actual data from a source to improve an initial trellis decoder. An additional algorithm extends the constraint length of a given decoder. Combined, these algorithms allow the automatic design of a trellis encoding system for a particular source.

Bright fiber-free biconically tapered couplers

AudioFile: A Network-Transparent System for Distributed Audio Applications

Are DSP Chips Obsolete?

Electronic commerce system for offer and acceptance negotiation with encryption

Statistical contention control for star configured communication networks

Payment for Open Networks

Web advertising method

Error-Correcting Code

Large scale multi-processor system with a link-level interconnect providing in-order packet delivery

Computer system and method using a Kautz-like digraph to interconnect computer nodes and having control back channel between nodes

Method and system for counting web access requests

Remote DMA systems and methods for supporting synchronization of distributed processes in a multi-processor system using collective operations

Efficient tilings of de Bruijn and Kautz graphs

arXiv e-print 1101.1932, Jan 10, 2011

Kautz and de Bruijn graphs have a high degree of connectivity which makes them ideal candidates for massively parallel computer network topologies. In order to realize a practical computer architecture based on these graphs, it is useful to have a means of constructing a large-scale system from smaller, simpler modules. In this paper we consider the mathematical problem of uniformly tiling a de Bruijn or Kautz graph. This can be viewed as a generalization of the graph bisection problem. We focus on the problem of graph tilings by a set of identical subgraphs. Tiles should contain a maximal number of internal edges so as to minimize the number of edges connecting distinct tiles. We find necessary and sufficient conditions for the construction of tilings. We derive a simple lower bound on the number of edges which must leave each tile, and construct a class of tilings whose number of edges leaving each tile agrees asymptotically in form with the lower bound to within a constant factor. These tilings make possible the construction of large-scale computing systems based on de Bruijn and Kautz graph topologies.

Network sales system

