Lawrence Stewart | Massachusetts Institute of Technology (MIT)
Papers by Lawrence Stewart
2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2021
ArXiv, 2020
3D FFTs are used to accelerate MD electrostatic force computations but are difficult to parallelize due to communication requirements. We present a distributed OpenCL 3D FFT implementation on Intel Stratix 10 FPGAs for grids up to $128^3$. We use FPGA hardware features such as HBM2 memory and multiple 100 Gbps links to provide scalable memory accesses and communications. Our implementation outperforms GPUs for smaller FFTs, even without distribution. For $32^3$ we achieve 4.4 microseconds on a single FPGA, similar to Anton 1 on 512 nodes. For 8 parallel pipelines (hardware limited), we reach the same performance both locally and distributed, showing that communications are not limiting the performance. Our FFT implementation is designed to be part of the electrostatic force pipeline of a scalable MD engin
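A minimal sketch of why a 3D FFT needs communication when distributed: the transform separates into three passes of independent 1D FFTs, one per axis, and each pass after the first reads data strided across the grid. This is a plain-Python illustration, not the OpenCL/FPGA design described in the abstract; the names fft1d and fft3d are hypothetical, and the grid is assumed cubic with a power-of-two side.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft3d(grid):
    """3D FFT of a cubic grid[x][y][z] via 1D FFTs along z, then y, then x."""
    n = len(grid)
    g = [[[complex(v) for v in row] for row in plane] for plane in grid]
    # Pass 1: every (x, y) pencil along z is contiguous and independent.
    for x in range(n):
        for y in range(n):
            g[x][y] = fft1d(g[x][y])
    # Pass 2: pencils along y are strided; on a distributed grid this
    # gather is where data must move between nodes.
    for x in range(n):
        for z in range(n):
            col = fft1d([g[x][y][z] for y in range(n)])
            for y in range(n):
                g[x][y][z] = col[y]
    # Pass 3: pencils along x, strided across whole planes.
    for y in range(n):
        for z in range(n):
            col = fft1d([g[x][y][z] for x in range(n)])
            for x in range(n):
                g[x][y][z] = col[x]
    return g
```

Within each pass the pencils are independent and can run in parallel; the cost of a distributed version lies in reshuffling the grid between passes so that each node holds complete pencils along the next axis.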
Digital Technical Journal, 1992
IEEE Transactions on Communications, 1982
New algorithms for the design of trellis encoding data compression systems are described. The main algorithm uses a training sequence of actual data from a source to improve an initial trellis decoder. An additional algorithm extends the constraint length of a given decoder. Combined, these algorithms allow the automatic design of a trellis encoding system for a particular source. The
Eprint Arxiv 1101 1932, Jan 10, 2011
Kautz and de Bruijn graphs have a high degree of connectivity which makes them ideal candidates for massively parallel computer network topologies. In order to realize a practical computer architecture based on these graphs, it is useful to have a means of constructing a large-scale system from smaller, simpler modules. In this paper we consider the mathematical problem of uniformly tiling a de Bruijn or Kautz graph. This can be viewed as a generalization of the graph bisection problem. We focus on the problem of graph tilings by a set of identical subgraphs. Tiles should contain a maximal number of internal edges so as to minimize the number of edges connecting distinct tiles. We find necessary and sufficient conditions for the construction of tilings. We derive a simple lower bound on the number of edges which must leave each tile, and construct a class of tilings whose number of edges leaving each tile agrees asymptotically in form with the lower bound to within a constant factor. These tilings make possible the construction of large-scale computing systems based on de Bruijn and Kautz graph topologies.
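To make the quantity being minimized concrete, the sketch below builds the directed de Bruijn graph B(d, n) (vertices are length-n words over d symbols, with an edge from w to w shifted left by one symbol and extended by any symbol) and counts the edges leaving each tile of a vertex partition. The function names and the example partition by first symbol are assumptions chosen for illustration; this is not the tiling construction from the paper, and it does not check that the tiles are identical subgraphs.

```python
from itertools import product


def de_bruijn_edges(d, n):
    """Yield directed edges of B(d, n): word w -> w[1:] + (c,) for each symbol c."""
    for w in product(range(d), repeat=n):
        for c in range(d):
            yield w, w[1:] + (c,)


def edges_leaving_tiles(d, n, tile_of):
    """Count, per tile, the directed edges whose head lies in a different tile.

    tile_of maps a vertex (a tuple of symbols) to a tile label.
    """
    leaving = {}
    for u, v in de_bruijn_edges(d, n):
        tu, tv = tile_of(u), tile_of(v)
        leaving.setdefault(tu, 0)
        if tu != tv:
            leaving[tu] += 1
    return leaving


# Illustrative partition of B(2, 3) into two tiles by the first symbol.
counts = edges_leaving_tiles(2, 3, lambda w: w[0])
```

Swapping in a different tile_of function gives a quick way to compare candidate partitions of a small graph by how many edges cross tile boundaries.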