Kesheng Wu | Lawrence Berkeley National Laboratory

Papers by Kesheng Wu

FastBit Reference Manual

Technical Highlights from the ExaHDF5 project

DOE Exascale Research Conference, Portland, OR, April 16-18, 2012, Apr 21, 2014

Detecting atmospheric rivers in large climate datasets

Proceedings of the 2nd International Workshop on Petascale Data Analytics: Challenges and Opportunities (PDAC '11), 2011

Extreme precipitation events on the western coast of North America are often traced to an unusual weather phenomenon known as atmospheric rivers. Although these storms may provide a significant fraction of the total water to the highly managed western US hydrological system, the resulting intense weather poses severe risks to the human and natural infrastructure through severe flooding and wind damage. To aid the understanding of this phenomenon, we have developed an efficient detection algorithm suitable for analyzing large amounts of data. In addition to detecting actual events in the recent observed historical record, this detection algorithm can be applied to global climate model output providing a new model validation methodology. Comparing the statistical behavior of simulated atmospheric river events in models to observations will enhance confidence in projections of future extreme storms. Our detection algorithm is based on a thresholding condition on the total column integrated water vapor established by Ralph et al. (2004) followed by a connected component labeling procedure to group the mesh points into connected regions in space. We develop an efficient parallel implementation of the algorithm and demonstrate good weak and strong scaling. We process a 30-year simulation output on 10,000 cores in under 3 seconds.
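The two-stage detection pipeline described above (threshold the integrated water vapor field, then group exceeding mesh points with connected component labeling) can be sketched serially. This is an illustrative pure-Python version with 4-connectivity; the function name and data layout are invented, and it is not the paper's parallel implementation:

```python
from collections import deque

def label_connected_regions(field, threshold):
    """Label 4-connected regions of grid cells whose value exceeds threshold.

    `field` is a 2-D list of floats (standing in for total column integrated
    water vapor on a lat/lon mesh); returns a dict mapping region label ->
    list of (row, col) cells.
    """
    rows, cols = len(field), len(field[0])
    labels = [[0] * cols for _ in range(rows)]
    regions = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if field[r][c] > threshold and labels[r][c] == 0:
                next_label += 1
                labels[r][c] = next_label
                regions[next_label] = [(r, c)]
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one region
                    cr, cc = queue.popleft()
                    for nr, nc in ((cr - 1, cc), (cr + 1, cc),
                                   (cr, cc - 1), (cr, cc + 1)):
                        if (0 <= nr < rows and 0 <= nc < cols
                                and field[nr][nc] > threshold
                                and labels[nr][nc] == 0):
                            labels[nr][nc] = next_label
                            regions[next_label].append((nr, nc))
                            queue.append((nr, nc))
    return regions
```

A parallel version would label each subdomain independently and then reconcile labels across subdomain boundaries, which is where the weak/strong scaling work lies.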

Recent advances in VisIt: AMR streamlines and query-driven visualization

Adaptive Mesh Refinement (AMR) is a highly effective method for simulations spanning a large range of spatiotemporal scales, such as those encountered in astrophysical simulations. Combining research in novel AMR visualization algorithms and basic infrastructure work, the Department of Energy's (DOE's) Scientific Discovery through Advanced Computing (SciDAC) Visualization and Analytics Center for Enabling Technologies (VACET) has extended VisIt, an open source visualization tool that can handle AMR data without converting it to alternate representations. This paper focuses on two recent advances in the development of VisIt. First, we have developed streamline computation methods that properly handle multi-domain data sets and effectively utilize multiple processors on parallel machines. Furthermore, we are working on streamline calculation methods that consider an AMR hierarchy, detect transitions from a lower resolution patch into a finer patch, and improve interpolation at level boundaries. Second, we focus on visualization of large-scale particle data sets. By integrating the DOE Scientific Data Management (SDM) Center's FastBit indexing technology into VisIt, we are able to reduce particle counts effectively by thresholding and by loading from disk only those particles that satisfy the thresholding criteria. Furthermore, using FastBit it becomes possible to compute parallel coordinate views efficiently, thus facilitating interactive data exploration of massive particle data sets.

Preconditioned techniques for large eigenvalue problems

Compressed bitmap indices for efficient query processing

Real-Time Outlier Detection Algorithm for Finding Blob-Filaments in Plasma

Scientific discovery at the exascale: report from the DOE ASCR 2011 workshop on exascale data management, analysis, and visualization

A Block Orthogonalization Procedure with Constant Synchronization Requirements

SIAM Journal on Scientific Computing, 2002

We propose an alternative orthonormalization method that computes the orthonormal basis from the right singular vectors of a matrix. Its advantages are: (a) all operations are matrix-matrix multiplications and thus cache-efficient, (b) only one synchronization point is required in parallel implementations, (c) it could be more stable than Gram-Schmidt. In addition, we consider the problem of incremental orthonormalization, where a block of vectors is orthonormalized against a previously orthonormal set of vectors and among itself. We solve this problem by alternating iteratively between a phase of Gram-Schmidt and a phase of the new method. We provide error analysis and use it to derive bounds on how accurately the two successive orthonormalization phases should be performed to minimize total work performed. Our experiments confirm the favorable numerical behavior of the new method and its effectiveness on modern parallel computers.
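As a sketch of the single-synchronization idea, the closely related Cholesky QR scheme below forms the Gram matrix once (in a parallel run, that product is the only global reduction) and otherwise uses matrix-matrix-style operations. Note this is a swapped-in illustration: the paper's method takes the basis from the right singular vectors of the Gram matrix, which is more stable than this Cholesky variant. Pure Python, dense lists of rows:

```python
def cholqr(A):
    """Orthonormalize the columns of A with one 'synchronization':
    G = A^T A (the lone reduction), Cholesky-factor G = R^T R,
    then Q = A R^{-1} by forward substitution on each row of A.
    """
    m, n = len(A), len(A[0])
    # Gram matrix G = A^T A: the single synchronization point
    G = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         for i in range(n)]
    # Cholesky factorization G = R^T R (R upper triangular)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = G[i][j] - sum(R[k][i] * R[k][j] for k in range(i))
            if i == j:
                R[i][i] = s ** 0.5
            else:
                R[i][j] = s / R[i][i]
    # Q = A R^{-1}: since A[k][j] = sum_{i<=j} Q[k][i] R[i][j],
    # solve for Q row by row (a purely local, matrix-level operation)
    Q = [[0.0] * n for _ in range(m)]
    for k in range(m):
        for j in range(n):
            s = A[k][j] - sum(Q[k][i] * R[i][j] for i in range(j))
            Q[k][j] = s / R[j][j]
    return Q
```

Classical or modified Gram-Schmidt, by contrast, needs one reduction per column (or per reorthogonalization pass), which is the synchronization cost the paper's method avoids.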

Solution of large eigenvalue problems in electronic structure calculations

BIT Numerical Mathematics, 1996

Predicting the structural and electronic properties of complex systems is one of the outstanding problems in condensed matter physics. Central to most methods used in molecular dynamics is the repeated solution of large eigenvalue problems. This paper reviews the source of these eigenvalue problems, describes some techniques for solving them, and addresses the difficulties and challenges which are faced. Parallel implementations are also discussed.

Scientific discovery at the exascale. Report from the DOE ASCR 2011 Workshop on Exascale Data Management

DQGMRES: a quasi-minimal residual algorithm based on incomplete orthogonalization

Page 1 excerpt: DQGMRES: a quasi-minimal residual algorithm based on incomplete orthogonalization. Youcef Saad and Kesheng Wu, University of Minnesota, Computer Science Department, October 1993. The garbled fragment in the scan is a plane rotation matrix $\Omega_i$, equal to the identity except for the $2\times 2$ diagonal block $\begin{pmatrix} c_i & s_i \\ -s_i & c_i \end{pmatrix}$.

Inexact Newton preconditioning techniques for large symmetric eigenvalue problems

This paper studies a number of Newton methods and uses them to define new secondary linear systems of equations for the Davidson eigenvalue method. The new secondary equations avoid some common pitfalls of the existing ones, such as the correction equation and the Jacobi-Davidson preconditioning. We also demonstrate that the new schemes can be used efficiently in test problems.

High-Performance Outlier Detection Algorithm for Finding Blob-Filaments in Plasma

Grid Collector: an event catalog with automated file management

2003 IEEE Nuclear Science Symposium. Conference Record (IEEE Cat. No.03CH37515), 2003

High Energy Nuclear Physics (HENP) experiments such as STAR at BNL and ATLAS at CERN produce large amounts of data that are stored as files on mass storage systems in computer centers. In these files, the basic unit of data is an event. Analysis is typically performed on a selected set of events. The files containing these events have to be located, copied from mass storage systems to disks before analysis, and removed when no longer needed. These file management tasks are tedious and time consuming. Typically, all events contained in the files are read into memory before a selection is made. Since the time to read the events dominates the overall execution time, reading the unwanted events needlessly increases the analysis time. The Grid Collector is a set of software modules that work together to address these two issues. It automates the file management tasks and provides "direct" access to the selected events for analyses. It is currently integrated with the STAR analysis framework. The users can select events based on tags, such as "production date between March 10 and 20, and the number of charged tracks > 100." The Grid Collector locates the files containing relevant events, transfers the files across the Grid if necessary, and delivers the events to the analysis code through the familiar iterators. There have been some research efforts to address the file management issues; the Grid Collector is unique in that it addresses the event access issue together with the file management issues. This makes it more useful to a large variety of users.
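The core selection step can be illustrated with a toy event catalog: a tag-level query determines both which events match and which files must be staged from mass storage. The catalog layout and tag names (`date`, `n_charged`) here are invented for illustration and are not the Grid Collector's actual schema:

```python
def select_events(catalog, predicate):
    """Given a catalog mapping file name -> list of per-event tag records,
    return (files that must be staged, matching event records).

    Only files containing at least one matching event need to be
    transferred; non-matching events in those files are never delivered
    to the analysis code.
    """
    files_needed, selected = [], []
    for fname, events in catalog.items():
        hits = [e for e in events if predicate(e)]
        if hits:
            files_needed.append(fname)
            selected.extend(hits)
    return files_needed, selected
```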

Minimizing I/O Costs of Multi-Dimensional Queries with Bitmap Indices

18th International Conference on Scientific and Statistical Database Management (SSDBM'06), 2006

Bitmap indices have been widely used in scientific applications and commercial systems for processing complex, multi-dimensional queries where traditional tree-based indices would not work efficiently. A common approach for reducing the size of a bitmap index for high cardinality attributes is to group ranges of values of an attribute into bins and then build a bitmap for each bin rather than a bitmap for each value of the attribute. Binning reduces storage costs; however, results of queries based on bins often require additional filtering for discarding false positives, i.e., records in the result that do not satisfy the query constraints. This additional filtering, also known as "candidate checking," requires access to the base data on disk and involves significant I/O costs. This paper studies strategies for minimizing the I/O costs of "candidate checking" for multi-dimensional queries. This is done by determining the number of bins allocated for each dimension and then placing bin boundaries in optimal locations. Our algorithms use knowledge of data distribution and query workload. We derive several analytical results concerning optimal bin allocation for a probabilistic query model. Our experimental evaluation with real life data shows an average I/O cost improvement of at least a factor of 10 for multi-dimensional queries on datasets from two different applications. Our experiments also indicate that the speedup increases with the number of query dimensions.
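The binning/candidate-checking trade-off can be made concrete with a toy binned index: bins wholly inside the query range are answered from the index alone, while the partially overlapping edge bins trigger a check against the base data. This sketch uses Python sets in place of compressed bitmaps, and `values` stands in for the base data on disk; the function names are illustrative:

```python
import bisect

def build_binned_index(values, boundaries):
    """Binned bitmap index: boundaries [b0, ..., bk] define bins
    [b0, b1), [b1, b2), ...; bitmaps[i] holds the record ids in bin i."""
    bitmaps = [set() for _ in range(len(boundaries) - 1)]
    for rid, v in enumerate(values):
        b = bisect.bisect_right(boundaries, v) - 1
        bitmaps[b].add(rid)
    return bitmaps

def range_query(values, bitmaps, boundaries, lo, hi):
    """Answer lo <= v < hi. Fully covered bins cost no base-data I/O;
    edge bins require a candidate check against the base data."""
    hits = set()
    for i, bm in enumerate(bitmaps):
        blo, bhi = boundaries[i], boundaries[i + 1]
        if lo <= blo and bhi <= hi:
            hits |= bm                     # bin fully covered: index only
        elif bhi > lo and blo < hi:
            # edge bin: every member is a candidate; the per-record
            # reads here are the I/O cost the paper minimizes
            hits |= {rid for rid in bm if lo <= values[rid] < hi}
    return hits
```

Choosing the number of bins and the boundary locations, per dimension, controls how many records fall into edge bins and hence how much candidate-checking I/O a typical query incurs.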

Efficient Analysis of Live and Historical Streaming Data and its Application to Cybersecurity

This paper describes our experiences building a coherent framework for efficient simultaneous querying of live and archived stream data. This work was motivated by the need to analyze the network traffic patterns of research laboratories funded by the U.S. Department of Energy. We review the requirements of such a system and implement a prototype based on the TelegraphCQ

Fast connected-component labeling

Brans-Dicke theory in non-commutative geometry

The Iterative Solvers Module

P-SPARSLIB is a library of portable FORTRAN routines for sparse matrix computations. The current thrust of the library is in iterative solution techniques. In this note we present the 'accelerators' part of the library, which consists of the best known Krylov subspace techniques. This iterative solution module is implemented in reverse communication mode so as to allow any preconditioner to be combined with the package. In addition, this mechanism allows us to ensure portability, since the communication calls required in the iterative solution process are hidden in the dot product, the matrix-vector product, and the preconditioning operations. Section 4, CGNR: This algorithm is intended for solving linear systems as well as least-squares problems. It consists of solving the linear system A^T A x = A^T b by a CG method. Since A^T A is always positive semi-definite, it is guaranteed, in theory, to converge to a solution. CGNR may be a good approach for highly indefinite matrices. For example, if the matrix is unitary, then it can solve the linear system in just one step, whereas most other Krylov subspace projection methods will typically converge slowly. For typical problems arising from the discretization of partial differential equations, CGNR converges more slowly than CG or BCG, and so this approach is not as popular in this particular context.
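The CGNR iteration described above, conjugate gradient applied to the normal equations A^T A x = A^T b, fits in a few lines. This is an illustrative dense pure-Python version (the library itself works on sparse matrices in FORTRAN via reverse communication); it materializes the transpose for clarity and never forms A^T A explicitly:

```python
def cgnr(A, b, iters=50, tol=1e-12):
    """Solve A^T A x = A^T b by conjugate gradient.
    A is a list of rows (square or overdetermined); b has len(A) entries."""
    m, n = len(A), len(A[0])
    matvec = lambda M, v: [sum(M[i][j] * v[j] for j in range(len(v)))
                           for i in range(len(M))]
    At = [[A[i][j] for i in range(m)] for j in range(n)]  # A^T
    x = [0.0] * n
    r = matvec(At, b)            # normal-equations residual A^T(b - A x), x = 0
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(iters):
        if rs < tol:
            break
        Ap = matvec(At, matvec(A, p))                 # apply A^T A via two products
        alpha = rs / sum(p[j] * Ap[j] for j in range(n))
        x = [x[j] + alpha * p[j] for j in range(n)]
        r = [r[j] - alpha * Ap[j] for j in range(n)]
        rs_new = sum(ri * ri for ri in r)
        p = [r[j] + (rs_new / rs) * p[j] for j in range(n)]
        rs = rs_new
    return x
```

Because A^T A squares the condition number of A, convergence is slower than CG on A itself for typical discretized PDE problems, which matches the note's closing remark.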
