Bo Hong - Academia.edu
Papers by Bo Hong
Bo Hong and Viktor K. Prasanna, Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089-2562, {bohong, prasanna}@usc.edu ... one of the following three operations ...
With continuing advancements in sensor node design and increasingly complex applications for wireless sensor networks (WSNs), formal communication models are needed for either fair comparison between various algorithms or the development of design automation in WSNs. Toward this goal, we formally define two link-wise communication models, namely, the Collision Free Model (CFM) and the Collision Aware Model (CAM). While programming under CFM resembles traditional parallel and distributed computation, performance analysis in CFM is often inaccurate or even misleading because it oversimplifies packet collisions. By exposing low-level details of packet collisions, CAM lets us model and analyze the behavior of the network more realistically and accurately. To validate these arguments, we study a basic operation in WSNs, broadcasting, using the well-known simple flooding and probability-based broadcasting schemes. We present an analytical framework that facilitates accurate modeling and analysis of both schemes in CAM in terms of two performance metrics, reachability and latency. Our analytical results indicate that the impact of node density on the two performance metrics can be minimized by carefully choosing the probability of broadcasting, implying strong scalability of the probability-based broadcast scheme. Moreover, our analysis is confirmed by extensive simulation results.
IEEE Transactions on Parallel and Distributed Systems, 2003
Recently, several experimental studies have been conducted on block data layout in conjunction with tiling as a data transformation technique to improve cache performance. In this paper, we analyze cache and translation look-aside buffer (TLB) performance of such alternate layouts (including block data layout and Morton layout) when used in conjunction with tiling. We derive a tight lower bound on TLB performance for standard matrix access patterns, and show that block data layout and Morton layout achieve this bound. To improve cache performance, block data layout is used in concert with tiling. Based on the cache and TLB performance analysis, we propose a data block size selection algorithm that finds a tight range for optimal block size. To validate our analysis, we conducted simulations and experiments using tiled matrix multiplication, LU decomposition, and Cholesky factorization. For matrix multiplication, simulation results using UltraSparc II parameters show that tiling and block data layout, with a block size given by our block size selection algorithm, reduce TLB misses by up to 93 percent compared with other techniques. The total miss cost is reduced considerably. Experiments on several platforms show that tiling with block data layout achieves up to 50 percent performance improvement over other techniques that use conventional layouts. Morton layout is also analyzed and compared with block data layout. Experimental results show that matrix multiplication using block data layout is up to 15 percent faster than that using Morton data layout.
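The Morton layout mentioned in this abstract stores matrix elements in Z-order, which is commonly obtained by interleaving the bits of the row and column indices. The following is a minimal sketch of that index computation, not code from the paper; the function name `morton_index`, the `bits` parameter, and the choice to place row bits in the odd positions are assumptions made for illustration.

```python
def morton_index(i, j, bits=16):
    """Return the Z-order (Morton) offset of element (i, j).

    Assumption for this sketch: column bits go to even positions and
    row bits go to odd positions of the result.
    """
    z = 0
    for k in range(bits):
        z |= ((j >> k) & 1) << (2 * k)      # k-th column bit -> position 2k
        z |= ((i >> k) & 1) << (2 * k + 1)  # k-th row bit    -> position 2k + 1
    return z


# Offsets of a 2 x 2 tile, visited row by row.
tile = [morton_index(i, j) for i in range(2) for j in range(2)]
```

Under this convention each aligned 2 x 2 tile occupies four consecutive offsets, and the same holds recursively for 4 x 4, 8 x 8, and larger aligned tiles.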
We focus on data gathering problems in energy-constrained wireless sensor networks. We study store-and-gather problems where data are locally stored on the sensors before the data gathering starts, and continuous sensing and gathering problems that model time critical applications. We show that these problems reduce to maximization of network flow under vertex capacity constraint, which reduces to a standard network flow problem. We develop a distributed and adaptive algorithm to optimize data gathering. This algorithm leads to a simple protocol that coordinates the sensor nodes in the system. Our approach provides a unified framework to study a variety of data gathering problems in sensor networks. The efficiency of the proposed method is illustrated through simulations.
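A vertex capacity constraint can be turned into an ordinary edge capacity with the textbook node-splitting construction: each vertex v becomes an edge from (v, "in") to (v, "out") that carries the vertex capacity. The sketch below illustrates that general reduction with a plain centralized Edmonds-Karp solver; it is not the distributed algorithm of the paper, and the helper names, the toy network, and its capacities are all hypothetical.

```python
from collections import defaultdict, deque

INF = 10**9  # stands in for "unbounded" capacity in this sketch


def split_vertices(edges, vertex_cap):
    """Build an edge-capacitated graph from a vertex-capacitated one."""
    cap = defaultdict(dict)
    for v, c in vertex_cap.items():
        cap[(v, "in")][(v, "out")] = c      # vertex capacity becomes an edge
    for (u, v), c in edges.items():
        cap[(u, "out")][(v, "in")] = c      # original edge leaves u, enters v
    return cap


def max_flow(cap, s, t):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity map."""
    res = defaultdict(lambda: defaultdict(int))
    for u in cap:
        for v, c in cap[u].items():
            res[u][v] += c
            res[v][u] += 0                  # make sure the reverse edge exists
    total = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:    # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        bottleneck, v = INF, t
        while parent[v] is not None:        # find the bottleneck on the path
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:        # push flow along the path
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical network: source s reaches base station t through relays a and b.
edges = {("s", "a"): 5, ("s", "b"): 5, ("a", "t"): 5, ("b", "t"): 5}
vertex_cap = {"s": INF, "a": 3, "b": 2, "t": INF}
flow = max_flow(split_vertices(edges, vertex_cap), ("s", "in"), ("t", "out"))
```

In this toy instance the links could carry 10 units, but the relays limit the flow to 3 + 2 = 5, which is what the split graph reports.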
A key application of networked sensor systems is to detect and classify events of interest in an environment. Such applications require processing of raw data and the fusion of individual decisions. In-network processing of the sensed data has been shown to be more energy efficient than the centralized scheme that gathers all the raw data to a (powerful) base station for further processing. We formulate the problem as a special class of flow optimization problem. We propose a decentralized adaptive algorithm to maximize the throughput of a class of in-network processing applications. This algorithm is further implemented as a decentralized in-network processing protocol that adapts to any changes in link bandwidths and node processing capabilities. Simulations show that the proposed in-network processing protocol achieves up to 95% of the optimal system throughput. We also show that path-based greedy heuristics have very poor performance in the worst case.
Recently, several experimental studies have been conducted on block data layout as a data transformation technique used in conjunction with tiling to improve cache performance. In this paper, we provide a theoretical analysis for the TLB and cache performance of block data layout. For standard matrix access patterns, we derive an asymptotic lower bound on the number of TLB misses for any data layout and show that block data layout achieves this bound. We show that block data layout improves TLB misses by a factor of O(B) compared with conventional data layouts, where B is the block size of block data layout. This reduction contributes to the improvement in memory hierarchy performance. Using our TLB and cache analysis, we also discuss the impact of block size on the overall memory hierarchy performance. These results are validated through simulations and experiments on state-of-the-art platforms.
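The effect of block data layout on page usage can be seen in a small index calculation. The sketch below is an illustration only, not the analysis from the paper: it assumes an N x N matrix with N divisible by B, blocks stored in row-major order with row-major elements inside each block, and a page that holds exactly B x B elements. The function names and the parameters `n`, `b`, and `page` are hypothetical.

```python
def row_major_offset(i, j, n):
    """Offset of element (i, j) in a conventional row-major n x n array."""
    return i * n + j


def block_offset(i, j, n, b):
    """Offset of element (i, j) when the array is stored as b x b blocks."""
    blocks_per_row = n // b
    block_id = (i // b) * blocks_per_row + (j // b)  # which block
    return block_id * b * b + (i % b) * b + (j % b)  # position inside it


def pages_touched(offsets, page_size):
    """Number of distinct pages covered by a sequence of element offsets."""
    return len({off // page_size for off in offsets})


n, b, page = 64, 8, 64  # assumed sizes: page holds one 8 x 8 block

# Walk down column 0, a standard access pattern in matrix kernels.
column_row_major = [row_major_offset(i, 0, n) for i in range(n)]
column_blocked = [block_offset(i, 0, n, b) for i in range(n)]

pages_row_major = pages_touched(column_row_major, page)
pages_blocked = pages_touched(column_blocked, page)
```

With these assumed sizes the column walk touches 64 pages in row-major order and 8 pages in block layout, a ratio equal to B, which is in line with the O(B) factor stated in the abstract.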
Journal of Parallel and Distributed Computing, 2006
We focus on data gathering problems in energy-constrained networked sensor systems. The system operates in rounds where a subset of the sensors generate a certain number of data packets during each round. All the data packets need to be transferred to the base station. The goal is to maximize the system lifetime in terms of the number of rounds the system can operate. We show that the above problem reduces to a restricted flow problem with quota constraint, flow conservation requirement, and edge capacity constraint. We further develop a strongly polynomial time algorithm for this problem, which is guaranteed to find an optimal solution. We then study the performance of a distributed shortest path heuristic for the problem. This heuristic is based on self-stabilizing spanning tree construction and shortest path routing methods. In this heuristic, every node determines its sensing activities and data transfers based on locally available information. No global synchronization is needed. Although the heuristic cannot guarantee optimality, simulations show that the heuristic has good average case performance over randomly generated deployment of sensors. We also derive bounds for the worst case performance of the heuristic.
IEEE Transactions on Parallel and Distributed Systems, 2007
In this paper, we consider the task allocation problem for computing a large set of equal-sized independent tasks on a heterogeneous computing system where the tasks initially reside on a single computer (the root) in the system. This problem represents the computation paradigm for a wide range of applications such as SETI@home and Monte Carlo simulations. We consider the scenario ...
International Journal of Distributed Sensor Networks, 2005
We focus on data gathering problems in energy-constrained networked sensor systems. We study store-and-gather problems where data are locally stored at the sensors before the data gathering starts, and continuous sensing and gathering problems that model time critical applications. We show that these problems reduce to maximization of network flow under vertex capacity constraint. This flow problem in turn reduces to a standard network flow problem. We develop a distributed and adaptive algorithm to optimize data gathering. This algorithm leads to a simple protocol that coordinates the sensor nodes in the system. Our approach provides a unified framework to study a variety of data gathering problems in networked sensor systems. The performance of the proposed method is illustrated through simulations.
We consider the resource allocation problem for computing a large set of equal-sized independent tasks on heterogeneous computing systems. This problem represents the computation paradigm for a wide range of applications such as SETI@home and Monte Carlo simulations. We consider a general problem in which the interconnection between the nodes is modeled using a graph. We maximize the throughput of the system by using a linear programming formulation. This linear programming formulation is further transformed to an extended network flow representation, which can be solved efficiently using maximum flow/minimum cut algorithms. This leads to a simple distributed protocol for the problem. The effectiveness of the proposed resource allocation approach is verified through simulations.
Pervasive and Mobile Computing, 2005
Towards building a systematic methodology of algorithm design for applications of networked sensor systems, we formally define two link-wise communication models, the Collision Free Model (CFM) and the Collision Aware Model (CAM). While CFM provides ease of programming and analysis for high level application functionality, CAM enables more accurate performance analysis and hence more efficient algorithms through cross-layer optimization, at the expense of increased programming and analysis complexity. These communication models are part of an abstract network model, above which algorithm design and performance optimization is performed. We use the example of optimizing a probability-based broadcasting scheme under CAM to illustrate algorithm optimization based on the defined models. Specifically, we present an analytical framework that facilitates accurate modeling and analysis of probability-based broadcasting in CAM (PB_CAM). Our analytical results indicate that (1) the optimal broadcast probability for either maximizing the reachability within a given latency constraint or minimizing the latency for a given reachability constraint decreases rapidly with node density, and (2) the optimal probability for either maximizing the reachability with a given energy constraint or minimizing the energy cost for a given reachability constraint varies slowly between 0 and 0.1 over the entire range of the variations in node density. Our analysis is also confirmed by extensive simulation results.
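Some intuition for why the broadcast probability should shrink as density grows comes from a much simpler single-slot calculation than the PB_CAM framework itself. Assume, purely for illustration, that a receiver has n neighbors holding the packet, each transmits in a slot independently with probability p, and the receiver decodes the packet only when exactly one of them transmits. The success probability is then n p (1 - p)^(n - 1), which peaks at p = 1/n. The names `success_prob` and `best_p` and the grid search are hypothetical.

```python
def success_prob(n, p):
    """Probability that exactly one of n neighbors transmits in a slot."""
    return n * p * (1 - p) ** (n - 1)


def best_p(n, grid=1000):
    """Grid search for the transmit probability that maximizes success."""
    candidates = (k / grid for k in range(1, grid + 1))
    return max(candidates, key=lambda p: success_prob(n, p))


# Best per-slot transmit probability for a few assumed neighbor counts.
optimal = {n: best_p(n) for n in (5, 10, 20)}
```

In this toy model the best probability falls from 0.2 to 0.05 as the neighbor count rises from 5 to 20, which matches the qualitative trend in result (1) without reproducing the multi-hop analysis of the paper.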
... Bo Hong and Viktor K. Prasanna Department of Electrical Engineering University of Southern Ca... more ... Bo Hong and Viktor K. Prasanna Department of Electrical Engineering University of Southern California Los Angeles, CA 90089-2562 {bohong, prasanna}@usc.edu ... one of the fol-lowing three operations as long as (Īi) φ= 0: (a) Č Ł×(Īi Īk): applies when (Īi) 0 and Īk st ik - (Īi Īk ...
With continuing advancements in sensor node design and increasingly complex applications for wire... more With continuing advancements in sensor node design and increasingly complex applications for wireless sensor networks (WSNs), formal communication models are needed for either fair comparison between various algorithms or the development of design automation in WSNs. Toward such a goal, we formally define two link-wise communication models, namely, the Collision Free Model (CFM) and the Collision Aware Model (CAM). While programming under CFM resembles similarity with traditional parallel and distributed computation, performance analysis in CFM is often inaccurate or even misleading, due to its over-simplification on modeling packet collisions. On the other hand, by exposing low level details of packet collisions in CAM, we are able to model and analyze the behavior of the network in a more realistic, accurate, and precise fashion. To validate the above arguments, we study a basic operation in WSNs --broadcasting, using the well-known simple flooding and probabilistic based broadcasting schemes. We present an analytical framework that facilitates an accurate and precise modeling and analysis for both schemes in CAM, in terms of two performance metrics including reachability and latency. Our analytical results indicate that the impact of node density on the two performance metrics can be minimized by carefully choosing the probability of broadcasting, implying a strong scalability of the probability based broadcast scheme. Moreover, our analysis is confirmed by extensive simulation results.
IEEE Transactions on Parallel and Distributed Systems, 2003
Recently, several experimental studies have been conducted on block data layout in conjunction wi... more Recently, several experimental studies have been conducted on block data layout in conjunction with tiling as a data transformation technique to improve cache performance. In this paper, we analyze cache and translation look-aside buffer (TLB) performance of such alternate layouts (including block data layout and Morton layout) when used in conjunction with tiling. We derive a tight lower bound on TLB performance for standard matrix access patterns, and show that block data layout and Morton layout achieve this bound. To improve cache performance, block data layout is used in concert with tiling. Based on the cache and TLB performance analysis, we propose a data block size selection algorithm that finds a tight range for optimal block size. To validate our analysis, we conducted simulations and experiments using tiled matrix multiplication, LU decomposition, and Cholesky factorization. For matrix multiplication, simulation results using UltraSparc II parameters show that tiling and block data layout with a block size given by our block size selection algorithm, reduces up to 93 percent of TLB misses compared with other techniques. The total miss cost is reduced considerably. Experiments on several platforms show that tiling with block data layout achieves up to 50 percent performance improvement over other techniques that use conventional layouts. Morton layout is also analyzed and compared with block data layout. Experimental results show that matrix multiplication using block data layout is up to 15 percent faster than that using Morton data layout.
We focus on data gathering problems in energy-constrained wireless sensor networks. We study stor... more We focus on data gathering problems in energy-constrained wireless sensor networks. We study store-and-gather problems where data are locally stored on the sensors before the data gathering starts, and continuous sensing and gathering problems that model time critical applications. We show that these problems reduce to maximization of network flow under vertex capacity constraint, which reduces to a standard network flow problem. We develop a distributed and adaptive algorithm to optimize data gathering. This algorithm leads to a simple protocol that coordinates the sensor nodes in the system. Our approach provides a unified framework to study a variety of data gathering problems in sensor networks. The efficiency of the proposed method is illustrated through simulations.
A key application of networked sensor systems is to detect and classify events of interest in an ... more A key application of networked sensor systems is to detect and classify events of interest in an environment. Such applications require processing of raw data and the fusion of individual decisions. In-network processing of the sensed data has been shown to be more energy efficient than the centralized scheme that gathers all the raw data to a (powerful) base station for further processing. We formulate the problem as a special class of flow optimization problem. We propose a decentralized adaptive algorithm to maximize the throughput of a class of in-network processing applications. This algorithm is further implemented as a decentralized in-network processing protocol that adapts to any changes in link bandwidths and node processing capabilities. Simulations show that the proposed in-network processing protocol achieves upto 95% of the optimal system throughput. We also show that path based greedy heuristics have very poor performance in the worst case. 0-7803-8815-
Recently, several experimental studies have been conducted on block data layout as a data transfo... more Recently, several experimental studies have been conducted on block data layout as a data transformation technique used in conjunction with tiling to improve cache performance. In this paper, we provide a theoretical analysis for the TLB and cache performance of block data layout. For standard matrix access patterns, we derive an asymptotic lower bound on the number of TLB misses for any data layout and show that block data layout achieves this bound. We show that block data layout improves TLB misses by a factor of O(B) compared with conventional data layouts, where B is the block size of block data layout. This reduction contributes to the improvement in memory hierarchy performance. Using our TLB and cache analysis, we also discuss the impact of block size on the overall memory hierarchy performance. These results are validated through simulations and experiments on state-of-the-art platforms.
Journal of Parallel and Distributed Computing, 2006
We focus on data gathering problems in energy constrained networked sensor systems. The system op... more We focus on data gathering problems in energy constrained networked sensor systems. The system operates in rounds where a subset of the sensors generate a certain number of data packets during each round. All the data packets need to be transferred to the base station. The goal is to maximize the system lifetime in terms of the number of rounds the system can operate. We show that the above problem reduces to a restricted flow problem with quota constraint, flow conservation requirement, and edge capacity constraint. We further develop a strongly polynomial time algorithm for this problem, which is guaranteed to find an optimal solution. We then study the performance of a distributed shortest path heuristic for the problem. This heuristic is based on self-stabilizing spanning tree construction and shortest path routing methods. In this heuristic, every node determines its sensing activities and data transfers based on locally available information. No global synchronization is needed. Although the heuristic cannot guarantee optimality, simulations show that the heuristic has good average case performance over randomly generated deployment of sensors. We also derive bounds for the worst case performance of the heuristic.
IEEE Transactions on Parallel and Distributed Systems, 2007
In this paper, we consider the task allocation problem for computing a large set of equal-sized i... more In this paper, we consider the task allocation problem for computing a large set of equal-sized independent tasks on a heterogeneous computing system where the tasks initially reside on a single computer (the root) in the system. This problem represents the computation paradigm for a wide range of applications such as SETI@home and Monte Carlo simulations. We consider the scenario
International Journal of Distributed Sensor Networks, 2005
We focus on data gathering problems in energy-constrained networked sensor systems. We study stor... more We focus on data gathering problems in energy-constrained networked sensor systems. We study storeand-gather problems where data are locally stored at the sensors before the data gathering starts, and continuous sensing and gathering problems that model time critical applications. We show that these problems reduce to maximization of network flow under vertex capacity constraint. This flow problem in turn reduces to a standard network flow problem. We develop a distributed and adaptive algorithm to optimize data gathering. This algorithm leads to a simple protocol that coordinates the sensor nodes in the system. Our approach provides a unified framework to study a variety of data gathering problems in networked sensor systems. The performance of the proposed method is illustrated through simulations. £
... Bo Hong and Viktor K. Prasanna Department of Electrical Engineering University of Southern Ca... more ... Bo Hong and Viktor K. Prasanna Department of Electrical Engineering University of Southern California Los Angeles, CA 90089-2562 {bohong, prasanna}@usc.edu ... one of the fol-lowing three operations as long as (Īi) φ= 0: (a) Č Ł×(Īi Īk): applies when (Īi) 0 and Īk st ik - (Īi Īk ...
With continuing advancements in sensor node design and increasingly complex applications for wire... more With continuing advancements in sensor node design and increasingly complex applications for wireless sensor networks (WSNs), formal communication models are needed for either fair comparison between various algorithms or the development of design automation in WSNs. Toward such a goal, we formally define two link-wise communication models, namely, the Collision Free Model (CFM) and the Collision Aware Model (CAM). While programming under CFM resembles similarity with traditional parallel and distributed computation, performance analysis in CFM is often inaccurate or even misleading, due to its over-simplification on modeling packet collisions. On the other hand, by exposing low level details of packet collisions in CAM, we are able to model and analyze the behavior of the network in a more realistic, accurate, and precise fashion. To validate the above arguments, we study a basic operation in WSNs --broadcasting, using the well-known simple flooding and probabilistic based broadcasting schemes. We present an analytical framework that facilitates an accurate and precise modeling and analysis for both schemes in CAM, in terms of two performance metrics including reachability and latency. Our analytical results indicate that the impact of node density on the two performance metrics can be minimized by carefully choosing the probability of broadcasting, implying a strong scalability of the probability based broadcast scheme. Moreover, our analysis is confirmed by extensive simulation results.
IEEE Transactions on Parallel and Distributed Systems, 2003
Recently, several experimental studies have been conducted on block data layout in conjunction wi... more Recently, several experimental studies have been conducted on block data layout in conjunction with tiling as a data transformation technique to improve cache performance. In this paper, we analyze cache and translation look-aside buffer (TLB) performance of such alternate layouts (including block data layout and Morton layout) when used in conjunction with tiling. We derive a tight lower bound on TLB performance for standard matrix access patterns, and show that block data layout and Morton layout achieve this bound. To improve cache performance, block data layout is used in concert with tiling. Based on the cache and TLB performance analysis, we propose a data block size selection algorithm that finds a tight range for optimal block size. To validate our analysis, we conducted simulations and experiments using tiled matrix multiplication, LU decomposition, and Cholesky factorization. For matrix multiplication, simulation results using UltraSparc II parameters show that tiling and block data layout with a block size given by our block size selection algorithm, reduces up to 93 percent of TLB misses compared with other techniques. The total miss cost is reduced considerably. Experiments on several platforms show that tiling with block data layout achieves up to 50 percent performance improvement over other techniques that use conventional layouts. Morton layout is also analyzed and compared with block data layout. Experimental results show that matrix multiplication using block data layout is up to 15 percent faster than that using Morton data layout.
We focus on data gathering problems in energy-constrained wireless sensor networks. We study stor... more We focus on data gathering problems in energy-constrained wireless sensor networks. We study store-and-gather problems where data are locally stored on the sensors before the data gathering starts, and continuous sensing and gathering problems that model time critical applications. We show that these problems reduce to maximization of network flow under vertex capacity constraint, which reduces to a standard network flow problem. We develop a distributed and adaptive algorithm to optimize data gathering. This algorithm leads to a simple protocol that coordinates the sensor nodes in the system. Our approach provides a unified framework to study a variety of data gathering problems in sensor networks. The efficiency of the proposed method is illustrated through simulations.
A key application of networked sensor systems is to detect and classify events of interest in an ... more A key application of networked sensor systems is to detect and classify events of interest in an environment. Such applications require processing of raw data and the fusion of individual decisions. In-network processing of the sensed data has been shown to be more energy efficient than the centralized scheme that gathers all the raw data to a (powerful) base station for further processing. We formulate the problem as a special class of flow optimization problem. We propose a decentralized adaptive algorithm to maximize the throughput of a class of in-network processing applications. This algorithm is further implemented as a decentralized in-network processing protocol that adapts to any changes in link bandwidths and node processing capabilities. Simulations show that the proposed in-network processing protocol achieves upto 95% of the optimal system throughput. We also show that path based greedy heuristics have very poor performance in the worst case. 0-7803-8815-
Recently, several experimental studies have been conducted on block data layout as a data transfo... more Recently, several experimental studies have been conducted on block data layout as a data transformation technique used in conjunction with tiling to improve cache performance. In this paper, we provide a theoretical analysis for the TLB and cache performance of block data layout. For standard matrix access patterns, we derive an asymptotic lower bound on the number of TLB misses for any data layout and show that block data layout achieves this bound. We show that block data layout improves TLB misses by a factor of O(B) compared with conventional data layouts, where B is the block size of block data layout. This reduction contributes to the improvement in memory hierarchy performance. Using our TLB and cache analysis, we also discuss the impact of block size on the overall memory hierarchy performance. These results are validated through simulations and experiments on state-of-the-art platforms.
Journal of Parallel and Distributed Computing, 2006
We focus on data gathering problems in energy constrained networked sensor systems. The system op... more We focus on data gathering problems in energy constrained networked sensor systems. The system operates in rounds where a subset of the sensors generate a certain number of data packets during each round. All the data packets need to be transferred to the base station. The goal is to maximize the system lifetime in terms of the number of rounds the system can operate. We show that the above problem reduces to a restricted flow problem with quota constraint, flow conservation requirement, and edge capacity constraint. We further develop a strongly polynomial time algorithm for this problem, which is guaranteed to find an optimal solution. We then study the performance of a distributed shortest path heuristic for the problem. This heuristic is based on self-stabilizing spanning tree construction and shortest path routing methods. In this heuristic, every node determines its sensing activities and data transfers based on locally available information. No global synchronization is needed. Although the heuristic cannot guarantee optimality, simulations show that the heuristic has good average case performance over randomly generated deployment of sensors. We also derive bounds for the worst case performance of the heuristic.
IEEE Transactions on Parallel and Distributed Systems, 2007
In this paper, we consider the task allocation problem for computing a large set of equal-sized i... more In this paper, we consider the task allocation problem for computing a large set of equal-sized independent tasks on a heterogeneous computing system where the tasks initially reside on a single computer (the root) in the system. This problem represents the computation paradigm for a wide range of applications such as SETI@home and Monte Carlo simulations. We consider the scenario
International Journal of Distributed Sensor Networks, 2005
We focus on data gathering problems in energy-constrained networked sensor systems. We study stor... more We focus on data gathering problems in energy-constrained networked sensor systems. We study storeand-gather problems where data are locally stored at the sensors before the data gathering starts, and continuous sensing and gathering problems that model time critical applications. We show that these problems reduce to maximization of network flow under vertex capacity constraint. This flow problem in turn reduces to a standard network flow problem. We develop a distributed and adaptive algorithm to optimize data gathering. This algorithm leads to a simple protocol that coordinates the sensor nodes in the system. Our approach provides a unified framework to study a variety of data gathering problems in networked sensor systems. The performance of the proposed method is illustrated through simulations. £
We consider the resource allocation problem for computing a large set of equal-sized independent ... more We consider the resource allocation problem for computing a large set of equal-sized independent tasks on heterogeneous computing systems. This problem represents the computation paradigm for a wide range of applications such as SETI@home and Monte Carlo simulations. We consider a general problem in which the interconnection between the nodes is modeled using a graph. We maximize the throughput of the system by using a linear programming formulation. This linear programming formulation is further transformed to an extended network flow representation, which can be solved efficiently using maximum flow/minimum cut algorithms. This leads to a simple distributed protocol for the problem. The effectiveness of the proposed resource allocation approach is verified through simulations
Pervasive and Mobile Computing, 2005
Towards building a systematic methodology of algorithm design for applications of networked sensor systems, we formally define two link-wise communication models, the Collision Free Model (CFM) and the Collision Aware Model (CAM). While CFM provides ease of programming and analysis for high-level application functionality, CAM enables more accurate performance analysis and hence more efficient algorithms through cross-layer optimization, at the expense of increased programming and analysis complexity. These communication models are part of an abstract network model, above which algorithm design and performance optimization are performed. We use the example of optimizing a probability-based broadcasting scheme under CAM to illustrate algorithm optimization based on the defined models. Specifically, we present an analytical framework that facilitates accurate modeling and analysis of probability-based broadcasting in CAM (PB_CAM). Our analytical results indicate that (1) the optimal broadcast probability for either maximizing the reachability within a given latency constraint or minimizing the latency for a given reachability constraint decreases rapidly with node density, and (2) the optimal probability for either maximizing the reachability with a given energy constraint or minimizing the energy cost for a given reachability constraint varies slowly between 0 and 0.1 over the entire range of the variations in node density. Our analysis is also confirmed by extensive simulation results.
IEEE Transactions on Parallel and Distributed Systems, 2003
Recently, several experimental studies have been conducted on block data layout in conjunction with tiling as a data transformation technique to improve cache performance. In this paper, we analyze cache and translation look-aside buffer (TLB) performance of such alternate layouts (including block data layout and Morton layout) when used in conjunction with tiling. We derive a tight lower bound on TLB performance for standard matrix access patterns, and show that block data layout and Morton layout achieve this bound. To improve cache performance, block data layout is used in concert with tiling. Based on the cache and TLB performance analysis, we propose a data block size selection algorithm that finds a tight range for optimal block size. To validate our analysis, we conducted simulations and experiments using tiled matrix multiplication, LU decomposition, and Cholesky factorization. For matrix multiplication, simulation results using UltraSparc II parameters show that tiling and block data layout, with a block size given by our block size selection algorithm, reduce TLB misses by up to 93 percent compared with other techniques. The total miss cost is reduced considerably. Experiments on several platforms show that tiling with block data layout achieves up to 50 percent performance improvement over other techniques that use conventional layouts. Morton layout is also analyzed and compared with block data layout. Experimental results show that matrix multiplication using block data layout is up to 15 percent faster than that using Morton data layout.
We focus on data gathering problems in energy-constrained wireless sensor networks. We study store-and-gather problems where data are locally stored on the sensors before the data gathering starts, and continuous sensing and gathering problems that model time critical applications. We show that these problems reduce to maximization of network flow under vertex capacity constraint, which reduces to a standard network flow problem. We develop a distributed and adaptive algorithm to optimize data gathering. This algorithm leads to a simple protocol that coordinates the sensor nodes in the system. Our approach provides a unified framework to study a variety of data gathering problems in sensor networks. The efficiency of the proposed method is illustrated through simulations.
A key application of networked sensor systems is to detect and classify events of interest in an environment. Such applications require processing of raw data and the fusion of individual decisions. In-network processing of the sensed data has been shown to be more energy efficient than the centralized scheme that gathers all the raw data to a (powerful) base station for further processing. We formulate the problem as a special class of flow optimization problem. We propose a decentralized adaptive algorithm to maximize the throughput of a class of in-network processing applications. This algorithm is further implemented as a decentralized in-network processing protocol that adapts to any changes in link bandwidths and node processing capabilities. Simulations show that the proposed in-network processing protocol achieves up to 95% of the optimal system throughput. We also show that path-based greedy heuristics have very poor performance in the worst case.
Recently, several experimental studies have been conducted on block data layout as a data transformation technique used in conjunction with tiling to improve cache performance. In this paper, we provide a theoretical analysis for the TLB and cache performance of block data layout. For standard matrix access patterns, we derive an asymptotic lower bound on the number of TLB misses for any data layout and show that block data layout achieves this bound. We show that block data layout reduces the number of TLB misses by a factor of O(B) compared with conventional data layouts, where B is the block size of block data layout. This reduction contributes to the improvement in memory hierarchy performance. Using our TLB and cache analysis, we also discuss the impact of block size on the overall memory hierarchy performance. These results are validated through simulations and experiments on state-of-the-art platforms.
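To make the intuition behind the TLB result concrete, the following sketch counts how many distinct memory pages one tile of a matrix occupies under a conventional row-major layout and under a block data layout. It is an illustration only, not the paper's analysis: the index functions, the matrix size, the tile size, and the page size (measured in elements) are all assumed values, and the block side b is assumed to divide the matrix dimension n.

```python
def row_major_index(i, j, n):
    """Linear offset of element (i, j) in an n x n row-major matrix."""
    return i * n + j


def block_layout_index(i, j, n, b):
    """Linear offset of element (i, j) when b x b blocks are stored contiguously.

    Blocks are ordered row-major, and elements inside a block are row-major.
    Assumes b divides n.
    """
    blocks_per_row = n // b
    block_id = (i // b) * blocks_per_row + (j // b)
    return block_id * b * b + (i % b) * b + (j % b)


def pages_touched(index_fn, rows, cols, page_size):
    """Number of distinct pages holding the elements in rows x cols."""
    return len({index_fn(i, j) // page_size for i in rows for j in cols})


# Hypothetical parameters: 1024 x 1024 matrix, 32 x 32 tiles,
# pages that hold 1024 elements each.
n, b, page_size = 1024, 32, 1024
tile_rows, tile_cols = range(64, 96), range(32, 64)

row_major_pages = pages_touched(
    lambda i, j: row_major_index(i, j, n), tile_rows, tile_cols, page_size)
block_pages = pages_touched(
    lambda i, j: block_layout_index(i, j, n, b), tile_rows, tile_cols, page_size)
```

With these particular parameters every row of the tile falls on a different page under the row-major layout, while the whole tile occupies a single page under the block layout, so fewer TLB entries are needed to cover the working set of one tile.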
Journal of Parallel and Distributed Computing, 2006
We focus on data gathering problems in energy-constrained networked sensor systems. The system operates in rounds, where a subset of the sensors generates a certain number of data packets during each round. All the data packets need to be transferred to the base station. The goal is to maximize the system lifetime in terms of the number of rounds the system can operate. We show that the above problem reduces to a restricted flow problem with quota constraint, flow conservation requirement, and edge capacity constraint. We further develop a strongly polynomial time algorithm for this problem, which is guaranteed to find an optimal solution. We then study the performance of a distributed shortest path heuristic for the problem. This heuristic is based on self-stabilizing spanning tree construction and shortest path routing methods. In this heuristic, every node determines its sensing activities and data transfers based on locally available information. No global synchronization is needed. Although the heuristic cannot guarantee optimality, simulations show that it has good average-case performance over randomly generated deployments of sensors. We also derive bounds for the worst-case performance of the heuristic.
IEEE Transactions on Parallel and Distributed Systems, 2007
In this paper, we consider the task allocation problem for computing a large set of equal-sized independent tasks on a heterogeneous computing system where the tasks initially reside on a single computer (the root) in the system. This problem represents the computation paradigm for a wide range of applications such as SETI@home and Monte Carlo simulations. We consider the scenario