Xuepeng Fan | Huazhong University of Science and Technology
Papers by Xuepeng Fan
IEEE Transactions on Sustainable Computing
Map/Reduce style data-parallel computation is characterized by the extensive use of user-defined functions for data processing and relies on data-shuffling stages to prepare data partitions for parallel computation. Instead of treating user-defined functions as "black boxes", we propose to analyze those functions to turn them into "gray boxes" that expose opportunities to optimize data shuffling. We identify useful functional properties for user-defined functions, and propose SUDO, an optimization framework that reasons about data-partition properties, functional properties, and data shuffling. We have assessed this optimization opportunity on over 10,000 data-parallel programs used in production SCOPE clusters, and designed a framework that incorporates it into the production system. Experiments with real SCOPE programs on real production data have shown that this optimization can save up to 47% in terms of disk and network I/O for shuffling, and up to 48% in terms of cross-pod network traffic.
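To make the idea concrete, here is a minimal Python sketch, not SUDO's actual API or full property taxonomy; the annotation, function, and planner names are hypothetical. It shows how a user-defined function known to preserve its partition key lets a planner skip a reshuffle.

```python
from typing import Callable, Tuple

# Hypothetical property tags; SUDO's real taxonomy of functional properties
# is richer than this two-value sketch.
PASS_THROUGH = "pass-through"   # output key equals input key
ARBITRARY = "arbitrary"         # no guarantee about output keys

def udf_scale(record: Tuple[str, int]) -> Tuple[str, int]:
    """Example UDF: transforms the value but leaves the key untouched."""
    key, value = record
    return key, value * 2

udf_scale.key_property = PASS_THROUGH   # what a static analysis would infer

def plan_shuffle(udf: Callable, upstream_partitioned_by_key: bool) -> str:
    """Decide whether the stage after `udf` needs a full data shuffle.

    If the input is already hash-partitioned by key and the UDF is
    key-preserving, the partitioning property still holds afterwards,
    so the shuffle can be elided, saving disk and network I/O.
    """
    prop = getattr(udf, "key_property", ARBITRARY)
    if upstream_partitioned_by_key and prop == PASS_THROUGH:
        return "reuse existing partitions (no shuffle)"
    return "full reshuffle required"

print(plan_shuffle(udf_scale, upstream_partitioned_by_key=True))
# -> reuse existing partitions (no shuffle)
```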
IEICE Transactions on Information and Systems, 2021
Blockchain-based voting, including liquid voting, has been extensively studied in recent years. However, it remains challenging to implement liquid voting on blockchain using an Ethereum smart contract. The challenge comes from the gas limit: the number of instructions for processing a ballot cannot exceed a certain amount. This restricts the application scenario for algorithms whose time complexity is linear in the number of voters, i.e., O(n). Since blockchain technology can readily share and reuse resources, we study a model of liquid voting on blockchain and propose a fast algorithm, named Flash, to eliminate the restriction. The key idea behind our algorithm is to shift some on-chain processing off-chain. In detail, we first construct a Merkle tree off-chain that contains all voters' properties. Second, we use Merkle proofs and an interval tree to process each ballot with O(log n) on-chain time complexity. Theoretically, the algorithm can support up to 2^1000 voters with respect to the current gas limit on Ethereum. Experimentally, the results show that the consumed gas fee remains at a very low level as the number of voters increases. This means our algorithm makes liquid voting on blockchain practical even for massive numbers of voters.
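The Flash contract itself targets Ethereum and also uses an interval tree, which is not shown here; the following Python sketch only illustrates the Merkle-proof mechanics behind the O(log n) per-ballot check, with all names chosen for this example.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Build a Merkle root off-chain from already-hashed leaves."""
    level = leaves[:]
    while len(level) > 1:
        if len(level) % 2 == 1:          # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Collect the sibling hashes needed to verify leaf `index` (O(log n) size)."""
    proof, level, i = [], leaves[:], index
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = i + 1 if i % 2 == 0 else i - 1
        proof.append(level[sibling])
        level = [h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return proof

def verify(leaf, index, proof, root):
    """The O(log n) check a contract would perform per ballot."""
    node, i = leaf, index
    for sibling in proof:
        node = h(node + sibling) if i % 2 == 0 else h(sibling + node)
        i //= 2
    return node == root

voters = [h(f"voter-{k}".encode()) for k in range(8)]
root = merkle_root(voters)                           # computed off-chain
assert verify(voters[3], 3, merkle_proof(voters, 3), root)
```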
ArXiv, 2019
We study the liquid democracy problem, where each voter can either vote directly for a candidate or delegate his voting power to a proxy. We consider an implementation of liquid democracy on the blockchain through an Ethereum smart contract that is compatible with the real-time self-tallying property, where the contract itself can record ballots and update the voting status upon receiving each voting message. A challenge arises from the gas limit of the Ethereum mainnet: the number of instructions for processing a voting message cannot exceed a certain amount, which restricts the application scenario for algorithms whose time complexity is linear in the number of voters. We propose a fast algorithm to overcome the challenge, which i) shifts the on-chain initialization off-chain and ii) keeps the on-chain complexity for processing each voting message at O(log n), where n is the number of voters.
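As a rough illustration of the delegation semantics only (not the paper's contract code; voter names and data structures here are made up), an off-chain tally that follows each voter's delegation chain to a final direct vote might look like this.

```python
# Toy liquid-democracy tally: each voter either votes for a candidate or
# delegates to another voter; a voter's power flows along the chain to
# whoever finally casts a direct vote. Illustrative only.
from collections import Counter

def resolve(voter, direct_vote, delegate_to, seen=None):
    """Follow the delegation chain of `voter` until a direct vote is found."""
    seen = seen or set()
    if voter in direct_vote:
        return direct_vote[voter]
    if voter in seen or voter not in delegate_to:
        return None                      # cycle or abstention: power is lost
    seen.add(voter)
    return resolve(delegate_to[voter], direct_vote, delegate_to, seen)

def tally(voters, direct_vote, delegate_to):
    counts = Counter()
    for v in voters:
        choice = resolve(v, direct_vote, delegate_to)
        if choice is not None:
            counts[choice] += 1          # every voter carries weight 1 here
    return counts

voters = ["a", "b", "c", "d"]
direct_vote = {"a": "X", "d": "Y"}
delegate_to = {"b": "a", "c": "b"}       # c -> b -> a -> X
print(tally(voters, direct_vote, delegate_to))   # Counter({'X': 3, 'Y': 1})
```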
Frontiers of Computer Science, 2018
Despite the growing popularity of task-based parallel programming, task-parallel programming libraries and languages still offer limited support for coordinating parallel tasks. This limitation forces programmers to use additional independent components to coordinate the parallel tasks; these components can be third-party libraries or extra components in the same programming library or language. Moreover, mixing tasks and coordination components increases the difficulty of task-based programming and blinds schedulers to tasks' dependencies. In this paper, we propose a task-based parallel programming library, FunctionFlow, which coordinates tasks without requiring additional independent coordination components. First, we use dependency expressions to represent tasks' termination: the key idea is to use && for the termination of all listed tasks and || for the termination of any one task, along with combinations of dependency expressions. Second, as runtime support, we use a lightweight representation for dependency expressions, together with a suspended-task queue to schedule tasks that still have prerequisites to run. Finally, we demonstrate FunctionFlow's effectiveness in two aspects: a case study on implementing popular parallel patterns with FunctionFlow, and a performance comparison with the state-of-the-art practice, TBB. Our demonstration shows that FunctionFlow can generally coordinate parallel tasks without involving additional components, with performance comparable to TBB.
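FunctionFlow itself is a C++ library; purely as an illustration of the && / || dependency-expression idea (none of these names are the library's real API), a Python sketch over futures could look like this.

```python
# Illustrative only: a dependency-expression tree over futures, where & plays
# the role of FunctionFlow's && (all listed tasks finish) and | plays the role
# of || (any one finishes). This is not FunctionFlow's actual C++ API.
import time
from concurrent.futures import Future, ThreadPoolExecutor

class Dep:
    def __init__(self, future: Future = None, op: str = None, left=None, right=None):
        self.future, self.op, self.left, self.right = future, op, left, right

    def __and__(self, other):          # stands in for &&
        return Dep(op="and", left=self, right=other)

    def __or__(self, other):           # stands in for ||
        return Dep(op="or", left=self, right=other)

    def done(self) -> bool:
        if self.future is not None:
            return self.future.done()
        if self.op == "and":
            return self.left.done() and self.right.done()
        return self.left.done() or self.right.done()

    def block(self, poll: float = 0.01) -> None:
        """Block the caller until the whole expression is satisfied."""
        while not self.done():
            time.sleep(poll)

with ThreadPoolExecutor() as pool:
    a, b, c = (Dep(pool.submit(time.sleep, d)) for d in (0.05, 0.2, 0.1))
    (a & (b | c)).block()              # roughly wait(a && (b || c))
    print("dependency expression satisfied")
```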
Mobile Networks and Applications, 2016
Parameter-server frameworks play an important role in scaling up distributed deep learning algorithms. However, the constant growth of neural network size has created a serious bottleneck in exchanging parameters across machines. Recent efforts rely on manually setting a parameter-exchanging interval to reduce communication overhead, regardless of the parameter server's resource availability; an inappropriate interval can lead to poor performance or inaccurate results. Meanwhile, request bursts may occur, exacerbating the bottleneck. In this paper, we propose an approach to automatically set the optimal exchanging interval, aiming to remove the parameter-exchanging bottleneck and to utilize resources evenly without losing training accuracy. The key idea is to increase the interval on each training node based on knowledge of the available resources and to choose different intervals for each slave node to avoid request bursts. We adopted this method to optimize the parallel Stochastic Gradient Descent algorithm, through which we successfully sped up the parameter-exchanging process by eight times.
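The abstract does not spell out the exact interval-selection policy, so the sketch below only shows the general shape of such a heuristic; the formula, the utilization signal, and all names are assumptions for illustration.

```python
# Illustrative heuristic: stretch the exchange interval when the parameter
# server is busy, and phase-shift workers so their pushes do not arrive as a
# single burst. Not the paper's actual policy.

def exchange_schedule(base_interval: float,
                      server_utilization: float,   # observed load in [0, 1]
                      worker_rank: int,
                      num_workers: int,
                      max_stretch: float = 4.0):
    """Return (interval, initial_delay) in seconds for one worker."""
    spare = max(1e-3, 1.0 - server_utilization)
    interval = base_interval * min(max_stretch, 1.0 / spare)
    initial_delay = (worker_rank / num_workers) * interval   # stagger workers
    return interval, initial_delay

for rank in range(4):
    print(rank, exchange_schedule(base_interval=1.0,
                                  server_utilization=0.75,
                                  worker_rank=rank,
                                  num_workers=4))
# busier server -> longer interval; different ranks -> different start offsets
```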
IEEE Transactions on Emerging Topics in Computing
Current cryptocurrencies, such as Bitcoin and Ethereum, enable anonymity by using public keys to represent user accounts. On the other hand, inferring blockchain account types (i.e., miners, smart contracts or exchanges), also referred to as blockchain identities, is significant in many scenarios, such as risk assessment and trade regulation. Existing work on blockchain deanonymization mainly focuses on Bitcoin, which supports only simple transactions of cryptocurrencies. With the popularity of decentralized application (DApp) platform blockchains with Turing-complete smart contracts, represented by Ethereum, identity inference in blockchain faces new challenges because of user diversity and the complexity of activities enabled by smart contracts. In this paper, we propose I²GL, an identity inference approach based on big graph analytics and learning to address these challenges. Specifically, I²GL constructs a transaction graph and infers the identity of nodes using a graph learning technique based on Graph Convolutional Networks. Furthermore, a series of enhancements is proposed by exploiting unique features of the blockchain transaction graph. Experimental results on Ethereum transaction records show that I²GL significantly outperforms other state-of-the-art methods.
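I²GL's blockchain-specific enhancements are not reproduced here; the NumPy sketch below, on toy data, only shows the basic Graph Convolutional Network propagation rule that such an approach builds on.

```python
# The standard GCN propagation rule, shown on a toy transaction graph;
# I^2GL's actual pipeline and enhancements are not modeled here.
import numpy as np

def gcn_layer(adjacency: np.ndarray, features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])      # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return np.maximum(0.0, d_inv_sqrt @ a_hat @ d_inv_sqrt @ features @ weights)

# Toy graph: 4 accounts, an edge means "has transacted with".
adjacency = np.array([[0, 1, 1, 0],
                      [1, 0, 0, 1],
                      [1, 0, 0, 1],
                      [0, 1, 1, 0]], dtype=float)
features = np.random.rand(4, 8)        # e.g., per-account transaction statistics
w1, w2 = np.random.rand(8, 16), np.random.rand(16, 3)   # 3 identity classes

hidden = gcn_layer(adjacency, features, w1)
logits = gcn_layer(adjacency, hidden, w2)               # untrained scores per class
print(logits.shape)                                     # (4, 3)
```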
Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2015, 2015
Frontiers of Computer Science, 2015
2012 IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications, 2012
IEEE Transactions on Parallel and Distributed Systems, 2014
Expressing synchronization in task parallelism remains a significant challenge because of the complicated relationships between tasks. In this paper, we propose a novel parallel programming model, namely function flow, in which synchronization is easier to express. We relieve the burden of synchronization by virtue of parallel functions and functional wait. In function flow, parallel functions are defined to represent parallel tasks, and functional wait is designed to coordinate the relationships of parallel functions at task level. The main aspect of functional wait is boolean expressions coupled with parallel functions' invocations. The functional wait mechanism, based on parallel functions, provides powerful semantic support and compile-time checking of the relationships of parallel tasks. Our preliminary results for function flow show that a wide range of realistic parallel programs can be easily expressed, with performance coming close to well-tuned multi-threaded programs using barriers/sync. The overhead of task-level coordination is very low, not exceeding 8%.
Transactional memory (TM) is a parallel programming concept. Existing consistency protocols in distributed transactional memory systems consume too much bandwidth and introduce high latency. In this paper, we propose our Transaction Memory Consistency Protocol (TMCP) and point out its new features compared to current protocols. After formulating our model and analyzing its performance, we found that both overly long and overly short transaction execution times cause more conflicts, given that the execution times of the transaction population follow a Gamma distribution. We indicate that it is important to adjust the execution time to a reasonable value to improve performance.
The largest difference between a distributed and a non-distributed system is that the former introduces network messages into the system. Network messages bring scalability to a distributed system as well as complexity. Testing large-scale distributed systems is a great challenge, because some errors happen after a distributed sequence of events that involves machine and network failures. Meld
Map/Reduce style data-parallel computation is characterized by the extensive use of user-defined functions for data processing and relies on data-shuffling stages to prepare data partitions for parallel computation. Instead of treating user-defined functions as "black boxes", we propose to analyze those functions to turn them into "gray boxes" that expose opportunities to optimize data shuffling. We identify useful functional properties for user-defined functions, and propose SUDO, an optimization framework that reasons about data-partition properties, functional properties, and data shuffling. We have assessed this optimization opportunity on over 10,000 data-parallel programs used in production SCOPE clusters, and designed a framework that incorporates it into the production system. Experiments with real SCOPE programs on real production data have shown that this optimization can save up to 47% in terms of disk and network I/O for shuffling, and up to 48% in terms of cross-pod network traffic.