Xuepeng Fan | Huazhong University of Science and Technology

Papers by Xuepeng Fan

Research paper thumbnail of Phoenix: A Live Upgradable Blockchain Client

IEEE Transactions on Sustainable Computing

Research paper thumbnail of A Fast Algorithm for Liquid Voting on Blockchain

IEICE Transactions on Information and Systems, 2021

Blockchain-based voting, including liquid voting, has been extensively studied in recent years. However, it remains challenging to implement liquid voting on a blockchain using Ethereum smart contracts. The challenge comes from the gas limit: the number of instructions for processing a ballot cannot exceed a certain amount. This restricts the application scenario for algorithms whose time complexity is linear in the number of voters, i.e., O(n). As blockchain technology allows resources to be shared and reused, we study a model of liquid voting on blockchain and propose a fast algorithm, named Flash, to eliminate the restriction. The key idea behind our algorithm is to shift some on-chain processing off-chain. In detail, we first construct a Merkle tree off-chain that contains all voters' properties. Second, we use Merkle proofs and an interval tree to process each ballot with O(log n) on-chain time complexity. Theoretically, the algorithm can support up to 2^1000 voters with respect to the current gas limit on Ethereum. Experimentally, the results show that the consumed gas fee remains at a very low level as the number of voters increases. This makes liquid voting on blockchain practical even for massive numbers of voters.
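As an illustration of the off-chain/on-chain split described above, the sketch below (plain Python with hypothetical helper names; it does not reproduce the paper's contract or its interval-tree bookkeeping) builds a Merkle tree over voter records off-chain and verifies one voter's membership with an O(log n) proof, which is the kind of check that would run on-chain.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_merkle_tree(leaves):
    """Off-chain: build all tree levels bottom-up; leaves are already hashed."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:                    # duplicate last node on odd levels
            cur = cur + [cur[-1]]
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels                            # levels[-1][0] is the root

def merkle_proof(levels, index):
    """Off-chain: collect the sibling hashes needed to prove one leaf."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1                  # sibling shares the same parent
        proof.append((level[sibling], sibling > index))
        index //= 2
    return proof

def verify(root, leaf, proof):
    """On-chain analogue: O(log n) hashes to check membership."""
    node = leaf
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root

# Usage: voters' properties are hashed into leaves off-chain; only the root
# and one proof per ballot need to be touched on-chain.
voters = [f"voter-{i}:weight=1".encode() for i in range(8)]
levels = build_merkle_tree([h(v) for v in voters])
root = levels[-1][0]
proof = merkle_proof(levels, 5)
assert verify(root, h(voters[5]), proof)
```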

Research paper thumbnail of Implement Liquid Democracy on Ethereum: A Fast Algorithm for Realtime Self-tally Voting System

ArXiv, 2019

We study the liquid democracy problem, where each voter can either vote directly for a candidate or delegate their voting power to a proxy. We consider implementing liquid democracy on the blockchain through an Ethereum smart contract while remaining compatible with the real-time self-tallying property, where the contract itself can record ballots and update the voting status upon receiving each voting message. A challenge arises from the gas limit of the Ethereum mainnet: the number of instructions for processing a voting message cannot exceed a certain amount, which restricts the application scenario for algorithms whose time complexity is linear in the number of voters. We propose a fast algorithm to overcome this challenge, which i) shifts the on-chain initialization off-chain and ii) processes each voting message with O(log n) on-chain complexity, where n is the number of voters.
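The liquid-democracy model itself (each voter either votes directly or delegates to a proxy) can be sketched as follows. This is a minimal off-chain tally in Python with made-up names, not the paper's smart-contract algorithm; it only shows how delegation chains resolve into weighted votes.

```python
def tally_liquid_votes(direct_votes, delegations, weights):
    """Resolve delegation chains and tally weighted votes.

    direct_votes: voter -> candidate, for voters who vote directly
    delegations:  voter -> proxy, for voters who delegate instead
    weights:      voter -> voting power
    Voters whose chain never reaches a direct vote (or that loops) are dropped.
    """
    def final_candidate(voter, seen):
        if voter in direct_votes:
            return direct_votes[voter]
        if voter in seen or voter not in delegations:
            return None                      # cycle or dangling delegation
        seen.add(voter)
        return final_candidate(delegations[voter], seen)

    tally = {}
    for voter, weight in weights.items():
        candidate = final_candidate(voter, set())
        if candidate is not None:
            tally[candidate] = tally.get(candidate, 0) + weight
    return tally

# Usage: Alice and Bob vote directly, Carol delegates to Bob, Dave to Carol.
weights = {"alice": 1, "bob": 1, "carol": 1, "dave": 1}
votes = {"alice": "X", "bob": "Y"}
delegs = {"carol": "bob", "dave": "carol"}
print(tally_liquid_votes(votes, delegs, weights))   # {'X': 1, 'Y': 3}
```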

Research paper thumbnail of FunctionFlow: coordinating parallel tasks

Frontiers of Computer Science, 2018

With the growing popularity of task-based parallel programming, task-parallel programming libraries and languages still offer limited support for coordinating parallel tasks. This limitation forces programmers to use additional independent components to coordinate the parallel tasks; the components can be third-party libraries or additional components in the same programming library or language. Moreover, mixing tasks and coordination components increases the difficulty of task-based programming and blinds schedulers to tasks' dependencies. In this paper, we propose a task-based parallel programming library, FunctionFlow, which coordinates tasks without requiring additional independent coordination components. First, we use dependency expressions to represent common task-termination dependencies. The key idea behind dependency expressions is to use && to wait for the termination of all tasks and || to wait for the termination of any task, and to allow dependency expressions to be combined. Second, as runtime support, we use a lightweight representation for dependency expressions, along with a suspended-task queue to schedule tasks that still have unmet prerequisites. Finally, we demonstrate FunctionFlow's effectiveness in two ways: case studies implementing popular parallel patterns with FunctionFlow, and a performance comparison with the state-of-the-art practice, TBB. Our demonstration shows that FunctionFlow can generally coordinate parallel tasks without involving additional components, with performance comparable to TBB.
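To make the &&/|| idea concrete, here is a rough Python analogue using concurrent.futures rather than FunctionFlow's C++ API (which is not shown in the abstract): waiting on all tasks corresponds to an &&-style dependency expression, waiting on any task to an ||-style one.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED, ALL_COMPLETED

def load(part):
    return f"data-{part}"

def mirror_fetch(url):
    return f"copy-from-{url}"

def merge(results):
    return " + ".join(results)

with ThreadPoolExecutor() as pool:
    # "&&"-style dependency: merge runs only after BOTH loads terminate.
    a = pool.submit(load, 0)
    b = pool.submit(load, 1)
    done, _ = wait([a, b], return_when=ALL_COMPLETED)
    merged = pool.submit(merge, [a.result(), b.result()])

    # "||"-style dependency: continue as soon as ANY mirror responds.
    mirrors = [pool.submit(mirror_fetch, u) for u in ("m1", "m2", "m3")]
    first_done, _ = wait(mirrors, return_when=FIRST_COMPLETED)
    fastest = next(iter(first_done)).result()

    print(merged.result(), "|", fastest)
```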

Research paper thumbnail of Automatically Setting Parameter-Exchanging Interval for Deep Learning

Mobile Networks and Applications, 2016

Parameter-server frameworks play an important role in scaling up distributed deep learning algorithms. However, the constant growth of neural network size has led to a serious bottleneck in exchanging parameters across machines. Recent efforts rely on manually setting a parameter-exchanging interval to reduce communication overhead, regardless of the parameter server's resource availability; an inappropriate interval may lead to poor performance or inaccurate results. Meanwhile, request bursts may occur, exacerbating the bottleneck. In this paper, we propose an approach to automatically set the optimal exchanging interval, aiming to remove the parameter-exchanging bottleneck and to evenly utilize resources without losing training accuracy. The key idea is to increase the interval on different training nodes based on knowledge of the available resources, and to choose a different interval for each slave node to avoid request bursts. We adopted this method to optimize the parallel Stochastic Gradient Descent algorithm, through which we sped up the parameter-exchanging process by eight times.
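A toy version of the idea, with invented names and a deliberately simple policy (not the paper's actual formula): stretch the exchange interval just enough that the aggregate request rate fits the parameter server's capacity, and give each node a different offset so requests do not arrive in a burst.

```python
def choose_exchange_intervals(base_interval, server_capacity, node_loads):
    """Pick a per-node parameter-exchange interval (in mini-batches).

    base_interval:   smallest interval we are willing to use
    server_capacity: requests per second the parameter server can absorb
    node_loads:      node -> expected requests per second at base_interval
    Nodes are stretched just enough to fit the server's capacity, and each
    node gets a distinct offset so their requests do not arrive in a burst.
    """
    total = sum(node_loads.values())
    stretch = max(1.0, total / server_capacity)   # >1 means we must slow down
    plan = {}
    for rank, (node, _) in enumerate(sorted(node_loads.items())):
        interval = int(round(base_interval * stretch))
        offset = rank % interval                  # stagger the first exchange
        plan[node] = (interval, offset)
    return plan

# Usage: 4 workers would each send 50 req/s, but the server handles 100 req/s,
# so every worker exchanges every 2 batches, with alternating offsets 0 and 1.
print(choose_exchange_intervals(1, 100, {f"w{i}": 50 for i in range(4)}))
```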

Research paper thumbnail of A Graph Learning Based Approach for Identity Inference in DApp Platform Blockchain

IEEE Transactions on Emerging Topics in Computing

Current cryptocurrencies, such as Bitcoin and Ethereum, enable anonymity by using public keys to represent user accounts. On the other hand, inferring blockchain account types (i.e., miners, smart contracts, or exchanges), also referred to as blockchain identities, is significant in many scenarios, such as risk assessment and trade regulation. Existing work on blockchain deanonymization mainly focuses on Bitcoin, which supports only simple cryptocurrency transactions. With the growing popularity of decentralized application (DApp) platform blockchains with Turing-complete smart contracts, represented by Ethereum, identity inference in blockchain faces new challenges because of user diversity and the complexity of activities enabled by smart contracts. In this paper, we propose I²GL, an identity inference approach based on big graph analytics and learning to address these challenges. Specifically, I²GL constructs a transaction graph and infers the identity of nodes using a graph learning technique based on Graph Convolutional Networks. Furthermore, a series of enhancements is proposed by exploiting unique features of the blockchain transaction graph. Experimental results on Ethereum transaction records show that I²GL significantly outperforms other state-of-the-art methods.
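For readers unfamiliar with Graph Convolutional Networks, the sketch below shows a single GCN propagation step over a tiny, made-up transaction graph. It illustrates the kind of model I²GL builds on, not I²GL's actual architecture, features, or enhancements.

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One Graph Convolutional Network propagation step:
    H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ features @ weight, 0.0)

# Toy transaction graph: 4 accounts, an edge where two accounts transacted.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]], dtype=float)
# Per-account features, e.g. (transaction count, total value sent).
features = np.array([[10, 5.0], [2, 0.1], [300, 80.0], [1, 0.0]])
rng = np.random.default_rng(0)
weight = rng.normal(size=(2, 4))                  # 2 input dims -> 4 hidden dims
hidden = gcn_layer(adj, features, weight)
print(hidden.shape)                               # (4, 4): one embedding per account
```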

Research paper thumbnail of Hey, you have given me too many knobs!: understanding and dealing with over-designed configuration in system software

Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2015, 2015

Research paper thumbnail of Understanding and identifying latent data races cross-thread interleaving

Frontiers of Computer Science, 2015

Research paper thumbnail of An Efficient Distributed Transactional Memory System

2012 IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications, 2012

Research paper thumbnail of Spotting Code Optimizations in Data-Parallel Pipelines through PeriSCOPE

IEEE Transactions on Parallel and Distributed Systems, 2014

Research paper thumbnail of Function flow: making synchronization easier in task parallelism

Expressing synchronization in task parallelism remains a significant challenge because of the complicated relationships between tasks. In this paper, we propose a novel parallel programming model, namely function flow, in which synchronization is easier to express. We relieve the burden of synchronization by virtue of parallel functions and functional wait. In function flow, parallel functions are defined to represent parallel tasks. Functional wait is designed to coordinate the relationships of parallel functions at the task level. The main aspect of functional waits is boolean expressions coupled with invocations of parallel functions. The functional wait mechanism, built on parallel functions, provides powerful semantic support and compile-time checking of the relationships between parallel tasks. Our preliminary results with function flow show that a wide range of realistic parallel programs can be easily expressed, with performance coming close to well-tuned multi-threaded programs using barriers/sync. The overhead of task-level coordination is very low, not exceeding 8%.
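A rough Python analogue of coordinating a diamond-shaped task relationship by waiting on the termination of specific tasks rather than inserting a global barrier (function flow itself is not a Python library; the names here are purely illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def stage(name):
    def run():
        return name
    return run

with ThreadPoolExecutor() as pool:
    # Diamond dependency: b and c both consume a; d consumes b AND c.
    a = pool.submit(stage("a"))
    a.result()                                   # a must terminate first
    b = pool.submit(stage("b"))
    c = pool.submit(stage("c"))
    # Analogue of a functional wait on "b && c": block on both invocations,
    # instead of a barrier that every thread in the program must reach.
    d_inputs = (b.result(), c.result())
    d = pool.submit(lambda: "+".join(d_inputs))
    print(d.result())                            # b+c
```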

Research paper thumbnail of Transactional Memory Consistency: A New Consistency Model for Distributed Transactional Memory

Transactional memory (TM) is a parallel programming concept. Existing consistency protocols in distributed transactional memory systems consume too much bandwidth and introduce high latency. In this paper, we propose our Transactional Memory Consistency Protocol (TMCP) and point out its new features compared to current protocols. After formulating our model and analyzing its performance, we found that both too-long and too-short execution times cause more conflicts, given that the execution times of the transaction population follow a Gamma distribution. We conclude that it is important to adjust the execution time to a reasonable value to improve performance.
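For reference, the Gamma distribution assumed for transaction execution times has the standard density below (this is the textbook form; the paper's shape and scale parameters are not given in the abstract):

```latex
f(x; k, \theta) = \frac{x^{k-1} e^{-x/\theta}}{\Gamma(k)\,\theta^{k}},
\qquad x > 0,\ k > 0,\ \theta > 0,
\qquad \mathbb{E}[X] = k\theta .
```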

Research paper thumbnail of Meld: A Real-Time Message Logic Debugging System for Distributed Systems

The largest difference between a distributed and a non-distributed system is that the former introduces network messages into the system. Network messages bring scalability to a distributed system, but also complexity. Testing large-scale distributed systems is a great challenge, because some errors only occur after a distributed sequence of events involving machine and network failures. Meld is a real-time message logic debugging system for distributed systems that addresses this challenge.

Research paper thumbnail of Optimizing Data Shuffling in Data-Parallel Computation by Understanding User-Defined Functions

Map/Reduce style data-parallel computation is characterized by the extensive use of user-defined functions for data processing and relies on data-shuffling stages to prepare data partitions for parallel computation. Instead of treating user-defined functions as "black boxes", we propose to analyze those functions to turn them into "gray boxes" that expose opportunities to optimize data shuffling. We identify useful functional properties for user-defined functions, and propose SUDO, an optimization framework that reasons about data-partition properties, functional properties, and data shuffling. We have assessed this optimization opportunity on over 10,000 data-parallel programs used in production SCOPE clusters, and designed a framework to incorporate it into the production system. Experiments with real SCOPE programs on real production data have shown that this optimization can save up to 47% in terms of disk and network I/O for shuffling, and up to 48% in terms of cross-pod network traffic.
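The kind of "functional property" reasoning described above can be illustrated with a small Python sketch (hypothetical record layout and function names, not SUDO's actual analysis): a map-style UDF that never rewrites the partition key lets a downstream per-key stage reuse the existing partitioning instead of reshuffling.

```python
from collections import defaultdict

def partition(records, key_fn, num_parts):
    """Data-shuffling stage: hash-partition records by key."""
    parts = defaultdict(list)
    for rec in records:
        parts[hash(key_fn(rec)) % num_parts].append(rec)
    return parts

# A user-defined map function. Because it never rewrites the 'user' field,
# it is "pass-through" on the partition key -- a functional property that
# lets an optimizer keep the existing partitioning instead of reshuffling.
def normalize_clicks(rec):
    rec = dict(rec)
    rec["clicks"] = max(rec["clicks"], 0)
    return rec

records = [{"user": "u1", "clicks": 3}, {"user": "u2", "clicks": -1}]
before = partition(records, lambda r: r["user"], num_parts=4)

after = {p: [normalize_clicks(r) for r in recs] for p, recs in before.items()}
# Sanity check of the property: every record is still in the partition its
# key hashes to, so a downstream per-user aggregation needs no extra shuffle.
assert all(hash(r["user"]) % 4 == p for p, recs in after.items() for r in recs)
```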

