Yusong Tan - Academia.edu
Papers by Yusong Tan
Abstract. One of the differences between cloud storage and earlier storage models is that there is a financial contract between the user and the cloud service provider (CSP). Users pay for the service in exchange for certain guarantees, and the cloud is a liable entity, but mechanisms are needed to enforce the liability of the CSP. Some existing work uses non-repudiation to realize this. Compared with these non-repudiation schemes, we use a third-party auditor rather than the client to manage proofs and related metadata, which are security-critical data in cloud security; this provides a more secure environment for such data. To address the large overhead in the update process of current non-repudiation schemes, we propose three schemes to improve it.
Intelligent Computing Theories and Application
2021 International Conference on Communications, Information System and Computer Engineering (CISCE)
Function as a Service (FaaS) is a cloud computing model that has become popular in recent years, characterized by automatic scaling, on-demand billing and easy maintenance. However, the SDKs provided by different public FaaS platforms are inconsistent, increasing the cost of learning and application migration for developers. This paper proposes building a unified development framework by encapsulating the SDKs of each FaaS platform, using the object-oriented factory pattern to define a set of unified abstract classes. The framework helps developers operate across different FaaS platforms with a single unified SDK, and it scales well to new platforms.
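A minimal sketch of the factory-pattern idea, assuming hypothetical class and platform names (the paper does not publish its interface): an abstract base class defines the unified operations, and a factory returns the platform-specific wrapper. Real implementations would delegate to the vendor SDKs instead of printing.

```python
from abc import ABC, abstractmethod


class FaaSClient(ABC):
    """Unified interface that each platform-specific SDK wrapper implements."""

    @abstractmethod
    def deploy(self, name: str, code_path: str) -> None: ...

    @abstractmethod
    def invoke(self, name: str, payload: dict) -> dict: ...


class AliyunFCClient(FaaSClient):
    def deploy(self, name, code_path):
        print(f"[aliyun] deploying {name} from {code_path}")  # stub for the real SDK call

    def invoke(self, name, payload):
        return {"platform": "aliyun", "function": name, "echo": payload}


class AWSLambdaClient(FaaSClient):
    def deploy(self, name, code_path):
        print(f"[aws] deploying {name} from {code_path}")  # stub for the real SDK call

    def invoke(self, name, payload):
        return {"platform": "aws", "function": name, "echo": payload}


def faas_client_factory(platform: str) -> FaaSClient:
    """Factory: hide the concrete SDK behind the unified interface."""
    clients = {"aliyun": AliyunFCClient, "aws": AWSLambdaClient}
    return clients[platform]()


client = faas_client_factory("aws")
client.deploy("hello", "./hello.zip")
print(client.invoke("hello", {"msg": "hi"}))
```

Application code depends only on `FaaSClient`, so migrating between platforms reduces to changing the factory argument.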
2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), 2018
In this paper, we study the problem of privacy-preserving SAT solving in the cloud computing paradigm. We present a novel Husk-formula-based CNF obfuscation algorithm and its corresponding solution recovery algorithm, to prevent an unauthorized third party in the cloud from obtaining sensitive information. Through obfuscation, the CNF formula is transformed into another formula with an over-approximated solution space. A solution of the original CNF can be extracted from a solution of the obfuscated CNF by a projection-based solution recovery algorithm. By customizing the Husk formula, we make the solution space of the obfuscated CNF adjustable, to control the ratio of false solutions introduced by obfuscation and satisfy performance requirements. Theoretical analysis demonstrates that, with the aid of the Cubic Husk formula, the obfuscation algorithm can not only change the structure of the CNF formula with polynomial time complexity, but also construct an adjustable over-approximated solution space. While the solution recovery algorithm...
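To make the over-approximation/projection workflow concrete, here is a hedged sketch in which the "husk" step is a generic clause relaxation with fresh variables — a stand-in for illustration, not the paper's Cubic Husk construction: relaxed clauses admit extra (false) solutions, and recovery projects an obfuscated solution onto the original variables and verifies it against the original CNF.

```python
import random

# CNF as a list of clauses; a clause is a list of nonzero ints (DIMACS-style literals).
def obfuscate(cnf, n_vars, relax_ratio=0.3, seed=0):
    """Over-approximate the solution space: append a fresh 'husk' literal
    to a random subset of clauses. Any original solution still satisfies the
    relaxed clause, and new (false) solutions may appear when the fresh
    variable is set true."""
    rng = random.Random(seed)
    fresh = n_vars
    obf = []
    for clause in cnf:
        if rng.random() < relax_ratio:
            fresh += 1
            obf.append(clause + [fresh])  # weakened clause
        else:
            obf.append(list(clause))
    return obf, fresh

def project_and_check(cnf, n_vars, assignment):
    """Recovery: project the obfuscated solution onto the original variables
    and verify it against the original CNF; reject false solutions."""
    proj = {v: assignment[v] for v in range(1, n_vars + 1)}
    ok = all(any(proj[abs(l)] == (l > 0) for l in clause) for clause in cnf)
    return proj if ok else None

# Tiny example: (x1 or x2) and (not x1 or x3)
F = [[1, 2], [-1, 3]]
obf, total_vars = obfuscate(F, n_vars=3)
candidate = {v: True for v in range(1, total_vars + 1)}  # e.g. from a cloud solver
print(project_and_check(F, 3, candidate))
```

The `relax_ratio` knob plays the role of the adjustable false-solution ratio described above: more relaxed clauses means a looser over-approximation.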
Open source represents an important way in which today's software is developed. The adoption of open source software continues to accelerate because of the compelling economic and productivity benefits it provides. It is therefore important to mitigate potential legal exposure by scanning and analyzing the source of code, but that is not the only benefit of this kind of work. As the complexity and size of software compositions grow, it is also critical to analyze the source of code to ensure code traceability and manageability, especially for software with a tremendous scale of code, such as an operating system. In this paper, we argue that it is beneficial for code auditing and code quality assurance to distinguish the sources of system packages and manage them separately. We then propose an efficient method for code source classification based on package changelog information extraction, and we design and implement KyAnalyzer, which supports source analysis and distinguished management...
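A hedged sketch of changelog-based source classification in the spirit described above; the entry format, categories, and domain list are illustrative assumptions, not KyAnalyzer's actual rules.

```python
import re

# Assumed RPM-style changelog header: "* Mon Mar 01 2021 author@domain - ver-rel"
ENTRY = re.compile(r"^\*\s+\w+\s+\w+\s+\d+\s+\d{4}\s+(?P<author>\S+@\S+)")

def classify_package(changelog_text, inhouse_domains=("example.com",)):
    """Classify a package as in-house, modified upstream, or pure upstream
    by who authored its changelog entries. `inhouse_domains` is a placeholder
    for the vendor's own domain(s)."""
    authors = [m.group("author") for line in changelog_text.splitlines()
               if (m := ENTRY.match(line))]
    if not authors:
        return "unknown"
    inhouse = [a for a in authors if a.split("@")[-1] in inhouse_domains]
    if len(inhouse) == len(authors):
        return "in-house"
    return "modified-upstream" if inhouse else "pure-upstream"

log = """\
* Mon Mar 01 2021 dev@example.com - 1.2-3
- backport security fix
* Tue Jan 05 2021 maintainer@fedoraproject.org - 1.2-2
- upstream rebase
"""
print(classify_package(log))  # modified-upstream
```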
The performance of MapReduce depends heavily on its data splitting process, which happens before the map phase and is usually done using naive methods that are far from optimal. In this paper, an Improved Input Splitting technique based on locality is presented, aimed at the input data splitting problems that seriously affect job performance. Improved Input Splitting clusters data blocks from the same node into a single partition so that they are processed by one map task. This method avoids the time spent on slot reallocation and on initializing multiple tasks. Experimental results demonstrate that this can improve MapReduce processing performance substantially compared with the traditional Hadoop implementation.
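A minimal sketch of the locality-based grouping, under the simplifying assumption that each block has a single known host (real HDFS block reports list several replica locations per block):

```python
from collections import defaultdict

def locality_splits(blocks):
    """Group blocks by the node that hosts them, so each split (and thus
    each map task) reads only local blocks. `blocks` is a list of
    (block_id, host) pairs."""
    by_host = defaultdict(list)
    for block_id, host in blocks:
        by_host[host].append(block_id)
    # One split per host; a real implementation would also cap split size.
    return [{"host": host, "blocks": ids} for host, ids in by_host.items()]

blocks = [("b1", "node-a"), ("b2", "node-b"), ("b3", "node-a"), ("b4", "node-c")]
for split in locality_splits(blocks):
    print(split)
```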
Advanced Data Mining and Applications
2016 5th International Conference on Computer Science and Network Technology (ICCSNT), 2016
Cloud data centers are becoming more and more important for tenants because cloud resources and the associated services can be shared by multiple tenants. Although resource isolation and traffic control, which are effective ways to protect tenants' QoS, have been used extensively, the topology of the network needs further study. In this paper, combining virtual networks (VN) with SDN, a topology-aware VSDN embedding approach named ToVSDN is proposed to enhance topology management. Extensive experimental evaluations have been conducted, and the results verify that ToVSDN improves embedding efficiency and obtains a higher revenue/cost ratio in comparison with related algorithms.
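The revenue/cost ratio below follows the definition commonly used in virtual network embedding papers (requested CPU plus bandwidth as revenue; allocated CPU plus bandwidth times substrate path length as cost); whether ToVSDN uses exactly these weights is not stated in the abstract.

```python
def revenue(vn):
    """Standard VNE revenue: total requested CPU plus total requested bandwidth."""
    return sum(vn["cpu"].values()) + sum(vn["bw"].values())

def cost(vn, mapping):
    """Cost: CPU actually allocated plus bandwidth times the length of the
    substrate path each virtual link is mapped onto."""
    cpu_cost = sum(vn["cpu"].values())
    bw_cost = sum(vn["bw"][link] * len(mapping[link]) for link in vn["bw"])
    return cpu_cost + bw_cost

vn = {"cpu": {"v1": 10, "v2": 20}, "bw": {("v1", "v2"): 5}}
mapping = {("v1", "v2"): ["s1", "s2", "s3"]}  # 3-hop substrate path
print(revenue(vn) / cost(vn, mapping))  # revenue/cost ratio; higher is better
```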
2019 IEEE 25th International Conference on Parallel and Distributed Systems (ICPADS), 2019
Network I/O workloads are one of the leading costs in most virtualized clouds. One way to reduce inter-virtual-machine (VM) inefficiency is to build shared memory channels between VMs co-located on the same physical node to bypass the traditional TCP/IP network stack, so that overhead is reduced through a shorter communication path and fewer kernel interactions. However, achieving high-performance inter-VM communication while preserving the capability of VM live migration remains a key challenge: most existing work is neither seamlessly agile in the presence of VM live migration nor transparent to upper-layer users and operating system kernels, which limits the applicability of current co-location-aware shared-memory approaches. In this paper, we present the design and implementation of XenVMC, an adaptive and transparent inter-VM communication system for high-performance network I/O in virtualized clouds. With the proposed dynamic co-location...
2017 14th International Symposium on Pervasive Systems, Algorithms and Networks & 2017 11th International Conference on Frontier of Computer Science and Technology & 2017 Third International Symposium of Creative Computing (ISPAN-FCST-ISCC), 2017
Software-defined networking (SDN) separates the control and forwarding planes, a powerful approach that enables network virtualization: SDN hypervisors share a physical SDN network among multiple virtual SDNs in a cloud data center. However, the centralized control and the virtual networks' shared physical network are fragile and prone to failure if the virtual topology embedding is not properly designed. This paper develops an enhanced virtual controller node mapping strategy to improve the survivability of virtual SDNs. The strategy explicitly considers the network delay and the number of disjoint paths between the controller and forwarding devices to form a survivability factor. Our proposed approach has been evaluated, and the results show that it improves the survivability and network delay of virtual SDNs while keeping the other metrics within reasonable bounds.
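A sketch of a survivability factor built from the two ingredients the abstract names, delay and disjoint-path count, using networkx; the weighted combination is an assumption, since the paper's exact formula is not given.

```python
import networkx as nx

def survivability_factor(G, controller, switch, alpha=0.5):
    """Score a controller placement for one switch. By Menger's theorem, the
    node connectivity between two nodes equals the number of node-disjoint
    paths between them, so more disjoint paths means more failures survived."""
    disjoint = nx.node_connectivity(G, controller, switch)
    delay = nx.shortest_path_length(G, controller, switch, weight="delay")
    return alpha * disjoint - (1 - alpha) * delay  # higher is better

G = nx.Graph()
G.add_weighted_edges_from(
    [("c", "s1", 1), ("c", "s2", 2), ("s1", "s3", 1), ("s2", "s3", 3)],
    weight="delay",
)
print(survivability_factor(G, "c", "s3"))
```

Ranking candidate controller nodes by this score over all forwarding devices yields the enhanced node mapping described above.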
2018 14th International Conference on Semantics, Knowledge and Grids (SKG), 2018
It is generally believed that neural network classifiers are vulnerable to misclassification attacks. An adversary generates adversarial samples by adding small perturbations to the original samples. These adversarial samples mislead classifiers, although from the perspective of a human observer they are almost the same as the original samples. However, existing misclassification attacks need either the details of the classifier or a substitute classifier to craft the adversarial samples, so one might think a black-box classifier is robust against misclassification attacks. We demonstrate that a black-box classifier is still vulnerable to our proposed misclassification attack: we conceal the details of the classifier, and the only thing the adversary can do is query samples' classification results. We propose a particle swarm optimization based misclassification attack, with which an adversary can make black-box classifiers yield erroneous results. The experiments show that Le...
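A self-contained sketch of the query-only setting: the victim exposes labels but no gradients, and particle swarm optimization searches for a small label-flipping perturbation. The toy linear victim, bounds, and fitness function are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(x):
    """Stand-in for the victim: only returns a label, never gradients.
    Here: a fixed linear decision rule on 2-D inputs."""
    return int(x @ np.array([1.0, -1.0]) > 0)

def pso_attack(x0, n_particles=30, steps=50, bound=0.5, w=0.7, c1=1.4, c2=1.4):
    """Particle swarm search for a small perturbation that flips the label.
    Fitness = perturbation size, counted only if the label changed."""
    target = black_box(x0)
    pos = rng.uniform(-bound, bound, (n_particles, x0.size))
    vel = np.zeros_like(pos)
    def fit(p):  # lower is better; inf if label not flipped
        return np.linalg.norm(p) if black_box(x0 + p) != target else np.inf
    pbest = pos.copy()
    pbest_f = np.array([fit(p) for p in pos])
    g = pbest[pbest_f.argmin()]
    for _ in range(steps):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, -bound, bound)
        f = np.array([fit(p) for p in pos])
        better = f < pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        g = pbest[pbest_f.argmin()]
    return x0 + g, black_box(x0 + g)

x0 = np.array([0.3, -0.2])  # classified as 1 by the victim
adv, label = pso_attack(x0)
print(x0, "->", adv, "new label:", label)
```

Because the fitness function needs only classification outcomes, the swarm never touches model internals, which is exactly what makes the attack applicable to black-box classifiers.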
Open source represents an important way in which today's software is developed. The adoption of open source software continues to accelerate because of the great potential it offers, such as productivity improvement, cost savings and quicker innovation. As the complexity and size of software compositions grow, it becomes difficult to effectively scan and track code sources, especially for software with a tremendous scale of code, such as operating systems. So far, existing work on open source components has mainly focused on mitigating potential license non-compliance, reducing potential security risks introduced by open source vulnerabilities, and detecting and matching open source components in the code. To ensure code traceability and manageability for a large-scale mixed-source operating system, we believe it is beneficial to automatically distinguish the sources of system code at the granularity of software packages and manage them separately. However, according to the literature...
Stereo matching is an important research field in computer vision. Due to the dimensionality of cost aggregation, current neural-network-based stereo methods find it difficult to trade off speed against accuracy. To this end, we integrate fast 2D stereo methods with accurate 3D networks to improve performance and reduce running time. We leverage a 2D encoder-decoder network to generate a rough disparity map and construct a disparity range to guide the 3D aggregation network, which significantly improves accuracy and reduces computational cost. We use a stacked hourglass structure to refine the disparity from coarse to fine. We evaluated our method on three public datasets. According to the KITTI official website results, our network can generate an accurate result in 80 ms on a modern GPU. Compared with other 2D stereo networks (AANet, DeepPruner, FADNet, etc.), our network achieves a large improvement in accuracy. Meanwhile, it is significantly faster than other 3D stereo networks (5× faster than PSM...
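A sketch of the range-guidance step: the coarse disparity map from the 2D network defines a per-pixel window that the 3D aggregation network searches, shrinking the cost volume. The window shape and radius are assumptions; the paper's exact range construction is not given in the abstract.

```python
import numpy as np

def guided_disparity_range(coarse_disp, radius=4, d_max=192):
    """From a coarse disparity map, build a per-pixel search window
    [d - radius, d + radius] for the 3D aggregation network. This shrinks
    the cost volume from d_max disparity levels to 2*radius + 1 levels."""
    lo = np.clip(np.rint(coarse_disp) - radius, 0, d_max - 1)
    hi = np.clip(np.rint(coarse_disp) + radius, 0, d_max - 1)
    return lo.astype(int), hi.astype(int)

coarse = np.array([[10.3, 55.8], [120.1, 190.7]])
lo, hi = guided_disparity_range(coarse)
print(lo, hi, sep="\n")
# Each pixel's 3D aggregation now evaluates only 9 candidate disparities
# instead of 192, which is where the speedup comes from.
```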
Stereo matching has attracted much attention in recent years. Traditional methods can quickly generate a disparity result, but their accuracy is low. Conversely, methods based on neural networks achieve high accuracy but struggle to reach real-time speed. Therefore, this paper presents MCDRNet, which combines traditional methods with neural networks to achieve real-time and accurate stereo matching. Concretely, our network first generates a rough disparity map based on the traditional ADCensus algorithm. Then we design a novel Multi-Scale Cascade Network to refine the disparity map from coarse to fine. We evaluated our best-trained model on the KITTI official website. The results show that our network is much faster than most current top-performing methods (31× faster than CSPN, 56× faster than GANet, etc.). Meanwhile, it is more accurate than traditional stereo methods (SGM, SPS-St) and other fast 2D convolution networks (Fast DS-CS, DispNetC, etc.), demonstrating...
In this paper, we propose a novel white-box attack against word-level CNN text classifiers. On the one hand, we use a metric combining Euclidean distance and cosine distance to find the most semantically similar substitution when generating perturbations, which effectively increases the attack success rate; we raise the global search success rate from 75.8% to 85.8%. On the other hand, we control the dispersion of the locations of the modified words in the adversarial examples by introducing a coefficient of variation (CV) factor, because greedy search sometimes yields poor readability when the modified positions in an adversarial example are close together. More dispersed modifications increase human imperceptibility and text readability. We use the attack success rate to evaluate the validity of the attack method, and use the CV value to measure the dispersion degree of the modified words in the generated adversarial examples. Finally, we use the combination of these two methods, which ...
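A minimal sketch of the combined substitution metric over word embeddings; the equal weighting between the two distances is an assumption, as the abstract does not give the exact combination.

```python
import numpy as np

def combined_distance(u, v, lam=0.5):
    """Weighted sum of Euclidean distance and cosine distance between two
    word embeddings; smaller means more semantically similar."""
    euc = np.linalg.norm(u - v)
    cos = 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return lam * euc + (1 - lam) * cos

def best_substitute(word_vec, candidates):
    """Pick the candidate word whose embedding is closest to the original."""
    return min(candidates, key=lambda c: combined_distance(word_vec, candidates[c]))

emb = {"good": np.array([0.9, 0.1]), "great": np.array([0.85, 0.2]),
       "bad": np.array([-0.8, 0.1])}
print(best_substitute(emb["good"], {k: v for k, v in emb.items() if k != "good"}))
```

The CV factor mentioned above would then score a finished adversarial example by the standard deviation of the modified-word positions divided by their mean, rewarding more dispersed edits.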
2021 IEEE International Conference on Robotics and Automation (ICRA)
Algorithms and Architectures for Parallel Processing
More and more application workflows are computed in the cloud, and most of them can be expressed as a Directed Acyclic Graph (DAG). As cloud resource providers, when facing requests that exceed the available compute resources, they should guarantee that as many DAGs as possible are accomplished within their deadlines. In this paper, we define the urgency of a DAG and introduce the MTMD (Maximize Throughput of Multi-DAG with Deadline) algorithm to improve the ratio of DAGs that can be accomplished within their deadline. The urgency of a DAG changes during execution and determines the execution order of tasks. The algorithm can detect DAGs that will exceed their deadline and abandon them in a timely manner. Based on the MTMD algorithm, we put forward the CFS (Cost Fairness Scheduling) algorithm to reduce the unfairness of cost between different DAGs. The simulation results show that the MTMD algorithm outperforms three other algorithms, and that the CFS algorithm reduces the cost of all DAGs by 12.1% on average and reduces the unfairness among DAGs by 54.5% on average.
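A hedged sketch of a DAG urgency measure consistent with the description above (urgency changes as execution progresses, and over-urgent DAGs are abandoned); the concrete formula is an assumption, since the paper's definition is not shown.

```python
import time

def urgency(remaining_critical_path, deadline, now=None):
    """Ratio of the remaining critical-path execution time to the time left
    before the deadline. Values above 1 mean the DAG can no longer meet its
    deadline and, following the MTMD idea, could be abandoned to free
    resources for other DAGs."""
    now = time.time() if now is None else now
    slack = deadline - now
    if slack <= 0:
        return float("inf")
    return remaining_critical_path / slack

now = 1_000.0
print(urgency(30.0, deadline=now + 60.0, now=now))  # 0.5: comfortable
print(urgency(90.0, deadline=now + 60.0, now=now))  # 1.5: will miss, abandon
```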
Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery
Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery
The ICD-9 terminology standardization task aims to standardize the colloquial terminology recorded by doctors in medical records into the standard terminology defined in the ninth version of the International Classification of Diseases (ICD-9). In this paper, we first propose a BERT and Text Similarity Based Method (BTSBM) that combines a BERT classification model with a text similarity calculation algorithm: 1) use the N-gram algorithm to generate a Candidate Standard Terminology Set (CSTS) for each colloquial terminology, which is used as the training and test dataset for the next step; 2) use the BERT classification model to pick out the correct standard terminology. In this BTSBM method, if a larger-scale CSTS is taken as the test dataset, the training dataset also needs to be kept at a larger scale. However, there is only one positive sample in each CSTS, so expanding the scale causes a serious imbalance between positive and negative samples, which significantly degrades system performance. If instead we keep the test dataset relatively small, the CSTS Accuracy (CSTSA) degrades significantly, which results in a very low ceiling on system performance. To address these problems, we then propose an optimized terminology standardization method, called the Advanced BERT and Text Similarity Based Method (ABTSBM), which 1) uses a large-scale initial CSTS to maintain a high CSTSA and thus a high system performance ceiling, 2) denoises the CSTS based on body structure to alleviate the imbalance of positive and negative samples without reducing the CSTSA, and 3) introduces the focal loss function to further balance positive and negative samples. Experiments show that the precision of the ABTSBM method reaches 83.5%, which is 0.6% higher than BTSBM, while the computation cost of ABTSBM is 26.7% lower than that of BTSBM.
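A sketch of CSTS generation via n-gram overlap; the character-level granularity, Jaccard scoring, and candidate count are assumptions, since the abstract only names the N-gram algorithm.

```python
def char_ngrams(text, n=2):
    """Character n-grams; clinical terms are short phrases, so character
    bigrams are a reasonable unit (an assumption)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def candidate_set(colloquial, standard_terms, k=5, n=2):
    """Build a Candidate Standard Terminology Set (CSTS): rank standard
    ICD-9 terms by n-gram Jaccard overlap with the colloquial term and keep
    the top k. The BERT classifier then picks the correct term from this set."""
    q = char_ngrams(colloquial, n)
    scored = sorted(
        standard_terms,
        key=lambda t: -len(q & char_ngrams(t, n)) / (len(q | char_ngrams(t, n)) or 1),
    )
    return scored[:k]

standards = ["acute appendicitis", "chronic appendicitis", "acute bronchitis"]
print(candidate_set("acute appendix inflammation", standards, k=2))
```

The positive/negative imbalance discussed above is visible here: each CSTS of size k contains at most one correct term and k-1 negatives.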
Algorithms and Architectures for Parallel Processing
With the development of cloud computing, the problem of scheduling workflows in cloud systems attracts a large amount of attention. In general, the cloud workflow scheduling problem requires considering a variety of optimization objectives under several constraints. Traditional workflow scheduling methods focus on a single optimization goal, such as makespan, and a single constraint, such as deadline or budget. In this paper, we first give a unified formalization of the optimality problem of multi-constraint and multi-objective cloud workflow scheduling using Pareto optimality theory. We also present a two-constraint, two-objective case study, considering deadline and budget constraints with energy consumption and reliability objectives. A general list scheduling algorithm and a tuning mechanism are designed to solve this problem. Extensive experiments confirm the efficiency of the unified multi-constraint and multi-objective cloud workflow scheduling system.
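A small sketch of the Pareto machinery the formalization rests on: dominance between objective vectors and the resulting non-dominated front. The objective pairing (energy, 1 - reliability), both minimized, is an illustrative choice, not the paper's exact formalization.

```python
def dominates(a, b):
    """Pareto dominance for minimization objectives: schedule `a` dominates
    `b` if it is no worse in every objective and strictly better in at
    least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(schedules):
    """Keep only the schedules not dominated by any other (all assumed to
    already satisfy the deadline and budget constraints)."""
    return [s for s in schedules
            if not any(dominates(t, s) for t in schedules if t is not s)]

# Objective vectors: (energy consumption, 1 - reliability), lower is better.
candidates = [(120.0, 0.02), (100.0, 0.05), (125.0, 0.03), (90.0, 0.08)]
print(pareto_front(candidates))  # (125.0, 0.03) is dominated by (120.0, 0.02)
```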