Shuaiqiang Wang - Academia.edu

Papers by Shuaiqiang Wang

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining

As the heart of a search engine, the ranking system plays a crucial role in satisfying users' information demands. More recently, neural rankers fine-tuned from pre-trained language models (PLMs) have established state-of-the-art ranking effectiveness. However, it is nontrivial to directly apply these PLM-based rankers to a large-scale web search system because of the following challenges: (1) the prohibitively expensive computation of massive neural PLMs, especially for the long texts of web documents, precludes their deployment in an online ranking system that demands extremely low latency; (2) the discrepancy between existing ranking-agnostic pre-training objectives and ad-hoc retrieval scenarios, which demand comprehensive relevance modeling, is another major barrier to improving the online ranking system; (3) a real-world search engine typically involves a committee of ranking components, so the compatibility of an individually fine-tuned ranking model is critical for a cooperative ranking system. In this work, we contribute a series of successfully applied techniques for tackling these issues when deploying the state-of-the-art Chinese pre-trained language model, ERNIE, in an online search engine system. We first articulate a novel practice to cost-efficiently summarize a web document and contextualize the resulting summary with the query using a cheap yet powerful Pyramid-ERNIE architecture. We then introduce an innovative paradigm to exploit large-scale, noisy, and biased post-click behavioral data for relevance-oriented pre-training. We also propose a human-anchored fine-tuning strategy tailored to the online ranking system, aiming to stabilize the ranking signals across various online components.
Extensive offline and online experimental results show that the proposed techniques significantly boost the search engine's performance.

Proceedings of the 12th International Conference on Web Information Systems and Technologies, 2016

Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing (SNPD 2007), 2007

Since it is too difficult to develop a feasible tool that executes stepwise refinement automatically, the application of formal methods has mainly been limited to safety-critical domains. With the development of the theory and practice of modeling that integrates UML and formal methods, formal methods usually play the role of representing behavior models. Thanks to the information provided by the architecture models, such as concrete data structures, limit conditions, invariants and so on, the automatic refinement tools ...

Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11, 2011

With an increasing amount of information in web forums, quickly comprehending forum threads has become a challenging research problem. To address this issue, this paper investigates the task of Web Forum Thread Summarization (WFTS), which aims to give a brief statement of each thread, where a thread may involve multiple dynamic topics. When applied to WFTS, traditional summarization methods are hampered by topic dependencies, topic drift, and text sparseness. We therefore explore an unsupervised topic propagation model, the Post Propagation Model (PPM), which overcomes these problems by simultaneously modeling the semantics and the reply relationships within each thread. In PPM, each post is considered a mixture of topics, and a product of Dirichlet distributions from the previous posts is employed to model topic dependencies during the asynchronous discussion. Based on this model, WFTS is accomplished by extracting the most significant sentences in a thread. Experimental results on two different forum data sets show that WFTS based on PPM outperforms several state-of-the-art summarization methods in terms of ROUGE metrics.
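The abstract does not give PPM's exact parameterization. The general identity below, written in our own notation, shows why a product of Dirichlet distributions is a convenient way to pool topic evidence from earlier posts: up to normalization, the product is again a Dirichlet whose parameters add. Here $\theta$ denotes a post's topic proportions over $K$ topics and $\boldsymbol{\alpha}^{(j)}$ the Dirichlet parameter associated with the $j$-th of $J$ previous posts; both symbols are assumptions for illustration.

```latex
\prod_{j=1}^{J} \mathrm{Dir}\big(\theta \mid \boldsymbol{\alpha}^{(j)}\big)
  \;\propto\; \prod_{j=1}^{J} \prod_{k=1}^{K} \theta_k^{\alpha^{(j)}_k - 1}
  \;=\; \prod_{k=1}^{K} \theta_k^{\left(\sum_{j=1}^{J} \alpha^{(j)}_k - J + 1\right) - 1}
  \;\propto\; \mathrm{Dir}\Big(\theta \;\Big|\; \sum_{j=1}^{J} \boldsymbol{\alpha}^{(j)} - (J-1)\,\mathbf{1}\Big)
```

The result is a proper Dirichlet only when every component of the combined parameter is positive, i.e. $\sum_{j} \alpha^{(j)}_k > J - 1$ for all $k$.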

Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing (SNPD 2007), 2007

Model transformation is touted to play a key role in model-driven development. The mapping relations between models are the foundation for the transformation. On the basis of software architecture, this paper tries to provide a precise semantics for both component structuring and model mapping by using category theory. Morphism composition is used to trace the interconnections and mapping relations between component-based models, while consistency between the sorts/operations of component ...

Proceedings of the 19th ACM international conference on Information and knowledge management - CIKM '10, 2010

One of the central issues in learning to rank for information retrieval is to develop algorithms that construct ranking models by directly optimizing evaluation measures used in information retrieval, such as Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG). Several such algorithms, including SVMmap and AdaRank, have been proposed and their effectiveness has been verified. However, the relationships between the algorithms are not clear, and furthermore no comparisons have been conducted between them. In this paper, we conduct a study on the approach of directly optimizing evaluation measures in learning to rank for Information Retrieval (IR). We focus on methods that minimize loss functions upper bounding the basic loss function defined on the IR measures. We first provide a general framework for the study and analyze the existing algorithms SVMmap and AdaRank within the framework. The framework is based on upper bound analysis, and two types of upper bounds are discussed. Moreover, we show that new algorithms can be derived on the basis of this analysis, and we create one example algorithm called PermuRank. We have also conducted comparisons between SVMmap, AdaRank, PermuRank, and the conventional methods Ranking SVM and RankBoost, using benchmark datasets. Experimental results show that the methods based on direct optimization of evaluation measures can always outperform the conventional methods Ranking SVM and RankBoost. However, no significant difference exists among the performances of the direct optimization methods themselves.
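For readers unfamiliar with the two measures named above, the following minimal Python sketch computes MAP and NDCG@k for ranked lists of relevance labels. It uses the common exponential-gain, log2-discount form of DCG and assumes every relevant document appears in the ranked list; other variants exist, and the function names are ours, not from the paper.

```python
import math
from typing import Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """AP of one ranked list of binary labels (1 = relevant), in rank order."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant position
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]]) -> float:
    """MAP: the mean of per-query average precision."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


def dcg_at_k(gains: Sequence[int], k: int) -> float:
    """DCG@k with gain (2^g - 1) and discount log2(rank + 1)."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(list(gains)[:k], start=1))


def ndcg_at_k(gains: Sequence[int], k: int) -> float:
    """DCG@k normalized by the DCG@k of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```

In the direct-optimization setting, the basic per-query loss is typically one minus such a measure (for example 1 - NDCG). Because the measure depends on sorted positions, that loss is discontinuous in the model's scores, which is why methods such as SVMmap, AdaRank, and PermuRank minimize upper bounds of it instead.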

2009 International Conference on Communication Software and Networks, 2009

User interface modeling methods based on design patterns, and code generation methods, have been hot topics in software engineering research in recent years. However, there is still no unified standard description. As a result, using nested components of multiple interface design patterns to construct complex user interfaces is poorly supported, and the need to display hierarchically structured content in a limited area cannot be met. To enhance the ability of interface design patterns to support complex interface descriptions, this paper proposes a complex interface modeling method. By abstracting basic interface elements, a standardized description of interface design patterns is realized; complex interface modeling and automatic target code generation are then realized by customizing the basic interface elements. Application research shows that this method can greatly support complex interface design and realization, and improve the efficiency of user interface development.

Lecture Notes in Computer Science, 2010

Web spam techniques enable some web pages or sites to achieve undeserved relevance and importance. They can seriously degrade search engine ranking results, and combating web spam has become one of the top challenges for web search. This paper proposes to learn a discriminating function to detect web spam by genetic programming. The evolutionary computation uses multiple populations composed of small-scale individuals and combines the selected best individuals from every population to gain a possible best ...

Information Retrieval Journal, 2015

Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12, 2012

Importance weighted active learning (IWAL) introduces a weighting scheme that measures the importance of each instance in order to correct the sampling bias between the probability distributions of the training and test datasets. However, the weighting scheme of IWAL involves the distribution of the test data, which can be straightforwardly estimated in active learning by interactively querying users for labels of selected test instances, but is difficult to estimate in conventional learning, where there is no interaction with users (referred to as passive learning). In this paper, we investigate the insufficient sampling bias problem, i.e., bias that occurs only because of insufficient samples while the sampling process itself is unbiased. To this end, we present two assumptions on the sampling bias, based on which we propose a practical weighting scheme for the empirical loss function in conventional passive learning, and present IWPL, an importance weighted passive learning framework. Furthermore, we provide IWSVM, an importance weighted SVM, for validation. Extensive experiments demonstrate significant advantages of IWSVM on benchmark and synthetic datasets.
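IWPL's actual weighting scheme follows from the paper's two assumptions and is not reproduced here. The minimal sketch below only shows where per-instance importance weights enter an SVM's empirical hinge loss, trained by plain subgradient descent; the weights are supplied by the caller, and all function names and hyperparameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def _score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def weighted_hinge_risk(w: Sequence[float], b: float,
                        X: Sequence[Sequence[float]], y: Sequence[int],
                        weights: Sequence[float], lam: float = 0.0) -> float:
    """(1/n) * sum_i c_i * max(0, 1 - y_i (w.x_i + b)) + (lam/2) * ||w||^2."""
    total = sum(c * max(0.0, 1.0 - yi * _score(w, b, xi))
                for xi, yi, c in zip(X, y, weights))
    return total / len(X) + 0.5 * lam * sum(wj * wj for wj in w)


def train_weighted_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                              weights: Sequence[float], lam: float = 0.01,
                              lr: float = 0.1, epochs: int = 200
                              ) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on the weighted hinge risk above."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 regularizer
        gb = 0.0
        for xi, yi, c in zip(X, y, weights):
            if yi * _score(w, b, xi) < 1.0:  # hinge active: subgradient -c*y*x
                for j in range(d):
                    gw[j] -= c * yi * xi[j] / n
                gb -= c * yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b
```

Setting every weight to 1 recovers the ordinary soft-margin objective; raising an instance's weight makes its margin violations cost proportionally more, which is how a reweighting scheme can counteract sampling bias.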

2015 IEEE Twelfth International Symposium on Autonomous Decentralized Systems, 2015

International Conference on Next Generation Web Services Practices, 2006

It is necessary to guarantee the validity of Web services and their composition. Traditional approaches transform the BPEL4WS specification into other formal models and then check them. Unfortunately, if no proper composition of services can be found to fulfil the request and parts of the services have to be developed in-house, the models used for verification are almost useless in the other development steps. The B-method is a state model-based formal specification notation that has strong structuring mechanisms and good tool ...

IEEE Transactions on Knowledge and Data Engineering, 2015

We propose CCRank, the first parallel framework for evolutionary algorithm (EA) based learning to rank, aiming to significantly improve learning efficiency while maintaining accuracy. CCRank is based on cooperative coevolution (CC), a divide-and-conquer framework that has shown high promise in function optimization for problems with large search spaces and complex structures. Moreover, CC naturally allows the sub-solutions to the decomposed sub-problems to be computed in parallel, which can substantially boost learning efficiency. With CCRank, we investigate parallel CC in the context of learning to rank. We implement CCRank with three EA-based learning to rank algorithms for demonstration. Extensive experiments on benchmarks, in comparison with state-of-the-art algorithms, show the gains of CCRank in both efficiency and accuracy.
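The abstract does not describe how CCRank decomposes a ranking model or which three EAs it embeds. The sketch below only illustrates the generic cooperative-coevolution loop on a toy function: split the variables into groups, evolve each group with a simple (1+1) evolution strategy while the other variables are held at a shared context vector, then merge the groups. The sphere objective, the interleaved grouping rule, and every parameter value are our assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Objective = Callable[[Sequence[float]], float]


def sphere(x: Sequence[float]) -> float:
    """Toy objective to minimize; its optimum is 0 at the origin."""
    return sum(v * v for v in x)


def evolve_group(f: Objective, context: Vector, idx: List[int], steps: int,
                 sigma: float, rng: random.Random) -> Vector:
    """(1+1)-ES on the coordinates in idx; all other coordinates stay at context."""
    best = list(context)
    best_fit = f(best)
    for _ in range(steps):
        cand = list(best)
        for i in idx:
            cand[i] += rng.gauss(0.0, sigma)  # mutate only this group's variables
        fit = f(cand)
        if fit <= best_fit:  # elitist acceptance
            best, best_fit = cand, fit
    return [best[i] for i in idx]


def cooperative_coevolution(f: Objective, dim: int, n_groups: int,
                            cycles: int = 50, steps: int = 20,
                            sigma: float = 0.3, seed: int = 0
                            ) -> Tuple[Vector, float]:
    rng = random.Random(seed)
    # Decompose the problem: group g owns coordinates g, g + n_groups, ...
    groups = [list(range(g, dim, n_groups)) for g in range(n_groups)]
    # One RNG per group so each sub-problem is self-contained.
    group_rngs = [random.Random(rng.randrange(2 ** 32)) for _ in groups]
    context = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    best_fit = f(context)
    for _ in range(cycles):
        snapshot = list(context)
        # Every group reads only the frozen snapshot, so these calls are
        # independent of each other and could run on separate workers.
        results = [evolve_group(f, snapshot, idx, steps, sigma, r)
                   for idx, r in zip(groups, group_rngs)]
        merged = list(snapshot)
        for idx, values in zip(groups, results):
            for i, v in zip(idx, values):
                merged[i] = v
        merged_fit = f(merged)
        if merged_fit <= best_fit:  # keep the merge only if it does not hurt
            context, best_fit = merged, merged_fit
    return context, best_fit
```

This synchronous variant, where all groups work from the same snapshot in a cycle, is one way to expose the parallelism the abstract mentions; on non-separable objectives, merging independently improved groups can occasionally be worse, which is why the merge is checked before it is kept.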

Recently, ranking-oriented collaborative filtering (CF) algorithms have achieved great success in recommender systems. They obtain state-of-the-art performance by estimating a preference ranking of items for each user rather than estimating absolute ratings on unrated items (as conventional rating-oriented CF algorithms do). In this paper, we propose a new ranking-oriented CF algorithm called ListCF. Following the memory-based CF framework, ListCF directly predicts a total order of items for each user based on similar users' probability distributions over permutations of the items, and thus differs from previous ranking-oriented memory-based CF algorithms that focus on predicting pairwise preferences between items. One important advantage of ListCF is its ability to reduce the computational complexity of the training and prediction procedures while achieving the same or better ranking performance compared with previous ranking-oriented memory-based CF algorithms. Extensive experiments on three benchmark datasets against several state-of-the-art baselines demonstrate the effectiveness of our proposal.
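The abstract does not specify how ListCF parameterizes its distributions over permutations. A common choice in listwise ranking is the Plackett-Luce model; the sketch below assumes its top-one marginal (a softmax over a user's ratings) and compares two users on their co-rated items with a KL-divergence-based similarity. Both the model choice and the mapping exp(-KL) to a similarity are our assumptions, not details confirmed by the paper.

```python
import math
from typing import Dict, Mapping


def top_one_probabilities(ratings: Mapping[str, float]) -> Dict[str, float]:
    """Plackett-Luce top-one probability of each item: exp(r_i) / sum_k exp(r_k)."""
    m = max(ratings.values())  # subtract the max for numerical stability
    exps = {item: math.exp(r - m) for item, r in ratings.items()}
    z = sum(exps.values())
    return {item: e / z for item, e in exps.items()}


def kl_divergence(p: Mapping[str, float], q: Mapping[str, float]) -> float:
    """KL(p || q) over the same item set; q is strictly positive (softmax)."""
    return sum(pi * math.log(pi / q[k]) for k, pi in p.items() if pi > 0)


def user_similarity(ratings_u: Mapping[str, float],
                    ratings_v: Mapping[str, float]) -> float:
    """Similarity in (0, 1] on co-rated items: exp(-KL); 0.0 if < 2 common items."""
    common = ratings_u.keys() & ratings_v.keys()
    if len(common) < 2:
        return 0.0  # a single item induces no ranking to compare
    p = top_one_probabilities({i: ratings_u[i] for i in common})
    q = top_one_probabilities({i: ratings_v[i] for i in common})
    return math.exp(-kl_divergence(p, q))
```

Because the softmax depends only on rating differences, two users whose ratings differ by a constant offset receive similarity 1, which matches the ranking-oriented intuition of comparing orders rather than absolute scores.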

Journal of the Association for Information Science and Technology

Query classification is an important part of exploring the characteristics of web queries. Existing studies are mainly based on Broder's classification scheme and classify user queries into navigational, informational, and transactional categories according to users' information needs. In this article, we present a novel classification scheme from the perspective of queries' temporal patterns. A query's temporal pattern is the inherent time series pattern of its search volume, which reflects how the query's popularity evolves over time. By analyzing the temporal patterns of queries, search engines can understand users' search intents more deeply and thus improve performance. Furthermore, we extract three groups of features from the queries' search volume time series and use a support vector machine (SVM) to automatically detect the temporal patterns of user queries. Extensive experiments on the Million Query Track data sets of the Text REtrieval Conference (TREC) demonstrate the effectiveness of our approach.
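The abstract does not list the three feature groups. Purely to illustrate how a search-volume time series can be turned into a fixed-length feature vector for an SVM, the sketch computes three hypothetical features: the coefficient of variation (overall volatility), the peak-to-mean ratio (burstiness), and the autocorrelation at an assumed period of 7 days (periodicity).

```python
import math
from typing import Dict, Sequence


def autocorrelation(x: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of x at the given lag (0.0 for a constant series)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var


def temporal_features(volume: Sequence[float], period: int = 7) -> Dict[str, float]:
    """Hypothetical features of a query's search-volume series."""
    n = len(volume)
    mean = sum(volume) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in volume) / n)
    return {
        "cv": std / mean if mean else 0.0,                  # volatility
        "peak_ratio": max(volume) / mean if mean else 0.0,  # burstiness
        "acf_period": autocorrelation(volume, period),      # periodicity
    }
```

The resulting dictionaries could be stacked into a feature matrix and passed to any SVM implementation, such as scikit-learn's `SVC`, with the temporal-pattern classes as labels.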

ACM Transactions on Intelligent Systems and Technology

Collaborative filtering (CF) is an effective technique for addressing the information overload problem. CF approaches generally fall into two categories: rating-based and ranking-based. The former makes recommendations based on the historical rating scores of items, and the latter based on their rankings. Ranking-based CF has demonstrated advantages in recommendation accuracy, as it can capture the preference similarity between users even if their rating scores differ significantly. In this study, we propose VSRank, a novel framework that seeks to improve the accuracy of ranking-based CF by adapting the vector space model. In VSRank, we consider each user as a document and his or her pairwise relative preferences as terms. We then use a novel degree-specialty weighting scheme resembling TF-IDF to weight the terms. Extensive experiments on benchmarks, in comparison with state-of-the-art approaches, demonstrate the promise of our approach.
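VSRank's degree-specialty weighting is only described as resembling TF-IDF. The sketch below substitutes plain TF-IDF to show the vector-space mapping itself: each user becomes a document, each pairwise preference "a over b" becomes a term, the term frequency is taken to be the rating gap (our assumption), IDF is smoothed as log(1 + N/df), and users are compared by cosine similarity. All names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Mapping, Tuple

Term = Tuple[str, str]  # (preferred_item, less_preferred_item)


def preference_terms(ratings: Mapping[str, float]) -> Dict[Term, float]:
    """One user's pairwise preferences as terms; tf = rating gap, ties dropped."""
    terms: Dict[Term, float] = {}
    for a, b in combinations(sorted(ratings), 2):
        gap = ratings[a] - ratings[b]
        if gap > 0:
            terms[(a, b)] = gap
        elif gap < 0:
            terms[(b, a)] = -gap
    return terms


def tfidf_vectors(users: Mapping[str, Mapping[str, float]]) -> Dict[str, Dict[Term, float]]:
    """Weight every user's preference terms by tf * smoothed idf."""
    docs = {u: preference_terms(r) for u, r in users.items()}
    n = len(docs)
    df: Dict[Term, int] = {}
    for terms in docs.values():
        for t in terms:
            df[t] = df.get(t, 0) + 1
    return {u: {t: tf * math.log(1 + n / df[t]) for t, tf in terms.items()}
            for u, terms in docs.items()}


def cosine(p: Mapping[Term, float], q: Mapping[Term, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * q.get(t, 0.0) for t, w in p.items())
    norm = (math.sqrt(sum(w * w for w in p.values()))
            * math.sqrt(sum(w * w for w in q.values())))
    return dot / norm if norm else 0.0
```

With ratings u1 = {a: 5, b: 3, c: 1} and u2 = {a: 4, b: 3, c: 2}, the two users rate on different scales but share the same order, so their preference vectors are parallel and their cosine is 1; a user with the reversed order shares no terms with u1 and gets cosine 0.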

Journal of the Association for Information Science and Technology, 2014

Automatic image annotation plays a critical role in modern keyword-based image retrieval systems. For this task, the nearest-neighbor-based scheme works in two phases: first, it finds the most similar neighbors of a new image from the set of labeled images; then, it propagates the keywords associated with the neighbors to the new image. In this article, we propose a novel approach to image annotation that simultaneously improves both phases of the nearest-neighbor-based scheme. In the neighbor search phase, unlike existing work that discovers the nearest neighbors using a predicted distance, we introduce a ranking-oriented neighbor search mechanism (RNSM), in which the ordering of labeled images is optimized directly without the intermediate step of distance prediction. In the keyword propagation phase, unlike existing work that uses simple heuristic rules to select the propagated keywords, we present a learning-based keyword propagation strategy (LKPS), in which a scoring function is learned to evaluate the relevance of keywords based on their multiple relations with the nearest neighbors. Extensive experiments on the Corel 5K data set and the MIR Flickr data set demonstrate the effectiveness of our approach.
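For contrast with LKPS, here is the kind of simple heuristic propagation rule the abstract says existing work relies on: each retrieved neighbor votes for its keywords with a weight equal to its similarity, and the top-scoring keywords are assigned to the new image. The neighbor list is assumed to come from a prior search step, and neither RNSM nor LKPS is reproduced; the function name is hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Set, Tuple


def propagate_keywords(neighbors: Sequence[Tuple[float, Set[str]]],
                       top_k: int = 5) -> List[str]:
    """neighbors: (similarity, keywords) pairs for retrieved labeled images.

    Each neighbor adds its similarity to the score of every keyword it carries;
    the top_k keywords by total score are returned (ties broken alphabetically).
    """
    scores: DefaultDict[str, float] = defaultdict(float)
    for sim, keywords in neighbors:
        for kw in keywords:
            scores[kw] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [kw for kw, _ in ranked[:top_k]]
```

A learned strategy such as LKPS replaces this fixed voting rule with a scoring function over richer keyword-neighbor relations, which is the improvement the abstract describes for the second phase.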

Journal of the Association for Information Science and Technology, 2015

ACM Transactions on Intelligent Systems and Technology, 2014
