Qiang Qu | Peking University

Papers by Qiang Qu

Database Systems for Advanced Applications

Attributed networks are used to model various networks, such as social networks, knowledge graphs, and protein-protein interactions. Such networks are associated with rich attributes such as spatial locations (e.g., check-ins from social network users and positions of proteins). Community search in attributed networks has been studied intensively in recent years owing to its wide applications in recommendation, marketing, biology, etc. In this paper, we study the problem of searching for the most cohesive co-located community (\(\textsc{MC}^{3}\)), which returns communities that satisfy two properties: (i) structural cohesiveness: members of the community are connected most intensively; (ii) spatial co-location: members are close to each other. The problem can be applied to social network user behavior analysis, recommendation, disease prediction, etc. We first propose an index structure called \(\textsc{D}k\textsc{Q-tree}\) that integrates spatial information and local structure information to accelerate query processing. Then, based on this index structure, we develop two efficient algorithms. Extensive experiments on both real and synthetic datasets demonstrate the efficiency and effectiveness of the proposed methods.
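As a rough illustration of the two properties above, the sketch below combines a k-core (a common cohesiveness measure in community search) with a simple radius filter around a query vertex. It is an assumption-laden baseline, not \(\textsc{MC}^{3}\) or the \(\textsc{D}k\textsc{Q-tree}\) index: the adjacency-dict format, Euclidean coordinates, and the names `k_core` and `colocated_k_core_community` are hypothetical.

```python
from collections import deque
from math import dist


def k_core(adj, k):
    """Vertices of the k-core of an undirected graph {v: set(neighbors)}."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    queue = deque(v for v, d in degree.items() if d < k)
    removed = set(queue)
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1  # v is peeled away, so u loses one neighbor
                if degree[u] < k:
                    removed.add(u)
                    queue.append(u)
    return set(adj) - removed


def colocated_k_core_community(adj, loc, q, k, radius):
    """Connected k-core containing q, restricted to vertices within `radius` of q."""
    near = {v for v in adj if dist(loc[v], loc[q]) <= radius}  # spatial co-location
    sub = {v: adj[v] & near for v in near}                       # induced subgraph
    core = k_core(sub, k)                                        # structural cohesiveness
    if q not in core:
        return set()
    comp, frontier = {q}, deque([q])                             # BFS inside the core
    while frontier:
        v = frontier.popleft()
        for u in sub[v] & core:
            if u not in comp:
                comp.add(u)
                frontier.append(u)
    return comp


# Toy example: nodes 0-3 sit close together, node 4 is far away but linked to all of them.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (4, 0), (4, 1), (4, 2), (4, 3)]
adj = {v: set() for v in range(5)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
loc = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1), 4: (10, 10)}
print(colocated_k_core_community(adj, loc, q=0, k=3, radius=2.0))  # {0, 1, 2, 3}
```

In this toy graph the whole network is a 5-clique, so node 4 would survive even a 4-core on structure alone; only the spatial filter excludes it.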

Blockchain: Research and Applications

Proceedings of the 28th ACM International Conference on Information and Knowledge Management

Various networks have rich attributes such as texts (e.g., tweets) and locations (e.g., check-ins). Community search in such attributed networks has been studied intensively in recent years owing to its wide applications in recommendation, marketing, biology, etc. In this paper, we study the problem of searching for the Best Co-located Community (BCC) in attributed networks, which returns a community that satisfies the following properties: (i) structural cohesiveness: members of the community are densely connected; (ii) spatial co-location: members are close to each other; and (iii) quality optimality: the community has the best quality in terms of given attributes. The problem can be applied to social network user behavior analysis, recommendation systems, disease prediction, etc. We first propose an index structure called DTree that integrates spatial information, local structure information, and attribute information to accelerate query processing. Then, based on this index, we develop an efficient algorithm. An experimental study on both real and synthetic datasets demonstrates the efficiency and effectiveness of the proposed methods.

Frontiers of Computer Science

Privacy preservation is a primary concern in social networks, which employ a variety of privacy-preservation mechanisms to protect sensitive user information, including age, location, education, and interests. Matching user identities across different social networks is considered a challenging task. In this work, we propose an approach that reveals user identities as sets of linked accounts across different social networks using limited user profile data, i.e., user names and friendships. Specifically, we propose a framework, ExpandUIL, that includes three standalone algorithms: (i) ExpandFullName, based on percolation graph matching; (ii) a supervised machine learning algorithm that works with graph embeddings; and (iii) ExpandUserLinkage, a combination of the two. The framework is significant because (i) it relies on the network topology and requires only the name feature of the nodes; (ii) it needs very few initial seeds, as few as one; (iii) it is iterative and scalable, and applicable to online incoming graph streams; and (iv) its stability has been demonstrated experimentally on a real ground-truth dataset. Experiments on real datasets from the Instagram and VK social networks show up to 75% recall for linked accounts with 96% accuracy using only one given seed pair.
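The percolation graph matching idea behind ExpandFullName can be illustrated with a generic seed-and-expand sketch in the style of classic percolation graph matching. This is not ExpandUIL itself: the mark threshold `r`, the optional `same_name` predicate standing in for the user-name feature, and the function name are hypothetical.

```python
from collections import defaultdict


def percolation_match(g1, g2, seeds, r=2, same_name=None):
    """Seed-and-expand matching between two graphs given as {node: set(neighbors)}.

    Every matched pair (u, v) gives one mark to each candidate pair of neighbors
    (u', v'); a candidate with at least `r` marks is matched and starts spreading
    marks itself. `same_name` is an optional hypothetical filter (u, v) -> bool.
    """
    matched = {}          # node in g1 -> node in g2
    used = set()          # nodes in g2 that are already matched
    marks = defaultdict(int)
    for u, v in seeds:
        matched[u] = v
        used.add(v)
    to_spread = list(seeds)
    while to_spread:
        u, v = to_spread.pop()
        for nu in g1[u]:
            for nv in g2[v]:
                if nu in matched or nv in used:
                    continue
                if same_name is not None and not same_name(nu, nv):
                    continue
                marks[(nu, nv)] += 1
                if marks[(nu, nv)] >= r:
                    matched[nu] = nv
                    used.add(nv)
                    to_spread.append((nu, nv))
    return matched


# Two copies of a 4-clique, two seed pairs, and a name filter that only allows equal names.
k4 = {x: {y for y in "abcd" if y != x} for x in "abcd"}
seeds = [("a", "a"), ("b", "b")]
print(percolation_match(k4, k4, seeds, r=2, same_name=lambda x, y: x == y))
```

The threshold `r` trades recall for precision: with `r=3` on this 4-clique, two seeds produce no new match, because no candidate pair can collect more than two marks.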

Information Sciences

Influence Maximization (IM) plays an essential role in various social network applications. One such application is viral marketing, which triggers a large cascade of product adoption from a small number of users by exploiting the “word-of-mouth” effect in social networks. IM aims to return a set of users that can influence the largest fraction of a network, such as the early users who demonstrate the good features of a product in marketing. Traditional IM algorithms treat all users equally and ignore the semantic context associated with users, although such context has been studied previously. To account for semantics, we introduce a semantics-aware influence maximization (SIM) problem. The SIM problem integrates users' semantic information into influence maximization by measuring influence spread based on semantic values under a given model; it aims to find a set of users that maximizes this influence spread and is shown to be NP-hard. A Generalized Reverse Influence Set based framework for SIM problems (GRIS-SIM) is used to solve SIM with different semantics, and it provides a \((1 - 1/e - \epsilon)\)-approximation solution for each SIM instance. To our knowledge, this guarantee is the state of the art in IM studies. GRIS-SIM enables the automatic generation of sampling strategies for various social networks. In this study, we also present three sampling strategies that can be generated to achieve the best approximation guarantee, and one of the three is proved to be optimal, achieving the same performance guarantee within the optimal time. Furthermore, to show the generality and effectiveness of the proposed GRIS technique, we apply it to other IM problems (e.g., distance-aware influence maximization, DAIM). Extensive experiments on both real-life and synthetic datasets demonstrate the effectiveness, efficiency, and scalability of our methods.
Results on large real datasets show that GRIS-SIM achieves an average improvement of 58% in expected influence over its rivals, and the method adopting GRIS achieves an average improvement of 65%.
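The \((1 - 1/e - \epsilon)\) guarantee mentioned above is characteristic of reverse influence sampling (RIS). Below is a minimal sketch of plain RIS with greedy maximum coverage under the standard independent cascade model, assuming edge activation probabilities are given. It ignores semantics entirely and is not GRIS-SIM; `num_samples` is a fixed, hypothetical parameter here, whereas the approximation guarantee requires choosing the sample count from theoretical bounds.

```python
import random


def sample_rr_set(in_edges, nodes, rng):
    """One reverse-reachable (RR) set under the independent cascade model.

    in_edges maps v -> [(u, p), ...]: u activates v with probability p.
    """
    target = rng.choice(nodes)
    rr, stack = {target}, [target]
    while stack:
        v = stack.pop()
        for u, p in in_edges.get(v, []):
            if u not in rr and rng.random() < p:  # edge (u, v) is sampled as live
                rr.add(u)
                stack.append(u)
    return rr


def ris_greedy(in_edges, nodes, k, num_samples=2000, seed=0):
    """Pick up to k seeds covering the most RR sets; also return a spread estimate."""
    rng = random.Random(seed)
    rr_sets = [sample_rr_set(in_edges, nodes, rng) for _ in range(num_samples)]
    covered = [False] * len(rr_sets)
    chosen = []
    for _ in range(k):
        gain = {}
        for rr, done in zip(rr_sets, covered):
            if not done:
                for v in rr:
                    gain[v] = gain.get(v, 0) + 1
        if not gain:  # every RR set is already covered
            break
        best = max(gain, key=gain.get)
        chosen.append(best)
        covered = [done or best in rr for rr, done in zip(rr_sets, covered)]
    spread = len(nodes) * sum(covered) / len(rr_sets)
    return chosen, spread


# Star graph: the hub activates every leaf with probability 1.
nodes = ["hub", "l1", "l2", "l3", "l4"]
in_edges = {leaf: [("hub", 1.0)] for leaf in nodes[1:]}
print(ris_greedy(in_edges, nodes, k=2))  # (['hub'], 5.0)
```

The fraction of RR sets covered by a seed set, scaled by the number of nodes, is an unbiased estimate of its expected spread, which is why greedy coverage over RR sets approximates influence maximization.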

Proceedings of the AAAI Conference on Artificial Intelligence

Recent artificial intelligence research has shown great interest in abstractive text summarization. Although deep neural network based methods have made remarkable progress, generating plausible, high-quality abstractive summaries remains challenging. Human-like reading strategies, which can improve summarization effectiveness by modeling the process of reading comprehension and logical thinking, have rarely been explored in abstractive text summarization. Motivated by the human-like reading strategy that follows a hierarchical routine, we propose a novel Hybrid learning model for Abstractive Text Summarization (HATS). The model consists of three major components, a knowledge-based attention network, a multitask encoder-decoder network, and a generative adversarial network, which correspond to the different stages of the human-like reading strategy. To verify the effectiveness of HATS, we conduct extensive experiments on two real-life dat...

ISPRS International Journal of Geo-Information

How can training performance data (e.g., running or walking routes) be collected, measured, and published in a mobile program while preserving user privacy? This question is becoming important with the growing use of reward-based location-based service (LBS) applications, which aim to promote employee training activities and share such data with insurance companies to reduce an organization's healthcare insurance costs. One of the main concerns with such applications is the privacy of user trajectories, because they normally collect user locations over time together with identities. Leaking identified trajectories often results in personal privacy breaches. For instance, a trajectory can expose a user's interest in places and temporal behaviors through inference and linking attacks, and this information can be used for spam advertising or targeted assaults on individuals. To the best of our knowledge, no existing study can be directly applied to solve this problem while preserving data utility. In this paper, we identify the personal privacy problem in a reward-based LBS application and propose a privacy architecture with a bounded perturbation technique that protects users' trajectories from privacy breaches. Bounded perturbation uses a global location set (GLS) to anonymize trajectory data, and it does not generate any visiting points that could not actually be visited in real time. Experimental results on real-world datasets demonstrate that, compared with existing methods, the proposed bounded perturbation effectively anonymizes location information while preserving data utility.
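Bounded perturbation is described only at a high level above. The sketch below shows one possible reading under stated assumptions: each true point is replaced by a randomly chosen GLS location within a distance `bound`, and a candidate is allowed only if it is reachable from the previous output point at a hypothetical maximum speed `v_max`, which is how "no impossible visiting points" is modeled here. The function name, parameters, and greedy strategy are illustrative, not the paper's algorithm.

```python
import math
import random


def bounded_perturbation(trajectory, gls, bound, v_max, seed=0):
    """Replace each (t, x, y) point with a nearby location from the global location set.

    A replacement must lie within `bound` of the true point and be reachable from the
    previous replacement at speed <= v_max, so the output never contains a jump that
    could not be travelled in the elapsed time.
    """
    rng = random.Random(seed)
    out = []
    for t, x, y in trajectory:
        cands = [g for g in gls if math.dist(g, (x, y)) <= bound]
        if out:
            pt, px, py = out[-1]
            cands = [g for g in cands if math.dist(g, (px, py)) <= v_max * (t - pt)]
        if not cands:
            raise ValueError(f"no feasible GLS location for the point at t={t}")
        gx, gy = rng.choice(cands)
        out.append((t, gx, gy))
    return out


traj = [(0, 0.0, 0.0), (10, 5.0, 0.0), (20, 10.0, 0.0)]
gls = [(0.0, 1.0), (1.0, 0.0), (5.0, 1.0), (4.0, 0.0), (10.0, 1.0), (9.0, 0.0), (50.0, 50.0)]
print(bounded_perturbation(traj, gls, bound=1.5, v_max=1.0))
```

Because each choice is greedy, the sketch can fail on a later point even when a different earlier choice would have succeeded; a backtracking search would avoid that at a higher cost.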

PLoS ONE, 2018

Recommender systems are vulnerable to shilling attacks, in which attackers use forged user-generated content, such as user ratings and reviews, to manipulate recommendation rankings. Detecting shilling attacks is therefore important for maintaining the fairness and sustainability of recommender systems. Current studies suffer from poor algorithm universality, difficulty in selecting user profile attributes, and the lack of an optimization mechanism. In this paper, we propose a shilling behaviour detection structure based on abnormal group user findings and rating time series analysis. This paper adds to the current understanding in the field by studying in depth a credibility evaluation model, built on the rating prediction model, to derive proximity-based predictions. We propose a method for detecting suspicious ratings based on suspicious time windows and target item analysis. Suspicious rating time segments are determined by cons...
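As a simplified illustration of the time-window idea (not the paper's detection method), one could bucket an item's rating timestamps into fixed-width windows and flag windows whose count is unusually high. The window width, the z-score rule, the threshold `z`, and the function name are assumptions.

```python
from statistics import mean, pstdev


def suspicious_windows(timestamps, window, z=2.0):
    """Start times of fixed-width windows whose rating count exceeds mean + z * std."""
    if not timestamps:
        return []
    start = min(timestamps)
    n_bins = int((max(timestamps) - start) // window) + 1
    counts = [0] * n_bins
    for t in timestamps:
        counts[int((t - start) // window)] += 1
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:  # perfectly even activity: nothing stands out
        return []
    return [start + i * window for i, c in enumerate(counts) if c > mu + z * sigma]


# One organic rating every 10 time units, plus a burst of 9 ratings in [50, 60).
ratings = list(range(0, 100, 10)) + list(range(51, 60))
print(suspicious_windows(ratings, window=10))  # [50]
```

A flagged window only marks a time segment worth inspecting; deciding which ratings inside it are forged would need further evidence, such as the target item analysis the abstract mentions.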

ACM Transactions on Knowledge Discovery from Data

The availability of trajectories tracking people's geographical locations over time offers an opportunity to study human behavior. In this article, we study rationality from the perspective of users' decisions to visit points of interest (POIs), as represented by their trajectories. However, the analysis of rationality faces a number of challenges, for example, how to model a trajectory in terms of complex user decision processes, and how to detect hidden factors that significantly affect rational decision making. In this study, we propose the Rationality Analysis Model (RAM) to analyze rationality from trajectories in terms of a set of impact factors. To identify hidden factors automatically, we propose a method, Collective Hidden Factor Retrieval (CHFR), which can also be generalized to parse multiple trajectories at the same time or to parse individual trajectories from different time periods. An extensive experimental study is conducted on three la...

IEEE Transactions on Intelligent Transportation Systems

Metro systems have become one of the most important public transit services in cities, so it is important to understand individual metro passengers' spatio-temporal travel patterns. More specifically, for a given passenger: What are the temporal patterns? What are the spatial patterns? Is there any relationship between them? Are the passenger's travel patterns normal or unusual? Answering these questions can help improve metro services, for example, evacuation policy making and marketing. Given massive smart card data over a long period, effectively and systematically identifying and understanding the travel patterns of individual passengers in terms of space and time is very challenging. This paper proposes an effective data-mining procedure to better understand the travel patterns of individual metro passengers in Shenzhen, a large, modern city in China. First, we investigate travel patterns at the individual level and devise a method to retrieve them from raw smart card transaction data; we then use statistical and unsupervised clustering-based methods to understand the hidden regularities and anomalies in these patterns. From a statistical point of view, we examine the distribution of passenger travel patterns and identify abnormal passengers based on empirical knowledge. From an unsupervised clustering point of view, we classify passengers by the similarity of their travel patterns. To interpret group behaviors, we also employ bus transaction data. Moreover, abnormal passengers are detected based on the clustering results. Finally, we provide case studies and findings to demonstrate the effectiveness of the proposed scheme.
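To make the clustering step concrete, here is a minimal sketch that represents each passenger by a normalized 24-bin histogram of trip start hours and groups passengers with plain k-means. The feature choice, the deterministic initialization, the toy passengers, and the function names are assumptions for illustration; the paper's actual features and clustering method may differ.

```python
def hourly_profile(tap_in_hours):
    """Normalized 24-bin histogram of a passenger's trip start hours (0-23)."""
    hist = [0.0] * 24
    for h in tap_in_hours:
        hist[h] += 1
    total = sum(hist)
    return [c / total for c in hist] if total else hist


def kmeans(points, k, iters=20):
    """Plain k-means using the first k points as initial centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


passengers = {
    "p1": [8, 18, 8, 18],   # weekday commuter
    "p2": [12, 13],         # midday traveller
    "p3": [8, 18, 8, 17],
    "p4": [12, 14, 13],
}
profiles = [hourly_profile(h) for h in passengers.values()]
labels, _ = kmeans(profiles, k=2)
print(dict(zip(passengers, labels)))  # {'p1': 0, 'p2': 1, 'p3': 0, 'p4': 1}
```

Normalizing each histogram makes passengers with very different trip counts comparable by the shape of their daily schedule rather than by volume.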

Transportation Research Part B: Methodological

With the development of information technology, crowdsourced data from fleets of cooperative vehicles and online social platforms have become available. Such crowdsourced data, reflecting the real-time context of road segments in transportation systems, enable vehicles to be routed adaptively in uncertain and dynamic traffic environments. We consider the problem of adaptively routing a fleet of cooperative vehicles within a road network. To tackle this problem, we first propose a Crowdsourcing Dynamic Congestion Model. The model is based on a topic-aware Gaussian Process that uses crowdsourced data collected from social platforms and probe vehicle traces, and it can effectively characterize both the dynamics and the uncertainty of road conditions. Our model is efficient and thus facilitates real-time adaptive routing in the face of uncertainty. Using this congestion model, we develop efficient algorithms for non-myopic adaptive routing that minimize the collective travel time of all vehicles in the entire transportation system. A key property of our approach is the ability to reason efficiently about the long-term value of exploration, which enables balancing the exploration/exploitation trade-off collectively across entire fleets of vehicles. Our approach is validated on real-life traffic and geo-tagged social network data from two large cities. Our congestion model is shown to be effective in modeling dynamic congestion conditions. Our routing algorithms also generate significantly faster routes than standard baselines and approach the performance of an omniscient routing algorithm. We also present results from a preliminary field study, which showcase the efficacy of our approach.

2015 IEEE Conference on Computational Intelligence and Games (CIG), 2015

This paper examines how current reputation methods from applications such as social networks could be applied to single-player games. This allows for a more realistic transfer of information between Non-Player Characters (NPCs), allowing them to act within the narrative with expected behaviors. Current information transfer methods are examined and shown to rely on omnipresent, instantaneous transmission of information between NPCs. Methods from social networks are surveyed, and the challenges expected from applying them to games are examined, primarily the restrictions of narrative and time compression.

IEEE Transactions on Big Data, 2015

Data collection must be safe and efficient, considering both data privacy and system performance. In this paper, we study a new problem: distributed data sharing with privacy-preserving requirements. Given a data demander requesting data from multiple distributed data providers, the objective is to enable the data demander to access the distributed data without learning the private information of any individual provider. The problem raises two questions: how to transmit the data safely and accurately, and how to handle data streams efficiently. As the first study of this problem, we propose a practical method, Shadow Coding, that preserves privacy in data transmission and ensures recovery in data collection, achieving privacy-preserving computation in a data-recoverable, efficient, and scalable way. We also provide practical techniques to make Shadow Coding efficient and safe for data streams. An extensive experimental study on a large-scale real-life dataset offers insight into the performance of our scheme. The proposed scheme has also been implemented as a pilot system in a city to collect distributed mobile phone data.
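Shadow Coding itself is not described here in enough detail to reproduce. As a stand-in, the sketch below uses a different, well-known technique, additive secret sharing, to illustrate the general goal of letting a demander recover an aggregate without seeing any single provider's value. The modulus, the number of intermediate servers, and the function names are assumptions; a real deployment would draw shares from a cryptographically secure source such as Python's `secrets` module rather than a seeded `random.Random`.

```python
import random

MODULUS = 2 ** 61 - 1  # hypothetical modulus (a Mersenne prime); values must be smaller


def share(value, n_parties, rng):
    """Split value into n_parties additive shares that sum to value mod MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def aggregate(provider_values, n_servers=3, seed=0):
    """Each provider sends one share to each server; each server adds up only the
    shares it receives; the demander adds the server totals to recover the sum."""
    rng = random.Random(seed)  # reproducible demo only; use `secrets` in practice
    server_totals = [0] * n_servers
    for value in provider_values:
        for i, s in enumerate(share(value, n_servers, rng)):
            server_totals[i] = (server_totals[i] + s) % MODULUS
    return sum(server_totals) % MODULUS


print(aggregate([5, 17, 20]))  # 42
```

Each server sees only random-looking shares, and the server totals sum to the inputs' total modulo `MODULUS`, so the demander learns the aggregate but no individual value, assuming the servers do not collude.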

Lecture Notes in Computer Science, 2011

We propose a framework for efficient OLAP on information networks with a focus on the most interesting kind, topological OLAP (called "T-OLAP"), which incurs topological changes in the underlying networks. T-OLAP operations generate new networks from the original ones by rolling up subsets of nodes chosen by certain constraint criteria. The key challenge is to efficiently compute measures for the newly generated networks and to handle user queries with varied constraints. Two effective computational techniques, T-Distributiveness and T-Monotonicity, are proposed to achieve efficient query processing and cube materialization. We also provide a T-OLAP query processing framework into which these techniques are woven. To the best of our knowledge, this is the first work to give a framework study for topological OLAP on information networks. Experimental results demonstrate both the effectiveness and efficiency of our proposed framework.
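A topological roll-up that merges nodes into groups can be sketched as aggregating edge weights between groups. The example below assumes weighted undirected edges and a node-to-group mapping (both hypothetical) and also shows a loosely related property: rolling up an already rolled-up network gives the same result as rolling up the base network directly, so a coarse view can be computed from a finer one. This is only an illustration in the spirit of T-Distributiveness, not the paper's definition.

```python
from collections import defaultdict


def roll_up(edges, group_of):
    """Merge nodes into groups and sum edge weights between groups.

    edges: {(u, v): weight} for an undirected network; group_of: {node: group}.
    Edges inside a group become (g, g) self-loops so no weight is lost.
    """
    out = defaultdict(float)
    for (u, v), w in edges.items():
        a, b = sorted((group_of[u], group_of[v]))  # canonical order for undirected edges
        out[(a, b)] += w
    return dict(out)


edges = {("a", "b"): 1.0, ("a", "c"): 2.0, ("b", "d"): 3.0, ("c", "d"): 4.0}
city = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
region = {"X": "R", "Y": "R"}

by_city = roll_up(edges, city)
print(by_city)  # {('X', 'X'): 1.0, ('X', 'Y'): 5.0, ('Y', 'Y'): 4.0}
direct = roll_up(edges, {n: region[g] for n, g in city.items()})
print(roll_up(by_city, region) == direct)  # True
```

Keeping intra-group edges as self-loops is a design choice that makes summed edge weight additive across roll-up levels; dropping them would break that equality.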

2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery, 2009

Database Systems for Advanced Applications, 2011

Proceedings of the VLDB Endowment, Aug 1, 2011

With the ever-growing popularity of social networks, the web, and bio-networks, mining large frequent patterns from a single huge network has become increasingly important. Yet existing pattern mining methods cannot offer the efficiency desirable for large pattern discovery. We propose Spider-Mine, a novel algorithm to efficiently mine the top-K largest frequent patterns from a single massive network with any user-specified probability of \(1 - \epsilon\). Deviating from the existing edge-by-edge (i.e., incremental) pattern-growth framework, ...