Laks Lakshmanan | University of British Columbia

Papers by Laks Lakshmanan

The bang for the buck (fair competitive viral marketing from the host perspective)

Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug 11, 2013

The key algorithmic problem in viral marketing is to identify a set of influential users (called seeds) in a social network who, when convinced to adopt a product, will influence other users in the network, leading to a large number of adoptions. When two or more players compete with similar products on the same network, we talk about competitive viral marketing, which so far has been studied exclusively from the perspective of one of the competing players. In this paper we propose and study the novel problem of competitive viral marketing from the perspective of the host, i.e., the owner of the social network platform. The host sells viral marketing campaigns as a service to its customers, keeping control of the selection of seeds. Each company specifies its budget and the host allocates the seeds accordingly. From the host's perspective, it is important not only to choose the seeds to maximize the collective expected spread, but also to assign seeds to companies so as to guarantee that the "bang for the buck" is nearly identical for all companies, which we formalize as the fair seed allocation problem. We propose a new propagation model capturing the competitive nature of viral marketing. Our model is intuitive and retains the desired properties of monotonicity and submodularity. We show that the fair seed allocation problem is NP-hard, and develop an efficient algorithm called Needy Greedy. We run experiments on three real-world social networks, showing that our algorithm is effective and scalable.
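
The fairness goal described in this abstract can be illustrated with a small sketch. It is not the Needy Greedy algorithm from the paper, whose details are not given here; the function name `allocate_seeds_neediest_first`, the additive spread estimates, and the tie-breaking rule are all assumptions made for illustration. Each seed, taken in decreasing order of estimated spread, goes to the company that currently has the lowest spread per unit of budget.

```python
def allocate_seeds_neediest_first(seed_values, budgets):
    """Assign seeds so that spread per unit of budget stays balanced.

    seed_values: dict seed -> estimated spread contribution
                 (assumed additive, which real propagation models are not).
    budgets:     dict company -> number of seeds paid for (assumed positive).
    Returns (allocation, spread_per_budget).
    """
    allocation = {company: [] for company in budgets}
    spread = {company: 0.0 for company in budgets}

    # Hand out the most valuable seeds first.
    for seed in sorted(seed_values, key=seed_values.get, reverse=True):
        open_companies = [c for c in budgets if len(allocation[c]) < budgets[c]]
        if not open_companies:
            break
        # The "neediest" company has the lowest spread per unit of budget;
        # ties are broken by company name to keep the result deterministic.
        neediest = min(open_companies, key=lambda c: (spread[c] / budgets[c], c))
        allocation[neediest].append(seed)
        spread[neediest] += seed_values[seed]

    return allocation, {c: spread[c] / budgets[c] for c in budgets}


seed_values = {"u1": 10, "u2": 8, "u3": 6, "u4": 4}
budgets = {"A": 2, "B": 1}
allocation, ratio = allocate_seeds_neediest_first(seed_values, budgets)
print(allocation)  # {'A': ['u1', 'u3'], 'B': ['u2']}
print(ratio)       # {'A': 8.0, 'B': 8.0}
```

In this toy instance both companies end up with the same spread per seed paid for, which is the kind of balance the abstract calls "bang for the buck".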

Composite recommendations: from items to packages

Frontiers of Computer Science

Classical recommender systems provide users with a list of recommendations where each recommendation consists of a single item, e.g., a book or DVD. However, several applications can benefit from a system capable of recommending packages of items, in the form of sets. Sample applications include travel planning with a limited budget (price or time) and Twitter users wanting to select worthwhile tweeters to follow, given that they can deal with only a bounded number of tweets. In these contexts, there is a need for a system that can recommend the top-k packages for the user to choose from.
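
As a rough illustration of what "top-k packages under a budget" means, the sketch below enumerates small packages by brute force. It is not the method of the paper: the function name `top_k_packages`, the additive package value, and the `max_size` cap are assumptions chosen to keep the example short.

```python
import heapq
from itertools import combinations


def top_k_packages(items, budget, k, max_size=3):
    """Return the k highest-value packages whose total cost fits the budget.

    items: dict name -> (value, cost). Package value and cost are assumed
    to be plain sums over the items in the package.
    """
    feasible = []
    names = sorted(items)
    for size in range(1, max_size + 1):
        for combo in combinations(names, size):
            cost = sum(items[name][1] for name in combo)
            if cost <= budget:
                value = sum(items[name][0] for name in combo)
                feasible.append((value, combo))
    # nlargest orders by value first, then by the item tuple on ties.
    return heapq.nlargest(k, feasible)


attractions = {"museum": (5, 2), "tour": (8, 4), "show": (6, 3)}
print(top_k_packages(attractions, budget=6, k=2))
# [(13, ('museum', 'tour')), (11, ('museum', 'show'))]
```

Exhaustive enumeration grows exponentially with the number of items, so it only serves to make the problem statement concrete.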

Exploratory mining and pruning optimizations of constrained associations rules

Proceedings of the 1998 ACM SIGMOD International Conference, Jun 1, 1998

From the standpoint of supporting human-centered discovery of knowledge, the present-day model of mining association rules suffers from the following serious shortcomings: (i) lack of user exploration and control, (ii) lack of focus, and (iii) rigid notion of relationships. In effect, this model functions as a black-box, admitting little user interaction in between. We propose, in this paper, an architecture that opens up the black-box, and supports constraint-based, human-centered exploratory mining of associations. The foundation of this architecture is a rich set of constraint constructs, including domain, class, and SQL-style aggregate constraints, which enable users to clearly specify what associations are to be mined. We propose constrained association queries as a means of specifying the constraints to be satisfied by the antecedent and consequent of a mined association. In this paper, we mainly focus on the technical challenges in guaranteeing a level of performance that is commensurate with the selectivities of the constraints in an association query. To this end, we introduce and analyze two properties of constraints that are critical to pruning: anti-monotonicity and succinctness. We then develop characterizations of various constraints into four categories, according to these properties. Finally, we describe a mining algorithm called CAP, which achieves a maximized degree of pruning for all categories of constraints. Experimental results indicate that CAP can run much faster, in some cases as much as 80 times, than several basic algorithms. This demonstrates how important the succinctness and anti-monotonicity properties are in delivering the performance guarantee.
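
The role of anti-monotonicity in pruning can be shown with a small level-wise miner. This is a minimal sketch and not the CAP algorithm itself: the function name `frequent_itemsets_with_budget`, the price table, and the example constraint "total price at most a budget" are assumptions for illustration. A constraint is anti-monotone when every superset of a violating itemset also violates it, so such itemsets can be dropped before any of their supersets are generated.

```python
from itertools import combinations


def frequent_itemsets_with_budget(transactions, prices, min_support, budget):
    """Level-wise mining of itemsets that are frequent and cost at most `budget`.

    Minimum support and "sum of non-negative prices <= budget" are both
    anti-monotone, so candidates are built only from surviving itemsets.
    """
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    def within_budget(itemset):
        return sum(prices[i] for i in itemset) <= budget

    def survives(itemset):
        # Check the cheap constraint first; count support only if it holds.
        return within_budget(itemset) and support(itemset) >= min_support

    items = sorted({i for t in transactions for i in t})
    level = [s for s in (frozenset([i]) for i in items) if survives(s)]
    result = list(level)

    while level:
        survivors = set(level)
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        next_level = []
        for cand in candidates:
            # Prune: every subset one item smaller must itself have survived.
            subsets = combinations(cand, len(cand) - 1)
            if all(frozenset(s) in survivors for s in subsets) and survives(cand):
                next_level.append(cand)
        level = next_level
        result.extend(level)
    return result


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
prices = {"a": 1, "b": 2, "c": 5}
found = frequent_itemsets_with_budget(transactions, prices, min_support=2, budget=6)
print(sorted(sorted(s) for s in found))
# [['a'], ['a', 'b'], ['a', 'c'], ['b'], ['c']]
```

Here {b, c} is rejected because its price of 7 exceeds the budget, and {a, b, c} is then discarded without counting its support because one of its subsets did not survive.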

A Declarative Semantics for Behavioral Inheritance and Conflict Resolution

ILPS ISLP NACLP SLP, Oct 1, 1995

We propose a novel semantics for object-oriented deductive databases in the direction of F-logic to logically account for behavioral inheritance, conflict resolution in multiple inheritance hierarchies, and overriding. We introduce the ideas of withdrawal, locality, and inheritability of properties (i.e., methods and signatures). Exploiting these ideas, we develop a declarative semantics of behavioral inheritance and overriding without having to resort to non-monotonic reasoning. Conflict resolution in our model can be achieved both via specification and by detection. The possibility of specification-based conflict resolution through withdrawal allows users to define inheritance preference. We present a formal account of the semantics of our language by defining a model theory, proof theory and a fixpoint theory. We also show that the different characterizations of our language are equivalent.

Show Me the Money: Dynamic Recommendations for Revenue Maximization

Recommender Systems (RS) play a vital role in applications such as e-commerce and on-demand content streaming. Research on RS has mainly focused on the customer perspective, i.e., accurate prediction of user preferences and maximization of user utilities. As a result, most existing techniques are not explicitly built for revenue maximization, the primary business goal of enterprises. In this work, we explore and exploit a novel connection between RS and the profitability of a business. As recommendations can be seen as an information channel between a business and its customers, it is interesting and important to investigate how to make strategic dynamic recommendations leading to maximum possible revenue. To this end, we propose a novel model that takes into account a variety of factors including prices, valuations, saturation effects, and competition amongst products. Under this model, we study the problem of finding revenue-maximizing recommendation strategies over a finite time horizon. We show that this problem is NP-hard, but approximation guarantees can be obtained for a slightly relaxed version, by establishing an elegant connection to matroid theory. Given the prohibitively high complexity of the approximation algorithm, we also design intelligent heuristics for the original problem. Finally, we conduct extensive experiments on two real and synthetic datasets and demonstrate the efficiency, scalability, and effectiveness of our algorithms, and that they significantly outperform several intuitive baselines.

The Generalized MDL Approach for Summarization

Proceedings of the 28th International Conference on Very Large Data Bases, Aug 20, 2002

TopRecs + : Pushing the Envelope on Recommender Systems

HeteroMF: recommendation in heterogeneous information networks using context dependent factor models

Proceedings of the 22nd International Conference on World Wide Web, May 13, 2013

Minimization of tree pattern queries

Proceedings of the 2001 ACM SIGMOD International Conference, May 1, 2001

CAST (A Context-Aware Story-Teller for Streaming Social Content)

Proceedings of the 23rd ACM International Conference, Nov 3, 2014

Viral Marketing Meets Social Advertising: Ad Allocation with Minimum Regret

Social advertisement is one of the fastest growing sectors in the digital advertisement landscape: ads in the form of promoted posts are shown in the feed of users of a social networking platform, along with normal social posts; if a user clicks on a promoted post, the host (social network owner) is paid a fixed amount by the advertiser. In this context, allocating ads to users is typically performed by maximizing click-through-rate, i.e., the likelihood that the user will click on the ad. However, this simple strategy fails to leverage the fact that ads can propagate virally through the network, from endorsing users to their followers. In this paper, we study the problem of allocating ads to users through the viral-marketing lens. Advertisers approach the host with a budget in return for the marketing campaign service provided by the host. We show that allocation that takes into account the propensity of ads for viral propagation can achieve significantly better performance. However, uncontrolled virality could be undesirable for the host as it creates room for exploitation by the advertisers: hoping to tap uncontrolled virality, an advertiser might declare a lower budget for its marketing campaign, aiming at the same large outcome with a smaller cost. This creates a challenging trade-off: on the one hand, the host aims at leveraging virality and the network effect to improve advertising efficacy, while on the other hand the host wants to avoid giving away free service due to uncontrolled virality. We formalize this as the problem of ad allocation with minimum regret, which we show is NP-hard and inapproximable w.r.t. any factor. However, we devise an algorithm that provides approximation guarantees w.r.t. the total budget of all advertisers.
We develop a scalable version of our approximation algorithm, which we extensively test on four real-world data sets, confirming that our algorithm delivers high quality solutions, is scalable, and significantly outperforms several natural baselines.

Industry applications of data mining: challenges and opportunities

Proceedings 14th International Conference on Data Engineering, 2000

Structural query optimization: a uniform framework for semantic query optimization in deductive databases

Proceedings of the tenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems - PODS '91, 1991

STRUCTURAL QUERY OPTIMIZATION: A UNIFORM FRAMEWORK FOR SEMANTIC QUERY OPTIMIZATION IN DEDUCTIVE DATABASES. Laks V.S. Lakshmanan and Héctor J. Hernández. Dept. of Computer Science, Concordia University, Montreal, Quebec ...

Superfiniteness of query answers in deductive databases: An automata-theoretic approach

Lecture Notes in Computer Science, 1992

Validating Network Value of Influencers by Means of Explanations

2013 IEEE 13th International Conference on Data Mining, 2013

Recently, there has been significant interest in social influence analysis. One of the central problems in this area is the problem of identifying influencers, such that by convincing these users to perform a certain action (like buying a new product), a large number of other users get influenced to follow the action. The client of such an application is essentially a marketer who would target these influencers for marketing a given new product, say by providing free samples or discounts. It is natural that before committing resources for targeting an influencer the marketer would be interested in validating the influence (or network value) of influencers returned. This requires digging deeper into such analytical questions as: who are their followers, on what actions (or products) they are influential, etc. However, the current approaches to identifying influencers largely work as a black box in this respect. The goal of this paper is to open up the black box, address these questions and provide informative and crisp explanations for validating the network value of influencers. We formulate the problem of providing explanations (called PROXI) as a discrete optimization problem of feature selection. We show that PROXI is not only NP-hard to solve exactly, but also NP-hard to approximate within any reasonable factor. Nevertheless, we show interesting properties of the objective function and develop an intuitive greedy heuristic. We perform detailed experimental analysis on two real-world datasets, Twitter and Flixster, and show that our approach is useful in generating concise and insightful explanations of the influence distribution of users and that our greedy algorithm is effective and efficient with respect to several baselines.

Quotient Cube

VLDB '02: Proceedings of the 28th International Conference on Very Large Databases, 2002

On querying spreadsheets

Proceedings 14th International Conference on Data Engineering, 1998

Considers the problem of querying the data in applications such as spreadsheets and word processors. This problem has several motivations from the perspective of data integration, interoperability and OLAP. We provide an architecture for realizing interoperability among such diverse applications and address the challenges that arise specifically in the context of querying data stored in spreadsheet applications. A fundamental challenge

Adding structure to top-k

Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11, 2011

Keyword-based search interfaces are extremely popular as a means for efficiently discovering items of interest from a huge collection, as evidenced by the success of search engines like Google and Bing. However, most of the current search services still return results as a flat ranked list of items. Considering the huge number of items which can match a query, this list-based interface can make it very difficult for users to explore the results and find important items relevant to their search needs. In this work, we consider a search scenario in which each item is annotated with a set of keywords. E.g., in Web 2.0 enabled systems such as Flickr and del.icio.us, it is common for users to tag items with keywords. Based on this annotation information, we can automatically group query result items into different expansions of the query corresponding to subsets of keywords. We formulate and motivate this problem within a top-k query processing framework, but as that of finding the top-k most important expansions. Then we study additional desirable properties for the set of expansions returned, and formulate the problem as an optimization problem of finding the best k expansions satisfying all the desirable properties. We propose several efficient algorithms for this problem. Our problem is similar in spirit to recent works on automatic facets generation, but has the important difference and advantage that we don't need to assume the existence of a pre-defined categorical hierarchy, which is critical for these works. Through extensive experiments on both real and synthetic datasets, we show our proposed algorithms are both effective and efficient.

Modeling impression discounting in large-scale recommender systems

Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '14, 2014

Recommender systems have become very important for many online activities, such as watching movies, shopping for products, and connecting with friends on social networks. User behavioral analysis and user feedback (both explicit and implicit) modeling are crucial for the improvement of any online recommender system. Widely adopted recommender systems at LinkedIn such as "People You May Know" and "Endorsements" are evolving by analyzing user behaviors on impressed recommendation items. In this paper, we address modeling impression discounting of recommended items, that is, how to model users' no-action feedback on impressed recommended items. The main contributions of this paper include (1) large-scale analysis of impression data from LinkedIn and KDD Cup; (2) novel anti-noise regression techniques, and their application to learning four different impression discounting functions, including linear decay, inverse decay, exponential decay, and quadratic decay; (3) application of these impression discounting functions to LinkedIn's "People You May Know" and "Endorsements" recommender systems.
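
The four named decay families can be sketched as simple functions of the number of past impressions. The exact parameterizations used in the paper are not given in this abstract, so the forms below, the parameter `a`, its default values, and the helper `discounted_score` are assumptions for illustration only.

```python
import math


def linear_decay(n, a=0.1):
    """Multiplier falls by a fixed amount per impression, floored at 0."""
    return max(0.0, 1.0 - a * n)


def inverse_decay(n, a=1.0):
    """Multiplier falls in inverse proportion to the impression count."""
    return 1.0 / (1.0 + a * n)


def exponential_decay(n, a=0.5):
    """Multiplier shrinks by a constant factor per impression."""
    return math.exp(-a * n)


def quadratic_decay(n, a=0.01):
    """Multiplier falls with the square of the impression count, floored at 0."""
    return max(0.0, 1.0 - a * n * n)


def discounted_score(score, impressions, decay=exponential_decay):
    """Scale a recommendation score by how often the item was shown without action."""
    return score * decay(impressions)


for decay in (linear_decay, inverse_decay, exponential_decay, quadratic_decay):
    print(decay.__name__, [round(decay(n), 3) for n in range(4)])
```

Each function returns 1.0 for an item that has never been shown and decreases as the item accumulates impressions without user action.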

On Querying Spreadsheets

International Conference on Data Engineering, 1998
