Deepa Paranjpe - Academia.edu

Papers by Deepa Paranjpe

Generic Text Summarization Using Wordnet for Novelty and Hard

A Bayesian Technique for Estimating the Credibility of Question Answerers

Proceedings of the 2008 SIAM International Conference on Data Mining, 2008

We address the problem of ranking question answerers according to their credibility, characterized here by the probability that a given question answerer (user) will be awarded a best answer on a question, given the answerer's question-answering history. This probability, denoted θ, is treated as a hidden variable that can only be estimated statistically from observations associated with the user, namely the number b of best answers awarded out of the number n of questions answered. The more specific problem addressed is the potentially high uncertainty of such credibility estimates when they are based on small numbers of answers. We address this problem with a form of Bayesian smoothing: the credibility estimate is a mixture of the overall population statistics and those of the specific user, and the greater the number of questions the user has answered, the greater the contribution of the user's own statistics relative to those of the overall population. We use Predictive Stochastic Complexity (PSC) as an accuracy measure to evaluate several methods that can be used for the estimation, comparing our technique, Bayesian Smoothing (BS), with maximum a posteriori (MAP) estimation, maximum likelihood (ML) estimation, and Laplace smoothing.
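The smoothing idea above can be illustrated with a Beta-binomial sketch. It assumes, as a hypothetical choice not taken from the abstract, a Beta prior whose mean is the population best-answer rate `pop_rate` and whose strength is `k` pseudo-answers; the posterior mean then blends the user's ML rate b/n with the population rate, and the user's weight n/(n + k) grows with n. PSC is computed here as the sequential code length, in bits, of each user's 0/1 outcome sequence under the estimator's one-step-ahead predictions. All names are illustrative, and MAP estimation is omitted for brevity.

```python
import math
from typing import Callable, List, Sequence

# An estimator maps (b best answers, n questions answered) to an estimate of theta.
Estimator = Callable[[int, int], float]


def ml_estimate(b: int, n: int) -> float:
    # Maximum likelihood: b / n. With no history, fall back to 0.5 (an arbitrary choice).
    return b / n if n > 0 else 0.5


def laplace_estimate(b: int, n: int) -> float:
    # Laplace smoothing: one pseudo-success and one pseudo-failure.
    return (b + 1) / (n + 2)


def make_bayes_smoothed(pop_rate: float, k: float) -> Estimator:
    """Posterior mean under a hypothetical Beta(k * pop_rate, k * (1 - pop_rate)) prior.

    (b + k * pop_rate) / (n + k) equals w * (b / n) + (1 - w) * pop_rate with
    w = n / (n + k), so the user's own statistics dominate as n grows.
    """
    def estimate(b: int, n: int) -> float:
        return (b + k * pop_rate) / (n + k)
    return estimate


def psc_bits(outcomes: Sequence[int], estimator: Estimator, eps: float = 1e-6) -> float:
    """Predictive stochastic complexity (in bits) of one user's 0/1 outcome sequence.

    Each outcome (1 = best answer) is predicted from the counts of the outcomes
    before it; lower totals mean better predictions. Probabilities are clipped to
    [eps, 1 - eps] so that ML's zero estimates yield a large but finite cost.
    """
    total, b = 0.0, 0
    for n, x in enumerate(outcomes):
        theta = min(max(estimator(b, n), eps), 1.0 - eps)
        total -= math.log2(theta if x == 1 else 1.0 - theta)
        b += x
    return total


if __name__ == "__main__":
    # Toy histories (not real data): one list of 0/1 outcomes per user.
    users: List[List[int]] = [[0, 0, 1], [1], [0, 0, 0, 0, 1, 0]]
    # Population rate; in practice it would come from a separate population sample.
    pop_rate = sum(map(sum, users)) / sum(map(len, users))
    smoothed = make_bayes_smoothed(pop_rate, k=5.0)
    for name, est in [("ML", ml_estimate), ("Laplace", laplace_estimate), ("BS", smoothed)]:
        print(name, round(sum(psc_bits(u, est) for u in users), 2))
```

The clipping constant `eps` is itself an assumption: without it, ML assigns zero probability to a best answer after a run of non-best answers, so its PSC would be infinite. That small-sample failure is the kind of behavior smoothing is meant to avoid.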

Statistical credibility metric for online question answerers

Learning document aboutness from implicit user feedback and document structure

Proceeding of the 18th ACM conference on Information and knowledge management - CIKM '09, 2009

A structure-sensitive framework for text categorization

Proceedings of the 14th ACM international conference on Information and knowledge management - CIKM '05, 2005

Ganesh Ramakrishnan (IBM IRL, New Delhi, India; ganramkr@in.ibm.com), Deepa Paranjpe (Shankar Nagar, Nagpur, India; adeepa@cse.iitb.ac.in), Byron Dom (Yahoo! Inc., Sunnyvale, CA; bdom@yahoo-inc.com) ...

Extracting events and event descriptions from Twitter

Proceedings of the 20th international conference companion on World wide web - WWW '11, 2011

Ana-Maria Popescu (Yahoo! Labs, Sunnyvale, CA 94089; amp@yahoo-inc.com), Marco Pennacchiotti (Yahoo! Labs, Sunnyvale, CA 94089; pennac@yahoo-inc.com), Deepa Arun Paranjpe (Yahoo! ...)

Semisupervised Clustering with Metric Learning using Relative Comparisons

IEEE Transactions on Knowledge and Data Engineering, 2000

Most existing representative works in semi-supervised clustering do not adequately address violations of pairwise constraints. Moreover, traditional kernel methods for semi-supervised clustering not only require manual tuning of the kernel parameters, because insufficient supervision is provided, but also lack a measure that improves clustering effectiveness. In this paper, we propose an adaptive Semi-supervised Clustering Kernel Method based on Metric learning (SCKMM) to mitigate these problems. Specifically, we first construct an objective function from pairwise constraints to estimate the parameter of the Gaussian kernel automatically. Then, we use a pairwise-constraint-based K-means approach to resolve constraint violations and to cluster the data. Furthermore, we introduce metric learning into nonlinear semi-supervised clustering to improve the separability of the data. Finally, we perform clustering and metric learning simultaneously. Experimental results on a number of real-world data sets validate the effectiveness of the proposed method.
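A minimal sketch of the first SCKMM step, choosing the Gaussian kernel width from pairwise constraints, follows. The abstract does not give the objective function, so the one here (mean must-link similarity minus mean cannot-link similarity, maximized over a grid of candidate widths) is a hypothetical stand-in; the constraint-based K-means and metric-learning stages are not shown. `kernel_sq_distance` gives the feature-space distance that a kernel K-means step would use.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
Pair = Tuple[int, int]  # indices into the data list


def gaussian_kernel(x: Point, y: Point, sigma: float) -> float:
    # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_sq_distance(x: Point, y: Point, sigma: float) -> float:
    # ||phi(x) - phi(y)||^2 = k(x,x) + k(y,y) - 2k(x,y) = 2 - 2k(x,y) for a Gaussian kernel.
    return 2.0 - 2.0 * gaussian_kernel(x, y, sigma)


def constraint_objective(data: List[Point], must_link: List[Pair],
                         cannot_link: List[Pair], sigma: float) -> float:
    # Hypothetical objective: reward high kernel similarity on must-link pairs
    # and penalize high similarity on cannot-link pairs.
    ml = sum(gaussian_kernel(data[i], data[j], sigma) for i, j in must_link)
    cl = sum(gaussian_kernel(data[i], data[j], sigma) for i, j in cannot_link)
    return ml / len(must_link) - cl / len(cannot_link)


def select_sigma(data: List[Point], must_link: List[Pair],
                 cannot_link: List[Pair], grid: Sequence[float]) -> float:
    # Grid search stands in for whatever optimizer SCKMM actually uses.
    return max(grid, key=lambda s: constraint_objective(data, must_link, cannot_link, s))


if __name__ == "__main__":
    # Toy data: two tight groups far apart on the first axis.
    data = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0)]
    must = [(0, 1), (2, 3)]
    cannot = [(0, 2), (1, 3)]
    print(select_sigma(data, must, cannot, [0.01, 1.0, 100.0]))
```

Very small widths make every pair look dissimilar and very large widths make every pair look similar; both drive the objective toward zero, which is why an intermediate width wins.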

Passage scoring for question answering via Bayesian inference on lexical relations

Proceedings of the TREC, 2003

Many researchers have used lexical networks and ontologies to mitigate synonymy and polysemy problems in Question Answering (QA), coupling these resources with taggers, query classifiers, and answer extractors in complex and ad-hoc ways. We seek to make QA ...

Generic Text Summarization using Wordnet for Novelty and Hard

TREC 2003, 2003

This paper presents a Random Walk approach to text summarization using WordNet for text representation. For the HARD track, the specified corpus is indexed using a standard indexing engine, Lucene, and the initial passage set is retrieved by querying the index; the collection of passages is then treated as a document. For the Novelty track, the documents are used as supplied directly by NIST. In either case, the document is used to extract a "relevant" sub-graph from the WordNet graph. Weights are assigned to each node of this sub-graph ...
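The random-walk weighting can be sketched as a PageRank-style stationary distribution over the sub-graph. Because the abstract is truncated before it says how node weights are computed, the damping factor, the uniform teleportation, and the passage-scoring rule in `score_passage` are assumptions, and the toy graph stands in for a real WordNet sub-graph.

```python
from typing import Dict, List

# Node -> neighbours. Undirected edges are listed in both directions; every
# neighbour must also appear as a key.
Graph = Dict[str, List[str]]


def random_walk_weights(graph: Graph, damping: float = 0.85,
                        max_iters: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Stationary distribution of a random walk with uniform teleportation (PageRank-style)."""
    nodes = list(graph)
    n = len(nodes)
    weights = {v: 1.0 / n for v in nodes}
    for _ in range(max_iters):
        # Teleportation mass is spread uniformly over all nodes.
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = graph[v]
            targets = out if out else nodes  # dangling node: jump anywhere
            share = damping * weights[v] / len(targets)
            for u in targets:
                nxt[u] += share
        delta = sum(abs(nxt[v] - weights[v]) for v in nodes)
        weights = nxt
        if delta < tol:  # converged in L1 norm
            break
    return weights


def score_passage(passage_nodes: List[str], weights: Dict[str, float]) -> float:
    # Hypothetical use: a passage scores the total weight of the nodes it maps to.
    return sum(weights.get(v, 0.0) for v in passage_nodes)


if __name__ == "__main__":
    # Hypothetical toy sub-graph, not extracted from WordNet.
    graph = {
        "car": ["vehicle", "wheel"],
        "vehicle": ["car", "truck"],
        "truck": ["vehicle"],
        "wheel": ["car"],
    }
    w = random_walk_weights(graph)
    print({k: round(v, 4) for k, v in w.items()})
    print(score_passage(["car", "wheel"], w))
```

Better-connected nodes ("car", "vehicle") end up with more weight than leaves, so passages touching central concepts score higher under this assumed rule.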

Semisupervised Clustering with Metric Learning using Relative Comparisons

IEEE Transactions on Knowledge and Data Engineering, 2008

Semisupervised clustering algorithms partition a given data set using limited supervision from the user. The success of these algorithms depends on the type of supervision and also on the kind of dissimilarity measure used while creating partitions of the space. This paper proposes a clustering algorithm that uses supervision in terms of relative comparisons, viz., x is closer to y
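The abstract breaks off mid-sentence, so the sketch below assumes the common triplet form of relative comparisons, "x is closer to y than to z". It learns a diagonally weighted squared Euclidean distance by projected subgradient descent on a hinge loss over the triplets; this is a generic, swapped-in formulation rather than the paper's algorithm, and the clustering step that would use the learned distance is omitted. `margin`, `lr`, and `epochs` are illustrative defaults.

```python
from typing import List, Sequence, Tuple

Vec = Sequence[float]
Triplet = Tuple[int, int, int]  # (x, y, z): "x is closer to y than to z" (assumed form)


def weighted_sq_dist(a: Vec, b: Vec, w: Sequence[float]) -> float:
    # d_w(a, b) = sum_i w_i * (a_i - b_i)^2, a diagonal Mahalanobis-style distance.
    return sum(wi * (ai - bi) ** 2 for wi, ai, bi in zip(w, a, b))


def learn_diagonal_metric(data: List[Vec], triplets: List[Triplet], margin: float = 1.0,
                          lr: float = 0.05, epochs: int = 200) -> List[float]:
    """Projected subgradient descent on sum of max(0, margin + d_w(x, y) - d_w(x, z))."""
    dim = len(data[0])
    w = [1.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, y, z in triplets:
            hinge = (margin + weighted_sq_dist(data[x], data[y], w)
                     - weighted_sq_dist(data[x], data[z], w))
            if hinge > 0:  # constraint violated or inside the margin
                for i in range(dim):
                    grad[i] += (data[x][i] - data[y][i]) ** 2 - (data[x][i] - data[z][i]) ** 2
        # Gradient step, then project onto w >= 0 so d_w stays a non-negative squared distance.
        w = [max(0.0, wi - lr * gi) for wi, gi in zip(w, grad)]
    return w


def count_satisfied(data: List[Vec], triplets: List[Triplet], w: Sequence[float]) -> int:
    # Number of triplets whose ordering holds under the weighted distance.
    return sum(weighted_sq_dist(data[x], data[y], w) < weighted_sq_dist(data[x], data[z], w)
               for x, y, z in triplets)


if __name__ == "__main__":
    # Toy data: the first feature carries the intended similarity, the second is noise.
    data = [(0.0, 0.0), (0.2, 3.0), (2.0, 0.0)]
    triplets = [(0, 1, 2)]
    print(count_satisfied(data, triplets, [1.0, 1.0]))  # unweighted distance violates it
    w = learn_diagonal_metric(data, triplets)
    print(w, count_satisfied(data, triplets, w))
```

In the toy data, training shrinks the weight on the noisy second feature until the single triplet holds with the required margin.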
