Naama Zwerdling - Academia.edu
Papers by Naama Zwerdling
Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization, 2018
Social media platforms such as blogs, wikis and file sharing have become very popular in enterprises. Despite their effectiveness in increasing collaboration in the organization, employees are overloaded with information originating from these many sources and find it hard to orient themselves in the stream of events occurring in their organizational news feed. In this paper we identify what makes an event in an organizational social media platform important to employees. Once the factors that make an event important to an employee are identified, the stream of events can be personalized and prioritized based on them, thus reducing the overload and improving work efficiency. Through interviews and two extensive user surveys, the first hypothetical and the second empirical, we identified which factors of an event make it important and compared results from the hypothetical and empirical surveys.
ArXiv, 2020
We present a simple unsupervised approach for answer identification in organizational group chat. In recent years, organizational group chat has been on the rise, enabling asynchronous text-based collaboration between co-workers in different locations and time zones. Finding answers to questions is often critical for work efficiency. However, group chat is characterized by intertwined conversations and 'always on' availability, making it hard for users to pinpoint answers to questions they care about in real time or search for answers in retrospect. In addition, structural and lexical characteristics differ between chat groups, making it hard to find a 'one model fits all' approach. Our Kernel Density Estimation (KDE) based clustering approach, termed Ans-Chat, implicitly learns discussion patterns as a means for answer identification, thus eliminating the need for channel-specific tagging. Empirical evaluation shows that this solution outperforms other approaches.
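The abstract does not say which features Ans-Chat feeds to its KDE. As a rough illustration of KDE-based clustering in general, the sketch below assumes one-dimensional message timestamps (in seconds), a Gaussian kernel, and cluster boundaries placed at local minima of the estimated density. The function names, the bandwidth, and the grid resolution are all hypothetical choices, not details taken from the paper.

```python
import math

def gaussian_kde(points, bandwidth):
    """Return a function that estimates the density at x with a Gaussian kernel."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

    return density

def kde_clusters(points, bandwidth, grid_steps=200):
    """Split 1-D points into clusters at local minima of their KDE (assumed rule)."""
    pts = sorted(points)
    lo, hi = pts[0], pts[-1]
    if lo == hi:
        return [pts]
    density = gaussian_kde(pts, bandwidth)
    step = (hi - lo) / grid_steps
    grid = [lo + i * step for i in range(grid_steps + 1)]
    dens = [density(x) for x in grid]
    # A grid point is a cut if the density dips there relative to its neighbours.
    cuts = [grid[i] for i in range(1, grid_steps)
            if dens[i] < dens[i - 1] and dens[i] <= dens[i + 1]]
    clusters, current, cut_idx = [], [], 0
    for p in pts:
        # Close the current cluster each time we move past a cut position.
        while cut_idx < len(cuts) and p > cuts[cut_idx]:
            if current:
                clusters.append(current)
                current = []
            cut_idx += 1
        current.append(p)
    clusters.append(current)
    return clusters

# Two bursts of messages, about five minutes apart.
bursts = kde_clusters([0, 5, 10, 300, 305, 310], bandwidth=30)
```

With these toy timestamps the density has a single dip between the two bursts, so the messages fall into two groups. A real chat channel would need a bandwidth tuned to its own pace of conversation.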
While discussing a concrete controversial topic, most humans will find it challenging to swiftly raise a diverse set of convincing and relevant claims that should set the basis of their arguments. Here, we demonstrate the initial capabilities of a system that, given a controversial topic, can automatically pinpoint relevant claims in Wikipedia, determine their polarity with respect to the given topic, and articulate them per the user's request.
ArXiv, 2019
Based on recent advances in natural language modeling and those in text generation capabilities, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to artificially synthesize new labeled data for supervised learning. We mainly focus on cases with scarce labeled data. Our method, referred to as language-model-based data augmentation (LAMBADA), involves fine-tuning a state-of-the-art language generator to a specific task through an initial training phase on the existing (usually small) labeled data. Using the fine-tuned model and given a class label, new sentences for the class are generated. Our process then filters these new sentences by using a classifier trained on the original data. In a series of experiments, we show that LAMBADA improves classifiers' performance on a variety of datasets. Moreover, LAMBADA significantly improves upon the state-of-the-art techniques for data augmentation, specifically ...
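The filtering step described above can be illustrated with a small sketch. It is not the LAMBADA implementation: the language generator is left out entirely (the candidate sentences are hard-coded stand-ins for generated text), the classifier is a tiny naive Bayes model chosen only to keep the example self-contained, and the agreement-plus-threshold rule and the `threshold` value are assumptions.

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """Train a tiny multinomial naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict_proba(model, text):
    """Return a {label: probability} dict; words outside the vocabulary are ignored."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, count in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        score = math.log(count / total)
        for tok in text.lower().split():
            if tok in vocab:
                score += math.log((word_counts[label][tok] + 1) / denom)
        log_scores[label] = score
    peak = max(log_scores.values())
    exps = {label: math.exp(s - peak) for label, s in log_scores.items()}
    z = sum(exps.values())
    return {label: v / z for label, v in exps.items()}

def filter_generated(model, candidates, threshold=0.6):
    """Keep candidates whose intended label the classifier predicts confidently."""
    kept = []
    for text, label in candidates:
        probs = predict_proba(model, text)
        if max(probs, key=probs.get) == label and probs[label] >= threshold:
            kept.append((text, label))
    return kept

original = [("book a flight to paris", "travel"), ("reserve a hotel room", "travel"),
            ("reset my password", "account"), ("change my email address", "account")]
generated = [("book a hotel in paris", "travel"),    # plausible for its label
             ("reset my email password", "travel"),  # label does not fit the text
             ("change my password", "account")]
augmented = original + filter_generated(train_nb(original), generated)
```

The second candidate is dropped because the classifier trained on the original data assigns it to the other class, which is the kind of noisy generation a filter of this sort is meant to remove.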
Proceedings of the AAAI Conference on Artificial Intelligence, 2020
Based on recent advances in natural language modeling and those in text generation capabilities, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to artificially synthesize new labeled data for supervised learning. We mainly focus on cases with scarce labeled data. Our method, referred to as language-model-based data augmentation (LAMBADA), involves fine-tuning a state-of-the-art language generator to a specific task through an initial training phase on the existing (usually small) labeled data. Using the fine-tuned model and given a class label, new sentences for the class are generated. Our process then filters these new sentences by using a classifier trained on the original data. In a series of experiments, we show that LAMBADA improves classifiers' performance on a variety of datasets. Moreover, LAMBADA significantly improves upon the state-of-the-art techniques for data augmentation, specifically ...
Findings of the Association for Computational Linguistics: EMNLP 2020, 2020
Data balancing is a known technique for improving the performance of classification tasks. In this work we define a novel balancing-via-generation framework termed BalaGen. BalaGen consists of a flexible balancing policy coupled with a text generation mechanism. Combined, these two techniques can be used to augment a dataset for a more balanced distribution. We evaluate BalaGen on three publicly available semantic utterance classification (SUC) datasets. One of these is a new COVID-19 Q&A dataset published here for the first time. Our work demonstrates that optimal balancing policies can significantly improve classifier performance, while augmenting just part of the classes and under-sampling others. Furthermore, capitalizing on the advantages of balancing, we show its usefulness in all relevant BalaGen framework components. We validate the superiority of BalaGen on ten semantic utterance datasets taken from real-life goal-oriented dialogue systems. Based on our results we encourage using data balancing prior to training for text classification tasks.
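One simple balancing policy of the kind the abstract describes, augmenting some classes and under-sampling others, is sketched below. The single per-class `target`, the `balance` function, and the `generate` callback standing in for a text generator are hypothetical; the abstract does not specify BalaGen's actual policies.

```python
import random
from collections import Counter, defaultdict

def balance(examples, target, generate, seed=0):
    """Bring every class to `target` examples (an assumed, simplified policy).

    Classes above the target are randomly under-sampled; classes below it are
    topped up with texts from `generate(label, i)`, a stand-in for a generator.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append(text)
    balanced = []
    for label, texts in by_label.items():
        if len(texts) > target:
            texts = rng.sample(texts, target)  # under-sample the large class
        else:
            missing = target - len(texts)
            texts = texts + [generate(label, i) for i in range(missing)]
        balanced.extend((text, label) for text in texts)
    return balanced

data = [(f"question {i}", "frequent") for i in range(5)] + [("rare question", "rare")]
balanced = balance(data, target=3, generate=lambda label, i: f"synthetic {label} {i}")
class_sizes = Counter(label for _, label in balanced)
```

Here the frequent class shrinks from five examples to three and the rare class grows from one to three, so both end up the same size.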
Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 2016
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09, 2009
ECSCW 2015: Proceedings of the 14th European Conference on Computer Supported Cooperative Work, 19-23 September 2015, Oslo, Norway, 2015
Proceedings of the 21st international conference companion on World Wide Web - WWW '12 Companion, 2012
Proceedings of the 21st international conference companion on World Wide Web - WWW '12 Companion, 2012
Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '10, 2010
Communications in Computer and Information Science, 2012
Proceedings of the third ACM conference on Recommender systems - RecSys '09, 2009
ACM Transactions on Information Systems, 2008
Users tend to store huge amounts of files, of various formats, on their personal computers. As a result, finding a specific, desired file within the file system is a challenging task. This article addresses the desktop search problem by considering various techniques for ranking results of a search query over the file system. First, basic ranking techniques, which are based on various file features (e.g., file name, access date, file size, etc.), are considered and their effectiveness is empirically analyzed. Next, two learning-based ranking schemes are presented, and are shown to be significantly more effective than the basic ranking methods. Finally, a novel ranking technique, based on query selectiveness, is considered for use during the cold-start period of the system. This method is also shown to be empirically effective, even though it does not involve any learning.
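A basic feature-based ranking of the kind mentioned above might look like the following sketch. The `FileHit` record and the particular ordering (number of query terms found in the file name first, most recent access second) are illustrative assumptions, not the ranking functions evaluated in the article.

```python
from dataclasses import dataclass

@dataclass
class FileHit:
    name: str
    last_access: float  # Unix timestamp of the last access
    size: int           # size in bytes (unused by this particular ranking)

def rank(hits, query):
    """Order hits by query terms matched in the file name, then by recency."""
    terms = query.lower().split()

    def key(hit):
        name = hit.name.lower()
        matched = sum(term in name for term in terms)
        # Negate both values so that more matches and later access sort first.
        return (-matched, -hit.last_access)

    return sorted(hits, key=key)

hits = [FileHit("notes.txt", 300.0, 5),
        FileHit("budget_draft.doc", 200.0, 7),
        FileHit("budget_2008.xls", 100.0, 10)]
ranked_names = [hit.name for hit in rank(hits, "budget 2008")]
```

The file matching both query terms comes first even though it was accessed least recently, because name matches take priority over recency in this ordering.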
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09, 2009
This work investigates personalized social search based on the user's social relations: search results are re-ranked according to their relations with individuals in the user's social network. We study the effectiveness of several social network types for personalization: (1) Familiarity-based network of people related to the user through explicit familiarity connection; (2) Similarity-based network of people "similar" to the user as reflected by their social activity; (3) Overall network that provides both relationship types. For comparison we also experiment with Topic-based personalization that is based on the user's related terms, aggregated from several social applications. We evaluate the contribution of the different personalization strategies by an off-line study and by a user survey within our organization. In the off-line study we apply bookmark-based evaluation, suggested recently, that exploits data gathered from a social bookmarking system to evaluate personalized retrieval. In the on-line study we analyze the feedback of 240 employees exposed to the alternative personalization approaches. Our main results show that both in the off-line study and in the user survey social network based personalization significantly outperforms non-personalized social search. Additionally, as reflected by the user survey, all three SN-based strategies significantly outperform the Topic-based strategy.
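Re-ranking by social relations can be pictured with a minimal sketch. The linear combination below, the `alpha` weight, the result dictionaries, and the per-person relation strengths are all hypothetical; the abstract does not give the scoring formula used in the study.

```python
def rerank(results, network, alpha=0.5):
    """Re-rank results by base score plus a weighted social boost (assumed formula).

    `network` maps a person to the strength of their relation with the user;
    people outside the network contribute nothing.
    """
    def personalized_score(result):
        social = sum(network.get(person, 0.0) for person in result["people"])
        return result["score"] + alpha * social

    return sorted(results, key=personalized_score, reverse=True)

results = [{"doc": "d1", "score": 1.0, "people": ["carol"]},
           {"doc": "d2", "score": 0.8, "people": ["alice", "bob"]}]
familiarity = {"alice": 0.6, "bob": 0.3}  # hypothetical relation strengths
personalized = [r["doc"] for r in rerank(results, familiarity)]
baseline = [r["doc"] for r in rerank(results, {})]
```

With an empty network the original order is kept, while the familiarity network lifts the document associated with the user's contacts above the one that only scores higher on the base ranking.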