Naama Zwerdling - Academia.edu

Papers by Naama Zwerdling

Orient Me!

Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization, 2018

Social media platforms such as blogs, wikis and file sharing have become very popular in enterprises. Despite their effectiveness in increasing collaboration in the organization, employees are overloaded with information originating from these many sources and find it hard to orient themselves in the stream of events occurring in their organizational news feed. In this paper we identify what makes an event in an organizational social media platform important to employees. Once the factors that make an event important to an employee are identified, the stream of events can be personalized and prioritized based on them, reducing the overload and improving work efficiency. Through interviews and two extensive user surveys, the first hypothetical and the second empirical, we identified which factors of an event make it important and compared the results of the hypothetical and empirical surveys.

Answer Identification in Collaborative Organizational Group Chat

ArXiv, 2020

We present a simple unsupervised approach for answer identification in organizational group chat. In recent years, organizational group chat has been on the rise, enabling asynchronous text-based collaboration between co-workers in different locations and time zones. Finding answers to questions is often critical for work efficiency. However, group chat is characterized by intertwined conversations and 'always on' availability, making it hard for users to pinpoint answers to questions they care about in real time or to search for answers in retrospect. In addition, structural and lexical characteristics differ between chat groups, making it hard to find a 'one model fits all' approach. Our Kernel Density Estimation (KDE) based clustering approach, termed Ans-Chat, implicitly learns discussion patterns as a means for answer identification, thus eliminating the need for channel-specific tagging. Empirical evaluation shows that this solution outperforms other approaches.

Claims on demand – an initial demonstration of a system for automatic detection and polarity identification of context dependent claims in massive corpora

While discussing a concrete controversial topic, most humans will find it challenging to swiftly raise a diverse set of convincing and relevant claims that should set the basis of their arguments. Here, we demonstrate the initial capabilities of a system that, given a controversial topic, can automatically pinpoint relevant claims in Wikipedia, determine their polarity with respect to the given topic, and articulate them per the user's request.

Not Enough Data? Deep Learning to the Rescue!

ArXiv, 2019

Based on recent advances in natural language modeling and those in text generation capabilities, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to artificially synthesize new labeled data for supervised learning. We mainly focus on cases with scarce labeled data. Our method, referred to as language-model-based data augmentation (LAMBADA), involves fine-tuning a state-of-the-art language generator to a specific task through an initial training phase on the existing (usually small) labeled data. Using the fine-tuned model and given a class label, new sentences for the class are generated. Our process then filters these new sentences by using a classifier trained on the original data. In a series of experiments, we show that LAMBADA improves classifiers' performance on a variety of datasets. Moreover, LAMBADA significantly improves upon the state-of-the-art techniques for data augmentation, specifically ...

Do Not Have Enough Data? Deep Learning to the Rescue!

Proceedings of the AAAI Conference on Artificial Intelligence, 2020

Based on recent advances in natural language modeling and those in text generation capabilities, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to artificially synthesize new labeled data for supervised learning. We mainly focus on cases with scarce labeled data. Our method, referred to as language-model-based data augmentation (LAMBADA), involves fine-tuning a state-of-the-art language generator to a specific task through an initial training phase on the existing (usually small) labeled data. Using the fine-tuned model and given a class label, new sentences for the class are generated. Our process then filters these new sentences by using a classifier trained on the original data. In a series of experiments, we show that LAMBADA improves classifiers' performance on a variety of datasets. Moreover, LAMBADA significantly improves upon the state-of-the-art techniques for data augmentation, specifically ...

Balancing via Generation for Multi-Class Text Classification Improvement

Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Data balancing is a known technique for improving the performance of classification tasks. In this work we define a novel balancing-via-generation framework termed BalaGen. BalaGen consists of a flexible balancing policy coupled with a text generation mechanism. Combined, these two techniques can be used to augment a dataset for a more balanced distribution. We evaluate BalaGen on three publicly available semantic utterance classification (SUC) datasets. One of these is a new COVID-19 Q&A dataset published here for the first time. Our work demonstrates that optimal balancing policies can significantly improve classifier performance, while augmenting just part of the classes and under-sampling others. Furthermore, capitalizing on the advantages of balancing, we show its usefulness in all relevant BalaGen framework components. We validate the superiority of BalaGen on ten semantic utterance datasets taken from real-life goal-oriented dialogue systems. Based on our results we encourage using data balancing prior to training for text classification tasks.

What is Your Organization 'Like'?

Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 2016

Enhancing cluster labeling using Wikipedia

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09, 2009

Social Media-Based Expertise Evidence

ECSCW 2015: Proceedings of the 14th European Conference on Computer Supported Cooperative Work, 19-23 September 2015, Oslo, Norway, 2015

Entity oriented search and exploration for cultural heritage collections

Proceedings of the 21st international conference companion on World Wide Web - WWW '12 Companion, 2012

Towards expressive exploratory search over entity-relationship data

Proceedings of the 21st international conference companion on World Wide Web - WWW '12 Companion, 2012

Social media recommendation based on people and tags

Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '10, 2010

CULTURA: A Metadata-Rich Environment to Support the Enhanced Interrogation of Cultural Collections

Communications in Computer and Information Science, 2012

Personalized recommendation of social software items based on social relations

Proceedings of the third ACM conference on Recommender systems - RecSys '09, 2009

Method and System for Providing Relationships in Search Results

Content Analysis Simulator for Improving Site Findability in Information Retrieval Systems

On ranking techniques for desktop search

ACM Transactions on Information Systems, 2008

Users tend to store huge amounts of files, of various formats, on their personal computers. As a result, finding a specific, desired file within the file system is a challenging task. This article addresses the desktop search problem by considering various techniques for ranking results of a search query over the file system. First, basic ranking techniques, which are based on various file features (e.g., file name, access date, file size), are considered and their effectiveness is empirically analyzed. Next, two learning-based ranking schemes are presented, and are shown to be significantly more effective than the basic ranking methods. Finally, a novel ranking technique, based on query selectiveness, is considered for use during the cold-start period of the system. This method is also shown to be empirically effective, even though it does not involve any learning.

Social networks and discovery in the enterprise (SaND)

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09, 2009

Personalized social search based on the user's social network

This work investigates personalized social search based on the user's social relations: search results are re-ranked according to their relations with individuals in the user's social network. We study the effectiveness of several social network types for personalization: (1) Familiarity-based network of people related to the user through explicit familiarity connection; (2) Similarity-based network of people "similar" to the user as reflected by their social activity; (3) Overall network that provides both relationship types. For comparison we also experiment with Topic-based personalization that is based on the user's related terms, aggregated from several social applications. We evaluate the contribution of the different personalization strategies by an off-line study and by a user survey within our organization. In the off-line study we apply bookmark-based evaluation, suggested recently, that exploits data gathered from a social bookmarking system to evaluate personalized retrieval. In the on-line study we analyze the feedback of 240 employees exposed to the alternative personalization approaches. Our main results show that both in the off-line study and in the user survey social network based personalization significantly outperforms non-personalized social search. Additionally, as reflected by the user survey, all three SN-based strategies significantly outperform the Topic-based strategy.
