Noriko Tomuro | DePaul University

Papers by Noriko Tomuro

Identifying the optimal segmentors for mass classification in mammograms

Proceedings of SPIE, Mar 20, 2015

In this paper, we present the results of our investigation into identifying the optimal segmentor(s) from an ensemble of weak segmentors used in a Computer-Aided Diagnosis (CADx) system that classifies suspicious masses in mammograms as benign or malignant. This is an extension of our previous work, in which we applied various parameter settings of image enhancement techniques to each suspicious mass (region of interest, ROI) to obtain several enhanced images, then segmented each image to obtain several contours of a given mass. Each segmentation in this ensemble is essentially a "weak segmentor" because no single segmentation produces the optimal result for all images. After shape features were computed from the segmented contours, the final classification model was built using logistic regression. The work in this paper focuses on identifying the optimal segmentor(s) within this ensemble of weak segmentors. For our purpose, optimal segmentors are those that contribute the most to the overall classification, rather than those that produce the most precise segmentations. To measure each segmentor's contribution, we examined the feature weights in the derived logistic regression model and computed the average feature weight for each segmentor. The results showed that, while segmentors with higher segmentation success rates generally had higher feature weights, some segmentors with lower segmentation success rates also had high classification feature weights.
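A minimal sketch of the contribution measure described above. It assumes each shape feature in the fitted logistic regression model is tagged with the segmentor that produced its contour, and it averages absolute coefficient values so that strongly negative weights also count as large contributions; the abstract does not say whether signed or absolute weights were averaged, and the feature and segmentor names are hypothetical.

```python
from collections import defaultdict


def average_weight_per_segmentor(coefficients, feature_to_segmentor, use_abs=True):
    """Average the fitted logistic-regression weights of each segmentor's features.

    coefficients: dict mapping feature name -> fitted coefficient.
    feature_to_segmentor: dict mapping feature name -> segmentor id.
    use_abs: if True (an assumption), average |w| so that large negative
             weights also count as a strong contribution.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for feature, weight in coefficients.items():
        segmentor = feature_to_segmentor[feature]
        totals[segmentor] += abs(weight) if use_abs else weight
        counts[segmentor] += 1
    return {seg: totals[seg] / counts[seg] for seg in totals}


def rank_segmentors(avg_weights):
    """Return segmentor ids ordered from largest to smallest average weight."""
    return sorted(avg_weights, key=avg_weights.get, reverse=True)
```

For example, with coefficients `{"s1_area": 0.5, "s1_compactness": -1.5, "s2_area": 0.25, "s2_compactness": 0.75}`, segmentor `s1` averages 1.0 and `s2` averages 0.5, so `s1` ranks first. Comparing raw coefficients this way is only meaningful if the features were put on a common scale (for example, standardized) before fitting.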

Building multiple weak segmentors for strong mass segmentation in mammogram

Proceedings of SPIE, Mar 3, 2011

This paper proposes building multiple segmentations to identify the contour of a suspicious mass in a mammogram. In this study, we use various parameter settings of the image enhancement functions to perform multiple segmentations of each suspicious mass (region of interest, ROI), generating multiple mass contours. Each such segmentation is called a "weak segmentor", since no single image enhancement produces the optimal segmentation for all mass images. Then, for each image, we select the contour with the highest overlap ratio as the final segmentation (i.e., the "strong segmentor"). The results show that the overall success rate of the strong segmentor (81.22%) was higher than that of any single weak segmentor. This indicates that using multiple weak segmentors is an effective method for generating a strong mass segmentation in mammograms.
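The selection step can be sketched as follows, assuming each contour is rasterized into a set of pixel coordinates and that the "overlapping ratio" is the area of intersection over the area of union with a reference outline of the mass (for example, one drawn by a radiologist). Both the ratio definition and the reference are assumptions; the abstract does not define them.

```python
def overlap_ratio(region_a, region_b):
    """Intersection over union of two pixel sets (assumed definition of "overlapping ratio")."""
    union = region_a | region_b
    if not union:
        return 0.0
    return len(region_a & region_b) / len(union)


def strong_segmentor(candidates, reference):
    """Choose the weak-segmentor contour that overlaps the reference outline the most.

    candidates: dict mapping segmentor id -> set of (row, col) pixels inside its contour.
    reference: set of (row, col) pixels inside the reference outline.
    Returns (best segmentor id, its overlap ratio).
    """
    best_id = max(candidates, key=lambda sid: overlap_ratio(candidates[sid], reference))
    return best_id, overlap_ratio(candidates[best_id], reference)


def square(row0, col0, size):
    """Toy helper: the pixels of an axis-aligned size x size square."""
    return {(r, c) for r in range(row0, row0 + size) for c in range(col0, col0 + size)}


reference = square(0, 0, 4)
candidates = {"enhance_a": square(0, 0, 3), "enhance_b": square(1, 1, 4)}
print(strong_segmentor(candidates, reference))  # ('enhance_a', 0.5625)
```

With a 4×4 reference square, the 3×3 candidate in its corner overlaps 9 of 16 pixels (ratio 0.5625) and beats the 4×4 candidate shifted by one pixel, which overlaps 9 of 23.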

Objective

Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, Jun 29, 2021

This workshop aims at promoting and exploring the possibilities for research and practical applications involving natural language processing (NLP) and games. The main objective is to provide a forum for researchers and practitioners to discuss and share ideas about how the NLP research community can contribute to games research and vice versa. For example, games could benefit from NLP's sophisticated human language technologies in designing natural and engaging dialogues that bring novel game experiences, or in processing texts to conduct formal game studies. Conversely, NLP could benefit from games in obtaining language resources (such as constructing a thesaurus through a crowdsourcing game), or in learning the linguistic characteristics of game users as compared to those of other domains. The workshop welcomes the participation of both academics and industry practitioners interested in the use of NLP in games or vice versa.

Organizers

Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, Jun 29, 2021

Squibs and Discussions: Nonminimal Derivations in Unification-based Parsing

Computational Linguistics, 2001

Shieber's abstract parsing algorithm (Shieber 1992) for unification grammars is an extension of Earley's algorithm (Earley 1970) for context-free grammars to feature structures. In this paper, we show that, under certain conditions, Shieber's algorithm produces what we call a nonminimal derivation: a parse tree that contains additional features not present in the licensing productions. While Shieber's definition of a parse tree allows for such nonminimal derivations, we claim that they should be viewed as invalid. We describe the sources of the nonminimal derivation problem and propose a precise definition of a minimal parse tree, as well as a modification to Shieber's algorithm that ensures minimality, albeit at some computational cost.

Sentiment Analysis with Cognitive Attention Supervision

Proceedings of the Canadian Conference on Artificial Intelligence, 2021

Neural network-based language models such as BERT (Bidirectional Encoder Representations from Transformers) use attention mechanisms to create contextualized representations of inputs, conceptually analogous to humans reading words in context. For the task of classifying the sentiment of texts, we ask whether BERT's attention can be informed by human cognitive data. During training, we supervise attention with eye-tracking and/or brain imaging data and combine the binary sentiment classification loss with these attention losses. We find that attention supervision can make BERT's attention more similar to the ground-truth human data, but that there are no significant differences in sentiment classification accuracy. However, models with cognitive attention supervision more frequently misclassify different samples than the baseline models do (that is, they more often make different errors), and the errors of models with supervised attention have a higher ratio of false negatives.
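A framework-free sketch of the kind of combined objective described above: a binary cross-entropy term for the sentiment label plus a weighted attention term. Using the KL divergence between a normalized human attention distribution (for example, per-token fixation durations) and the model's attention, and the weight `lam`, are illustrative assumptions; the abstract does not specify which attention loss or weighting was used. In a real training loop this would be computed on tensors rather than Python lists.

```python
import math


def softmax(scores):
    """Turn raw attention scores into a probability distribution over tokens."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(prob_positive, label):
    """Sentiment loss for one example; label is 1 (positive) or 0 (negative)."""
    p = min(max(prob_positive, 1e-12), 1 - 1e-12)  # clip to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def kl_divergence(target, predicted):
    """KL(target || predicted); positions where the target is 0 contribute nothing."""
    return sum(t * math.log(t / max(p, 1e-12)) for t, p in zip(target, predicted) if t > 0)


def combined_loss(prob_positive, label, model_attention, human_attention, lam=0.5):
    """Classification loss + lam * attention loss (lam is a hypothetical weight)."""
    total = sum(human_attention)
    human = [h / total for h in human_attention]  # e.g. normalized fixation durations
    return binary_cross_entropy(prob_positive, label) + lam * kl_divergence(human, model_attention)
```

When the model's attention already matches the human distribution, the attention term is zero and only the classification loss remains; any mismatch adds a positive penalty scaled by `lam`.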

Use of a Large Image Repository to Enhance Domain Dataset for Flyer Classification

Lecture Notes in Computer Science, 2015

This paper describes our exploratory work on supplementing our dataset of images extracted from real estate flyers with images from a large general image repository, in order to broaden the range of samples and create a classification model that performs well on completely unseen instances. We selected images from the Scene UNderstanding (SUN) database whose annotated scene categories appeared to match our flyer images, and added them to our flyer dataset. We ran a series of experiments with various mixes of flyer and SUN data. The results showed that the classification models trained on a mixture of SUN and flyer images achieved accuracies comparable to those of the models trained solely on flyer images. This suggests that we were able to create a model that scales to unseen data without sacrificing accuracy on the data at hand.

Genre-based image classification using ensemble learning for online flyers

Seventh International Conference on Digital Image Processing (ICDIP 2015), 2015

This paper presents an image classification model developed to classify images embedded in commercial real estate flyers. It is a component of a larger, multimodal system that uses both the texts and the images in the flyers to automatically classify them by property type. The role of the image classifier in the system is to provide the genres of the embedded images (map, schematic drawing, aerial photo, etc.), which are then combined with the texts in the flyer for the overall classification. In this work, we used an ensemble learning approach and developed a model in which the outputs of an ensemble of support vector machines (SVMs) are combined by a k-nearest neighbor (KNN) classifier. In this model, the classifiers in the ensemble are strong classifiers, each trained to predict an assigned genre. Our model is not only intuitive, taking advantage of the mutual distinctness of the image genres, but also scalable. We tested the model on over 3000 images extracted from online real estate flyers. The results showed that our model outperformed the baseline classifiers by a large margin.
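A sketch of the combination stage, assuming each one-vs-rest SVM in the ensemble has already been trained for one genre and that its decision value on an image forms one component of a score vector. Training the SVMs is omitted, the score vectors below are made up, and the KNN step shown (majority vote under Euclidean distance) may differ from the configuration used in the paper.

```python
import math
from collections import Counter


def knn_combine(train_vectors, train_labels, query, k=3):
    """Label a vector of per-genre SVM decision values by majority vote of its k nearest neighbours.

    train_vectors: score vectors of training images (one decision value per genre SVM).
    train_labels: the genre of each training image.
    query: score vector of the image to classify.
    """
    neighbours = sorted(
        (math.dist(vec, query), label) for vec, label in zip(train_vectors, train_labels)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    # most_common keeps first-seen order on ties, so the nearest neighbour's genre wins a tie.
    return votes.most_common(1)[0][0]


# Hypothetical score vectors: (map SVM, schematic SVM, aerial SVM) decision values.
train_vectors = [
    (2.0, -1.0, -1.0), (1.5, -0.5, -1.2),   # map
    (-1.0, 1.8, -0.9), (-0.8, 2.1, -1.1),   # schematic drawing
    (-1.0, -1.0, 1.9), (-1.2, -0.7, 2.2),   # aerial photo
]
train_labels = ["map", "map", "schematic", "schematic", "aerial", "aerial"]
print(knn_combine(train_vectors, train_labels, (1.8, -0.8, -1.0)))  # map
```

Feeding the SVM outputs to a second-level classifier, rather than simply taking the highest-scoring SVM, lets the combiner learn from patterns across all genre scores, for example images on which two SVMs both respond strongly.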

An Approach to Modeling Facial Expressions Used in American Sign Language

American Sign Language (ASL) is the natural and living language of the Deaf Community in North America. In addition to hand gestures, facial expressions are a key component of communicating in ASL. We present a method for reproducing facial expressions through computer graphic animation. Directions for further research are suggested. Keywords: Animation, American Sign Language, Facial Expressions

Question Answering from Frequently Asked Question Files: Experiences with the FAQ FINDER System

This article describes FAQ FINDER, a natural language question-answering system that uses files of frequently asked questions as its knowledge base. Unlike AI question-answering systems that focus on generating new answers, FAQ FINDER retrieves existing ones found in frequently asked question files. Unlike information-retrieval approaches that rely on a purely lexical metric of similarity between query and document, FAQ FINDER uses a semantic knowledge base (WORDNET) to improve its ability to match questions to answers. We include results from an evaluation of the system's performance and show that a combination of semantic and statistical techniques works better than any single approach.

Automatic Summarization of Privacy Policies using Ensemble Learning

Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, 2016

When customers purchase a product or sign up for a service from a company, they are often required to agree to a Privacy Policy or Terms of Service agreement. Many of these policies are lengthy, and a typical customer agrees to them without reading them carefully, if at all. To address this problem, we have developed a prototype automatic text summarization system designed specifically for privacy policies. Our system generates a summary of a policy statement by identifying important sentences in the statement, categorizing each sentence by which of five "statement categories" it addresses, and displaying to the user a list of the sentences that match each category. Our system incorporates keywords identified by a human domain expert and rules obtained by machine learning, combined in an ensemble architecture. We have tested our system on a sample corpus of privacy statements, and the preliminary results are promising.
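A minimal sketch of the keyword component only: sentences are split on end punctuation, any sentence containing a category keyword is treated as important, and it is assigned to the category with the most keyword hits. The five category names and keyword lists are placeholders, not the expert-supplied lists from the paper, and the machine-learned rules and the ensemble that combines them with the keywords are not shown.

```python
import re
from collections import defaultdict

# Placeholder categories and keywords (hypothetical; not the paper's expert lists).
# Matching is plain substring matching on lowercased text.
CATEGORY_KEYWORDS = {
    "collection": ["collect", "gather", "obtain"],
    "sharing": ["share", "disclose", "third part"],
    "retention": ["retain", "stored", "delete"],
    "security": ["encrypt", "secure", "protect"],
    "choice": ["opt out", "opt-out", "unsubscribe", "consent"],
}


def categorize(sentence):
    """Return the category with the most keyword hits in the sentence, or None if there are none."""
    text = sentence.lower()
    hits = {cat: sum(text.count(kw) for kw in kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def summarize(policy_text):
    """Group the policy's keyword-bearing sentences by category."""
    sentences = re.split(r"(?<=[.!?])\s+", policy_text.strip())
    summary = defaultdict(list)
    for sentence in sentences:
        category = categorize(sentence)
        if category is not None:
            summary[category].append(sentence)
    return dict(summary)
```

On the toy policy "We collect your email address. We may share data with third parties. Our office is in Chicago.", the first sentence lands under `collection`, the second under `sharing`, and the third is dropped as unimportant.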

Relation Classification with Cognitive Attention Supervision

Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, 2021

Many current language models such as BERT utilize attention mechanisms to transform sequence representations. We ask whether we can influence BERT's attention with human reading patterns by using eye-tracking and brain imaging data. We fine-tune BERT for relation extraction with auxiliary attention supervision, in which BERT's attention weights are supervised by cognitive data. Through a variety of metrics, we find that this attention supervision can increase the similarity between the model's attention distributions over sequences and the cognitive data without significantly affecting classification performance, while making errors distinct from those of the baseline. In particular, models with cognitive attention supervision more often correctly classified samples that the baseline misclassified.

Reports of the Workshops Held at the Tenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment

AI Magazine

The AIIDE-14 Workshop program was held Friday and Saturday, October 3–4, 2014, at North Carolina State University in Raleigh, North Carolina. The program included five workshops covering a wide range of topics. The workshops held Friday were Games and Natural Language Processing, and Artificial Intelligence in Adversarial Real-Time Games. The workshops held Saturday were Diversity in Games Research, Experimental Artificial Intelligence in Games, and Musical Metacreation. This article presents short summaries of those events.

Efficient Lazy Unification

Parsing with unification grammars is inefficient due to the intractable complexity of the algorithm. In implemented systems, performance suffers further from the additional overhead of processing large feature-value structures (FSs) as the base data structure. In particular, copying and unification of FSs have been identified as the most expensive operations.

Domain adaptation of coreference resolution for radiology reports

Proceedings of the 2012 Workshop on Biomedical Natural Language Processing, Jun 8, 2012

In this paper, we explore the applicability of existing coreference resolution systems to a biomedical genre: radiology reports. Our analysis revealed that, due to the idiosyncrasies of the domain, both the formulation of the coreference resolution problem and its solution require significant domain adaptation. We reformulated the task and developed an unsupervised, heuristics-based algorithm for coreference resolution in radiology reports. The algorithm is shown to perform well on a test dataset of 150 manually annotated radiology reports.

The Use of Question Types to Match Questions in FAQFinder

Pieee, 2002

One useful way to find the answer to a question is to search a library of previously answered questions. This is the idea behind FAQFinder, a Web-based natural language question-answering system that uses Frequently Asked Questions (FAQ) files to answer users' questions. FAQFinder tries to answer a user's question by retrieving a similar FAQ question, if one exists, along with its answer. FAQFinder uses several metrics to judge the similarity of user and FAQ questions. In this paper, we discuss a metric based on question type, which we recently added to the system. We discuss the taxonomy of question types used and present experimental results indicating that incorporating question type information has substantially improved FAQFinder's performance. http://faqfinder.ics.uci.edu.

Question answering from Frequently-Asked Question Files

AI Magazine, 1997

This article describes FAQ FINDER, a natural language question-answering system that uses files of frequently asked questions as its knowledge base. Unlike AI question-answering systems that focus on generating new answers, FAQ FINDER retrieves existing ones found in frequently asked question files. Unlike information-retrieval approaches that rely on a purely lexical metric of similarity between query and document, FAQ FINDER uses a semantic knowledge base (WORDNET) to improve its ability to match questions to answers. We include results from an evaluation of the system's performance and show that a combination of semantic and statistical techniques works better than any single approach.

Investigation on Feature Selection to Improve Classification of Abdominal Organs in CT Images

This paper presents preliminary results on feature selection for classifying soft tissues of abdominal organs in computed tomography (CT) images. Texture features were first extracted from the images in the dataset, and the most relevant features were identified based on the Information Gain measure. A Decision Tree classifier was then used to select the optimal subset of features. The initial experiments indicated that, by removing the combinations of descriptors and distances with the lowest Information Gain, as much as 83% of the original features could be removed without any loss of classification accuracy, either for the overall dataset or for any individual organ; for some organs, accuracy even improved significantly.
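Information Gain for a discretized feature X is the class entropy minus the expected entropy after splitting on X, IG = H(Y) - Σ_v P(X = v) H(Y | X = v). The sketch below computes it and ranks features by it, assuming the continuous texture values have already been binned into discrete levels; the binning and the feature names are assumptions, and the Decision Tree step used to choose the final subset is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(Y) - sum over v of P(X = v) * H(Y | X = v), for a discretized feature X."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(features, labels):
    """Order feature names from highest to lowest Information Gain.

    features: dict mapping feature name -> list of discretized values, one per sample.
    """
    return sorted(features, key=lambda name: information_gain(features[name], labels), reverse=True)


# Hypothetical texture features (descriptor_distance) for four tissue samples.
labels = ["liver", "liver", "spleen", "spleen"]
features = {"contrast_d1": ["lo", "lo", "hi", "hi"], "energy_d3": ["a", "b", "a", "b"]}
print(rank_features(features, labels))  # ['contrast_d1', 'energy_d3']
```

Here `contrast_d1` separates the two organs perfectly (IG = 1 bit), while `energy_d3` carries no class information (IG = 0), so features like the latter are the candidates for removal.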

Research paper thumbnail of Identifying the optimal segmentors for mass classification in mammograms

Proceedings of SPIE, Mar 20, 2015

In this paper, we present the results of our investigation on identifying the optimal segmentor(s... more In this paper, we present the results of our investigation on identifying the optimal segmentor(s) from an ensemble of weak segmentors, used in a Computer-Aided Diagnosis (CADx) system which classifies suspicious masses in mammograms as benign or malignant. This is an extension of our previous work, where we used various parameter settings of image enhancement techniques to each suspicious mass (region of interest (ROI)) to obtain several enhanced images, then applied segmentation to each image to obtain several contours of a given mass. Each segmentation in this ensemble is essentially a "weak segmentor" because no single segmentation can produce the optimal result for all images. Then after shape features are computed from the segmented contours, the final classification model was built using logistic regression. The work in this paper focuses on identifying the optimal segmentor(s) from an ensemble mix of weak segmentors. For our purpose, optimal segmentors are those in the ensemble mix which contribute the most to the overall classification rather than the ones that produced high precision segmentation. To measure the segmentors' contribution, we examined weights on the features in the derived logistic regression model and computed the average feature weight for each segmentor. The result showed that, while in general the segmentors with higher segmentation success rates had higher feature weights, some segmentors with lower segmentation rates had high classification feature weights as well.

Research paper thumbnail of Building multiple weak segmentors for strong mass segmentation in mammogram

Proceedings of SPIE, Mar 3, 2011

This paper proposes to build multiple segmentations for identifying mass contours for a suspiciou... more This paper proposes to build multiple segmentations for identifying mass contours for a suspicious mass in a mammogram. In this study, by using various parameter settings of the image enhancement functions, we perform multiple segmentations for each suspicious mass (region of interest (ROI)), and multiple mass contours are generated. Each of such segmentations is called a "weak segmentor", since there is no single image enhancement which produces the optimal segmentation for all mass images. Then for each image, we select the contour which has the highest overlapping ratio as the final segmentation (i.e., the "strong segmentor"). The results show that the overall success rate (81.22%) of the strong segmentor was higher than that of any single weak segmentor. This indicates that using multiple weak segmentors is an effective method to generate a strong mass segmentation for mammograms.

Research paper thumbnail of Objective

Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, Jun 29, 2021

Objective is workshop aims at promoting and exploring the possibilities for research and practic... more Objective is workshop aims at promoting and exploring the possibilities for research and practical applications involving natural language processing (NLP) and games. e main objective is to provide a forum for researchers and practitioners to discuss and share ideas regarding how the NLP research community can contribute to games research and vice versa. For example, games could benefit from NLP's sophisticated human language technologies in designing natural and engaging dialogues to bring novel game experiences, or in processing texts to conduct formal game studies. Conversely, NLP could benefit from games in obtaining language resources (such as construction of a thesaurus through a crowdsourcing game), or in learning the linguistic characteristics of game users as compared to those of other domains. e workshop welcomes the participation of both academics and industry practitioners interested in the use of NLP in games or vice versa.

Research paper thumbnail of Organizers

Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, Jun 29, 2021

Research paper thumbnail of Objective

Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment

This workshop aims at promoting and exploring the possibilities for research and practical applic... more This workshop aims at promoting and exploring the possibilities for research and practical applications involving natural language processing (NLP) and games.

Research paper thumbnail of Squibs and Discussions: Nonminirnal Derivations in Unification-based Parsing

Computational Linguistics, 2001

Shieber's abstract parsing algorithm (Shieber 1992)for unification grammars is an extension o... more Shieber's abstract parsing algorithm (Shieber 1992)for unification grammars is an extension of Earley's algorithm (Earley 1970)for context-free grammars to feature structures. In this paper, we show that, under certain conditions, Shieber ' s algorithm produces what we call a nonminimal derivation: a parse tree which contains additional features that are not in the licensing productions. While Shieber's definition of parse tree allows for such nonminimal derivations, we claim that they should be viewed as invalid. We describe the sources of the nonminimal derivation problem, and propose a precise definition of minimal parse tree, as well as a modification to Shieber's algorithm which ensures minimality, although at some computational cost.

Research paper thumbnail of Sentiment Analysis with Cognitive Attention Supervision

Proceedings of the Canadian Conference on Artificial Intelligence, 2021

Neural network-based language models such as BERT (Bidirectional Encoder Representations from Tra... more Neural network-based language models such as BERT (Bidirectional Encoder Representations from Transformers) use attention mechanisms to create contextualized representations of inputs, conceptually analogous to humans reading words in context. For the task of classifying the sentiment of texts, we ask whether BERT's attention can be informed by human cognitive data. During training, we supervise attention with eye-tracking and/or brain imaging data and combine binary sentiment classification loss with these attention losses. We find that attention supervision can be used to manipulate BERT attention to be more similar to the ground truth human data, but that there are no significant differences in sentiment classification accuracy. However, models with cognitive attention supervision more frequently misclassify different samples from the baseline models-they more often make different errors-and the errors from models with supervised attention have a higher ratio of false negatives.

Research paper thumbnail of Use of a Large Image Repository to Enhance Domain Dataset for Flyer Classification

Lecture Notes in Computer Science, 2015

This paper describes our exploratory work on supplementing our dataset of images extracted from r... more This paper describes our exploratory work on supplementing our dataset of images extracted from real estate flyers with images from a large general image repository to enhance the breadth of the samples and create a classification model which would perform well for totally unseen, new instances. We selected some images from the Scene UNderstanding (SUN) database which are annotated with the scene categories that seem to match with our flyer images, and added them to our flyer dataset. We ran a series of experiments with various configurations of flyer vs. SUN data mix. The results showed that the classification models trained with a mixture of SUN and flyer images produced comparable accuracies as the models trained solely with flyer images. This suggests that we were able to create a model which is scalable to unseen, new data without sacrificing the accuracy of the data at hand.

Research paper thumbnail of Genre-based image classification using ensemble learning for online flyers

Seventh International Conference on Digital Image Processing (ICDIP 2015), 2015

This paper presents an image classification model developed to classify images embedded in commer... more This paper presents an image classification model developed to classify images embedded in commercial real estate flyers. It is a component in a larger, multimodal system which uses texts as well as images in the flyers to automatically classify them by the property types. The role of the image classifier in the system is to provide the genres of the embedded images (map, schematic drawing, aerial photo, etc.), which to be combined with the texts in the flyer to do the overall classification. In this work, we used an ensemble learning approach and developed a model where the outputs of an ensemble of support vector machines (SVMs) are combined by a k-nearest neighbor (KNN) classifier. In this model, the classifiers in the ensemble are strong classifiers, each of which is trained to predict a given/assigned genre. Not only is our model intuitive by taking advantage of the mutual distinctness of the image genres, it is also scalable. We tested the model using over 3000 images extracted from online real estate flyers. The result showed that our model outperformed the baseline classifiers by a large margin.

Research paper thumbnail of Identifying the optimal segmentors for mass classification in mammograms

SPIE Proceedings, 2015

In this paper, we present the results of our investigation on identifying the optimal segmentor(s... more In this paper, we present the results of our investigation on identifying the optimal segmentor(s) from an ensemble of weak segmentors, used in a Computer-Aided Diagnosis (CADx) system which classifies suspicious masses in mammograms as benign or malignant. This is an extension of our previous work, where we used various parameter settings of image enhancement techniques to each suspicious mass (region of interest (ROI)) to obtain several enhanced images, then applied segmentation to each image to obtain several contours of a given mass. Each segmentation in this ensemble is essentially a "weak segmentor" because no single segmentation can produce the optimal result for all images. Then after shape features are computed from the segmented contours, the final classification model was built using logistic regression. The work in this paper focuses on identifying the optimal segmentor(s) from an ensemble mix of weak segmentors. For our purpose, optimal segmentors are those in the ensemble mix which contribute the most to the overall classification rather than the ones that produced high precision segmentation. To measure the segmentors' contribution, we examined weights on the features in the derived logistic regression model and computed the average feature weight for each segmentor. The result showed that, while in general the segmentors with higher segmentation success rates had higher feature weights, some segmentors with lower segmentation rates had high classification feature weights as well.

Research paper thumbnail of An Approach to Modeling Facial Expressions Used in American Sign Language

American Sign Language (ASL) is the natural and living language of the Deaf Community in North Am... more American Sign Language (ASL) is the natural and living language of the Deaf Community in North America. In addition to hand gestures, facial expressions are a key component of communicating in ASL. We present a method for reproducing facial expressions through computer graphic animation. Directions for further research are suggested. Keywords: Animation, American Sign Language, Facial Expressions

Question Answering from Frequently Asked Question Files: Experiences with the FAQ FINDER System

This article describes FAQ FINDER, a natural language question-answering system that uses files of frequently asked questions as its knowledge base. Unlike AI question-answering systems that focus on generating new answers, FAQ FINDER retrieves existing ones found in frequently asked question files. Unlike information-retrieval approaches that rely on a purely lexical metric of similarity between query and document, FAQ FINDER uses a semantic knowledge base (WORDNET) to improve its ability to match questions and answers. We include results from an evaluation of the system's performance and show that a combination of semantic and statistical techniques works better than any single approach.
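
The idea of combining statistical and semantic matching can be illustrated with a toy scorer that blends a term-frequency cosine (statistical) with a synonym-overlap score (semantic). This is a sketch under stated assumptions, not FAQ FINDER's actual metric: the tiny SYNONYMS table is a hypothetical stand-in for WORDNET, and the equal 0.5/0.5 weighting is also hypothetical.

```python
import math
from collections import Counter

# Hypothetical stand-in for WORDNET: each word maps to a synonym-set id.
SYNONYMS = {"car": "auto", "automobile": "auto", "fix": "repair", "repair": "repair"}


def tokens(text):
    """Lowercase words with surrounding punctuation removed."""
    return [w.strip("?.,!").lower() for w in text.split() if w.strip("?.,!")]


def cosine(a, b):
    """Statistical similarity: cosine between raw term-frequency vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def semantic(a, b):
    """Semantic similarity: Jaccard overlap after mapping words to synonym sets."""
    sa = {SYNONYMS.get(w, w) for w in a}
    sb = {SYNONYMS.get(w, w) for w in b}
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def combined_score(question, faq_question, w_stat=0.5, w_sem=0.5):
    """Weighted blend of the statistical and semantic scores."""
    a, b = tokens(question), tokens(faq_question)
    return w_stat * cosine(a, b) + w_sem * semantic(a, b)


def best_match(question, faq_questions):
    """Return the FAQ question with the highest combined score."""
    return max(faq_questions, key=lambda q: combined_score(question, q))
```

For example, "How do I fix my car?" shares no content words with "How do I repair my automobile?", so the cosine alone ranks it only moderately; the synonym mapping is what lifts it above an unrelated FAQ question such as "How do I bake bread?".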

Automatic Summarization of Privacy Policies using Ensemble Learning

Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, 2016

When customers purchase a product or sign up for a service from a company, they are often required to agree to a Privacy Policy or Terms of Service agreement. Many of these policies are lengthy, and a typical customer agrees to them without reading them carefully, if at all. To address this problem, we have developed a prototype automatic text summarization system designed specifically for privacy policies. Our system generates a summary of a policy statement by identifying important sentences in the statement, categorizing each sentence according to which of five "statement categories" it addresses, and displaying to the user the list of sentences that match each category. The system incorporates keywords identified by a human domain expert and rules obtained by machine learning, which are combined in an ensemble architecture. We have tested our system on a sample corpus of privacy statements, and preliminary results are promising.
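
A minimal sketch of the ensemble idea, assuming two components that each propose categories for a sentence: expert keyword lists and learned rules (represented here as regular expressions). The category names, keywords and rules are hypothetical placeholders, since the abstract does not name the five statement categories; it also does not say how the components are combined, so a simple union of their outputs is assumed.

```python
import re

# Hypothetical expert keywords per category (the real five categories are not named in the abstract).
KEYWORDS = {
    "data_collection": ["collect", "gather"],
    "data_sharing": ["share", "third part"],
    "opt_out": ["opt out", "unsubscribe"],
}

# Hypothetical "learned" rules: (category, regex) pairs standing in for machine-learned rules.
LEARNED_RULES = [
    ("data_sharing", re.compile(r"\b(disclose|sell)\b", re.I)),
    ("retention", re.compile(r"\bretain\w*\b.*\b(days|years)\b", re.I)),
]


def keyword_categories(sentence):
    """Categories whose expert keywords appear in the sentence."""
    s = sentence.lower()
    return {cat for cat, words in KEYWORDS.items() if any(w in s for w in words)}


def rule_categories(sentence):
    """Categories whose learned rule matches the sentence."""
    return {cat for cat, pattern in LEARNED_RULES if pattern.search(sentence)}


def summarize(sentences):
    """Group sentences by category; a sentence is kept if either component fires (union ensemble)."""
    summary = {}
    for sentence in sentences:
        for cat in keyword_categories(sentence) | rule_categories(sentence):
            summary.setdefault(cat, []).append(sentence)
    return summary
```

Sentences that no component flags (for instance, "Our office is in Chicago.") simply drop out of the summary, which is how the per-category lists stay short. A union favours recall; a voting scheme would be the obvious alternative if precision mattered more.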

Relation Classification with Cognitive Attention Supervision

Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, 2021

Many current language models, such as BERT, use attention mechanisms to transform sequence representations. We ask whether we can influence BERT's attention with human reading patterns by using eye-tracking and brain-imaging data. We fine-tune BERT for relation extraction with auxiliary attention supervision, in which BERT's attention weights are supervised by cognitive data. Across a variety of metrics, we find that this attention supervision can be used to increase the similarity between the model's attention distributions over sequences and the cognitive data without significantly affecting classification performance, while producing errors that differ from those of the baseline. In particular, models with cognitive attention supervision more often correctly classified samples that the baseline misclassified.
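
Auxiliary attention supervision can be pictured as a combined loss: the usual cross-entropy for the relation label plus a penalty on the divergence between the model's attention distribution over tokens and a distribution derived from cognitive data (for example, normalized eye-tracking fixation durations). This plain-Python sketch uses KL divergence and a weighting factor `lam`; both are assumptions for illustration, since the abstract does not state the exact divergence or weighting.

```python
import math


def normalize(values):
    """Turn non-negative per-token scores (e.g. fixation durations) into a probability distribution."""
    total = sum(values)
    return [v / total for v in values]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); eps guards against log(0) when q contains zeros."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(class_probs, gold_index):
    """Negative log-probability of the gold relation label."""
    return -math.log(class_probs[gold_index])


def supervised_loss(class_probs, gold_index, attention, cognitive_scores, lam=0.1):
    """Classification loss plus lam * KL(cognitive target || model attention)."""
    target = normalize(cognitive_scores)
    return cross_entropy(class_probs, gold_index) + lam * kl_divergence(target, attention)
```

When the attention already matches the normalized cognitive data, the penalty vanishes and only the classification loss remains; in an actual fine-tuning run the same expression would be computed on tensors so that gradients flow into the attention weights.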

Reports of the Workshops Held at the Tenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment

AI Magazine

The AIIDE-14 Workshop program was held Friday and Saturday, October 3–4, 2014, at North Carolina State University in Raleigh, North Carolina. The program included five workshops covering a wide range of topics. The workshops held on Friday were Games and Natural Language Processing, and Artificial Intelligence in Adversarial Real-Time Games. The workshops held on Saturday were Diversity in Games Research, Experimental Artificial Intelligence in Games, and Musical Metacreation. This article presents short summaries of those events.

Efficient Lazy Unification

Parsing with unification grammars is inefficient due to the intractable complexity of the algorithm. In implemented systems, performance suffers further from the additional overhead of processing large feature-value structures (FSs) as the base data structure. In particular, copying and unification of FSs have been identified as the most expensive operations.
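
The abstract does not describe the paper's lazy scheme, so the following is only a generic illustration of the cost it targets: a non-destructive unifier over feature structures, modelled as nested dicts without reentrancy (a simplification), that shares untouched substructures with its inputs instead of deep-copying them.

```python
class UnificationFailure(Exception):
    """Raised when two feature structures carry conflicting atomic values."""


def unify(fs1, fs2):
    """Non-destructive unification of feature structures modelled as nested dicts / atoms.

    Simplification: reentrancy (structure sharing inside a single FS) is not modelled.
    Subtrees present in only one input are reused by reference rather than copied,
    so unchanged parts of the inputs cost nothing to "copy".
    """
    if fs1 is fs2:
        return fs1
    if not isinstance(fs1, dict) or not isinstance(fs2, dict):
        if fs1 == fs2:
            return fs1
        raise UnificationFailure(f"{fs1!r} vs {fs2!r}")
    result = dict(fs1)  # shallow copy: only this level's mapping is new
    for feature, value in fs2.items():
        if feature in result:
            result[feature] = unify(result[feature], value)
        else:
            result[feature] = value  # shared with fs2, not copied
    return result
```

Because the inputs are never mutated, a parser can keep them in its chart after a failed unification; only the levels on the path to a conflict or merge are rebuilt.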

Domain adaptation of coreference resolution for radiology reports

Proceedings of the 2012 Workshop on Biomedical Natural Language Processing, Jun 8, 2012

In this paper we explore the applicability of existing coreference resolution systems to a biomedical genre: radiology reports. Our analysis revealed that, due to the idiosyncrasies of the domain, both the formulation of the coreference resolution problem and its solution require significant domain adaptation. We reformulated the task and developed an unsupervised, heuristics-based algorithm for coreference resolution in radiology reports. The algorithm is shown to perform well on a test dataset of 150 manually annotated radiology reports.
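
The abstract does not spell out the heuristics, so the following is only a hypothetical example of the kind of rule an unsupervised resolver might apply: a definite mention ("the mass") is linked to the most recent earlier mention that shares its head noun. Mention extraction and head-noun identification are assumed to have been done already.

```python
def resolve(mentions):
    """Link each definite mention to the most recent earlier mention with the same head noun.

    mentions: list of (text, head_noun) pairs in document order, e.g. ("the mass", "mass").
    Returns a list of (anaphor_index, antecedent_index) pairs.
    """
    links = []
    last_seen = {}  # head noun -> index of its most recent mention
    for i, (text, head) in enumerate(mentions):
        is_definite = text.lower().startswith(("the ", "this ", "that "))
        if is_definite and head in last_seen:
            links.append((i, last_seen[head]))
        last_seen[head] = i
    return links
```

A head-noun rule on its own would miss pairs such as "mass" and "lesion" that can refer to the same finding, which is one reason a radiology-specific formulation is needed.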

The Use of Question Types to Match Questions in FAQFinder

Pieee, 2002

One useful way to find the answer to a question is to search a library of previously answered questions. This is the idea behind FAQFinder (http://faqfinder.ics.uci.edu), a Web-based natural language question-answering system that uses Frequently Asked Questions (FAQ) files to answer users' questions. FAQFinder tries to answer a user's question by retrieving a similar FAQ question, if one exists, together with its answer. FAQFinder uses several metrics to judge the similarity of user and FAQ questions. In this paper, we discuss a metric based on question type, which we recently added to the system. We describe the taxonomy of question types used and present experimental results indicating that incorporating question type information has substantially improved FAQFinder's performance.
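
A question-type metric can be sketched as a classifier that maps each question to a type, plus a similarity term that rewards matching types. The wh-word patterns and type labels below are hypothetical; FAQFinder's actual taxonomy is described in the paper, not in this abstract.

```python
import re

# Hypothetical question-type patterns, checked in order; the first match wins.
TYPE_PATTERNS = [
    ("DEFINITION", re.compile(r"^what (is|are)\b", re.I)),
    ("PROCEDURE", re.compile(r"^how (do|can|should)\b", re.I)),
    ("REASON", re.compile(r"^why\b", re.I)),
    ("LOCATION", re.compile(r"^where\b", re.I)),
    ("TIME", re.compile(r"^when\b", re.I)),
]


def question_type(question):
    """Return the first matching type label, or OTHER."""
    for qtype, pattern in TYPE_PATTERNS:
        if pattern.search(question.strip()):
            return qtype
    return "OTHER"


def type_similarity(user_q, faq_q):
    """1.0 when both questions receive the same type, else 0.0."""
    return 1.0 if question_type(user_q) == question_type(faq_q) else 0.0
```

In a system with several metrics, a score like this would be one weighted term alongside lexical and semantic similarity, so a type mismatch lowers a candidate's rank without ruling it out.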

Question Answering from Frequently-Asked Question Files

Aim, 1997

This article describes FAQ FINDER, a natural language question-answering system that uses files of frequently asked questions as its knowledge base. Unlike AI question-answering systems that focus on generating new answers, FAQ FINDER retrieves existing ones found in frequently asked question files. Unlike information-retrieval approaches that rely on a purely lexical metric of similarity between query and document, FAQ FINDER uses a semantic knowledge base (WORDNET) to improve its ability to match questions and answers. We include results from an evaluation of the system's performance and show that a combination of semantic and statistical techniques works better than any single approach.

Investigation on Feature Selection to Improve Classification of Abdominal Organs in CT Images

This paper presents preliminary results on feature selection for classifying soft tissues of abdominal organs in computed tomography (CT) images. Texture features were first extracted from the images in the dataset, and the most relevant features were identified using the Information Gain measure. A Decision Tree classifier was then used to select the optimal subset of features. The initial experiments indicated that, by removing the combinations of descriptors and distances with the lowest Information Gain, as much as 83% of the original features could be removed without any loss of classification accuracy, either for the overall dataset or for any individual organ; for some organs, accuracy even improved significantly.
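
The Information Gain ranking step can be sketched as follows: compute IG(feature) = H(labels) - sum over values v of P(v) * H(labels | feature = v) for each feature, then drop the lowest-ranked fraction. This covers only the ranking and removal step (not the Decision Tree subset selection), assumes the texture features are already discretized, and uses hypothetical descriptor_distance feature names and organ labels.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(labels) - sum_v P(v) * H(labels | feature == v), for a discretized feature."""
    n = len(labels)
    by_value = {}
    for value, label in zip(feature_values, labels):
        by_value.setdefault(value, []).append(label)
    conditional = sum(len(group) / n * entropy(group) for group in by_value.values())
    return entropy(labels) - conditional


def drop_lowest(features, labels, fraction):
    """Remove the given fraction of features with the lowest information gain.

    features: {name: list of discretized values, one per sample}.
    Returns the names of the kept features, lowest IG first.
    """
    ranked = sorted(features, key=lambda name: information_gain(features[name], labels))
    n_drop = int(len(ranked) * fraction)
    return ranked[n_drop:]
```

For instance, with labels ["liver", "liver", "kidney", "kidney"], a feature that takes one value per organ has an IG of 1 bit, while a feature whose values are spread evenly across both organs has an IG of 0 and is dropped first.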