Fumito Masui | Kitami Institute of Technology

Papers by Fumito Masui

Extracting References to the Future from News using Morphosemantic Patterns

In this paper we investigate future reference sentences in newspapers and Web news. We propose a novel method for extraction of such sentences using automatically obtained patterns consisting of semantic roles and morphological information. We performed a series of experiments, in which we first extract future reference expressions from sentences using a novel algorithm for automatic extraction of sophisticated sentence patterns. Then we verify the validity of such patterns by applying them in classification of future-referring sentences. Finally, we use the optimized classifier to retrieve new future-referring sentences from the Web. The results show that it was possible to fully automatically retrieve future sentences with performance significantly higher than the state of the art.

Brute Force Works Best Against Bullying

Effectiveness of Relative Expressions for Trend Information Extraction

Journal of Japan Society for Fuzzy Theory and Intelligent Informatics, 2006

Recognizing article errors using prepositional information

Systems and Computers in Japan, 2006

In this paper, we propose a prepositional model that uses prepositional information to detect article errors often seen in English sentences written by Japanese learners of English. The conventional methods for detecting article errors include a statistical model that is based on statistics obtained from an electronic corpus created from documents such as English newspapers. However, the usage of the articles has many exceptions, and thus the performance of the statistical model is not yet sufficient. Hence, in the prepositional model, the performance of the statistical model is improved by using prepositional information, and errors are detected while also taking into account exceptional usages of the articles. In an experiment, it was verified that the performance of the prepositional model (F-measure = 0.72) is a huge improvement over the performance of the statistical model (F-measure = 0.53) in dealing with article errors in prepositional phrases. © 2006 Wiley Periodicals, Inc. Syst Comp Jpn, 37(12): 17–26, 2006; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.20527

A method for rating English texts by reading level for Japanese learners of English

Systems and Computers in Japan, 2005

It has been recognized that existing methods for rating English texts by reading level are mostly aimed at native speakers of English and therefore are not completely appropriate for Japanese learners of the language. Here we propose a method for rating English texts by reading level specifically targeted at Japanese learners of the language. To rate the reading level of a text for a Japanese learner of English, our method takes two types of information regarding a given text into account, namely, vocabulary and grammatical structure. Specifically, we rate the reading level of a text by using a vocabulary list and parser to extract particularly difficult vocabulary items or grammatical structures as features. To rate a text's reading level, two types of model are used: multiple regression and neural networks. Our experiments show that the proposed methods rate the reading level of a text with the following levels of accuracy: an average of 75% accuracy for multiple regression and 81.3% when using neural networks. These constitute improvements on existing methods. © 2005 Wiley Periodicals, Inc. Syst Comp Jpn, 36(6): 1–13, 2005; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.20326

Recognizing article errors in the writing of Japanese learners of English

Systems and Computers in Japan, 2005

In this paper, the authors propose a method to recognize article errors often seen in English text written by Japanese learners of English. In this method, article errors are recognized based on statistics extracted from an electronic corpus such as English-language newspapers. The authors' method is different from earlier methods in that there is no need to create a dictionary or rules for article error recognition. The results of experiments confirm that the performance of the authors' method is equivalent or superior to earlier methods (F-measure = 0.76). In addition, the authors' method is shown to be superior to earlier methods insofar as (1) the effort to create a dictionary or rules is not needed; (2) there are no limits on the input text; and (3) the ratio of Recall to Precision can be adjusted. © 2005 Wiley Periodicals, Inc. Syst Comp Jpn, 36(7): 54–63, 2005; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.20153
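For readers unfamiliar with the metric quoted above, the following is a minimal sketch of the standard F-measure (the weighted harmonic mean of precision and recall). It is not code from the paper. The function name `f_measure`, the `beta` parameter, and the precision and recall values in the usage lines are illustrative assumptions; the abstract reports only the resulting F-measure.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta).

    beta = 1 weights precision and recall equally; beta > 1 favors recall.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta ** 2) * precision * recall / denom


# Hypothetical operating point, chosen only for illustration.
balanced = f_measure(0.8, 0.6)              # equal weighting (F1)
recall_heavy = f_measure(0.8, 0.6, beta=2)  # recall weighted more heavily
print(round(balanced, 4), round(recall_heavy, 4))
```

A system whose recall-to-precision ratio can be tuned, as the abstract describes, would move along this trade-off; which point is best depends on how costly missed errors are relative to false alarms.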

Detecting Article Errors Based on the Mass Count Distinction

This paper proposes a method for detecting errors concerning article usage and singular/plural usage based on the mass count distinction. Although the mass count distinction is particularly important in detecting these errors, it has been pointed out that it is hard to make heuristic rules for distinguishing mass and count nouns. To solve the problem, first, instances of mass and count nouns are automatically collected from a corpus exploiting surface information in the proposed method. Then, words surrounding the mass (count) instances are weighted based on their frequencies. Finally, the weighted words are used for distinguishing mass and count nouns. After distinguishing mass and count nouns, the above errors can be detected by some heuristic rules. Experiments show that the proposed method distinguishes mass and count nouns in the writing of Japanese learners of English with an accuracy of 93% and that 65% of article errors are detected with a precision of 70%.
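The three steps in this abstract (collect instances, weight surrounding words by frequency, classify with the weighted words) can be sketched roughly as follows. This is a toy illustration and not the authors' method: the window size, the smoothed log-ratio weighting, the function names, and the four hand-labeled training sentences are all assumptions made for the example. The paper collects its instances automatically from a corpus.

```python
import math
from collections import Counter


def context(tokens, i, window=2):
    """Words within `window` positions of the target noun at index i."""
    lo = max(0, i - window)
    return tokens[lo:i] + tokens[i + 1:i + 1 + window]


def train_weights(instances, window=2):
    """Weight each context word by a smoothed log ratio of its frequency
    around count instances versus mass instances (assumed weighting)."""
    counts = {"mass": Counter(), "count": Counter()}
    for tokens, i, label in instances:
        counts[label].update(context(tokens, i, window))
    vocab = set(counts["mass"]) | set(counts["count"])
    totals = {k: sum(c.values()) + len(vocab) for k, c in counts.items()}
    return {
        w: math.log((counts["count"][w] + 1) / totals["count"])
        - math.log((counts["mass"][w] + 1) / totals["mass"])
        for w in vocab
    }


def classify(weights, tokens, i, window=2):
    """Sum the weights of the surrounding words; ties default to 'mass'."""
    score = sum(weights.get(w, 0.0) for w in context(tokens, i, window))
    return "count" if score > 0 else "mass"


# Toy, hand-labeled instances: (tokens, index of target noun, label).
instances = [
    ("she ate a chicken yesterday".split(), 3, "count"),
    ("there are many chickens here".split(), 3, "count"),
    ("he ate much chicken today".split(), 3, "mass"),
    ("we had some chicken soup".split(), 3, "mass"),
]
weights = train_weights(instances)
print(classify(weights, "I bought a chicken".split(), 3))
print(classify(weights, "too much chicken".split(), 2))
```

Once a noun occurrence is labeled mass or count, simple rules (for example, flagging "a" before a mass use) could detect the article errors the abstract mentions; those rules are not shown here.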

Question Answering System Based on Expanded Answer Types and Multi-Scores

In this paper, we describe our question answering system. We propose using 200 answer types to reduce answer-type ambiguity in query analysis. As scores for extracting correct answers, we propose TF-IDF and the word distance between an answer candidate and a weighty word from the query.
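One way to picture a score that combines TF-IDF with word distance is sketched below. The abstract does not say how the two signals are combined, so the formula here (TF-IDF of each query keyword, discounted by its distance to the candidate) is purely an assumption, as are the function names, the smoothed IDF variant, and the toy passages.

```python
import math


def idf(term, passages):
    """Smoothed inverse document frequency over a list of token lists."""
    df = sum(1 for p in passages if term in p)
    return math.log((1 + len(passages)) / (1 + df)) + 1.0


def score_candidate(cand_idx, passage, keywords, passages):
    """Score the token at cand_idx: each query keyword found in the passage
    contributes its TF-IDF weight divided by (1 + distance to the candidate)."""
    score = 0.0
    for kw in keywords:
        positions = [j for j, tok in enumerate(passage) if tok == kw]
        if not positions:
            continue  # keyword absent from this passage
        tf = len(positions) / len(passage)
        dist = min(abs(cand_idx - j) for j in positions)
        score += tf * idf(kw, passages) / (1 + dist)
    return score


# Toy data for illustration only.
passages = [
    "takeshi kitano directed the film hana-bi in 1997".split(),
    "hayao miyazaki directed many animated films".split(),
]
keywords = ["kitano", "film"]
passage = passages[0]
title_score = score_candidate(5, passage, keywords, passages)  # "hana-bi"
year_score = score_candidate(7, passage, keywords, passages)   # "1997"
print(title_score > year_score)
```

In a fuller system, the candidates scored this way would first be filtered by the expected answer type from query analysis, which is the role the abstract assigns to its 200 answer types.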

Characterization of List-Type Question Answering and its Evaluation Measures

MAIMAI: A Question Answering System at NTCIR3 QAC-1

This paper describes a question answering system based on syntactic information. Our system extracts answer candidates by ranking them with a score that reflects similarity of syntactic structure. Syntactic structure is estimated based on answer type, density of weighty words, distance between words, and depth of the parse tree. To analyze syntactic structure, morphological analysis, named entity extraction, and a parser are used.

Are open-domain question answering technologies useful for information access dialogues?---an empirical study and a proposal of a novel challenge

ACM Transactions on Asian Language Information Processing, 2005

There are strong expectations for the use of question answering technologies in information access dialogues, such as for information gathering and browsing. In this paper, we empirically examine what kinds of abilities are needed for question answering systems in such situations, and propose a challenge for evaluating those abilities objectively and quantitatively. We also show that existing technologies have the

An evaluation of question answering challenge (QAC-1) at the NTCIR workshop 3

Question Answering Challenge for Information Access Dialogue - Overview of NTCIR4 QAC2 Subtask 3

We describe an overview of Question Answering Challenge (QAC) 2 Subtask 3, a novel challenge for evaluating open-domain question answering technologies, at the NTCIR Workshop 4. In QAC2 Subtask 3, question answering systems are supposed to be used interactively to answer a series of related questions, whereas in the conventional setting, systems answer isolated questions one by one. Such an interaction occurs in the case of gathering information for a report on a specific topic, or when browsing information of interest to the user. In this paper, first, we explain the design of the challenge. Reporting the results of the run conducted and techniques employed there, we then show that existing technologies have the potential to address this challenge.

Named Entity Extraction Tool NExT for Text Processing

Handling Information Access Dialogue through QA Technologies - A novel challenge for open-domain question answering

A novel challenge for evaluating open-domain question answering technologies is proposed. In this challenge, question answering systems are supposed to be used interactively to answer a series of related questions, whereas in the conventional setting, systems answer isolated questions one by one. Such an interaction occurs in the case of gathering information for a report on a specific topic, or when browsing information of interest to the user. In this paper, first, we explain the design of the challenge. We then discuss its reality and show how the capabilities measured by the challenge are useful and important in practical situations, and that the difficulty of the challenge is proper for evaluating the current state of open-domain question answering technologies.

Named Entity Extraction Tool: NExT

In this paper, we describe the Named Entity extraction Tool (NExT), which has been developed to support and encourage NLP researchers working in the area of Information Extraction. The NExT system is implemented with a pattern-based approach and is intended to make extraction pattern rules easy to maintain and expand.

Question Answering Challenge (QAC-1): An Evaluation of Question Answering Tasks at the NTCIR Workshop 3

... QAC1-3011-01: “(Joe Hisaishi was a music director for which of Hayao Miyazaki's films?)” QAC1-3011-02: “(What is the name of the film directed by Takeshi Kitano?)” 3.2 Support information ...

Are Open-domain Question Answering Technologies Useful for Information Access Dialogues

There are strong expectations for the use of question answering technologies in information access dialogues, such as for information gathering and browsing. In this paper, we empirically examine what kinds of abilities are needed for question answering systems in such situations, and propose a challenge for evaluating those abilities objectively and quantitatively. We also show that existing technologies have the

An Overview of the 4th Question Answering Challenge (QAC-4) at NTCIR Workshop 6

In QAC-4, we defined a question answering task that accepts any type of question, with a main focus on non-factoid questions. There were 8 participants, who submitted 14 runs. In the evaluation, four kinds of criteria were applied to a portion of the participants' answer sets. The evaluation results showed that some of the participating systems could locate the area in which correct answer content exists but tended to fail to extract the correct answer areas. This is caused by the complexity of the question types and the difficulty of extracting the scope of a correct answer.

WoZ Simulation of Interactive Question Answering

QACIAD (Question Answering Challenge for Information Access Dialogue) is an evaluation framework for measuring interactive question answering (QA) technologies. It assumes that users interactively collect information using a QA system for writing a report on a given topic and evaluates, among other things, the capabilities needed under such circumstances. This paper reports an experiment for examining the assumptions made by QACIAD. In this experiment, dialogues in the situation that QACIAD assumes are collected using WoZ (Wizard of Oz) simulation, which is frequently used for collecting dialogue data for designing speech dialogue systems, and then analyzed. The results indicate that the setting of QACIAD is realistic and appropriate and that one of the important capabilities for future interactive QA systems is providing cooperative and helpful responses.
