Kensuke Mitsuzawa - Academia.edu


Papers by Kensuke Mitsuzawa

An Analysis of Negative-opinion on Customer Comments using CRF

Clause-level Negative-opinion Analysis for Classifying Reviews on Multiple Domains

Proceedings of the 20th International Conference on Information Integration and Web-based Applications & Services, 2018

Today, vast amounts of reviews are posted on the internet. Businesses must extract negative opinions about their products and services from these reviews in order to improve them. However, two issues must be addressed before automatic extraction of negative sentiment can be used for this purpose. (1) Reviews are usually long texts, so finding only the negative opinions relevant to improvement is time-consuming. (2) Many studies have proposed sentiment classification methods based on machine learning, but these require a large amount of training data drawn from the same products and services as the test data, which is costly to prepare. In this paper, we propose a clause-level sentiment classification method using Conditional Random Fields (CRF) to address issue (1), and we describe experiments on sentiment classification of reviews from multiple domains to address issue (2).
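
As an informal illustration of the clause-level approach (a minimal sketch, not the system described in the paper), the snippet below treats the clauses of a review as a sequence and labels each clause as negative or other with a linear-chain CRF via the sklearn-crfsuite library; the clause segmentation, features, and toy data are assumptions made for the example.

```python
# Illustrative sketch only: label the clauses of a review NEG / OTHER with a
# linear-chain CRF. Features and toy data are assumptions, not the paper's setup.
import sklearn_crfsuite

def clause_features(clauses, i):
    """Simple word features for the i-th clause, plus hints from neighbouring clauses."""
    feats = {"bias": 1.0}
    for w in clauses[i].split():
        feats[f"word={w}"] = 1.0
    if i > 0:
        feats["prev_last_word=" + clauses[i - 1].split()[-1]] = 1.0
    if i < len(clauses) - 1:
        feats["next_first_word=" + clauses[i + 1].split()[0]] = 1.0
    return feats

# Toy training data: each review is a list of clauses with clause-level labels.
train_reviews = [
    (["the staff was friendly", "but the room smelled bad"], ["OTHER", "NEG"]),
    (["battery dies too quickly", "screen is gorgeous"], ["NEG", "OTHER"]),
]
X_train = [[clause_features(c, i) for i in range(len(c))] for c, _ in train_reviews]
y_train = [labels for _, labels in train_reviews]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, y_train)

test_clauses = ["delivery was fast", "but the packaging smelled bad"]
print(crf.predict([[clause_features(test_clauses, i) for i in range(len(test_clauses))]]))
```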

FKC Corpus: a Japanese Corpus from New Opinion Survey Service – Novel Incentives and Workflows for Annotation

The gap between supply of and demand for Language Resources continues to impede progress in linguistic research and technology development, even in the face of immense international effort to create the requisite data and tools. This deficiency affects all languages in some way, even those with worldwide economic and political influence. Moreover, for most of the world’s 7000 linguistic varieties the absence is acute. Current approaches cannot hope to meet the resource demand for even a reasonable subset of the languages currently spoken because they seek to document phenomena of great variability principally using resources, such as national funding, that are highly constrained in terms of amount, duration and scope. This paper describes efforts to augment the traditional incentives of monetary compensation with alternate incentives in order to elicit greater contributions of linguistic data, metadata and annotation. It also touches on the adjustments to workforces, workflows and p...

NAIST at 2013 CoNLL grammatical error correction shared task

This paper describes the Nara Institute of Science and Technology (NAIST) error correction system in the CoNLL 2013 Shared Task. We constructed three systems: a system based on the Treelet Language Model for verb form and subject-verb agreement errors; a classifier trained on both learner and native corpora for noun number errors; and a statistical machine translation (SMT)-based model for preposition and determiner errors. For subject-verb agreement errors, we show that the Treelet Language Model-based approach can correct errors in which the target verb is distant from its subject. Our system ranked fourth on the official run.
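
The Treelet Language Model itself is not easy to reproduce in a few lines, but the intuition that syntax links a verb to a possibly distant subject can be sketched with an ordinary dependency parse. The following toy example (an assumption-laden sketch, not the paper's system) flags candidate subject-verb agreement errors with spaCy.

```python
# Minimal sketch of the underlying idea only: the paper uses a Treelet Language Model,
# but this rule-based check likewise relies on syntax to link a verb to a subject that
# may be several words away. Requires spaCy and its en_core_web_sm model.
import spacy

nlp = spacy.load("en_core_web_sm")

def agreement_warnings(text):
    """Flag present-tense verbs whose nominal subject disagrees in number (heuristic)."""
    warnings = []
    for token in nlp(text):
        if token.tag_ not in ("VBZ", "VBP"):  # 3rd-person singular vs. non-3rd-singular present
            continue
        subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
        for subj in subjects:
            if token.tag_ == "VBZ" and subj.tag_ in ("NNS", "NNPS"):
                warnings.append((subj.text, token.text, "plural subject with singular verb"))
            if token.tag_ == "VBP" and subj.tag_ in ("NN", "NNP"):
                warnings.append((subj.text, token.text, "singular subject with plural verb"))
    return warnings

print(agreement_warnings("The results of the long experiment shows a clear trend."))
```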

FKC Corpus: a Japanese Corpus from New Opinion Survey Service

In this paper, we present the FKC corpus, which comes from the Fuman Kaitori Center (FKC). The FKC is a Japanese consumer-opinion data collection and analysis service. The main advantage of the FKC is its point system, which awards more points to user input containing more information and thereby encourages users to supply categorical information. Thanks to this system, the FKC corpus contains consumers’ opinions with abundant category labels and user demographics, and can serve multiple NLP tasks: opinion mining, document classification, author inference, and sentiment classification. The FKC corpus consists of 254,683 posts from 25,092 users. All posts are checked by crowdsourced annotators working for the FKC. The posts in the FKC corpus come mainly from mobile devices, and one third of them concern products or events related to daily life. We also show correlations between the point incentive and users’ motivation to keep posting opinions with abundant category information.

Sentence Boundary Detection on Line Breaks in Japanese

For NLP, sentence boundary detection (SBD) is an essential task that decomposes a text into sentences. Most previous studies have used simple rules that treat only typical characters as sentence boundaries. However, some characters may or may not be sentence boundaries depending on the context. We focused on line breaks as one such context-dependent case in Japanese text. We newly constructed annotated corpora, implemented sentence boundary detectors, and analyzed the performance of SBD in several settings.
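
To make the setting concrete, here is a minimal sketch (not the paper's detectors) that treats each line break as a binary classification problem: features from the surrounding characters, a simple scikit-learn classifier, and toy Japanese examples standing in for the annotated corpora.

```python
# Minimal sketch, not the paper's system: decide for each line break in Japanese text
# whether it ends a sentence, using the surrounding characters as features.
# The features and toy training data below are assumptions for illustration.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def linebreak_features(text, pos):
    """Features of the line break at index pos in text."""
    before = text[pos - 1] if pos > 0 else ""
    after = text[pos + 1] if pos + 1 < len(text) else ""
    return {
        "char_before": before,
        "char_after": after,
        "before_is_punct": before in "。！？．",
        "after_is_open_bracket": after in "「（【",
    }

# Toy training examples: (text, index of "\n", is_sentence_boundary)
train = [
    ("今日は晴れです。\n明日は雨です。", 8, True),    # break after "。" -> boundary
    ("この商品は\nとても良かったです。", 5, False),   # break mid-sentence -> not a boundary
    ("対応が早い！\nまた利用したいです。", 6, True),
    ("値段が\n少し高いと思います。", 3, False),
]
X = [linebreak_features(t, i) for t, i, _ in train]
y = [label for _, _, label in train]

model = make_pipeline(DictVectorizer(), LogisticRegression())
model.fit(X, y)

test_text = "配送は速かったです。\nまた注文します。"
print(model.predict([linebreak_features(test_text, test_text.index("\n"))]))
```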
