Joint Segmentation and Classification of Dialog Acts in Multiparty Meetings

Toward Joint Segmentation and Classification of Dialog Acts in Multiparty Meetings

Lecture Notes in Computer Science, 2006

This paper investigates a scheme for joint segmentation and classification of dialog acts (DAs) of the ICSI Meeting Corpus based on hidden-event language models and a maximum entropy classifier for the modeling of word boundary types. Specifically, the modeling of the boundary types takes into account dependencies between the duration of a pause and its surrounding words. Results for the proposed method compare favorably with our previous work on the same task.

Automatic Dialog Act Segmentation and Classification in Multiparty Meetings

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 2005

We explore the two related tasks of dialog act (DA) segmentation and DA classification for speech from the ICSI Meeting Corpus. We employ simple lexical and prosodic knowledge sources, and compare results for human-transcribed versus automatically recognized words. Since there is little previous work on DA segmentation and classification in the meeting domain, our study provides baseline performance rates for both tasks. We introduce a range of metrics for use in evaluation, each of which measures different aspects of interest. Results show that both tasks are difficult, particularly for a fully automatic system. We find that a very simple prosodic model aids performance over lexical information alone, especially for segmentation. Both tasks, but particularly word-based segmentation, are degraded by word recognition errors. Finally, while classification results for meeting data show some similarities to previous results for telephone conversations, findings also suggest a potential difference with respect to the effect of modeling DA context.

A* based joint segmentation and classification of dialog acts in multiparty meetings

IEEE Workshop on Automatic Speech Recognition and Understanding, 2005

We investigate the use of the A* algorithm for joint segmentation and classification of dialog acts (DAs) of the ICSI Meeting Corpus. The heuristic search uses a probabilistic framework based on DA-specific N-gram language models. Furthermore, we motivate and describe two new metrics for performance evaluation and demonstrate the influence of the choice of metric on performance evaluation. The proposed method is evaluated on both traditional and new metrics, and compared with our previous work on the same task.

On speaker-specific prosodic models for automatic dialog act segmentation of multi-party meetings

We explore speaker-specific prosodic modeling for dialog act segmentation of speech from the ICSI Meeting Corpus. We ask whether features beyond pauses help individual speakers, and whether some speakers benefit from prosody models trained on only their speech. We find positive results for both questions, although the second is more complex. Feature analysis reveals that duration is the most used feature type, followed by pause and pitch features. Results also suggest a difference between native and nonnative speakers in feature usage patterns. We conclude that features beyond pauses are useful for dialog act segmentation in natural conversation, and that for some speakers, speaker-specific training yields further gains.

Speaker Adaptation of Language Models for Automatic Dialog Act Segmentation of Meetings

Dialog act (DA) segmentation in meeting speech is important for meeting understanding. In this paper, we explore speaker adaptation of hidden event language models (LMs) for DA segmentation using the ICSI Meeting Corpus. Speaker adaptation is performed using a linear combination of the generic speaker-independent LM and an LM trained on only the data from individual speakers. We test the method on 20 frequent speakers, on both reference word transcripts and the output of automatic speech recognition. Results indicate improvements for 17 speakers on reference transcripts, and for 15 speakers on automatic transcripts. Overall, the speaker-adapted LM yields statistically significant improvement over the baseline LM for both test conditions.
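The linear combination of a speaker-independent LM and a speaker-specific LM described above can be illustrated with a minimal sketch. The function names, the boundary event labels `<B>` and `<NB>`, the toy probabilities, and the weight `lam` are all assumptions made for illustration; the abstract does not say how the interpolation weight was chosen.

```python
def interpolate(p_generic, p_speaker, lam):
    """Linearly interpolate two probabilities of the same event.

    lam is the (hypothetical) weight given to the speaker-specific model;
    1 - lam goes to the generic speaker-independent model.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_speaker + (1.0 - lam) * p_generic


def interpolate_distribution(generic, speaker, lam):
    """Interpolate two distributions given as dicts mapping event -> probability.

    Events missing from one model are treated as having probability 0 there.
    """
    events = set(generic) | set(speaker)
    return {
        event: interpolate(generic.get(event, 0.0), speaker.get(event, 0.0), lam)
        for event in events
    }


# Toy boundary-event distributions for one word context (illustrative values).
generic_lm = {"<B>": 0.2, "<NB>": 0.8}
speaker_lm = {"<B>": 0.4, "<NB>": 0.6}
adapted_lm = interpolate_distribution(generic_lm, speaker_lm, lam=0.25)
```

Because both inputs are proper distributions and the weights sum to one, the adapted distribution also sums to one. In practice the weight would typically be tuned on held-out data for each speaker.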

Text Based Dialog Act Classification for Multiparty Meetings

Lecture Notes in Computer Science, 2006

This paper compares the performance of various machine learning approaches and their combination for dialog act (DA) classification of meetings data. For this task, boosting and three other text based approaches previously described in the literature are used. To further improve the classification performance, various combination schemes take into account the results of the individual classifiers. All classification methods are evaluated on the ICSI Meeting Corpus based on both reference transcripts and the output of a speech-to-text (STT) system. The results indicate that both the proposed boosting based approach and a method relying on maximum entropy substantially outperform the use of mini language models and a simple scheme relying on cue phrases. The best performance was achieved by combining methods with a multilayer perceptron.

Using hidden Markov models for topic segmentation of meeting transcripts

2008

In this paper, we present a hidden Markov model (HMM) approach to segment meeting transcripts into topics. To learn the model, we use unsupervised learning to cluster the text segments obtained from topic boundary information. Using modified WinDiff and Pk metrics, we demonstrate on the ICSI Meeting Corpus that an HMM outperforms LCSeg, a state-of-the-art lexical-chain-based method for topic segmentation. We evaluate the effect of language model order, the number of hidden states, and the use of stop words. Our experimental results show that a unigram LM is better than a trigram LM, that using too many hidden states degrades topic segmentation performance, and that removing the stop words from the transcripts does not improve segmentation performance.
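For readers unfamiliar with the two segmentation metrics named above, the following sketch implements their commonly used standard definitions. It is not the modified variant used in the paper, whose details the abstract does not give. A segmentation is assumed here to be a list of 0/1 flags, one per gap between adjacent text units, where 1 marks a topic boundary; the function names are illustrative.

```python
def window_size(ref_gaps):
    """Half the average reference segment length, in units (at least 1)."""
    n_units = len(ref_gaps) + 1
    n_segments = sum(ref_gaps) + 1
    return max(1, round(n_units / (2 * n_segments)))


def _check(ref_gaps, hyp_gaps, k):
    if len(ref_gaps) != len(hyp_gaps):
        raise ValueError("segmentations must cover the same units")
    if k < 1 or k > len(ref_gaps):
        raise ValueError("window size k must be between 1 and the number of gaps")


def pk(ref_gaps, hyp_gaps, k=None):
    """Pk: fraction of windows where ref and hyp disagree on whether the
    two units at the window ends lie in the same segment."""
    k = window_size(ref_gaps) if k is None else k
    _check(ref_gaps, hyp_gaps, k)
    windows = len(ref_gaps) + 1 - k
    errors = 0
    for i in range(windows):
        ref_same = sum(ref_gaps[i:i + k]) == 0
        hyp_same = sum(hyp_gaps[i:i + k]) == 0
        errors += ref_same != hyp_same
    return errors / windows


def window_diff(ref_gaps, hyp_gaps, k=None):
    """WindowDiff: fraction of windows where ref and hyp contain a
    different number of boundaries."""
    k = window_size(ref_gaps) if k is None else k
    _check(ref_gaps, hyp_gaps, k)
    windows = len(ref_gaps) + 1 - k
    errors = 0
    for i in range(windows):
        errors += sum(ref_gaps[i:i + k]) != sum(hyp_gaps[i:i + k])
    return errors / windows


# Ten units, reference boundaries after units 3 and 7;
# the hypothesis places the first boundary one unit late.
reference = [0, 0, 1, 0, 0, 0, 1, 0, 0]
hypothesis = [0, 0, 0, 1, 0, 0, 1, 0, 0]
pk_score = pk(reference, hypothesis)
wd_score = window_diff(reference, hypothesis)
```

Both scores are error rates, so lower is better. WindowDiff also penalizes a window in which the hypothesis inserts an extra boundary next to a correct one, a case that Pk can miss.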

Confidence Measures for Semi-Automatic Labeling of Dialog Acts

2007

This paper deals with semi-supervised classifier training for automatic dialog act (DA) recognition. In our previous work, we designed a dialog act recognition system for reservation applications in the Czech language. In this work, we propose to retrain this system on another corpus, for another task (broadcast news speech), in a different language (French), and with another set of dialog acts. This is done using a semi-supervised approach based on the Expectation-Maximization (EM) algorithm. We show that, in the proposed experimental setup, using confidence measures to filter out incorrectly recognized dialog acts is required to improve the results. Two confidence measures are thus proposed and evaluated on the French broadcast news corpus. Experimental results confirm the value of this approach for training automatic dialog act classifiers.
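The filtering step can be pictured with a small sketch. The abstract does not define the two proposed confidence measures, so the example below substitutes a common stand-in, the maximum class posterior, purely as an assumption. The function name, the label names, and the threshold are likewise hypothetical.

```python
def filter_by_confidence(posteriors, threshold):
    """Keep automatically labeled utterances whose top posterior is high enough.

    posteriors: list of dicts mapping dialog act label -> posterior probability,
                one dict per utterance (output of some existing classifier).
    threshold:  minimum confidence for an utterance to be kept.
    Returns a list of (utterance_index, predicted_label, confidence) tuples.
    """
    kept = []
    for index, distribution in enumerate(posteriors):
        # Assumed confidence measure: the posterior of the best label.
        label, confidence = max(distribution.items(), key=lambda item: item[1])
        if confidence >= threshold:
            kept.append((index, label, confidence))
    return kept


# Toy classifier outputs for three unlabeled utterances (illustrative values).
toy_posteriors = [
    {"statement": 0.90, "question": 0.10},
    {"statement": 0.55, "question": 0.45},
    {"statement": 0.15, "question": 0.85},
]
confident = filter_by_confidence(toy_posteriors, threshold=0.8)
```

In a semi-supervised loop of this kind, only the retained utterances and their predicted labels would be added to the training data before the classifier is re-estimated.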

A Probabilistic Model of Meetings that Combines Words and Discourse Features

In order to determine the points at which meeting discourse changes from one topic to another, probabilistic models were used to approximate the process through which meeting transcripts were produced. Gibbs sampling was used to estimate the values of random variables in the models, including the locations of topic boundaries. The paper shows how discourse features were integrated into the Bayesian model, and reports empirical evaluations of the benefit obtained through the inclusion of each feature and of the suitability of alternative models of the placement of topic boundaries. It demonstrates how multiple cues to segmentation can be combined in a principled way, and empirical tests show a clear improvement over previous work.

Text segmentation of spoken meeting transcripts

International Journal of Speech Technology, 2008

Text segmentation has played an important role in information retrieval as well as natural language processing. Current segmentation methods are well suited to written and structured texts, since they make use of the distinctive macro-level structures of such texts; however, text segmentation of transcribed multi-party conversation presents a different challenge, given its ill-formed sentences and lack of macro-level text units. This paper describes an algorithm for segmenting spoken meeting transcripts that combines semantically complex lexical relations with speech cue phrases to build lexical chains for determining topic boundaries.