Krista Lagus | University of Helsinki

Papers by Krista Lagus

Adaptive Dialogue Systems - Interaction with Interact

Technological development has made computer interaction more common and also commercially feasible, and the number of interactive systems has grown rapidly. At the same time, the systems should be able to adapt to various situations and various users, so as to provide the most efficient and helpful mode of interaction. The aim of the Interact project is to explore natural human-computer interaction and to develop dialogue models which will allow users to interact with the computer in a natural and robust way. The paper describes the innovative goals of the project and presents ways that the Interact system supports adaptivity on different system design and interaction management levels.

Generalizability of the WEBSOM Method to Document Collections of Various Types

European Congress on Intelligent Techniques and Soft Computing, 1998

WEBSOM is a method in which the self-organizing map algorithm is used to automatically organize collections of documents on a map to enable easy exploration of the collection. This article illustrates with case studies how collections of various types of text can be successfully organized using the WEBSOM. The emphasis is on describing the particular challenges that each type of

Map of WSOM'97 Abstracts - Alternative Index

Krista Lagus, Helsinki University of Technology, Neural Networks Research Centre, P. O. Box 2200, FIN-02015 HUT, Finland. Abstract: The collection of abstracts of WSOM'97 articles was organized using a simplified form of the WEBSOM method. The result offers an approximate visual index to the workshop. The resulting map shows no clear general organizing criterion; instead the grounds for clustering certain documents seem to vary across

Text mining with the WEB-SOM

Text Retrieval Using Self Organized Document Maps

Neural Processing Letters, 2002

A map of text documents arranged using the Self-Organizing Map (SOM) algorithm (1) is organized in a meaningful manner so that items with similar content appear at nearby locations of the 2-dimensional map display, and (2) clusters the data, resulting in an approximate model of the data distribution in the high-dimensional document space. This report describes how a document map

Looking at our data-perspectives from mindfulness apps and quantified self as a daily practice

2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2014

Miten hermoverkkomallit selittävät kielen oppimista (How neural network models explain language learning)

Automated pagination of the generalized newspaper using simulated annealing

Generalizability of the WEBSOM method to document collections of various types

WEBSOM is a method in which the self-organizing map algorithm is used to automatically organize collections of documents on a map to enable easy exploration of the collection. This article illustrates with case studies how collections of various types of text can be successfully organized using the WEBSOM. The emphasis is on describing the particular challenges that each type of material poses, as well as on identifying properties of a text collection that affect the choices made at each processing stage. Properties such as the size of the document collection, the size of the vocabulary, the domain, the style of writing, and the language are considered.

Map of WSOM’97 abstracts—alternative index

The collection of abstracts of WSOM'97 articles was organized using a simplified form of the WEBSOM method. The result offers an approximate visual index to the workshop. The resulting map shows no clear general organizing criterion; instead the grounds for clustering certain documents seem to vary across areas, sometimes even within a single node across pairs of articles. It is suggested that this results from the inherent large dimensionality of the document space.

Retrieving a user language model from an unsupervised document map

The retrieval is based on a feature vector representation [2] of a sample text or an approximate intermediate speech transcription. This so-called document vector will be compared to an index of pre-trained language models and the best models are retrieved. The index is not just a list of language models but itself a smooth topological representation of language topics and

Unsupervised Word Categorization Using Self-Organizing Maps and Automatically Extracted Morphs

Lecture Notes in Computer Science, 2006

Automatic creation of syntactic and semantic word categorizations is a challenging problem for highly inflecting languages due to excessive data sparsity. Moreover, the study of colloquial language resources requires the utilization of fully corpus-based tools. We present a completely automated approach for producing word categorizations for morphologically rich languages. Self-Organizing Map (SOM) is utilized for clustering words based on the morphological properties of the context words. These properties are extracted using an automated morphological segmentation algorithm called Morfessor. Our experiments on a colloquial Finnish corpus of stories told by young children show that utilizing unsupervised morphs as features leads to clearly improved clusterings when compared to the use of whole context words as features.

Text Retrieval Using Self-Organized Document Maps

Neural Processing Letters - NPL, 2002

A map of text documents arranged using the Self-Organizing Map (SOM) algorithm (1) is organized in a meaningful manner so that items with similar content appear at nearby locations of the 2-dimensional map display, and (2) clusters the data, resulting in an approximate model of the data distribution in the high-dimensional document space. This article describes how a document map that is automatically organized for browsing and visualization can be successfully utilized also in speeding up document retrieval. Furthermore, experiments on the well-known CISI collection [3] show significantly improved performance compared to Salton's vector space model, measured by average precision (AP) when retrieving a small, fixed number of best documents. Regarding comparison with Latent Semantic Indexing the results are inconclusive.

Large vocabulary statistical language modeling for continuous speech recognition in Finnish

Statistical language modeling (SLM) is an essential part of any large-vocabulary continuous speech recognition (LVCSR) system. The development of the standard SLM methods has been strongly affected by the goals of LVCSR in English. The structure of Finnish is substantially different from English, so if the standard SLMs are directly applied, success is by no means guaranteed. In this paper we describe our first attempts at building an LVCSR system for Finnish and the new SLMs that we have tried. One of our objectives has been the indexing and recognition of broadcast news, so issues of special interest to us are topic detection, word stemming and modeling words that are poorly covered in the training data. Our new methods are based on neural computing using the self-organizing map (SOM), which has recently been shown to successfully extract and approximate latent semantic structures from massive text collections.

Text mining with the WEBSOM

Acta Polytechnica Scandinavica, Mathematics and …, 2000

TKK Text Mining with the WEBSOM. Krista Lagus. Dissertation for the degree of Doctor of Science in Technology to be presented with ...

Data analysis of conceptual similarities of Finnish verbs

Cognitive Science, 2002

The study of the conceptual representations that underlie the use of language is a problem motivated from both a cognitive research point of view and that of construing language models for various language processing tasks. In this work, we organized 600 Finnish verbs using the SOM algorithm. Three experiments were conducted using different features to encode the verbs: morphosyntactic properties, individual nouns, and noun categories in the context of

MODELING COMMUNITIES OF EXPERTS

Finding ways in which communities of experts can benefit from each other is a question shared by the machine learning community and social sciences alike. Considerable research in machine learning methods has shown that communities of experts can provide consistently better classifications and decisions than single experts in various tasks and domains. Our aim is to extend the perspective on communities of experts to cover the wider context of socio-cognitive research. In particular, we discuss the sociocognitive research on the formation and use of expertise in relation to the modeling of concept formation, integration and use in human and artificial agents. We present three case studies related to problem solving and decision making in environmental policy, medical care, and consumer research. We present a methodological framework for the computational modeling of the phenomena described above. A specific emphasis is on unsupervised statistical machine learning of heterogeneous conceptual spaces in multi-agent systems and on the application of such conceptual expert knowledge.

Kuntotiedot kartalle - erilaiset hyvä- ja huonokuntoisten ryhmät näkyviin (Fitness data onto the map - making different groups of people in good and poor physical condition visible)

ICA and SOM in text document analysis

Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '02, 2002

In this study we show experimental results on using Independent Component Analysis (ICA) and the Self-Organizing Map (SOM) in document analysis. Our documents are segments of spoken dialogues carried out over the telephone in a customer service, transcribed into text. The task is to analyze the topics of the discussions, and to group the discussions into meaningful subsets. The quality of the grouping is studied by comparing to a manual topical classification of the documents.

Using Correlation Dimension for Analysing Text Data

Lecture Notes in Computer Science, 2010

In this article, we study the scale-dependent dimensionality properties and overall structure of text data with a method that measures correlation dimension in different scales. As experimental results, we present the analysis of text data sets with the Reuters and Europarl corpora, which are also compared to artificially generated point sets. A comparison is also made with speech data. The results reflect some of the typical properties of the data and the use of our method in improving various data analysis applications is discussed.
