Sven Schmeier - Academia.edu
Papers by Sven Schmeier
Lecture Notes in Computer Science, 2010
In this paper we present the digital library assistant (DiLiA). The system aims at augmenting the search in digital libraries in several dimensions. In the project, advanced information visualisation methods are developed for user-controlled interactive search. The interaction model has been designed so that it is transparent to the user and easy to use. In addition, information extraction (IE) methods have been developed in DiLiA to make the content more easily accessible; this includes the identification and extraction of technical terms (TTs), both single- and multi-word terms, as well as the extraction of binary relations based on the extracted terms. In DiLiA we follow a hybrid information extraction approach: a combination of metadata and document processing.
Proceedings of the sixth …, 2000
Customer care in technical domains is increasingly based on e-mail communication, allowing for the reproduction of approved solutions. Identifying the customer's problem is often time-consuming, as the problem space changes if new products are launched. This paper describes a new approach to the classification of e-mail requests based on shallow text processing and machine learning techniques. It is implemented within an assistance system for call center agents that is used in a commercial setting.
Human translators are the key to evaluating machine translation (MT) quality and also to addressing the so far unanswered question of when and how to use MT in professional translation workflows. Usually, human judgments come in the form of ranking outputs of different translation systems, and recently post-edits of MT output have come into focus. This paper describes the results of a detailed large-scale human evaluation consisting of three tightly connected tasks: ranking, error classification and post-editing. Translation outputs from three domains and six translation directions generated by five distinct translation systems have been analysed with the goal of gaining relevant insights for further improvement of MT quality and applicability.
Theory and Applications of Natural Language Processing, 2012
In the following, we present an approach using interactive topic graph extraction for the exploration of web content. The initial information request, in the form of a query topic description, is issued online by a user to the system. The topic graph is then constructed from N web snippets that are produced by a standard search engine. We consider the extraction of a topic graph to be a specific empirical collocation extraction task, where collocations are extracted between chunks. Our measure of association strength is based on the pointwise mutual information between chunk pairs which explicitly takes their distance into account. This topic graph can then be further analyzed by users so that they can request additional background information with the help of interesting nodes and pairs of nodes in the topic graph, e.g., explicit relationships extracted from Wikipedia or those automatically extracted from additional Web content, as well as conceptual information on the topic in the form of semantically oriented clusters of descriptive phrases. This information is presented to the users, who can investigate the identified information nuggets to refine their information search. An initial user evaluation shows that our approach is especially helpful for finding new interesting information on topics about which the user has only a vague idea or no idea at all.
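The association measure described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the function name `distance_weighted_pmi`, the `max_distance` window, and the choice to discount each co-occurrence by 1/distance before computing pointwise mutual information. It is not the authors' implementation.

```python
import math
from collections import Counter
from itertools import combinations


def distance_weighted_pmi(snippets, max_distance=3):
    """Score chunk pairs with a PMI-style measure.

    `snippets` is a list of chunk sequences (one list of strings per snippet).
    Assumption: each co-occurrence is discounted by 1/distance, where distance
    is the number of chunk positions separating the two chunks.
    """
    chunk_counts = Counter()
    pair_weights = Counter()
    total_chunks = 0

    for chunks in snippets:
        chunk_counts.update(chunks)
        total_chunks += len(chunks)
        # combinations() over enumerate() yields position pairs with i < j.
        for (i, a), (j, b) in combinations(enumerate(chunks), 2):
            distance = j - i
            if a != b and distance <= max_distance:
                pair_weights[tuple(sorted((a, b)))] += 1.0 / distance

    total_weight = sum(pair_weights.values())
    scores = {}
    for (a, b), weight in pair_weights.items():
        p_ab = weight / total_weight          # distance-discounted joint estimate
        p_a = chunk_counts[a] / total_chunks  # marginal estimate for chunk a
        p_b = chunk_counts[b] / total_chunks  # marginal estimate for chunk b
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores
```

Under these assumptions, the highest-scoring pairs would become the edges of the topic graph, with the chunks as nodes. Adjacent chunk pairs contribute more weight than distant ones, which is one simple way to take distance into account.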
We present MobEx, a mobile touchable application for exploratory search on the mobile web. The system has been implemented for operation on a tablet computer, i.e. an Apple iPad, and on a mobile device, i.e. an Apple iPhone or iPod touch. Starting from a topic issued by the user, the system first collects web snippets that have been determined by a standard search engine and then extracts topics associated with the initial query in an unsupervised way, on demand and with high performance. This process is recursive in principle, as it furthermore determines other topics associated with the newly found ones, and so forth. As a result, MobEx creates a dense web of associated topics that is presented to the user as an interactive topic graph. We consider the extraction of topics as a specific empirical collocation extraction task where collocations are extracted between chunks combined with the cluster descriptions of an online clustering algorithm. Our measure of association strength is based on the pointwise mutual information between chunk pairs which explicitly takes their distance into account. These syntactically oriented chunk pairs are then semantically ranked and filtered using the cluster descriptions created by a Singular Value Decomposition (SVD) approach. An initial user evaluation shows that this system is especially helpful for finding new interesting information on topics about which the user has only a vague idea or even no idea at all.
The classification of texts is one of the most important cross-application technologies in information management. It is relevant to many tasks, including text filtering, information retrieval, and information extraction. In this paper, we describe the development and performance of a robust text classification suite, XM-XtraClass, suitable for industrial use. The classification components use modern machine learning algorithms (Support Vector Machines (SVM) and a rule induction technique based on boosting) as well as more conventional techniques. Thanks to careful optimisations and tests, these components are arguably the most accurate currently available commercially and are also very robust on real-world corpora.
AnswerBus (http://www.answerbus.com/, (2,3)) is a Web-based open-domain Question-Answering (QA) system. It successfully uses NLP/IR techniques and achieves a very high correct-answer rate. Although it is not designed for TREC, it still correctly answers over 70% of TREC-8 questions with Web resources. The question remains whether the techniques for a web-based QA system can be deployed to a local archive. In
KI - Künstliche Intelligenz, 2012
Customer support departments of large companies are often faced with large amounts of customer requests about the same issue. These requests are usually answered by using preformulated text blocks. However, choosing the right text from a large number of text blocks can be challenging for the customer support agent, especially when the text blocks are thematically related. Optimizing this process using the power of language and knowledge technologies can save resources and improve customer satisfaction. We present a joint project between OMQ GmbH (www.omq.de) and the Language Technology lab of the DFKI GmbH (www.dfki.de) (German Research Center for Artificial Intelligence), in which, starting from the customer support system developed by OMQ, we addressed two major challenges: first, the classification of incoming customer requests into previously defined problem cases; second, the identification of new problem cases in a set of unclassified customer requests. The two tasks were approached using linguistic and statistical methods combined with machine learning techniques.
This paper describes SPMED, a system for robust and accurate linguistic parsing of medical documents which is used in several industrial products. The basic design criterion of the system is to provide a set of basic, powerful, robust, and generic linguistic knowledge sources and modules which can easily be customized for processing different tasks in a flexible manner. The main application is seen in the linguistic analysis of medical documents, yet the technology is easily applicable to other domains.