Stanisław Szpakowicz - Academia.edu
Papers by Stanisław Szpakowicz
Group Decision and Negotiation, 2007
We present an analysis of partial automation of content analysis using machine learning methods. We use a decision-tree induction system to learn from manually categorized negotiation transcripts of electronic buyer–seller negotiations. The data we use were gathered using the Web-based negotiation support systems Inspire and SimpleNS. We experiment with various ways of representing the data to find the solution that gives the best results. The experiments show that we can identify, in relatively small data sets, linguistic features of interest for the detection of negotiation behaviour and negotiation-specific topics.
Lecture Notes in Computer Science, 2004
We propose a parser based on ideas from the Minimalist Programme. The parser supports free word order languages and simulates a human listener who necessarily begins sentence analysis before all the words in the sentence have become available. We first sketch the problems that free word order languages pose. Next we discuss an existing framework for minimalist parsing, and show how it is difficult to make it work for free word order languages and simulate realistic syntactic conditions. We briefly describe a formalism and a parsing algorithm that elegantly overcome these difficulties, and we illustrate them with detailed examples from Latin, a language whose word order freedom causes it to exhibit seemingly difficult discontinuous noun phrase situations.
Lecture Notes in Computer Science, 2003
Morris and Hirst [10] present a method of linking significant words that are about the same topic. The resulting lexical chains are a means of identifying cohesive regions in a text, with applications in many natural language processing tasks, including text ...
Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences HICSS-94, 1994
Negotiation in Distributed Artificial Intelligence: Drawing from Human Experience. Gregory E. Kersten, School of Business, Carleton University, Ottawa, Ontario, Canada, gregory@business.carleton.ca. Abstract: Distributed ...
Proceedings of the 36th annual meeting on Association for Computational Linguistics -, 1998
Semantic relationships among words and phrases are often marked by explicit syntactic or lexical clues that help recognize such relationships in texts. Within complex nominals, however, few overt clues are available. Systems that analyze such nominals must compensate for the lack of surface clues with other information. One way is to load the system with lexical semantics for nouns or adjectives. This merely shifts the problem elsewhere: how do we define the lexical semantics and build large semantic lexicons?
Lecture Notes in Computer Science, 2010
Extractive text summarization is the process of selecting relevant sentences from a collection of documents, perhaps only a single document, and arranging such sentences in a purposeful way to form a summary of this collection. The question arises just how good extractive summarization can ever be. Without generating language to express the gist of a text (its abstract), can we expect to make summaries which are both readable and informative? In search of an answer, we employed a corpus partially labelled with Summary Content Units: snippets which convey the main ideas in the document collection. Starting from this corpus, we created SCU-optimal summaries for extractive summarization. We support the claim of optimality with a series of experiments.
Lecture Notes in Computer Science, 1996
The control of forest fires is a complex domain that requires a variety of knowledge and skills in decision making and planning under uncertainty. It poses a challenging problem for the design of simulation and support systems, and therefore acts as a good testbed for the application of intelligent system methodologies. One such methodology is restructurable modelling. It is
Lecture Notes in Computer Science, 1996
There have been several proposals for a logic of decision making. However, none has been accompanied by a working implementation of a logic-based decision simulation and support system. The Negoplan system has been used to represent and simulate sequential decision making problems and has demonstrated its effectiveness in a number of real-world applications. In this paper we view Negoplan as
Current Issues in Linguistic Theory, 2004
We have implemented a system that measures semantic similarity using a computerized 1987 Roget's Thesaurus, and evaluated it by performing a few typical tests. We compare the results of these tests with those produced by WordNet-based similarity measures. One of the benchmarks is Miller and Charles' list of 30 noun pairs to which human judges had assigned similarity measures. We correlate these measures with those computed by several NLP systems. The 30 pairs can be traced back to Rubenstein and Goodenough's 65 pairs, which we have also studied. Our Roget's-based system gets correlations of .878 for the smaller and .818 for the larger list of noun pairs; this is quite close to the .885 that Resnik obtained when he employed humans to replicate the Miller and Charles experiment. We further evaluate our measure by using Roget's and WordNet to answer 80 TOEFL, 50 ESL and 300 Reader's Digest questions: the correct synonym must be selected amongst a group of four words. Our system answers 78.75%, 82.00% and 74.33% of the questions correctly, respectively.
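The two evaluations described in this abstract (correlating system scores with human similarity ratings, and picking the correct synonym out of four candidates) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the toy similarity table and the toy ratings are hypothetical and are not taken from the paper, and the abstract does not say which correlation coefficient was used, so Pearson's is assumed.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def choose_synonym(target, candidates, similarity):
    """Return the candidate that the similarity function scores highest."""
    return max(candidates, key=lambda c: similarity(target, c))


# Hypothetical similarity scores standing in for a thesaurus-based measure.
TOY_SCORES = {
    ("big", "large"): 16,
    ("big", "small"): 12,
    ("big", "fast"): 4,
    ("big", "green"): 2,
}


def toy_similarity(a, b):
    """Look up a made-up score; unknown pairs count as unrelated (0)."""
    return TOY_SCORES.get((a, b), 0)


# Made-up human ratings and system scores for four noun pairs.
human_ratings = [3.9, 3.5, 0.4, 0.1]
system_scores = [16.0, 14.0, 4.0, 2.0]

correlation = pearson(human_ratings, system_scores)
answer = choose_synonym("big", ["small", "large", "green", "fast"], toy_similarity)
```

With a real measure, `toy_similarity` would be replaced by a function that computes similarity from the thesaurus or from WordNet; the evaluation loop itself stays the same.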
Lecture Notes in Computer Science, 2009
Rank weight functions have been shown to increase the accuracy of measures of semantic relatedness for Polish. We present a generalised ranking principle and demonstrate its effect on a range of established measures of semantic relatedness, and on a different language. The results confirm that the generalised transformation method based on ranking brings an improvement over several well-known measures.
Lecture Notes in Computer Science, 1991
Lecture Notes in Computer Science, 2001
Natural language text analysis presupposes the encoding of morphological phenomena. In this article, we present some particularities of Modern Greek and the way these are encoded in the presented electronic lexicon. The project plan of its development combined both simple planning algorithms and more elaborate ones for the generation and recognition processes. The resulting lexicon exhibits fast access to its contents and easy content management. It is re-usable and modular enough to support existing NLP applications.
Conference Proceedings 1991 IEEE International Conference on Systems, Man, and Cybernetics, 1991
Lecture Notes in Computer Science, 2011
The clustering of related words is crucial for a variety of Natural Language Processing applications. Many known techniques of word clustering use the context of a word to determine its meaning. Words which frequently appear in similar contexts are assumed to have similar meanings. Word clustering usually applies the weighting of contexts, based on some measure of their importance. One of the most popular measures is Pointwise Mutual Information. It increases the weight of contexts where a word appears regularly but other words do not, and decreases the weight of contexts where many words may appear. Essentially, it is unsupervised feature weighting. We present a method of supervised feature weighting. It identifies contexts shared by pairs of words known to be semantically related or unrelated, and then uses Pointwise Mutual Information to weight these contexts on how well they indicate closely related words. We use Roget's Thesaurus as a source of training and evaluation data. This work is a step towards adding new terms to Roget's Thesaurus automatically, and doing so with high confidence.
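The behaviour attributed to Pointwise Mutual Information in this abstract can be seen in a small Python sketch. It covers only the standard unsupervised weighting, PMI(w, c) = log2(p(w, c) / (p(w) p(c))), estimated from raw co-occurrence counts; it is not the supervised method the paper proposes. The function name and the toy word-context pairs are hypothetical.

```python
import math
from collections import Counter


def pmi_weights(pairs):
    """Map each observed (word, context) pair to its PMI in bits.

    `pairs` is a list of (word, context) co-occurrence events.
    Probabilities are plain relative frequencies, with no smoothing.
    """
    pair_counts = Counter(pairs)
    word_counts = Counter(word for word, _ in pairs)
    context_counts = Counter(context for _, context in pairs)
    total = len(pairs)
    return {
        (word, context): math.log2(
            count * total / (word_counts[word] * context_counts[context])
        )
        for (word, context), count in pair_counts.items()
    }


# Made-up co-occurrence events: "eats" is shared by both words,
# while "purrs" and "barks" each occur with one word only.
toy_pairs = [
    ("cat", "purrs"),
    ("cat", "eats"),
    ("dog", "eats"),
    ("dog", "barks"),
]

weights = pmi_weights(toy_pairs)
```

On this toy data the context "purrs", seen only with "cat", gets a PMI of 1.0 bit, while "eats", seen with both words, gets 0.0. That matches the description above: contexts where many words may appear are weighted down.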
Lecture Notes in Computer Science, 2012
Lecture Notes in Computer Science, 2006
Different evaluation measures assess different characteristics of machine learning algorithms. The empirical evaluation of algorithms and classifiers is a matter of on-going debate between researchers. Although most measures in use today focus on a classifier's ability to identify classes correctly, we suggest that, in certain cases, other properties, such as failure avoidance or class discrimination, may also be useful. We suggest the application of measures which evaluate such properties. These measures (Youden's index, likelihood, discriminant power) are used in medical diagnosis. We show that these measures are interrelated, and we apply them to a case study from the field of electronic negotiations. We also list other learning problems which may benefit from the application of the proposed measures.
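The abstract names the three measures but gives no formulas. The Python sketch below uses the definitions commonly found in the medical-diagnosis literature, which is an assumption about what the paper intends: Youden's index as sensitivity + specificity - 1, the positive and negative likelihood ratios, and discriminant power as (sqrt(3) / pi) times the sum of the log-odds of sensitivity and specificity. The function name and the confusion-matrix counts are hypothetical.

```python
import math


def diagnostic_measures(tp, fn, fp, tn):
    """Compute diagnosis-style measures from binary confusion-matrix counts.

    Assumes 0 < sensitivity < 1 and 0 < specificity < 1; otherwise the
    ratios and logarithms below are undefined.
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    youden = sensitivity + specificity - 1
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    discriminant_power = (math.sqrt(3) / math.pi) * (
        math.log(sensitivity / (1 - sensitivity))
        + math.log(specificity / (1 - specificity))
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": youden,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
        "discriminant_power": discriminant_power,
    }


# Made-up counts: 80 of 100 positives and 90 of 100 negatives classified correctly.
measures = diagnostic_measures(tp=80, fn=20, fp=10, tn=90)
```

All of these are functions of sensitivity and specificity alone, which is one way to see why the abstract calls the measures interrelated.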
Lecture Notes in Computer Science, 2006
The paper shows how to construct language patterns that signal influence strategies and tactical moves corresponding to such strategies. We apply corpus analysis methods to the extraction of certain multiword patterns from the text data of electronic negotiations. The patterns thus acquired become features in the task of classifying those texts. A series of machine learning experiments predicts the negotiation outcome from the texts associated with first halves of negotiations. We compare the results with the classification of complete negotiations.
Lecture Notes in Computer Science, 2005
Lecture Notes in Computer Science, 1998
The evaluation of a large implemented natural language processing system involves more than its application to a common performance task. Such tasks have been used in the message understanding conferences (MUCs), text retrieval conferences (TRECs) as well as in speech technology and machine translation workshops. It is useful to compare the performance of different systems in a predefined application, but a detailed evaluation must take into account the specificity of the system.
Lecture Notes in Computer Science, 2003
We propose an algorithm that will augment the structure of WordNet with links between the noun and verb hierarchies, by using word definitions extracted from Longman's Dictionary of Contemporary English. The results obtained show that a simple algorithm gives promising ...