Florian Kunneman | Vrije Universiteit Amsterdam
Papers by Florian Kunneman
To avoid a sarcastic message being understood in its unintended literal meaning, in microtexts such as messages on Twitter.com, sarcasm is often explicitly marked with a hashtag such as '#sarcasm'. We collected a training corpus of about 406 thousand Dutch tweets with hashtag synonyms denoting sarcasm. Assuming that the human labeling is correct (annotation of a sample indicates that about 90% of these tweets are indeed sarcastic), we train a machine learning classifier on the harvested examples, and apply it to a sample of a day's stream of 2.25 million Dutch tweets. Of the 353 explicitly marked tweets on this day, we detect 309 (87%) with the hashtag removed. We annotate the top of the ranked list of tweets most likely to be sarcastic that do not have the explicit hashtag. 35% of the top-250 ranked tweets are indeed sarcastic. Analysis indicates that the use of hashtags reduces the further use of linguistic markers for signalling sarcasm, such as exclamations and intensifiers. We hypothesize that explicit markers such as hashtags are the digital extralinguistic equivalent of non-verbal expressions that people employ in live interaction when conveying sarcasm. Checking the consistency of our finding in a language from another language family, we observe that in French the hashtag '#sarcasme' has a similar polarity switching function, albeit to a lesser extent.
This thesis argues why two different models of the Web, Web 2.0 and the Semantic Web, should be integrated with each other, and how this integration should proceed. At first sight the two Webs are far apart. Web 2.0 is aimed mainly at a low threshold for active participation by the user. Well-known components are weblogs, Wikipedia, YouTube and Facebook. With these services people can contribute content to the Web without needing much knowledge of the Web. The Semantic Web, by contrast, is considerably less oriented towards the ordinary user. It is the umbrella term for the pursuit of a gigantic database on the Web, in which data are interrelated in a meaningful way and in which that meaning can be understood by the computer. The data structure of the Semantic Web exists alongside HTML and is relatively complex. Yet the differences between the Semantic Web and Web 2.0 ...
The Internet owes its popularity in part to the fact that it makes a large amount of information accessible to people. The expression "we'll just google that", referring to the most widely used search engine on the Internet, has accordingly become commonplace. But although Google and other search engines are very popular, they demand an active stance from the user: it is up to him or her to judge whether search results are relevant and to navigate further on the basis of those results. The information a search engine leads to is, so to speak, 'flat': it is plain text with possible hyperlinks to other text. The nature of a hyperlink relation can only be given in the text itself. Searching in a database is more efficient in that respect. There, data have a fixed place and are related to each other in a way that machines can recognize. Based on a query, the data are searched automatically and the ...
A considerable portion of social media messages is devoted to current events. Aside from references to events that recently happened, social media messages may also refer to events that have not occurred yet. Future events, such as football matches in the case study we present here, may be scheduled and known to happen; other future events, such as transfers of football players, may only be rumoured, and may in fact not happen in the end. We describe a news mining component that learns to identify tweets referring to scheduled and unscheduled future events, by being trained on messages referring to scheduled future events (as the latter are easy to harvest). Our results show that discriminating between tweets that refer to upcoming football matches and tweets that refer to past matches can be done relatively reliably with supervised machine learning methods. However, when these trained models are applied to unscheduled events, performance drops to near-baseline performance. We discuss how these results can be explained by the distinction between event type and event domain.
The large number of messages posted on Twitter each day provides rich insights into real-world events and public opinion. However, it is difficult to automatically distinguish tweets referring to such events from everyday chatter, and subsequently to distinguish significant events affecting many people from insignificant events. We apply a term-pivot approach to event detection from the Twitter stream. In order to filter out noisy and mundane events, we train a machine learning classifier on several rich features, and rank the events based on classifier confidence. After training and re-training the classifier using manually annotated data, we obtain an F-score (β = 1) of 0.79. However, a baseline that only takes into account the frequency of the tweets that refer to an event yields a better F-score (β = 1) of 0.86. We argue that performance is highly related to the definition of what makes a significant event, and that human understanding of this concept is not uniform.
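For readers unfamiliar with the metric reported above, the F-score with β = 1 is the harmonic mean of precision and recall. The following is a minimal sketch of the standard formula; the function name `f_beta` and its arguments are chosen here for illustration and do not come from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero: define the score as 0.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator
```

With β = 1, precision and recall are weighted equally, so a score of 0.79 or 0.86 summarizes both in a single number.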
We present a method for the identification of future event start dates from Twitter streams. Taking hashtags or event name expressions as query terms, the method gathers a certain number of tweets about an event and uses clues in these tweets to estimate on what date the event will start. Clues include temporal expressions with knowledge-based and automatically generated estimations, and other predictive words. The estimation is performed either with a machine-learning classifier or by taking a majority vote over the temporal expressions found in the set of tweets. Results show that temporal expressions are indeed strong predictors. The majority-based and machine-learning approaches attain equal performance when trained and tested on a single event type, soccer matches, with an average estimation error of 0.05 days; but when tested on a range of different events, the majority-voting approach proves to be more robust than machine learning for this task, yielding high performance on all events. Still, per-event differences hint at a context in which machine learning might be beneficial.
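The majority-vote idea in the abstract above can be illustrated with a small sketch. It assumes that each temporal expression has already been resolved to a number of days ahead of the tweet's posting date; that extraction step, the function name `estimate_start_date`, and the tie-breaking rule (earliest date wins) are assumptions made here, not details taken from the paper.

```python
from collections import Counter
from datetime import date, timedelta

def estimate_start_date(mentions):
    """Majority vote over candidate event dates.

    mentions: iterable of (tweet_date, days_ahead) pairs, where days_ahead is
    the offset implied by a temporal expression in the tweet (assumed to be
    extracted beforehand).
    """
    votes = Counter(
        tweet_date + timedelta(days=days_ahead)
        for tweet_date, days_ahead in mentions
    )
    if not votes:
        return None
    # Most votes wins; ties are broken in favour of the earliest date.
    best_date, _ = max(votes.items(), key=lambda kv: (kv[1], -kv[0].toordinal()))
    return best_date

# Hypothetical example: two tweets point to 5 May, one to 6 May.
example = [(date(2013, 5, 1), 4), (date(2013, 5, 3), 2), (date(2013, 5, 4), 2)]
estimated = estimate_start_date(example)
```

A vote of this kind needs no training data, which is consistent with the abstract's observation that it transfers well across event types.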
We describe and test three methods to estimate the remaining time between a series of microtexts (tweets) and the future event they refer to via a hashtag. Our system generates hourly forecasts. A linear and a local regression-based approach are applied to map hourly clusters of tweets directly onto time-to-event. To take changes over time into account, we develop a novel time series analysis approach that first derives word frequency time series from sets of tweets and then performs local regression to predict time-to-event from nearest-neighbor time series. We train and test on a single type of event, Dutch premier league football matches. Our results indicate that in an 'early' stage, four days or more before the event, the time series analysis produces time-to-event predictions that are about one day off; closer to the event, local regression attains a similar accuracy. Local regression also outperforms both mean and median-based baselines, but on average none of the tested systems performs consistently well through time.
Explicit references on Twitter to future events can be leveraged to feed a fully automatic monitoring system of real-world events. We describe a system that extracts open-domain future events from the Twitter stream. It detects future time expressions and entity mentions in tweets, clusters tweets together that overlap in these mentions above certain thresholds, and summarizes these clusters into event descriptions that can be presented to users of the system. Terms for the event description are selected in an unsupervised fashion. We evaluated the system on a month of Dutch tweets, by showing the top-250 ranked events found in this month to human annotators. Eighty per cent of the candidate events were indeed assessed as being an event by at least three out of four human annotators, while all four annotators regarded sixty-three per cent as a real event. An added component to complement event descriptions with additional terms was not assessed better than the original system, due to the occasional addition of redundant terms. Comparing the found events to gold-standard events from maintained calendars on the Web mentioned in at least five tweets, the system yields a recall-at-250 of 0.20 and a recall based on all retrieved events of 0.40.
Hashtags in Twitter posts may carry different semantic payloads. Their dual form (word and label) may serve to categorize the tweet, but may also add content to the message, or strengthen it. Some hashtags are related to emotions. In a study on emotional hashtags in Dutch Twitter posts we employ machine learning classifiers to test to what extent tweets that are stripped of their hashtag could be re-assigned to this hashtag. About half of the 24 tested hashtags can be predicted with AUC scores of .80 or higher. However, when we apply the three best-performing classifiers to unseen tweets that do not carry the hashtag but might have carried it according to human annotators, the classifiers manage to attain a precision-at-250 of .7 for only two of the hashtags. We observe that some hashtags are predictable from their tweets, and strengthen the emotion already expressed in the tweets. Other hashtags are added to messages that do not predict them, presumably to provide emotional information that was not yet in the tweet.
Many events referred to on Twitter are of a periodic nature, characterized by roughly constant time intervals in between occurrences. Examples are annual music festivals, weekly television programs, and the full moon cycle. We propose a system that can automatically identify periodic events from Twitter in an unsupervised and open-domain fashion. We first extract events from the Twitter stream by associating terms that have a high probability of denoting an event to the exact date of the event. We compare a timeline-based and a calendar-based approach to detecting periodic patterns from the event dates that are connected to these terms. After applying event extraction on over four years of Dutch tweets and scanning the resulting events for periodic patterns, the calendar-based approach yields a precision of 0.76 on the 500 top-ranked periodic events, while the timeline-based approach scores 0.63.
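The notion of "roughly constant time intervals" in the abstract above can be made concrete with a deliberately simplified check. This sketch is not the timeline-based or calendar-based method of the paper; the function name `is_roughly_periodic`, the median-interval criterion, and the `tolerance_days` parameter are all assumptions introduced for illustration.

```python
from datetime import date
from statistics import median

def is_roughly_periodic(event_dates, tolerance_days=1):
    """Return (periodic, typical_interval_in_days) for a list of event dates.

    A sequence counts as periodic here if every gap between consecutive dates
    lies within tolerance_days of the median gap (a simplifying assumption).
    """
    ordered = sorted(event_dates)
    if len(ordered) < 3:
        # Fewer than two intervals: no pattern can be established.
        return False, None
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    typical = median(gaps)
    periodic = all(abs(gap - typical) <= tolerance_days for gap in gaps)
    return periodic, typical

# Hypothetical weekly event: four occurrences, seven days apart.
weekly = [date(2014, 1, 1), date(2014, 1, 8), date(2014, 1, 15), date(2014, 1, 22)]
weekly_result = is_roughly_periodic(weekly)
```

A tolerance of one day allows for events whose extracted dates are slightly off; a calendar-aware method would additionally handle patterns such as "the first Saturday of each month", which this interval check does not capture.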
BOOK OF ABSTRACTS OF THE 23RD MEETING OF COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS: CLIN 2013, 2013
A considerable portion of social media messages is devoted to current events. Aside from references to events that recently happened, social media messages may also refer to events that have not occurred yet. Future events, such as football matches in the case study we present here, may be scheduled and known to happen; other future events, such as transfers of football players, may only be rumoured, and may in fact not happen in the end. We describe a news mining component that learns to identify tweets referring to scheduled and unscheduled future events, by being trained on messages referring to scheduled future events (as the latter are easy to harvest). Our results show that discriminating between tweets that refer to upcoming football matches and tweets that refer to past matches can be done relatively reliably with supervised machine learning methods. However, when these trained models are applied to unscheduled events, performance drops to near-baseline performance. We discuss how these results can be explained by the distinction between event type and event domain.
Information Processing & Management, 2014
To avoid a sarcastic message being understood in its unintended literal meaning, in microtexts such as messages on Twitter.com, sarcasm is often explicitly marked with a hashtag such as '#sarcasm'. We collected a training corpus of about 406 thousand Dutch tweets with hashtag synonyms denoting sarcasm. Assuming that the human labeling is correct (annotation of a sample indicates that about 90% of these tweets are indeed sarcastic), we train a machine learning classifier on the harvested examples, and apply it to a sample of a day's stream of 2.25 million Dutch tweets. Of the 353 explicitly marked tweets on this day, we detect 309 (87%) with the hashtag removed. We annotate the top of the ranked list of tweets most likely to be sarcastic that do not have the explicit hashtag. 35% of the top-250 ranked tweets are indeed sarcastic. Analysis indicates that the use of hashtags reduces the further use of linguistic markers for signalling sarcasm, such as exclamations and intensifiers. We hypothesize that explicit markers such as hashtags are the digital extralinguistic equivalent of non-verbal expressions that people employ in live interaction when conveying sarcasm. Checking the consistency of our finding in a language from another language family, we observe that in French the hashtag '#sarcasme' has a similar polarity switching function, albeit to a lesser extent.
BOOK OF ABSTRACTS OF THE 23RD MEETING OF COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS: CLIN 2013, 2013
Events are mainly described in textual data by domain terms, verbs, time expressions, place names and participant information. Human readers understand features and the phase of the event by decoding these signals. Textual descriptions of events change with the time of the event and with the time the event is described. Therefore, analysis of this change in linguistic structure may provide insights to be implemented in information systems in order to identify an event phase automatically.