Delphine Bernhard - Academia.edu
Papers by Delphine Bernhard
Text readability depends on a variety of variables. While lexico-semantic and syntactic factors have been widely used in the literature, higher-level discursive and cognitive properties such as cohesion and coherence have received little attention. This paper assesses the efficiency of 41 measures of text cohesion and text coherence as predictors of text readability. We compare results obtained manually on two corpora comprising texts of different difficulty levels, and show that some cohesive features are indeed useful predictors.
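The 41 cohesion and coherence measures are not listed in this abstract; as a rough sketch of what such a correlational screening could look like, the snippet below (with invented feature names and values, not the paper's actual measures) checks how two illustrative cohesion features correlate with known difficulty levels.

```python
# Minimal sketch: screening candidate cohesion features as readability
# predictors via correlation with known difficulty levels. Feature names
# and values are illustrative assumptions, not the paper's 41 measures.
from scipy.stats import pearsonr

# Each text is a dict of feature values plus its difficulty level (e.g. 1-4).
texts = [
    {"lexical_overlap": 0.42, "pronoun_density": 0.08, "difficulty": 1},
    {"lexical_overlap": 0.35, "pronoun_density": 0.11, "difficulty": 2},
    {"lexical_overlap": 0.28, "pronoun_density": 0.15, "difficulty": 3},
    {"lexical_overlap": 0.22, "pronoun_density": 0.14, "difficulty": 4},
]

difficulty = [t["difficulty"] for t in texts]
for feature in ("lexical_overlap", "pronoun_density"):
    values = [t[feature] for t in texts]
    r, p = pearsonr(values, difficulty)
    print(f"{feature}: r={r:.2f}, p={p:.3f}")
```

Features with a strong and significant correlation would be retained as readability predictors; the others discarded.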
Lecture Notes in Computer Science, 2011
Automatically determining the publication date of a document is a complex task, since a document may contain only a few intra-textual hints about its publication date. Yet, it has many important applications. Indeed, the amount of digitized historical documents is constantly increasing, but their publication dates are not always properly identified during OCR acquisition. Accurate knowledge about publication dates is crucial for many applications, e.g. studying the evolution of document topics over a certain period of time. In this article, we present a method for automatically determining the publication dates of documents, which was evaluated on a French newspaper corpus in the context of the DEFT 2011 evaluation campaign. Our system is based on a combination of different individual systems, relying on both supervised and unsupervised learning, and uses several external resources, e.g. Wikipedia, Google Books Ngrams, and etymological background knowledge about the French language. Our system detects the correct year of publication in 10% of the cases for 300-word excerpts and in 14% of the cases for 500-word excerpts, which is very promising given the complexity of the task.
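The abstract mentions Google Books Ngrams among the external resources; a minimal sketch of one plausible unsupervised component, scoring candidate years by per-year unigram frequencies, might look like this. The frequency table, smoothing constant, and example words are invented for illustration and are not the authors' actual system.

```python
# Toy sketch: score candidate years by how typical an excerpt's words are
# for each year, using per-year unigram frequencies (e.g. precomputed from
# Google Books Ngrams). All values below are illustrative assumptions.
import math

# freq_by_year[year][word] = relative frequency of the word in that year
freq_by_year = {
    1850: {"diligence": 2e-5, "télégraphe": 1e-6},
    1900: {"diligence": 5e-6, "télégraphe": 8e-6},
    1950: {"diligence": 1e-6, "télégraphe": 3e-6},
}

def score_year(tokens, year, smoothing=1e-9):
    """Log-probability of the excerpt under the year's unigram model."""
    table = freq_by_year[year]
    return sum(math.log(table.get(tok, 0.0) + smoothing) for tok in tokens)

excerpt = ["diligence", "télégraphe"]
best_year = max(freq_by_year, key=lambda y: score_year(excerpt, y))
print(best_year)
```

In the paper's setting, such a component would be one voter among several supervised and unsupervised systems whose outputs are combined.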
SHS Web of Conferences, 2012
Journal of the American Medical Informatics Association, 2011
Objective: This paper describes the approaches the authors developed while participating in the i2b2/VA 2010 challenge to automatically extract medical concepts and annotate assertions on concepts and relations between concepts. Design: The authors' approaches rely on both rule-based and machine-learning methods. Natural language processing is used to extract features from the input texts; these features are then used in the authors' machine-learning approaches. The authors used Conditional Random Fields for concept extraction, and Support Vector Machines for assertion and relation annotation. Depending on the task, the authors tested various combinations of rule-based and machine-learning methods.
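As a hedged sketch of the two-stage setup named in the abstract (a CRF for concept extraction, an SVM for assertion annotation), the snippet below uses sklearn-crfsuite and scikit-learn; the feature functions, labels, and training examples are simplified placeholders, not the authors' actual configuration.

```python
# Sketch of a CRF + SVM pipeline: the CRF tags concept spans in token
# sequences (BIO labels), and the SVM classifies assertions on extracted
# concepts. Features here are illustrative placeholders.
import sklearn_crfsuite
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

def token_features(sent, i):
    word = sent[i]
    return {"word.lower": word.lower(), "is_title": word.istitle(),
            "prev": sent[i - 1].lower() if i > 0 else "<BOS>"}

# Stage 1: concept extraction with a CRF.
sents = [["Patient", "denies", "chest", "pain"]]
labels = [["O", "O", "B-problem", "I-problem"]]
X = [[token_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)

# Stage 2: assertion classification with an SVM over concept-level features.
concept_feats = [{"cue_negation": 1, "section": "hpi"},
                 {"cue_negation": 0, "section": "hpi"}]
assertions = ["absent", "present"]
vec = DictVectorizer()
svm = SVC(kernel="linear").fit(vec.fit_transform(concept_feats), assertions)
```

The same SVM machinery would extend to relation annotation by building feature dictionaries over concept pairs instead of single concepts.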
Analysing lexical complexity is a task that has mainly attracted the attention of psycholinguists and language teachers. More recently, this issue has seen growing interest in the field of Natural Language Processing (NLP) and, in particular, in automatic text simplification. The aim of this task is to identify words and structures which may be difficult for a target audience to understand, and to provide automated tools to simplify these contents. This article focuses on the lexical issue by identifying a set of predictors of lexical complexity whose efficiency is assessed with a correlational analysis. The best of these variables are integrated into a model able to predict the difficulty of words for learners of French.
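A minimal sketch of the final modelling step described above, assuming a handful of classic word-level predictors (length, log frequency, syllable count) and invented difficulty ratings; the paper's actual predictors and model are not specified in the abstract.

```python
# Minimal sketch: combine screened word-level predictors into a regression
# model of word difficulty for learners. Predictors, example words, and
# difficulty ratings are illustrative assumptions.
from sklearn.linear_model import LinearRegression

# Per-word predictors: [length, log relative frequency, syllable count]
X = [[4, -2.1, 1],    # e.g. "chat"
     [7, -3.5, 2],    # e.g. "fenêtre"
     [13, -5.2, 4]]   # e.g. "incontestable"
y = [1.0, 2.5, 4.8]   # difficulty ratings (e.g. from learner annotations)

model = LinearRegression().fit(X, y)
print(model.predict([[9, -4.0, 3]]))  # estimated difficulty of a new word
```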