Neus Català | Universitat Politecnica de Catalunya

Papers by Neus Català

PLOS ONE, Dec 16, 2021

In his pioneering research, G. K. Zipf formulated a couple of statistical laws on the relationship between the frequency of a word and its number of meanings: the law of meaning distribution, relating the number of meanings of a word to its frequency rank, and the meaning-frequency law, relating the frequency of a word to its number of meanings. Although these laws were formulated more than half a century ago, they have only been investigated in a few languages. Here we present the first study of these laws in Catalan. We verify these laws in Catalan via the relationship among their exponents and that of the rank-frequency law. We present a new protocol for the analysis of these Zipfian laws that can be extended to other languages. We report the first evidence of two marked regimes for these laws in written language and speech, paralleling the two regimes in Zipf's rank-frequency law in large multi-author corpora discovered in the early 2000s. Finally, we discuss the implications of these two regimes.
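
As an illustration of what "the relationship among their exponents" can mean, here is a minimal sketch. It is not the protocol of the paper. It assumes three pure power laws: frequency f ∝ i^(-alpha) against rank i (rank-frequency law), number of meanings mu ∝ i^(-gamma) against rank (law of meaning distribution), and mu ∝ f^delta (meaning-frequency law). Under those assumptions, substituting the first law into the second gives delta = gamma / alpha. The data are synthetic, and the names `loglog_slope`, `alpha`, `gamma` and `delta` are chosen for this example only.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) regressed on log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx

# Synthetic, noise-free data (assumed exponents: alpha = 1, gamma = 0.5).
ranks = list(range(1, 1001))
freqs = [1e6 * r ** -1.0 for r in ranks]      # rank-frequency law
meanings = [50.0 * r ** -0.5 for r in ranks]  # law of meaning distribution

alpha = -loglog_slope(ranks, freqs)     # exponent of the rank-frequency law
gamma = -loglog_slope(ranks, meanings)  # exponent of the meaning distribution law
delta = loglog_slope(freqs, meanings)   # exponent of the meaning-frequency law

print(alpha, gamma, delta, gamma / alpha)
```

On real corpora the fitted exponents are noisy and may differ between regimes, so the identity delta = gamma / alpha would hold only approximately.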

Pattern Recognition Letters, Jul 1, 2017

The current wide access to data from different neuroimaging techniques has made it possible to explore whether objective criteria can be found for diagnostic purposes. In order to decide which features of the data are relevant for the diagnostic task, we present in this paper a simple method for feature selection based on kernel alignment with the ideal kernel in support vector machines (SVM). The method shows state-of-the-art performance while being more efficient than other methods for feature selection in SVM. It is also less prone to overfitting due to the properties of the alignment measure. These properties are essential in neuroimaging studies, where the number of features representing recordings is usually very large compared with the number of recordings. The method has been applied to a dataset in order to determine objective criteria for the diagnosis of schizophrenia. The dataset consists of multichannel magnetoencephalogram (MEG) recordings taken while a set of schizophrenia patients and a control group performed a mismatch negativity (MMN) auditory task. All signal frequency bands are analyzed (from δ (1-4 Hz) to high-frequency γ (60-200 Hz)), and the signal correlations among the different sensors for these frequencies are used as features.
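
The abstract does not spell out the selection procedure, so the following is only a sketch of the general idea. It uses the standard kernel-target alignment, A(K, yy^T) = <K, yy^T>_F / (||K||_F ||yy^T||_F), where yy^T is the ideal kernel built from labels y in {-1, +1}. Scoring each feature by the alignment of its own centered linear kernel is an assumption made for this example, as are the function names and the toy data.

```python
import math

def alignment(K, y):
    """Kernel-target alignment between kernel matrix K and the ideal kernel yy^T."""
    n = len(y)
    inner = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(K[i][j] ** 2 for i in range(n) for j in range(n)))
    # For labels in {-1, +1}, every entry of yy^T is +-1, so ||yy^T||_F = n.
    return inner / (norm_k * n) if norm_k > 0 else 0.0

def rank_features(X, y):
    """Score each feature by the alignment of its centered linear kernel.

    Returns (feature indices sorted best first, list of scores).
    """
    n_features = len(X[0])
    scores = []
    for f in range(n_features):
        column = [row[f] for row in X]
        mean = sum(column) / len(column)
        centered = [v - mean for v in column]
        K = [[a * b for b in centered] for a in centered]
        scores.append(alignment(K, y))
    order = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return order, scores

# Toy data: feature 0 separates the classes, feature 1 is mostly noise.
X = [[1.0, 0.3], [0.9, -0.2], [-1.1, 0.25], [-1.0, -0.3]]
y = [1, 1, -1, -1]
order, scores = rank_features(X, y)
print(order, scores)
```

Each score needs only one pass over an n-by-n kernel matrix and no SVM training, which is consistent with the efficiency claim above, although the method in the paper may combine or search over features differently.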

BMC Medical Informatics and Decision Making

Background: The exponential growth of digital healthcare data is fueling the development of Knowledge Discovery in Databases (KDD). Extracting temporal relationships between medical events is essential to reveal hidden patterns that can help physicians find optimal treatments, diagnose illnesses, detect adverse drug reactions, and more. This paper presents an approach for the extraction of patient evolution patterns from electronic health records written in Catalan and/or Spanish. Methods: We propose a robust formulation for extracting Temporal Association Rules (TARs) that goes beyond simple rule extraction by considering the sequence of multiple visits. Our highly configurable algorithm leverages this formulation to extract Temporal Association Rules from sequences of medical instances. We can generate rules in the desired format, content, and temporal factors while accounting for different levels of abstraction of medical instances. To demonstrate the effectiveness of our methodolo...
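
To make the notion of a temporal association rule over visit sequences concrete, here is a deliberately simplified sketch. It is not the formulation of the paper. It assumes each patient is a chronological list of visits, each visit is a set of codes, and a rule "a then b" holds for a patient when a appears in a visit strictly before a visit containing b. Support counts patients, and confidence divides by the number of patients in whom a occurs at all. The function name, thresholds and the made-up toy records are all hypothetical.

```python
from collections import Counter

def temporal_rules(patients, min_support=2, min_confidence=0.5):
    """Mine 'a then b' rules from per-patient visit sequences.

    Returns {(a, b): (support, confidence)} for rules passing both thresholds.
    """
    item_count = Counter()  # number of patients in whom an item occurs
    pair_count = Counter()  # number of patients in whom a precedes b
    for visits in patients:
        earlier = set()     # items seen in previous visits of this patient
        pairs = set()
        for visit in visits:
            for b in visit:
                for a in earlier:
                    if a != b:
                        pairs.add((a, b))
            earlier |= set(visit)
        item_count.update(earlier)
        pair_count.update(pairs)
    rules = {}
    for (a, b), support in pair_count.items():
        confidence = support / item_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules

# Made-up toy records, for illustration only.
patients = [
    [{"hypertension"}, {"ace_inhibitor"}, {"cough"}],
    [{"hypertension"}, {"ace_inhibitor", "cough"}],
    [{"hypertension"}, {"diuretic"}],
]
rules = temporal_rules(patients)
print(rules)
```

A fuller treatment would add the temporal factors and abstraction levels mentioned above, for example a maximum gap between visits or a mapping from specific codes to broader categories.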

In this paper we present a semantic role labeling system submitted to the CoNLL-2005 shared task. The system makes use of partial and full syntactic information and converts the task into sequential BIO tagging. As a result, the labeling architecture is very simple. Building on a state-of-the-art set of features, a binary classifier for each label is trained using AdaBoost with fixed-depth decision trees. The final system, which combines the outputs of two base systems, achieved F1 = 76.59 on the official test set. Additionally, we provide results comparing the system when using partial vs. full parsing input information.
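
The conversion to BIO tagging can be sketched as follows. This is a token-level illustration only: the abstract does not say which units the system actually tags, and the function name, the span format (start, exclusive end, label) and the example sentence are assumptions.

```python
def spans_to_bio(n_tokens, spans):
    """Convert labeled argument spans into one BIO tag per token.

    spans: iterable of (start, end, label) with end exclusive and no overlaps.
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label          # first token of the argument
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # remaining tokens of the argument
    return tags

tokens = ["The", "cat", "chased", "the", "mouse"]
spans = [(0, 2, "A0"), (2, 3, "V"), (3, 5, "A1")]
tags = spans_to_bio(len(tokens), spans)
print(list(zip(tokens, tags)))
# tags == ['B-A0', 'I-A0', 'B-V', 'B-A1', 'I-A1']
```

Once arguments are encoded this way, one binary classifier per tag can score each position, which is what keeps the labeling architecture simple.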

One important issue when constructing Information Extraction systems is how to obtain the knowledge needed for identifying relevant information in a document. In most approaches to this issue, human expert intervention is necessary in many steps of the acquisition process. In this paper we describe ESSENCE, a new methodology that significantly reduces the need for human intervention. It is based on ELA, a new algorithm for acquiring information extraction patterns. The distinctive features of ESSENCE and ELA are that they 1) automatically acquire IE patterns from an unrestricted text corpus representative of the domain, thanks to 2) their ability to identify regularities in the context surrounding concept-words that are semantically relevant to the IE task, using non-domain-specific lexical knowledge tools and semantic relations from WordNet, and 3) restrict human intervention to the definition of the task and the validation and typification of the set of IE patterns obtained....

This paper describes our approach presented for the eHealth-KD 2019 challenge. Our participation was aimed at testing how far we could go using generic text-processing tools while, at the same time, using common optimization techniques from the field of data mining. The architecture proposed for both tasks of the challenge is a standard stacked 2-layer bi-LSTM. The main particularities of our approach are: (a) the use of a surrogate function of F1 as the loss function, to close the gap between the minimization function and the evaluation metric, and (b) the generation of an ensemble of models that produces predictions by majority vote. Our system ranked second in the main task with an F1 score of 62.18%, narrowly behind the winner, which scored 63.94%.
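
Both particularities can be sketched in a few lines. The abstract does not give the exact surrogate, so the "soft F1" below, which replaces hard counts of true positives, false positives and false negatives with sums of predicted probabilities, is one common choice and an assumption here. The function names and toy inputs are also hypothetical, and a real system would compute the loss on tensors inside a deep learning framework rather than on Python lists.

```python
from collections import Counter

def soft_f1_loss(probs, labels, eps=1e-8):
    """1 - soft F1, where counts are replaced by sums of probabilities.

    probs: predicted probabilities of the positive class; labels: 0/1 targets.
    """
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))
    soft_f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return 1.0 - soft_f1

def majority_vote(predictions):
    """Combine per-model label sequences position by position.

    predictions: list of equal-length label sequences, one per model.
    With an even number of models, ties go to the label seen first.
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]

perfect = soft_f1_loss([1.0, 0.0, 1.0], [1, 0, 1])
worst = soft_f1_loss([0.0, 1.0], [1, 0])
voted = majority_vote([["B", "O"], ["B", "I"], ["O", "O"]])
print(perfect, worst, voted)
```

Unlike F1 on thresholded predictions, this loss varies smoothly with the probabilities, which is what allows gradient-based training to target something close to the evaluation metric.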

I would like to thank all those people who, either directly or indirectly, have contributed to the conclusion (finally!) of this thesis. First and mainly, to my advisor, Horacio Rodríguez, for his permanent dedication and incommensurable patience. Secondly, to my family and friends, who have endured hearing me talk about the same topic for all these years. The colleagues, current and former, of the Natural Language Processing Research Group have always provided a helpful and friendly environment:

The Evolution of Language, 2014

Computer Speech & Language

The pioneering research of G. K. Zipf on the relationship between word frequency and other word features led to the formulation of various linguistic laws. The most popular is Zipf's law for word frequencies. Here we focus on two laws that have been studied less intensively: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e. the tendency of more frequent words to be shorter. In a previous work, we tested the robustness of these Zipfian laws for English, roughly measuring word length in number of characters and distinguishing adult from child speech. In the present article, we extend our study to other languages (Dutch and Spanish) and introduce two additional measures of length: syllabic length and phonemic length. Our correlation analysis indicates that both the meaning-frequency law and the law of abbreviation hold overall in all the analyzed languages.
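
A correlation analysis of this kind can be sketched with a rank correlation. The abstract does not name the coefficient used, so Spearman's rho is an assumption here, and the word statistics below are invented toy values rather than data from the study. Under the law of abbreviation one expects a negative correlation between frequency and length, and under the meaning-frequency law a positive correlation between frequency and number of meanings.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Invented toy values for five words.
frequency = [1000, 500, 120, 40, 7]
length = [2, 3, 5, 6, 9]       # e.g. characters, syllables or phonemes
meanings = [12, 9, 4, 3, 1]

rho_length = spearman(frequency, length)      # law of abbreviation: negative
rho_meanings = spearman(frequency, meanings)  # meaning-frequency law: positive
print(rho_length, rho_meanings)
```

The same function applies unchanged to each length measure, so comparing characters, syllables and phonemes only requires swapping the `length` list.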

Interaction Studies

Here we study polysemy as a potential learning bias in vocabulary learning in children. Words of low polysemy could be preferred as they reduce the disambiguation effort for the listener. However, such a preference could be a side effect of another bias: the preference of children for nouns, combined with the lower polysemy of nouns with respect to other part-of-speech categories. Our results show that mean polysemy in children increases over time in two phases, i.e. a fast growth until the 31st month followed by a slower tendency towards adult speech. In contrast, this evolution is not found in adults interacting with children. This suggests that children have a preference for non-polysemous words in their early stages of vocabulary acquisition. Interestingly, the evolutionary pattern described above weakens when controlling for syntactic category (noun, verb, adjective or adverb) but it does not disappear completely, suggesting that it could result from a combination of a standa...

Pattern Recognition Letters, 2016

Procesamiento del Lenguaje Natural. Actas de …, Jan 1, 1997

Abstract: One of the most important issues when constructing an Information Extraction System is how to obtain the knowledge needed for identifying relevant information in a document. A manual approach is not only an expensive solution but also has a negative ...

… and Tools in Knowledge-Based Systems, Jan 1, 1998

The most widespread way of acquiring information for knowledge-based systems is manual, frequently by means of a dialog between the system and the human expert (sometimes with the intervention of a knowledge engineer). However, the high cost of this approach, together ...
