Paula Cristina Brito - Academia.edu
Papers by Paula Cristina Brito
RPER
As part of a quality-of-life monitoring project developed by the Porto City Council (Câmara Municipal do Porto), a survey of 2,400 residents of the city was carried out to assess citizens' perceptions. Some of the questions concerned the concept of quality of life itself, with the aim of identifying the aspects that respondents consider essential for a city to offer good living conditions and well-being. This article presents the main results for these questions, first through a simple statistical treatment of the responses and then through a multivariate analysis that clusters the respondents into large homogeneous groups and characterizes them socio-economically.
TAX ON THE CIRCULATION OF GOODS (IMPOSTO SOBRE CIRCULAÇÃO DE MERCADORIAS) - BARS AND RESTAURANTS - The tax on the circulation of goods applies to the supply of meals in bars and restaurants.
Intelligent Data Analysis, 2006
Intelligent Data Analysis, 2003
In statistics, the term "official data" denotes data collected in censuses and statistical surveys by National Statistics Institutes (NSIs), as well as administrative and registration records collected by government departments and local authorities. They are used to produce "official statistics" for policy making and to support the understanding of economic, social, demographic, and other matters of interest to governments, government departments, local authorities, businesses, and the general public. For instance, population and economic census information is of great value in planning public services (education, fund allocation, public transport), as well as for private businesses (siting new factories, shopping malls, or banks, and marketing particular products). Moreover, survey data on specific topics, such as the labour force, time use, and household budgets, are regularly collected by NSIs to keep information on economic and social phenomena up to date. The application of data mining techniques to official data has great potential for supporting good public policy and underpinning the effective functioning of a democratic society. Nevertheless, it is not straightforward and requires challenging methodological research, which is still at an early stage. This special issue includes six papers that are updated and extended versions of papers selected from those presented at the Workshop on Mining Official Data, chaired by the guest editors of this issue in Helsinki in August 2002. The workshop was organized under the auspices of the European project KDNet (The Knowledge Discovery Network of Excellence) and within the framework of the 13th European Conference on Machine Learning (ECML'02) and the 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'02).
Several directions can be distinguished in approaching the problem of mining official data. In this issue, emphasis is placed on the following topics: Georeferencing. The practice of georeferencing census data has spread increasingly over the last few decades, and the techniques for attaching socioeconomic data to specific locations have markedly improved at the same time. In the UK, for instance, household expenditure data are provided for each enumeration district (ED), the smallest areal unit for which census data are published. At the same time, vectorized boundaries of the 1991 census EDs enable the investigation of socioeconomic phenomena in association with the geographical location of EDs. These advances have created a growing demand for more powerful data analysis techniques that can link population data to their spatial distribution. In this context, a European project, SPIN, has been developed to address problems concerning georeferencing. SPIN's...
Interdisciplinary sciences, computational life sciences, 2018
In this work, we study reverse complementary genomic word pairs in human DNA by comparing both the distance distribution and the frequency of a word with those of its reverse complement. Several measures of dissimilarity between distance distributions are considered, and the peak dissimilarity is found to work best in this setting. We report the existence of reverse complementary word pairs with very dissimilar distance distributions, as well as word pairs with very similar distance distributions even when both distributions are irregular and contain strong peaks. The association between distribution dissimilarity and frequency discrepancy is also explored, and we speculate that symmetric pairs combining low and high values of each measure may uncover features of interest. Taken together, our results suggest that some asymmetries in the human genome go far beyond Chargaff's rules. This study uses both the complete human genome and its repeat-masked version.
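To make the objects in this abstract concrete, the sketch below computes a word's reverse complement and the empirical distribution of distances between consecutive occurrences of a word in a sequence. It is an illustration under stated assumptions, not the authors' pipeline: overlapping occurrences are counted, only the symbols A, C, G, T are handled, and the helper names are hypothetical. The paper's preferred measure, the peak dissimilarity, is not defined in the abstract, so a plain total variation distance is swapped in purely as a stand-in.

```python
from collections import Counter

# Watson-Crick complements; ambiguous symbols such as N are not handled in this sketch.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(word: str) -> str:
    """Reverse the word and replace each nucleotide by its complement."""
    return "".join(COMPLEMENT[base] for base in reversed(word))


def occurrences(sequence: str, word: str) -> list:
    """Start positions of `word` in `sequence`, overlapping matches included."""
    k = len(word)
    return [i for i in range(len(sequence) - k + 1) if sequence[i:i + k] == word]


def distance_distribution(sequence: str, word: str) -> dict:
    """Relative frequency of each distance between consecutive occurrences."""
    positions = occurrences(sequence, word)
    gaps = Counter(b - a for a, b in zip(positions, positions[1:]))
    total = sum(gaps.values())
    return {d: n / total for d, n in sorted(gaps.items())} if total else {}


def total_variation(p: dict, q: dict) -> float:
    """Stand-in dissimilarity between two distance distributions.

    This is NOT the peak dissimilarity used in the paper; it only shows
    where such a measure plugs in.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(d, 0.0) - q.get(d, 0.0)) for d in support)


if __name__ == "__main__":
    seq = "ACGAACGAAAACGT"  # toy sequence, not real genomic data
    word = "ACG"
    p = distance_distribution(seq, word)
    q = distance_distribution(seq, reverse_complement(word))
    print(word, p, reverse_complement(word), q, total_variation(p, q))
```

On real chromosomes one would stream the sequence and skip masked or ambiguous positions rather than hold everything in a Python string.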
These guidelines describe the formatting instructions and the submission procedure, both of which must be followed strictly. Since all texts will be compiled together, files that do not comply with this format may not be accepted. The text of these guidelines is written in the prescribed LaTeX format and can be used as a template.
This paper explores the determinants of debt maturity for a sample of 3,306 non-financial listed firms from thirteen European countries (twelve Euro Zone countries and the United Kingdom) in 2011. Following the literature, two sets of explanatory variables are included: (i) firm characteristics and (ii) the institutional environment. The firm-level variables are growth opportunities, size, tax, firm value volatility, quality, rating, asset maturity, and leverage. The model also includes country-level explanatory variables describing the term structure and volatility of interest rates, the efficiency and type of legal system, banking sector size, and the turnover and size of the stock market. Overall, our results confirm prior predictions in the literature about the influence of firm-level variables on debt maturity, with the exception of tax and growth opportunities. The results also suggest that the type of legal system has a significant impact on debt maturity and the higher the size of banking system in the e...
Statistical Analysis and Data Mining: The ASA Data Science Journal, 2015
Histogram-valued variables are a particular kind of variable studied in Symbolic Data Analysis, in which each entity under analysis is described by a distribution that may be represented by a histogram or by a quantile function. Linear regression models for this type of data are necessarily more complex than a simple generalization of the classical model: the parameters cannot be negative, yet the linear relation between the variables must be allowed to be either direct or inverse. In this work, we propose a new linear regression model for histogram-valued variables that solves this problem, named the Distribution and Symmetric Distribution Regression Model. Determining the parameters of this model requires solving a quadratic optimization problem subject to non-negativity constraints on the unknowns; the error between the predicted and observed distributions is measured with the Mallows distance. As in classical analysis, the model is associated with a goodness-of-fit measure whose values range between 0 and 1. Applications of the proposed model to real and simulated data are presented.
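The two ingredients named in this abstract, quantile functions of histograms and the Mallows distance between them, can be sketched as follows. The sketch assumes values are uniformly distributed within each bin (a common convention for histogram-valued data) and approximates the Mallows distance, the square root of the integral over [0, 1] of (Psi_x(t) - Psi_y(t))^2, with a midpoint rule. The prediction function assumes the single-predictor form v + a*Psi_x(t) - b*Psi_x(1 - t) with a, b >= 0, where -Psi_x(1 - t) is the quantile function of the "symmetric" distribution of X; that form is our reading of the model's name and constraints, not a quotation from the paper, and the constrained fitting step is omitted. All function names are hypothetical.

```python
import math
from bisect import bisect_left
from itertools import accumulate

# A histogram is a list of (lower, upper, weight) bins, ordered and non-overlapping,
# with weights summing to 1; values are assumed uniform within each bin.


def quantile(hist, t):
    """Piecewise-linear quantile function Psi(t) of a histogram, for t in [0, 1]."""
    cum = list(accumulate(w for _, _, w in hist))
    i = min(bisect_left(cum, t), len(hist) - 1)  # first bin whose cumulative weight >= t
    low, high, w = hist[i]
    start = cum[i] - w
    frac = (t - start) / w if w > 0 else 0.0
    return low + frac * (high - low)


def mallows_distance(x, y, n=2000):
    """Mallows distance between two histograms, integrated by the midpoint rule."""
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        diff = quantile(x, t) - quantile(y, t)
        total += diff * diff
    return math.sqrt(total / n)


def dsd_prediction(x, t, v, a, b):
    """Assumed form v + a*Psi_x(t) - b*Psi_x(1 - t) with non-negative a, b.

    Both Psi_x(t) and -Psi_x(1 - t) are non-decreasing in t, so a non-negative
    combination is again a valid quantile function, while the b-term lets the
    relation be inverse.
    """
    if a < 0 or b < 0:
        raise ValueError("a and b must be non-negative")
    return v + a * quantile(x, t) - b * quantile(x, 1 - t)


if __name__ == "__main__":
    u01 = [(0.0, 1.0, 1.0)]
    u12 = [(1.0, 2.0, 1.0)]
    print(mallows_distance(u01, u12))
```

The non-negativity check mirrors the constraint described in the abstract; estimating v, a, and b would additionally require a constrained least-squares solver.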
Symbolic Data Analysis generalizes the classical concept of a variable by allowing new forms of realizations, such as intervals, sets, and distributions. Histogram-valued variables are a type of symbolic variable in which each entity under analysis is described by a frequency distribution. In recent years, concepts and methods of classical statistics have been adapted to symbolic data of different types, and there is growing interest in the analysis of histogram-valued variables. In this work, a new linear regression model and a goodness-of-fit measure for histogram-valued variables are proposed. Using this model, we present some simulation results for a particular case.
Symbolic Data Analysis is concerned with data tables in which the values in each cell are not single values but elements that express the variability of the records, e.g., intervals or histograms. Symbolic linear regression investigates the linear relationship between histogram-valued or interval-valued variables. In this paper, we study two real-data problems: in the first, symbolic models are used to predict the distribution of the number of violent crimes in US states from socio-economic characteristics; in the second, the objective is to predict the ranges of burned area in the Montesinho Natural Park from the ranges of specific weather conditions. The main goal of this work is to compare the performance of symbolic regression models with that of classical ones.
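For readers new to interval-valued regression, the sketch below shows one widely cited baseline from the symbolic data literature, the center method: fit ordinary least squares to the interval midpoints, then apply the fitted line to the lower and upper bounds of a new predictor interval. The abstract does not say which symbolic models were compared, so this is only an illustration of the general idea; the function names and the toy data are hypothetical, and sorting the predicted bounds when the slope is negative is a small addition of ours.

```python
def fit_center_method(xs, ys):
    """Least-squares line through interval midpoints; returns (intercept, slope).

    xs, ys: lists of (lower, upper) intervals for the predictor and the response.
    """
    cx = [(lo + hi) / 2 for lo, hi in xs]
    cy = [(lo + hi) / 2 for lo, hi in ys]
    n = len(cx)
    mx, my = sum(cx) / n, sum(cy) / n
    sxx = sum((c - mx) ** 2 for c in cx)
    sxy = sum((c - mx) * (d - my) for c, d in zip(cx, cy))
    if sxx == 0:
        raise ValueError("predictor midpoints have zero variance")
    slope = sxy / sxx
    return my - slope * mx, slope


def predict_interval(model, interval):
    """Apply the fitted line to both bounds; sort them in case the slope is negative."""
    b0, b1 = model
    lo, hi = b0 + b1 * interval[0], b0 + b1 * interval[1]
    return (min(lo, hi), max(lo, hi))


if __name__ == "__main__":
    # Toy data where the response interval is exactly 1 + 2 * predictor interval.
    xs = [(0.0, 2.0), (2.0, 4.0), (4.0, 6.0)]
    ys = [(1.0, 5.0), (5.0, 9.0), (9.0, 13.0)]
    model = fit_center_method(xs, ys)
    print(model, predict_interval(model, (1.0, 3.0)))
```

Because only midpoints enter the fit, the center method ignores interval widths during estimation, which is one motivation for the richer symbolic models discussed in this line of work.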
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2014
Symbolic Data Analysis (SDA) provides a framework for the representation and analysis of data that include inherent variability. While in Data Mining and classical Statistics the data to be analyzed usually present a single value for each variable, that is no longer the case when the entities under analysis are not single elements but groups gathered on the basis of some given criteria. In that case, the variability inherent to each group should be taken into account for each variable. Likewise, when analysing concepts such as botanical species, disease descriptions, or car models, the data entail intrinsic variability that should be explicitly considered. To this end, new variable types have been introduced whose realizations are not single real values or categories, but sets, intervals, or, more generally, distributions over a given domain. SDA provides methods for the (multivariate) analysis of such data that take into account the variability expressed in the data representation, using various approaches.
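The step described here, moving from individual records to group-level descriptions that retain variability, can be illustrated with a short sketch. Under the assumption of one numeric and one categorical attribute per record, each group is described by an interval (the observed minimum and maximum) and by a relative-frequency distribution over the categories. The record layout, names, and toy car-model data are hypothetical, and real SDA tools offer many other symbolic variable types.

```python
from collections import Counter, defaultdict


def symbolic_description(records):
    """Aggregate (group, numeric_value, category) records into symbolic descriptions.

    Each group gets an interval-valued description of the numeric attribute
    and a relative-frequency distribution over the categorical attribute.
    """
    values = defaultdict(list)
    categories = defaultdict(Counter)
    for group, value, category in records:
        values[group].append(value)
        categories[group][category] += 1

    table = {}
    for group, vals in values.items():
        total = sum(categories[group].values())
        table[group] = {
            "interval": (min(vals), max(vals)),
            "distribution": {c: n / total for c, n in sorted(categories[group].items())},
        }
    return table


if __name__ == "__main__":
    # Hypothetical microdata: (car model, engine power in hp, fuel type).
    cars = [
        ("model_A", 150, "petrol"),
        ("model_A", 180, "diesel"),
        ("model_A", 165, "petrol"),
        ("model_B", 90, "petrol"),
    ]
    for group, description in symbolic_description(cars).items():
        print(group, description)
```

Each resulting row describes a concept (here, a car model) rather than an individual, which is the change of statistical unit the abstract refers to.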
Studies in Classification, Data Analysis, and Knowledge Organization, 2014
Starting from the central idea of Symbolic Data Analysis, extending Statistics and Data Mining methods from first-order to second-order objects, we focus on network data (as defined in the framework of Social Network Analysis) to define a graph structure and the underlying network in the context of complex data objects. A Network Symbolic description is defined according to the statistical characterization of the network's topological properties. We use suitable network measures, which are represented by means of symbolic variables. Their study through multidimensional data analysis allows a network to be represented synthetically as a point in a metric space. The proposed approach is discussed on the basis of a simulation study considering three classical network growth processes.
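As a minimal illustration of representing a network measure by a symbolic variable, the sketch below turns the node degrees of a simple undirected graph into a histogram-valued description (the relative frequency of each degree). The abstract does not list which network measures were used, so degree is only an assumed example; the function name and the toy graphs are hypothetical. Other node-level measures could be summarized the same way, one histogram per measure, giving each network a row of symbolic variables.

```python
from collections import Counter


def degree_histogram(edges, nodes=()):
    """Relative frequency of each node degree in a simple undirected graph.

    edges: iterable of (u, v) pairs without self-loops or duplicates.
    nodes: optional extra nodes, so that isolated nodes count with degree 0.
    """
    degree = {node: 0 for node in nodes}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    counts = Counter(degree.values())
    total = len(degree)
    return {d: c / total for d, c in sorted(counts.items())} if total else {}


if __name__ == "__main__":
    star = [(0, 1), (0, 2), (0, 3)]
    path = [(0, 1), (1, 2), (2, 3)]
    # Two networks with the same number of nodes and edges but different descriptions.
    print(degree_histogram(star))
    print(degree_histogram(path))
```

Once each network is described by such histograms, a distance between distributions (for example the Mallows distance) can place networks as points in a metric space for multidimensional analysis.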
Symbolic Data Analysis and the SODAS Software
Symbolic data analysis is a relatively new field that provides a range of methods for analyzing complex datasets. Standard statistical methods do not have the power or flexibility to make sense of very large datasets, and symbolic data analysis techniques have been developed to extract knowledge from such data. Symbolic data methods differ from those of data mining, for example, because rather than identifying points of interest in the data, they allow the user to build models of the data and make predictions about future events. This book is the result of the work of a pan-European project team led by Edwin Diday, following three years of work sponsored by EUROSTAT. It includes a full explanation of the new SODAS software developed as a result of this project. The software and methods described highlight the crossover between statistics and computer science, with a particular emphasis on data mining.
IFIP International Federation for Information Processing
We propose a Multi-Agent framework to analyze the dynamics of organizational survival in cooperation networks. Firms can decide to cooperate horizontally (in the same market) or vertically with other firms that belong to the supply chain. Cooperation decisions are based on economic variables. We have defined a variant of the density dependence model to set up the survival dynamics in the simulation. To validate our model, we have used empirical outputs obtained in previous studies of the automobile manufacturing sector. We have observed that firms and networks proliferate in regions with lower marginal costs, but new networks keep appearing and disappearing in regions with higher marginal costs.
Frontiers in Chemistry, 2014
Revista Brasileira de Educação Médica, 2013
Health education activities in the school environment are health promotion practices that induce processes of collective transformation affecting the population's living conditions. This study aims to analyze how parents/guardians at public elementary schools perceive the participation of university students in health education activities of the Programa de Educação pelo Trabalho para Saúde (PET; Education through Work for Health Program). It is a qualitative, descriptive study, carried out by administering questionnaires to parents/guardians of schoolchildren aged six to 14 at two public elementary schools. One hundred questionnaires were administered, collecting data on the students' participation in health education activities and its contribution to the training of health professionals. The data were analyzed following the content analysis framework. The results of the study allow us to state that the schoolchildren's parents/guardians assess...