Daniela Stojanova | Jožef Stefan Institute

Papers by Daniela Stojanova

Dealing with Spatial Autocorrelation in Gene Flow Modeling

Models of the Ecological …, 2012

A qualitative decision-support system for evaluating forest models

gozdis.si

Decision making in forestry is a complex and important process.

Models of the Ecological Hierarchy: From Molecules to the Ecosphere

Dealing with Spatial Autocorrelation when Learning Predictive Clustering Trees

Ecological …, 2012

Spatial autocorrelation is the correlation among data values that is strictly due to the relative spatial proximity of the objects that the data refer to. Inappropriate treatment of data with spatial dependencies, where spatial autocorrelation is ignored, can obscure important insights. In this paper, we propose a data mining method that explicitly considers spatial autocorrelation in the values of the response (target) variable when learning predictive clustering models. The method is based on the concept of predictive clustering trees (PCTs), according to which hierarchies of clusters of similar data are identified and a predictive model is associated with each cluster. In particular, our approach is able to learn predictive models for both a continuous response (regression task) and a discrete response (classification task). We evaluate our approach on several real-world problems of spatial regression and spatial classification. Considering autocorrelation in the models yields predictions that are consistently clustered in space, with clusters that tend to preserve the spatial arrangement of the data, while also providing multi-level insight into the spatial autocorrelation phenomenon. The evaluation of the proposed method, SCLUS, in several ecological domains (e.g., predicting outcrossing rates within a conventional field due to the surrounding genetically modified fields, and predicting pollen dispersal rates from two lines of plants) confirms its capability of building spatially aware models that capture the spatial distribution of the target variable. In general, the maps obtained with SCLUS do not require further post-smoothing before they are used in practice.
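Global Moran's I is one standard statistic for quantifying the spatial autocorrelation described above. The sketch below computes it for a dense spatial weight matrix; it illustrates the statistic only and is not the SCLUS implementation. The function names and the chain-shaped (line adjacency) weight matrix are assumptions made for the example.

```python
def chain_weights(n):
    """Binary adjacency weights for n objects on a line (neighbours i-1 and i+1)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def morans_i(values, weights):
    """Global Moran's I for a list of values and a dense weight matrix.

    weights[i][j] is the spatial weight between objects i and j (0 on the diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(weights[i][j] for i in range(n) for j in range(n))
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return n * num / (w_sum * den)


w = chain_weights(4)
print(morans_i([1, 1, 5, 5], w))  # about 0.333: neighbours tend to have similar values
print(morans_i([1, 5, 1, 5], w))  # -1.0: neighbouring values alternate
```

Positive values indicate that nearby objects have similar target values and negative values indicate that they differ; in real applications the weights are usually derived from the distances between locations.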

Estimating Vegetation Height and Canopy Cover from Remotely

kt.ijs.si

High-quality information on forest resources is important for forest ecosystem management. Traditional ground measurements are labor- and resource-intensive, as well as expensive and time-consuming. For most Slovenian forests, there is extensive ground-based information on forest properties at selected sample locations. However, there is no continuous information on objectively measured vegetation height and canopy cover at an appropriate resolution. Currently, Light Detection And Ranging (LiDAR) technology provides detailed measurements of different forest properties because of its immediate generation of 3D data, its accuracy and its acquisition flexibility. However, existing LiDAR sensors have limited spatial coverage and a relatively high cost of acquisition. Satellite data, on the other hand, are low-cost and offer broader spatial coverage of generalized forest structure, but are not expected to provide accurate information about vegetation height. Integration of LiDAR and satellite data promises to improve the measurement, mapping, and monitoring of forest properties. The primary objective of this study is to model the vegetation height and canopy cover in Slovenia by integrating LiDAR data, Landsat satellite data, and machine learning techniques. This kind of integration combines the accuracy and precision of LiDAR data with the wide coverage of satellite data in order to generate cost-effective, realistic estimates of vegetation height and canopy cover, and consequently continuous forest vegetation map products for use in forest management and monitoring.

Considering Autocorrelation in Predictive Models

Web-Based GIS System: A Case Study from Slovenia

evkartenn.com

Large amounts of geographical data are increasingly used in many application domains, such as government, telecommunications, utilities, cadastre, land management, environment and ecology. Recently, internet technology has been moving Geographical Information Systems (GIS) towards Web-based applications, providing more visual information to end users and simplifying the interaction between users and GIS. We present a Web-based information system developed by the Slovenian Forestry Institute to support the hunting community in Slovenia. The system facilitates hunting by providing online, up-to-date information on various species at different locations. The information is made available at three levels: the country level, where it is aggregated nationally; the hunting-region level, where it is aggregated by district; and the hunting-community level, where it is aggregated by village. The application complies with the OpenGIS standards for Web Feature Service (WFS) and Web Map Service (WMS). The data are stored in a spatial database. Output formats include tables, graphs and map products (Google Earth, GeoRSS, Shapefiles, raster image formats, PDF, etc.). All presented data are accompanied by extensive metadata descriptions, so that exact information can be delivered to the end user. HTML and JavaScript are used to design the client-side interfaces. Web GIS applications constitute a new paradigm of distributed applications that combines the best aspects of component-based development and Web development, using standard GIS protocols and widely used data formats to achieve multiplatform integration.
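As an illustration of the WMS interface mentioned above, a client obtains a map image by sending a GetMap request whose parameters are defined by the OGC WMS specification. The sketch below builds a WMS 1.1.1 GetMap URL with the Python standard library; the server URL, the layer name `hunting_regions` and the approximate bounding box are hypothetical and do not describe the system presented in the paper.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width=800, height=400,
                   srs="EPSG:4326", fmt="image/png"):
    """Build a WMS 1.1.1 GetMap request URL.

    bbox is (minx, miny, maxx, maxy) in the units of `srs`.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer; the bounding box roughly covers Slovenia (lon/lat).
url = wms_getmap_url("https://example.org/wms", "hunting_regions",
                     (13.3, 45.4, 16.7, 46.9))
print(url)
```

A WFS request is built the same way, but it returns vector features (for example as GML) instead of a rendered image.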

Predicting Forest Stand Properties from Satellite Images with Different Data Mining Techniques

Proceedings of the 9th …, 2006

This paper compares different data mining techniques and their performance in building predictive models of forest stand properties from satellite images. We used the WEKA data mining environment to implement our numeric prediction experiments, applying linear regression, model (regression) trees, and bagging. For the considered target attributes, the best results (in terms of correlation) were obtained by bagging model trees.
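Bagging (bootstrap aggregating) trains several copies of a base learner on bootstrap samples of the training data and averages their predictions. The paper used WEKA with model trees as the base learner; the pure-Python sketch below swaps in a one-feature least-squares line to keep the example short, so the function names and the base learner are assumptions, not the WEKA implementation.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # degenerate sample (one distinct x): flat line
    return my - b * mx, b


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Fit n_models base learners, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Average the predictions of all base learners."""
    return sum(a + b * x for a, b in models) / len(models)


# Toy noise-free data, y = 2x + 1, used only to show the mechanics.
xs = list(range(20))
ys = [2 * x + 1 for x in xs]
models = bagging_fit(xs, ys)
print(bagging_predict(models, 10.0))
```

Averaging over bootstrap replicates mainly reduces variance, which is why it tends to help unstable learners such as trees more than a stable learner like the line used here.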

Global and local spatial autocorrelation in predictive clustering trees

Discovery Science, 2011

Spatial autocorrelation is the correlation among data values that is strictly due to the relative location proximity of the objects that the data refer to. This statistical property clearly violates the assumption of independent observations, a precondition of most data mining and statistical models. Inappropriate treatment of data with spatial dependencies, in which spatial autocorrelation is ignored, can obscure important insights. In this paper, we propose a data mining method that explicitly considers autocorrelation when building the models. The method is based on the concept of predictive clustering trees (PCTs). The proposed approach is able to capture both global and local effects and to deal with positive spatial autocorrelation. The discovered models adapt to local properties of the data while also providing spatially smoothed predictions. Results show the effectiveness of the proposed solution.
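The difference between global and local autocorrelation can be illustrated with local Moran's I, which assigns each object its own score instead of one value for the whole dataset. The sketch below is a minimal illustration assuming binary line-adjacency weights; it is not necessarily the measure used inside the proposed PCT method, and the function names are hypothetical.

```python
def chain_weights(n):
    """Binary adjacency weights for n objects on a line (neighbours i-1 and i+1)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def local_morans_i(values, weights):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij * z_j, where z are deviations
    from the mean and m2 = sum(z_i^2) / n."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    return [z[i] / m2 * sum(weights[i][j] * z[j] for j in range(n)) for i in range(n)]


w = chain_weights(4)
# Objects at the ends (whose only neighbour is in the same group) score high;
# the two boundary objects, with one similar and one dissimilar neighbour, score about 0.
print(local_morans_i([1, 1, 5, 5], w))
```

With these unstandardized weights, the sum of the local values divided by the total weight equals the global Moran's I for the same data, so the local scores decompose the global statistic.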

Network Regression with Predictive Clustering Trees

Machine Learning and …, 2011

Zeolites as alcohol adsorbents from aqueous solutions

Acta periodica …, 2006

The potential use of zeolites as adsorbents for removing organic molecules from water was investigated in a series of experiments with aqueous solutions of lower alcohols. This could offer a simple solution to the problem of cleaning up industrial wastewater as well as recovering valuable chemicals at relatively low cost. Langmuir-type adsorption isotherms were applied, and calculations showed that the amount of propanol adsorbed on silicalite corresponded to approximately 70% of the pore volume. The adsorption process is simple, and the more concentrated products are easily recovered by heat treatment and/or at reduced pressure.
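The Langmuir isotherm mentioned above relates the adsorbed amount q to the equilibrium concentration c as q = q_max * K * c / (1 + K * c). A common way to estimate q_max and K is a least-squares fit of the linearized form c/q = c/q_max + 1/(K * q_max). The sketch below uses synthetic, noise-free data in arbitrary units, not the measurements from the paper, and the function names are hypothetical.

```python
def langmuir(c, q_max, k):
    """Langmuir isotherm: adsorbed amount q at equilibrium concentration c."""
    return q_max * k * c / (1 + k * c)


def fit_langmuir(cs, qs):
    """Estimate (q_max, K) from the linearized form c/q = c/q_max + 1/(K*q_max)."""
    ys = [c / q for c, q in zip(cs, qs)]
    n = len(cs)
    mx, my = sum(cs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(cs, ys))
             / sum((x - mx) ** 2 for x in cs))
    intercept = my - slope * mx
    q_max = 1 / slope        # slope = 1 / q_max
    k = slope / intercept    # intercept = 1 / (K * q_max)
    return q_max, k


# Synthetic data generated with q_max = 2.0 and K = 0.5 (arbitrary units).
cs = [0.5, 1, 2, 4, 8]
qs = [langmuir(c, 2.0, 0.5) for c in cs]
q_max, k = fit_langmuir(cs, qs)
print(round(q_max, 6), round(k, 6))  # 2.0 0.5
```

With noisy measurements the linearized fit weights errors unevenly across concentrations, so a direct nonlinear fit of q against c is often preferred.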

Using relational decision trees to model out-crossing rates in a multi-field setting

Ecological …, 2012

Nearly three-quarters of the genetically modified maize (the insect-resistant type MON 810, also called Bt maize) produced in the EU is cultivated in Spain, where the share of Bt maize cultivation in some regions (Catalonia) is very high (above 70%). To ensure coexistence with the production of conventional maize and to satisfy the 0.9% EU threshold for adventitious presence of authorized genetically modified (GM) material in conventional (non-GM) maize crops, a set of preventive coexistence measures must be applied. These measures usually include large, fixed isolation distances, pollen barriers, flowering coincidence, crop rotation and other measures, which are very hard to fulfill in a multi-field setting. Basic empirical and modeling studies that explore the feasibility of coexistence between GM and non-GM crops focus on pair-based interactions between fields, while multi-field studies build upon them, attempting to account for the complexity of gene flow under crop management practices.

Estimating the risk of fire outbreaks in the natural environment

Data Mining and …, 2012

A constant and controlled level of emissions of carbon and other gases into the atmosphere is a precondition for preventing global warming and an essential issue for a sustainable world. Fires in the natural environment substantially increase the level of greenhouse gas emissions and disturb the normal functioning of natural ecosystems. Therefore, estimating the risk of fire outbreaks and preventing fires are the first steps in reducing the damage caused by fire. In this study, we build predictive models to estimate the risk of fire outbreaks in Slovenia, using data from a GIS, remote sensing imagery and the ALADIN weather prediction model.

Estimating vegetation height and canopy cover from remotely sensed data with machine learning

Ecological …, 2010

High-quality information on forest resources is important for forest ecosystem management. Traditional ground measurements are labor- and resource-intensive, as well as expensive and time-consuming. For most Slovenian forests, there is extensive ground-based information on forest properties at selected sample locations. However, there is no continuous information on objectively measured vegetation height and canopy cover at an appropriate resolution. Currently, Light Detection And Ranging (LiDAR) technology provides detailed measurements of different forest properties because of its immediate generation of 3D data, its accuracy and its acquisition flexibility. However, existing LiDAR sensors have limited spatial coverage and a relatively high cost of acquisition. Satellite data, on the other hand, are low-cost and offer broader spatial coverage of generalized forest structure, but are not expected to provide accurate information about vegetation height. Integration of LiDAR and satellite data promises to improve the measurement, mapping, and monitoring of forest properties. The primary objective of this study is to model the vegetation height and canopy cover in Slovenia by integrating LiDAR data, Landsat satellite data, and machine learning techniques. This kind of integration combines the accuracy and precision of LiDAR data with the wide coverage of satellite data in order to generate cost-effective, realistic estimates of vegetation height and canopy cover, and consequently continuous forest vegetation map products for use in forest management and monitoring. Several machine learning techniques are applied to this task; they are evaluated and their performance is compared using statistical significance tests. Ensemble methods perform significantly better than single- and multi-target regression trees and are further used to generate forest maps. Such maps are used for land-cover and land-use classification, as well as for monitoring and managing ongoing forest processes (such as spontaneous afforestation, forest reduction and forest fires) that affect the stability of forest ecosystems.
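A multi-target regression tree predicts several targets, here vegetation height and canopy cover, with a single model. One common split heuristic, in the spirit of the variance reduction used by predictive clustering trees, picks the test that most reduces the summed, per-target normalized variance. The sketch below shows this for a single numeric feature; the feature, the data and the function names are hypothetical, and an ensemble as in the paper would combine many such trees.

```python
def variance(vals):
    """Population variance of a list of numbers."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)


def summed_variance(rows, scale):
    """Sum over targets of each target's variance, normalized by `scale`."""
    return sum(variance([r[t] for r in rows]) / scale[t] for t in range(len(scale)))


def best_split(xs, targets):
    """Find the test `x <= threshold` that maximizes the size-weighted
    reduction of the summed, normalized multi-target variance."""
    n_targets = len(targets[0])
    # Normalize each target by its variance on the full data so that a target with
    # a large numeric range (e.g. cover in %) does not dominate; guard against zero.
    scale = [variance([r[t] for r in targets]) or 1.0 for t in range(n_targets)]
    base = summed_variance(targets, scale)
    best_thr, best_gain = None, 0.0
    for thr in sorted(set(xs))[:-1]:
        left = [row for x, row in zip(xs, targets) if x <= thr]
        right = [row for x, row in zip(xs, targets) if x > thr]
        after = (len(left) * summed_variance(left, scale)
                 + len(right) * summed_variance(right, scale)) / len(xs)
        if base - after > best_gain:
            best_thr, best_gain = thr, base - after
    return best_thr, best_gain


# Hypothetical data: one spectral feature per pixel and two targets
# (vegetation height in m, canopy cover in %).
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
targets = [(2, 10), (3, 12), (2, 11), (20, 80), (22, 85), (21, 82)]
print(best_split(xs, targets))
```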

A Qualitative Decision-Support Model for Evaluating Researchers

Informatica (Slovenia), 2007

The evaluation of research work is an essential element of the scientific enterprise. In general, the evaluation of researchers and their work is highly dependent on the social and economic conditions of the country in which the researchers work. The most commonly used form of evaluation is peer review. In Slovenia, a quantitative model for evaluating researchers has been developed and used by the Slovenian Research Agency, and this model has been publicly criticized. To alleviate some of the problems with this model and to motivate further discussion of the issue, we propose an alternative qualitative model. The model belongs to the paradigm of hierarchical multi-attribute models and was developed after a literature survey of existing models in other countries.
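Hierarchical multi-attribute models, such as those built with the DEX methodology, decompose an evaluation into a tree of qualitative attributes and aggregate the child grades of each node with a utility function given as an if-then table. The sketch below shows this evaluation scheme; the scale, the attributes (`publications`, `projects`, `citations`) and the rules are hypothetical and are not the model proposed in the paper.

```python
# Hypothetical three-grade qualitative scale shared by all attributes.
SCALE = ("low", "medium", "high")


def table_from(rule):
    """Enumerate a rule over all pairs of grades into an explicit if-then table."""
    return {(a, b): rule(SCALE.index(a), SCALE.index(b)) for a in SCALE for b in SCALE}


# Hypothetical utility functions for the two aggregate attributes.
RESEARCH_OUTPUT = table_from(lambda pub, proj: SCALE[(pub + proj) // 2])
RESEARCHER = table_from(lambda out, cit: SCALE[(out + cit + 1) // 2])

# Attribute hierarchy: aggregate node -> (utility table, child attributes).
MODEL = {
    "researcher": (RESEARCHER, ("research_output", "citations")),
    "research_output": (RESEARCH_OUTPUT, ("publications", "projects")),
}


def evaluate(node, inputs):
    """Evaluate bottom-up: basic attributes are read from `inputs`, and each
    aggregate attribute looks up its children's grades in its table."""
    if node not in MODEL:
        return inputs[node]
    table, children = MODEL[node]
    return table[tuple(evaluate(child, inputs) for child in children)]


print(evaluate("researcher",
               {"publications": "high", "projects": "medium", "citations": "high"}))
```

Writing each utility function as an explicit table keeps every rule visible and open to discussion, at the cost of tables that grow quickly with the number of child attributes.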

Learning to Predict Forest Fires with Different Data Mining Techniques

… Information Society (IS …, 2006

The aim of this study was to learn to predict forest fires in Slovenia using different data mining techniques. We built predictive models from the forest structure GIS (geographical information system), the ALADIN weather prediction model and MODIS satellite data. Because of climatic differences, we examined three separate datasets: one for the Kras region, one for the Primorska region and one for continental Slovenia. On these datasets we applied logistic regression and decision trees (J48), as well as random forests, bagging and boosting of decision trees, to obtain predictive models of fire occurrence. The best results in terms of predictive accuracy were obtained by bagging decision trees.
