Tree Species Abundance Predictions in a Tropical Agricultural Landscape with a Supervised Classification Model and Imbalanced Data
Related papers
Studies in Computational Intelligence, 2008
Dawn R. Magness, Falk Huettmann, and John M. Morton
Sustainable management efforts are currently hindered by a lack of basic information about the spatial distribution of species across large landscapes. Built on complex ecological databases, computationally advanced species distribution models can make great progress toward solving this ecological problem. However, the current lack of knowledge about the ecological relationships that drive species distributions reduces the capacity of classical statistical approaches to produce accurate predictive maps. Advances in machine learning, such as classification and bagging algorithms, provide a powerful tool for quickly building accurate predictive models of species distributions even when little ecological knowledge is readily available. Such approaches are also well known for their robustness when dealing with large, low-quality data sets. Here, we use Random Forests (Salford Systems Ltd. and the R language), a highly accurate bagging classification algorithm originally developed by L. Breiman and A. Cutler, to build multi-species avian distribution models using data collected as part of the Kenai National Wildlife Refuge Long-term Ecological Monitoring Program (LTEMP). Distribution maps are a useful monitoring metric because they can document range expansions or contractions and can also be linked to population estimates. We use variable-radius point count data collected in 2004 and 2006 at 255 points arranged in a systematic grid with 4.8 km resolution spanning the 7722 km² extent of Alaska's Kenai National Wildlife Refuge. We build distribution models for 41 bird species that are present within 200 m of 2-56% of the sampling points, so the models represent species that are both rare and common on the landscape. All models were built using a common set of 157 environmental predictor variables representing topographical features, climatic space, vegetation, anthropogenic variables, and spatial structure, plus 5 randomly generated neutral landscape variables for quality assessment. Models with that many predictors have not been used before in avian modeling, but they are commonly used in similar applications in commercial disciplines. Random Forests produced strong models (ROC > 0.8) for 16 bird species, marginal models (0.7 < ROC < 0.8) for 13 species, and weak models (ROC < 0.7) for 11 species. The ability of Random Forests to provide accurate predictive models was independent of how common or rare a bird was on the landscape. Random Forests did not rank any of the 5 neutral landscape variables as important for any of the 41 bird species. We argue that for inventory and monitoring programs, the interpretive focus and confidence in reliability should be placed on the predictive ability of the map, not on the assumed ecological meaning of the predictors or their linear relationships to the response variable. Given this focus, computer learning algorithms would provide a very powerful, cost-saving approach for building reliable predictions of species occurrence on the landscape, given the current lack of knowledge about the ecological drivers of many species. Land management agencies need reliable predictions of current species distributions in order to detect and understand how climate change and other landscape drivers will affect future biodiversity.
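The abstract grades models by "ROC" score but does not say how the score was computed. A minimal sketch, assuming "ROC" means the area under the ROC curve (AUC) for presence/absence predictions at the survey points, computes AUC as the Mann-Whitney probability that a presence point outranks an absence point and bins it into the abstract's three categories. The function names, the toy data, and the handling of the exact 0.7 and 0.8 boundaries are assumptions.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the Mann-Whitney probability that a
    randomly chosen presence point scores higher than a randomly chosen absence point.

    labels: 1 = species detected at the point, 0 = not detected.
    scores: predicted probability of presence (e.g. a Random Forests vote fraction).
    Tied presence/absence pairs count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def model_strength(auc: float) -> str:
    """Bin an AUC into the abstract's categories; exact boundary handling is assumed."""
    if auc > 0.8:
        return "strong"
    if auc >= 0.7:
        return "marginal"
    return "weak"


# Hypothetical presence/absence records and predicted probabilities at six points.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.3, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3), model_strength(auc))  # 0.889 strong
```

The pairwise loop is quadratic, which is negligible at the scale of 255 survey points. Because AUC depends only on the ranking of scores, the same function works whether the model outputs vote fractions or calibrated probabilities.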
Support vector machines to map rare and endangered native plants in Pacific islands forests
Ecological Informatics, 2012
It is critical to know accurately the ecological and geographic range of rare and endangered species for biodiversity conservation and management. In this study, we used support vector machines (SVM) to model rare species distributions and compared them with another emerging machine learning classifier, random forests (RF). The comparison was performed using three native and endemic plants found at low- to mid-elevation on the island of Moorea (French Polynesia, South Pacific) and considered rare because of scarce occurrence records: Lepinia taitensis (28 observed occurrences), Pouteria tahitensis (20 occurrences), and Santalum insulare var. raiateense (81 occurrences). We selected a set of biophysical variables to describe plant habitats in tropical high volcanic islands, including topographic descriptors and an overstory vegetation map. The former were extracted from a digital elevation model (DEM), and the latter is the result of an SVM classification of spectral and textural bands from very high resolution QuickBird satellite imagery. Our results show that SVM slightly but consistently outperforms RF in predicting the distribution of rare species, based on the kappa coefficient and the area under the curve (AUC) achieved by both classifiers. The predicted potential habitats of the three rare species are considerably wider than their currently observed distribution ranges. We hypothesize that this discrepancy is caused by strong anthropogenic disturbances that have impacted low- to mid-elevation forests in the past and present. There is an urgent need to set up conservation strategies for the endangered plants found in these shrinking habitats on the Pacific islands.
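The kappa coefficient used to compare SVM and RF measures agreement between observed and predicted classes after removing the agreement expected by chance. A minimal sketch of Cohen's kappa on presence/absence labels follows; the function name and the ten validation points are hypothetical, and the study's own validation design is not described in the abstract.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(observed: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between observed and predicted classes,
    corrected for the agreement expected by chance from the class frequencies."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    # Observed agreement: fraction of points where prediction matches observation.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Chance agreement: sum over classes of the product of marginal frequencies.
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both sides use a single identical class")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical presence (1) / absence (0) records at ten validation points.
observed = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
print(round(cohens_kappa(observed, predicted), 3))  # 0.583
```

Here raw agreement is 0.8, but with 40% presences on both sides, chance agreement is already 0.52, so kappa is (0.8 - 0.52) / (1 - 0.52), about 0.58. This chance correction is why kappa is often reported alongside AUC for rare species.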
Remote Sensing, 2021
Accurate maps of the spatial distribution of tropical tree species provide valuable insights for ecologists and forest management. Discriminating tree species for economic, ecological, and technical reasons is usually necessary for achieving good results in tree species mapping. Most datasets used in tree species mapping have some degree of class imbalance. This study aimed to assess the effects of imbalanced data on identifying and mapping threatened tree species in a selectively logged sub-montane heterogeneous tropical forest using random forest (RF) and support vector machine with radial basis function kernel (RBF-SVM) classifiers and WorldView-2 multispectral imagery. For comparison purposes, the original imbalanced dataset was rebalanced in R using three data sampling techniques: oversampling, undersampling, and combined oversampling and undersampling. The combined oversampling and undersampling technique produced the best results: F1-scores o...
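The three sampling techniques can be illustrated with plain random resampling. This is a Python stand-in, not the R code used in the study: oversampling grows every class to the majority size by duplicating examples, undersampling shrinks every class to the minority size, and the "both" strategy here meets in the middle at the mean class size. That target, the function names, and the toy labels are assumptions; the abstract does not say which resampling algorithms were used.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def _group_by_class(samples: Sequence, labels: Sequence[Hashable]) -> Dict[Hashable, List]:
    groups: Dict[Hashable, List] = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    return groups


def resample_to(samples, labels, target: int, seed: int = 0) -> Tuple[List, List]:
    """Randomly resample every class to exactly `target` examples.

    Classes with more than `target` examples are undersampled without replacement;
    smaller classes keep all originals and are topped up with random duplicates.
    """
    rng = random.Random(seed)
    out_x, out_y = [], []
    for y, xs in _group_by_class(samples, labels).items():
        if len(xs) >= target:
            chosen = rng.sample(xs, target)
        else:
            chosen = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


def balance(samples, labels, strategy: str, seed: int = 0) -> Tuple[List, List]:
    counts = Counter(labels).values()
    if strategy == "over":      # grow every class to the majority size
        target = max(counts)
    elif strategy == "under":   # shrink every class to the minority size
        target = min(counts)
    elif strategy == "both":    # meet in the middle (mean class size, an assumption)
        target = round(sum(counts) / len(counts))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return resample_to(samples, labels, target, seed)


# Hypothetical training crowns: species A is common, C is rare.
labels = ["A"] * 8 + ["B"] * 3 + ["C"] * 1
samples = list(range(len(labels)))
for strategy in ("over", "under", "both"):
    _, y = balance(samples, labels, strategy)
    print(strategy, dict(Counter(y)))
```

With 8, 3, and 1 examples per class, the three strategies yield 8, 1, and 4 examples per class respectively. The combined strategy avoids both heavy duplication of the rare class and discarding most of the common one, which is one plausible reason it performed best.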
2009
Remote sensing provides critical information for broad-scale assessments of wildlife habitat distribution and conservation. However, such efforts have typically been unable to incorporate information about vegetation structure, a variable important for explaining the distribution of many wildlife species. We evaluated the consequences of incorporating remotely sensed information about horizontal vegetation structure into current assessments of wildlife habitat distribution and conservation. To do this, we integrated the new NLCD tree canopy cover product into the US GAP Analysis database, using avian species and the completed Idaho GAP Analysis as a case study. We found: (1) a 15-68% decrease in the extent of predicted habitat for avian species associated with specific tree canopy conditions, (2) a marked decrease in the species richness values predicted at the Landsat pixel scale, but not at coarser scales, (3) a modified distribution of biodiversity hotspots, and (4) a surprising result in the conservation assessment: despite the strong changes in the species' predicted habitats, their distribution relative to the reserve network remained the same. This study highlights the value of area-wide vegetation structure data for refined biodiversity and conservation analyses. We discuss further opportunities and limitations for the use of NLCD data in wildlife habitat studies.
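Finding (1) follows from intersecting an existing habitat prediction with a canopy cover requirement. A minimal sketch, assuming each species' canopy association can be expressed as a simple cover range: pixels predicted as habitat are kept only when their canopy cover falls inside that range, and the loss is reported as a percentage. The cover range, function names, and pixel values are hypothetical and do not reproduce the GAP Analysis rules.

```python
from typing import List, Sequence


def constrain_by_canopy(habitat: Sequence[bool], canopy_cover: Sequence[float],
                        min_cover: float, max_cover: float) -> List[bool]:
    """Keep a predicted-habitat pixel only if its tree canopy cover (%) falls
    within the species' assumed canopy range [min_cover, max_cover]."""
    return [bool(h) and min_cover <= c <= max_cover
            for h, c in zip(habitat, canopy_cover)]


def percent_decrease(before: Sequence[bool], after: Sequence[bool]) -> float:
    """Percentage of the original predicted habitat removed by the constraint."""
    n_before = sum(before)
    if n_before == 0:
        raise ValueError("no predicted habitat before the constraint")
    return 100.0 * (n_before - sum(after)) / n_before


# Hypothetical pixels: land-cover-based habitat prediction and canopy cover (%).
habitat = [True, True, True, True, False, True]
canopy = [10, 45, 70, 85, 90, 30]
refined = constrain_by_canopy(habitat, canopy, min_cover=40, max_cover=80)
print(refined, percent_decrease(habitat, refined))
```

In this toy case, three of the five habitat pixels fall outside the 40-80% range, giving a 60% decrease. Because the constraint only removes pixels, it can shrink predicted habitat but never enlarge it.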
Environmental Management, 2009
To achieve the overall objective of restoring the natural environment and sustainable resource usability, the effect of each forest management practice needs to be predicted using a simulation model. Previous simulation efforts were typically confined to public land, but comprehensive forest management entails incorporating interactions between public and private land. To make inclusion of private land in management planning feasible at the regional scale, this study uses a new method that combines Forest Inventory and Analysis (FIA) data with remotely sensed forest group data to retrieve detailed species composition and age information for the Missouri Ozark Highlands. Remotely sensed forest group data and landform data inferred from topography were integrated to produce distinct combinations (ecotypes). Forest types and size classes were assigned to ecotypes based on their proportions in the FIA data. Tree species and tree ages determined from FIA subplots stratified by forest type and size class were then assigned to pixels across the entire study area. The resulting species composition map can improve simulation model performance because it contains spatially explicit and continuous information on dominant and associated species and tree ages, which is unavailable from either satellite imagery or forest inventory data alone. In addition, the resulting species map revealed that public and private land in the Ozark Highlands differ in species composition and stand size. Shortleaf pine is a co-dominant species on public land, whereas it is a minor species on private land. Public forest is older than private forest. Both public and private forests have deviated from historical forest conditions in terms of species composition. Based on the possible causes of this deviation discussed in this study, corresponding management avenues that can assist in restoring the natural environment are recommended.
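The step "forest types ... were assigned to ecotypes based on their proportions in the FIA data" can be read as a proportional allocation within each ecotype. A minimal sketch under that assumption: the pixels of one ecotype are split among forest types in the same proportions as the FIA plots in that ecotype, using largest-remainder rounding so the counts sum exactly, and the labels are then placed on pixels at random. The function names, random placement, and toy data are hypothetical, not the paper's documented procedure.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def allocate_counts(n_pixels: int, proportions: Dict[Hashable, float]) -> Dict[Hashable, int]:
    """Split n_pixels among classes in proportion to `proportions`
    (largest-remainder rounding, so the counts always sum to n_pixels)."""
    raw = {k: n_pixels * p for k, p in proportions.items()}
    counts = {k: int(v) for k, v in raw.items()}  # floor, since v >= 0
    leftover = n_pixels - sum(counts.values())
    # Give the remaining pixels to the classes with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda k: raw[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


def assign_forest_types(pixel_ids: Sequence[Hashable], fia_forest_types: Sequence[str],
                        seed: int = 0) -> Dict[Hashable, str]:
    """Label the pixels of ONE ecotype so that forest-type frequencies match the
    FIA plots falling in that ecotype; which pixel gets which label is random."""
    freq = Counter(fia_forest_types)
    total = sum(freq.values())
    counts = allocate_counts(len(pixel_ids), {k: v / total for k, v in freq.items()})
    labels: List[str] = [k for k, c in counts.items() for _ in range(c)]
    random.Random(seed).shuffle(labels)
    return dict(zip(pixel_ids, labels))


# Hypothetical ecotype with 10 pixels and 4 FIA plots (3 oak-hickory, 1 shortleaf pine).
pixels = [f"px{i}" for i in range(10)]
fia = ["oak-hickory", "oak-hickory", "oak-hickory", "shortleaf pine"]
assignment = assign_forest_types(pixels, fia)
print(Counter(assignment.values()))
```

Running the function once per ecotype, and again within each forest type and size class stratum for species and age, mirrors the two-level structure described in the abstract. The map then matches FIA proportions within each stratum but is not a pixel-by-pixel observation.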
Predicting species distributions and community composition using satellite remote sensing predictors
2021
Biodiversity is rapidly changing due to changes in climate and human-related activities; thus, accurate predictions of species composition and diversity are critical to developing conservation actions and management strategies. In this paper, using oak assemblages distributed across the continental United States obtained from the National Ecological Observatory Network (NEON), we assessed the performance of stacked species distribution models (S-SDMs), constructed under a Bayesian framework with satellite remote sensing covariates, in order to build the next generation of biodiversity models. This study represents an attempt to evaluate the integrated predictions of biodiversity models, including assemblage diversity and composition, obtained by stacking next-generation SDMs. We found three main results. First, environmental predictors derived entirely from satellite remote sensing are adequate covariates for biodiversity modeling. Second, applying constraints to...
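Stacking turns single-species predictions into assemblage-level ones. A minimal sketch of the two common stacking rules: summing occurrence probabilities gives expected richness, and thresholding each probability gives a predicted species list. It uses point estimates at one site rather than full Bayesian posteriors, assumes independent occurrences for the variance, and the 0.5 threshold, function names, and species probabilities are hypothetical. The constraints mentioned in the truncated abstract are not modeled.

```python
from typing import Dict, Set


def expected_richness(site_probs: Dict[str, float]) -> float:
    """Probability stacking: expected species richness at a site is the sum of
    the species' predicted occurrence probabilities."""
    return sum(site_probs.values())


def richness_variance(site_probs: Dict[str, float]) -> float:
    """Variance of richness if species occurrences are treated as independent
    Bernoulli events (an assumption that ignores species co-occurrence)."""
    return sum(p * (1.0 - p) for p in site_probs.values())


def thresholded_composition(site_probs: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Binary stacking: a species is predicted present if its probability
    reaches the (assumed) threshold; richness is the size of this set."""
    return {sp for sp, p in site_probs.items() if p >= threshold}


# Hypothetical occurrence probabilities for three oak species at one site.
site = {"Quercus alba": 0.8, "Quercus rubra": 0.6, "Quercus velutina": 0.1}
print(expected_richness(site), richness_variance(site), sorted(thresholded_composition(site)))
```

The two rules can disagree: this site has an expected richness of 1.5 but a thresholded richness of 2. Such discrepancies are one reason stacked predictions of diversity and composition need to be evaluated in their own right.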
International Journal of Applied Earth Observation and Geoinformation, 2020
Semi-arid parkland agrosystems are highly sensitive to climate change and anthropic pressure. In the context of sustainability research, trees are considered critical for various ecosystem services covering environmental quality as well as food security and health. But their actual ecological impact on both cropland and natural vegetation is not yet well understood, and collecting spatial and structural information on agroforestry systems is becoming an important issue. Tree mapping in semi-arid parklands could be one of the prerequisites for this. Remote sensing is the ideal tool for obtaining an exhaustive inventory of individual trees and analysing their spatial distribution. However, it has been noted that, depending on the spatial resolution and the sensor's spectral characteristics, tree species cannot be distinguished clearly, even in the sparsely vegetated semi-arid ecosystems of West Africa. Thus, this work assesses the capabilities of WorldView-3 imagery, acquired in 8 spectral bands, to detect, delineate, and identify certain key tree species in the Faidherbia albida parkland in Bambey, Senegal, based on a ground-truth database of 5000 trees. The tree crowns are delineated through NDVI thresholding and subsequent filtering to provide object-based radiometric signatures, radiometric indices, and textural information. A factorial discriminant analysis is then performed, which indicates that only four of the seven most abundant species in the study area can be discriminated: Faidherbia albida, Azadirachta indica, Balanites aegyptiaca, and Tamarindus indica. Next, random forest and support vector machine classifiers are employed to identify the combination of classifier parameters that discriminates these classes with high accuracy, robustness, and stability. The linear support vector machine with cost = 1 and gamma = 0.01 provides the optimal results, with a global accuracy of 88% and a kappa of 0.71.
This classifier is applied to the whole study area to map all trees with crowns larger than 2 m, sorted into the four identified species and a fifth common group of unidentified species. The map thus enables analysis of the variability in tree density and the spatial distribution of the different species. Such information can then be correlated with the ecological functioning of the parkland and local practices, and it offers promising opportunities to help future sustainability initiatives in different socio-ecological contexts.
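The crown delineation step ("NDVI thresholding and subsequent filtering") can be sketched as follows. NDVI = (NIR - Red) / (NIR + Red) is computed per pixel, pixels above a vegetation threshold are flagged, flagged pixels are grouped into 4-connected objects, and objects too small to be crowns are dropped. The threshold, the minimum object size, the use of connected-component filtering as the "filtering" step, and the tiny reflectance tile are all assumptions; the abstract does not give the study's values or filters.

```python
from collections import deque
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index, (NIR - Red) / (NIR + Red)."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def crown_mask(nir_band: Sequence[Sequence[float]], red_band: Sequence[Sequence[float]],
               threshold: float) -> List[List[bool]]:
    """Flag pixels whose NDVI exceeds the (assumed) vegetation threshold."""
    return [[ndvi(n, r) > threshold for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_band, red_band)]


def extract_crowns(mask: List[List[bool]], min_pixels: int) -> List[List[Pixel]]:
    """Group flagged pixels into 4-connected objects and drop objects smaller
    than min_pixels (a simple stand-in for post-threshold filtering)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    crowns: List[List[Pixel]] = []
    for i in range(rows):
        for j in range(cols):
            if not mask[i][j] or seen[i][j]:
                continue
            component: List[Pixel] = []
            queue = deque([(i, j)])
            seen[i][j] = True
            while queue:  # breadth-first flood fill
                a, b = queue.popleft()
                component.append((a, b))
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    x, y = a + da, b + db
                    if 0 <= x < rows and 0 <= y < cols and mask[x][y] and not seen[x][y]:
                        seen[x][y] = True
                        queue.append((x, y))
            if len(component) >= min_pixels:
                crowns.append(component)
    return crowns


# Hypothetical 3 x 4 reflectance tile: one 2 x 2 tree crown and one isolated bright pixel.
nir = [[0.5, 0.5, 0.1, 0.1], [0.5, 0.5, 0.1, 0.6], [0.1, 0.1, 0.1, 0.1]]
red = [[0.1] * 4 for _ in range(3)]
crowns = extract_crowns(crown_mask(nir, red, threshold=0.3), min_pixels=2)
print(len(crowns), [len(c) for c in crowns])
```

Both the 2 x 2 block and the isolated pixel pass the NDVI threshold, but only the block survives the size filter. Each surviving object can then supply the per-crown radiometric and textural features that the classifiers use.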
Remote Sensing, 2012
Mapping the spatial distribution of plant species in savannas provides insight into the roles of competition, fire, herbivory, soils, and climate in maintaining the biodiversity of these ecosystems. This study focuses on the challenges facing large-scale species mapping using a fusion of Light Detection and Ranging (LiDAR) and hyperspectral imagery. Here we build upon previous work on airborne species detection by using a two-stage support vector machine (SVM) classifier to first predict species from hyperspectral data at the pixel scale. Tree crowns are segmented from the LiDAR imagery so that crown-level information, such as maximum tree height, can then be combined with the pixel-level species probabilities to predict the species of each tree. An overall prediction accuracy of 76% was achieved for 15 species. We also show that bidirectional reflectance distribution function (BRDF) effects caused by the anisotropic scattering properties of savanna vegetation can produce flight line artifacts in species probability maps, but these can be largely mitigated by applying a semi-empirical BRDF model to the hyperspectral data. We find that confronting these three challenges (reflectance anisotropy, integration of pixel- and crown-level data, and crown delineation over large areas) enables species mapping at ecosystem scales for monitoring biodiversity and ecosystem function.
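The abstract combines pixel-level species probabilities with crown-level information such as maximum tree height. In the study this combination is done by a second-stage SVM. The sketch below only builds the crown-level inputs: mean per-species probability over the crown's pixels plus maximum LiDAR height. It then applies a deliberately simpler argmax rule as a stand-in for the second stage. Function names, species labels, and values are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def crown_features(pixel_probs: Sequence[Dict[str, float]],
                   pixel_heights: Sequence[float]) -> Tuple[Dict[str, float], float]:
    """Summarize one segmented crown: mean per-species probability over its
    pixels, plus the crown's maximum LiDAR height."""
    if not pixel_probs or len(pixel_probs) != len(pixel_heights):
        raise ValueError("need one probability dict and one height per crown pixel")
    n = len(pixel_probs)
    species = pixel_probs[0].keys()
    mean_probs = {sp: sum(p[sp] for p in pixel_probs) / n for sp in species}
    return mean_probs, max(pixel_heights)


def predict_crown_species(mean_probs: Dict[str, float]) -> str:
    """Simplest crown-level rule: pick the species with the highest mean probability."""
    return max(mean_probs, key=mean_probs.get)


# Hypothetical crown of two pixels with first-stage probabilities for species "A" and "B".
probs = [{"A": 0.6, "B": 0.4}, {"A": 0.2, "B": 0.8}]
heights = [5.0, 7.5]
mean_probs, max_height = crown_features(probs, heights)
print(predict_crown_species(mean_probs), max_height)
```

Aggregating to the crown smooths out noisy individual pixels: the first pixel leans toward A, but the crown as a whole favors B. A second-stage classifier would take `mean_probs` and `max_height` together as its feature vector instead of using the argmax rule shown here.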
Operational Tree Species Mapping in a Diverse Tropical Forest with Airborne Imaging Spectroscopy
PLoS ONE, 2015
Remote identification and mapping of canopy tree species can contribute valuable information towards our understanding of ecosystem biodiversity and function over large spatial scales. However, the extreme challenges posed by highly diverse, closed-canopy tropical forests have prevented automated remote species mapping of non-flowering tree crowns in these ecosystems. We set out to identify individuals of three focal canopy tree species amongst a diverse background of tree and liana species on Barro Colorado Island, Panama, using airborne imaging spectroscopy data. First, we compared two leading single-class classification methods (binary support vector machine (SVM) and biased SVM) for their performance in identifying pixels of a single focal species. From this comparison, we determined that biased SVM was more precise, and we created a multi-species classification model by combining the three biased SVM models. This model was applied to the imagery to identify pixels belonging to the thr...
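The abstract is cut off before it explains how the three single-species biased SVM models were combined into one multi-species model. One common rule, shown here only as an assumption, is to give each pixel to the focal species whose model returns the highest positive decision value and to label the pixel as background if no model is positive. The function name, the "other" label, and the scores are hypothetical.

```python
from typing import Dict


def combine_single_species_models(scores: Dict[str, float], background: str = "other") -> str:
    """Assign a pixel to the focal species whose single-species model returns the
    highest decision score, or to `background` if no model scores above zero.

    scores maps each focal species to its binary model's decision value
    (positive = "this species", by the usual SVM sign convention).
    """
    if not scores:
        raise ValueError("need at least one single-species model score")
    best_species = max(scores, key=scores.get)
    return best_species if scores[best_species] > 0 else background


# Hypothetical decision values from three single-species models for two pixels.
print(combine_single_species_models({"species_1": -0.4, "species_2": 0.9, "species_3": 0.2}))
print(combine_single_species_models({"species_1": -0.4, "species_2": -0.1, "species_3": -1.3}))
```

The background fallback matters in a hyperdiverse forest, where most pixels belong to none of the focal species. Comparing raw decision values across separately trained models assumes their scales are comparable, which is itself a modeling choice.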