GitHub - giopogg/webSDM: Model and predicts species accounting for their known trophic interaction network (original) (raw)
R package ‘webSDM’
Giovanni Poggiato 18/09/23
webSDM - including known trophic interactions in SDM
webSDM implements the trophic SDM model described in Poggiato et al., “Integrating trophic food webs in species distribution models improves ecological niche estimation and prediction”. Trophic SDM integrate a known trophic interaction network in SDM, thus providing more realistic and ecologically sound predictions. We here after present a quick introduction to some of the feature of the package, but we refer to the vignettes and the help functions for a more complete documentation of our package.
‘webSDM’ map
Installing the R package
Run to install webSDM
#install.packages('webSDM', repos = "http://cran.us.r-project.org") devtools::install_github("giopogg/webSDM") library(webSDM) library(ggplot2) library(rstanarm)
Fit a trophic SDM to data
Fitting a trophic SDM require the species distribution data, the environmental covariates and a known trophic interaction network.
We load a simulated datasets, where Y contains the species distribution data (a site x species matrix), X the environmental covariates (a sites x covariates matrix) and G is an igraph
object describing the interaction network (with links going from predators to prey). We specify, for every species, a quadratic relationship with respect to the environment, and we fit a model assuming bottom-up control (i.e., predators are modeled as a function of preys). We fit the trophicSDM in the Bayesian framework.
data(X, Y, G)
m = trophicSDM(Y = Y, X = X, G = G, env.formula = "~ X_1 + X_2", family = binomial(link = "logit"), mode = "prey", method = "stan_glm")
Inferred coefficients
We can see the formula of each glm in the argument $form.all
m$form.all
$Y1
[1] "y ~ X_1 + X_2"
$Y2
[1] "y ~ X_1 + X_2"
$Y3
[1] "y ~ X_1 + X_2"
$Y4
[1] "y ~ X_1+X_2+Y3"
$Y5
[1] "y ~ X_1+X_2+Y1+Y2+Y3"
$Y6
[1] "y ~ X_1+X_2+Y3+Y5"
We can have a first look to regression coefficients using the functionplot
.
We can access to the regression coefficients (eventually standardised) with the function coef
.
coef(m, standardise = T, level = 0.9)
$Y1
estimate 5% 95%
(Intercept) 0.3436947 0.3436947 0.3436947
X_1 -0.2021379 -0.2610196 -0.1437026
X_2 0.2551022 0.1967936 0.3060621
$Y2
estimate 5% 95%
(Intercept) -0.26667469 -0.2666747 -0.26667469
X_1 0.38623879 0.3185228 0.44548596
X_2 -0.07487371 -0.1398988 -0.01233344
$Y3
estimate 5% 95%
(Intercept) -0.6087146 -0.6087146 -0.6087146
X_1 0.2691505 0.2040072 0.3264109
X_2 0.1959559 0.1277228 0.2558819
$Y4
estimate 5% 95%
(Intercept) 0.78313457 0.78313457 0.783134568
X_1 -0.17572121 -0.23117578 -0.111252286
X_2 -0.01240440 -0.07179788 0.044998167
Y3 -0.07121957 -0.13386056 -0.008400504
$Y5
estimate 5% 95%
(Intercept) -0.6183171 -0.61831706 -0.6183171
X_1 0.2012124 0.13693977 0.2583883
X_2 -0.3040676 -0.36413006 -0.2482235
Y1 0.4429577 0.37314154 0.5052524
Y2 -0.0238899 -0.08046871 0.0309484
Y3 -0.2091481 -0.26014005 -0.1566986
$Y6
estimate 5% 95%
(Intercept) 1.32269012 1.32269012 1.32269012
X_1 -0.03325986 -0.09097127 0.01820810
X_2 0.08919394 0.02389159 0.15177465
Y3 -0.24024322 -0.30943078 -0.17621452
Y5 -0.09973197 -0.15640795 -0.04450875
We can visualise the biotic coefficients with the functionplotG_inferred
We can also visualise the importance of each (set of) variable with the function `computeVariableImportance’
VarImpo = computeVariableImportance(m, groups = list("Abiotic" = c("X_1","X_2"), "Biotic" = c("Y1","Y2", "Y3", "Y4", "Y5", "Y6"))) VarImpo = apply(VarImpo, 2, function(x) x/(x[1]+x[2])) tab = reshape2::melt(VarImpo) tab$Var2 = factor(tab$Var2, levels = colnames(Y)) ggplot(tab, aes(x = Var2, y = value, fill = Var1)) + geom_bar(stat="identity") + theme_classic()
Local objects
We can access each local model in the field $model
and then select a given local model, that can be analysed using some the implemented methods
m$model$Y5
==================================================================
Local SDMfit for species Y5 with no penalty , fitted using stan_glm
==================================================================
* Useful S3 methods
print(), coef(), plot(), predict()
$model gives the stanreg class object
==================================================================
Predictions
We can predict with a fitted trophic SDM in multiple ways. The most straightforward way is to use the function predict
. We can for example obtain 50 draws from the posterior predictive distribution of species (pred_samples = 50) using predicted presence-absences of species to predict their predators (prob.cov = TRUE). When we don’t specify Xnew, the function sets Xnew = X by default. We can thus obtain the fitted values of the model and compute goodness of fit metrics. Notice that other metrics like AIC or loo are also available.
Ypred = predict(m, fullPost = FALSE, pred_samples = 50, prob.cov = FALSE)
predict returns a list contaning for each species the predictive samples at each site
But since we specified fullPost = FALSE it only give back the predictive mean and quantiles
Ypred = do.call(cbind, lapply(Ypred, function(x) x$predictions.mean))
Ypred = Ypred[,colnames(Y)] evaluateModelFit(m, Ynew = Y, Ypredicted = Ypred)
auc tss species
1 0.6733574 0.28064516 Y1
2 0.7048959 0.31015078 Y2
3 0.6690035 0.27931242 Y3
4 0.5988342 0.17092061 Y4
5 0.6236528 0.20786045 Y5
6 0.5581631 0.09940342 Y6
m$log.lik
[1] -3638.686
m$AIC
[1] 7337.373
loo(m)
[1] -3650.931
We can also evaluate the quality of model predictions using K-fold cross-validation:
CV = trophicSDM_CV(m, K = 3, prob.cov = T, run.parallel = FALSE)
Transfom in a table
Ypred = CV$meanPred
Re order columns
Ypred = Ypred[,colnames(Y)] evaluateModelFit(m, Ynew = Y, Ypredicted = Ypred)
We can now evaluate species probabilities of presence for the environmental conditions X_1 = 0.5 and X_2 = 0.5.
Pred = predict(m, Xnew = data.frame(X_1 = 0.5, X_2 = 0.5), fullPost = F) t(do.call(cbind, Pred))
predictions.mean predictions.q025 predictions.q975
Y1 0.6275369 0.5919506 0.6673241
Y2 0.6811964 0.6555955 0.7031938
Y3 0.7187342 0.6891241 0.7377668
Y4 0.4978676 0.4544304 0.5767335
Y5 0.3806087 0.0963942 0.7238955
Y6 0.6484422 0.4698095 0.8539271
We can also obtain an estimation of the potential niche, that corresponds, in the bottom-up approach, to the probability of presence of a species given that its preys are present. We can for example compute the probability of presence of species for the environmental conditions X_1 = 0.5 and X_2 = 0.5 assuming all their preys to be present.
Ypred = predictPotential(m, fullPost = FALSE, pred_samples = 100, Xnew = data.frame(X_1 = 0.5, X_2 = 0.5))
Frequentist approach
Notice that we can also fit a trophic SDM in the frequentist approach.
m = trophicSDM(Y = Y, X = X, G = G, env.formula = "~ X_1 + X_2", family = binomial(link = "logit"), mode = "prey", method = "glm")
All the above-mentioned functions are also available in the frequentist framework, with adaptations when necessary (e.g. coefficients are significant or not depending on the p-values instead of the credible interval). However, error propagation is not available and we can only obtain one prediction for each species and site, instead of multiple samples in the Bayesian case.
Penalisation
We can infer a sparse model by specifying penal = "horshoe"
if we setmethod = "stan_glm"
(i.e. in the Bayesian framework), orpenal = "elasticnet"
if we set method = "glm"
(i.e. in the frequentist framework).
m = trophicSDM(Y = Y, X = X, G = G, env.formula = "~ X_1 + X_2", family = binomial(link = "logit"), mode = "prey", method = "glm", penal = "elasticnet")
m = trophicSDM(Y = Y, X = X, G = G, env.formula = "~ X_1 + X_2", family = binomial(link = "logit"), mode = "prey", method = "stan_glm", penal = "horshoe")
Composite variables
We can include composite variables using the arguments sp.formula
andsp.partition
. We refer to the vignette ‘Composite variable’ for an exhaustive description of these arguments and how to use them to obtain any kind of model specification. For example, we can model species as a function of of their prey richness.
m = trophicSDM(Y = Y, X = X, G = G, env.formula = "~ X_1 + X_2", sp.formula = "richness", family = binomial(link = "logit"), mode = "prey", method = "glm") m$form.all
$Y1
[1] "y ~ X_1 + X_2"
$Y2
[1] "y ~ X_1 + X_2"
$Y3
[1] "y ~ X_1 + X_2"
$Y4
[1] "y ~ X_1+X_2+I(Y3)"
$Y5
[1] "y ~ X_1+X_2+I(Y1 + Y2 + Y3)"
$Y6
[1] "y ~ X_1+X_2+I(Y3 + Y5)"
Author
This package is currently developed by Giovanni Poggiato from Laboratoire d’Ecologie Alpine. It is supported by the ANR GAMBAS. The framework implemented in this package is described in: “Integrating trophic food webs in species distribution models improves ecological niche estimation and prediction” Poggiato Giovanni, Jérémy Andréoletti, Laura J. Pollock and Wilfried Thuiller. In preparation.