Thursday, July 9, 2020

N-mixture models to estimate animal abundance with R and unmarked

Hi there!

 During the pandemic lockdown I was studying how N-mixture models work to estimate animal abundance. As a result, I wrote this small tutorial in R (in Spanish) that helped me better understand such models. Corrections or suggestions are welcome, either by posting here or at jflopez.bio@gmail.com.
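For readers who have never fit an N-mixture model, here is a minimal sketch of the core idea using the unmarked package: repeated counts at a set of sites let us estimate abundance and detection probability jointly. This is an illustration under assumptions (simulated data, a null model, arbitrary parameter values), not the tutorial itself:

```r
# Minimal N-mixture sketch (assumes the unmarked package is installed).
# We simulate repeated counts at R sites over J visits, then fit a
# null abundance/detection model with pcount().
library(unmarked)

set.seed(42)
R <- 100        # number of sites
J <- 4          # repeated visits per site
lambda <- 5     # true mean abundance per site
p <- 0.4        # true detection probability

N <- rpois(R, lambda)                           # latent abundances
y <- matrix(rbinom(R * J, rep(N, J), p), R, J)  # observed counts

umf <- unmarkedFramePCount(y = y)           # pack the counts for unmarked
fm <- pcount(~1 ~1, data = umf, K = 50)     # null detection & abundance model

# Back-transform the estimates to the natural scale
backTransform(fm, type = "state")   # estimated lambda
backTransform(fm, type = "det")     # estimated detection probability
```

With enough sites, the back-transformed estimates should land close to the simulated lambda and p.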

Hope you enjoy it!

Friday, April 3, 2020

Another "flatten the COVID-19 curve" simulation... in R

 Hi there,

This is the best meme I've found during these days...

Well, here is my "BUT" contribution. Some weeks ago The Washington Post published these simulations about how "social distancing" could help to "flatten the curve" of COVID-19 infections. I fell in love with these simulations because of their simplicity and explanatory power. Indeed, you can use very similar principles to simulate predator hunting behavior and other movement patterns... I wrote some R code to build a tiny version of them...

 library(raster)  # raster grid utilities
 library(dismo)   # randomPoints()
 r <- raster(nrows = 100, ncols = 100, xmn = 0, xmx = 100, ymn = 0, ymx = 100)
 r[] <- 0
 steps <- 500
 n <- 100
 # Free movement: all individuals move at every step
 locations <- data.frame(matrix(ncol = n, nrow = steps))
 pp <- randomPoints(r, n)
 cell <- cellFromXY(r, pp)
 infected <- 1:10
 pob <- 1:n
 ratio <- data.frame(1:steps, NA)
 pob_inf <- list()
 for (j in 1:steps){
  locations[j,] <- cell
  for (i in 1:n){
   # move each individual to a random neighboring cell
   adj <- adjacent(r, cell[i], 8)
   cell[i] <- adj[sample(nrow(adj), 1), 2]
  }
  # anyone sharing a cell with an infected individual becomes infected
  new_inf <- cell %in% cell[infected]
  infected <- pob[new_inf]
  ratio[j,2] <- length(infected)
  pob_inf[[j]] <- infected
 }
 # Restricted movement: only the first 25 individuals move
 locations2 <- data.frame(matrix(ncol = n, nrow = steps))
 cell2 <- cellFromXY(r, pp)
 infected2 <- 1:10
 pob2 <- 1:n
 ratio2 <- data.frame(1:steps, NA)
 pob_inf2 <- list()
 for (j in 1:steps){
  locations2[j,] <- cell2
  for (i in 1:25){
   adj <- adjacent(r, cell2[i], 8)
   cell2[i] <- adj[sample(nrow(adj), 1), 2]
  }
  new_inf2 <- cell2 %in% cell2[infected2]
  infected2 <- pob2[new_inf2]
  ratio2[j,2] <- length(infected2)
  pob_inf2[[j]] <- infected2
 }

Let's make some plots to put them together in a GIF and better visualize the results...

 num <- seq(1,500,4)
 for (p in 1:125){
  id <- sprintf("%03d", num[p])
  png(paste("corona_",id,".png", sep=""), width=780, height=800, units="px", pointsize = 15)
  layout(matrix(c(1,1,2,3),2,2,byrow = TRUE))
  plot(ratio[1:num[p],],pch=19, xlim = c(0,500), ylim = c(0,100), ylab = "nº of infected",
     xlab = "time", col = "red", cex.axis = 1.4, cex.lab = 1.4, main = "Infected curves", cex.main = 2)
  points(ratio2[1:num[p],], pch = 19, col = "blue")
  legend("topright", legend = c("free movement", "restricted movement"),lwd = c(4,4), col = c("red","blue"), cex = 1.5 )
  plot(r, col = "white", legend = FALSE, axes = FALSE, main = "free movement", cex.main = 1.8)
  points(xyFromCell(r,as.numeric(locations[num[p],])), cex = 1.2, col = "grey40", pch = 18)
  points(xyFromCell(r,as.numeric(locations[num[p],]))[pob_inf[[num[p]]],], pch = 18, col = "red", cex = 1.4)
  plot(r, col = "white", legend = FALSE, axes = FALSE, main = "restricted movement", cex.main = 1.8)
  points(xyFromCell(r,as.numeric(locations2[num[p],])), cex = 1.2, col = "grey40", pch = 18)
  points(xyFromCell(r,as.numeric(locations2[num[p],]))[pob_inf2[[num[p]]],], pch = 18, col = "blue", cex = 1.4)
  dev.off()
 }


 Stay safe!

Tuesday, February 18, 2020

rWind is working again!

 Yep, after several months down, the rWind R package is working again! I'm sorry, but I've been too busy lately with my PhD dissertation and I don't have all the time I'd like to improve rWind 😞. The problem was due to a change in the URL of the GFS server, and it was solved thanks to a pull request on GitHub... Thank you very much! The new version on CRAN (1.1.5) comes with some new functions to download sea current data from the OSCAR database! New examples with sea and wind currents are coming soon... also on Twitter @javi_ferlop


Sunday, April 14, 2019

Understanding Bayesian Inference with a simple example in R!

 Hi there!

 Last summer, the Royal Botanical Garden (Madrid, Spain) hosted the first edition of MadPhylo, a workshop about Bayesian Inference in phylogenetics using RevBayes. It was a pleasure for me to be part of the organization staff with John Huelsenbeck, Brian Moore, Sebastian Höhna, Mike May, Isabel Sanmartin and Tamara Villaverde. The next edition of MadPhylo will be held June 10-19, 2019 at the Real Jardín Botánico de Madrid. If you are interested in Bayesian Inference and phylogenetics, you just can't miss it! You'll learn the RevBayes language, a programming language to perform phylogenetic (and other) analyses under a Bayesian framework! Apply here!


The first days focused on explaining how we can use the Bayesian framework to estimate the parameters of a model. I'm not an expert in Bayesian Inference at all, but in this post I'll try to reproduce one of the first MadPhylo tutorials in the R language. As a simple example, we'll use a coin-flipping experiment. We can model this experiment with a binomial distribution:

binomial(n, p)

where n is the number of tosses and the parameter p is the probability of obtaining a head. Given the number of tosses (n) and the number of heads (h), we can use Bayesian Inference to estimate the probability of obtaining a head (p).
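Before going Bayesian, note that base R's dbinom already lets us see how the likelihood of a fixed outcome (say, 73 heads in 100 tosses) changes with p: values of p near the observed frequency make the data far more probable than a fair coin would.

```r
# Likelihood of 73 heads in 100 tosses under two values of p
dbinom(73, size = 100, prob = 0.50)  # fair coin: tiny likelihood
dbinom(73, size = 100, prob = 0.73)  # p near the observed frequency: much larger
```

The MCMC below is essentially a guided search over p using exactly this likelihood.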

If we use Bayes’ theorem, the probability of a specific value of p given the number of heads (h) is:

Pr(p|h) = Pr(h|p) × Pr(p) / Pr(h)

where Pr(h|p) is the likelihood of the data given a value of p, Pr(p) is the prior probability of p, and Pr(h) is the marginal likelihood of the data. The prior probability of our parameter represents our belief about what values the parameter can take. If we think our parameter can take any value with equal probability, we use a uniform (flat) prior.

We can use Markov chain Monte Carlo (MCMC) to propose many different values of p and obtain the posterior probability of these values. To do this, we can use the Metropolis-Hastings MCMC algorithm, with the following (simplified) steps:

- Step 1) Set an initial value for p.

 p <- runif(1, 0, 1)  

- Step 2) Propose a new value of p, called p'.

 p_prime <- p + runif(1, -0.05, 0.05)  

- Step 3) Compute the acceptance probability of this new value for the parameter. We have to check whether the new value improves the posterior probability given our data. This can be seen as the ratio Pr(p'|h) / Pr(p|h).

It can be simplified as:

R = min{1, [Pr(h|p') × Pr(p')] / [Pr(h|p) × Pr(p)]}

The advantage of this method is that we avoid computing the marginal likelihood, which is often difficult to obtain with more complex models. Let’s stop here a little to explain each term of this equation.

Since our main model is binomial (a coin toss), the likelihood function Pr(h|p) can be defined in R as:

 # help(dbinom)
 likelihood <- function(h, n, p){
  lh <- dbinom(h, n, p)
  return(lh)
 }

Pr(p) is our prior probability, that is, our belief about what values p can take. The only thing we know is that it must be a value between 0 and 1, since it is a probability. If we think all values are equally likely, we can define a flat prior using the beta distribution. A beta distribution with parameters β(1,1) is a flat distribution between 0 and 1 (you can learn more about the beta distribution here). Then, in R we can obtain Pr(p) under a β(1,1) as:

 # help(dbeta)  
 dbeta(p, 1, 1)  

Now, the acceptance probability R (see the equations in Step 3) will be the minimum of 1 and the ratio of posterior probabilities for the two values of p. We express this in R as follows:

 R <- likelihood(h,n,p_prime)/likelihood(h,n,p) * (dbeta(p_prime,1,1)/dbeta(p,1,1))  

- Step 4) Next, we generate a uniform random number between 0 and 1. If this number is less than R, we accept the new value of p (p') and update p = p'. If not, the change is rejected.

  if (R > 1) {R <- 1}
  random <- runif(1, 0, 1)
  if (random < R) {
   p <- p_prime
  }

- Step 5) Now we record the current value of p.

 posterior[i,1] <- log(likelihood(h, n, p))  
 posterior[i,2] <- p  

Finally, we should repeat this loop many times to obtain a good estimate of p. This can be easily done in R using a for loop (check the full code below).
  Following the example studied in MadPhylo with RevBayes, I wrote some code to reproduce it in R. We toss a coin 100 times and obtain 73 heads. Is my coin biased? Let's check the probability of obtaining a head given the data:

 # Set the number of tosses.
 n <- 100
 # Set the number of heads obtained.
 h <- 73
 # Define our likelihood function.
 # Since our model is a binomial model, we can use:
 likelihood <- function(h, n, p){
  lh <- dbinom(h, n, p)
  return(lh)
 }
 # Set the starting value of p
 p <- runif(1, 0, 1)
 # Create an empty data.frame to store the accepted p values of each iteration.
 # Remember: "the posterior probability is just an updated version of the prior"
 posterior <- data.frame()
 # Set the length of the loop (Markov chain, number of iterations).
 nrep <- 5000
 # Start the loop (MCMC)
 for (i in 1:nrep) {
  # Obtain a new proposal value for p
  p_prime <- p + runif(1, -0.05, 0.05)
  # Reflect values out of the range 0 - 1
  if (p_prime < 0) {p_prime <- abs(p_prime)}
  if (p_prime > 1) {p_prime <- 2 - p_prime}
  # Compute the acceptance probability using our likelihood function and the
  # beta(1,1) distribution as our prior probability.
  R <- likelihood(h,n,p_prime)/likelihood(h,n,p) * (dbeta(p_prime,1,1)/dbeta(p,1,1))
  # Accept or reject the new value of p
  if (R > 1) {R <- 1}
  random <- runif(1, 0, 1)
  if (random < R) {
   p <- p_prime
  }
  # Store the likelihood of the current p and its value
  posterior[i,1] <- log(likelihood(h, n, p))
  posterior[i,2] <- p
 }

Let's plot some results...

 par(mfrow= c(1,2))  
 prior <- rbeta(5000, 1,1)  
 plot(1:5000 ,posterior$V2, cex=0, xlab = "generations", ylab = "p",  
    main = "trace of MCMC\n accepted values of parameter p\n prior = beta(1,1) generations = 5000")  
 lines(1:5000, posterior$V2, cex=0)  
 abline(h=mean(posterior$V2), col="red")  
 plot(density(posterior$V2), xlim = c(min(min(prior),min((posterior$V2))), max(max(prior),max((posterior$V2)))),   
    ylim = c(0, max(max(density(prior)$y),max((density(posterior$V2)$y)))), main= "prior VS posterior\n prior= beta(1,1)",  
    lwd=3, col="red")  
 lines(density(prior), lwd=3, lty=2, col="blue")  
 legend("topleft", legend=c("prior density","posterior density"),  
     col=c("blue","red"), lty=c(3,1), lwd=c(3,3), cex = 1)  

Well, we can see that the probability of obtaining a head given our data is around 0.7, so our coin must be a fake!
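As a sanity check (not part of the original MadPhylo exercise): with a beta prior the binomial model is conjugate, so the posterior is available in closed form. Under a β(1,1) prior it is Beta(h + 1, n − h + 1), and we can compare its mean and credible interval with the MCMC output:

```r
# Closed-form posterior under a beta(1,1) prior: Beta(h + 1, n - h + 1)
n <- 100
h <- 73
post_mean <- (h + 1) / (n + 2)   # mean of Beta(74, 28), ~0.725
post_mean
# 95% credible interval from the exact posterior:
# it excludes 0.5, so a fair coin is not credible given these data
qbeta(c(0.025, 0.975), h + 1, n - h + 1)
```

The MCMC average of the accepted p values should sit very close to this exact posterior mean.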

 You can play with the code and explore with a different number of tosses, or the effect of a different prior for p...

If you want to learn more about Bayesian Inference, I recommend you these YouTube videos by Rasmus Bååth, or this wonderful book/course by Richard McElreath

That's all!


Sunday, November 25, 2018

Plotting wind highways using rWind

Hi there!

Our manuscript about the rWind R package has recently been accepted for publication in Ecography! As you know, rWind is a tool to download and manage wind data, with some utilities that make it easy to include wind information in ecological or evolutionary analyses (or others!).

Though there are several examples in the Supporting material of the publication, and others in the vignette included in the CRAN version, here you have a full tutorial with the main utilities of rWind... enjoy!

Computing wind connectivity of Azores Archipelago

In this example, we will compute the wind connectivity of the Azores islands with different points in Europe, Africa and North America during June 2018. I will divide the tutorial into sections, one per task, for clarity.


Downloading wind time series using lubridate and wind.dl_2()

In the latest version of rWind (1.0.4), we can use lubridate to create a list of dates which will be used later with wind.dl_2 to download a wind time series. The object obtained from wind.dl_2 is an "rWind_series" list. This object can be easily used with other rWind functions, as we'll see later.

 # Make sure that you have the latest version of rWind
 library(rWind)
 library(lubridate)
 # Here, we use ymd_hms from the lubridate package to create a sequence of dates
 dt <- seq(ymd_hms(paste(2018,6,1,12,00,00, sep="-")),
      ymd_hms(paste(2018,6,30,12,00,00, sep="-")), by="1 days")
 # Now we can use wind.dl_2 with this sequence of dates. Keep in mind
 # that it could take a while: there are 30 datasets and it's a big area.
 ww <- wind.dl_2(dt, -85, 5, 20, 60)

Obtaining maximum wind speed during the time series with dplyr

We can use the rWind function "tidy" on the "rWind_series" list object to obtain, for example, the maximum speed reported for each cell of the study area during the time series.

 # After tidying our wind data, we can use dplyr utilities to easily
 # obtain several metrics from it. Notice that the wind average
 # is a special case, and it will be discussed later
 library(dplyr)
 library(raster)
 library(rworldmap)  # getMap()
 t_ww <- tidy(ww)
 g_ww <- t_ww %>% group_by(lat, lon)
 # Now, we select only the maximum speed value for each coordinate
 max_ww <- g_ww %>% summarise(speed = max(speed))
 maxw <- cbind(max_ww$lon, max_ww$lat, max_ww$speed)
 # We can also check the maximum speed value in the whole study area
 max(maxw[,3])
 # Finally, we can transform this data into a raster layer
 rmax <- rasterFromXYZ(maxw)
 acol <- colorRampPalette(c("white", "blue", "darkblue"))
 plot(rmax, col=acol(1000), main= "Maximum wind speed reported",
     xlab="Longitude", ylab="Latitude")
 lines(getMap(resolution = "high"), lwd=2)

Computing averages of wind speeds and directions

Since wind is a vector magnitude given by two components (U and V), to compute speed and direction averages we should use both components. To make it easier, rWind includes the wind.mean function. Notice that wind.mean is specifically designed to work with wind.dl_2 outputs, so it requires an "rWind_series" object as input.

 # We can easily use wind.mean() function  
 w_mean <- wind.mean(ww)  
 # And then use wind2raster directly to create a raster layer  
 r_mean <- wind2raster(w_mean)  

 # We can plot a subset of this raster around the Azores islands.
 # Using "arrowDir" we can include arrows for wind direction with
 # the "Arrowhead" function from the "shape" R package
 library(shape)
 plot(r_mean$wind.speed, main= "Wind speed and direction average",
      col=acol(1000), xlim=c(-38,-18), ylim=c(33,43))
 alpha <- arrowDir(w_mean)
 Arrowhead(w_mean$lon, w_mean$lat, angle=alpha,
      arr.length = 0.2, arr.type="curved")
 lines(getMap(resolution = "low"), lwd=4)
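A toy example in plain base R (independent of rWind) shows why averaging the U and V components matters: two opposite winds of equal speed average to nearly zero wind as vectors, while a naive average of their speeds would wrongly report full speed.

```r
# Two winds of 10 m/s blowing in opposite directions
# (U = west-east component, V = south-north component)
u <- c(10, -10)
v <- c(0, 0)

# Naive scalar average of the speeds: misleading
mean(sqrt(u^2 + v^2))        # 10 m/s

# Vector average: average the components first, then compute speed
u_mean <- mean(u)
v_mean <- mean(v)
sqrt(u_mean^2 + v_mean^2)    # 0 m/s -- the two winds cancel out
```

This is exactly the pitfall that wind.mean avoids by averaging the components internally.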

Wind connectivity between Azores and mainland

To complete this tutorial, we will use the downloaded wind data to compute wind connectivity between several points on the mainland and the Azores archipelago. This kind of measure can be useful for testing hypotheses about island colonization by plants or other organisms that could be influenced by wind currents. We will also use this connectivity to compute least-cost paths between two locations, in order to plot a virtual wind highway connecting the mainland with the Azores islands.

 # First, we define our locations: 7 locations in the East,
 # 7 in the West, and the last one a point in the Azores islands
 loc <- matrix(c(-7, -4, -1, -9, -6,-10,-15,
         -60, -56,-63,-76,-76,-81,-80, -27,
         55, 50, 45, 40, 35, 30, 25,
         55, 50, 45, 40, 35, 30, 25, 39), 15, 2)
 colnames(loc) <- c("lon", "lat")
 rownames(loc) <- c(seq(1,14,1), "Azores")
 # Check how "loc" looks
 loc
 # You can plot them
 #plot(r_mean$wind.speed, col=acol(1000))
 #lines(getMap(resolution = "low"), lwd=2)
 #points(loc, col="red4", pch = 19, cex = 2)

Now, we'll execute the next tasks:
  1. We'll use wind2raster to obtain raster layers of wind direction and speed of each day in the time series.
  2. Then, for each day, we'll compute the conductance matrix, a matrix with connectivity values between every single cell of the entire study area.
  3. Later, with each conductance matrix, we'll compute the cost between our locations and will store them in a cost_list object.
  4. Finally, for each day, if the cost is not infinite, we'll get the shortest path between the two selected locations and store it in a “paths” object.

 # 1) Raster layers from wind time series
 layers <- wind2raster(ww)
 # 2) Conductance from wind direction and speed
 library(gdistance)  # costDistance(), shortestPath()
 Conductance <- flow.dispersion(layers, type="passive", output="transitionLayer")
 # 3) and 4) Using a loop to compute the cost between
 # locations and the least cost paths between location 11
 # and the Azores islands (as an example).
 cost_list <- array(NA_real_, dim=c(15,15,30))
 paths <- vector("list", 30)
 for (i in 1:30){
  cost_list[,,i] <- costDistance(Conductance[[i]], loc)
  if (costDistance(Conductance[[i]], loc[11,], loc[15,]) != Inf){
   paths[[i]] <- shortestPath(Conductance[[i]], loc[11,], loc[15,],
                 output="SpatialLines")
  }
 }

Now, we will tidy up the results a bit to plot them in a nicer way.

 # Transform costs into connectivity
 connectivity <- 1/cost_list
 # Obtain connectivity averages
 conn_avg <- apply(connectivity, c(1, 2), mean, na.rm = TRUE)
 # Here we keep only the connectivity from the mainland to the Azores.
 # As connectivity has arbitrary magnitude and units, we
 # multiply the values by 1000 to avoid leading decimal zeros.
 conn_azores <- conn_avg[,15]
 conn_azores <- conn_azores * 1000
 # Finally, we can plot connectivity values as the width
 # of the connections between the mainland and the Azores. You can power
 # the values to increase the differences between them.
 plot(r_mean$wind.speed, main= "Connectivity from mainland to Azores",
      col=acol(1000))
 lines(getMap(resolution = "low"), lwd=1)
 for (i in 1:14){
  lines(rbind(loc[i,], loc[15,]), lwd = conn_azores[i]^3, col="red4")
 }
 points(loc, col="red", pch = 19, cex = 1.5)
 text(loc+2, labels=c(round(conn_azores[1:14], 2),"Azores"),
    col="white", cex= 1.5, font=2)

Last but not least, we can use the shortest paths stored for each day to plot a density heatmap. This can help us visualize wind corridors across our study area (our wind highways!).

 # We'll use the spatstat library to convert the shortest
 # path lines into a psp object.
 library(spatstat)
 paths_clean <- paths[!sapply(paths, is.null)]
 paths_merged <- paths_clean[[1]]
 for (h in 2:length(paths_clean)) {
  paths_merged <- rbind(paths_merged, paths_clean[[h]])
 }
 paths_psp <- as(paths_merged, "psp")
 # Now, we can obtain a kernel density of the paths
 # and convert it into a raster layer that can be plotted.
 # You can play with the sigma value to modify the smoothing.
 lines_kernel <- density(paths_psp, sigma=1, dimyx=c(350,410))
 kernel <- raster(lines_kernel)
 ext <- extent(r_mean$wind.speed)
 kernel <- extend(kernel, ext)
 kernel[is.na(kernel)] <- minValue(kernel)
 acol <- colorRampPalette(c("grey40", "darkblue", "red2", "gold"))
 plot(kernel, col=acol(1000), zlim=c(-0.01,5),
       main = "Least cost paths density", xlab="Longitude",
       ylab="Latitude")
 lines(getMap(resolution = "high"), lwd=2)

Isn't it cool?

 That's all for now!