lunes, 28 de junio de 2021

Missing Migrants, tracking human deaths along migratory routes

 Hi there,

Several years ago I received the book “Como si nunca hubieran sido” as a Christmas present. In this poem, Javier Gallego and Juan Gallego talk about the real drama suffered by immigrants trying to reach Europe through the Mediterranean route. The current pandemic situation has overshadowed many problems that affect millions of people each year. Missing Migrants Project tracks incidents involving migrants, including refugees and asylum-seekers, who have died or gone missing in the process of migration towards an international destination. It shows the situation of many people who are forced to emigrate even during the pandemic time. And it also helps me to contextualize my first world problems. 


Here you have a simple analyses about the human migration fatalities during the last years in the Mediterranean corridor using R. 


 # Let's download and read the data from the Missing Migrants Project web site  
 url <- ""  
 mm <- read.csv("/home/javifl/Descargas/MissingMigrants-Global-2021-06-27T21-15-07.csv")  
 # Filter the data for Mediterranean region  
 mm <- mm %>% filter(Region == "Mediterranean") %>% filter(Migration.Route != "")  
 mm$lon <- as.numeric(unlist(map(strsplit(mm$Location.Coordinates, ", "), 2)))  
 mm$lat <- as.numeric(unlist(map(strsplit(mm$Location.Coordinates, ", "), 1)))  
 mm$date <- mdy(mm$Reported.Date)  
 # plot the incidence of fatalities  
 mmInc <- incidence(mm$date, interval = 30, groups = mm$Migration.Route)  
 plot(mmInc, labels_week = F, stack = TRUE, border = "grey") + 
      scale_x_date(labels = date_format("%B %Y"))  


We can plot the monthly incidence of accidents involving migrants using the R package incidence. Here you can see how during the first months of 2020 the migration fatalities decreased, probably due to the pandemic impact. However, human migration incidents increased during the second half of 2020 despite of pandemic restrictions in West and Central Mediterranean routes.


Finally, we can plot monthly incidents using ggmap R package.

 months <- seq(ymd("2014-01-1"),ymd(max(mm$date)), by = '1 month')  
 myLocation<-c(min(mm$lon)-2.5, min(mm$lat)-2, max(mm$lon)+1, max(mm$lat)+1)  
 myMap<-get_map(location=myLocation, source="stamen", maptype="watercolor", crop=FALSE)  
 for (i in 1:length(months)-1){  
  int <- interval(months[i], months[i+1])  
  newmm <- mm[mm$date %within% int,]  
  id <- sprintf("%02d", i)  
  if (nrow(newmm) > 0){  
   png(paste("mm",id,".png", sep=""), width=900, height=610, units="px",    
   print(ggmap(myMap) +   
       geom_point(aes(x=lon, y=lat, size = 2), data=newmm, col="orange", alpha=0.5) +  
       #geom_density2d(aes(x=lon, y=lat), data=newmm, size = 0.3) +   
       #stat_density2d(data = newmm,  
       #        aes(x = lon, y = lat, fill = ..level.., alpha = ..level..), size = 0.01,  
       #        bins = 16, geom = "polygon") + scale_fill_gradient(low = "green", high = "red") +  
       scale_alpha(range = c(0, 0.2), guide = FALSE) +  
       theme(legend.position = "none", plot.title = element_text(size = 25)) +  
       ggtitle(paste(year(months[i]), month(months[i], label = T, abbr = F), sep=" - ")))  

Stay safe!

"No soy un número ni parte de una cifra 

aunque se paga por igual la misma tarifa..."

viernes, 22 de enero de 2021

Are betting houses randomly distributed in Madrid? Point pattern analysis in R

Hi there!

  Last December 13th there was a demonstration in Carabanchel, my home neighborhood in Madrid (Spain) against the proliferation of betting houses (small casinos where you can get cheap drinks or coffees while you spend your money betting or gambling). Many neighborhood associations usually complain against this kind of places arguing they bait young people with very low prices, who can develop gambling addiction in the future. Moreover some organizations ensure that the proliferation of this kind of places is focused in working class neighborhoods with low incomes and high unemployed rates (see this great report for more information about that - in Spanish). So, how could we know if betting houses are randomly distributed in Madrid?

  Imagine that you have a 25x25 (625 cells) square grid and you want to scatter 2500 points. If the points are randomly distributed, they will have the same probability of occupy whatever cell. That is, the first point will have a probability of 1/625 of being in cell number 1, the same probability (1/625) probability of being at cell number 2, and so on. Thus, if we have 2500 points in 625 cells, each cell would have 2500/625 = 4 points in average. This is the meaning of the lambda from a Poisson distribution, the mean or the most probable value.

  Imagine now that we have the same points in the same grid, but now they are clustered in 3 groups. In this dummy example, we can easily guess that they are not randomly distributed by just observing the point pattern. But how could we demonstrate mathematically/numerically they are not randomly distributed?

  Well, given we know the shape of the histogram (distribution) that should have a completely random distribution (in our case a Poisson(λ=4)), an easy and straightforward way to demonstrate mathematically that the second point patter is not random at all would be to compare both distributions using for example a Kolmogorov–Smirnov test in R.

  Lets do the same with betting houses in Madrid city (Spain). Betting places locations can be downloaded from the city government website, as well as other data such as neighborhood limits or family income per neighborhood.I prepared a .zip with the needed files which can be downloaded from here.

There are currently 409 betting houses. We will divide the city in a grid of 18x18 cells of 1 km size as follows


 # Load required packages  
 # Load data  
 nb <- readOGR("/home/javifl/gambling/SHP_ETRS89/barrios_income_pop.shp")  
 bh <- read.csv("/home/javifl/gambling/betting_house.csv")  
 # Create the grid  
 bh.sp <- SpatialPointsDataFrame(coords=bh[,1:2], data=bh)  
 ext <- extent(c(min(bh$x),min(bh$x)+18000,min(bh$y),min(bh$y)+18000))  
 landscape <- raster(ncols=18,nrows=18,ext=ext, vals=1:324)  
 grid <- rasterToPolygons(landscape)  
 # Plot betting houses in Madrid  
 plot(nb, main = "City of Madrid")  
 plot(grid, lwd = 0.4, add = TRUE)  
 points(bh.sp, pch=16, cex=0.7, col="black")  


 Cool. Since Madrid is an heterogeneous city, we will select for our analysis only those cells in which there is one or more betting houses. Then, we'll spread random points in the selected squares to compare both histograms/distributions: random points VS betting houses.

 # Select cells with betting houses  
 grid$bh <- poly.counts(bh.sp, grid)  
 sel.sqr <- (grid[grid$bh > 0,])  
 landscape[grid$bh == 0] <- NA  
 # Plot  
 lines(nb, lwd = 0.2)  
 points(bh.sp, pch=16, cex=0.7, col="black")  
 # Add random points in selected cells  
 rp <- randomPoints(disaggregate(landscape, 100), nrow(bh))  
 points(rp, pch=16, cex=0.7, col = "red" )  
 rp.sp <- SpatialPointsDataFrame(coords=rp[,1:2], data=data.frame(rp))  

 Well done! We have selected 118 cells with at least one betting house and we have scattered 409 random points on them. So, now the last task is to count the number of betting houses and the number of random points at each cell. Then, we could compare both histograms/ distributions, and decide if they present a similar pattern.

 par(mfrow = c(1,2))  
 hist(poly.counts(rp.sp, sel.sqr), right=F, main = "random points",   
 xlab="number of random points by cell")  
 hist(poly.counts(bh.sp, sel.sqr), right=F, main = "betting houses",   
 xlab="number of betting houses by cell")  

As you can see, under a random point pattern the most frequent value is between 3 and 4. Since we have 409 points randomly distributed over 108 cells, we expected to have 409/108 = 3.78 points at each cell. It looks pretty good!

However, we can see that the most frequent value in the betting house histogram is between 0 and 2... This is because there are many empty cells, while there are some of them with high number of betting houses (a clustered pattern). We could compare both distributions mathematically using a Kolmogorov–Smirnov test.

 #Kolmogorov–Smirnov test  
 ks.test(poly.counts(rp.sp, sel.sqr), poly.counts(bh.sp, sel.sqr))  


Bonus Track: so, what variable or variables drive the distribution of betting houses in Madrid? Here you have a clue... but take the data and explore it yourself!



Stay safe!





jueves, 9 de julio de 2020

N-mixture models to estimate animal abundance with R and unmarked

Hi there!

 During the pandemic lockdown I was studying how N-mixture models to estimate animal abundance works. As a result, I wrote this small tutorial in R (Spanish) that helped me to better understand such models. Any correction or suggestion can be made by posting here or at

Hope you enjoy it!

viernes, 3 de abril de 2020

Another "flatten the COVID-19 curve" simulation... in R

 Hi there,

This is the best meme I've found during these days...

Well, here it is my "BUT" contribution. Some weeks ago The Washington Post published this simulations about how "social distancing" could help to "flat the curve" of COVID-19 infections. I fell in love with these simulations because their simplicity and explanatory power. Indeed, you can use the pretty similar principles to simulate predator hunt behavior and other movement patterns... I wrote some R code to have a tiny version of them...

 r <- raster(nrows = 100, ncols = 100, xmn = 0, xmx = 100, ymn = 0, ymx = 100)   
 r[] <- 0  
 steps <- 500  
 n <- 100  
 locations <- data.frame(matrix(ncol = n, nrow = steps))  
 pp <- randomPoints(r,n)  
 cell <- cellFromXY(r, pp)  
 infected <- 1:10  
 pob <- 1:n  
 ratio <- data.frame(1:steps, NA)  
 pob_inf <- list()  
 for (j in 1:steps){  
  locations[j,] <- cell  
  for(i in 1:n){  
   cell[i] <- adjacent(r, cell[i], 8)[round(runif(1,0.51,nrow(adjacent(r, cell[i], 8))+0.49),0),2]  
  new_inf <- cell %in% cell[infected]  
  infected <- pob[new_inf]  
  ratio[j,2] <- length(infected)  
  pob_inf[[j]] <- infected  
 locations2 <- data.frame(matrix(ncol = n, nrow = steps))  
 cell2 <- cellFromXY(r, pp)  
 infected2 <- 1:10  
 pob2 <- 1:n  
 ratio2 <- data.frame(1:steps, NA)  
 pob_inf2 <- list()  
 for (j in 1:steps){  
  locations2[j,] <- cell2  
  for(i in 1:25){  
   cell2[i] <- adjacent(r, cell2[i], 8)[round(runif(1,0.51,nrow(adjacent(r, cell2[i], 8))+0.49),0),2]  
  new_inf2 <- cell2 %in% cell2[infected2]  
  infected2 <- pob2[new_inf2]  
  ratio2[j,2] <- length(infected2)  
  pob_inf2[[j]] <- infected2  

Let's make some plots to put them together in a GIF and better visualize the results...

 num <- seq(1,500,4)  
 for (p in 1:125){  
  id <- sprintf("%03d", num[p])  
  png(paste("corona_",id,".png", sep=""), width=780, height=800, units="px", pointsize = 15)  
  layout(matrix(c(1,1,2,3),2,2,byrow = TRUE))  
  plot(ratio[1:num[p],],pch=19, xlim = c(0,500), ylim = c(0,100), ylab = "nº of infected",   
     xlab = "time", col = "red", cex.axis = 1.4, cex.lab = 1.4, main = "Infected curves", cex.main = 2)  
  legend("topright", legend = c("free movement", "restricted movement"),lwd = c(4,4), col = c("red","blue"), cex = 1.5 )  
  plot(r, col = "white", legend = FALSE, axes = FALSE, main = "free movement", cex.main = 1.8)  
  points(xyFromCell(r,as.numeric(locations[num[p],])), cex = 1.2, col = "grey40", pch = 18)  
  points(xyFromCell(r,as.numeric(locations[num[p],]))[pob_inf[[num[p]]],], pch = 18, col = "red", cex = 1.4)  
  plot(r, col = "white", legend = FALSE, axes = FALSE, main = "restricted movement", cex.main = 1.8)  
  points(xyFromCell(r,as.numeric(locations2[num[p],])), cex = 1.2, col = "grey40", pch = 18)  
  points(xyFromCell(r,as.numeric(locations2[num[p],]))[pob_inf2[[num[p]]],], pch = 18, col = "blue", cex = 1.4)   


 Stay safe!

martes, 18 de febrero de 2020

rWind is working again!

 Yep, after several months fallen, rWind R package is working again! I'm sorry, but I'm too busy lately with my PhD dissertation and I have not all the time that I'd like to improve rWind 😞. The problem was due to the change of the URL to GFS server, and it was solved thanks to a pull request in GitHub... Thank you very much! The new version on CRAN (1.1.5) comes with some new functions to download sea current data from the OSCAR database! New examples with sea and wind currents coming soon...  also on Twitter @javi_ferlop