sábado, 26 de noviembre de 2022

Getting coordinates (lon,lat) from road name and kilometer in Spain


 Hi there!

 Early this year we published a research about getting animal abundance estimates from wildlife - vehicles collisions in Ecography journal:

Can we model distribution of population abundance from wildlife–vehicles collision data?


The first thing we had to deal with was to obtain geographical coordinates (latitude and longitude) from a database with the name of the roads and the kilometer point. At the end, I wrote a small chunk of code to do that by using the Transport Network from the Spanish Geographic Institute. I first downloaded all the kilometer point datasets (by provinces) and I merged them all in a single file. You can find this file here (but it's old and current datasets should be updated, so I recommend you to download it again from the original sources). Then I used R language to create a routine to "search" the coordinates that better matched my points.

I recently updated my code, warping it in a function and I created a toy-dataset with 5 points to show you how it works (see below, comments in Spanish).


 # Guardamos las URLs de los datos almacenados en mi Github (https://github.com/jabiologo/)  
 roadData <- "https://github.com/jabiologo/roads/blob/main/roadData.RData?raw=true"  
 # Cargamos la base de datos del IGM con los puntos kilométricos  
 # Cargamos la base de datos con nuestros nombres de carreteras y kilómetros  
 # Base de datos del IGN  
 # Mis puntos para los cuales quier las coordenadas  
 # Nótese que si se quieren utilizar unos datos propios con este script,  
 # estos deben tener el mismo formato que el objeto "puntos"  

 # Cargamos la función  
 pkm2xy <- function(events, database){  
  # Almacenaremos todas las carreteras que no se encuentren en nuestra base de datos  
  noMatch <- events[!(events[,1] %in% database$Nombre),]  
  # Filtramos nuestros datos  
  events <- events[events[,1] %in% database$Nombre,]  
  # Creamos una serie de columas para almacenar las coordenadas y otra info  
  events$X <- NA  
  events$Y <- NA  
  events$error <- NA  
  events$RoadAssigned <- NA  
  events$PkmAssigned <- NA  
  # Mediante un bucle iremos obteniendo nuestras coordenadas  
  for(i in 1:nrow(events)){  
   # Seleccionamos de nuestra base de datos aquel punto kilométrico que se  
   # encuentre más cerca de nuestro kilómetro  
   sel <- database %>% filter(Nombre == events[i,1]) %>%   
    filter(abs(numero - events[i,2]) == min(abs(numero - events[i,2])))  
   # Almacenamos las coordenadas  
   events[i,3:4] <- sel[1,1:2]  
   # Calculamos el error entre el punto kilométrico obtenido y el deseado  
   events[i,5] <- abs(sel$numero[1] - events[i,2])  
   # Almacenamos el nombre de la carretera y el punto kilométrico que hemos   
   # seleccionado en nuestra base de datos del IGN   
   events[i,6] <- sel$Nombre[1]  
   events[i,7] <- sel$numero[1]  
  # Retornamos una lista con dos elementos:  
  # Aquellos puntos para los cuales se han conseguido las coordenadas  
  # Aquellos puntos para los que no se ha encontrado la carretera  
 # Corremos nuestra función con el fichero de ejemplo  


As you can see, there was a road (CR-P-7111) that was not present in our database, so we couldn't find the coordinates for this point. However, we obtained the coordinates for the rest points. The function also show you the error in kilometers between the point and the latitude/longitude obtained. I recommend to perform a random test to visually check that everything was right, that is, if your dataset is large, select a number of points and manually check with Google Maps or similar if the function worked well.

You can find the code and the data also in my GitHub repository.


Hope it's useful! 



Fernández‐López, J., Blanco‐Aguiar, J. A., Vicente, J., & Acevedo, P. (2022). Can we model distribution of population abundance from wildlife–vehicles collision data?. Ecography, 2022(5), e06113.

lunes, 28 de junio de 2021

Missing Migrants, tracking human deaths along migratory routes

 Hi there,

Several years ago I received the book “Como si nunca hubieran sido” as a Christmas present. In this poem, Javier Gallego and Juan Gallego talk about the real drama suffered by immigrants trying to reach Europe through the Mediterranean route. The current pandemic situation has overshadowed many problems that affect millions of people each year. Missing Migrants Project tracks incidents involving migrants, including refugees and asylum-seekers, who have died or gone missing in the process of migration towards an international destination. It shows the situation of many people who are forced to emigrate even during the pandemic time. And it also helps me to contextualize my first world problems. 


Here you have a simple analyses about the human migration fatalities during the last years in the Mediterranean corridor using R. 


 # Let's download and read the data from the Missing Migrants Project web site  
 url <- "https://missingmigrants.iom.int/global-figures/all/csv"  
 mm <- read.csv("/home/javifl/Descargas/MissingMigrants-Global-2021-06-27T21-15-07.csv")  
 # Filter the data for Mediterranean region  
 mm <- mm %>% filter(Region == "Mediterranean") %>% filter(Migration.Route != "")  
 mm$lon <- as.numeric(unlist(map(strsplit(mm$Location.Coordinates, ", "), 2)))  
 mm$lat <- as.numeric(unlist(map(strsplit(mm$Location.Coordinates, ", "), 1)))  
 mm$date <- mdy(mm$Reported.Date)  
 # plot the incidence of fatalities  
 mmInc <- incidence(mm$date, interval = 30, groups = mm$Migration.Route)  
 plot(mmInc, labels_week = F, stack = TRUE, border = "grey") + 
      scale_x_date(labels = date_format("%B %Y"))  


We can plot the monthly incidence of accidents involving migrants using the R package incidence. Here you can see how during the first months of 2020 the migration fatalities decreased, probably due to the pandemic impact. However, human migration incidents increased during the second half of 2020 despite of pandemic restrictions in West and Central Mediterranean routes.


Finally, we can plot monthly incidents using ggmap R package.

 months <- seq(ymd("2014-01-1"),ymd(max(mm$date)), by = '1 month')  
 myLocation<-c(min(mm$lon)-2.5, min(mm$lat)-2, max(mm$lon)+1, max(mm$lat)+1)  
 myMap<-get_map(location=myLocation, source="stamen", maptype="watercolor", crop=FALSE)  
 for (i in 1:length(months)-1){  
  int <- interval(months[i], months[i+1])  
  newmm <- mm[mm$date %within% int,]  
  id <- sprintf("%02d", i)  
  if (nrow(newmm) > 0){  
   png(paste("mm",id,".png", sep=""), width=900, height=610, units="px",    
   print(ggmap(myMap) +   
       geom_point(aes(x=lon, y=lat, size = 2), data=newmm, col="orange", alpha=0.5) +  
       #geom_density2d(aes(x=lon, y=lat), data=newmm, size = 0.3) +   
       #stat_density2d(data = newmm,  
       #        aes(x = lon, y = lat, fill = ..level.., alpha = ..level..), size = 0.01,  
       #        bins = 16, geom = "polygon") + scale_fill_gradient(low = "green", high = "red") +  
       scale_alpha(range = c(0, 0.2), guide = FALSE) +  
       theme(legend.position = "none", plot.title = element_text(size = 25)) +  
       ggtitle(paste(year(months[i]), month(months[i], label = T, abbr = F), sep=" - ")))  

Stay safe!

"No soy un número ni parte de una cifra 

aunque se paga por igual la misma tarifa..."

viernes, 22 de enero de 2021

Are betting houses randomly distributed in Madrid? Point pattern analysis in R

Hi there!

  Last December 13th there was a demonstration in Carabanchel, my home neighborhood in Madrid (Spain) against the proliferation of betting houses (small casinos where you can get cheap drinks or coffees while you spend your money betting or gambling). Many neighborhood associations usually complain against this kind of places arguing they bait young people with very low prices, who can develop gambling addiction in the future. Moreover some organizations ensure that the proliferation of this kind of places is focused in working class neighborhoods with low incomes and high unemployed rates (see this great report for more information about that - in Spanish). So, how could we know if betting houses are randomly distributed in Madrid?

  Imagine that you have a 25x25 (625 cells) square grid and you want to scatter 2500 points. If the points are randomly distributed, they will have the same probability of occupy whatever cell. That is, the first point will have a probability of 1/625 of being in cell number 1, the same probability (1/625) probability of being at cell number 2, and so on. Thus, if we have 2500 points in 625 cells, each cell would have 2500/625 = 4 points in average. This is the meaning of the lambda from a Poisson distribution, the mean or the most probable value.

  Imagine now that we have the same points in the same grid, but now they are clustered in 3 groups. In this dummy example, we can easily guess that they are not randomly distributed by just observing the point pattern. But how could we demonstrate mathematically/numerically they are not randomly distributed?

  Well, given we know the shape of the histogram (distribution) that should have a completely random distribution (in our case a Poisson(λ=4)), an easy and straightforward way to demonstrate mathematically that the second point patter is not random at all would be to compare both distributions using for example a Kolmogorov–Smirnov test in R.

  Lets do the same with betting houses in Madrid city (Spain). Betting places locations can be downloaded from the city government website, as well as other data such as neighborhood limits or family income per neighborhood.I prepared a .zip with the needed files which can be downloaded from here.

There are currently 409 betting houses. We will divide the city in a grid of 18x18 cells of 1 km size as follows


 # Load required packages  
 # Load data  
 nb <- readOGR("/home/javifl/gambling/SHP_ETRS89/barrios_income_pop.shp")  
 bh <- read.csv("/home/javifl/gambling/betting_house.csv")  
 # Create the grid  
 bh.sp <- SpatialPointsDataFrame(coords=bh[,1:2], data=bh)  
 ext <- extent(c(min(bh$x),min(bh$x)+18000,min(bh$y),min(bh$y)+18000))  
 landscape <- raster(ncols=18,nrows=18,ext=ext, vals=1:324)  
 grid <- rasterToPolygons(landscape)  
 # Plot betting houses in Madrid  
 plot(nb, main = "City of Madrid")  
 plot(grid, lwd = 0.4, add = TRUE)  
 points(bh.sp, pch=16, cex=0.7, col="black")  


 Cool. Since Madrid is an heterogeneous city, we will select for our analysis only those cells in which there is one or more betting houses. Then, we'll spread random points in the selected squares to compare both histograms/distributions: random points VS betting houses.

 # Select cells with betting houses  
 grid$bh <- poly.counts(bh.sp, grid)  
 sel.sqr <- (grid[grid$bh > 0,])  
 landscape[grid$bh == 0] <- NA  
 # Plot  
 lines(nb, lwd = 0.2)  
 points(bh.sp, pch=16, cex=0.7, col="black")  
 # Add random points in selected cells  
 rp <- randomPoints(disaggregate(landscape, 100), nrow(bh))  
 points(rp, pch=16, cex=0.7, col = "red" )  
 rp.sp <- SpatialPointsDataFrame(coords=rp[,1:2], data=data.frame(rp))  

 Well done! We have selected 118 cells with at least one betting house and we have scattered 409 random points on them. So, now the last task is to count the number of betting houses and the number of random points at each cell. Then, we could compare both histograms/ distributions, and decide if they present a similar pattern.

 par(mfrow = c(1,2))  
 hist(poly.counts(rp.sp, sel.sqr), right=F, main = "random points",   
 xlab="number of random points by cell")  
 hist(poly.counts(bh.sp, sel.sqr), right=F, main = "betting houses",   
 xlab="number of betting houses by cell")  

As you can see, under a random point pattern the most frequent value is between 3 and 4. Since we have 409 points randomly distributed over 108 cells, we expected to have 409/108 = 3.78 points at each cell. It looks pretty good!

However, we can see that the most frequent value in the betting house histogram is between 0 and 2... This is because there are many empty cells, while there are some of them with high number of betting houses (a clustered pattern). We could compare both distributions mathematically using a Kolmogorov–Smirnov test.

 #Kolmogorov–Smirnov test  
 ks.test(poly.counts(rp.sp, sel.sqr), poly.counts(bh.sp, sel.sqr))  


Bonus Track: so, what variable or variables drive the distribution of betting houses in Madrid? Here you have a clue... but take the data and explore it yourself!



Stay safe!





jueves, 9 de julio de 2020

N-mixture models to estimate animal abundance with R and unmarked

Hi there!

 During the pandemic lockdown I was studying how N-mixture models to estimate animal abundance works. As a result, I wrote this small tutorial in R (Spanish) that helped me to better understand such models. Any correction or suggestion can be made by posting here or at jflopez.bio@gmail.com.

Hope you enjoy it!

viernes, 3 de abril de 2020

Another "flatten the COVID-19 curve" simulation... in R

 Hi there,

This is the best meme I've found during these days...

Well, here it is my "BUT" contribution. Some weeks ago The Washington Post published this simulations about how "social distancing" could help to "flat the curve" of COVID-19 infections. I fell in love with these simulations because their simplicity and explanatory power. Indeed, you can use the pretty similar principles to simulate predator hunt behavior and other movement patterns... I wrote some R code to have a tiny version of them...

 r <- raster(nrows = 100, ncols = 100, xmn = 0, xmx = 100, ymn = 0, ymx = 100)   
 r[] <- 0  
 steps <- 500  
 n <- 100  
 locations <- data.frame(matrix(ncol = n, nrow = steps))  
 pp <- randomPoints(r,n)  
 cell <- cellFromXY(r, pp)  
 infected <- 1:10  
 pob <- 1:n  
 ratio <- data.frame(1:steps, NA)  
 pob_inf <- list()  
 for (j in 1:steps){  
  locations[j,] <- cell  
  for(i in 1:n){  
   cell[i] <- adjacent(r, cell[i], 8)[round(runif(1,0.51,nrow(adjacent(r, cell[i], 8))+0.49),0),2]  
  new_inf <- cell %in% cell[infected]  
  infected <- pob[new_inf]  
  ratio[j,2] <- length(infected)  
  pob_inf[[j]] <- infected  
 locations2 <- data.frame(matrix(ncol = n, nrow = steps))  
 cell2 <- cellFromXY(r, pp)  
 infected2 <- 1:10  
 pob2 <- 1:n  
 ratio2 <- data.frame(1:steps, NA)  
 pob_inf2 <- list()  
 for (j in 1:steps){  
  locations2[j,] <- cell2  
  for(i in 1:25){  
   cell2[i] <- adjacent(r, cell2[i], 8)[round(runif(1,0.51,nrow(adjacent(r, cell2[i], 8))+0.49),0),2]  
  new_inf2 <- cell2 %in% cell2[infected2]  
  infected2 <- pob2[new_inf2]  
  ratio2[j,2] <- length(infected2)  
  pob_inf2[[j]] <- infected2  

Let's make some plots to put them together in a GIF and better visualize the results...

 num <- seq(1,500,4)  
 for (p in 1:125){  
  id <- sprintf("%03d", num[p])  
  png(paste("corona_",id,".png", sep=""), width=780, height=800, units="px", pointsize = 15)  
  layout(matrix(c(1,1,2,3),2,2,byrow = TRUE))  
  plot(ratio[1:num[p],],pch=19, xlim = c(0,500), ylim = c(0,100), ylab = "nº of infected",   
     xlab = "time", col = "red", cex.axis = 1.4, cex.lab = 1.4, main = "Infected curves", cex.main = 2)  
  legend("topright", legend = c("free movement", "restricted movement"),lwd = c(4,4), col = c("red","blue"), cex = 1.5 )  
  plot(r, col = "white", legend = FALSE, axes = FALSE, main = "free movement", cex.main = 1.8)  
  points(xyFromCell(r,as.numeric(locations[num[p],])), cex = 1.2, col = "grey40", pch = 18)  
  points(xyFromCell(r,as.numeric(locations[num[p],]))[pob_inf[[num[p]]],], pch = 18, col = "red", cex = 1.4)  
  plot(r, col = "white", legend = FALSE, axes = FALSE, main = "restricted movement", cex.main = 1.8)  
  points(xyFromCell(r,as.numeric(locations2[num[p],])), cex = 1.2, col = "grey40", pch = 18)  
  points(xyFromCell(r,as.numeric(locations2[num[p],]))[pob_inf2[[num[p]]],], pch = 18, col = "blue", cex = 1.4)  


 Stay safe!