long time ago...: 2018

domingo, 25 de noviembre de 2018

Plotting wind highways using rWind

Hi there!

Our manuscript about rWind R package has been recently accepted for publication in Ecography! As you know, rWind is a tool used to download and manage wind data, with some utilities that make easy to include wind information in ecological or evolutionary analyses (or others!).

Though there are several examples in the Supporting material of the publication, and others in the vignette included in the CRAN version, here you have a full tutorial with the main utilities of rWind... enjoy!

Computing wind connectivity of Azores Archipelago

In this example, we will compute wind connectivity of Azores Islands with different points of Europe, Africa and North America during June 2018. I will divide each task of this tutorial in different sections for clarification.

Downloading wind time series using lubridate and wind.dl_2()

In the last version of rWind (1.0.4), we can use lubridate to create a list of dates which will be used latter with wind.dl_2 to download wind time series. The object obtained with wind.dl_2 is an "rWind_series" "list". This object can be easily used with other rWind functions as we'll see later.

 # Make sure that you have the last version of rWind  
   
 devtools::install_github("jabiologo/rWind")  
 library(rWind)  
 library(lubridate)  
   
 # Here, we use ymd_hms from lubridate package to create a sequence of dates  
   
 dt <- seq(ymd_hms(paste(2018,6,1,12,00,00, sep="-")),  
      ymd_hms(paste(2018,6,30,12,00,00, sep="-")),by="1 days")  
   
 # Now we can use wind.dl_2 with this sequence of dates. Have into account  
 # that it could take a while, they are 30 datasets and it's a big area.   
   
 ww <- wind.dl_2(dt,-85,5,20,60)

Obtaining maximum wind speed during the time series with dplyr

We can use the rWind function called "tidy" to play with the "rWind_series" "list" object to obtain, for example, the maximum speed reported for each cell in the study area during the time series.

 # After tidy our wind data, we can use dplyr utilities to easily  
 # obtain several metrics from our wind data. Notice that wind average  
 # is a special case, and it will be discussed later  
   
 t_ww <- tidy(ww)  
   
 library(dplyr)  
 g_ww <- t_ww %>% group_by(lat, lon)  
   
 # Now, we select only the maximum speed values for each coordinate  
   
 max_ww <- g_ww %>% summarise(speed = max(speed))  
 maxw <- cbind(max_ww$lon, max_ww$lat, max_ww$speed)  
 head(maxw)  
   
 # We can also check the maximum speed value in the whole study area   
   
 max(maxw[,3])  
   
 # Finally, we can transform this data into a raster layer  
   
 rmax <- rasterFromXYZ(maxw)  
 acol <- colorRampPalette(c("white", "blue", "darkblue"))  
 plot(rmax, col=acol(1000), main= "Maximum wind speed reported",  
     xlab="Longitude", ylab="Lattitude")  
   
 library(rworldmap)  
 lines(getMap(resolution = "high"), lwd=2)

Computing averages of wind speeds and directions

Since wind data is a vectorial magnitude given in two components (U and V), to compute speed and direction averages we should use both vectors. To make it easier, rWind includes wind.mean function. Notice that wind.mean is specially developed to work with wind.dl_2 outputs, so it will require "rWind_series" object as input.

 # We can easily use wind.mean() function  
   
 w_mean <- wind.mean(ww)  
   
 # And then use wind2raster directly to create a raster layer  
   
 r_mean <- wind2raster(w_mean)  

 # We can plot a subset of this raster around Azores Islands.  
 # Using "arrowDir" we can include arrows for wind direction with   
 # "Arrowhead" function from "shape" R package  
   
 plot(r_mean$wind.speed, main= "Wind speed and direction average",
      col=acol(1000), xlim=c(-38,-18), ylim=c(33,43))  
 alpha <- arrowDir(w_mean)  
 library(shape)  
 Arrowhead(w_mean$lon, w_mean$lat, angle=alpha,   
      arr.length = 0.2, arr.type="curved")  
 lines(getMap(resolution = "low"), lwd=4)

Wind connectivity between Azores and mainland

To complete this tutorial, we will use the downloaded wind data to compute wind connectivity between several points in mainland and Azores Archipelago. This kind of measures can be useful in order to establish theories about island colonization by plants or other organisms that could be influenced by wind currents. We will also use this connectivity to compute least cost paths between two locations, in order to virtually plot a wind highway connecting mainland with Azores islands.

 # First, we define our locations. 7 locations in the East   
 # and 7 in the West. The last one is a point in the Azores islands  
   
 loc <- matrix(c(-7, -4, -1, -9, -6,-10,-15,   
         -60, -56,-63,-76,-76,-81,-80, -27,  
         55, 50, 45, 40, 35, 30, 25,   
         55, 50, 45, 40, 35, 30, 25, 39), 15, 2)  
   
 colnames(loc) <- c("lon", "lat")  
 rownames(loc) <- c(seq(1,14,1), "Azores")  
   
 # Check how "loc" is looking   
   
 tail(loc)  
   
 # You can plot them  
 #plot(r_mean$wind.speed, col=acol(1000))  
 #lines(getMap(resolution = "low"), lwd=2)  
 #points(loc, col="red4", pch = 19, cex = 2)

Now, we'll execute the next tasks:

We'll use wind2raster to obtain raster layers of wind direction and speed of each day in the time series.
Then, for each day, we'll compute the conductance matrix, a matrix with connectivity values between every single cell of the entire study area.
Later, with each conductance matrix, we'll compute the cost between our locations and will store them in a cost_list object.
Finally, for each day, if the Cost is not Infinite, we'll get the shortest path between two selected locations, and we'll store it into a “paths” object.

 # 1) Raster layers from wind time series  
   
 layers <- wind2raster(ww)  
   
 # 2) Conductance from wind direction and speed  
   
 Conductance <- flow.dispersion(layers, type="passive",  
                 output="transitionLayer")  
   
 # 3) and 4) Using a loop to compute cost between  
 # locations and least cost paths between location 11  
 # and Azores islands (as an example).  
   
 cost_list <- array(NA_real_, dim=c(15,15,30))  
 paths <- list(1:30)  
   
 library(gdistance)  
   
 for (i in 1:30){  
  cost_list[,,i] <- costDistance(Conductance[[i]],loc)  
  if (costDistance(Conductance[[i]],loc[11,] , loc[15,]) != Inf){  
   paths[[i]] <- shortestPath(Conductance[[i]], loc[11,], loc[15,],  
                 output="SpatialLines")  
   print(i)  
  }  
 }

Now, we will manage a little bit the results to plot them in a cool way.

 # Transform costs in connectivity  
   
 connectivity <- 1/cost_list  
   
 # Obtain connectivity averages  
   
 conn_avg <- apply(connectivity, c(1, 2), mean, na.rm = TRUE)  
   
 # Here we filter only connectivity from mainland to Azores  
 # As connectivity has arbitrary magnitude and units, we will   
 # multiply connectivity values 1000 times to avoid decimal zeros.  
   
 conn_azores <- conn_avg[,15]  
 conn_azores <- conn_azores *1000  
   
 # Finally, we can plot connectivity values relative to the width  
 # of the connections between mainland and Azores. You can power  
 # the values to increase the differences between them.  
   
 plot(r_mean$wind.speed, main= "Connectivity from mainland to Azores",
     col=acol(1000))  
 lines(getMap(resolution = "low"), lwd=1)  
 for (i in 1:14){  
  lines(rbind(loc[i,], loc[15,]), lwd = conn_azores[i]^3, col="red4")  
 }  
 points(loc, col="red", pch = 19, cex = 1.5)  
 text( loc+2, labels=c(round(conn_azores[1:14], 2),"Azores"),  
    col="white", cex= 1.5, font=2)

Last but not least, we can use the shortest paths stored for each day to plot a density heatmap. This can help us to visualize wind corridors across our study area (our wind highways!).

 # We'll use spatstat library to convert the shortest  
 # paths lines into an psp object.  
   
 library(spatstat)  
   
 paths_clean <- paths[!sapply(paths, is.null)]  
 paths_merged <- paths_clean[[1]]  
 for (h in 2:length(paths_clean)) {  
  paths_merged <- rbind(paths_merged,paths_clean[[h]])  
 }  
 paths_psp <- as(paths_merged, "psp")  
   
 # Now, we can obtain a kernel density of the paths  
 # and convert it into a raster layer that can be plotted.  
 # You can play with sigma value to modify the smoothing.  
   
 lines_kernel <- density(paths_psp, sigma=1, dimyx=c(350,410))  
 kernel <- raster(lines_kernel)  
 #kernel[kernel<(maxValue(kernel)*0.1)]<-NA  
   
 ext <- extent(r_mean$wind.speed)  
 kernel <- extend(kernel, ext)  
 kernel[is.na(kernel)] <- minValue(kernel)  
   
 acol <- colorRampPalette(c("grey40","darkblue","red2",
                               "orange","yellow","white"))  
 plot(kernel, col=acol(1000), zlim=c(-0.01,5),  
       main = "Least cost paths density", xlab="Longitude",  
       ylab="Lattitude")  
 lines(getMap(resolution = "high"), lwd=2)

Isn't it cool?

That's all for now!

lunes, 23 de julio de 2018

Antipredator behavior with R (or why wildebeest should stay together)

Hi there!

In 2010, when I was studying my Biology Degree at Universidad Complutense in Madrid, I fell in love with a documentary miniseries called Great Migrations (National Geographic). Their episodes talk about awesome migrations of animals around the globe. One of these chapters is “Great Migrations: Science Of Migrations”. It shows how scientists study the patterns and processes of animal migration. One of these researchers is John Fryxell, from Gelph University in Canada. John explains how mathematical models can help to understand movement patterns. Through simulations, he shows how wildebeests maintaining a clustered movement pattern can more effectively avoid to be hunted by a virtual lion (here is the Spanish version in YouTube). I have to confess that each time I see those images, I think that I would be completely fulfilled with this kind of jobs.

I modified a little bit my previous “go” function to reproduce the experiment shown by John. This is a small tribute to this documentary series and to all those people who inspire and motivate us and arouse our curiosity, as John.

You can find the code to perform these simulations with R below. To perform the gif of this post, you can follow this procedures. And if you want to be updated with my posts, find me on my new Twitter account @javi_ferlop

That is all, enjoy!

 # First, we prepare packages and raster layers to be used later. We also load  
 # the species class to define the hunter inital location and his environmental  
 # optimum (grass cover in this case), as in previous posts.  
   
 library (raster)  
 library(dismo)  
   
 gc <- raster(nrows = 100, ncols = 100, xmn = 0, xmx = 100, ymn = 0, ymx = 100)  
 gc[] <- runif(10000, -80, 180)  
 gc <- focal(gc, w = matrix(1, 5, 5), mean)  
 gc <- focal(gc, w = matrix(1, 5, 5), mean)  
   
 species <- setClass("species", slots = c(x = "numeric", y = "numeric",   
                      opt = "numeric"))  
   
 # Then we will load the hunt function. This is pretty similar to my previous  
 # "go" function. However, I added a "preys" attribute, a set of locations  
 # of available preys. After each movement of the hunter, the function checks  
 # if the hunter is in the same cell as any prey. If yes, it records this  
 # location and delete this prey from the available preys. The output is a 4  
 # column dataframe: the first two are the location of the hunter at each  
 # iteration. The last two show if there is any capture at this iteration.   
   
 hunt <- function (sp, env, n, sigma, theta_x, alpha_x, theta_y, alpha_y, preys) {  
  track <- data.frame() #create an empty data.frame to store each step coordinates  
  track[1,1] <- sp@x  #the first position will be the initial position of the specimen defined by the species class  
  track[1,2] <- sp@y  #the first position will be the initial position of the specimen defined by the species class  
  hunted <- data.frame()#create an empty data.frame to store hunted preys  
    
    
  for (step in 2:n) {  
     
   #update preys locations   
     
   prey_cells <- cellFromXY(env, as.matrix(preys + step))  
     
   #First, the function searches in the adjacent cells what have the value that best fits with the species requirements  
   lon_candidate<--9999  
   lat_candidate<--9999  
   #randomization/atraction routine  
   while ( is.na(extract(env, matrix(c(lon_candidate,lat_candidate),1,2)))) {  
    lon_candidate <- track[step-1,1]+ (sigma * rnorm(1)) + (alpha_x * ( theta_x - track[step-1,1]))  
    lat_candidate <- track[step-1,2]+ (sigma * rnorm(1)) + (alpha_y * ( theta_y - track[step-1,2]))  
   }  
     
   neig <- adjacent(env,   
            cellFromXY(env, matrix(c(lon_candidate,  
                        lat_candidate), 1,2)),   
            directions=8, pairs=FALSE )  
     
   options <- data.frame()  
   for (i in 1:length(neig)){  
    options[i,1]<-neig[i]  
    options[i,2]<- sp@opt - env[neig[i]]  
   }  
   option <- c(na.omit(options[abs(options$V2) == min(abs(na.omit(options$V2))), 1 ]),   
         na.omit(options[abs(options$V2) == min(abs(na.omit(options$V2))), 1 ]))  
     
   new_cell <- sample(option,1)  
   new_coords <- xyFromCell(env,new_cell)  
     
   track[step,1] <- new_coords[1]  
   track[step,2] <- new_coords[2]  
     
   if (new_cell %in% prey_cells) {  
    hunted[step,1] <- new_coords[1]  
    hunted[step,2] <- new_coords[2]  
    preys <- preys[-match(new_cell,prey_cells),]  
   }  
   else {  
    hunted[step,1] <- NA  
    hunted[step,2] <- NA  
   }  
  }  
  return(cbind(track,hunted))  
 }  
   
 # Later we define our predator, in this case, a lion! The inital possition will  
 # be at the middle of the space (50,50). We will also select the environmental  
 # optimum of our lion, 60% of grass cover (our lion hates bare ground xD)  
   
 lion <- species(x= 50, y =50, opt= 60)  
   
 # Now, we will set the locations of a bunch of wildebeests using "randomPoints"   
 # from "dismo" package. As our wildebeests will be running around, we will   
 # initially locate them at the bottom left.  
   
 r <- raster(nrows=500, ncols=500, xmn=-500, xmx=0, ymn=-500,ymx=0)  
 wb <- randomPoints(r, 10000)  
   
 # Here we have our wildebeests ready to run!  
   
 plot(wb)  
   
 # Let start the hunting!   
   
 hunting <- hunt(lion, gc, 600, 2, 0, 0, 0, 0, wb)  
   
 # In "hunting" object is stored each location of our lion during the 600   
 # iterations, as well as each time she catches a wildebeest. To see only the   
 # catches, you can use:  
   
 na.omit(hunting)  
   
 # Pretty awesome, right? But, what happens if we group our wildebeests in   
 # clusters? To do it, we will define another small function that creates   
 # clusters of points:  
   
 clus <- function(n,m1,m2){  
  df<-data.frame()  
  for (i in 1:n){  
     
   a <- rnorm(100,m1[i],1)  
   b <- rnorm(100,m2[i],1)  
   df <- rbind(df,data.frame(a,b))  
  }  
    
  return(df)  
 }  
   
 wb_clustered <- clus(100, runif(100,-500,0),runif(100,-500,0))  
   
 # Look at the wildebeests locations now!  
   
 plot(wb_clustered)  
   
 # Ok, let's go again!  
   
 hunting_clustered <- hunt(lion, gc, 600, 2, 0, 0, 0, 0, wb_clustered)  
   
 # Let's chek how many catches he have now:  
   
 na.omit(hunting_clustered)  
   
 # And comparing both strategies...  
   
 nrow(na.omit(hunting))  
 nrow(na.omit(hunting_clustered))  
   
 # ...looks like wildebeests should stay together!!  
 # You can run these simulations n times to see if the results are robust.  
   
 # Finally we can print each iteration to make a cool gif ;)  
   
 for (i in 1:600) {  
  id <- sprintf("%03d", i)  
  png(paste("lion1_",id,".png", sep=""), width=1300, height=690, units="px",  
    pointsize = 16)  
    
  par(mfrow=c(1,2))  
    
  plot(gc, xlim = c(6,94), ylim = c(6,94), main="random", legend=FALSE)  
  points(hunting[i,1:2], cex=1.5, col="red", pch = 0, lwd =3)  
  points(hunting[1:i,3:4], cex=1, col="blue", pch = 3, lwd =3)  
  points (wb +i, pch = 20)  
    
  plot(gc, xlim = c(6,94), ylim = c(6,94), main="clustered", legend=FALSE)  
  points(hunting_clustered[i,1:2], cex=1.5, col="red", pch = 0, lwd =3)  
  points(hunting_clustered[1:i,3:4], cex=1, col="blue", pch = 3, lwd =3)  
  points (wb_clustered +i, pch = 20)  
    
  dev.off()  
 }

miércoles, 4 de julio de 2018

New rWind release on CRAN! (v1.0.2)

Hi there!

Just a few lines to inform you about the new release of rWind R package (v1.0.2). This version have several new features. Here you have an example of one of them, the function wind.dl_2 to download time series of wind data. In this example we create a bunch of PNG files to be converted later into a GIF of wind speed of east Asia region.

Enjoy!

 # We will create a gif of wind speed for east Asia during the end of June (2018)  
   
 # First, we should load the packages that we will use. Use install.packages() to   
 # download and install the new version of rWind (v1.0.2)  
   
 install.package("rWind")  
   
 library(rWind)  
   
 library(fields)  
 library(shape)  
 library(rworldmap)  
 library(lubridate)  
   
 # Now, we use lubridate package to create a sequence of dates/times (each three  
 # hours)  
   
 dt <- seq(ymd_hms(paste(2018,6,25,00,00,00, sep="-")),  
      ymd_hms(paste(2018,7,4,21,00,00, sep="-")),by="3 hours")  
   
 # Now, we use the new function wind.dl_2 to download the whole time series of  
 # wind data. We use the "dt" object created with lubridate to provide the input   
 # to wind.dl_2. Since it's a large area and many days, it could take a while...  
   
 wind_series <- wind.dl_2(dt,90,150,5,40)  
   
 # Next, we can use wind2raster function from rWind package directly over the   
 # output from wind.dl_2, it has been adapted to work with this lists of wind  
 # data.  
   
 wind_series_layer <- wind2raster(wind_series)  
   
 # Finally, we will create a bunch of PNG files to be converted later in a GIF  
   
 id<-0  
 for (i in 1:72) {  
  id <- sprintf("%03d", i)  
  png(paste("asia",id,".png", sep=""), width=1000, height=600, units="px",  
    pointsize=18)  
  image.plot(wind_series_layer[[i]]$wind.speed, col=bpy.colors(1000),  
        zlim=c(0,18), main =wind_series[[i]]$time[1])  
  lines(getMap(resolution = "low"), lwd=3)  
  dev.off()  
 }

martes, 1 de mayo de 2018

Simulating animal movements and habitat use

Hi there!

I was training some ways to simulate animal (or other organisms) movements having into account habitat suitability. To do this, I used my previous eWalk model as the underlying process to simulate random or directional walks. This model is based on Brownian / Ornstein–Uhlenbeck process. You can find more about eWalk model here!

Today, I will add one more element to this movement simulations. In this case, we will have into account the habitat or environmental preferences of the simulated species, to perform a simulation like this:

First, we will create a raster layer as a random environmental variable, for example tree cover.

   
 library (raster)  
 library (dismo)  
   
 tc <- raster(nrows=100, ncols=100, xmn=0, xmx=100, ymn=0,ymx=100)  
 tc[] <- runif(10000, -80, 180)  
 tc <- focal(tc, w=matrix(1, 5, 5), mean)  
 tc <- focal(tc, w=matrix(1, 5, 5), mean)  
 plot(tc)

Second, we will define the species class. The species will be defined by their position (coordinates), and their optimum for the environmental variable.

   
 species <- setClass("species", slots=c(x="numeric", y="numeric", opt="numeric"))

Here we will define the Red deer species as a specimen in the coordinates (50,50) and an optimum of 80 for the environmental variable (tree cover). In the same way, we will define the Egyptian mongoose as a specimen in the coordinates (50,50) and an optimum of 30 for the tree cover variable.

   
 Red_deer <- species(x= 50, y =50, opt= 90)  
 Egyptian_mongoose <- species(x= 50, y =50, opt= 30)

Now, we will load the "go" function (I do not have a name for it yet). It require a species (sp), a raster layer with any environmental variable (env), number of iterations (n), a Brownian motion parameter (that is, how random is the movement of your species), a geographical optimum (the wanted destination of your species theta_x and theta_y), and the attraction strength or "interest" of the species to get this position (alpha_x and alpha_y). The syntaxis should be something like this:

   
 path <- go (sp, env, n, sigma, theta_x, alpha_x, theta_y, alpha_y)

Here is the function to load (I will comment the function in a future post):

   
 go <- function (sp, env, n, sigma, theta_x, alpha_x, theta_y, alpha_y) {  
  track <- data.frame()  
  track[1,1] <- sp@x  
  track[1,2] <- sp@y  
  for (step in 2:n) {  
   neig <- adjacent(env,   
            cellFromXY(env, matrix(c(track[step-1,1],  
                        track[step-1,2]), 1,2)),   
            directions=8, pairs=FALSE )  
   options <- data.frame()  
   for (i in 1:length(neig)){  
    options[i,1]<-neig[i]  
    options[i,2]<- sp@opt - env[neig[i]]  
   }  
   option <- c(options[abs(na.omit(options$V2)) == min(abs(na.omit(options$V2))), 1 ],   
         options[abs(na.omit(options$V2)) == min(abs(na.omit(options$V2))), 1 ])  
   new_cell <- sample(option,1)  
   new_coords <- xyFromCell(env,new_cell)  
   lon_candidate<--9999  
   lat_candidate<--9999  
     
   while ( is.na(extract(env, matrix(c(lon_candidate,lat_candidate),1,2)))) {  
    lon_candidate <- new_coords[1]+ (sigma * rnorm(1)) + (alpha_x * ( theta_x - new_coords[1]))  
    lat_candidate <- new_coords[2]+ (sigma * rnorm(1)) + (alpha_y * ( theta_y - new_coords[2]))  
   }  
   track[step,1] <- lon_candidate  
   track[step,2] <- lat_candidate  
  }  
  return(track)  
 }

Well, now we can perform a simple experiment with our two specimens. We will simulate random movement of these two species having into account their environmental optimums. The "go" function will return us the track or the path followed by each specimen (coordinates by each step).

   
 deer_simul <- go (Red_deer, tc, 100, 2, 90, 0, 90, 0)  
 mongoose_simul <- go (Egyptian_mongoose, tc, 100, 2, 90, 0, 90, 0)

We can plot the paths...

   
 plot(tc)  
 lines(deer_simul, lwd=1.5, col="red")  
 points(deer_simul, cex=0.3, col="red")  
 lines(mongoose_simul, lwd=1.5, col="blue")  
 points(mongoose_simul, cex=0.3, col="blue")  
 legend("topleft", legend=c("deer","mongoose"), col=c("red","blue"),  
     lty=c(1,1), lwd=c(2,2))

To test if each species is actually "searching" their environmental optimum, we can extract the environmental values by step for each species and plot their density distributions.

   
 plot(density(extract(tc, deer_simul)),lwd=3, col="red", xlim=c(20,80),   
    ylim=c(0,max(c(density(extract(tc, deer_simul))$y,  
           density(extract(tc, mongoose_simul))$y))),  
    main="locations density distribution", xlab="tree cover")  
 lines(density(extract(tc, mongoose_simul)),lwd=3, col="blue")  
 legend("topleft", legend=c("deer","mongoose"), col=c("red","blue"),  
     lty=c(1,1), lwd=c(3,3))

So, we can see that the deer is actually using patches with a higher value of tree cover than the mongoose... our simulation worked! You can use the code of this post to perform a GIF like the one above.

That's all I have to say about this for now... In the next posts we will simulate more animal movements and migrations!

PD: Let me show you a nice song about the mongoose...

miércoles, 14 de marzo de 2018

Nice ggplot with sad data: something happens with women in science

Last March 8, millions of women in more than 170 countries around the world joined street protests calling for "a society free of sexist oppression, exploitation and violence". Spanish strike was one of the most numerous, where around 5.3 million women joined the strike. Today, I will use some data (download) regarding the scientific career in Spain (2016) to make a plot using ggplot2 and discuss about it. Here you have the graph and the code:

 library(ggplot2)  
 library(scales)  
 dat <- read.csv("wis.csv")  
 dat$pos <-  
  factor(dat$pos, levels = c("pre", "post", "ryc", "ct", "ic" , "pi"))  
 ggplot(dat, aes(x = pos, y = per)) +  
  geom_line(size = 3, aes(group = sex, color = sex)) +  
  geom_point() +  
  geom_text(aes(label = round(per * 100, 1)), vjust = -1, size = 4) +  
  labs(  
   title = "Women in Science \n Spanish National Research Council",  
   x = "research stage",  
   y = "sex ratio",  
   color = " \n",  
   subtitle = "(2016)"  
  ) +  
  scale_color_manual(labels = c("women", "men"),  
            values = c("purple", "orange")) +  
  scale_x_discrete(  
   breaks = c("pre", "post", "ryc", "ct", "ic" , "pi"),  
   labels = c(  
    "Predoctoral",  
    "Postdoctoral",  
    "Ramón y Cajal",  
    "Científico\nTitular",  
    "Investigador\nCientífico",  
    "Profesor de\nInvestigación"  
   )  
  ) +  
  scale_y_continuous(labels = percent) +  
  theme(  
   text = element_text(size = 15),  
   legend.text = element_text(size = 15),  
   axis.text.x = element_text(face = "bold", size = 11, angle = 30),  
   axis.title = element_text(size = 14, face = "bold"),  
   plot.title = element_text(hjust = 0.5, face = "bold", size = 20),  
   plot.subtitle = element_text(hjust = 0.5, size = 19)  
  )

In X axis, we can see the stages at the Spanish National Research Council (known in Spanish as CSIC) positions, from young PhD candidates (left - “Predoctoral”) to the highest level, Research Professor (right - “Profesor de Investigación”). It’s easy to understand: in Spain, there are more women starting a scientific career than men, but only few of them get to the highest positions in CSIC.

We usually call this the “scissors graph”, since the two curves soon cross each other changing the trend shown at the beginning. For the last stages this imbalance between males and females can be due to the age (mean age of Research Professor in CSIC is around 58 years old, and things have changed a lot in Spain since 30 years ago). However, the graph shows that for the earlier stages this imbalance is still being a problem: just from PhD candidate stage to first postdoctoral position, women lose around 15% positions in favor of men. And this is not only happening in Spain. Only active politics and special measures orientated to these first stages could change the trend for the highest positions in the future!

If you want to know a little bit more about women in science in Spain, here you have a poster made by some friends from the Ecology Department in Universidad de Alcalá.