Facet ggplots While Preserving Unique Order

Data

This document goes through the step by step of how ggplots, faceted by race, were able to preserve the order of a specific measure, uniqely.

For reference here is what I’m talking about: http://10.64.183.96:3939/LB_explore_prmcScore/.

Let’s continue…

First, we need some data. For these graphs, I used the theRatesByCounty_df. This dataframe includes the age-adjusted incidence rates, counts, and percent late stage at diagnosis for every combination of geography, race/ethnicity (NHW, NHB, Hispanic, all), and lifestage (youth, adult, senior, all).

Table 1: theRatesByCounty_df
crudeRate adjRate lci uci site siteNum place_num sex race count ratePop pop lifeStage pLateStage place
2124.22 2130.03 2055.19 2207.01 All 0 1 P All_Races 3104 29225 31895 Senior 0.4391108 Alachua County
336.99 357.99 344.19 372.27 All 0 1 P All_Races 2733 162199 161909 Adult 0.4548116 Alachua County
19.13 18.45 13.87 24.20 All 0 1 P All_Races 58 60652 62777 Youth 0.5689655 Alachua County
467.72 484.55 471.98 497.40 All 0 1 P All_Races 5895 252076 256581 All 0.4476675 Alachua County
1226.04 1252.24 974.98 1592.43 All 0 1 P Hispanic 71 1158 1326 Senior 0.4788732 Alachua County
125.21 207.45 165.72 256.65 All 0 1 P Hispanic 90 14376 14416 Adult 0.4777778 Alachua County
NaN NaN NaN NaN All 0 1 P Hispanic NaN 6486 7337 Youth NaN Alachua County
152.58 285.88 242.11 336.78 All 0 1 P Hispanic 168 22021 23079 All 0.4761905 Alachua County
2035.59 2067.90 1872.33 2279.43 All 0 1 P NHB 421 4136 4529 Senior 0.4608076 Alachua County
349.59 367.12 335.99 400.73 All 0 1 P NHB 531 30379 30667 Adult 0.5216573 Alachua County

Cool. 😎

library(tidyverse)

# I only want to look at Sylvester catchment area
sylRates <- theRatesByCounty_df %>% 
   filter(place_num == 888)

# I don't care about a particular lifestage nor do I care about all cancers as a
# whole or other cancers aside from the 19 we've marked as important
prevalence <- sylRates %>% 
  filter(lifeStage == "All", !site %in% c("All","Other")) %>%
  #also, for the sake of the graphics, I want the percentage to be pretty
  mutate(pLateStage = round(pLateStage*100,1)) 

It Started With a Plot…

I wanted to start off by showing the counts for the 19 different cancers for all races

f <- prevalence %>%  
  filter(sex == "P", race == "All_Races") %>% #sex == "people"
  ggplot(aes(x = site, y = count))

f + geom_bar(position = "dodge", stat = "identity", fill = "#2ECC71") +
  coord_flip() +
  geom_text(aes(label = count), hjust = -0.4, color = "black",
            position = position_dodge(width = 1), size = 2) +
  labs(x = "Site",
       y = "Count") +
  ggtitle("Count of 19 Cancers by Site") +
  theme_light() 

That’s cool but, I want to order this by most frequent to less frequent so that means I need to make site a factor and because there are 4 different race/ethnicity categories, I also need to make sure each race has it’s own order. Therefore, I need to group_by() race and arrange() by race and count.

all_sexes <- prevalence %>% 
  filter(sex == "P") %>%   # sex == "people"
  group_by(race) %>% 
  arrange(race, count) %>% 
  mutate(site = factor(site, levels = unique(site)))

#let's try making a plot for just "All_Races"
f <- filter(all_sexes, race == "All_Races") %>% 
  ggplot(aes(x = site, y = count))

f + geom_bar(position = "dodge", stat = "identity", fill = "#2ECC71") +
  coord_flip() +
  geom_text(aes(label = count), hjust = -0.4, color = "black",
            position = position_dodge(width = 1), size = 2) +
  labs(x = "Site",
       y = "Count") +
  ggtitle("Count of 19 Cancers by Site") +
  theme_light() 

Yippee!!

Now, I need to replicate this plot for not only all 19 cancers, but the top 5 cancers, the top 5 cancers by each race group, the cancers that are caused by viruses, and the cancers caused by viruses for each race group. Not only do I want to look at the counts, but I also want these plots for the percent late stage at diagnosis and for the age-adjusted incidence rates…oh, $#!@.

That’s 15 different plots and if you know me, there’s no way in hell I am copying and pasting this crap 15 times…

Red rover, red rover, please send a function right over :)

This didn’t seem terribly intimidating at first, the only arguments I needed was which measure I wanted to plot: count, adjusted rate, or percent late stage and if I wanted to filter the data on just the cancers that were caused by viruses.

The reason why I say terribly is because I had an inkling there would probably be some NSE involved here because I would need to pipe my data to arrange() on the measure argument, follwed by a ggplot. All tidyverse functions that take unquoted arguments. So, I thought “ok, first I need turn my argument to a quosure, and then unquote when I call it in either my dplyr verbs or ggplot geoms”. Here’s what that looked like:

library(stringr)

makeGraphs <- function(measure, virus = FALSE) {
  
  measureText <- str_to_sentence(measure)
  quo_measure <- enquo(measure)
  
  all_sexes <- prevalence %>% 
    filter(sex == "P") %>% 
    group_by(race) %>% 
    arrange(race, !!quo_measure) %>%
    mutate(site= factor(site, levels = unique(site), ordered = TRUE))
  
  
  if(virus == FALSE){
    f <- filter(all_sexes, race == "All_Races") %>% 
      ggplot(aes(x = site, y = !!quo_measure))
    
    allcancers <- f + geom_bar(position = "dodge", stat = "identity", fill = "#2ECC71") +
      coord_flip() +
      geom_text(aes(label = !!quo_measure), hjust = -0.4, color = "black",
                position = position_dodge(width = 1), size = 2) +
      labs(x = "Site",
           y = measureText) +
      ggtitle(paste0(measureText," of 19 Cancers by Site ")) +
      theme_light() 
    
  }
}

This did NOT work.

Sigh…after a couple hours of falling down the rabbit hole of Stack Overflow and the tidyeval vignette, I gave up on this option. Fortunately, those hours were not spent in vain. I did stumble accross a potential solution where I use a bit of tidy (thanks to rlang version >= 0.4) and base, as such:

library(stringr)

makeGraphs <- function(measure, virus = FALSE) {
  
  measureText <- str_to_sentence(measure)
  
  all_sexes <- prevalence %>% 
    filter(sex == "P") %>% 
    group_by(race) %>% 
    arrange(race, .data[[measure]]) %>%  # .data is now recognized as a way to refer to the parent data frame, so reference by string works - like in base
    ungroup() %>% 
    mutate(site= factor(site, levels = unique(site), ordered = TRUE))
  
  
  if(virus == FALSE){
    f <- filter(all_sexes, race == "All_Races") %>% 
      ggplot(aes_string(x = "site", y = measure)) 
    
    f + geom_bar(position = "dodge", stat = "identity", fill = "#2ECC71") +
      coord_flip() +
      geom_text(aes_string(label = measure), hjust = -0.4, color = "black",
                position = position_dodge(width = 1), size = 2) +
      labs(x = "Site",
           y = measureText) +
      ggtitle(paste0(measureText," of 19 Cancers by Site ")) +
      theme_light() 
    
  }
}

makeGraphs("count")

WOOP!…progress. Now to turn this into a full function that will produce and save all 15 graphics!

library(stringr)

makeGraphs <- function(measure, virus = FALSE) {
  
  measureText <- str_to_sentence(measure)
  
  all_sexes <- prevalence %>% 
    filter(sex == "P") %>% 
    group_by(race) %>% 
    arrange(race, .data[[measure]]) %>%  
    ungroup() %>% 
    mutate(site= factor(site, levels = unique(site), ordered = TRUE))
  
  
  if(virus == FALSE){
    f <- filter(all_sexes, race == "All_Races") %>% 
      ggplot(aes_string(x = "site", y = measure)) 
    
    allcancers <- f + geom_bar(position = "dodge", stat = "identity", fill = "#2ECC71") +
      coord_flip() +
      geom_text(aes_string(label = measure), hjust = -0.4, color = "black",
                position = position_dodge(width = 1), size = 2) +
      labs(x = "Site",
           y = measureText) +
      ggtitle(paste0(measureText," of 19 Cancers by Site ")) +
      theme_light() 
    
    # ggsave(paste0("./images/", measureText, "_allcancers.png"))
    
    # just breast, colorectal, lung, prostate, cervical
    
    all_sexes_five_cancer <- filter(all_sexes, site %in% c("Prostate", "Colorectal", "Breast", "Cervix", 
                                "Lung & Bronchus"))
    
    
    g <- filter(all_sexes_five_cancer, race == "All_Races") %>%
      ggplot(aes_string(x = "site", y = measure))
    
    topCancer <- g + geom_bar(position = "dodge", stat = "identity", fill = "#F39C12") +
      coord_flip() +
      geom_text(aes_string(label = measure), hjust = -0.4, color = "black",
                position = position_dodge(width = 1), size = 2) +
      labs(x = "Site",
           y = measureText) +
      ggtitle(paste0(measureText," of Most Prevalent Cancers by Site ")) +
      theme_light()
    
    # ggsave(paste0("./images/", measureText,"_topCancer.png"))
    
    # facet by race
    h <-  ggplot(all_sexes_five_cancer, aes_string(x = "site", y = measure))
    
    topCancerRace <- h + geom_bar(position = "dodge", stat = "identity", fill = "#9B59B6") +
      coord_flip() +
      geom_text(aes_string(label = measure), hjust = -0.1, color = "black",
                position = position_dodge(width = 1), size = 2) +
      labs(x = "Site",
           y = measureText) +
      ggtitle(paste0(measureText," of Most Prevalent Cancers by Site & Race")) +
      theme_light() +
      facet_wrap("race", scales = "free")
    
    # ggsave(paste0("./images/", measureText,"_topCancerRace.png"))
    
    list(allcancers, topCancer, topCancerRace)
    
  }else{
    # just cancers caused by viruses
    
    virus <- filter(all_sexes, site %in% c("Anal", "Non-Hodgkin Lymphoma", "Liver", 
                                   "Oral & Oropharynx", "Cervix", 
                                   "Stomach"))
    
    #just all_races
    
    i <- filter(virus, race == "All_Races") %>%
      ggplot(aes_string(x = "site", y = measure))
    
    virusCancer <- i + geom_bar(position = "dodge", stat = "identity", fill = "#F0B27A") +
      coord_flip() +
      geom_text(aes_string(label = measure), hjust = -0.4, color = "black",
                position = position_dodge(width = 1), size = 2) +
      labs(x = "Site",
           y = measureText) +
      ggtitle(paste0(measureText," of Cancers Caused by Viruses")) +
      theme_light()
    
    # ggsave(paste0("./images/", measureText,"_virusCancer.png"))
    
    # facet by race
    
    i <-  ggplot(virus, aes_string(x = "site", y = measure))
    
    virusCancerRace <- i + geom_bar(position = "dodge", stat = "identity", fill = "#9B59B6") +
      coord_flip() +
      geom_text(aes_string(label = measure), hjust = -0.1, color = "black",
                position = position_dodge(width = 1), size = 2) +
      labs(x = "Site",
           y = measureText) +
      ggtitle(paste0(measureText," of Cancers Caused by Viruses by Race")) +
      theme_light() +
      facet_wrap("race", scales = "free")
    
    # ggsave(paste0("./images/", measureText,"_virusCancerRace.png"))
    
    list(virusCancer, virusCancerRace)
  }
}

makeGraphs("count")

Now, this made me excited at first until I noticed that when I facted by race, the order of the levels for site remained fixed, regardless of race. The order for the “All_Races” group was getting recycled. This is where the headache, more cursing, and thoughts of running away and never coming back started to creep in.

How can I facet on race yet still preserve the order of site per each race group???

In the Spongebob narrator voice * A few hours later…. I gave up and went home.

Then in the middle of the night I awoke with an idea…as most annoying problems tend to do. Faceting fixes the y axis so I can’t use this. I will need to create 4 individual plots and join them into 1 for easier comparison.

I was advised to try par() or layout() but then learned that these functions are not compatible with ggpolot - only plot functions.

Again, there’s no way I’m going to copy and paste each ggplot code 4 times - one that filters for each race group…I feel another function coming on.

Better yet, map() to the rescue!

What if I listed the race categories and passed that to a map which contained a function that filtered the data by the race and drew the plot? So let’s try this one mo’gain (don’t judge, I’m a Georgia girl, born and bred).

library(ggpubr) # for arranging the individual race plots to one

makeGraphs <- function(measure, virus = FALSE) {
  
  measureText <- str_to_sentence(measure)
  
  all_sexes <- prevalence %>% 
    filter(sex == "P") %>% 
    group_by(race) %>% 
    arrange(race, .data[[measure]]) %>%  
    ungroup() %>% 
    mutate(site= factor(site, levels = unique(site), ordered = TRUE))
  
  
  if(virus == FALSE){
    f <-  filter(all_sexes, race == "All_Races") %>% 
      ggplot(aes_string(x = "site", y = measure))
    
    allcancers <- f + geom_bar(position = "dodge", stat = "identity", fill = "#2ECC71") +
      coord_flip() +
      geom_text(aes_string(label = measure), hjust = -0.4, color = "black",
                position = position_dodge(width = 1), size = 2) +
      labs(x = "Site",
           y = measureText) +
      ggtitle(paste0(measureText," of 19 Cancers by Site ")) +
      theme_light() 
    
    # ggsave(paste0("./images/", measureText, "_allcancers.png"))
    
    # just breast, colorectal, lung, prostate, cervical
    
    all_sexes_five_cancer <- filter(all_sexes, site %in% c("Prostate", "Colorectal", "Breast", "Cervix", 
                                "Lung & Bronchus"))
    
    
    g <-  filter(all_sexes_five_cancer, race == "All_Races") %>% 
      ggplot(aes_string(x = "site", y = measure))
    
    topCancer <- g + geom_bar(position = "dodge", stat = "identity", fill = "#F39C12") +
      coord_flip() +
      geom_text(aes_string(label = measure), hjust = -0.4, color = "black",
                position = position_dodge(width = 1), size = 2) +
      labs(x = "Site",
           y = measureText) +
      ggtitle(paste0(measureText," of Most Prevalent Cancers by Site ")) +
      theme_light()
    
    # ggsave(paste0("./images/", measureText,"_topCancer.png"))
    
    # facet by race
    races <- list("All_Races", "NHW", "NHB", "Hispanic")
    
    raceGraphs <- map(races, function(x){
      h <-  all_sexes_five_cancer %>% 
        filter(race == x) %>% 
        arrange(race, .data[[measure]]) %>%  
        mutate(site= factor(site, levels = unique(site), ordered = TRUE)) %>% 
        ggplot(aes_string(x = "site", y = measure))
      
      h + geom_bar(position = "dodge", stat = "identity", fill = "#9B59B6") +
        coord_flip() +
        geom_text(aes_string(label = measure), hjust = -0.1, color = "black",
                  position = position_dodge(width = 1), size = 2) +
        labs(x = "Site",
             y = measureText) +
        ggtitle(x) +
        theme_light()+
        theme(plot.title = element_text(size = 10, color = "#B03A2E", hjust = 1),
              plot.margin = margin(0,0,0,0))
      
    })
    
    h <- ggarrange(
      raceGraphs[[1]] + rremove("x.title"),
      raceGraphs[[2]] + rremove("y.title") + rremove("x.title"),
      raceGraphs[[3]],
      raceGraphs[[4]] + rremove("y.title"),
      ncol = 2, nrow = 2
    )
    
    topCancerRace <- annotate_figure(h, fig.lab = paste0(measureText," of Most Prevalent Cancers"),
                                     fig.lab.face = "bold")
    
    # ggsave(paste0("./images/", measureText,"_topCancerRace.png"))
    
    list(allcancers, topCancer, topCancerRace)
  }else{
    # just cancers caused by viruses
    
    virus <- filter(all_sexes, site %in% c("Anal", "Non-Hodgkin Lymphoma", "Liver", 
                                   "Oral & Oropharynx", "Cervix", 
                                   "Stomach"))
    
    
    i <-  filter(virus, race == "All_Races") %>% 
      ggplot(aes_string(x = "site", y = measure))
    
    virusCancer <- i + geom_bar(position = "dodge", stat = "identity", fill = "#F0B27A") +
      coord_flip() +
      geom_text(aes_string(label = measure), hjust = -0.4, color = "black",
                position = position_dodge(width = 1), size = 2) +
      labs(x = "Site",
           y = measureText) +
      ggtitle(paste0(measureText," of Cancers Caused by Viruses by Site")) +
      theme_light()
    
    # ggsave(paste0("./images/", measureText,"_virusCancer.png"))
    
    # facet by race
    races <- list("All_Races", "NHW", "NHB", "Hispanic")
    
    raceGraphs <- map(races, function(x){
      j <-  virus %>% 
        filter(race == x) %>% 
        arrange(race, .data[[measure]]) %>%  
        mutate(site= factor(site, levels = unique(site), ordered = TRUE)) %>% 
        ggplot(aes_string(x = "site", y = measure))
      
      j + geom_bar(position = "dodge", stat = "identity", fill = "#BB8FCE") +
        coord_flip() +
        geom_text(aes_string(label = measure), hjust = -0.1, color = "black",
                  position = position_dodge(width = 1), size = 2) +
        labs(x = "Site",
             y = measureText) +
        ggtitle(x) +
        theme_light()+
        theme(plot.title = element_text(size = 10, color = "#E74C3C", hjust = 1),
              plot.margin = margin(0,0,0,0))
    })
    
    j <- ggarrange(
      raceGraphs[[1]] + rremove("x.title"),
      raceGraphs[[2]] + rremove("y.title") + rremove("x.title"),
      raceGraphs[[3]],
      raceGraphs[[4]] + rremove("y.title"),
      ncol = 2, nrow = 2
    )
    
    virusCancerRace <- annotate_figure(j, fig.lab = paste0(measureText," of Cancers Caused by Viruses by Site"),
                                       fig.lab.face = "bold")
    
    # ggsave(paste0("./images/", measureText,"_virusCancerRace.png"))
    
    list(virusCancer, virusCancerRace)
  }
  
  
}

makeGraphs("count")

makeGraphs("adjRate", virus = TRUE)

YAASSSS!!!…purrring in contentment ;)

*Note: I’ve commented out the ggsaves for the sake of making sure I don’t screw up what I already have

I hope what went through my head effectively made it through my fingers, onto the keyboard, and on the screen into this document in an intelligible way.