Managing data on a subset of a dataset using tidyverse

Tidyverse makes data management a lot easier, and I am grateful to the developers who made it. I am familiar with the basic dplyr functions like group_by, filter, select, and mutate. However, there are times when I want to modify only a subset of the data and leave the rest untouched.

Using the dplyr starwars dataset as an example, below are two tasks that I know how to handle using base R. What are the tidyverse equivalents?

library(dplyr, warn.conflicts = FALSE)
data(starwars)
dim(starwars)
#> [1] 87 14

In the starwars dataset, suppose that the data entry personnel made mistakes for two homeworlds.

Task 1: On the Tatooine homeworld, the hair color for all characters taller than 160 cm should be “blond”. It is easy to do this in base R, but what is the tidyverse equivalent?

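# which() drops rows where the comparison is NA; NA is not allowed as a row index in assignment
rows <- which(starwars$homeworld == "Tatooine" & starwars$height > 160)
starwars[rows, "hair_color"] <- "blond"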

Task 2: A bit more complicated. Suppose that for the Naboo homeworld every missing mass should have been recorded as 80 kg, and that I want to exclude the Gungan species on that homeworld from the dataset. I also want to count the number of rows used for the Naboo homeworld only. The code below does this, but because of the amount of data management involved, I need to assign an intermediate object.

# Make a subset for those from Naboo homeworld and delete from main dataset 
starwars.Naboo <- starwars %>% filter(homeworld == "Naboo")
starwars <- starwars %>% filter(homeworld != "Naboo")

# Manage the subset of starwars.Naboo 
starwars.Naboo <- starwars.Naboo %>% 
    filter(species != "Gungan") %>%
    mutate(mass = coalesce(mass, 80)) %>%
    mutate(num = n())

# Re-add starwars.Naboo. num should be missing for all other homeworlds. 
starwars2 <- bind_rows(starwars, starwars.Naboo)


# Check to ensure transformation works
starwars2 %>% 
  tail(n = 15) %>%
  print(width = Inf)
#> # A tibble: 15 x 15
#>    name            height  mass hair_color skin_color       eye_color    
#>    <chr>            <int> <dbl> <chr>      <chr>            <chr>        
#>  1 Ratts Tyerell       79    15 none       grey, blue       unknown      
#>  2 Wat Tambor         193    48 none       green, grey      unknown      
#>  3 San Hill           191    NA none       grey             gold         
#>  4 Shaak Ti           178    57 none       red, blue, white black        
#>  5 Grievous           216   159 none       brown, white     green, yellow
#>  6 Tarfful            234   136 brown      brown            blue         
#>  7 Raymus Antilles    188    79 brown      light            brown        
#>  8 Sly Moore          178    48 none       pale             white        
#>  9 Tion Medon         206    80 none       grey             black        
#> 10 R2-D2               96    32 <NA>       white, blue      red          
#> 11 Palpatine          170    75 grey       pale             yellow       
#> 12 Gregar Typho       185    85 black      dark             brown        
#> 13 Cordé              157    80 brown      light            brown        
#> 14 Dormé              165    80 brown      light            brown        
#> 15 Padmé Amidala      165    45 brown      light            brown        
#>    birth_year sex    gender    homeworld   species films     vehicles  starships
#>         <dbl> <chr>  <chr>     <chr>       <chr>   <list>    <list>    <list>   
#>  1         NA male   masculine Aleen Minor Aleena  <chr [1]> <chr [0]> <chr [0]>
#>  2         NA male   masculine Skako       Skakoan <chr [1]> <chr [0]> <chr [0]>
#>  3         NA male   masculine Muunilinst  Muun    <chr [1]> <chr [0]> <chr [0]>
#>  4         NA female feminine  Shili       Togruta <chr [2]> <chr [0]> <chr [0]>
#>  5         NA male   masculine Kalee       Kaleesh <chr [1]> <chr [1]> <chr [1]>
#>  6         NA male   masculine Kashyyyk    Wookiee <chr [1]> <chr [0]> <chr [0]>
#>  7         NA male   masculine Alderaan    Human   <chr [2]> <chr [0]> <chr [0]>
#>  8         NA <NA>   <NA>      Umbara      <NA>    <chr [2]> <chr [0]> <chr [0]>
#>  9         NA male   masculine Utapau      Pau'an  <chr [1]> <chr [0]> <chr [0]>
#> 10         33 none   masculine Naboo       Droid   <chr [7]> <chr [0]> <chr [0]>
#> 11         82 male   masculine Naboo       Human   <chr [5]> <chr [0]> <chr [0]>
#> 12         NA male   masculine Naboo       Human   <chr [1]> <chr [0]> <chr [1]>
#> 13         NA female feminine  Naboo       Human   <chr [1]> <chr [0]> <chr [0]>
#> 14         NA female feminine  Naboo       Human   <chr [1]> <chr [0]> <chr [0]>
#> 15         46 female feminine  Naboo       Human   <chr [3]> <chr [0]> <chr [3]>
#>      num
#>    <int>
#>  1    NA
#>  2    NA
#>  3    NA
#>  4    NA
#>  5    NA
#>  6    NA
#>  7    NA
#>  8    NA
#>  9    NA
#> 10     6
#> 11     6
#> 12     6
#> 13     6
#> 14     6
#> 15     6

Session info

xfun::session_info("dplyr")
#> R version 4.0.4 (2021-02-15)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19041)
#> 
#> Locale:
#>   LC_COLLATE=English_United States.1252 
#>   LC_CTYPE=English_United States.1252   
#>   LC_MONETARY=English_United States.1252
#>   LC_NUMERIC=C                          
#>   LC_TIME=English_United States.1252    
#> 
#> Package version:
#>   cli_2.5.0        crayon_1.4.1     dplyr_1.0.5      ellipsis_0.3.2  
#>   fansi_0.5.0      generics_0.1.0   glue_1.4.2       graphics_4.0.4  
#>   grDevices_4.0.4  lifecycle_1.0.0  magrittr_2.0.1   methods_4.0.4   
#>   pillar_1.6.1     pkgconfig_2.0.3  purrr_0.3.4      R6_2.5.0        
#>   rlang_0.4.11     stats_4.0.4      tibble_3.1.2     tidyselect_1.1.0
#>   utf8_1.2.1       utils_4.0.4      vctrs_0.3.8

EDIT 1: For Task 2, can I avoid assigning an intermediate object in tidyverse? Is there a more elegant way to write the code above?

It looks like you have already done Task 2 yourself. For Task 1, mutate() with if_else() does the job:

starwars %>%
  mutate(
    # missing = hair_color keeps the original value where the condition is NA
    hair_color = if_else(homeworld == "Tatooine" & height > 160,
                         "blond", hair_color, missing = hair_color)
  )
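A note on missing values, since starwars has NA entries in both homeworld and height: if_else() returns NA wherever its condition is NA, unless a value is supplied through its missing argument. The sketch below illustrates this on a small made-up tibble (toy, naive, and safe are hypothetical names used only for this illustration, not part of the question's data).

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data: row 3 has an unknown homeworld, row 4 an unknown height
toy <- tibble(
  homeworld  = c("Tatooine", "Tatooine", NA, "Naboo"),
  height     = c(172L, 150L, 180L, NA),
  hair_color = c("brown", "black", "red", "grey")
)

# Without `missing`, an NA condition (row 3: NA & TRUE) produces NA
naive <- toy %>%
  mutate(hair_color = if_else(homeworld == "Tatooine" & height > 160,
                              "blond", hair_color))

# With `missing = hair_color`, rows with an NA condition keep their old value
safe <- toy %>%
  mutate(hair_color = if_else(homeworld == "Tatooine" & height > 160,
                              "blond", hair_color, missing = hair_color))

naive$hair_color
#> [1] "blond" "black" NA      "grey"
safe$hair_color
#> [1] "blond" "black" "red"   "grey"
```

Row 4 is unaffected in both versions because FALSE & NA evaluates to FALSE, so only rows where the known part of the condition is TRUE can end up as NA.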

Your second task can be done in a single pipe:

starwars %>% 
  filter(homeworld == "Naboo", 
         species != "Gungan") %>%
  mutate(mass = coalesce(mass, 80),
         num = n()) %>% 
  bind_rows(starwars %>% filter(homeworld != "Naboo"))
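One caveat with the split-and-bind approach: filter(homeworld != "Naboo") and filter(species != "Gungan") also drop rows where homeworld or species is NA, and the Naboo rows move to the end of the table. If that is not what you want, a possible alternative is to stay in one table and make each step conditional. This is only a sketch under that assumption, shown on made-up data (toy, is_naboo, and result are hypothetical names); it deliberately behaves differently from the pipe above by keeping the NA rows and the original row order.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data with NA in homeworld, species, and mass
toy <- tibble(
  name      = c("a", "b", "c", "d", "e"),
  homeworld = c("Naboo", "Naboo", "Naboo", "Tatooine", NA),
  species   = c("Human", "Gungan", NA, "Human", "Droid"),
  mass      = c(NA, 66, 50, 77, NA)
)

result <- toy %>%
  # %in% never returns NA, so rows with unknown homeworld or species are kept
  filter(!(homeworld %in% "Naboo" & species %in% "Gungan")) %>%
  mutate(
    is_naboo = homeworld %in% "Naboo",
    # fill missing mass with 80 for Naboo rows only
    mass = if_else(is_naboo & is.na(mass), 80, mass),
    # row count for Naboo, NA for every other homeworld
    num = if_else(is_naboo, sum(is_naboo), NA_integer_)
  ) %>%
  select(-is_naboo)

result$mass
#> [1] 80 50 77 NA
result$num
#> [1]  2  2 NA NA
```

Whether rows with an unknown species on Naboo should count toward num is a judgment call about the data, so check which behavior matches your intent before swapping one version for the other.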