[skimr] use a string vector to select which columns to skim

I want to use skim inside a function. The function accepts a string vector which identifies which columns to skim. Example:

library(skimr)
n <- 10
foo <- data.frame(pin = rnorm(n), deltap = rgamma(n, 1), t = runif(n), name = letters[1:n])
selected_columns <- c("pin", "deltap")

I want to apply skim to pin and deltap only, but their names are passed as strings. How can I solve this? This is an example of the function I’d like to use:

skim_on_selected <- function(x, selected_columns){
  skim(x, selected_columns)
}

Of course this is not going to work. Also, I’d prefer to use ..., like in

skim_on_selected <- function(..., selected_columns){
  skim(..., selected_columns)
}

But I can accept a solution where the number of parameters in input to skim is fixed.

NOTE: I guess that in a sense this is a question more related to tidy evaluation, than specifically to skimr, but hopefully you can help me anyway :slight_smile:

ping @elinw @michaelquinn32 thoughts?

Why not this?

a <- c("Sepal.Length", "Petal.Length")
skimr::skim(iris[a])

Why not, indeed :grin: sometimes we need others to help us see things that are right in front of us. Thank you @elinw!

Hi Andrea!

Sorry for the delay.

You can also use rlang to select the columns that you want.

library(skimr)
library(rlang)
cols <- c("Sepal.Width", "Petal.Length")
skim(iris, !!!cols)
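
As a possible alternative to splicing, here is a minimal sketch that wraps the string vector in all_of(). It assumes skimr 2.x (where the result has a skim_variable column) and a dplyr version that re-exports all_of() from tidyselect; neither assumption comes from this thread. It reuses the foo data frame from the question.

```r
library(skimr)

# Example data from the original question
n <- 10
foo <- data.frame(pin = rnorm(n), deltap = rgamma(n, 1), t = runif(n), name = letters[1:n])
selected_columns <- c("pin", "deltap")

# all_of() tells tidyselect to treat the character vector as column names
# (assumption: skim() forwards its ... to tidyselect, as with the select helpers)
res <- skim(foo, dplyr::all_of(selected_columns))

# Names of the columns that were actually skimmed (skimr 2.x column name)
res$skim_variable
```

all_of() should raise an error if any of the names is not a column of the data, which may be useful when the vector comes from user input.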

You might also consider writing your function to pass along a ... argument, which would then play nicely with all of dplyr’s select helpers (which we support).

my_skim <- function(data, ...) {
  skim(data, ...)
}

my_skim(iris, dplyr::starts_with("Sepal"))

I hope that helps. Please let me know if there is more that I can do.


@michaelquinn32 this is even better! I would rather pass a my_skim function to my_fun, the function that uses it, than modify my calling function to pass a “reduced” data frame to skim, because I then pass my_fun to sapply and apply it to a list of files. I would mark your answer as the solution, but I cannot see a tick-box below it.

On second thoughts, I can do the same with @elinw’s answer (create a my_skim function to pass around):

my_skim <- function(data, ...) {
  skim(data[selected_columns], ...)
}
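
The wrapper above reads selected_columns from the enclosing environment. A variation that takes the columns as an explicit argument might look like the sketch below. The name skim_on_selected comes from the original question; the missing-column check, the datasets list and the lapply call are illustrative assumptions, and the skim_variable column assumes skimr 2.x.

```r
library(skimr)

# Skim only the columns named in a character vector.
# Extra arguments in ... are passed on to skim().
skim_on_selected <- function(data, selected_columns, ...) {
  # Fail early with a readable message if a requested column is absent
  missing_cols <- setdiff(selected_columns, names(data))
  if (length(missing_cols) > 0) {
    stop("Columns not found: ", paste(missing_cols, collapse = ", "))
  }
  skim(data[selected_columns], ...)
}

# Example data from the original question
n <- 10
foo <- data.frame(pin = rnorm(n), deltap = rgamma(n, 1), t = runif(n), name = letters[1:n])

# Hypothetical stand-in for "a list of files" already read into data frames
datasets <- list(first = foo, second = foo)
results <- lapply(datasets, skim_on_selected, selected_columns = c("pin", "deltap"))
```

Because the column names are an ordinary argument, the same function can be handed to lapply or sapply without relying on a global variable.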

Good to have options, anyway! Thanks to both for helping!