I’d like to remove a lot of skimmers and add a
percentage_missing, which, as the name implies, is just the ratio between
the number of missing values and n. However, this doesn’t work:
library(skimr)
#> Warning: package 'skimr' was built under R version 3.4.4
skim_with(numeric = list(missing = NULL, complete = NULL,
                         p0 = NULL, p25 = NULL, p75 = NULL, p100 = NULL,
                         hist = NULL,
                         percent_missing = n_missing/length))
#> Error in n_missing/length: non-numeric argument to binary operator
Can you help me?
PS I already asked a similar question some time ago (well, I opened an issue, actually, but I think it’s more appropriate to ask here). However, I’m facing a similar problem again. Maybe I’m just dense (or not familiar enough with
purrr), but I think that adding a few more examples to the documentation on how to define new skimmers would definitely help.
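To use the
percent_missing function as you’re intending, it needs to be defined as an anonymous function. As written, n_missing/length tries to divide one function object by another, which is what triggers the error. The easiest way to fix that in skimr is to use the tilde (formula) syntax from
purrr, along with the .x pronoun.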
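To make the cause of the error concrete, here is a minimal base R sketch. `count_missing` is a hypothetical stand-in I'm defining here so the snippet runs without skimr loaded; it is not part of the package.

```r
# Hypothetical stand-in for a missing-value counter, so the sketch runs without skimr
count_missing <- function(x) sum(is.na(x))

# Dividing one function object by another fails before any data is involved
division_error <- tryCatch(
  count_missing / length,
  error = function(e) conditionMessage(e)
)

# Wrapping the computation in an anonymous function defers it until data arrives
percent_missing <- function(x) count_missing(x) / length(x)
ratio <- percent_missing(c(1, NA, 3, NA))
```

Here `division_error` holds the same message as in the question, while `percent_missing` only does the division once it receives a vector. The tilde form `~ count_missing(.x) / length(.x)` is shorthand for the same kind of function.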
Also, it appears that you might want to take advantage of the
append = FALSE argument in
skim_with. This will drop every function except those that you explicitly define. It might save you some typing.
So, for example:
skim_with(numeric = list(percent_missing = ~n_missing(.x) / length(.x)),
          append = FALSE)
skim(iris)
#> Skim summary statistics
#> n obs: 150
#> n variables: 5
#> Variable type: numeric
#> variable percent_missing
#> Petal.Length 0
#> Petal.Width 0
#> Sepal.Length 0
#> Sepal.Width 0
Alternatively, you could use the traditional anonymous function syntax within R.
skim_with(numeric = list(percent_missing = function(x) n_missing(x) / length(x)),
          append = FALSE)
Thanks @michaelquinn32! Your help with
skimr is always great. Good point about
append = FALSE too. I’m not dropping all existing skimmers (I still use
median, for example), but the ones I drop probably outnumber those I retain, so it makes more sense to explicitly define the list of skimmers I use than to drop all those I don’t.
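For reference, an explicit list like that could be sketched as follows. This is plain base R applied by hand to the numeric columns of iris, so it doesn't depend on skimr at all; the names `skimmers`, `numeric_cols` and `summary_table` are just illustrative, and the `na.rm = TRUE` is my own assumption about how missing values should be handled.

```r
# Explicit list of the summaries to keep; each entry is an ordinary function
skimmers <- list(
  median = function(x) median(x, na.rm = TRUE),
  percent_missing = function(x) sum(is.na(x)) / length(x)
)

# Keep only the numeric columns of the built-in iris data set
numeric_cols <- iris[vapply(iris, is.numeric, logical(1))]

# Apply every function to every numeric column: one row per variable,
# one column per summary
summary_table <- sapply(
  skimmers,
  function(f) vapply(numeric_cols, f, numeric(1))
)
```

In skimr, the same kind of named list would be what gets passed as numeric = ... to skim_with together with append = FALSE, as in the answer above.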