Hi,
I’d like to remove a lot of skimmers and add a percent_missing skimmer which, as the name implies, is just the ratio between n_missing (i.e., missing) and length (i.e., n). However, this doesn’t work:
```r
library(skimr)
#> Warning: package 'skimr' was built under R version 3.4.4
skim_with(numeric = list(missing = NULL, complete = NULL,
                         p0 = NULL, p25 = NULL, p75 = NULL, p100 = NULL,
                         hist = NULL,
                         percent_missing = n_missing/length))
#> Error in n_missing/length: non-numeric argument to binary operator
```
Can you help me?
PS: I already asked a similar question some time ago (well, actually I opened an issue, but I think it’s more appropriate to ask a question here). However, I’m facing a similar problem again. Maybe I’m just dense (or not familiar enough with purrr), but I think adding more examples of how to define new skimmers to the documentation would definitely help.
Hi Andrea!
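As written, n_missing/length tries to divide the function n_missing by the function length, which is why R complains about a non-numeric argument. To use percent_missing as you intend, it needs to be defined as an anonymous function. The easiest way to do that in skimr is to use the tilde syntax from purrr, along with the .x pronoun variable.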
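A quick base-R check (no skimr required) of why the original call fails: / is not defined for function objects, while wrapping the calls in an anonymous function defers the division until a vector is supplied. Here count_na() is a hypothetical stand-in for skimr's n_missing(), assumed to simply count NA values; it is an illustration, not skimr's implementation.

```r
# Hypothetical stand-in for skimr's n_missing(): counts NA values in a vector
count_na <- function(x) sum(is.na(x))

# Dividing one function object by another fails, just like n_missing/length
err <- tryCatch(count_na / length, error = function(e) conditionMessage(e))
print(err)  # "non-numeric argument to binary operator"

# An anonymous function delays the division until it receives a vector
percent_missing <- function(x) count_na(x) / length(x)

v <- c(1, NA, 3, NA)
percent_missing(v)  # 2 missing out of 4, i.e. 0.5
```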
Also, since you’re dropping most of the defaults, you might want to take advantage of the append = FALSE argument in skim_with. It drops every function except those you explicitly define, which might save you some typing.
So, for example:
```r
library(skimr)
skim_with(numeric = list(percent_missing = ~n_missing(.x) / length(.x)),
          append = FALSE)
skim(iris, -Species)
#> Skim summary statistics
#>  n obs: 150 
#>  n variables: 5 
#> 
#> Variable type: numeric 
#>      variable percent_missing
#>  Petal.Length               0
#>   Petal.Width               0
#>  Sepal.Length               0
#>   Sepal.Width               0
```
Alternatively, you could use R’s traditional anonymous function syntax:
```r
skim_with(
  numeric = list(percent_missing = function(x) n_missing(x) / length(x)),
  append = FALSE)
```
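Because iris has no missing values, every percent_missing above is 0. To see the same kind of function produce non-zero values, here is a base-R sketch that applies it column by column to a small made-up data frame. The data frame df is purely illustrative, is.na() stands in for n_missing(), and vapply() only mimics the per-column behavior of skim(); it is not how skimr works internally.

```r
# Same anonymous-function idea, with base R's is.na() standing in for n_missing()
percent_missing <- function(x) sum(is.na(x)) / length(x)

# Illustrative data frame with some missing values (made-up data)
df <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, NA, 5, 6),
  c = c(7, 8, 9, 10)
)

# Keep only numeric columns, then compute one summary value per column
num_cols <- vapply(df, is.numeric, logical(1))
result <- vapply(df[num_cols], percent_missing, numeric(1))
result  # a = 0.25, b = 0.50, c = 0.00
```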
Best wishes,
Michael
Thanks @michaelquinn32! Your help with skimr is always great. Good point about append = FALSE, too: I’m not dropping all the existing skimmers (I still use mean and median, for example), but I probably drop more than I keep, so it makes more sense to explicitly define the list of skimmers I use than to drop all the ones I don’t.