[skimr] help with defining new summary functions

r

#1

Hi,

I’d like to remove a lot of skimmers, and add a percentage_missing, which, as the name implies, is just the ratio between n_missing (i.e., missing) and length (i.e., n). However, this doesn’t work:

library(skimr)
#> Warning: package 'skimr' was built under R version 3.4.4

skim_with(numeric = list(missing =  NULL, complete = NULL, 
                        p0 = NULL, p25 = NULL, p75 = NULL, p100 = NULL, 
                        hist = NULL,
                        percent_missing = n_missing/length))
#> Error in n_missing/length: non-numeric argument to binary operator

Can you help me?

PS I already asked a similar question some time ago (well, I opened an issue, actually, but I think it’s more appropriate to ask here). However, I’m facing a similar problem again. Maybe I’m just dense :slight_smile: (or not familiar enough with purrr), but I think that adding some more examples to the documentation on how to define new skimmers would definitely help.


#2

Hi Andrea!

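To use percent_missing as you intend, it needs to be defined as an anonymous function. Written as n_missing/length, R tries to divide one function object by the other, which is what produces the error. The easiest way to define the function in skimr is to use the tilde syntax from purrr, along with the .x pronoun.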
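The failure can be reproduced without skimr. This is only a base R sketch: count_missing is a made-up stand-in for skimr's n_missing, and ratio is a hypothetical name for the wrapped version.

```r
# Hypothetical stand-in for skimr's n_missing(): counts NA values
count_missing <- function(x) sum(is.na(x))

# Dividing the function objects themselves fails, as in the original call
msg <- tryCatch(count_missing / length,
                error = function(e) conditionMessage(e))
print(msg)

# Wrapping the ratio in a function defers the division until data is supplied
ratio <- function(x) count_missing(x) / length(x)
print(ratio(c(1, NA, 3, NA)))
```

The first print shows the same "non-numeric argument to binary operator" message, and the second returns the fraction of missing values in the example vector.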

Also, it appears that you might want to take advantage of the append = FALSE argument in skim_with. This will drop every function except those that you explicitly define. It might save you some typing.

So, for example:

library(skimr)
skim_with(numeric = list(percent_missing = ~n_missing(.x) / length(.x)),
          append = FALSE)
skim(iris, -Species)
#> Skim summary statistics
#>  n obs: 150 
#>  n variables: 5 
#> 
#> Variable type: numeric 
#>      variable percent_missing
#>  Petal.Length               0
#>   Petal.Width               0
#>  Sepal.Length               0
#>   Sepal.Width               0

Alternatively, you could use R’s traditional anonymous function syntax.

skim_with(
  numeric = list(percent_missing =  function(x) n_missing(x) / length(x)),
  append = FALSE)
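Since iris has no missing values, every result above is 0. If you want to check the ratio on data that does contain NAs, here is a rough base R sketch. It assumes sum(is.na(x)) is an acceptable substitute for n_missing, and the data frame df is made up for illustration.

```r
# Hypothetical helper: fraction of missing values in a vector.
# sum(is.na(x)) stands in for skimr's n_missing() here.
percent_missing <- function(x) sum(is.na(x)) / length(x)

# Made-up data with some missing values
df <- data.frame(a = c(1, NA, 3, NA), b = c(10, 20, 30, NA))

# Apply to every column; vapply enforces one numeric value per column
result <- vapply(df, percent_missing, numeric(1))
print(result)  # a = 0.50, b = 0.25
```

Note that the value is a proportion between 0 and 1; multiply by 100 if you want an actual percentage.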

Best wishes,
Michael


#3

Thanks @michaelquinn32! Your help with skimr is always great :slight_smile: Good point about append = FALSE too. I’m not dropping all existing skimmers (I still use mean and median, for example), but the ones I drop probably outnumber those I retain, so it makes more sense to explicitly define the list of skimmers I use than to drop all those I don’t.