Searchable metadata in help files with htmlwidgets

I figured I should write this down somewhere as it’s a very useful trick I’ve been using for internal R data packages and might be useful in other R data wrappers that rOpenSci develops and accepts.

Sometimes an R package that wraps a data source includes metadata that we want to make easily accessible and searchable. An example is field codes or abbreviations. For some data sets there can be a non-trivial number of codings. We might store these in a separate table that is accessible in the R wrapper package as data or via a function, and that looks like this:

# A tibble: 4 x 3
  field  code  description 
  <chr>  <chr> <chr>       
1 field1 A     Early State 
2 field1 B     Medium State
3 field1 C     Late Stage  
4 field1 D     Kaput    

This table is useful when provided as package data or via a function like mydata::my_metadata_table(). One can, for instance, join it to a data extract via the code field to have more descriptive data. But often this is data that users want to browse or search. I’ve found it helpful in these cases to use the DT package to embed a searchable table in the help documentation of the data.
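As a minimal sketch of the join mentioned above: the `extract` data frame and its values are invented for illustration, and the metadata rows are simply copied from the table shown earlier. Base R `merge()` is used here only to keep the example dependency-free.

```r
# Metadata table as shown above (in a real package this would come
# from something like mydata::my_metadata_table()).
metadata <- data.frame(
  field = "field1",
  code = c("A", "B", "C", "D"),
  description = c("Early State", "Medium State", "Late Stage", "Kaput"),
  stringsAsFactors = FALSE
)

# Hypothetical data extract; only the coded column matters here.
extract <- data.frame(
  id = 1:3,
  field1 = c("B", "D", "A"),
  stringsAsFactors = FALSE
)

# Keep the rows that describe field1, then left-join on the code.
lookup <- metadata[metadata$field == "field1", c("code", "description")]
joined <- merge(extract, lookup, by.x = "field1", by.y = "code", all.x = TRUE)

# merge() sorts by the key, so restore the original row order.
joined <- joined[order(joined$id), ]
```

Each row of `joined` then carries the human-readable description alongside the raw code.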

This is a bit tricky, but thanks to the \Sexpr{} macros in R documentation, you can execute arbitrary code when building the HTML of your help file. The trick is creating your HTML file so that it includes all the self-contained JavaScript code needed to run the widget. Here’s what I put in the roxygen comments to put a searchable datatable into the help file for mypkg::my_metadata_table():

#' \if{html}{ % Only applies to HTML help files
#'   \Sexpr[echo=FALSE, results=rd, stage=build]{
#'     # This doesn't work for pkgdown pages, so detect if the page is being built in pkgdown and skip the widget
#'     in_pkgdown <- any(grepl("as_html.tag_Sexpr", sapply(sys.calls(), function(a) paste(deparse(a), collapse = "\n"))))
#'     if (in_pkgdown) {
#'       mytext <- c('In RStudio, this help file includes a searchable table of values.')
#'     } else {
#'       tmp <- tempfile(fileext = ".html")
#'       # Create a DT htmlwidget and save it to a tempfile
#'       htmlwidgets::saveWidget(DT::datatable(mypkg::my_metadata_table(), rownames = FALSE, width = 700), tmp)
#'       # Read the widget file in, but remove some html tags
#'       mytext <- paste('Below is a searchable version of the database codes.',
#'         '\\\out{<div style="width:100\%">',
#'         paste(stringi::stri_subset_regex(readLines(tmp), "^</?(!DOCTYPE|meta|body|html)", negate = TRUE), collapse = "\n"),
#'         '</div>}',
#'         sep = "\n")
#'     }
#'     mytext
#'   }
#' }
#'
#' \if{text,latex}{The HTML version of this help file includes a searchable table of the database codes.}

(The package needs to import htmlwidgets, DT and stringi.)
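To make the tag-stripping step in the roxygen block easier to follow, here is a small sketch of what that filter does, run on a made-up stand-in for the lines of a saved widget file. It uses base R `grepl()` instead of stringi; for this pattern the result should match `stringi::stri_subset_regex(..., negate = TRUE)`, but treat that equivalence as an assumption rather than something tested against real DT output.

```r
# A tiny, invented stand-in for readLines() on a saved widget file.
widget_lines <- c(
  "<!DOCTYPE html>",
  "<html>",
  "<head>",
  "<meta charset=\"utf-8\"/>",
  "<script>var x = 1;</script>",
  "</head>",
  "<body>",
  "<div id=\"htmlwidget_container\"></div>",
  "</body>",
  "</html>"
)

# Drop lines that start with an opening or closing DOCTYPE, meta,
# body or html tag; everything else (scripts, head, divs) is kept.
pattern <- "^</?(!DOCTYPE|meta|body|html)"
kept <- widget_lines[!grepl(pattern, widget_lines)]
```

Note that the filter works line by line, so it relies on those wrapper tags starting their own lines in the saved file.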

Here’s the result, in this case from the help of an internal package I created that uses this approach:

Thoughts? I’d be interested if someone has a more robust way to do this that works in pkgdown sites, as well.

I like it. I think I’ve heard you talk about it, but hadn’t seen it yet.

Will CRAN maintainers blow a gasket if you try to submit a pkg to CRAN with this?

I dunno, but it gets past R CMD check --as-cran without any flags, and the PDF manual renders nicely.

I’d like to put this thing on CRAN when the data this package wraps goes public. The bigger issue might be that it uses @richfitz’s datastorr to serve and cache the data, and I’m not sure whether that will be compliant with CRAN’s writing-to-disk policy.

Yes, this is my concern too (the writing to disk), especially in the current climate on CRAN. datastorr currently uses rappdirs, which is also used by other CRAN packages, but it does seem likely to be problematic. I think it could probably be dealt with by some options settings (e.g., options(datastorr.path = ...) or options(lemis.path = )). I’d welcome any thoughts that sit at the intersection of enjoyable for the end users and not likely to cause drama with the CRAN keepers.
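One way the options-based idea could look, as a rough sketch only: the option name `mypkg.path` and the helper `cache_path()` are hypothetical and are not part of datastorr. The default here is a session-scoped temporary directory, so nothing persistent is written unless the user opts in.

```r
# Hypothetical helper: resolve where the package should cache data.
# "mypkg.path" is an invented option name used only for illustration.
cache_path <- function() {
  getOption("mypkg.path", default = file.path(tempdir(), "mypkg-cache"))
}

# Default: a temporary location that disappears with the R session.
default_path <- cache_path()

# A user opts in to a different location by setting the option;
# the previous value is restored afterwards.
old <- options(mypkg.path = file.path(tempdir(), "persistent-demo"))
user_path <- cache_path()
options(old)
```

Whether a default like this would satisfy CRAN is a separate question that the sketch does not answer.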

It seems like this is a very spottily enforced policy. For instance, I’ve been using RSelenium and it automatically downloads driver binaries via wdman if they’re missing on your local machine.

I use DT for this purpose (to show metadata) in the codebook package (in vignettes and as an addin). I ultimately had to cut it from CRAN vignettes, but only because of the size of the DT dependencies. I haven’t tried including it in help files. Would be interested to know if someone finds a way around that.

Huh, I may submit to CRAN soon and I’ll let you know how it goes. A current version of a package with this is here: https://github.com/ecohealthalliance/cites/ . It’s not very large installed or in tarball form: <500KB. I think this is because the HTML help files are generated on-the-fly on the user’s computer, so the package doesn’t contain the rendered DT.

That makes sense and may be a way for me to show codebook tables (outside vignettes) as well. Happy to hear how it goes.

Looking again, this isn’t quite right - the help files are big - >2MB each! While the HTML is created on the fly all the DT headers are included in the pre-generated .Rd files. However, they are stored as R lazyload databases (.rdx/.rdb), and these compress down to much, much less than this (<400KB for ALL the help files).
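A rough illustration of why repeated widget boilerplate can shrink so much under compression. The text and sizes below are invented, and real minified JavaScript will compress far less dramatically than this artificial repetition; `memCompress()` is used only as a stand-in for whatever compression the help database applies.

```r
# Simulate several help pages that each embed the same large block of
# boilerplate plus a small page-specific part. Sizes are arbitrary.
boilerplate <- paste(
  rep("<script>function widget(){return 'datatable';}</script>", 2000),
  collapse = "\n"
)
pages <- vapply(
  1:5,
  function(i) paste0(boilerplate, "\n<p>page ", i, "</p>"),
  character(1)
)

# Total size before and after compressing each page separately.
raw_bytes <- sum(nchar(pages, type = "bytes"))
compressed_bytes <- sum(vapply(
  pages,
  function(p) length(memCompress(charToRaw(p), type = "gzip")),
  integer(1),
  USE.NAMES = FALSE
))
ratio <- compressed_bytes / raw_bytes
```

Comparing `raw_bytes` with `compressed_bytes` gives a feel for the gap between the uncompressed help text and what ends up stored on disk.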