Standards for wrapper packages, and some practical questions

Hi all,

I was wondering if there is a draft of software standards for wrapper packages floating around? I am asking because the software standards guide (for regression models) has been super helpful for me in the past :)

I am currently integrating WildBootTests.jl functionality into the fwildclusterboot package and face a handful of questions which I hope might be discussed in the standards draft.

Q1: How do I conveniently allow users who are not familiar with Julia (and who might never have heard of it) to call Julia from within R? In my use case, this means helping users install Julia and all Julia dependencies of fwildclusterboot, and link Julia and R via the JuliaConnectoR package.
Q2: How can I, or users, check whether the required Julia packages are installed, and what is the best way to notify package users that a new version of WildBootTests.jl is available or even required?
Q3: How can I set a global random seed in R and pass it to Julia (to make reproducibility of results easy)?

I have drafted a small package, JuliaConnectoR.utils, which is my attempt to answer some of the questions raised in Q1 and Q2.

At the moment, it includes five functions:

  • install_julia() is a wrapper around JuliaCall::install_julia() and installs Julia from within R
  • connect_julia_r() opens the .Renviron file and gives some instructions on what to do so that JuliaConnectoR finds Julia
  • install_julia_packages() is a vectorized function that installs Julia packages from within R
  • check_julia_system_requirements() parses the DESCRIPTION file and checks whether the R session's current Julia version and the versions of the required packages match the requirements (this function is super hacky)
  • set_julia_nthreads() gives some instructions on how to control the number of threads used in Julia from within R

Hence a workflow utilizing JuliaConnectoR.utils (for R users who have never before had any contact with Julia) could look like this:

library(JuliaConnectoR.utils)

install_julia()
connect_julia_r()
devtools::install_github("s3alfisc/fwildclusterboot")
install_julia_packages("WildBootTests.jl")
check_julia_system_requirements("fwildclusterboot")
set_julia_nthreads()

Do you have any suggestions regarding "best practices" for dependency management?

Regarding Q3, I found a solution in a nice blog post on R-bloggers.

The main idea is to use an R function that draws a random integer x from R's global random number generator, so that x depends on the current state of the R seed, and then to use this integer as the seed value in Julia, C++, or any other language. Hence when using the Julia or C++ based algorithms after running set.seed(1234), neither C++ nor Julia receives 1234 itself as its internal seed value; each receives an integer derived from it.

For C++, this could look like this:

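#include <Rcpp.h>
#include <cstdlib>  // srand(), rand()

// Seed the C RNG with an integer supplied by R and return one draw.
// [[Rcpp::export]]
int stochastic_cpp(int seed){
  srand(seed);
  int x = rand();
  return x;
}

# R wrapper: derive the seed passed to C++ from R's global RNG state.
stochastic_rcpp_fun <- function(seed = NULL){

  # If a seed is supplied, reset R's RNG first (this changes the global state).
  if(!is.null(seed)){
    set.seed(seed)
  }

  # Draw the integer that C++ actually uses as its seed.
  seed <- sample.int(.Machine$integer.max, 1)

  x <- stochastic_cpp(seed = seed)

  x

}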

As a result, it is possible to control the stochastic behavior of C++ via set.seed():

set.seed(1234)
stochastic_rcpp_fun()
# 23829
stochastic_rcpp_fun(seed = 1234)
# 23829

Do you see any drawbacks to this approach? Should I include the "stochastically" generated seed used in Julia / C++ in the function result list?
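To make the second question concrete, here is a minimal standalone sketch in plain C++ (no Rcpp, so it is an analogy rather than the package's code). It assumes a master generator stands in for R's RNG after set.seed(), and a worker generator stands in for the Julia or C++ algorithm. The names SeededResult, derive_seed() and stochastic_run() are hypothetical. The derived seed is returned alongside the result so that it can be reported to the user.

```cpp
#include <cstdint>
#include <random>

// Hypothetical result type: the output plus the derived seed that produced it.
struct SeededResult {
  std::uint32_t derived_seed;  // seed actually handed to the worker RNG
  std::uint32_t value;         // output of the "algorithm" (a single draw here)
};

// Derive a worker seed from a master seed, loosely mirroring
// set.seed(master); sample.int(.Machine$integer.max, 1) in the R wrapper.
std::uint32_t derive_seed(std::uint32_t master_seed) {
  std::mt19937 master(master_seed);
  return static_cast<std::uint32_t>(master());  // first raw draw of the master RNG
}

// Run the stand-in algorithm with a seed derived from the master seed.
SeededResult stochastic_run(std::uint32_t master_seed) {
  const std::uint32_t derived = derive_seed(master_seed);
  std::mt19937 worker(derived);  // the worker is seeded with the derived value
  const std::uint32_t value = static_cast<std::uint32_t>(worker());
  return SeededResult{derived, value};
}
```

Calling stochastic_run(1234) twice gives the same value and the same derived_seed. Returning derived_seed would let a user re-run only the downstream step without having to reconstruct the state of the master generator.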

Best, Alex

Hi Alex, thanks for enquiring here. There is unfortunately no draft yet of standards for wrapper software. Those standards will hopefully appear before mid-2022, although they'll be addressed last because we're very aware that they may be the most difficult of all standards to draft in a sufficiently general yet pragmatic manner. Random seeds will obviously be a key part of these standards. I imagine a standard along those lines would highlight the importance of enabling the random state in R to be maintained - so simply passing a seed like in your example would likely be insufficient. The cpp11 docs provide a nice example, where the standards would likely suggest the importance of somehow exposing the calls to GetRNGstate() and PutRNGstate(), the key being that any function should enable (not necessarily by default) some ability to call PutRNGstate().
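The difference between passing a seed and maintaining the generator state can be illustrated outside of R. The sketch below is plain C++ using <random>, not the R API: it does not call GetRNGstate() or PutRNGstate(), and the function names are hypothetical. draw_from_seed() builds a fresh generator from a seed on every call, so the caller's state never advances. draw_from_state() takes the generator by reference, so each call advances it, which is loosely what reading R's RNG state and writing it back achieves.

```cpp
#include <cstdint>
#include <random>

// Seed-passing style: a fresh generator per call; no caller state is touched,
// so the same seed always yields the same draw.
std::uint32_t draw_from_seed(std::uint32_t seed) {
  std::mt19937 rng(seed);
  return static_cast<std::uint32_t>(rng());
}

// State-maintaining style: the caller's generator is advanced by the call,
// so consecutive calls continue one random stream.
std::uint32_t draw_from_state(std::mt19937& rng) {
  return static_cast<std::uint32_t>(rng());
}
```

With a generator created as std::mt19937 rng(1234), the first draw_from_state(rng) equals draw_from_seed(1234), but rng has then moved one step along its stream, whereas repeated calls to draw_from_seed(1234) keep returning the same number.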

I'll ping @adamhsparks for some more informed responses to your Julia-specific questions. Feel free to ask any further questions here, and we'll aim to ping you again for your insight and opinions once we begin development of the wrapper standards. Having input from somebody who has actually gone through the process of developing a wrapper package will be very useful for us once we start tackling this task.


Having looked at the repositories, my main comment is that I would be more verbose about Julia.

I don't really see much explanation as to why you'd want to install Julia (and use it in this way). If you're truly interested in inexperienced users, a bit more background in the README would be useful: that Julia is a different language, that it may be faster for some operations, and that it can be accessed through this R package.

Since I'm using M1 Macs, I'm quite aware of some of the cross-language issues that can arise. If one is using the native R but the x86 Julia, or vice versa, they won't get along well. I'd be more verbose about that issue as it could easily trip up the inexperienced. If I followed the directions for installing Julia through R, I'd end up with the x86 version on my M1, which would run by itself but not interface with my R installation. Until JuliaCall is updated (and Julia is stable for M1 devices), more detail would be good.

Also, I'd check the documentation page. I had several Page Not Found errors and am unable to find documentation in the package for set_julia_nthreads(), which is mentioned in the README.


Hi Mark and Adam, thanks so much for your replies!

The cpp11 link is a great starting point! It looks like I should dive a little deeper into R's seed functionality and how it interfaces with C/C++, and I will also consider relying only on R API functions for random number generation (in which case I think I would have to drop OpenMP support).

Thanks so much for taking the time to take a look at the repositories, @adamhsparks. I will try to be more explicit on why and when I believe that it is beneficial to use fwildclusterboot as an API to WildBootTests.jl (there are three reasons: it's far more memory efficient, faster for large problems and in particular incorporates functionality not covered in 'native R'). I'm currently working on updating the pkgdown documentation (I am aware that parts of it are in shambles at the moment).

If I am honest, I had hoped that JuliaCall::install_julia() was 'mature' enough to install the 'right' Julia version for Windows and Mac without major hiccups. Thanks for making me aware that this is apparently not the case. I think I will leave the JuliaConnectoR.utils project for now and give more detailed installation instructions in the README.

Still, I do believe that the demand for seamless installation and setup of Julia and R will increase, and I therefore see value in a package that helps R users connect R and Julia. :)

Last, do you have a suggestion on how users could check if appropriate versions of Julia / WildBootTests.jl are installed, @adamhsparks? E.g. fwildclusterboot currently only supports WildBootTests.jl 0.7.7 or higher. Should I simply be very explicit about version requirements in the README, and leave everything else as a responsibility of the user? At the moment, there does not seem to be a convention for specifying Julia package version requirements in the DESCRIPTION file - none of the packages that wrap Julia packages via JuliaCall currently specify version requirements in the README (e.g. convexjlr).

Anyways, I'll try to keep tabs on my learnings and hope that I'll be able to provide some meaningful feedback once you have settled on a first draft of wrapper standards!

I don't think it's that JuliaCall::install_julia() is immature. It's that Julia is immature. v1.7 works natively but isn't fully supported on M1 devices yet. v1.8 is supposed to be fully supported, I think? So there's friction because the native R can't talk to the x86 Julia and vice versa. So being very clear about this in the docs is the best I can think of. I would think that most users that get this far would understand the implications if there's good documentation.

Sorry, I don't know how you'd specify package versions in the R DESCRIPTION file for Julia either. But maybe you could use Julia's native functionality to check that the proper version was installed?
