Hi all,
I was wondering whether a draft of software standards for wrapper packages is floating around. I am asking because the software standards guide (for regression models) has been super helpful for me in the past.
I am currently integrating WildBootTests.jl functionality into the fwildclusterboot package and face a handful of questions which I hope might be discussed in the standards draft.
Q1: How do I conveniently allow users who are not familiar with Julia (and who might never have heard of it) to call Julia from within R? In my use case, this means helping users install Julia and all Julia dependencies of fwildclusterboot, and link Julia and R via the JuliaConnectoR package.
Q2: How can I, or users, check whether the required Julia packages are installed, and what is the best way to notify package users that a new version of WildBootTests.jl is available or even required?
Q3: How can I set a global random seed in R and pass it to Julia (to make reproducibility of results easy)?
I have drafted a small package, JuliaConnectoR.utils, which is my attempt to answer some of the questions raised in Q1 and Q2.
At the moment, it includes five functions:
- install_julia() is a wrapper around JuliaCall::install_julia() and installs Julia from within R
- connect_julia_r() opens the .Renviron file and gives some instructions on what to do so that JuliaConnectoR finds Julia
- install_julia_packages() is a vectorized function that installs Julia packages from within R
- check_julia_system_requirements() parses the DESCRIPTION file and checks whether the R session's current Julia version and the version numbers of the required packages match the requirements (this function is super hacky)
- set_julia_nthreads() gives some instructions on how to control the number of threads used in Julia from within R
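To make the idea behind check_julia_system_requirements() more concrete, here is a minimal sketch of the parsing and comparison step. It is not the package's actual implementation. It assumes that requirements are written in the SystemRequirements field as "Name (>= x.y.z)" entries separated by commas, and that the installed versions are already available as a named character vector (how to query them from Julia is left out). The helper names parse_julia_requirements() and check_requirements() are hypothetical.

```r
# Parse entries such as "Julia (>= 1.6.0), WildBootTests.jl (>= 0.7.0)"
# from the SystemRequirements field of a DESCRIPTION file.
# The field layout is an assumption made for this sketch.
parse_julia_requirements <- function(description_path) {
  field <- read.dcf(description_path, fields = "SystemRequirements")[1, 1]
  entries <- trimws(strsplit(field, ",", fixed = TRUE)[[1]])
  pattern <- "^([^[:space:]]+)[[:space:]]*\\(>=[[:space:]]*([0-9.]+)\\)$"
  entries <- entries[grepl(pattern, entries)]
  data.frame(
    name = sub(pattern, "\\1", entries),
    min_version = sub(pattern, "\\2", entries),
    stringsAsFactors = FALSE
  )
}

# Compare installed versions (a named character vector) with the minimums.
# A requirement that is missing from `installed` counts as not satisfied.
check_requirements <- function(requirements, installed) {
  ok <- vapply(seq_len(nrow(requirements)), function(i) {
    have <- unname(installed[requirements$name[i]])
    !is.na(have) &&
      numeric_version(have) >= numeric_version(requirements$min_version[i])
  }, logical(1))
  setNames(ok, requirements$name)
}

# Usage with a temporary DESCRIPTION file and made-up version numbers.
description_path <- tempfile()
writeLines(c(
  "Package: demo",
  "Version: 0.1.0",
  "SystemRequirements: Julia (>= 1.6.0), WildBootTests.jl (>= 0.7.0)"
), description_path)

requirements <- parse_julia_requirements(description_path)
installed <- c(Julia = "1.8.5", "WildBootTests.jl" = "0.6.0")
result <- check_requirements(requirements, installed)
print(result)
```

Comparing with numeric_version() rather than plain strings avoids ordering mistakes such as "0.10.0" sorting before "0.9.0".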
Hence a workflow utilizing JuliaConnectoR.utils (for R users who have never before had any contact with Julia) could look like this:
library(JuliaConnectoR.utils)
install_julia()
connect_julia_r()
devtools::install_github("s3alfisc/fwildclusterboot")
install_julia_packages("WildBootTests.jl")
check_julia_system_requirements("fwildclusterboot")
set_julia_nthreads()
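As an alternative to editing .Renviron by hand, the two settings behind connect_julia_r() and set_julia_nthreads() could also be applied for the current R session only. The sketch below relies on two environment variables: JULIA_BINDIR, which JuliaConnectoR uses to locate Julia, and JULIA_NUM_THREADS, which Julia reads at startup. Both must be set before the Julia process is started. The function name configure_julia_env() and the path "/opt/julia/bin" are hypothetical.

```r
# Set the environment variables for the current R session only.
# `julia_bin_dir` is a hypothetical path; adjust it to the local installation.
configure_julia_env <- function(julia_bin_dir, n_threads = 1L) {
  stopifnot(is.character(julia_bin_dir), length(julia_bin_dir) == 1L)
  stopifnot(is.numeric(n_threads), length(n_threads) == 1L, n_threads >= 1)
  Sys.setenv(
    JULIA_BINDIR = julia_bin_dir,
    JULIA_NUM_THREADS = as.character(as.integer(n_threads))
  )
  # Return the resulting settings invisibly so callers can inspect them.
  invisible(Sys.getenv(c("JULIA_BINDIR", "JULIA_NUM_THREADS")))
}

settings <- configure_julia_env("/opt/julia/bin", n_threads = 4L)
print(settings)
```

Settings applied this way are lost when the R session ends, whereas entries in .Renviron persist across sessions.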
Do you have any suggestions regarding "best practices" for dependency management?
Regarding Q3, I found a solution in a nice blog post on R-bloggers.
The main idea is to draw a random integer x in R, which depends on the current state of R's global random number generator, and then to use this integer as the seed value in Julia, C++, or any other language. Hence, when using the Julia or C++ based algorithms after running set.seed(1234), neither C++ nor Julia receives 1234 itself as its internal seed value; each receives the integer derived from it.
For C++, this could look like this:
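#include <Rcpp.h>
#include <cstdlib>  // srand(), rand()

// Seed the C RNG with the value received from R and return one draw.
// [[Rcpp::export]]
int stochastic_cpp(int seed) {
  srand(seed);
  int x = rand();
  return x;
}

# R wrapper around stochastic_cpp()
stochastic_rcpp_fun <- function(seed = NULL) {
  # If a seed is supplied, reset R's RNG first; otherwise use its current state.
  if (!is.null(seed)) {
    set.seed(seed)
  }
  # Draw the seed that is actually passed on to C++.
  cpp_seed <- sample.int(.Machine$integer.max, 1)
  stochastic_cpp(seed = cpp_seed)
}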
As a result, it is possible to control the stochastic behavior of C++ via set.seed():
set.seed(1234)
stochastic_rcpp_fun()
# 23829
stochastic_rcpp_fun(seed = 1234)
# 23829
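For Julia, the same idea might be sketched as follows. This is only an illustration under assumptions: draw_child_seed() and julia_seed_expr() are hypothetical helper names, and the final JuliaConnectoR::juliaEval() call is left as a comment because it needs a working Julia installation.

```r
# Draw one integer from R's RNG. The result depends on the current RNG
# state, so it is controlled by an earlier call to set.seed().
draw_child_seed <- function() {
  sample.int(.Machine$integer.max, 1L)
}

# Build the Julia expression that seeds Julia's global RNG with `seed`.
julia_seed_expr <- function(seed) {
  sprintf("import Random; Random.seed!(%d)", as.integer(seed))
}

set.seed(1234)
seed <- draw_child_seed()
expr <- julia_seed_expr(seed)
print(expr)

# With Julia available, the expression could then be evaluated, e.g.:
# JuliaConnectoR::juliaEval(expr)
```

Because the derived seed is an ordinary integer, it could also be stored next to the results if needed.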
Do you see any drawbacks to this approach? Should I include the "stochastically" generated seed used in Julia / C++ in the function result list?
Best, Alex