Statistical modelling of Climate-Sensitive Infectious Disease (CSID) usually requires commons steps to harmonize and make compatible data from epidemiology and climate. Different teams do similar procedures in order to reach similar results for this purpose. But the actual coding of these steps involve lots of apparently small choices and details that, in the end, produce results very different and not directly comparable.
The modeling of CSID requires that diseases cases data and climate data be in the similar structure, usually a table containing variables about place of occurrence of disease cases, date, and climate indicators at the same date and previous dates.
For this, I created a package to handle disease datasets, with functions to imputate important variables and aggregate the disease’s individual cases into tables of counts. Those are apparently easy tasks, but the devil is in the details, like if the aggregating variables were imputated or not, and if the aggregating task result on time series of the same size, including weeks or months without cases. This package objective is to propose a standard procedure for this.
To handle the climate indicators, I created a package that converts gridded climate indicators data to zonal statistics at the same spatial scale as the disease data. The package contain functions that takes as input, climate indicators on raster format (from ERA5-Land for example) and a spatial boundaries map file that should be compatible with the disease data aggregation. The package produces tables with zonal statistics for the boundaries units, like the “average maximum temperature” at the spatial unit. This is a fairly common methodology, but with lots of small decisions involved on those steps, like making the layers projection compatible, produce raw or weighted zonal statistics, date units, etc. This package objective is also to propose a standard procedure for those steps.
To execute those tasks that use common features, the targets package is being very helpful. I managed to create a workflow that takes all the necessary inputs and creates in the end a dataset ready for statistical modeling. And some reports are also created during the execution, with the help from the tarchetypes, providing information about the steps that are important to retain and communicate with the results.
The next steps are to create functions (and maybe a package) for modeling and integrate it into the workflow.
The workflow and the packages are at the very early stage of development.
health geography, epidemiology, climatology
You can reach me at Twitter (@rfsaldanhario) and at Fosstodon (@email@example.com)