Evaluating semi-parametric nowcasts of COVID-19 hospital admissions in Germany

rOpenSci package or resource used

targets, piggyback

What did you do?

COVID-19 hospitalisations in Germany are released by date of positive test rather than by date of admission. This has some advantages when they are used as a tool for surveillance, as these data are closer to the date of infection and so easier to link to underlying transmission dynamics and public health interventions. Unfortunately, however, when released in this way the latest data are right-censored, meaning that final hospitalisations for a given day are initially underreported. This issue is often found in data sets used for the surveillance of infectious diseases and can lead to delayed or biased decision making. Fortunately, when data from a series of days are available we can estimate the level of censoring and provide estimates of final hospitalisations, adjusted for truncation, with appropriate uncertainty. This is usually known as a nowcast.
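To make the right-censoring concrete, here is a minimal sketch in R (with made-up counts, not the German data) of how hospitalisations for a single date of positive test accumulate across successive daily data releases:

```r
# Illustrative only: hypothetical counts for one reference date as they
# would appear in successive daily releases of the data.
reports <- data.frame(
  reference_date = as.Date("2021-11-01"),
  report_date = as.Date("2021-11-01") + 0:4,
  confirm = c(10, 18, 24, 27, 28) # cumulative hospitalisations reported so far
)
# On the day itself only 10 of an eventual ~28 hospitalisations are visible,
# so the most recent data point understates the final count. A nowcast
# estimates that final count, with uncertainty, from the reporting delays.
reports
```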

In this work, we aim to evaluate a series of novel semi-parametric nowcasting model formulations in real time and provide an example workflow to allow others to do the same, using German COVID-19 hospitalisations by date of positive test at the national level, both overall and by age group, and at the state level. This project is part of a wider collaboration assessing a range of nowcasting methods whilst providing an ensemble nowcast of COVID-19 hospital admissions in Germany by date of positive test.

All models are implemented using the epinowcast R package. The nowcasting and evaluation pipeline is implemented using the targets R package. All input data, interim data, and output data are available and should also be fully reproducible from the provided code. Please see the resources section for details. Further details on our methodology are included in our paper.
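As a rough illustration of how these pieces fit together, a minimal `_targets.R` for a pipeline like this might look as follows. This is a sketch, not the project's actual pipeline: `download_latest_obs()` and `nowcast()` are hypothetical stand-ins for project helper functions, and the `max_delay` value is arbitrary.

```r
# _targets.R: a minimal sketch of a nowcasting pipeline.
library(targets)
tar_option_set(packages = c("epinowcast", "data.table"))

list(
  # Download the latest release of hospitalisations by date of positive test
  # (download_latest_obs() is a hypothetical project helper).
  tar_target(obs, download_latest_obs()),
  # Preprocess into the reporting-triangle format epinowcast expects.
  tar_target(pobs, enw_preprocess_data(obs, max_delay = 40)),
  # Fit the nowcasting model (nowcast() is a hypothetical wrapper
  # around epinowcast::epinowcast()).
  tar_target(fit, nowcast(pobs))
)
```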

URL or code snippet for your use case

https://epiforecasts.io/eval-germany-sp-nowcasting/

Sector

academic

Field(s) of application

epidemiology

Comments

In general, the targets ecosystem is well developed and easy to use. For my use case, the biggest missing features at present are integration with cloud compute services and transient containerised workflows. In many ways, these are not issues with targets itself but instead with the wider ecosystem of R packages supporting modern distributed workflows. I am still exploring targets workflows (see here for another example not using R Markdown) so any feedback, hints, or tips are very much appreciated.

piggyback is a really nice and simple way to share data (via GitHub releases). That being said, support for automatic file tracking, rather than manually uploading results, would likely greatly improve the workflow. It is also not entirely clear to me whether piggyback is the currently recommended choice for sharing scientific data (there are quite a few options, with osfr also being relatively okay to use, if not seamless). Clarification of current best practices for data workflows and the tools that support them would be very useful for improving my practice.
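For readers unfamiliar with piggyback, the basic pattern is short. This sketch assumes a repository you can write to; "owner/repo", the tag, and the file path are placeholders:

```r
library(piggyback)

# Create a release to attach data to ("owner/repo" and "v0.0.1" are placeholders).
pb_new_release("owner/repo", "v0.0.1")

# Upload a results file as a release asset.
pb_upload("output/nowcast.csv", repo = "owner/repo", tag = "v0.0.1")

# Later (or on another machine), pull the same file back down.
pb_download("output/nowcast.csv", repo = "owner/repo", tag = "v0.0.1")
```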

Twitter handle

@seabbs

Really amazing stuff here!

Re piggyback: (as the piggyback maintainer) I think that’s still a convenient option, but it’s not suited to ‘archival’ purposes. Maybe that’s the intent here – if a forecast is updated daily, perhaps we don’t want to archive every prediction? I’m currently playing with workflows that can use piggyback storage as an option, but are technically agnostic to the storage location, allowing me to move the data to a cloud S3 bucket, a permanent archive like Zenodo, or even just local storage via https://github.com/cboettig/contentid. This can also cover some of what targets does, such as not re-running code when the input is unchanged.
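To sketch the storage-agnostic idea with contentid (a minimal example; the file name and URL are placeholders):

```r
library(contentid)

# Store a local copy and get back a content-based identifier
# ("nowcast.csv" is a placeholder file name).
id <- store("nowcast.csv")

# Optionally register a public copy with a registry so others can
# discover locations holding the same content (placeholder URL).
register("https://example.com/nowcast.csv")

# Later, resolve the identifier to a local path, wherever the content
# now lives (local store, GitHub release, S3, Zenodo, ...).
path <- resolve(id)
read.csv(path)
```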

Re friction with targets / R on cloud & containers generally – I’m right there with you. Can you share more details about where the biggest friction points are for your workflows? Looks like you have some clever use of GitHub Actions (from a glance anyway) – in general I think that’s a nice model. If we can have self-updating forecasts run, score, and visualize via GitHub Actions, I find the same toolchain can usually be deployed in cloud environments.

I’m not a sophisticated user of targets, but personally I find it still leads me to complex workflows that can be hard to break apart into coherent chunks that might be run in distributed fashion (i.e. like your diagram – it’s hard to follow all that!). In our forecasting work, we’ve found it helps immensely to take a standards-driven approach instead: discrete steps of a forecast workflow produce standardized flat-file outputs that become the input to the next stage. This reduces the number of moving pieces one stage must pass off to another (e.g. what the model forecast step passes to the forecast scoring step), and makes steps more agnostic of the software details – the necessary info is contained within a few standardized csv and metadata files that can be passed between machines in cloud-friendly, provenance-aware ways. Anyway, I’m rambling at this point, but I would love to compare notes.
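As a toy sketch of that pattern (the column names are illustrative, not a published standard): the forecast step writes a plain csv with a fixed schema, and the scoring step reads only that file, knowing nothing about how it was produced.

```r
# Step 1: the forecasting stage writes a flat file with a fixed,
# documented schema (columns here are illustrative).
forecast <- data.frame(
  target_date = as.Date("2021-11-01") + 0:2,
  quantile = 0.5,
  value = c(25, 30, 33)
)
write.csv(forecast, "forecast.csv", row.names = FALSE)

# Step 2: the scoring stage, possibly on another machine, needs only
# the csv and the agreed schema, not the model code or its objects.
scored <- read.csv("forecast.csv")
head(scored)
```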

targets can orchestrate discrete steps in distributed fashion on anything that the clustermq and future packages can access, mainly traditional schedulers like SLURM, SGE, TORQUE, PBS, and LSF. AWS ParallelCluster instances can opt into SLURM, so cloud computing with targets is technically possible, in a way. Unfortunately, neither clustermq nor future can interface with AWS Lambda, Fargate, or Batch just yet. Hopefully https://github.com/HenrikBengtsson/future.aws.lambda will be a start. In some cases, containerization can be handled independently using an approach like the one in Joel Nitta’s blog post, Managing bioinformatics pipelines with R.
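For anyone wanting to try the SLURM route, a minimal setup might look like the following; "slurm.tmpl" is a placeholder for your site's clustermq template file, and the worker count is arbitrary.

```r
# In _targets.R (or a setup script): point clustermq at the scheduler.
options(
  clustermq.scheduler = "slurm",
  clustermq.template = "slurm.tmpl" # placeholder: site-specific template
)

# Then run the pipeline with persistent workers on the cluster.
targets::tar_make_clustermq(workers = 4)
```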
