COVID-19 hospitalisations in Germany are released by date of positive test rather than by date of admission. This has some advantages for surveillance, as these data are closer to the date of infection and so easier to link to underlying transmission dynamics and public health interventions. Unfortunately, when released in this way the latest data are right-censored, meaning that the final hospitalisation count for a given day is initially underreported. This issue is common in data sets used for the surveillance of infectious diseases and can lead to delayed or biased decision making. Fortunately, when data from a series of days are available, we can estimate the degree of censoring and produce estimates of the final counts, adjusted for truncation, with appropriate uncertainty. This is usually known as a nowcast.
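To illustrate the underlying idea (not the semi-parametric models evaluated here), a simple multiplicative nowcast can be sketched from a "reporting triangle", where rows are reference dates, columns are report delays, and `NA` marks counts not yet observed due to censoring; all numbers below are made up for illustration:

```r
# Toy reporting triangle: rows are reference dates, columns are delays
# (0-3 days); NA marks counts not yet reported for recent dates.
triangle <- rbind(
  c(40, 30, 20, 10),  # fully observed day
  c(44, 33, 22, NA),  # one day of reports still missing
  c(38, 29, NA, NA),
  c(41, NA, NA, NA)   # most recent day: only same-day reports so far
)

# Cumulative proportion reported by each delay, estimated from the fully
# observed first row (real methods pool many complete rows and model
# uncertainty rather than using a single plug-in estimate).
complete <- triangle[1, ]
prop_reported <- cumsum(complete) / sum(complete)

# Scale each partial total up by the proportion expected to be reported
# by the latest observed delay.
nowcast <- sapply(seq_len(nrow(triangle)), function(i) {
  observed <- triangle[i, !is.na(triangle[i, ])]
  sum(observed) / prop_reported[length(observed)]
})
```

This point estimate conveys the mechanics only; a full nowcasting model also quantifies the uncertainty in the delay distribution and in the unobserved counts.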
In this work, we aim to evaluate a series of novel semi-parametric nowcasting model formulations in real time, and to provide an example workflow that others can adapt, using German COVID-19 hospitalisations by date of positive test at the national level (both overall and by age group) and at the state level. This project is part of a wider collaboration assessing a range of nowcasting methods whilst providing an ensemble nowcast of COVID-19 hospital admissions in Germany by date of positive test.
All models are implemented using the epinowcast R package. The nowcasting and evaluation pipeline is implemented using the targets R package. All input, interim, and output data are available and should be fully reproducible from the provided code; please see the resources section for details. Further details on our methodology are included in our paper.
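As a rough sketch of how such a pipeline can be laid out with targets, a `_targets.R` file might look like the following; the data path and the `fit_nowcast()`/`score_nowcast()` helpers are hypothetical placeholders, not the functions used in this project:

```r
# _targets.R -- a hedged sketch of a nowcasting pipeline, assuming
# hypothetical helper functions fit_nowcast() and score_nowcast().
library(targets)

list(
  # Track the latest hospitalisation snapshot as a file dependency so
  # that downstream targets rerun whenever the data change.
  tar_target(
    raw_data_file,
    "data/latest-hospitalisations.csv",  # hypothetical path
    format = "file"
  ),
  tar_target(raw_data, read.csv(raw_data_file)),
  # Fit the nowcasting model, then score it against later, more
  # complete snapshots of the same data.
  tar_target(nowcast, fit_nowcast(raw_data)),  # hypothetical helper
  tar_target(scores, score_nowcast(nowcast))   # hypothetical helper
)
```

Calling `tar_make()` then rebuilds only the targets whose upstream dependencies have changed, which is what makes this style of pipeline attractive for daily real-time nowcasting.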
In general, the targets ecosystem is well developed and easy to use. For my use case, the biggest missing features are integration with cloud compute services and transient containerised workflows. In many ways, these are not issues with targets itself but with the wider ecosystem of R packages supporting modern distributed workflows. I am still exploring targets workflows (see here for another example, not using Rmarkdown), so any feedback, hints, or tips are very much appreciated.
piggyback is a really nice and simple way to share data via GitHub releases. That said, support for automatic file tracking, rather than manually uploading results, would likely greatly improve the workflow. It is also not entirely clear to me whether piggyback is the currently recommended choice for sharing scientific data (there are quite a few options, with osfr also being relatively straightforward, if not seamless, to use). Clarification of current best practices for data workflows, and the tools that support them, would be very useful for improving my practice.
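For context, the manual upload/download workflow described above is only a few lines with piggyback; the repository name and file paths here are hypothetical, and running this requires a GitHub token and an existing release:

```r
# A brief sketch of sharing pipeline outputs via GitHub releases with
# piggyback; repository name and file paths are hypothetical.
library(piggyback)

# Upload a results file as an asset of an existing release.
pb_upload(
  "output/nowcast.csv",               # hypothetical local path
  repo = "user/nowcasting-example",   # hypothetical repository
  tag = "latest"
)

# Later, or on another machine, retrieve the same asset.
pb_download(
  "nowcast.csv",
  repo = "user/nowcasting-example",
  tag = "latest",
  dest = "output"
)
```

The manual step is calling `pb_upload()` after every pipeline run; automatic tracking would mean the pipeline itself notices changed outputs and pushes them.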