Hi all! I gave a talk and wrote a corresponding blog post a couple of weeks ago about strategies for working with a new data set, and it just so happened that all of the packages I recommended are part of rOpenSci, so it makes sense to share it here (thank you @stefanie for the nudge)!
I used visdat, skimr, and assertr to demonstrate how I might approach a data set on TTC (Toronto’s public transit) subway delays, after discovering there are some ~funky (i.e., buggy) features in the data and wanting to learn more about the data + evaluate any assumptions I had about it.
Talk: Slides, GitHub Repo
Blog post: https://sharla.party/posts/new-data-strategies/
Both the talk and the blog post were very well received, so big thanks to Nick Tierney, @elinw, @michaelquinn32, and Tony Fischetti for building and maintaining fabulous tools!