Data-only packages

Hi Carl:

Thanks very much. I had been thinking of just doing a scripting package; in fact, my scripts are already on GitHub, so I am in complete agreement. I also just read the baad link you provided. It is completely consistent with my aims (and presented in a much more organized fashion). So maybe a set of processing scripts is good enough.

To clarify my issue: my main goal was to make managing and analyzing all of the data easier. I was thinking that writing a package would be easier if the data were in a known structure and a known location. Hence my question about making a data-only package (or finding some other way to refer to the data unambiguously). NHANES is particularly troublesome, with about 100 files per year and 7 years of data. The variables can also change from year to year, and the file names are not perfectly consistent.
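One lightweight alternative to a data-only package is a small catalog that fixes where each raw file lives and how each cycle's variable names map onto a common set. The sketch below is a hypothetical Python illustration of that idea, not an existing package: the directory layout `<root>/nhanes/<cycle>/<file>`, the `CATALOG` and `RENAMES` tables, the helper names `data_path` and `harmonize`, and the `OLD_INCOME`/`INCOME` variable names are all assumptions made up for the example, and the file names are illustrative only.

```python
from pathlib import Path

# Hypothetical catalog: (cycle, component) -> raw file name as downloaded.
# The file names are illustrative and should be checked against the real data.
CATALOG = {
    ("2009-2010", "demo"): "DEMO_F.XPT",
    ("2011-2012", "demo"): "DEMO_G.XPT",
}

# Hypothetical per-cycle renames that map one cycle's variable names onto a
# single canonical name (placeholder names, not real NHANES variables).
RENAMES = {
    "2009-2010": {"OLD_INCOME": "INCOME"},
}


def data_path(root: Path, cycle: str, component: str) -> Path:
    """Resolve the one known location of a raw file under a fixed root."""
    try:
        fname = CATALOG[(cycle, component)]
    except KeyError:
        raise KeyError(f"no file catalogued for {component!r} in {cycle}") from None
    return root / "nhanes" / cycle / fname


def harmonize(cycle: str, columns: list[str]) -> list[str]:
    """Map one cycle's column names onto the canonical names."""
    renames = RENAMES.get(cycle, {})
    return [renames.get(c, c) for c in columns]


# Usage: look up a file and harmonize one cycle's column names.
p = data_path(Path("/data"), "2011-2012", "demo")
cols = harmonize("2009-2010", ["SEQN", "OLD_INCOME"])
```

Keeping the catalog as plain data means the scripts only ever ask for a (cycle, component) pair, so inconsistent file names and renamed variables are handled in one place rather than scattered through the analysis code.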

To answer your questions: haven is great, but it does not work on SAS transport files (NHANES). For NAMCS, the raw data are fixed-width files accompanied by SAS input statements. In both cases, the raw data need some processing before they can be read in properly; for the fixed-width files, one has to parse the SAS input file and then use that information to read the data file. SEER is also a combination of fixed-width files and SAS input statements. (Thank goodness for the readr package for fixed-width files.)
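The two-step process above (parse the SAS input statement for column positions, then read the fixed-width file with them) can be sketched as follows. This is a minimal Python illustration, not the R/readr scripts themselves, and it assumes the INPUT statement uses the pointer style `@start NAME informat` (for example `@1 VMONTH $CHAR2.` or `@6 WT 5.2`); column-range style (`AGE 4-6`) and other INPUT features are not handled. The names `Field`, `parse_sas_input`, and `read_fixed_width`, and the sample layout and records, are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator

# Matches one "@start NAME informat" entry, e.g. "@1 VMONTH $CHAR2.",
# "@ 1 PUBCSNUM $char8." or "@6 WT 5.2". Assumption: pointer style only.
FIELD_RE = re.compile(r"@\s*(\d+)\s+([A-Za-z_]\w*)\s+(\$?)[A-Za-z]*(\d+)\.(\d*)")


@dataclass
class Field:
    name: str
    start: int      # 1-based column, as written in the SAS code
    width: int
    is_char: bool   # True for "$" (character) informats
    decimals: int   # d in a w.d informat (implied decimal places)


def parse_sas_input(sas_code: str) -> list[Field]:
    """Extract the fixed-width layout from a SAS INPUT statement."""
    return [
        Field(
            name=m.group(2),
            start=int(m.group(1)),
            width=int(m.group(4)),
            is_char=m.group(3) == "$",
            decimals=int(m.group(5) or 0),
        )
        for m in FIELD_RE.finditer(sas_code)
    ]


def read_fixed_width(lines: Iterable[str], layout: list[Field]) -> Iterator[dict]:
    """Yield one dict per record by slicing each line with the layout."""
    for line in lines:
        record = {}
        for f in layout:
            raw = line[f.start - 1 : f.start - 1 + f.width].strip()
            if f.is_char:
                record[f.name] = raw          # simplification: blanks stripped
            elif raw in ("", "."):
                record[f.name] = None         # blank or SAS missing value
            elif f.decimals and "." not in raw:
                # w.d informat: no decimal point in the data means d implied places
                record[f.name] = int(raw) / 10 ** f.decimals
            else:
                record[f.name] = float(raw)
        yield record


# Usage with a made-up layout and two made-up records.
sas = """
INPUT
  @1 VMONTH $CHAR2.
  @3 AGE    3.
  @6 WT     5.2
;
"""
layout = parse_sas_input(sas)
rows = list(read_fixed_width(["01 4512345", "12 07  .  "], layout))
```

In R, the same parsed start positions and widths could be handed to readr's `read_fwf()` through `fwf_positions()`, so only the SAS-parsing step needs custom code.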