rnoaa: Getting county level rain data

woodhouse123 · March 16, 2021, 4:11pm

Hi there,

Using the HOMR page on the NOAA website, I downloaded a list of all the US weather stations. I then dropped any inactive stations and any stations whose IDs don’t start with US1, and merged this data with a FIPS code dataset. My plan is to get rain data for one day for every station and then collapse it to the county level.

Here is my R code so far, and the Stack Overflow link

Grabbing necessary column

df <- dataframe$ghcnd

This gives me an output like:

[1] "GHCND:US1AKAB0058" "GHCND:US1AKAB0015" "GHCND:US1AKAB0021" "GHCND:US1AKAB0061"
 [5] "GHCND:US1AKAB0055" "GHCND:US1AKAB0038" "GHCND:US1AKAB0051" "GHCND:US1AKAB0052"
 [9] "GHCND:US1AKAB0060" "GHCND:US1AKAB0065" "GHCND:US1AKAB0062" "GHCND:US1AKFN0016"
[13] "GHCND:US1AKFN0018" "GHCND:US1AKFN0015" "GHCND:US1AKFN0011" "GHCND:US1AKFN0013"
[17] "GHCND:US1AKFN0030" "GHCND:US1AKJB0011" "GHCND:US1AKJB0014" "GHCND:US1AKKP0005"
[21] "GHCND:US1AKMS0011" "GHCND:US1AKMS0019" "GHCND:US1AKMS0012" "GHCND:US1AKMS0020"
[25] "GHCND:US1AKMS0018" "GHCND:US1AKMS0014" "GHCND:US1AKPW0001" "GHCND:US1AKSH0002"
[29] "GHCND:US1AKVC0006" "GHCND:US1AKWH0012" "GHCND:US1AKWP0001" "GHCND:US1AKWP0002"
[33] "GHCND:US1ALAT0014" "GHCND:US1ALAT0013" "GHCND:US1ALBW0095" "GHCND:US1ALBW0087"
[37] "GHCND:US1ALBW0020" "GHCND:US1ALBW0066" "GHCND:US1ALBW0031" "GHCND:US1ALBW0082"
[41] "GHCND:US1ALBW0099" "GHCND:US1ALBW0040" "GHCND:US1ALBW0004" "GHCND:US1ALBW0085"
[45] "GHCND:US1ALBW0009" "GHCND:US1ALBW0001" "GHCND:US1ALBW0094" "GHCND:US1ALBW0013"
[49] "GHCND:US1ALBW0079" "GHCND:US1ALBW0060"

In reality, I have about 22,000 weather stations. This is just showing the first 50.

rnoaa code

library(rnoaa)
options("noaakey" = Sys.getenv("noaakey"))
Sys.getenv("noaakey")

weather <- ncdc(datasetid = 'GHCND', stationid = df, var = 'PRCP', startdate = "2020-05-30",
                enddate = "2020-05-30", add_units = TRUE)

Which produces the following error:
Error: Request-URI Too Long (HTTP 414)

However, when I subset df to just, say, the first 100 entries, I can’t get data for more than the first 25, even though the package details say I should be able to run 10,000 queries a day.

Loop Attempt

for (i in 1:length(df)){
  weather2<-ncdc(datasetid = 'GHCND', stationid=df1[1],var='PRCP',startdate ='2020-06-30',enddate='2020-06-30',
          add_units = TRUE)
  
}

But this just produces the warning "Sorry, no data found".

If anyone could give advice on what to try next, that would be great.

sckott · March 16, 2021, 5:25pm

Thanks for your question @woodhouse123

The 414 error is not specific to NOAA or rnoaa; it’s a generic HTTP error returned when the request URL is too long. What you have with ncdc(datasetid = 'GHCND', stationid = df, var = 'PRCP', startdate = "2020-05-30", enddate = "2020-05-30", add_units = TRUE) is a very long URL because df is a long vector, and every station ID goes into the URL. There’s not an easy way for rnoaa to help users avoid this.
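To get a feel for how long that URL gets, here is a rough offline sketch. It assumes, as an approximation rather than rnoaa’s actual request builder, that each station ID is sent as its own `stationid=` query parameter and ignores URL encoding and the other parameters; the IDs are hypothetical placeholders with the same 17-character shape as `GHCND:US1AKAB0058`.

```r
# Hypothetical station IDs, same length as "GHCND:US1AKAB0058" (17 characters)
ids <- sprintf("GHCND:US1AK%06d", 1:22000)

# Approximate query string: one stationid= parameter per station, joined by "&"
# (assumption: ignores URL encoding and the other query parameters)
query <- paste0("stationid=", ids, collapse = "&")
nchar(query)  # 615999

# The same approximation for a 125-station subset
nchar(paste0("stationid=", ids[1:125], collapse = "&"))  # 3499
```

Roughly 600,000 characters is far beyond what web servers typically accept in a request line, while a subset of about a hundred stations keeps the URL in the low thousands of characters.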

The reason you only get 25 results back when you subset is that the default limit for ncdc() is 25. You can see this in the documentation, or by printing out the function:

ncdc <- function(datasetid=NULL, datatypeid=NULL, stationid=NULL, locationid=NULL,
  startdate=NULL, enddate=NULL, sortfield=NULL, sortorder=NULL, limit=25, offset=NULL,
  token=NULL, includemetadata=TRUE, add_units=FALSE, ...)

Also, the loop probably wouldn’t work because df1 is not defined anywhere, unless you meant df? And I think you’d want df[i] rather than df1[1], since your loop index is i.

woodhouse123 · March 16, 2021, 5:36pm

@sckott

I’ve just re-run it like this:

df1 <- df[1:100] ## Splitting dataframe. Too big otherwise

for (i in 1:length(df1)){
  weather<-ncdc(datasetid = 'GHCND', stationid=df1[i],var='PRCP',startdate ='2020-06-30',enddate='2020-06-30',
                add_units = TRUE)
  
}

I get a bunch of warnings for no data, and then a dataset that has only a single row. The observation is the 100th station from df1.

So maybe this is just a problem with my loop?
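The symptom described here (only the 100th station left at the end) is what you would expect when every iteration assigns to the same variable, `weather`, so each pass replaces the previous result. A minimal offline sketch of the difference, using a hypothetical `fake_fetch()` as a stand-in for a single `ncdc()` call:

```r
# Hypothetical stand-in for one ncdc() request: returns a one-row data frame
fake_fetch <- function(id) {
  data.frame(station = id, value = nchar(id), stringsAsFactors = FALSE)
}

ids <- c("GHCND:US1AKAB0058", "GHCND:US1AKAB0015", "GHCND:US1AKAB0021")

# Pattern from the loop above: each pass overwrites `weather`
for (i in seq_along(ids)) {
  weather <- fake_fetch(ids[i])
}
nrow(weather)  # 1: only the last station survives

# Accumulating pattern: keep each result in its own list slot, then combine
results <- vector("list", length(ids))
for (i in seq_along(ids)) {
  results[[i]] <- fake_fetch(ids[i])
}
combined <- do.call(rbind, results)
nrow(combined)  # 3
```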

woodhouse123 · March 16, 2021, 6:08pm

@sckott
Alternatively, this works, and 125 seems to be the most stations I can query at once.

df1 <- df[1:125] ## Splitting dataframe. Too big otherwise


weather <- ncdc(datasetid = 'GHCND', stationid = df1, var = 'PRCP', startdate = "2020-05-30",
                enddate = "2020-05-30", add_units = TRUE, limit = 125)

If I had a way to do this in a loop, so that it does the first 125, then the next 125, and so on until all ~22,000 are done, that would be great…

sckott · March 16, 2021, 10:45pm

Maybe something like this:

# say you want to split up your station ids into chunks of 5 each, as an example
z <- split(df, ceiling(seq_along(df)/5))
out <- list()
for (i in seq_along(z)) {
  out[[i]] <- ncdc(datasetid = 'GHCND', stationid = z[[i]], var = 'PRCP', 
    startdate = "2020-05-30", enddate = "2020-05-30", 
    add_units = TRUE, limit = 125)
}
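To see what that `split()` idiom produces, here is a small offline sketch with hypothetical placeholder IDs (not real stations), plus the number of requests a ~22,000-station run would need at a few chunk sizes:

```r
# Hypothetical IDs standing in for the real station vector `df`
ids <- sprintf("GHCND:US1AK%04d", 1:12)

# Same idiom as above: group labels 1,1,1,1,1,2,2,2,2,2,3,3 give chunks of 5
chunks <- split(ids, ceiling(seq_along(ids) / 5))
lengths(chunks)  # 5 5 2 (names "1", "2", "3")

# One request per chunk: requests needed for ~22,000 stations
n_stations <- 22000
sizes <- c(5, 100, 125)
ceiling(n_stations / sizes)  # 4400 220 176
```

Each chunk is one request, so even chunks of 5 stay under the 10,000-per-day figure mentioned in the first post; larger chunks mean far fewer requests, provided `limit` is at least as large as the number of rows each chunk returns.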

woodhouse123 · March 17, 2021, 2:06pm

@sckott Thanks so much! This is really helpful. I think I’ll be able to get something like this to work. Been making some adaptations. I will keep in touch!

woodhouse123 · March 17, 2021, 5:39pm

OK, I believe I got it working!

z <- split(df, ceiling(seq_along(df)/100))
out <- list()
for (i in seq_along(z)) {
  out[[i]] <- ncdc(datasetid = 'GHCND', stationid = z[[i]], var = 'PRCP', 
                   startdate = "2020-05-30", enddate = "2020-05-30", 
                   add_units = TRUE, limit = 100)
}
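With a couple of hundred requests in one loop, a single failed request (a timeout or HTTP error) stops the whole run, because an R error ends the `for` loop. A minimal sketch of one way to guard against that, assuming you wrap the `ncdc()` call above in a function; `fetch_all()` and `fake_fetch()` are hypothetical helpers, and `fake_fetch()` is an offline stand-in returning the `meta`/`data` shape described below so the sketch runs without an API key:

```r
# Loop over chunks, pausing between requests and recording failures.
# `fetch` takes a character vector of station IDs. Real use (assumption):
# fetch_all(z, function(ids) ncdc(datasetid = 'GHCND', stationid = ids, var = 'PRCP',
#   startdate = "2020-05-30", enddate = "2020-05-30", add_units = TRUE, limit = 100))
fetch_all <- function(chunks, fetch, pause = 0.25) {
  out <- vector("list", length(chunks))
  for (i in seq_along(chunks)) {
    # out[i] <- list(...) keeps a NULL slot; out[[i]] <- NULL would delete it
    out[i] <- list(tryCatch(
      fetch(chunks[[i]]),
      error = function(e) {
        message("chunk ", i, " failed: ", conditionMessage(e))
        NULL
      }
    ))
    Sys.sleep(pause)  # avoid firing requests back to back
  }
  out
}

# Hypothetical offline stand-in that fails on one chunk to exercise the error path
fake_fetch <- function(ids) {
  if ("bad" %in% ids) stop("simulated HTTP error")
  list(meta = list(count = length(ids)),
       data = data.frame(station = ids, value = 0, stringsAsFactors = FALSE))
}

chunks <- split(c("a", "b", "bad", "c"), c(1, 1, 2, 3))
res <- fetch_all(chunks, fake_fetch, pause = 0)
```

The 0.25 s pause is arbitrary; NOAA’s web services documentation states the actual rate limits. Failed chunks come back as `NULL`, so they can be found with `which(vapply(res, is.null, logical(1)))` and retried.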

This lets me download data for all 21,882 stations in one go, although I believe some of them have no data.

My output is a list of 219 elements, each of which has two elements.

For instance, list one has out[[1]]$meta and out[[1]]$data.

What I’m interested in is combining the rows from the 219 out[[i]]$data. Would this require a for loop?

sckott · March 17, 2021, 6:11pm

probably

dplyr::bind_rows(lapply(out, "[[", "data"))
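Here `lapply(out, "[[", "data")` pulls the `$data` element out of every chunk, and `bind_rows()` stacks the pieces. A small offline sketch with a hypothetical `out` of the same shape, using base R’s `do.call(rbind, ...)` as an alternative when dplyr isn’t loaded:

```r
# Hypothetical `out` mimicking the ncdc() results: list of lists with $meta and $data
out <- list(
  list(meta = list(), data = data.frame(station = c("A", "B"), value = c(3, 0),
                                        stringsAsFactors = FALSE)),
  list(meta = list(), data = data.frame(station = "C", value = 12,
                                        stringsAsFactors = FALSE))
)

# Extract the $data data frame from every chunk
data_list <- lapply(out, "[[", "data")

# Base R alternative to dplyr::bind_rows(); needs identical columns in every piece
all_data <- do.call(rbind, data_list)
nrow(all_data)  # 3
```

`bind_rows()` is more forgiving: if some chunks have extra or missing columns, it fills the gaps with `NA`, whereas `rbind()` on data frames stops with an error.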

woodhouse123 · March 17, 2021, 6:15pm

Woohoo! That did it!! Thank you!!
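The remaining step from the first post is collapsing station readings to the county level. A minimal base-R sketch, assuming the combined data has `station` and `value` columns (as in `ncdc()` output) and that the FIPS merge from the first post yields a station-to-county lookup; `station_fips`, `county_fips`, and all toy values are hypothetical:

```r
# Hypothetical combined station data (columns assumed to match ncdc() output)
all_data <- data.frame(station = c("S1", "S2", "S3"),
                       value   = c(10, 20, 5),
                       stringsAsFactors = FALSE)

# Hypothetical station-to-county lookup from the FIPS merge
station_fips <- data.frame(station     = c("S1", "S2", "S3"),
                           county_fips = c("C1", "C1", "C2"),
                           stringsAsFactors = FALSE)

# Attach each station's county, then average precipitation within each county
merged <- merge(all_data, station_fips, by = "station")
county_rain <- aggregate(value ~ county_fips, data = merged, FUN = mean)
county_rain  # C1: 15, C2: 5
```

The mean is only one choice; since the number of stations varies a lot between counties, keeping a station count per county alongside it (for example with `FUN = length`) may be useful.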
