Possible data sources


Sometimes I run across shiny data sources and feel like I should tell y’all, so here’s a thread for that. It’s possible that these are already on tap, go by another name, or have been disqualified, and I just don’t know about it yet (you folks are so productive!). I should also say that I don’t know whether any of these sources allow HTML scraping, FTP access, or an API, which I understand is necessary. Also, maybe some of these overlap?

Anywho, do what you will with them.

http://waterportal.geoweb.bcogc.ca/ Water resource management data for British Columbia

http://catalogue.data.gov.bc.ca/dataset/aquatic-invasive-species-of-british-columbia Aquatic Invasive Species Occurrence data for British Columbia

http://www.niiss.org/cwis438/download/Download.php?WebSiteID=1 Invasive species occurrence data for the US

http://www.emplantbase.org/home.html Plant occurrence data for western Europe/Mediterranean

http://www.texasinvasives.org/invasives_database/ Invasive species occurrence data for Texas

http://ibis.geog.ubc.ca/biodiversity/eflora/ Flora of British Columbia


And a few more…

http://www.nobanis.org/search-alien-species/ Invasive species of Europe

http://data.kew.org/cvalues/ Chromosome number and genome size for plants and algae

http://www.tropicos.org/NameSearch.aspx?projectid=9 Chromosome number for plants

http://treeofsex.org/ Mating system information for plants and animals

http://www.bonap.org/ North American flora


I think this is one you’re particularly interested in?


That one is pretty great. I’ve probably mentioned it before.


List of thoughts on each:

This comment applies to many of them (shorthand: ask/scrape): this seems like it could be scraped, but it’s worth asking whether they provide data dumps too.
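For the “scrape” half of ask/scrape, here is a minimal sketch using only Python’s standard-library `html.parser`. It assumes the records sit in a plain HTML `<table>`; none of the sites above have been checked for that, and the sample HTML (with placeholder species and sites) is made up for illustration. A real page would be fetched first, for example with `urllib.request.urlopen`, and only if the site’s terms allow scraping.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built (None when outside a <tr>)
        self._cell = None # text fragments of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the cell's text fragments and drop surrounding whitespace
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # ignore empty rows
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a table cell
        if self._cell is not None:
            self._cell.append(data)


def scrape_table(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page fragment standing in for a real species/occurrence page
SAMPLE_HTML = """
<table>
  <tr><th>Species</th><th>Region</th></tr>
  <tr><td>Species A</td><td>Site 1</td></tr>
  <tr><td>Species B</td><td>Site 2</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in scrape_table(SAMPLE_HTML):
        print(row)
```

If a source turns out to offer a data dump, that is usually preferable: scrapers break whenever the page layout changes.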

@kgturner :smile:


[quote=“sckott, post:5, topic:190”]
http://waterportal.geoweb.bcogc.ca/ Water resource management data for British Columbia - hmmm, @andy_teucher any idea how to get to this data more easily programmatically?[/quote]

  • Hmmm, I don’t think so. The data presented on this site (which is not actually a government site) are aggregated from various government agencies (federal and provincial), and they probably don’t have the licensing rights to redistribute the raw data. Some of it might be available elsewhere, though; I’ll have a look.
  • If you click on the name of the data set under the “Data and Resources” heading, you’re taken to a form that you have to fill in; you then get an email with a URL to download a zip file. A lot of the BC government data works like this because each data request triggers an extract from the corporate database, so doing it programmatically would be hard.
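The form and the email step are the hard part, but once the emailed zip has been downloaded, reading it can at least be scripted. A minimal Python sketch, assuming the extract contains CSV files (the thread doesn’t say what format BC extracts use; they may also contain shapefiles or other formats). The function name `read_csvs_from_zip` is hypothetical, and the demo builds a stand-in zip in memory rather than using a real extract.

```python
import csv
import io
import zipfile


def read_csvs_from_zip(zip_bytes):
    """Return {member_name: list of row dicts} for every .csv file inside a zip."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip metadata, readmes, spatial files, etc.
            with zf.open(name) as raw:
                # Assumes UTF-8 text; adjust the encoding if the extract differs
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[name] = list(csv.DictReader(text))
    return tables


if __name__ == "__main__":
    # Build a tiny stand-in zip in memory instead of a real BC extract
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("extract/records.csv", "site,count\nA,3\nB,5\n")
        zf.writestr("extract/README.txt", "not a table")
    print(read_csvs_from_zip(buf.getvalue()))
```

With a real extract, the bytes would come from the downloaded file, e.g. `open("extract.zip", "rb").read()`.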


Thanks @andy_teucher !


So great that y’all already have packages for some of those!!! (I’m super sad that I didn’t know about those like 6 months ago.)

Re this:

http://ibis.geog.ubc.ca/biodiversity/eflora/ Flora of British Columbia - since this is at UBC, maybe you can kindly ask if they can provide dumps of their data behind each species’ page?

I can email someone, but I’m not confident about the lingo. If I literally say “dumps of your data”, will they know what I’m talking about? Maybe I will cc you, @sckott?


Yeah, do cc me. “Data dumps” is general enough that they should understand.


Replying on this “old” thread: would http://api.nightlights.io/ be an interesting data source for a package? “The India Lights API shows light output at night for 20 years, from 1993 to 2013, for 600,000 villages across India.”


@maelle thanks for the suggestion. Do you use it for anything? I’m curious what the use cases are.


In our project here on air pollution in India, night-time light intensity can be used as a proxy for “urbanisation” when doing, e.g., land-use regression.

And here the Development Seed people wrote a story about Diwali: http://india.nightlights.io/

But I don’t know how many people would really use it. :slightly_smiling:


Nice, thanks for sharing those use cases!