It would be great to get your thoughts on a new package that I am starting to develop (https://github.com/iainmwallace/DataDepository).
The idea is to make it easier for people to publish tabular datasets of interest to Google BigQuery. These could be either primary datasets or datasets created by cleaning, standardizing, or integrating one or more primary datasets before beginning an analysis. Publishing the cleaned versions would greatly benefit secondary users of the data, who would not have to repeat that preparation work.
Datasets that are stored in BigQuery are immediately available for exploring via the web UI or via a REST API. They can also be combined with any other public dataset in BigQuery to create a new public dataset.
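As a rough sketch of what combining two published tables could look like, the Python snippet below builds a BigQuery Standard SQL join on a shared key column. The table names, column names, and the `build_join_query` helper are all hypothetical placeholders for illustration; they are not taken from the package or from the datasets actually loaded.

```python
def build_join_query(left_table: str, right_table: str, key: str,
                     left_col: str, right_col: str, limit: int = 10) -> str:
    """Build a Standard SQL query joining two tables on a shared key column.

    All table and column names passed in are assumed to exist; nothing is
    checked against BigQuery here, the function only assembles a string.
    """
    return (
        f"SELECT a.{key}, a.{left_col}, b.{right_col}\n"
        f"FROM `{left_table}` AS a\n"
        f"JOIN `{right_table}` AS b\n"
        f"  ON a.{key} = b.{key}\n"
        f"LIMIT {limit}"
    )


# Hypothetical table and column names, for illustration only.
sql = build_join_query(
    left_table="my-project.depository.pubchem_names",
    right_table="my-project.depository.unichem_counts",
    key="inchikey",
    left_col="compound_name",
    right_col="database_count",
)
print(sql)
```

The resulting string could be pasted into the web UI, or submitted from code, for example with `Client.query` from the `google-cloud-bigquery` Python library, assuming credentials and a billing project are set up.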
I have three datasets loaded as a proof of concept:
- Compound names from PubChem mapped onto InChIKeys
- Compound activities from ChEMBL enhanced with InChIKeys
- Count of compounds appearing in databases based on UniChem
Is this something that might be of general interest? If so, any suggestions on how best to implement it would be great.