Thanks! The package is currently just two helper functions for getting a data frame into a BigQuery table.
The first function (data2gcs) splits a data frame into many small zipped JSON files and uploads them to a Google Cloud Storage folder. The second function (loadJsonFiles2BigQuery) loads the specified JSON files from Google Cloud Storage into a given table. I want to add a third helper function that lets a user assign metadata to the table, such as a table description and column descriptions.
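To make the first step concrete, here is a minimal sketch of the chunk-and-compress idea behind data2gcs, written in Python rather than the package's own language. The function name `split_to_gzipped_json` and the gzip format are my assumptions for illustration; the actual upload to Google Cloud Storage is omitted.

```python
import gzip
import json
import math
import os
import tempfile

def split_to_gzipped_json(records, chunk_size, out_dir):
    """Write `records` (a list of dicts) as gzipped newline-delimited
    JSON files of at most `chunk_size` rows each, a format BigQuery
    load jobs accept. Returns the list of file paths written.
    (Illustrative sketch only; the real data2gcs also uploads each
    file to a Google Cloud Storage folder, which is omitted here.)"""
    paths = []
    n_chunks = math.ceil(len(records) / chunk_size)
    for i in range(n_chunks):
        chunk = records[i * chunk_size:(i + 1) * chunk_size]
        path = os.path.join(out_dir, f"part-{i:04d}.json.gz")
        with gzip.open(path, "wt", encoding="utf-8") as fh:
            for row in chunk:
                fh.write(json.dumps(row) + "\n")  # one JSON object per line
        paths.append(path)
    return paths

out_dir = tempfile.mkdtemp()
rows = [{"compound_id": i, "md_max_phase": i % 5} for i in range(10)]
files = split_to_gzipped_json(rows, chunk_size=4, out_dir=out_dir)
print(len(files))  # 10 rows at 4 per file -> 3 files
```

Writing many small compressed files like this lets the subsequent BigQuery load step run over all of them in one go.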
Regarding costs: there is no fee for the server it is hosted on. Instead there is a small fee for storing data (the first 10 GB are free, then $0.02 per additional GB per month, i.e. 1 TB for $20 per month) and a fee for querying the data (the first 1 TB processed per month is free, then $5 per additional TB).
So a useful amount is available for free (in addition to the $300 sign-up credit). If you store a 10 GB table, you could run 100 SELECT * queries per month, since each query would process the entire table (100 × 10 GB = 1 TB). Reducing the number of columns you query/return reduces the amount of data processed. Additionally, viewing the table preview is free.
This query, which returns all compounds from ChEMBL that have been tested in clinical trials, processed only 14 MB:

where md_max_phase > 0
Hope that helps explain it, but let me know if anything isn’t clear.
Ideally, the package would make it easy for scientists to upload their own datasets in a way that makes them easy for others to find and re-use.