nodbi: NoSQL database connector

nodbi is a single interface to many NoSQL databases.

So far we support the following databases:

  • CouchDB
  • Elasticsearch
  • etcd
  • MongoDB
  • Redis

nodbi is focused on working with data.frames, since they’re a common data format in R and make it easy to pass data downstream to other tools in your workflow.

Currently we have support for the following operations:

  • Create - all databases
  • Get - all databases
  • Delete - all databases
  • Update - just CouchDB

Install/load

Github: https://github.com/ropensci/nodbi
CRAN: https://cran.r-project.org/package=nodbi

# only the source package is available right now; binaries should be available soon
install.packages("nodbi", type = "source") 
library("nodbi")

Initialize a connection

Before initializing a connection, make sure your database is running if it’s server-based. (We could potentially support serverless databases too, in which case there’d be no server to start.)

There’s a family of functions starting with src_ that set the connection details for each supported database. You then pass the resulting connection object to any of the functions docdb_create, docdb_get, docdb_delete and docdb_update.

src_couchdb()
src_elastic()
src_etcd()
src_mongo()
src_redis()
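Because each src_ function returns a connection object that the docdb_ functions accept, code written against one backend can in principle be reused with another. Here is a minimal sketch, assuming a Redis server running locally with default settings; roundtrip() is a hypothetical helper for illustration, not part of nodbi:

```r
library("nodbi")

# Hypothetical helper (not part of nodbi): write a data.frame under `key`,
# read it back, and report whether the round trip was lossless.
# It only calls the generic docdb_* functions, so it takes any src_* object.
roundtrip <- function(src, key, df) {
  docdb_create(src, key, df)
  back <- docdb_get(src, key)
  identical(df, back)
}

# Assumes a local Redis server with default settings (e.g. started with redis-server)
con <- src_redis()
roundtrip(con, "mtcars", mtcars)
```

A connection from another src_ function could be passed the same way; for backends that don’t preserve column classes as faithfully as Redis, expect identical() to return FALSE even when the values themselves survived.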

Example

We’ll use Redis from here on. First, initialize a connection (remember to start Redis beforehand, e.g. by running redis-server on the command line):

(con <- src_redis())
#> $type
#> [1] "redis"
#> 
#> $version
#> [1] ‘1.1.0’
#> 
#> $con
#> <redis_api>
#>   Redis commands:
#>  ... cutoff

The con object contains connection details.
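Per the printed output, con is a list holding the database type, a version, and the underlying client in con$con. A quick way to confirm the connection is live is to send a PING through that client. This sketch assumes, as the <redis_api> label suggests, that con$con is a redux client exposing Redis commands as methods, and that Redis is running locally with default settings:

```r
library("nodbi")

# Assumes Redis is running locally with default settings
con <- src_redis()

con$type    # which backend this connection points at
names(con)  # the components shown in the printed output

# con$con is the redux client, so raw Redis commands are available;
# a live server answers PING with PONG
con$con$PING()
```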

Now, let’s push a data.frame into Redis from R:

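library("ggplot2")  # provides the diamonds data.frame
# write diamonds to Redis under the key "diamonds", then read it back
ff <- docdb_create(con, "diamonds", diamonds)
out <- docdb_get(con, "diamonds")
NROW(out)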
#> [1] 53940
head(out)
#> # A tibble: 6 x 10
#>   carat cut       color clarity depth table price     x     y     z
#>   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
#> 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
#> 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
#> 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
#> 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
#> 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48

And they’re identical, even down to the factor classes in the diamonds data.frame! Note that not all database connectors preserve column classes as well as the Redis one does.

identical(diamonds, out)
#> [1] TRUE
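When a backend doesn’t preserve classes, identical() only tells you that something differs. A small check of per-column classes shows what was lost. This is a sketch: class_mismatches() is a hypothetical helper, and the "lossy" data.frame merely simulates a connector that returns factors as plain character vectors, so the example runs without any database:

```r
library("ggplot2")  # for the diamonds data

# Hypothetical helper (not part of nodbi): list the columns whose class
# differs between the data you wrote and the data you read back
class_mismatches <- function(before, after) {
  cls <- function(df) vapply(df, function(x) paste(class(x), collapse = "/"), character(1))
  cls_before <- cls(before)
  cls_after <- cls(after)
  common <- intersect(names(cls_before), names(cls_after))
  changed <- common[cls_before[common] != cls_after[common]]
  data.frame(column = changed,
             before = unname(cls_before[changed]),
             after = unname(cls_after[changed]),
             stringsAsFactors = FALSE)
}

# Simulate a lossy round trip: factor columns come back as character
lossy <- as.data.frame(
  lapply(diamonds, function(x) if (is.factor(x)) as.character(x) else x),
  stringsAsFactors = FALSE
)
class_mismatches(diamonds, lossy)
```

Here the ordered factors cut, color and clarity are reported; running class_mismatches(diamonds, out) on the Redis result above should return zero rows.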

You can easily fold a NoSQL database into your data munging workflow. Let’s say you’re using dplyr and want to pull data out of Redis and then munge it with dplyr.

One line of code gets the data from Redis; after that, do whatever your munging heart desires.

library("dplyr")
docdb_get(con, "diamonds") %>%
  group_by(cut) %>%
  summarise(mean_depth = mean(depth), mean_price = mean(price))
#> # A tibble: 5 x 3
#>   cut       mean_depth mean_price
#>   <ord>          <dbl>      <dbl>
#> 1 Fair            64.0      4359.
#> 2 Good            62.4      3929.
#> 3 Very Good       61.8      3982.
#> 4 Premium         61.3      4584.
#> 5 Ideal           61.7      3458.
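When you’re done, the Delete operation cleans up. A minimal sketch, assuming docdb_delete takes the connection and key the same way docdb_get does, and using the underlying redux client (con$con) to confirm that the key is gone; it recreates the key first so it also runs on its own:

```r
library("nodbi")
library("ggplot2")  # for the diamonds data

# Assumes Redis is running locally with default settings
con <- src_redis()

# Make sure the key exists (as it does after the examples above), then delete it
docdb_create(con, "diamonds", diamonds)
docdb_delete(con, "diamonds")

# Check via the redux client: Redis EXISTS returns 0 once the key is gone
con$con$EXISTS("diamonds")
```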

Let us know what you think. Features? Bugs? Additional databases we should support?