Embed metadata in R objects


#1

I was unable to find a package that allows creating data frames with attached metadata, beyond the comment() function. I wanted something a little more structured and found it in S4 classes:

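# create the class: a free-form list of notes plus the data frame itself
Mframe <- setClass("Mframe", slots = c(meta = "list", data = "data.frame"))
# instantiate it (fips is a data frame and meta a list, both created beforehand)
mf <- Mframe(data = fips, meta = meta)
# show it
str(mf)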

A minimal example appears on RPubs.
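
For readers who want to run the snippet above as-is, here is a self-contained sketch. The two-row `fips` data frame and the short `meta` list are hypothetical stand-ins for the real objects, which are not shown in this post; the `Note` entry is likewise just an illustration.

```r
library(methods)  # attached by default in interactive R; explicit for Rscript

# Hypothetical stand-ins for the `fips` and `meta` objects (assumed shapes)
fips <- data.frame(
  state = c("Alabama", "Alaska"),
  fip   = c(1L, 2L),
  id    = c("AL", "AK"),
  stringsAsFactors = FALSE
)
meta <- list(Accessed = "2015-07-31", Version = "1.0")

# Same class definition as above
Mframe <- setClass("Mframe", slots = c(meta = "list", data = "data.frame"))
mf <- Mframe(data = fips, meta = meta)

# Slots are read with `@`; ordinary list and data frame operations apply inside
mf@meta$Version
nrow(mf@data)

# Add a free-form note without redefining the class
mf@meta$Note <- "checked against source"
names(mf@meta)
```

Because `meta` is an ordinary list, notes can be added or dropped freely, while the slot types still guarantee that `data` is always a data frame.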

Have I reinvented a wheel?


#2

What is your specific use case? Or do you often want to do this for lots of cases?

One thing that comes to mind is https://github.com/ropensci/EML by @cboettig and @mbjones. See the example here: https://github.com/ropensci/EML#writing-r-data-into-eml. This is sort of what you are talking about, on steroids, and it uses the S4 system. The "Ecological" in EML belies the fact that EML is not specific to ecology.

You can always add attributes to a data.frame, e.g.,

attr(mtcars, "notes") <- list(a = 5, b = 7)
attr(mtcars, "notes")
$a
[1] 5

$b
[1] 7

How does this approach not fit your needs?


#3

Thanks, this is very helpful. My use case is an analyst’s notebook, so I wanted something that would accommodate as few or as many user-specified notes as required, without having to conform to a predetermined set of categories. The built-ins you pointed me to, attr and mostattributes, do this nicely. Because the metadata is hidden in attr, I find mostattributes works better for me:

x <- fips
attr(x, "meta") <- meta
x
state fip id
1 Alabama 1 AL
2 Alaska 2 AK
3 Arizona 4 AZ
4 Arkansas 5 AR
5 California 6 CA
6 Colorado 8 CO
7 Connecticut 9 CT
8 Delaware 10 DE
9 District of Columbia 11 DC
10 Florida 12 FL
11 Georgia 13 GA
12 Hawaii 15 HI
13 Idaho 16 ID
14 Illinois 17 IL
15 Indiana 18 IN
16 Iowa 19 IA
17 Kansas 20 KS
18 Kentucky 21 KY
19 Louisiana 22 LA
20 Maine 23 ME
21 Maryland 24 MD
22 Massachusetts 25 MA
23 Michigan 26 MI
24 Minnesota 27 MN
25 Mississippi 28 MS
26 Missouri 29 MO
27 Montana 30 MT
28 Nebraska 31 NE
29 Nevada 32 NV
30 New Hampshire 33 NH
31 New Jersey 34 NJ
32 New Mexico 35 NM
33 New York 36 NY
34 North Carolina 37 NC
35 North Dakota 38 ND
36 Ohio 39 OH
37 Oklahoma 40 OK
38 Oregon 41 OR
39 Pennsylvania 42 PA
40 Rhode Island 44 RI
41 South Carolina 45 SC
42 South Dakota 46 SD
43 Tennessee 47 TN
44 Texas 48 TX
45 Utah 49 UT
46 Vermont 50 VT
47 Virginia 51 VA
48 Washington 53 WA
49 West Virginia 54 WV
50 Wisconsin 55 WI
51 Wyoming 56 WY

attr(x, "meta")
$Accessed
[1] "2015-07-31"

$GitBlame
[1] "Richard Careaga"

$Contact
[1] "technocrat@twitter"

$Preprocessing
[1] "FIPS Codes for the States and District of Columbia table captured manually and converted to cvs file"

$Source
[1] "https://www.census.gov/geo/reference/ansi_statetables.html"

$Repository
[1] "unassigned"

$Version
[1] "1.0"

x <- fips
mostattributes(x) <- list(meta = meta)
x
[[1]]
[1] Alabama Alaska Arizona Arkansas California Colorado Connecticut
[8] Delaware District of Columbia Florida Georgia Hawaii Idaho Illinois
[15] Indiana Iowa Kansas Kentucky Louisiana Maine Maryland
[22] Massachusetts Michigan Minnesota Mississippi Missouri Montana Nebraska
[29] Nevada New Hampshire New Jersey New Mexico New York North Carolina North Dakota
[36] Ohio Oklahoma Oregon Pennsylvania Rhode Island South Carolina South Dakota
[43] Tennessee Texas Utah Vermont Virginia Washington West Virginia
[50] Wisconsin Wyoming
51 Levels: Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware District of Columbia Florida Georgia Hawaii Idaho Illinois Indiana … Wyoming

[[2]]
[1] 1 2 4 5 6 8 9 10 11 12 13 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 44 45 46 47 48 49 50 51 53 54 55 56

[[3]]
[1] AL AK AZ AR CA CO CT DE DC FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY
51 Levels: AK AL AR AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX UT VA VT WA … WY

attr(,"meta")
attr(,"meta")$Accessed
[1] "2015-07-31"

attr(,"meta")$GitBlame
[1] "Richard Careaga"

attr(,"meta")$Contact
[1] "technocrat@twitter"

attr(,"meta")$Preprocessing
[1] "FIPS Codes for the States and District of Columbia table captured manually and converted to cvs file"

attr(,"meta")$Source
[1] "https://www.census.gov/geo/reference/ansi_statetables.html"

attr(,"meta")$Repository
[1] "unassigned"

attr(,"meta")$Version
[1] "1.0"
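
A note on the second printout: the `[[1]]`, `[[2]]`, `[[3]]` layout suggests that `mostattributes<-` set the whole attribute list (dropping `names`, `row.names` and `class`) rather than adding to it, so `x` prints as a plain list instead of a data frame. The sketch below compares the two assignments. It uses small hypothetical stand-ins for `fips` and `meta`, so the shapes are assumptions rather than the real objects.

```r
# Hypothetical stand-ins for the thread's `fips` and `meta` objects
fips <- data.frame(
  state = c("Alabama", "Alaska"),
  fip   = c(1L, 2L),
  id    = c("AL", "AK")
)
meta <- list(Accessed = "2015-07-31", Version = "1.0")

# attr<- adds one attribute and leaves the others in place
a <- fips
attr(a, "meta") <- meta
is.data.frame(a)        # still a data frame
names(attributes(a))    # existing attributes plus "meta"

# mostattributes<- assigns the entire attribute list
b <- fips
mostattributes(b) <- list(meta = meta)
is.data.frame(b)        # the class attribute is gone
names(attributes(b))    # only "meta" remains
```

If the goal is just to see the notes alongside the data, `str(a)` or `attributes(a)` shows them without giving up the data frame structure.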

For my purposes EML is overkill, but it definitely seems the direction to go for a publication-oriented standard.

Thanks again.


#4

Have you played with R6 classes yet? https://cran.r-project.org/web/packages/R6/vignettes/Introduction.html

It seems like R6 might be a good fit here. Or am I wrong, @richfitz? I may be off, but e.g.,

    library(R6)

    Dframe <- R6Class("Dframe",
                      public = list(
                        df = data.frame(NULL),
                        initialize = function(df) {
                          if (!missing(df)) self$df <- df
                          self$look()
                        },
                        look = function() {
                          head(self$df)
                        },
                        add_atts = function(name, attribute) {
                          attr(self$df, name) <- attribute
                        },
                        get_atts = function(name) {
                          attr(self$df, name)
                        }
                      )
    )

    dd <- Dframe$new(df = mtcars)
    dd$look()
    dd$add_atts(name = "stuff", attribute = list(a=6, b=9))
    dd$get_atts(name = "stuff")

Anyway, you could replace attr with mostattributes, etc. Just thinking out loud.