ggethos: gauging interest in a ggplot2 extension to plot ethograms, and looking for collaborators

TL;DR: I am looking for collaborators to develop ggethos, a ggplot2 extension to plot ethograms. Please reach out!

I often observe and record behavior and plot ethograms. Ethograms come in multiple formats (lines, points, segments). I’m not sure this is the right place to post the idea as it stands, since I’m mostly looking for collaborators who can help me turn it into a package.

There is a package to plot ethograms here, but it looks like it’s designed for highly structured data.

I envision something simpler: given a data.frame with a single character column, you should be able to produce a full ethogram. As I see it, all the information needed to generate the ethogram is already in that character column, so, in principle, it should be the only requirement.
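As a minimal illustration of that claim (a sketch, not a proposed package API), base R's rle() already recovers every bout, with its first and last row, from nothing but a character vector. The vector below is made up for the example.

```r
# Hypothetical sequence of coded behaviors, one entry per frame
behavior <- c("A", "A", "A", "B", "B", "A", "C", "C")

# rle() collapses consecutive identical values into runs (bouts)
runs <- rle(behavior)

# First and last row index of each bout
bouts <- data.frame(
  behavior = runs$values,
  start = cumsum(runs$lengths) - runs$lengths + 1,
  end = cumsum(runs$lengths)
)
bouts
#>   behavior start end
#> 1        A     1   3
#> 2        B     4   5
#> 3        A     6   6
#> 4        C     7   8
```

Note that end here is the last row of a bout; for plotting segments it can be more convenient to end each bout where the next one starts.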

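Here is an example:

```r
library(ggplot2)

set.seed(1)  # make the random example reproducible
df <- tibble::tibble(
  behavior = sample(LETTERS[1:4], size = 1000,
                    replace = TRUE, prob = c(0.6, 0.2, 0.1, 0.1))
)

# one vertex per frame, connected by a single path
ggplot(df, aes(x = seq_len(nrow(df)), y = behavior, group = 1)) +
  geom_path()

# one tick mark per frame, colored by behavior
ggplot(df, aes(x = seq_len(nrow(df)), y = 1, color = behavior)) +
  geom_point(shape = "|", size = 10) +
  theme_void()
```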

There are a few issues with this approach:

  1. It scales badly (when using data that comes from 30+ fps cameras, the number of points to plot becomes too large).
  2. It requires the user to set up a few things manually for it to “work” (e.g., a synthetic x index, group = 1, or a constant y).
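To put issue 1 in numbers: one hour of 30 fps video is 30 × 60 × 60 = 108,000 frames, so a per-frame plot draws 108,000 marks, while a per-bout plot draws one segment per bout. The recording below is hypothetical and deliberately contains only four long bouts.

```r
# Hypothetical 1-hour recording at 30 fps made of four long bouts
fps <- 30
n_frames <- fps * 60 * 60
behavior <- rep(c("A", "B", "A", "C"), times = c(9000, 1500, 90000, 7500))
stopifnot(length(behavior) == n_frames)

n_frames                      # marks needed with one point per frame
#> [1] 108000
length(rle(behavior)$values)  # segments needed with one segment per bout
#> [1] 4
```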

I propose using geom_segment() instead. We should be able to compute a start and an end (xmin and xmax) for each bout, offload that math to a helper function, and then just call geom_segment(). This should scale much better, since the number of segments equals the number of bouts rather than the number of frames, and it gives a better user experience.

```r
library(dplyr)

ethogram <- function(df, behavior_col, behavior_levels, time_col = NULL) {
  if (rlang::quo_is_null(rlang::enquo(time_col))) {
    # no time column given: use the row index as a fake time variable
    df$time_col <- seq_len(nrow(df))
  } else {
    # resolution is provided by the user's time column
    df <- dplyr::rename(df, time_col = {{ time_col }})
  }
  # end of the recording, used as the end of the last bout
  t_last <- max(df$time_col)
  df %>%
    dplyr::rename(behavior = {{ behavior_col }}) %>%
    dplyr::mutate(behavior = factor(behavior, levels = behavior_levels),
                  lg = dplyr::lag(behavior),
                  # a new bout starts on the first row or when behavior changes
                  flag = is.na(lg) | lg != behavior) %>%
    dplyr::filter(flag) %>%
    # each bout ends where the next one starts; the last ends at t_last
    dplyr::mutate(time_end = dplyr::lead(time_col, n = 1, default = t_last))
}
```

This function produces output that is much closer to the desired result:

```r
# single row: bouts as colored bars along time
# (linewidth replaces size for line widths in ggplot2 >= 3.4.0)
ethogram(df, behavior, behavior_levels = LETTERS[1:4]) %>%
  ggplot() +
  geom_segment(aes(x = time_col, xend = time_end, y = 1, yend = 1, color = behavior),
               linewidth = 10) +
  # we might only want to remove the y axis in this case
  theme_void()

# one row per behavior
ethogram(df, behavior, behavior_levels = LETTERS[1:4]) %>%
  ggplot() +
  geom_segment(aes(x = time_col, xend = time_end, y = behavior, yend = behavior,
                   color = behavior),
               linewidth = 10)
```
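On the comment about removing only the y axis: theme_void() also drops the time axis. A sketch that keeps the time axis and hides only the y axis elements via theme() is below. The bouts table is hypothetical but has the same columns the ethogram() helper returns (behavior, time_col, time_end); linewidth assumes ggplot2 >= 3.4.0.

```r
library(ggplot2)

# Hypothetical pre-computed bouts, same columns as ethogram() output
bouts <- data.frame(
  behavior = factor(c("A", "B", "A"), levels = LETTERS[1:4]),
  time_col = c(1, 120, 180),
  time_end = c(120, 180, 300)
)

p <- ggplot(bouts) +
  geom_segment(aes(x = time_col, xend = time_end, y = 1, yend = 1, color = behavior),
               linewidth = 10) +
  # keep the time (x) axis, hide only the meaningless y axis
  theme(axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())
p
```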

This needs robust testing, and it would be much better encapsulated inside a ggplot2 extension. The ties to ethograms and behavioral research (e.g., variables named “behavior”) could also be dropped.

I have toyed around with building a few packages that talk to ggplot2 (for example, nobrainr), but that package is mostly a wrapper around ggplot. This is the first time I feel out of my league in making something work within the ggplot2 API, and I would like to collaborate to make it work well.

I have little to no experience with the internals of ggplot2. Reading here, it looks like I need to add a stat_*() rather than a geom_*(), but the needs might grow if users start requiring different functionality.

I am looking for people who want to collaborate on turning the functions I have into something worthy of a ggplot2 extension.
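For discussion, here is a rough sketch of what such a stat could look like, using ggplot2's ggproto extension mechanism. StatEthogram and stat_ethogram() are hypothetical names, not an existing API. It overrides compute_panel() so that all rows of a panel are seen at once (with a discrete y, ggplot2 would otherwise split the data into one group per behavior), sorts by x, collapses consecutive identical y values into bouts, and hands one row per bout to geom_segment(). Two assumptions to flag: bouts are detected from y only, and the last bout ends at the final x value.

```r
library(ggplot2)

# Hypothetical Stat: one output row (segment) per bout of identical y values
StatEthogram <- ggproto("StatEthogram", Stat,
  required_aes = c("x", "y"),
  compute_panel = function(data, scales) {
    data <- data[order(data$x), , drop = FALSE]
    # discrete y is already mapped to integer positions at this stage
    runs <- rle(as.double(unclass(data$y)))
    ends <- cumsum(runs$lengths)
    starts <- c(1L, head(ends, -1L) + 1L)
    # keep the first row of each bout (carries colour, PANEL, group, ...)
    bouts <- data[starts, , drop = FALSE]
    # a bout ends where the next one starts; the last ends at the final x
    bouts$xend <- c(data$x[starts[-1L]], data$x[nrow(data)])
    bouts$yend <- bouts$y
    bouts
  }
)

# Hypothetical layer function wrapping the stat with geom_segment()
stat_ethogram <- function(mapping = NULL, data = NULL, geom = "segment",
                          position = "identity", na.rm = FALSE,
                          show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatEthogram, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

# Usage: one column of per-frame behavior codes is enough
df_demo <- data.frame(frame = 1:6, behavior = c("A", "A", "B", "B", "B", "A"))
ggplot(df_demo, aes(x = frame, y = behavior, colour = behavior)) +
  stat_ethogram(linewidth = 10)
```

Overriding compute_panel() instead of compute_group() is the key design choice here; whether that is the right long-term structure for ggethos is exactly the kind of question collaborators could help answer.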