TL;DR: I am looking for collaborators to develop `ggethos`, a `ggplot2` extension to plot ethograms. Please reach out!
I often observe and record behavior and plot ethograms. Ethograms come in multiple formats (lines, points, segments). I’m not sure if this is a good place to post the idea as it is, since I’m mostly looking for collaborators who can help me put it in package form.
There is a package to plot ethograms here, but it looks like it’s designed for highly structured data.
I envision something simpler: a `data.frame` consisting of a single character column, from which you can produce a full ethogram. The information needed to generate the ethogram is already in that character column, so, in principle, it should be the only requirement.
Here is an example:
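```r
library(ggplot2)

set.seed(123)
df <- tibble::tibble(
  behavior = sample(LETTERS[1:4], size = 1000,
                    replace = TRUE, prob = c(0.6, 0.2, 0.1, 0.1))
)

# path version: needs group = 1 and an explicit frame index on x
ggplot(df, aes(x = seq_len(nrow(df)), y = behavior, group = 1)) +
  geom_path()

# point version: one tick mark per frame
ggplot(df, aes(x = seq_len(nrow(df)), y = 1, color = behavior)) +
  geom_point(shape = "|", size = 10) +
  theme_void()
```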
There are a few issues with this approach:
- It scales badly (when using data that comes from 30+ fps cameras, the number of points to plot becomes too large).
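- It needs the user to specify extra details for it to "work" (e.g., `group = 1` for `geom_path()`, an x index built by hand from the row numbers, and a manual shape and size for the point version).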
I propose using `geom_segment()` to do this better. We should be able to calculate the start and end of each bout (an `xmin` and `xmax`), offload the math to a helper function, and then just call `geom_segment()`. This potentially scales much better, since it draws one segment per bout instead of one point per frame, and it produces a better experience.
```r
library(dplyr)  # for %>%

ethogram <- function(df, behavior_col, behavior_levels, time_col = NULL) {
  if (missing(time_col)) {
    # no time column supplied: use the row index as a fake time variable
    df$time_col <- seq_len(nrow(df))
  } else {
    # resolution is provided by the user-supplied time column
    df <- dplyr::rename(df, time_col = {{ time_col }})
  }
  # time of the last observation, used to close the final bout
  last_time <- dplyr::last(df$time_col)

  df %>%
    dplyr::mutate(behavior = factor({{ behavior_col }}, levels = behavior_levels)) %>%
    # a new bout starts on the first row or whenever the behavior changes
    dplyr::mutate(lg = dplyr::lag(behavior),
                  flag = is.na(lg) | lg != behavior) %>%
    dplyr::filter(flag) %>%
    # each bout ends where the next one starts; the last one ends at last_time
    dplyr::mutate(time_end = dplyr::lead(time_col, n = 1, default = last_time))
}
```
This function produces output that is much closer to the desired result:
```r
# a single row of colored segments, one segment per bout
ethogram(df, behavior, behavior_levels = LETTERS[1:4]) %>%
  ggplot() +
  geom_segment(aes(x = time_col, xend = time_end, y = 1, yend = 1, color = behavior),
               linewidth = 10) +
  # we might only want to remove the y axis in this case
  theme_void()

# one row per behavior
ethogram(df, behavior, behavior_levels = LETTERS[1:4]) %>%
  ggplot() +
  geom_segment(aes(x = time_col, xend = time_end, y = behavior, yend = behavior,
                   color = behavior),
               linewidth = 10)
```
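For comparison, the bout math can also be sketched without dplyr using base R's `rle()` (run-length encoding). This is only a sketch under stated assumptions: the helper name `bouts_from_labels()` and its output columns are hypothetical, and it assumes one label per frame, already ordered in time. The number of rows it returns is the number of segments to draw, which is why the segment approach should scale better than plotting one point per frame.

```r
# Hypothetical helper (not part of any package): one row per bout from a
# vector of per-frame labels, assuming the labels are ordered in time.
bouts_from_labels <- function(labels, time = seq_along(labels)) {
  runs <- rle(as.character(labels))
  ends_idx <- cumsum(runs$lengths)           # index of the last frame of each bout
  starts_idx <- ends_idx - runs$lengths + 1  # index of the first frame of each bout
  data.frame(
    behavior = runs$values,
    start = time[starts_idx],
    # each bout ends where the next one starts; the last one ends at the final time
    end = c(time[starts_idx[-1]], time[length(time)]),
    n_frames = runs$lengths
  )
}

set.seed(123)
behavior <- sample(LETTERS[1:4], size = 1000, replace = TRUE,
                   prob = c(0.6, 0.2, 0.1, 0.1))
bouts <- bouts_from_labels(behavior)
nrow(bouts)  # number of segments to draw, versus 1000 points
```

The output has the same shape as what `ethogram()` returns (a start and an end per bout), so it could feed `geom_segment()` the same way.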
This needs robust testing, and it would be much better encapsulated inside a `ggplot2` extension. The ties to ethograms and behavioral research (e.g., variables named "behavior") could also be dropped to make it more general.
I have toyed around with building a few packages that talk to `ggplot2` (for example `nobrainr`), but that package is mostly a wrapper around ggplot. This is the first time I feel out of my league in making something work within the `ggplot2` API, and I would like to collaborate to make it work well.
I have little to no experience with the internals of `ggplot2`. From what I've read here, it looks like I'm looking to add a `stat_*()` more than a `geom_*()`, but the needs might grow if users start to require different functionality.
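For reference, a stat along these lines might look roughly like the sketch below, which follows the `ggproto()` plus `layer()` pattern described in the "Extending ggplot2" vignette. It is a rough sketch, not a finished design: the names `StatEthogram` and `stat_ethogram()` are hypothetical, and it assumes `x` holds the time (or frame) and `y` the behavior. It collapses consecutive identical `y` values into one row per bout and hands the result to `geom_segment()`.

```r
library(ggplot2)

# Hypothetical stat: turns one-row-per-frame data into one row per bout,
# with x/xend marking where each bout starts and ends.
StatEthogram <- ggproto("StatEthogram", Stat,
  required_aes = c("x", "y"),
  # compute_panel (not compute_group) so the whole sequence is seen at once;
  # a discrete y would otherwise split the data into one group per behavior
  compute_panel = function(data, scales) {
    if (nrow(data) == 0) return(data)
    data <- data[order(data$x), , drop = FALSE]
    n <- nrow(data)
    # a bout starts on the first row or whenever y changes
    starts <- c(TRUE, data$y[-1] != data$y[-n])
    out <- data[starts, , drop = FALSE]
    # each bout ends where the next one starts; the last one ends at the final x
    out$xend <- c(out$x[-1], data$x[n])
    out$yend <- out$y
    out
  }
)

# Hypothetical constructor following the usual layer() pattern
stat_ethogram <- function(mapping = NULL, data = NULL, geom = "segment",
                          position = "identity", na.rm = FALSE,
                          show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatEthogram, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

# Usage: one segment per bout, one row per behavior
set.seed(123)
df <- data.frame(
  frame = 1:1000,
  behavior = sample(LETTERS[1:4], size = 1000, replace = TRUE,
                    prob = c(0.6, 0.2, 0.1, 0.1))
)
p <- ggplot(df, aes(x = frame, y = behavior, colour = behavior)) +
  stat_ethogram(linewidth = 10)
p
```

The design choice here is to keep all the bout math inside the stat, so the user only maps a time column and a behavior column. Missing values, a default time index when `x` is absent, and a dedicated geom are all left open.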
I am looking for people who want to collaborate on translating the functions I have into something worthy of a `ggplot2` extension.