I’ve been adapting some material to teach data visualisation in ggplot2 and I’m looking to include some advice on avoiding common data misrepresentation pitfalls.

I’ve been looking at the Calling b*****t visualisation blog posts, in particular the Misleading axes on graphs post and its recommendations on when, and when not, to force the y-axis to include 0.

For example, in bar plots, where absolute differences are of primary interest, the y-axis should almost always include 0; otherwise the plot violates the principle of proportional ink, under which the amount of ink conveys quantity. Indeed, `ggplot2::geom_bar` defaults to extending the y-axis to 0 and does not take well to the use of `+ ylim(0,...)`.
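To make the bar plot default concrete, here is a minimal sketch. The toy data frame `df` and its values are made up for illustration, and I use `geom_col()` (the pre-summarised sibling of `geom_bar()`) to keep it short. Reading the trained y range via `layer_scales()` is just one way I know of to inspect it.

```r
library(ggplot2)

# Hypothetical toy data: three groups whose values are all far from zero
df <- data.frame(group = c("a", "b", "c"), value = c(95, 100, 105))

# Default bar chart: bars are drawn up from a baseline of 0
p_bar <- ggplot(df, aes(x = group, y = value)) +
  geom_col()

# Inspect the y range ggplot2 trained for this plot (before axis expansion)
bar_range <- layer_scales(p_bar)$y$range$range
print(bar_range)
```

As far as I understand it, setting a lower limit above zero with `ylim()` treats the bars as out of bounds and drops them rather than truncating them, whereas `coord_cartesian(ylim = ...)` zooms in without dropping data.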

In time series, on the other hand, the rate of change and trend along the x-axis are of most interest, and the “ink” used to represent the trend does not encode quantitative information. The y-axis therefore need not include 0, and forcing it to could indeed obscure features of interest in the data. When an area is used instead of a line, however, proportional ink becomes important again.
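A small sketch of the line versus area distinction, using a made-up series `ts_df` that fluctuates around 100 (the data and object names are assumptions for illustration only):

```r
library(ggplot2)

# Hypothetical series fluctuating around 100
ts_df <- data.frame(t = 1:20, y = 100 + sin(1:20))

# Line: the y-axis is fitted to the data, so the trend is easy to read
p_line <- ggplot(ts_df, aes(x = t, y = y)) +
  geom_line()

# Area: the filled region runs down to 0, so the ink reflects the values
p_area <- ggplot(ts_df, aes(x = t, y = y)) +
  geom_area()

# Compare the y ranges ggplot2 trained for each plot
line_range <- layer_scales(p_line)$y$range$range
area_range <- layer_scales(p_area)$y$range$range
print(line_range)
print(area_range)
```

With the line, the fluctuations fill the panel; with the area, the same fluctuations sit on top of a large filled block starting at 0.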

Given these considerations, **what are your thoughts on the y-axis in box and violin plots?** `geom_boxplot` and `geom_violin` default to fitting the y-axis to the range of the data, rightly highlighting the differences in the distributions and summaries of the y data points across the x variable. But should we also be considering differences in such distributions in the context of the absolute scale of the data? I.e. should box and violin plots ever be plotted with a y-axis extended to include 0? If so, when and why?
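For anyone who wants to compare the two options side by side, here is a minimal sketch using the `mpg` data set that ships with ggplot2. The choice of `class` and `hwy` is arbitrary, and `expand_limits(y = 0)` is just one way I know of to force 0 into the axis.

```r
library(ggplot2)

# Default box plot: the y-axis is fitted to the range of the data
p_default <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()

# Same plot with the y-axis extended to include 0
p_zero <- p_default + expand_limits(y = 0)

# Compare the y ranges ggplot2 trained for each plot
range_default <- layer_scales(p_default)$y$range$range
range_zero <- layer_scales(p_zero)$y$range$range
print(range_default)
print(range_zero)
```

Swapping `geom_boxplot()` for `geom_violin()` should give the equivalent comparison for violin plots.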