I’ve been adapting some material to teach data visualisation in ggplot2 and am looking to include some advice on avoiding common data misrepresentation pitfalls.
For example, in bar plots, where absolute differences are of primary interest, the y-axis should almost always include 0; otherwise the plot violates the principle of proportional ink, because the length of each bar is what conveys quantity. Indeed
ggplot2::geom_bar defaults to extending the y axis to 0 and does not take to the use of
Time series, on the other hand, are a different case: the rate of change and the trend along the x-axis are of most interest, and the “ink” used to draw the trend does not encode quantitative information. The y-axis therefore need not include 0, and forcing it to could indeed misrepresent features of interest in the data. When an area is used instead of a line, however, proportional ink becomes important again.
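To make the contrast concrete, here is a minimal sketch of the bar, line, and area cases. The `sales` data frame and its values are made up purely for illustration; only the ggplot2 calls themselves are real.

```r
library(ggplot2)

# Hypothetical toy data, invented for illustration only
sales <- data.frame(
  year  = 2015:2020,
  units = c(101, 103, 102, 106, 108, 107)
)

# Bar chart: bars are drawn from 0, so bar length is proportional to the value
p_bar <- ggplot(sales, aes(x = factor(year), y = units)) +
  geom_col()

# Line chart: the default y scale is fitted to the range of the data,
# so 0 is not shown and the year-to-year changes are easy to see
p_line <- ggplot(sales, aes(x = year, y = units)) +
  geom_line()

# The same line chart with the y-axis forced to include 0,
# which flattens the trend for these values
p_line_zero <- p_line + expand_limits(y = 0)

# Area chart: the filled region is drawn from 0, so proportional ink applies again
p_area <- ggplot(sales, aes(x = year, y = units)) +
  geom_area()
```

Printing `p_line` and `p_line_zero` side by side shows how much the choice of baseline changes the visual impression of the same series.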
Given these considerations, what are people's thoughts on the y-axis in box and violin plots?
geom_violin defaults to clipping the y-axis to the range of the data, which rightly highlights the differences in the distributions and summaries of the y data points across the x variable. But should we also be considering differences in such distributions in the context of the absolute scale of the data? That is, should box and violin plots ever be plotted with a y-axis extended to include 0? If so, when and why?
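For anyone who wants to compare the two options directly, here is a minimal sketch using the built-in `iris` data. The choice of dataset and variables is mine, for illustration only, and `expand_limits(y = 0)` is just one way of extending the axis.

```r
library(ggplot2)

# Default: the y-axis is fitted to the range of the data
p_default <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_violin() +
  geom_boxplot(width = 0.1)

# Same plot with the y-axis extended to include 0
p_zero <- p_default + expand_limits(y = 0)
```

With the default axis the differences between species dominate the plot; with 0 included, the same differences are shown relative to the absolute size of the measurements.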