To use the latest version of ggplotplus, you’ll need to install it from GitHub using the pak package (you only need to do this once for each release of the package):
# install.packages("pak") #IF NOT ALREADY INSTALLED
pak::pak("MAISRC/ggplotplus")
Alternatively, you can get the most recent stable version from CRAN via install.packages():
install.packages("ggplotplus")
Then, either way, load it alongside ggplot2:
# install.packages("ggplot2") #IF NOT ALREADY INSTALLED
library(ggplot2)
library(ggplotplus)
This guide introduces the ggplotplus package, a collection of tools developed by me, Dr. Alex Bajcz, Quantitative Ecologist at the Minnesota Aquatic Invasive Species Research Center (MAISRC) at the University of Minnesota.
The tools in this package are meant to provide an opinionated, Universal-Design-oriented update to ggplot2’s existing tools and defaults, with the goal of helping less-experienced and time-strapped ggplot2 users to make more accessible graphs more quickly and painlessly.
This guide offers an overview of how the tools in this package are intended to be used and is especially aimed at users who are relatively new to ggplot2 and relatively unfamiliar with its inner workings.
In a nutshell, ggplotplus’s tools overhaul the default design state of ggplot2’s graphics. While a typical one-line ggplot() call yields a functional graphic, it’ll often fall short of modern best practices in data visualization and graph design, especially with respect to universal accessibility. As such, many simple ggplot() calls do not immediately yield graphs fit for publication or sharing with a wide, global audience.
Of course, experienced users of ggplot2 can use its many powerful tools and toggles, like those in the theme(), scale_*(), and geom_*() family of functions, to redesign and customize their graphs to meet even the highest standard! But doing so requires hard-won knowledge of how these tools work and can be tedious to implement even when you have the knowledge.
Plus, even if one has the technical know-how, not everyone is familiar enough with the most-current tenets of data visualization to know what to change and what to change it to. The ggplotplus package is designed to help users of all experience levels start from a more accessible baseline without sacrificing ggplot2’s capacity for individuality, experimentation, and customization.
Accessibility is really the primary motivation for ggplotplus. Many of its changes to the default settings of ggplots are intended to improve compliance with modern accessibility standards; this guide attempts to highlight these improvements as a means of educating the reader on some of the finer points of modern accessible design, in case they are curious and motivated to learn more!
To be clear: ggplotplus is not a replacement for making careful design choices, personal artistry, solicitation of feedback, or understanding your audience’s needs and capabilities. Instead, it’s meant to be a new (and at least arguably better) starting point for your graphic design process. While the design choices baked into its tools ultimately reflect my professional judgments, they’re grounded in a thorough review of the last several decades of the data visualization literature, so they don’t only reflect my personal opinions about good graph design (though they do reflect those, too!). Feel free to disagree with some or even most of its opinions; it’s designed to respect all your preferences, just as ggplot2 would.
The ggplotplus package is guided by three core principles:
Better Defaults:
Many default settings in ggplot2 are functional but sub-optimal in terms of design and accessibility. ggplotplus attempts to improve upon these by, among other things, adjusting colors, shapes, gridlines, spacing, text, etc. to yield cleaner, more accessible graphs at the outset with less fiddling with toggles.
Customization:
Almost all opinions effected by ggplotplus’s tools can be overridden by the user, either through standard ggplot2 syntax or using _plus function variants and their optional inputs. You’re rarely (if ever) stuck with any of the opinions baked into these tools (even though they do have these opinions for a reason!).
Modularity:
Each tool in ggplotplus is designed to be additive and used in concert, as appropriate. Sure, you can use just one (e.g., theme_plus()), but the intention is to call every relevant tool to achieve the full effect.
The package has one primary function and then several more that are most applicable only in certain situations (some of which are common, others of which are rarer):
| Function | Purpose |
|---|---|
| theme_plus() | An opinionated version of ggplot2’s base theme_gray() theme, with changes to the settings for size, spacing, legend placement, geom-specific defaults, and much more. This is the most critical function in the package and should be used in nearly every ggplot call. |
| yaxis_title_plus() | Relocates the y-axis title from its usual, less accessible location and orientation. Usually not desirable in faceted plots, but often desirable in all other circumstances. |
| scale_continuous_plus() | Drop-in replacement for ggplot2’s scale_x/y/color/fill_continuous() functions. It attempts to set axis breaks and limits so that the entire range of your graphed data is labeled. Useful for every continuous scale in your graph. |
| geom_point_plus() | An alternative version of ggplot2’s geom_point() function that provides access to a new palette of shapes designed to be more readily differentiated. Useful when point shape is mapped to a discrete variable, especially one with more than ~3 levels. |
| gridlines_plus() | Adjusts the appearance of gridlines so they are as subtle and as infrequent as possible. Appropriate in the (relatively uncommon) instances where the usefulness of gridlines to an end user outweighs the cognitive load they add. |
| direct_labels_plus() | Labels groups directly within the plotting area, in lieu of clarifying group membership via faceting or a legend, for increased information density. |
Each of these functions is discussed in more depth in the sections that follow, along with examples.
ggplot for Comparison
To introduce you to the tools in this package, let’s start by creating a simple, one-line ggplot2 graph as a reference point. The code below generates a scatterplot of sepal length vs. petal length for iris flowers from three species. These data come from the classic iris data set, automatically included with every installation of R. Points in the scatterplot are colored according to species:
ggplot(data = iris, #<--THE DATA SET
mapping = aes(x = Petal.Length, #<--MAPPING OUR AESTHETICS. WE'RE SAYING "*THIS* VARIABLE IN THE DATA SET SHOULD USE *THAT* VISUAL CHANNEL IN THE GRAPH". HERE, WE'VE MAPPED PETAL LENGTH TO HORIZONTAL (X) POSITION AND SEPAL LENGTH TO VERTICAL (Y) POSITION.
y = Sepal.Length)) +
geom_point(mapping = aes(color = Species)) #<--WE ALSO MAP SPECIES TO THE VISUAL CHANNEL OF COLOR, BUT ONLY IN THIS ONE LAYER OF THE PLOT (THE ONLY LAYER, AT THIS POINT).
This graph is simple and reasonably effective; it’s perfectly suitable for exploration and informal sharing!
…However, for presentation or publication, it falls short (at least, in my professional opinion! YMMV) in several important respects. Some of these include:
Text size: The default text size for axis and legend text is too small for easy reading, especially for those with a notable amount of visual impairment (which tends to advance with age, and the world’s population is aging). A common rule of thumb is to size text until it feels almost too large, which helps ensure it’s large enough for someone with visual impairment. Said differently, one should err on the side of sizing important elements as large as possible, stopping only when doing so creates other design problems.
Point size: The points are also quite small, which could hinder readability. The same mindset applies here—data elements should be as large and conspicuous as can be afforded.
Color palette: ggplot2’s default “rainbow” color palette lacks variance in luminance (how “bright” or “dark” the color is, aka how close to white/black it is). For a person with colorblindness (especially red-green colorblindness), the red and green shades are very difficult to distinguish. For those with no color vision at all, or for those using black-and-white viewing technologies (both of which are surprisingly common situations), all three colors used would be virtually indistinguishable.
Foreground/background contrast: While contrast (the ease with which nearby elements can be readily distinguished from one another) between the points and the background here is ok, it could be improved by using darker points and/or a lighter plot background. As noted above, it’d also help to have different color shades used for the points such that they contrasted better with each other. High contrast is critical for accessible and quickly interpretable graphs.
Whitespace and layout: Default ggplots can suffer from cramped layouts in some respects. When disparate elements (e.g., data points and axis labels) are very close together, it can be slower and more cognitively taxing to visually and mentally separate them. In particular, spacing in ggplots can often be tight between:
Axis titles and their corresponding axis labels.
Items within the legend(s).
Axis labels on densely labeled axes (not as much so here, but it’s a common issue).
Adding more void space (space not housing an element) between components helps readers visually parse and process them more easily and quickly, especially readers with significant cognitive or visual impairment.
Axis readability: In ggplot2, axis and legend titles and labels default to the column names/values found in the data set unless the user sets custom text for them. These default strings are often “computer-y” (e.g., they lack spaces, have unusual capitalization, or use punctuation in place of spaces) rather than human-readable. Axis titles and labels should have standard spelling, spacing, and punctuation, and they should be human-interpretable. They should also contain units, if applicable, and avoid obscure abbreviations when possible.
Gridlines: The research on the merits of gridlines is, admittedly, mixed! They:
Can lack contrast with the background and with nearby data elements.
Increase cognitive load (the amount of mental effort needed to integrate the available information to form a conclusion) by adding visual information to the plotting area that must be parsed and either integrated or (more likely) dismissed. When they’re not needed (and, arguably, they very often aren’t!), they constitute visual clutter that slows down interpretation.
Are unhelpful when rendered for a categorical variable (the default behavior in ggplot2, though not applicable to this graph).
On the flip side, gridlines are familiar to novice readers and can thus serve as a “foothold” for them to begin to understand a graph. They can also help readers estimate exact values when this is required (though, arguably, it very often isn’t!).
By and large, though, most data visualization experts agree that graphs are not the most efficient device for conveying exact values. Text, tables, and raw-data-file sharing are all superior! The adage to remember is that graphs ought to primarily be about “vibes!”
So, gridlines are often unnecessary. However, when a designer does deem them to be valuable, research has shown they can and should be as faint and as infrequent as possible to minimize their costs relative to their benefits. This advice does not generally correspond well to ggplot2’s default behaviors.
Incomplete axis labeling: Axes should generally include tick marks and labels at both ends of the data range. This aligns with reader expectations (especially those of novice readers) and aids in efficient comprehension of the data. Axes are essentially “number lines,” and so should be completely graduated by labels, just like number lines are. On this graph, though, both axes are missing one or more labels near the upper/lower limits of the data, which makes these axes feel visually “unfinished,” as though the software just “forgot” to label all the way up to the ends!
Overplotting: When data elements partially or wholly overlap with one another (a very common problem in point-based graphs like scatterplots), it becomes hard to judge how many elements are present in a location. This limits the reader’s ability to gauge data density. This is a hard data-viz problem to fix, but ggplot2’s default settings do not attempt to address it at all; it simply stacks any number of points on top of one another invisibly.
Vertically oriented y-axis title: Research has consistently shown that text rotated away from horizontal (0 degrees) is harder to read for virtually every human on Earth, but especially for dyslexics, non-native readers of the language, users with motor or visual impairment, and those with certain technological limitations, among others. Despite this, the y-axis title is almost always rotated 90 degrees from horizontal in scientific graphs (not just in ggplots). This also means that it’s printed in a relatively inconspicuous location (the left margin), so it’s not one of the first elements a reader is likely to encounter. Since many graphs are “about” the y-axis variable, the y-axis title deserves a more prominent location so readers encounter it sooner.
Tick marks: The tick marks on the axes are small and easily lost visually, even though they can provide many of the same benefits as gridlines with much less cognitive load (they are the “gradations” on the “number lines”). They could be more prominent.
Missing axis lines: To better visually represent “number lines,” and to also better demarcate the boundary between the plotting area and the axis areas, most graphs feature axis lines. However, ggplot2’s graphs tend to lack such lines by default.
Legend placement: ggplot2’s default legend location (in the right-hand margin) is often space-inefficient, creating needlessly large voids above/below it. Plus, it tends to be encountered by readers relatively late when placed in this location. Data visualization best practices are to either integrate a legend’s information into the plotting area directly (e.g., through direct labeling of lines) or else to place the legend somewhere both more prominent and more compact.
This list isn’t exhaustive, but it illustrates how even a simple ggplot could require many adjustments to meet accessibility, clarity, and design standards. It has been my personal experience that going from a base ggplot to a publishable one often requires several hundred lines of code-based adjustments! Not everyone has the time, patience, or wherewithal to do that much fiddling for every graph they make: hence, ggplotplus. It bundles together many of those hundreds of lines of code so that we can all just start much closer to where most of us would want to finish!
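For comparison, here is a rough sketch of how a handful of the issues above could be addressed by hand using only standard ggplot2 functions (theme_classic(), theme(), labs(), and scale_*_continuous()). The specific sizes, margins, and break choices are illustrative assumptions, not the values ggplotplus actually uses.

```r
library(ggplot2)

# Breaks chosen with base R's pretty() so labels span the full data range
# (an illustrative choice, not necessarily what ggplotplus does).
x_breaks <- pretty(range(iris$Petal.Length))
y_breaks <- pretty(range(iris$Sepal.Length))

p_manual <- ggplot(data = iris,
                   mapping = aes(x = Petal.Length, y = Sepal.Length)) +
  geom_point(mapping = aes(color = Species), size = 3) +   # larger points
  scale_x_continuous(breaks = x_breaks, limits = range(x_breaks)) +
  scale_y_continuous(breaks = y_breaks, limits = range(y_breaks)) +
  labs(x = "Petal length (cm)",                             # human-readable titles with units
       y = "Sepal length (cm)") +
  theme_classic(base_size = 16) +   # larger text, white background, axis lines, no gridlines
  theme(
    axis.text = element_text(color = "black"),              # full-contrast axis labels
    axis.ticks = element_line(color = "black", linewidth = 1),
    axis.ticks.length = unit(0.25, "cm"),                   # longer tick marks
    axis.title.x = element_text(margin = margin(t = 12)),   # more space between titles and labels
    axis.title.y = element_text(margin = margin(r = 12)),
    legend.position = "top"                                 # legend above the plot
  )

p_manual
```

Even this sketch leaves the color palette, overplotting, and the rotated y-axis title untouched, which shows how quickly the adjustments pile up.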
With many of the design challenges outlined above in mind, ggplotplus introduces the theme_plus() function, its core tool. A counterpart to ggplot2’s theme() function, theme_plus() is designed to work identically. More experienced ggplot2 users will know theme() to be one of ggplot2’s workhorse functions; it exposes many hundreds of toggles for adjusting a graph’s appearance, and it’s perhaps one of the functions most responsible for ggplot2’s not inconsiderable learning curve.
The intention of theme_plus() is to rewrite many dozens of the “presets” inside of ggplot2’s default theme() so that you don’t have to! Just tack it on to any ggplot call, even with no inputs, to access its benefits:
#THE EXACT SAME GGPLOT COMMAND AS BEFORE...
ggplot(data = iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(color = Species)) +
theme_plus() #<--...BUT WITH ggplotplus'S theme_plus FUNCTION TACKED ON.
Hopefully, you can already see how big a difference this one function can make to the design of a graph! Let’s discuss some of these design improvements in greater detail:
Larger text throughout for better readability.
Increased spacing between key elements (like legend keys within the legend and between axis titles and their corresponding labels).
Thicker and longer axis ticks, for greater visibility and to make them easier to distinguish from both data elements and axis lines.
Gridlines removed by default (see the later section on gridlines_plus() for how to selectively reintroduce them).
White background for highest contrast against most standard foreground elements (well, technically, it’s a very subtle, warm off-white, since true white can increase eyestrain with prolonged viewing).
Black axis lines at the bottom and left for visual anchoring (but no top or right borders, which typically communicate no information and thus are extraneous).
Legend moved above the plot as a horizontal stripe, which is more space-efficient and increases the likelihood it will be encountered early by readers.
All text rendered in black, rather than the default dark gray for some elements, to maximize contrast with the light background.
Of course, everything is still customizable. If you don’t love a particular choice—say, the thickness of the axis lines or the legend’s location—you can adjust them, same as you normally would using theme(), except that you can pass the overrides straight into theme_plus() instead:
#THE EXACT SAME GGPLOT COMMAND AS BEFORE...
ggplot(data = iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(color = Species)) +
theme_plus(axis.line = element_line(linewidth = 0.75)) #<--WE CAN MAKE THE AXIS LINES A LITTLE THINNER IF WE WANT.
Here’s how you’d relocate the legend to its standard ggplot2 position. The theme will automatically apply some enhanced default styling specific to this position instead:
#THE EXACT SAME GGPLOT COMMAND AS BEFORE...
ggplot(data = iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(color = Species)) +
theme_plus(legend_pos = "right")
Like the rest of the ggplotplus toolkit, theme_plus() is meant to be a smart starting point, not a “final destination.” Its outputs won’t be ideal for every plot, person, or context (that’s not even remotely possible!), but it’s designed to save you time by making it so there are fewer adjustments you need to make to get your graphs to where they ultimately need to be: fully accessible for the entire audience you want!
As previously noted, the default color palette used by ggplot2 is not particularly accessible. Fortunately, the viridisLite package, which is installed automatically alongside ggplot2, includes several palettes that were specifically designed for near-universal accessibility!
They achieve this by varying continuously not just in hue (i.e., the base “color,” such as “red,” “green,” or “blue,” which is what is conventionally thought of as “color”) but also in luminance (light vs. dark, aka closeness to “white” vs. “black”). This multidimensional variance makes the colors in these palettes distinguishable from one another for a wide range of people in a wide range of circumstances.
Thanks to enhancements made to ggplot2::theme() in version 4.0.0 of ggplot2, released September 11, 2025, one can now set default color palettes within a theme (this used to require using R’s global options and was relatively inflexible). For example, one can do something like:
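One way to see the difference in luminance is to compute it. The sketch below uses only base R: grDevices::hcl() reproduces the hues ggplot2 uses by default for three groups (hues 15, 135, and 255 at chroma 100 and luminance 65), and grDevices::hcl.colors(3, "viridis") gives an HCL approximation of viridis rather than the exact viridisLite colors. Luminance is computed with the WCAG 2.x relative-luminance formula; the helper name relative_luminance() is hypothetical.

```r
# Hypothetical helper: WCAG 2.x relative luminance (0 = black, 1 = white)
relative_luminance <- function(hex) {
  rgb <- grDevices::col2rgb(hex) / 255                                    # 3 x n matrix of sRGB values in [0, 1]
  lin <- ifelse(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055)^2.4) # undo gamma encoding
  as.vector(c(0.2126, 0.7152, 0.0722) %*% lin)                            # weighted sum of linear R, G, B
}

# ggplot2's default discrete hues for three groups: "#F8766D", "#00BA38", "#619CFF"
default_cols <- grDevices::hcl(h = c(15, 135, 255), c = 100, l = 65)
# HCL approximation of the viridis palette (not the exact viridisLite colors)
viridis_cols <- grDevices::hcl.colors(3, palette = "viridis")

lum_default <- relative_luminance(default_cols)
lum_viridis <- relative_luminance(viridis_cols)

round(lum_default, 2)   # nearly identical values
round(lum_viridis, 2)   # spans from dark to light
```

The default hues differ almost only in hue, so in grayscale they collapse to nearly the same gray, whereas the viridis-like colors remain separable by lightness alone.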
###CODE TO BE ADDED TO A GGPLOT CALL TO SET A DEFAULT COLOR PALETTE
... +
ggplot2::theme(
palette.colour.discrete = disc,
palette.fill.discrete = disc,
palette.colour.continuous = cont,
palette.fill.continuous = cont
)
### WHERE disc AND cont ARE SETS OF COLORS PULLED FROM A SPECIFIC PALETTE, SUCH AS VIRIDIS.
This allows theme_plus() to set more accessible default color palettes while still allowing a user to easily override these defaults with their own preferences, either by replacing one of the palette.*.* inputs above within theme_plus() or by adding a scale_*_*() call to their plot, as they normally would, to specify desired colors.
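To make the fragment above runnable, disc and cont need actual colors. This sketch assumes ggplot2 4.0.0 or later (earlier versions do not recognize the palette.* theme elements) and uses grDevices::hcl.colors() as a stand-in source of viridis- and cividis-like colors; the palette values ggplotplus itself uses may differ.

```r
library(ggplot2)  # requires ggplot2 >= 4.0.0 for the palette.* theme elements

# Stand-in colour sets (HCL approximations, not the exact viridisLite palettes)
disc <- grDevices::hcl.colors(3, palette = "viridis")   # one colour per Species level
cont <- grDevices::hcl.colors(7, palette = "Cividis")   # anchor colours for continuous gradients

p_palette <- ggplot(data = iris,
                    mapping = aes(x = Petal.Length, y = Sepal.Length)) +
  geom_point(mapping = aes(color = Species)) +
  theme(
    palette.colour.discrete   = disc,
    palette.fill.discrete     = disc,
    palette.colour.continuous = cont,
    palette.fill.continuous   = cont
  )

p_palette

# An explicit scale_*() call still takes precedence over the theme's palette:
p_palette + scale_color_manual(values = c("black", "gray40", "gray70"))
```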
theme_plus() uses the titular viridis palette for discrete variables and the cividis palette for continuous variables. By default, viridis ranges from dark purple to light yellow, passing through blue, teal, and green. This large number of hues makes it relatively easy to produce several discrete colors that contrast visually with each other as well as with whatever background color you use.
By default, however, theme_plus() excludes the lightest yellow hues of the viridis palette, because these tend to lack contrast with other light-colored elements and with ggplot backgrounds, which tend to be white. As a result, the remaining palette includes just the purple/blue/teal/green region. You can easily adjust this using the function’s begin_discrete and end_discrete input parameters, which take values between 0 and 1:
begin_discrete = 0.28 would disable the dark purple portion of the color range, such as when you have many other dark elements or a dark background.
end_discrete = 1 would enable the light yellow portion, such as when you don’t have other light elements or a light background.
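To get a feel for what these two inputs control, it can help to look at which colors get sampled from the palette. The sketch below calls viridisLite::viridis(), whose begin and end arguments trim the palette to a sub-range between 0 and 1. It is an assumption that theme_plus() passes begin_discrete and end_discrete along in a comparable way, so treat the output as illustrative.

```r
# viridisLite is installed as a dependency of ggplot2 (through the scales package)
full    <- viridisLite::viridis(3, begin = 0,    end = 1)  # full range: dark purple to light yellow
trimmed <- viridisLite::viridis(3, begin = 0.28, end = 1)  # dark purple portion excluded

full
trimmed

# Quick visual comparison of the two sets of swatches using base graphics
graphics::barplot(rep(1, 6), col = c(full, trimmed), border = NA, axes = FALSE,
                  names.arg = rep(c("full", "trimmed"), each = 3))
```

Only the starting color shifts here; the light-yellow end is shared because both calls use end = 1.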
#SAME GRAPH AS BEFORE EXCEPT...
ggplot(data = iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(color = Species)) +
theme_plus(begin_discrete = 0.28,#<--DISALLOW DARK PURPLE REGION.
end_discrete = 1) #<--ALLOW YELLOW REGION.
In this example, I’ve disabled the purple end of the palette and enabled the yellow region, which means some points now lack contrast with the background. But no worries: that’s something I can fix!
To see the default color palette for a continuous variable, let’s temporarily map color to a numeric variable instead:
#A BASE GGPLOT2 GRAPH WITHOUT theme_plus() ON FOR REFERENCE
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(color = Petal.Width)) #<--MAP COLOR TO A CONTINUOUS VARIABLE.
The default ggplot2 palette for continuous variables is “blues,” which is a reasonable palette in that it does vary in luminance, making it more broadly accessible. However, it doesn’t vary in hue, which can make nearby values harder to distinguish. As such, theme_plus() changes the palette to cividis, which has three distinct hues: blue, gray, and yellow:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(color = Petal.Width)) +
theme_plus()
Because humans like to “chunk” data into discrete groups according to hue, even in circumstances where the underlying phenomenon is continuous and nothing “special” is happening at the points of hue transition, this palette’s low number of hues results in less “false binning” of the data than tends to happen when using palettes with more distinct hues (such as viridis). After all, blue, gray, and yellow points here are quite different, so the hue differences many will perceive map accurately to differences in the underlying data, and even relatively similar values are more easily distinguished from one another.
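If you want the cividis palette in a plain ggplot2 graph without theme_plus(), ggplot2’s built-in scale_color_viridis_c() exposes it through its option argument. This sketch uses standard ggplot2 rather than ggplotplus, so the exact colors and range used may not match theme_plus()’s defaults.

```r
library(ggplot2)

p_cividis <- ggplot(iris,
                    mapping = aes(x = Petal.Length,
                                  y = Sepal.Length)) +
  geom_point(mapping = aes(color = Petal.Width)) +
  scale_color_viridis_c(option = "cividis")   # blue-gray-yellow continuous palette

p_cividis
```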
Another major feature of ggplot2’s 4.0.0 update was to allow users to set many geometry-layer-specific design settings within theme(). These used to have to be set in each individual geom_*() layer separately. For example, a user can now add a theme() call of the following form to their graphs:
###EXAMPLE ADJUSTMENT OF GEOMETRY-LAYER-SPECIFIC DESIGN SETTINGS FROM WITHIN THEME.
... +
ggplot2::theme(geom = ggplot2::element_geom(pointsize = 5)) #SET THE DEFAULT POINT SIZE FOR ALL GEOMS USING POINTS TO 5.
This functionality allows theme_plus() to replace many of the default settings of many commonly used geometry layers automatically.
For example, in the graph above, theme_plus() has made the points:
Larger, and thus easier to parse.
Hollow circles (by changing the default shape from ggplot2’s shape 19 to shape 21).
The latter change is important because shape 19 only supports the color aesthetic, whereas shape 21 supports both color (for the stroke/outline) and fill (for the interior). This affords more flexibility in design choices.
Specifically, by default, theme_plus() sets the interior color of points to "transparent", resulting in hollow circles. This makes partial overlaps between points easier to distinguish, though total overlap would still not be discernible.
There is something I can do with this design to address even “extreme overplotting,” i.e., when points overlap fully. First, I need to switch from mapping species to color to mapping it to fill instead. This causes theme_plus() to pivot to making the point strokes black:
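For readers curious how this looks in plain ggplot2, the sketch below approximates the hollow-point default by setting shape 21, a transparent fill, and a larger size as constants in geom_point(). The size and stroke values are assumptions chosen for illustration, not theme_plus()’s exact settings; in ggplot2 4.0.0 and later, comparable defaults can also be set once through theme(geom = element_geom(...)), as in the fragment earlier in this section.

```r
library(ggplot2)

p_hollow <- ggplot(data = iris,
                   mapping = aes(x = Petal.Length, y = Sepal.Length)) +
  geom_point(mapping = aes(color = Species),
             shape = 21,              # circle with separate stroke (colour) and interior (fill)
             fill = "transparent",    # hollow interior so partial overlaps stay visible
             size = 3,                # illustrative size, larger than ggplot2's default of 1.5
             stroke = 1)              # illustrative outline thickness

p_hollow
```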
#SAME GRAPH AS BEFORE EXCEPT...
ggplot(data = iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species)) + #<--CHANGING TO MAPPING SPECIES TO FILL.
theme_plus(begin_discrete = 0.28,
end_discrete = 1)
Notice how this change fixes our earlier issue of our yellow points lacking contrast against the light background. This happens because the fill colors are now surrounded by black outlines, so our eyes will compare the fill color to the black (the most proximal other color) rather than to the background. A neat trick, based on how the science of vision works!
Even though dark purple wouldn’t have sufficient contrast against a black outline, that’s not actually a problem because the point strokes aren’t encoding information. As a result, if I wanted to make the colors being used even more distinct, I could restore the full range of the viridis palette:
#SAME GRAPH AS BEFORE EXCEPT...
ggplot(data = iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species)) +
theme_plus(begin_discrete = 0, #<--EXCEPT BACK TO 0 TO INCLUDE PURPLE
end_discrete = 1)
Purple points still look purple, whether I can distinguish their black outlines from their fill colors or not. Meanwhile, all points are easily distinguished from the background, so I can utilize the entire color space available in the palette!
Ok! Back to the issue of extreme overplotting here. There are two broadly effective options for addressing overplotting in graphs like this one. The first, as you’ve already seen, is to pick a shape with separate color and fill aesthetics, then map color to the stroke and set the fill to "transparent" so that partial overlap between the points is evident.
However, another, more aggressive option is to instead set the stroke to a color that contrasts well against the background (such as black for a white background), map fill color to your variable of interest, and then make the points semi-transparent so that points stacked on top of one another “blend” into one another chromatically. This is easier to see than to read:
#SAME GRAPH AS BEFORE EXCEPT...
ggplot(data = iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) + #<--ADD IN STRONG SEMI-TRANSPARENCY VIA ALPHA.
theme_plus(begin_discrete = 0,
end_discrete = 1)
At alpha = 0.3, individual points maintain sufficient contrast with the background, but points in dense regions of the data space layer on top of each other, resulting in darker (or even blended) colors. This makes it possible for a reader to understand not just the position and group identity of points but their density as well.
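The darkening in dense regions follows from how semi-transparent layers combine. Under standard alpha compositing, n identical points stacked exactly on top of one another, each with opacity alpha, produce a combined opacity of 1 - (1 - alpha)^n. The sketch below tabulates this for alpha = 0.3; real renderers, partial overlaps, and points of different colors will behave somewhat differently, so treat it as an approximation. The function name stacked_opacity() is hypothetical.

```r
# Hypothetical helper: combined opacity of n identical, fully overlapping
# points, each drawn with the given alpha (standard "over" compositing)
stacked_opacity <- function(n, alpha) 1 - (1 - alpha)^n

n_points <- 1:5
data.frame(n_points, opacity = round(stacked_opacity(n_points, alpha = 0.3), 3))
# opacity: 0.300, 0.510, 0.657, 0.760, 0.832
```

Even five stacked points stop short of full opacity, which is why dense clusters read as progressively darker rather than as a single solid block.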
Note: This approach also dims the black point strokes, so contrast could still be a concern for low-vision readers with the yellow fill colors. Additionally, variance in luminance will be reduced by using transparency, which can make the graph harder to view in grayscale. Color (and overplotting) in graphs is challenging to navigate! When overplotting is a serious concern, consider whether a scatterplot is the best way to represent your data, and whether you need to plot all your data to achieve your goals.
Did you notice: In the previous graph, even though I adjusted the transparency of the points using alpha, the legend key symbols retained full opacity? That’s another ggplotplus default design opinion: legend keys should always be readable!
When points are made smaller (size is adjusted downward) and/or points are made semi-transparent (alpha is adjusted downward), ggplot2’s default behavior is to apply these same changes to the legend key symbols as well. However, this might make those symbols harder to read. For example:
ggplot(data = iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3,
size = 2) + #<--MAKE POINTS MUCH SMALLER
theme_plus(begin_discrete = 0,
end_discrete = 1,
override_legend_alphasize = FALSE #<-DISABLE LEGEND KEY OPINIONS
)
In this graph, I’ve reduced the size and opacity of the points, and now it’s pretty difficult to differentiate the legend keys.
Now, there might be reasons one would want points within the plotting area to be smaller and/or more transparent! But unless variation in point size/transparency conveys information (i.e., you’ve mapped these aesthetics to variables rather than setting them to constants), it’s often undesirable from a readability standpoint to apply these same traits to the legend key symbols too.
So, ggplotplus will coerce legend key symbols back to full opacity and a reasonable size (5) unless you:
Set override_legend_alphasize to FALSE, as I did above (this can also be done via gridlines_plus() and yaxis_title_plus()).
Set guide = 'none' inside a scale_*() function or guides().
Specify your own legend overrides for size and/or alpha via guide_legend(override.aes = list(...)).
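The third option uses standard ggplot2 syntax. As a sketch, here is the manual ggplot2 equivalent of the legend-key behavior described above (full opacity and size 5), written with guides() and guide_legend(override.aes = ...) and no ggplotplus functions; shape 21 is set explicitly because plain ggplot2’s default point shape does not use fill.

```r
library(ggplot2)

p_keys <- ggplot(data = iris,
                 mapping = aes(x = Petal.Length, y = Sepal.Length)) +
  geom_point(mapping = aes(fill = Species),
             shape = 21,    # shape with a fill interior
             alpha = 0.3,   # faint points in the plotting area...
             size = 2) +
  guides(fill = guide_legend(override.aes = list(alpha = 1, size = 5)))  # ...but bold legend keys

p_keys
```

The override only changes how the keys are drawn in the legend; the points in the plotting area keep their alpha of 0.3 and size of 2.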
If you map just one of the two aesthetics, the overrides will automatically apply to only the other. For example:
###PLOTTING WINDOW NOW RESCALED TO 7 INCHES WIDE X 5.26 INCHES TALL
ggplot(data = iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species,
size = Petal.Width), #<-MAP SIZE TO A COLUMN IN OUR DATA SET.
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1)
Here, I set size to vary as a function of a variable, so variation in point size now conveys meaning. Our point size legend (top) respects that. However, our point alpha adjustment is ignored in both legends, and the fill legend (bottom) continues to use a large, constant size.
As a quick aside, I’ll note that ggplot2, by default, already maps point size to area, not radius (see this part of the ggplot2 book). This matters because area increases faster than radius (with its square), and humans perceive differences in point area more strongly than differences in point radius, so mapping values to radius would visually exaggerate differences between points. So, it’s worth applauding that this default behavior already prevents a potential perceptual distortion.
ggplot2 (a Review)
In ggplot2, you can partially control the visual appearance of your graph using aesthetics, which connect visual channels (like position, color, size, or fill) to variables in your data set or set them to constant values.
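A quick way to see this is to compare ggplot2’s default scale_size(), which maps values to point area, with scale_radius(), which maps values to radius. By default, both use a size range of 1 to 6, so the smallest and largest points match; in between, the area-based sizes come out larger, because they grow with the square root of the rescaled value rather than linearly.

```r
library(ggplot2)

base_plot <- ggplot(iris, aes(x = Petal.Length, y = Sepal.Length, size = Petal.Width)) +
  geom_point(shape = 21)

size_area   <- layer_data(base_plot + scale_size())$size    # default: value mapped to area
size_radius <- layer_data(base_plot + scale_radius())$size  # alternative: value mapped to radius

# Same endpoints, but area-based sizes sit above radius-based ones in between
range(size_area)
range(size_radius)
```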
There are three (partially overlapping) ways to assign aesthetics:
Global mapping: Inside aes() in the ggplot() call. These mappings apply to all layers (or “geoms”).
Local mapping: Inside aes() within a specific geom_*() call. These only apply to the layer they are included within.
Constants: Set outside aes() within a specific geom_*() call. This fixes the aesthetic to a single value for that layer (e.g., setting all colors to “red”), overriding any conflicting mapping. (Placing a constant inside aes() instead treats it as if it were a variable with a single level.)
This system gives you flexibility, but it can be confusing, especially for new users. Let’s clarify with an example that uses two geoms: a boxplot overlaid with jittered points (jittering adds a small amount of random noise to each point’s position so that overlapping points spread apart):
#NEW PLOT
ggplot(iris,
mapping = aes(x = Species, #<--MAKE THIS SPECIES, A DISCRETE VARIABLE
y = Sepal.Length)) +
geom_boxplot() + #<--A NEW GEOM = BOXPLOT
geom_jitter(#<--SAME AS GEOM_POINT EXCEPT JITTERED.
mapping = aes(fill = Species),
alpha = 0.3) +
theme_plus()
Here's what's happening:
Global mappings of x = Species and y = Sepal.Length apply to both the boxplot and the jittered points layers.
The fill aesthetic is mapped to Species, but only locally within the jittered points layer, so the boxplots are unaffected and don’t get species-specific fill colors.
The alpha aesthetic is set to a constant (0.3), again only locally for the jittered points. This means all points get this value, but the boxplots are unaffected.
If you map the same aesthetic in multiple places (e.g., globally and locally), the local mapping always wins—it overrides any global specification for that layer. Also, if you try to map an aesthetic to both a variable and a constant, the constant (usually) wins.
I bring all this up just to say that all of the same rules apply when using ggplotplus. You can:
Map aesthetics globally or locally.
Map aesthetics to variables in your data set or set them to constants.
Combine the approaches as needed; conflicts will resolve as they normally do in ggplot2.
Here’s a similar plot with a few tweaks to illustrate this further:
ggplot(iris,
mapping = aes(x = Species,
y = Sepal.Length,
fill = Species, #<-- WE CAN SWITCH TO MAPPING FILL GLOBALLY SO IT SHOULD APPLY TO BOTH GEOMS.
color = Petal.Length)) + #<--WE ALSO MAP COLOR
geom_boxplot(color = "blue") + #<--WE LOCALLY OVERRIDE COLOR, SO IT'LL BE BLUE FOR ALL BOXES INSTEAD OF LINKED TO PETAL LENGTH
geom_jitter() + #<--SIMPLIFY HERE.
theme_plus()
In this version:
Fill is now applied to both geoms, since it’s being mapped globally inside of ggplot().
Color is also mapped globally but overridden locally in the boxplot layer (so boxplot outlines are blue, regardless of Petal.Length).
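If you're ever unsure which specification won, you can inspect what ggplot2 actually computed for each layer with layer_data(). Here's a sketch using plain ggplot2 (no theme_plus(), to keep it minimal). It confirms that the local constant beat the global mapping in the boxplot layer, while the jitter layer inherited the global color mapping:

```r
library(ggplot2)

# Same structure as the example above, minus theme_plus().
p <- ggplot(iris,
            mapping = aes(x = Species, y = Sepal.Length,
                          fill = Species,            # global mapping
                          colour = Petal.Length)) +  # global mapping
  geom_boxplot(colour = "blue") +  # local constant overrides the global colour mapping
  geom_jitter()                    # inherits every global mapping

box_cols    <- layer_data(p, i = 1)$colour  # layer 1: the boxplots
jitter_cols <- layer_data(p, i = 2)$colour  # layer 2: the jittered points

unique(box_cols)             # "blue": the constant won for this layer
length(unique(jitter_cols))  # many distinct colors, mapped from Petal.Length
```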
Most common ggplot2 geoms have defaults adjusted by theme_plus(), though not all. To see a list of those receiving specific adjustments, run this command:
sort(names(geom_plus_defaults))
 [1] "abline" "area" "bar" "boxplot" "col"
[6] "count" "crossbar" "curve" "density" "dotplot"
[11] "errorbar" "freqpoly" "histogram" "hline" "jitter"
[16] "line" "linerange" "point" "point_plus" "pointrange"
[21] "ribbon" "segment" "smooth" "tile" "violin"
[26] "vline"
Returning to our earlier scatterplot:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1)
There's still room for improvement in its design, especially with respect to the axis titles and labels. A key first step in redesigning these would be to make the axis titles more human-readable, intuitive, and complete, including by specifying units. You can do this using the scale_*_*() family of functions in ggplot2:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_x_continuous(name = "Petal length (cm)") #<--STANDARD WAY IN GGPLOT2 TO CHANGE THE X AXIS TITLE FOR A NUMERIC VARIABLE.
This is better! But the x-axis still lacks labels near the ends of the data. For example, many points have petal lengths below 2, the lowest labeled value. This is a common issue with ggplot2's default break-selection process. How close are the purple points to 0? The yellow points to 8? It's hard to tell without anchors at both ends. You can fix this using scale_continuous_plus() with scale = "x", which automatically adjusts breaks and limits to ensure label coverage near the ends of the scale:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",#<--SWITCH TO THE ggplotplus VARIANT, AND SPECIFY THE SCALE.
name = "Petal length (cm)")
The axis is now expanded to start with a break at 1 and to end with a break at 7, with breaks in between still chosen to be regular and “pretty,” just as in ggplot2.
As with the base scale_*_*() functions, you can pass arguments like name, expand, or labels to scale_continuous_plus(). However, you cannot manually set breaks or limits, since those are determined internally by the function.
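Here's a sketch of the idea behind end-to-end coverage in plain ggplot2, where you set breaks and limits yourself. It uses base R's pretty(), which returns round, equally spaced values that cover the whole range of the data. covering_breaks() is a hypothetical helper of my own for illustration, not part of ggplotplus. It isn't necessarily the algorithm scale_continuous_plus() uses internally.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplotplus): pick "pretty" breaks that
# span the full data range, so there's a labeled break at or beyond each end.
covering_breaks <- function(x, n = 5) {
  rng <- range(x, na.rm = TRUE)
  pretty(rng, n = n)   # base R: round, equally spaced values covering rng
}

b <- covering_breaks(iris$Petal.Length)
b
# 1 2 3 4 5 6 7   (Petal.Length runs from 1 to 6.9)

p_cover <- ggplot(iris, aes(x = Petal.Length, y = Sepal.Length)) +
  geom_point() +
  scale_x_continuous(name = "Petal length (cm)",
                     breaks = b,
                     limits = range(b))   # extend the axis out to the outermost breaks
p_cover
```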
If the resulting labels feel a little too frequent, scale_continuous_plus() contains a trick: set thin.labels to TRUE to convert every other label to an empty string ("") while retaining the tick marks:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) #<--WILL CONVERT EVERY OTHER LABEL TO AN EMPTY STRING.
This reduces cognitive load and increases void space while preserving the visual scaffolding the ticks provide. Optional, but a nice touch!
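In plain ggplot2, you can approximate the effect of thin.labels with a labels function that blanks every other label. thin_every_other() below is a hypothetical helper for illustration, not part of ggplotplus or ggplot2. The example uses explicit breaks and limits so that every break falls within the axis range.

```r
library(ggplot2)

# Hypothetical helper: keep every break (so its tick mark is still drawn)
# but replace every second label with an empty string.
thin_every_other <- function(breaks) {
  labs <- format(breaks, trim = TRUE)
  labs[seq_along(labs) %% 2 == 0] <- ""   # blank the 2nd, 4th, 6th, ... labels
  labs
}

thin_every_other(1:7)
# "1" ""  "3" ""  "5" ""  "7"

p_thin <- ggplot(iris, aes(x = Petal.Length, y = Sepal.Length)) +
  geom_point() +
  scale_x_continuous(name = "Petal length (cm)",
                     breaks = 1:7,
                     limits = c(1, 7),
                     labels = thin_every_other)
p_thin
```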
Applying this same logic to the y-axis is as easy as adding another scale_continuous_plus() call:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y", #<--ANOTHER FOR Y.
name = "Sepal length (cm)")
Here, the y-axis limits now extend down enough to add a break at 4, making the axis appear more complete and aiding interpretation.
The scale argument can also be set to "fill" or "color" when those aesthetics are mapped to continuous variables. theme_plus() also makes a few adjustments that apply only to color bars, the legend components you see when you map color or fill to a continuous variable. Let's map a continuous variable to fill to get such a legend:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width), #<--SWAP TO A CONTINUOUS VARIABLE.
alpha = 0.3) +
theme_plus() + #<--BEGIN_DISCRETE/END_DISCRETE NO LONGER NEEDED; WE'LL USE THE FULL CIVIDIS SCALE.
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)")
The theme_plus()-driven changes here include longer, thicker, pure-white tick marks for maximum contrast against the colors within the bar (at least usually; there's no perfect hue for this purpose!), a black border to prevent contrast issues with the background, and larger dimensions for easier reading and enhanced visibility.
Note that color bars can have the same issue as axes, where the ends don’t always get labels. One can just add a third call to scale_continuous_plus() to help with that:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width),
alpha = 0.3) +
theme_plus() +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
scale_continuous_plus(scale = "fill",
name = "Petal width (cm)") #<--ADD A THIRD FOR FILL
The left-hand side of the colorbar's scale has expanded to include 0 so that it's clearer to the reader exactly which color would correspond to a value of 0.
scale_continuous_plus() has another handy feature that's especially useful for legend titles: set split_name to TRUE to break the title onto multiple lines at each space in the original entry:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width),
alpha = 0.3) +
theme_plus() +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
scale_continuous_plus(scale = "fill",
name = "Petal width (cm)",
split_name = TRUE #DIVIDE ONTO NEW LINES BY SPACES
)
You can still provide a vector of custom labels to scale_continuous_plus(), as with its ggplot2 counterparts, but it requires a little trial and error. Because scale_continuous_plus() must "experiment" to find suitable limits and breaks for your scale, it may ultimately create breaks that fall outside the visible limits and are thus not shown. As such, you may need to provide more labels (including 1 or 2 blank labels on either side) than it appears you'd need. However, if you provide fewer labels than are strictly required, the function makes an educated guess about how to pad them so they display properly:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width),
alpha = 0.3) +
theme_plus() +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
scale_continuous_plus(scale = "fill",
name = "Petal width (cm)",
split_name = TRUE,
labels = LETTERS[1:6] #ONLY 6 BREAKS ARE SHOWING, BUT ACTUALLY, THERE ARE 8, BEHIND THE SCENES! NO MATTER--THE FUNCTION PADS YOUR VECTOR WITH BLANK LABELS ON EITHER SIDE AS NEEDED.
)
To know how many labels you need to provide, you may need to render your plot first to see what breaks the function has chosen, then provide a vector of labels to match.
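If you'd rather pad a labels vector yourself, the logic is simple. pad_labels() below is a hypothetical helper of my own, not a ggplotplus function. Its centering rule may differ from the educated guess scale_continuous_plus() makes. It adds blank labels evenly to both sides until the vector matches the number of breaks.

```r
# Hypothetical helper: pad a character vector of labels with blank strings
# on both sides so that its length matches the number of breaks.
pad_labels <- function(labels, n_breaks) {
  n_missing <- n_breaks - length(labels)
  if (n_missing <= 0) return(labels)   # nothing to pad
  n_left  <- n_missing %/% 2           # half the blanks go on the left...
  n_right <- n_missing - n_left        # ...and the remainder on the right
  c(rep("", n_left), labels, rep("", n_right))
}

# Six visible labels, eight breaks behind the scenes:
pad_labels(LETTERS[1:6], n_breaks = 8)
# ""  "A" "B" "C" "D" "E" "F" ""
```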
Our graph is nearly “perfect” (at least, in our estimation!)—but one significant design issue remains: the y-axis title is still vertically oriented and tucked away in the plot’s left-hand margin. Let’s change that.
In base ggplot2, you can reorient the title to be horizontal, at least, using a few hacky adjustments involving line breaks and theme tweaks:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width),
alpha = 0.3) +
theme_plus(axis.title.y = element_text( #<--MAKE THE Y AXIS TITLE HORIZONTAL AND VERTICALLY CENTERED.
vjust = 0.5,
angle = 0)) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal\nlength\n(cm)") + #<--INSERT LINE BREAKS USING \n TO BREAK TITLE ONTO MANY LINES FOR SPACE EFFICIENCY.
scale_continuous_plus(scale = "fill",
name = "Petal width (cm)",
split_name = TRUE,
labels = LETTERS[1:6]
)
This works reasonably well. It's readable, and it's relatively simple to implement if you know how (but that's a big if for many ggplot2 beginners!). But it has limitations:
It steals horizontal space in the graph’s “center row” from your data, much as a right-hand legend would, forcing a similar amount of wasted void space above and below it.
It keeps the label in a relatively inconspicuous location, where readers may not encounter it early.
It doesn’t scale well to longer axis titles or those with long, unbreakable words. If one wants a descriptive, detail-rich title, it’s hard to achieve that with this approach.
To get around these challenges, ggplotplus includes yaxis_title_plus(), which “surgically” moves the y-axis title to above the y-axis line, left-justified to the plot margin:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width),
alpha = 0.3) +
theme_plus() + #<--NO THEME ADJUSTMENTS.
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") + #<--NO HACKS TO TITLE NEEDED.
scale_continuous_plus(scale = "fill",
name = "Petal width (cm)",
split_name = TRUE,
labels = LETTERS[1:6]
) +
yaxis_title_plus() #<--THIS HANDLES EVERYTHING!
This may seem a little radical at first, but this small adjustment has many potential benefits that data visualization experts have been championing for decades:
Prominence: Readers of left-to-right, top-to-bottom languages tend to start in the upper-left of a graphic and scan in a pattern that traces one or more Zs (the so-called “Z reading frame”), so the y-axis title is now one of the first elements many readers will encounter.
Readability: The title is horizontal, making it easier and faster to read for all kinds of people in all kinds of contexts.
Space efficiency: Instead of taking up room in the “data row” of the graph, the title sits in a narrow horizontal band above the plot, freeing up the core of the graphing area for your data to shine.
Pseudo-title element: Many graph design advocates recommend against plot titles because, at best, they tend to do what good y-axis titles and figure captions already do, only worse. However, research does suggest novice graph readers find titles helpful as “footholds.” Moving the y-axis title to where a plot (sub)title might go lets it serve a similar purpose without repeating a separate plot title.
No ambiguity: One hazard with moving the y-axis title to anywhere other than its “normal” location is that it may become unclear what it refers to. By anchoring it above the y-axis line and left-justifying it to the left edge of the y-axis labels, there’s little, if any, ambiguity as to its purpose. This is aided by theme_plus()’s default behavior to place any legend title at the top of the plot to the right rather than to the left of the legend keys.
Moving the y-axis title in this way is backed by decades of advocacy from the data visualization community. So while it’s optional, it’s not a fringe or new idea! Give it a try–you just might find you like it better too!
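If you want something in this spirit without ggplotplus, a rough plain-ggplot2 approximation is to drop the y-axis title and reuse its text as a subtitle aligned with the plot's left edge. This is only a sketch. I'm not suggesting it's how yaxis_title_plus() works internally, and the alignment won't match exactly. The plot.title.position theme setting requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

p_top_title <- ggplot(iris, aes(x = Petal.Length, y = Sepal.Length)) +
  geom_point() +
  labs(x = "Petal length (cm)",
       y = NULL,                          # remove the vertical y-axis title
       subtitle = "Sepal length (cm)") +  # reuse its text above the panel
  theme(plot.title.position = "plot")     # align the (sub)title with the plot's left edge
p_top_title
```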
As noted above, if you're using yaxis_title_plus() and theme_plus() together and your graph has a legend, theme_plus() will place that legend in a horizontal stripe at the top of the graph. By default, this stripe sits in its own “row” above the relocated y-axis title.
Sometimes, this is necessary to keep the two elements from clipping into one another. However, if there's enough room, you can set nudgeTopLegendDown to TRUE inside yaxis_title_plus() to nudge the legend down (by an amount arrived at through some trial and error) so it sits parallel to the relocated y-axis title:
#SAME GRAPH AS BEFORE, BUT NOW RENDERED AT A WIDTH/HEIGHT OF 9/6.
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width),
alpha = 0.3) +
theme_plus() +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
scale_continuous_plus(scale = "fill",
name = "Petal width (cm)",
split_name = TRUE,
labels = LETTERS[1:6]
) +
yaxis_title_plus(nudgeTopLegendDown = TRUE) #<--MOVE TOP LEGEND(S) DOWN INTO SAME ROW AS Y AXIS TITLE.
When mapping categorical data to a non-positional visual channel (i.e., anything other than the x or y axes, such as color, fill, shape, or linetype), the most common approach is to use a legend to clarify which categories in the data correspond to which colors, symbols, styles, etc. As we've seen, legends can be placed efficiently along a reader's likely Z reading frame by arranging them horizontally above or below the plot.
However, this can still be relatively space-inefficient and demanding on the reader relative to another option often recommended by data visualization experts: direct labeling. This is where labels are placed within the plotting area to mark groups in lieu of a legend. This concentrates information, prevents readers from having to jump their gaze back and forth between the plotting area and the legend, and reduces the risk a reader may fail to notice a legend when forming initial conclusions, among other potential benefits.
In practice, direct labeling in ggplot2 is a little cumbersome. Generally, one would use ggplot2’s annotate() or geom_text() functions to manually choose a location for each label, guessing and checking to ensure adequate sizing and placement and to avoid overlapping labels with each other or with underlying points.
To demonstrate, let’s bin petal widths so I have a second categorical variable other than Species to work with:
iris$Petal.Width.Binned = ggplot2::cut_interval(
iris$Petal.Width,
n = 4,
labels = c("Small", "Medium", "Large", "Very large")
) #<--CUT INTO 4 EQUAL-WIDTH GROUPS BY PETAL.WIDTH VALUE
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width.Binned), #<--MOVE TO A DISCRETE FILL MAPPING.
alpha = 0.7) + #<--LESS TRANSPARENT FOR BETTER GROUP CLARITY.
theme_plus(begin_discrete = 0, #<--RETURN TO FULL DISCRETE SCALE.
end_discrete = 1) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
labs(fill = "Petal width") +
yaxis_title_plus() #<--DON'T NUDGE
Here's how one might manually direct-label groups in this plot:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width.Binned),
alpha = 0.7) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
yaxis_title_plus() +
annotate(geom = "text", x = 1.5, y = 6.3, label = "Small\npetal\nwidths") + #<-MANUAL ANNOTATIONS, ONE PER GROUP, WITH AN X AND Y MANUALLY SELECTED THRU TRIAL AND ERROR.
annotate(geom = "text", x = 3.25, y = 6.25, label = "Medium\npetal\nwidths") +
annotate(geom = "text", x = 4.3, y = 7.5, label = "Large\npetal\nwidths") +
annotate(geom = "text", x = 6.8, y = 7, label = "Very large\npetal\nwidths") +
guides(fill = "none") #<-NO LONGER A NEED FOR A LEGEND.
This works, but it's a bit tedious and requires manual trial and error to get the positioning right. Plus, in this case, there's some ambiguity about which labels go with which of the two right-most groups.
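One way to reduce the guesswork in plain ggplot2 is to compute label positions from the data rather than picking them by hand. The sketch below uses a simple "top" placement for each species: the group's mean x, just above the group's highest point. The 0.15 nudge is an arbitrary choice in y-axis units. Labels placed this way can still collide with other groups' points, which is exactly the problem a repelling approach tries to address.

```r
library(ggplot2)

# One label per group: centered on the group's mean x, sitting just above
# the group's maximum y.
label_pos <- data.frame(
  Species      = levels(iris$Species),
  Petal.Length = as.vector(tapply(iris$Petal.Length, iris$Species, mean)),
  Sepal.Length = as.vector(tapply(iris$Sepal.Length, iris$Species, max)) + 0.15
)

p_labels <- ggplot(iris, aes(x = Petal.Length, y = Sepal.Length)) +
  geom_point(aes(colour = Species)) +
  geom_text(data = label_pos,
            aes(label = Species),  # x and y are inherited from the global aes()
            vjust = 0) +           # anchor each label's bottom edge at its y value
  guides(colour = "none")          # the labels replace the legend
p_labels
```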
direct_labels_plus() leverages the geom_label_repel() function from the ggrepel package. The latter allows you to place labels on points, lines, or other data elements that will then repel one another using an algorithm so they don’t bunch up. direct_labels_plus() extends that functionality to include repelling labels away from not only each other but the data elements they might be labeling too so the labels will not (generally) cover up the data. Here’s how to use it:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width.Binned),
alpha = 0.7) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
yaxis_title_plus() +
guides(fill = "none") +
direct_labels_plus(data = iris, #<--PROVIDE THE DATA AGAIN.
x = Petal.Length, #<-SPECIFY THE X AND Y VARIABLES AGAIN.
y = Sepal.Length,
group = Petal.Width.Binned, #<--SPECIFY VARIABLE TO LABEL BY.
placement = "top", #<-SHOULD LABELS BE TOWARDS THE TOP, RIGHT, BOTTOM, OR LEFT EDGE OF A GROUP'S POINTS/LINES?
geometry = "point", #POINTS OR LINES TO LABEL?
key_labels = c("Small" = "Small\npetal\nwidths", #PROVIDE key_labels A NAMED VECTOR TO REPLACE THE CONTENTS OF LABELS USING "OLD NAME" = "NEW NAME" FORMAT.
"Medium" = "Medium\npetal\nwidths",
"Large" = "Large\npetal\nwidths",
"Very large" = "Very large\npetal\nwidths")
)
Here, at least, this configuration works well: there's no ambiguity about which fill colors go with which labels, there's no longer a need for a legend, and there was ample void space within the plotting area to facilitate this approach.
While I wouldn’t necessarily recommend using direct labeling in the context of faceting (“small multiples”), direct_labels_plus() will work with faceting so long as you supply either or both faceting variables to facet_vars:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width.Binned),
alpha = 0.7) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
yaxis_title_plus() +
guides(fill = "none") +
direct_labels_plus(data = iris,
x = Petal.Length,
y = Sepal.Length,
group = Petal.Width.Binned,
facet_vars = "Species", #<--SUPPLY A 1- OR 2-LENGTH CHARACTER VECTOR OF THE VARIABLES YOU'LL USE IN FACETING.
placement = "top",
geometry = "point",
size = 5,
box.padding = 0.5,
max.overlaps = Inf,
segment.size = 1,
min.segment.length = 0,
key_labels = c("Small" = "Small\npetal\nwidths",
"Medium" = "Medium\npetal\nwidths",
"Large" = "Large\npetal\nwidths",
"Very large" = "Very large\npetal\nwidths")
) +
facet_grid(. ~ Species) #<--FACET INTO COLUMNS BY SPECIES
direct_labels_plus() will also work with line-type graphs:
##COMPLETELY DIFFERENT DATA TO SHOW DIRECT LABELLING ON A LINE GRAPH
ggplot(Orange, aes(
x = age,
y = circumference,
color = factor(Tree, levels = sort(as.numeric(levels(Tree)))) #<--REORDER TREES AS 1-5 (ORANGE'S DEFAULT LEVEL ORDER IS 3, 1, 5, 2, 4).
)) +
geom_line() +
theme_plus() +
yaxis_title_plus() +
scale_continuous_plus(scale = "x", name = "Age (days)") + #<--ORANGE'S AGE IS IN DAYS SINCE 1968-12-31.
scale_continuous_plus(scale = "y", name = "Circumference (mm)") + #<--TRUNK CIRCUMFERENCE IS IN MM.
labs(color = "Tree") +
scale_color_discrete(guide = "none") +
direct_labels_plus(
data = Orange,
x = age,
y = circumference,
group = Tree,
placement = "right",
geometry = "line",
size = 5,
box.padding = 0.5,
max.overlaps = Inf,
segment.size = 1,
min.segment.length = 0,
key_labels = paste("Tree", 1:5) #<--IF PROVIDING AN UNNAMED VECTOR, THE GROUPS WILL BE RELABELED IN ALPHANUMERIC ORDER.
)
This functionality is relatively new and highly experimental; its implementation will likely continue to change in the short term, including as the ggrepel package, on which direct_labels_plus() depends, continues to develop.
More practically, though, this functionality only makes good sense to use in situations where a legend would otherwise be needed, groups aren’t already self-evident, there’s void space enough within the plotting area for sufficiently sized labels, the labels won’t overlap the data elements, and a standard coordinate system is being used.
While this can work with transformed data, it determines label locations on the data it’s provided, not on any transformed versions ggplot2 might subsequently create. So, for example, if you intend to plot log-transformed y data, pre-transform them before providing them to direct_labels_plus(). Similarly, this will not work with ggplot2::geom_smooth() unless you provide fitted values from the smoothing function to direct_labels_plus().
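Here's a sketch of what "provide fitted values" could look like. It assumes the smooth is a straight-line fit per tree, matching geom_smooth(method = "lm"). Other smoothing methods would need their own fitted values. The fitted values are computed in base R, and the plot shows that geom_smooth() draws the same lines.

```r
library(ggplot2)

# Assumption: one straight-line (lm) fit per tree, as with geom_smooth(method = "lm").
orange <- as.data.frame(Orange)   # a plain data frame copy of the built-in data

fitted_df <- do.call(rbind, lapply(split(orange, orange$Tree), function(d) {
  d$fitted <- fitted(lm(circumference ~ age, data = d))  # fitted line for this tree
  d
}))

p_fit <- ggplot(fitted_df, aes(x = age, y = circumference, colour = Tree)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x)  # draws the same fitted lines
p_fit
```

You'd then pass data = fitted_df and y = fitted (rather than circumference) to direct_labels_plus(). The labels would then be positioned relative to the drawn lines rather than the raw points.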
At present, this function is not designed to work with polygonal data elements, geom_ribbon(), geom_segment(), or geom_curve(), though support for some of these might be added in the future.
Notably, this function is not designed to work with non-standard coordinate systems like those produced by coord_flip() or coord_polar(), though you could possibly have success using it if you provide transformed data that make sense in the new coordinate systems.
If you can't initially get a satisfying configuration of labels, in particular because they overlap the data you're trying to show or label, I recommend cycling through all available options for placement, as some might be more workable than others for a given graph. You may also consider adjusting the size parameter down or the box.padding parameter up (see ?ggrepel::geom_label_repel for details), but keep accessibility in mind as you do so. adj_fact can also be adjusted, similarly to ggrepel::geom_label_repel()'s nudge_x and nudge_y, to alter the target positions for labels in an attempt to place them more favorably. Lastly, consider raising ggrepel::geom_label_repel()'s force value to increase the intensity of the repulsion.
However, none of these options (nor this functionality overall) is guaranteed to work well in every case, especially when the density of data elements is high, the categories to label are numerous, and/or there's little void space in the plotting area. Discretion is advised!
One (arguably) controversial opinion coded into ggplotplus’s theme_plus() function is the total removal of gridlines. While some readers rely on them or expect them, many (probably the majority!) tend to find them distracting or visually cluttering in most situations. In fact, data viz experts often advise they not be used most of the time.
If you'd like to restore them, however, you can. You could add them manually through theme_plus(), which accepts the usual theme() arguments. But there's an easier (and more opinionated) option: gridlines_plus(). This function selectively reintroduces only the major gridlines (not minor ones), only in directions mapped to numeric variables (not discrete ones), and renders them very faintly:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width),
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
scale_continuous_plus(scale = "fill",
name = "Petal width (cm)",
split_name = TRUE,
labels = LETTERS[1:6]
) +
yaxis_title_plus(nudgeTopLegendDown = TRUE) +
gridlines_plus() #<--REINTRODUCE THOUGHTFUL GRIDLINES
These gridlines default to "gray90". Prior research suggests gridlines this faint relative to the background color will be just visible enough for those who need them while being faint enough to easily fade into the background for everyone else.
If you want to tweak them, gridlines_plus() allows adjustments to linetype, color, and linewidth for your convenience, so there's no need to use theme_plus() for those types of adjustments!
If I had just one continuous axis, gridlines_plus() would automatically detect this and draw the gridlines in only the one relevant direction:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Species)) + #<--MAKE DISCRETE
geom_boxplot() + #<--CHANGE GEOM TYPE.
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
theme_plus() +
yaxis_title_plus() +
gridlines_plus()
Because gridlines are primarily beneficial for helping a reader discern exact values, there's no need for them when an axis is discrete and thus has no exact values to read! Thus, the presentation is simplified by omitting them in that direction automatically.
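For comparison, here's roughly how you might get a similar look in plain ggplot2 with theme(): faint major gridlines in the continuous direction only, with minor gridlines removed. This is a sketch, not gridlines_plus()'s internals. theme_classic() stands in for a gridline-free base theme. The linewidth argument of element_line() requires ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Continuous x (Petal.Length), discrete y (Species): gridlines only help along x,
# so only the major vertical gridlines are restored.
p_grid <- ggplot(iris, aes(x = Petal.Length, y = Species)) +
  geom_boxplot() +
  theme_classic() +   # starts with no gridlines at all
  theme(panel.grid.major.x = element_line(colour = "gray90", linewidth = 0.5),
        panel.grid.minor.x = element_blank())
p_grid
```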
Sometimes you might want very subtle gridlines and have more than one continuous axis, but already know you don't want gridlines along one of them. For example, perhaps you have time data on the x-axis and, even though these data are technically continuous, you don't want gridlines along that dimension. No worries: set the notx or noty parameter inside gridlines_plus() to TRUE:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Species)) +
geom_boxplot() +
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
theme_plus() +
yaxis_title_plus() +
gridlines_plus(notx = TRUE) #<--REJECT GRIDLINES ALONG THIS SCALE
In this case, this suppresses all gridlines, because there is no other continuous axis.
ggplotplus
Faceting (splitting a plot into small multiples based on one or more discrete variables) is a core ggplot2 feature, and it generally works with the tools provided by ggplotplus:
#EXAMPLE GRAPH TO SHOW FACETING FEATURES:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(
scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus() +
facet_grid(. ~ Species) #<--MAKE ONE SMALL MULTIPLE (PANEL) PER SPECIES (ACROSS COLUMNS)
As this example shows, many features in ggplotplus are compatible with faceting, though some may require some thought.
For example, using yaxis_title_plus() with facets works—the function places the relocated y-axis title below the facet strip labels. However, for some, this could look a little confusing or awkward. In general, I don’t recommend using yaxis_title_plus() when you’re faceting and have the facet strip labels at the top of the plot.
However, there are three potential workarounds. The first is simply to facet by rows instead of columns:
###INCREASE THE FIGURE HEIGHT A LITTLE TO MAKE ROOM.
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(
scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus() +
facet_grid(Species ~ .) #<--FACET ACROSS ROWS INSTEAD.
Another option would be to relocate the facet strip labels to the bottom of the graph so they aren't competing with the relocated y-axis title for space:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(
scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus() +
facet_grid(. ~ Species,
switch = "x") #<--MOVE STRIP LABELS TO BOTTOM.
This honestly works better in some ways anyhow!
A third option would be to suppress the facet strip labels and instead only retain the legend, since I’m mapping Species here to both panel and color redundantly:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
theme_plus(strip.text = element_blank()) +
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(
scale = "y",
name = "Sepal length (cm)") +
yaxis_title_plus() +
gridlines_plus() +
facet_grid(. ~ Species)
Faceting is a relatively complex feature of ggplot2. ggplotplus's features are mostly designed to work with it, but some creativity (and future improvements) may still be needed!
Many graph design guides encourage designers to make the message of their graphs clear: “Here is what you should be looking for/comparing in this graph.” This can be a challenge, especially when the data are messy or numerous and there's also an obligation to show them all for transparency or comparison purposes.
In these instances, color can be a useful visual channel (when used thoughtfully) to draw a reader’s attention to one or more subsets of the data while retaining but visually de-emphasizing the rest, often via de-saturation (graying). While using color in this way in ggplot2 is not overly difficult, it does require some understanding of the package’s underlying systems.
scale_focus_plus() is a wrapper around ggplot2's scale_fill_manual() and scale_colour_manual() functions that makes putting a graph into “focus mode” approachable. Returning to our scatterplot from earlier:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.6) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)",
thin.labels = TRUE) +
gridlines_plus() +
yaxis_title_plus()
We could ask: What if we wanted the reader to focus on just one species here? Or perhaps on the comparison between two species? scale_focus_plus() makes generating a version that serves that purpose as simple as a function call:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.6) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)",
thin.labels = TRUE) +
gridlines_plus() +
yaxis_title_plus() +
scale_focus_plus(aes = "fill", #<--ARE WE USING COLOR OR FILL TO DRAW FOCUS?
group_var = iris$Species, #THE DISCRETE VARIABLE DETERMINING THE GROUPS
focal_groups = "versicolor" #THE GROUP(S) TO DRAW FOCUS TO
)
In this version, our eyes are immediately drawn to the versicolor group. Vivid colors, whether bright or dark, are especially good at drawing focus, whereas desaturated, muted colors in the middle of the luminance scale tend to “recede into the background,” where they are easily ignored until we choose to engage with them.
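Conceptually, “focus mode” amounts to a manual fill scale in which the focal group(s) get a vivid color and every other group gets a muted gray. The sketch below builds such a palette by hand with plain ggplot2, only to illustrate the idea. focus_values() is a hypothetical helper written for this guide (not part of ggplotplus), and the two colors are arbitrary choices, not scale_focus_plus()’s actual defaults.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplotplus): a vivid color for the focal
# group(s) and one muted gray for every other group, named by group level.
focus_values <- function(groups, focal, focal_col = "#E66100",
                         other_col = "grey75") {
  lv <- levels(factor(groups))
  setNames(ifelse(lv %in% focal, focal_col, other_col), lv)
}

vals <- focus_values(iris$Species, focal = "versicolor")

# Shape 21 has a separate interior, so it can take the fill aesthetic.
p_focus <- ggplot(iris, aes(x = Petal.Length, y = Sepal.Length)) +
  geom_point(aes(fill = Species), shape = 21, alpha = 0.6) +
  scale_fill_manual(values = vals)
```

Passing a vector to focal (e.g., c("versicolor", "virginica")) would highlight several groups at once; scale_focus_plus() handles this bookkeeping for you.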
We can also pass additional arguments through to ggplot2::scale_fill_manual() here, such as custom legend labels and a legend title:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.6) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)",
thin.labels = TRUE) +
gridlines_plus() +
yaxis_title_plus() +
scale_focus_plus(aes = "fill",
group_var = iris$Species,
focal_groups = "versicolor",
labels = c("I. setosa", "I. versicolor", "I. virginica"), #<--REPLACE LABELS
name = "" #<--BLANK LEGEND TITLE.
)
Setting guide = "none" instead would remove the legend entirely, which could be appropriate in graphs where a different visual channel already communicates group differences (e.g., a bar plot in which the x axis does this).
By default, non-focal groups are not differentiated from one another. This is fine when another visual channel differentiates them (or when identifying distinct non-focal groups is unnecessary). However, if you want them to be differentiated, set diff_nonfocal to TRUE:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.6) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)",
thin.labels = TRUE) +
gridlines_plus() +
yaxis_title_plus() +
scale_focus_plus(aes = "fill",
group_var = iris$Species,
focal_groups = "versicolor",
labels = c("I. setosa", "I. versicolor", "I. virginica"),
name = "",
diff_nonfocal = TRUE #<--USE DIFFERENT SHADES FOR THE DIFFERENT NON-FOCAL GROUPS.
)
This allows you to create “focus mode” versions of plots that remain accessible in grayscale and to readers with color-vision impairments, so long as you preserve the differences in luminance that the function aims for.
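Whether a set of colors survives conversion to grayscale depends largely on how much their luminance differs. One way to check is to compute each color’s relative luminance using the WCAG 2 definition, as sketched below with base R’s col2rgb(). relative_luminance() and contrast_ratio() are hypothetical helpers written for this guide, not ggplotplus functions, and the example grays are arbitrary.

```r
# Hypothetical helper: WCAG 2 relative luminance (0 = black, 1 = white).
relative_luminance <- function(cols) {
  rgb <- grDevices::col2rgb(cols) / 255   # 3 x n matrix, values in [0, 1]
  # Undo sRGB gamma encoding to get linear-light channel values.
  lin <- ifelse(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055)^2.4)
  # Weighted sum of the red, green, and blue rows for each color (column).
  colSums(lin * c(0.2126, 0.7152, 0.0722))
}

# Hypothetical helper: WCAG contrast ratio, from 1 (same luminance) to 21.
contrast_ratio <- function(col1, col2) {
  l <- sort(relative_luminance(c(col1, col2)), decreasing = TRUE)
  (l[1] + 0.05) / (l[2] + 0.05)
}

relative_luminance(c("grey30", "grey70"))
contrast_ratio("grey30", "grey70")
```

WCAG 2.1 uses 3:1 as the minimum contrast for graphical objects against adjacent colors, which can serve as a loose benchmark when judging whether two groups will stay distinguishable in grayscale.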
You can specify multiple focal groups, and you can provide custom colors for these if you prefer:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.6) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)",
thin.labels = TRUE) +
gridlines_plus() +
yaxis_title_plus() +
scale_focus_plus(aes = "fill",
group_var = iris$Species,
focal_groups = c("versicolor", "virginica"), #<--MAKE THIS A VECTOR AND ADD GROUPS
labels = c("I. setosa", "I. versicolor", "I. virginica"),
name = "",
custom_focal = c("versicolor" = "turquoise", "virginica" = "orange") #<--A NAMED VECTOR LINKS COLORS TO VALUES.
)
At present, scale_focus_plus() does not interact with the package’s coaching-message system. As such, you may receive coaching messages about issues that scale_focus_plus() is already addressing; feel free to ignore or disable these in those instances.
scale_focus_plus() also controls only the color and fill aesthetics, although alpha, size, linewidth, and perhaps other aesthetics could all be used in a similar way to draw focus. No variants of scale_focus_plus() that access those other aesthetics are planned at this time, but this may be revisited in future versions.
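Until then, plain ggplot2 can already draw focus through another aesthetic. The sketch below does so with alpha via ggplot2’s scale_alpha_manual(); it is independent of scale_focus_plus(), and the two alpha values (1 and 0.15) are arbitrary choices.

```r
library(ggplot2)

focal <- "versicolor"
species_levels <- levels(iris$Species)

# Fully opaque points for the focal group, mostly transparent ones otherwise.
alpha_values <- setNames(ifelse(species_levels %in% focal, 1, 0.15),
                         species_levels)

p_alpha <- ggplot(iris, aes(x = Petal.Length, y = Sepal.Length)) +
  geom_point(aes(alpha = Species), size = 2) +
  scale_alpha_manual(values = alpha_values, guide = "none")
# Print p_alpha to display the graph.
```

The same pattern works with scale_size_manual() or scale_linewidth_manual() if size or line width suits your graph better.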
Lastly, scale_focus_plus() is designed to work with discrete (categorical) data only. However, continuous data could be binned into discrete categories to make them compatible; see ?scale_focus_plus for details and an example.
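As a minimal sketch of such binning, base R’s cut() converts a continuous variable into a labeled factor. The break points and labels below are arbitrary illustrations; ?scale_focus_plus remains the authoritative source for the package’s own example.

```r
# Copy iris and add a binned (discrete) version of petal length.
iris_binned <- iris
iris_binned$petal_bin <- cut(iris_binned$Petal.Length,
                             breaks = c(0, 2, 5, Inf),  # (0,2], (2,5], (5,Inf)
                             labels = c("short", "medium", "long"))

table(iris_binned$petal_bin)
```

You could then map fill = petal_bin (with data = iris_binned) and pass group_var = iris_binned$petal_bin and, say, focal_groups = "long" to scale_focus_plus(), mirroring the Species examples above.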
Thus far, this guide has mostly focused on graphs that use color (hue, luminance, and saturation) to communicate differences in the data. ggplotplus is designed to revamp how ggplot2 approaches color to yield graphs that are more accessible and more interpretable while still having the aesthetic and engagement benefits of color.
However, any graph that relies on color to communicate differences can only be so accessible and interpretable. A small number of people are completely colorblind, many more have some degree of color-vision deficiency, and color perception weakens as we age. Additionally, many people still read or view graphs in contexts in which colors may not be readily distinguished (e.g., grayscale printouts).
As such, data visualization advocates remind us that visual channels (ways of communicating difference in a graph) other than color are available to us, including many already built into ggplot2!
One of these, point shape, is a useful and classic channel for communicating difference in scatterplots like those shown in this guide so far.
However, ggplot2 makes available only the same 26 shapes (pch values 0 to 25) offered by base R:
pch_values = 0:25
#THE SHAPES AVAILABLE IN R/GGPLOT2
plot(pch_values,
rep(1, length(pch_values)),
pch = pch_values, cex = 2)
While there is certainly some variation among these shapes, many are perceptually similar, making them difficult to distinguish quickly. Research has suggested that humans distinguish shapes more readily when the shapes differ strongly from one another along three axes:
Openness (how “full” or “empty” they are)
Spikiness (how “pointy” and “angular” they are versus how “rounded” they are) and
Intersectionality (how “crossed,” if at all, their interior elements are)
Notably, only the last five of R’s built-in shapes (pch 21 to 25) have separate outline (stroke) and interior (fill) channels, allowing them to take separate outline and fill colors. Those same five shapes are also relatively similar in openness and intersectionality, making them harder to distinguish from one another. This limits the versatility of shape as a visual channel.
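The separate outline and fill channels of these five shapes are easy to see in base R graphics, where col sets the outline and bg sets the interior. A quick illustration (the colors are arbitrary):

```r
fillable_pch <- 21:25   # the only base R shapes with separate outline and fill

# col controls the outline (stroke); bg controls the interior (fill).
plot(seq_along(fillable_pch), rep(1, length(fillable_pch)),
     pch = fillable_pch, cex = 4, lwd = 2,
     col = "black", bg = "orange",
     xlim = c(0.5, 5.5), ylim = c(0, 2),
     axes = FALSE, xlab = "pch", ylab = "")
axis(1, at = seq_along(fillable_pch), labels = fillable_pch)
```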
As such, ggplotplus introduces geom_point_plus(), a variant of ggplot2’s geom_point() layer that provides access to nine new, intentionally crafted shapes that differ from one another as much as possible along the three axes described above while also being able to bear separate stroke and fill colors:
geom_point_plus_shapes()
These shapes can be used to communicate difference whenever using color alone might be undesirable, insufficient, or unnecessary:
set.seed(123)
ggplot(
iris[sample(1:nrow(iris), 30, replace = FALSE), ], #<--RESTRICT DATA VOLUME TO REDUCE OVERPLOTTING FOR ILLUSTRATION.
mapping = aes(x = Petal.Length, y = Sepal.Length)
) +
geom_point_plus(
mapping = aes(shape = factor(round(Petal.Length))), #<--MAP SHAPE TO ANY CATEGORICAL VARIABLE WITH 9 OR FEWER LEVELS!
legend_title = "Petal length (binned)" #<--SPECIFY NEW LEGEND TITLE HERE FOR CONVENIENCE.
) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus()
As the example above shows, you can specify a new shape legend title inside geom_point_plus() using the legend_title parameter for convenience. You can also use the chosen_shapes parameter to specify which exact shapes from the new palette you want to use:
set.seed(123)
ggplot(
iris[sample(1:nrow(iris), 30, replace = FALSE), ],
mapping = aes(x = Petal.Length, y = Sepal.Length)
) +
geom_point_plus(
mapping = aes(shape = factor(round(Petal.Length))),
legend_title = "Petal length (binned)",
chosen_shapes = c("oval", "lotus", "plus", "flower", "octagon", "economy", "waffle") #<--SPECIFY SPECIFIC SHAPES YOU WANT.
) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus()
As previously noted, all these shapes have separate fill and color aesthetics, so these color-based aesthetics can be set to constants or even mapped to variables (although I wouldn’t necessarily recommend doing the latter in many cases):
set.seed(123)
ggplot(
iris[sample(1:nrow(iris), 30, replace = FALSE), ],
mapping = aes(x = Petal.Length, y = Sepal.Length)
) +
geom_point_plus(
mapping = aes(shape = factor(round(Petal.Length))),
legend_title = "Petal length (binned)",
chosen_shapes = c("oval", "lotus", "plus", "flower", "octagon", "economy", "waffle"),
fill = viridis::viridis(n = 1, alpha = 0.5, begin = 0.5) #<--SET COLOR/FILL TO CONSTANTS HERE, OR MAP THEM TO VARIABLES INSIDE aes()!
) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus()
There is also geom_jitter_plus(), if you’d like to add a little random noise to your points’ locations (e.g., to reduce overplotting):
set.seed(123)
ggplot(
iris[sample(1:nrow(iris), 30, replace = FALSE), ],
mapping = aes(x = Petal.Length, y = Sepal.Length)
) +
geom_jitter_plus(
mapping = aes(shape = factor(round(Petal.Length))),
legend_title = "Petal length (binned)",
chosen_shapes = c("oval", "lotus", "plus", "flower", "octagon", "economy", "waffle"),
fill = viridis::viridis(1,0.5,0.5),
width = 0.5, #<--USE WIDTH, HEIGHT, AND SEED TO CONTROL THE JITTERING.
height = 0.35
) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus()
Note that, at present, shape cannot be mapped globally (i.e., inside ggplot()) to access these shapes; it must be mapped locally inside geom_point_plus() (or geom_jitter_plus(), as above). Functionality to access these shapes in other ways and in other geoms may be added in the future.
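Because ggplotplus adds nine new shapes, and the comment in the first geom_point_plus() example recommends categorical variables with nine or fewer levels, a quick check before mapping can save you a confusing result. check_shape_levels() is a hypothetical helper written for this guide; its default limit of nine comes from that comment and does not account for any base R shapes you might add via chosen_shapes.

```r
# Hypothetical helper (not part of ggplotplus): stop early if a variable
# has more levels than there are distinct shapes to map it to.
check_shape_levels <- function(x, max_levels = 9) {
  n <- nlevels(factor(x))
  if (n > max_levels) {
    stop("Variable has ", n, " levels; at most ", max_levels,
         " distinct shapes are available.", call. = FALSE)
  }
  invisible(n)
}

# The rounded petal lengths used in the examples above pass the check.
n_bins <- check_shape_levels(round(iris$Petal.Length))
n_bins
```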
ggplotplus Shapes
To respect the legacy of R graphics, geom_point_plus() can also use the five base R shapes (pch 21:25) that take both fill and color aesthetics. These are accessible via their standard pch numbers (e.g., 23) or by name: “circle”, “square”, “diamond”, “triangle_up”, and “triangle_down”. They can be mixed and matched with the other shapes added by ggplotplus:
set.seed(123)
ggplot(
iris[sample(1:nrow(iris), 30, replace = FALSE), ],
mapping = aes(x = Petal.Length, y = Sepal.Length)
) +
geom_point_plus(
mapping = aes(shape = factor(round(Petal.Length))),
legend_title = "Petal length (binned)",
chosen_shapes = c("oval", "21", "plus", "triangle_up", "octagon", "economy", "diamond"), #<--YOU CAN ACCESS BASE R SHAPES 21-25 VIA NAME OR NUMBER
fill = viridis::viridis(1,0.5,0.5)
) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus()
ggplotplus For a Session
More radically, you can add your own shapes for use in scatterplots!
The first step is to generate a list of vertices for your shape. Shapes like those in ggplotplus are essentially “connect-the-dots puzzles,” in which consecutive vertices on a coordinate plane are joined in order. Any subsequent polygons lying within the first are treated as “holes” in the base shape. Coordinates must lie within [-1, 1] in both the horizontal and vertical directions, and shapes should extend roughly 0.4 units from the center (a radius of about 0.4, as in the example below). The base shape should have a piece value of 1; all vertices in a given hole should share a piece value greater than 1 that is unique to that hole.
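Before registering a shape, it can help to check its vertex data frame against these rules. check_shape_vertices() is a hypothetical helper written for this guide; it checks only the rules stated above, plus a basic sanity check of my own (each piece needs at least three vertices to form a polygon), not whatever add_shape_plus() validates internally. The example shape is a square with a square hole.

```r
# Hypothetical helper (not part of ggplotplus): returns a character vector
# describing any rule violations; character(0) means none were found.
check_shape_vertices <- function(shape) {
  stopifnot(is.data.frame(shape), all(c("x", "y", "piece") %in% names(shape)))
  problems <- character(0)
  if (any(abs(shape$x) > 1 | abs(shape$y) > 1)) {
    problems <- c(problems, "some coordinates fall outside [-1, 1]")
  }
  if (!any(shape$piece == 1)) {
    problems <- c(problems, "no base shape (piece == 1) found")
  }
  if (any(shape$piece < 1)) {
    problems <- c(problems, "piece values must be 1 (base) or > 1 (holes)")
  }
  if (any(table(shape$piece) < 3)) {
    problems <- c(problems, "each piece needs at least 3 vertices")
  }
  problems
}

# A square (piece 1) with a smaller square hole (piece 2).
square_with_hole <- data.frame(
  x     = c(-0.4, 0.4, 0.4, -0.4, -0.15, 0.15, 0.15, -0.15),
  y     = c(0.4, 0.4, -0.4, -0.4, 0.15, 0.15, -0.15, -0.15),
  piece = c(1, 1, 1, 1, 2, 2, 2, 2)
)

check_shape_vertices(square_with_hole)   # character(0): no problems found
```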
For example, a basic 5-pointed star, with no holes, might have coordinates like these:
(test_star = data.frame(
x = c(
0.000,
0.118,
0.380,
0.190,
0.235,
0.000,
-0.235,
-0.190,
-0.380,
-0.118
),
y = c(
0.400,
0.124,
0.124,
-0.047,
-0.324,
-0.153,
-0.324,
-0.047,
0.124,
0.124
),
piece = 1
))
I can register this shape with ggplotplus for the session using add_shape_plus():
add_shape_plus(name = "star", #<--THE NAME TO USE TO ACCESS THIS SHAPE.
shape = test_star
)
Once registered, you can refer to this shape by name in any later calls:
set.seed(123)
ggplot(
iris[sample(1:nrow(iris), 30, replace = FALSE), ],
mapping = aes(x = Petal.Length, y = Sepal.Length)
) +
geom_point_plus(
mapping = aes(shape = factor(round(Petal.Length))),
legend_title = "Petal length (binned)",
chosen_shapes = c("oval", "21", "plus", "star", "octagon", "economy", "diamond"), #<--YOU CAN NOW ACCESS THE STAR SHAPE, IF IT'S BEEN REGISTERED
fill = viridis::viridis(1,0.5,0.5)
) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus()
Feel free to get creative and make highly distinctive shapes! In my experience, generative AI models are very good at turning shape ideas into compatible lists of vertices and holes, if given the parameters described above.
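Regular shapes can also be computed rather than typed. The sketch below generates the vertices of a regular star in the x/y/piece format described above, starting at the top point and proceeding clockwise. make_star() is a hypothetical helper written for this guide; its outer points and bottom inner point match the hand-entered test_star, but its other inner vertices differ slightly.

```r
# Hypothetical helper (not part of ggplotplus): vertices of a regular star.
make_star <- function(n_points = 5, r_outer = 0.4,
                      r_inner = r_outer * (3 - sqrt(5)) / 2) {
  # The default inner radius gives a regular five-pointed star (pentagram);
  # supply r_inner yourself for other numbers of points.
  k <- 0:(2 * n_points - 1)
  angles <- pi / 2 - k * pi / n_points            # clockwise from 12 o'clock
  radii <- ifelse(k %% 2 == 0, r_outer, r_inner)  # alternate outer/inner
  data.frame(x = round(radii * cos(angles), 3),
             y = round(radii * sin(angles), 3),
             piece = 1)
}

star5 <- make_star()
star6 <- make_star(n_points = 6, r_inner = 0.2)
```

Either result could then be registered like the hand-entered star, e.g., add_shape_plus(name = "star6", shape = star6), where "star6" is just an example name.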
Note that shape registrations last only for the current session; you will need to re-register your shapes in each new session.
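One way to make re-registering painless is to save a shape’s vertices to a file once and re-register them from that file at the start of each session, for example in a setup script. saveRDS() and readRDS() are base R; add_shape_plus() is called with the name and shape arguments shown above. The file location is an assumption: tempdir() is used only to keep the example self-contained, so substitute a folder in your project for the file to persist.

```r
# Vertices of the custom star from above.
test_star <- data.frame(
  x = c(0.000, 0.118, 0.380, 0.190, 0.235, 0.000, -0.235, -0.190, -0.380, -0.118),
  y = c(0.400, 0.124, 0.124, -0.047, -0.324, -0.153, -0.324, -0.047, 0.124, 0.124),
  piece = 1
)

# Save the vertices once (use a project folder instead of tempdir()).
shape_file <- file.path(tempdir(), "star_shape.rds")
saveRDS(test_star, shape_file)

# At the start of each new session, re-register the saved shape.
if (requireNamespace("ggplotplus", quietly = TRUE)) {
  ggplotplus::add_shape_plus(name = "star", shape = readRDS(shape_file))
}
```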
One somewhat unusual function in ggplot2 is coord_flip(), which flips the graph’s coordinate system so that the x axis runs vertically and the y axis runs horizontally (note that this function is now deprecated in ggplot2 version 4). ggplotplus’s tools try to work with this function if you use it:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(
scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus() +
coord_flip() #<--CHECK THIS OUT: THE Y AXIS TITLE IS REALLY THE X AXIS TITLE!
Here’s a “final” version of the scatterplot using (almost) all of ggplotplus’s tools together:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(
scale = "y",
name = "Sepal length (cm)") +
scale_fill_discrete(
labels = c("I. setosa", "I. versicolor", "I. virginica")
) +
theme_plus(
legend.text = element_text(face = "italic"),#<--MAKE THE TAXON NAMES ITALIC LIKE THEY OUGHT TO BE!
begin_discrete = 0,
end_discrete = 1) +
yaxis_title_plus() +
gridlines_plus()
I’d say that looks pretty nice; at worst, it’s an evidence-based starting point upon which to build something truly impactful and distinctive!
As with any ggplot2 graph, I recommend using ggplot2::ggsave() to export all your graphs from R! Exporting graphs from RStudio’s Plots pane makes it harder to control the resolution, width, and height of your exported graphs and to ensure consistency from one version of a graph to the next.
Specifically, I highly recommend passing ggsave() a dpi of at least 300 (its default), as well as a width and height suited to the graph’s intended use.
For example, the full page width in a typical scientific journal is approximately 7 inches, and a typical aspect ratio (width:height) is approximately 1.33, yielding a height of about 5.26 inches (7 / 1.33). So, a reasonable ggsave() call might look like this:
###HYPOTHETICAL GGSAVE COMMAND
p = ggplot(iris, #<--ANY GRAPH YOU'VE BUILT, STORED IN AN OBJECT.
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point()
ggplot2::ggsave(filename = "Myplot.png",
plot = p,
dpi = 600,
width = 7,
height = 5.26,
units = "in") #<--INCHES ARE THE DEFAULT; STATED HERE FOR CLARITY.
However, as any experienced ggplot2 user knows, a graph may look great in RStudio’s Plots pane at one set of dimensions and then look terrible when saved at a different set of dimensions! In particular, font sizes and line widths often do not scale well across a range of output dimensions.
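The dimensions in the ggsave() example can be derived rather than hard-coded, which also makes it easy to see how many pixels a given dpi produces (pixels = inches x dots per inch). This is plain base R arithmetic using the values from the text above; the resulting width_in and height_in can be passed straight to ggsave()’s width and height arguments.

```r
width_in     <- 7                                # full page width (inches)
aspect_ratio <- 1.33                             # width : height
height_in    <- round(width_in / aspect_ratio, 2)
height_in                                        # 5.26

dpi <- 600
# Saved raster size in pixels = size in inches x dots per inch.
pixels <- round(c(width = width_in, height = height_in) * dpi)
pixels                                           # width = 4200, height = 3156
```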
To address this font- and line-scaling problem, ggplotplus’s theme_plus() function exposes two inputs: export_width and export_height. When using them, specify the final dimensions at which you intend to export your graph (e.g., 7 x 5.26 inches), and theme_plus() will adjust the default font and line sizes (though not any custom sizes you’ve provided!) so that they look (reasonably) good at those output dimensions:
###PLOTTING WINDOW NOW RESCALED TO A WIDTH OF 7 INCHES X 5.26 INCHES
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(
scale = "y",
name = "Sepal length (cm)") +
scale_fill_discrete(
labels = c("I. setosa", "I. versicolor", "I. virginica")
) +
theme_plus(
legend.text = element_text(face = "italic"),
begin_discrete = 0,
end_discrete = 1,
export_width = 7, #<--SET THE OUTPUT WIDTH AND HEIGHT YOU EXPECT AND SIZES OF ELEMENTS WILL ADJUST SOMEWHAT ACCORDINGLY (HERE, A LITTLE SMALLER)
export_height = 5.26) +
yaxis_title_plus() +
gridlines_plus()
While ggplotplus goes out of its way to align your graphs with data visualization best practices by default, there are certain aspects it cannot necessarily enforce for you. However, it can notice when these aspects are suboptimal, gently let you know, and give you some ideas for what to do differently.
For example, humans struggle to perceive the differences between colors when a plot contains more than about 7 to 12 of them. Beyond this threshold, at least one pair of colors will be difficult for many readers to tell apart. In the rare cases where so many discrete levels are needed, color is a suboptimal visual channel.
If you try to use this many, ggplotplus will (politely) recommend different courses of action:
ggplot(data = iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(
fill = factor(Petal.Width)), #<--THIS WILL REQUEST A VERY LARGE NUMBER OF DISCRETE COLORS.
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1)
Note: You've mapped color and/or fill to a discrete variable with > 7 levels. Even when using a color palette designed for maximum contrast and discernability (such as viridis), most humans are not able to readily distinguish all colors from one another in any palette beyond about 7 colors. Consider using a different visual channel, filtering or consolidating to a smaller number of levels, or layering on a second visual channel (such as shape or line type). Alternatively, consider shuffling the color values to make dissimilar colors appear nearer to one another to facilitate comparisons. Also, consider using scale_focus_plus() or direct_labels_plus() in these circumstances. Set enable_coaching to FALSE to disable these messages.
Note: For your x and y scale(s), you didn't apparently set a title different than the name of the column mapped to that scale. This is not generally recommended. Column names tend to be machine- rather than human-readable, lack typical spacing, capitalization, and punctuation usage, and they tend to lack units. Consider using ggplot2::labs() to provide these scales with new, human-readable and informative titles. Set enable_coaching to FALSE to disable these messages.
You’ll also notice that it encourages you to give your scales more informative, human-readable titles!
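You can check ahead of time how many colors a mapping would request, such as factor(Petal.Width) in the example above. And if you do keep many levels, the coaching message’s suggestion to shuffle the colors can be sketched by reordering a sequential palette so that neighboring levels receive colors from opposite ends. interleave_order() is a hypothetical helper written for this guide; hcl.colors() is base R’s grDevices function, and the reordered palette could be supplied via, e.g., scale_fill_manual(values = shuffled_pal).

```r
# How many colors would fill = factor(Petal.Width) ask for?
n_levels <- nlevels(factor(iris$Petal.Width))
n_levels > 7   # TRUE: more than most readers can tell apart

# Hypothetical helper: palette indices reordered as 1, n, 2, n - 1, ...
# so that adjacent levels get colors from opposite ends of the palette.
interleave_order <- function(n) {
  idx <- integer(0)
  lo <- 1L
  hi <- n
  while (lo <= hi) {
    idx <- c(idx, lo)
    if (lo != hi) idx <- c(idx, hi)
    lo <- lo + 1L
    hi <- hi - 1L
  }
  idx
}

pal <- grDevices::hcl.colors(n_levels, palette = "viridis")
shuffled_pal <- pal[interleave_order(n_levels)]
```

The trade-off is that the colors no longer follow the order of the levels, so this suits unordered categories better than ordered ones like these petal widths.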
If these coaching messages get too tedious, though, you can set enable_coaching to FALSE in yaxis_title_plus(), theme_plus(), or gridlines_plus() to turn all these checks and messages off. You can also disable them for the rest of the session by setting the corresponding R option: options(ggplotplus.enable_coaching = FALSE).
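If you want coaching off only temporarily, note that base R’s options() returns the previous value of any option it changes, which makes restoring it easy. Only the option name comes from ggplotplus; the save-and-restore pattern is standard R.

```r
# Turn coaching off, keeping the previous setting so it can be restored.
old_opts <- options(ggplotplus.enable_coaching = FALSE)
getOption("ggplotplus.enable_coaching")   # FALSE

# ... build graphs without coaching messages here ...

# Restore whatever the option was before (unset, if it was never set).
options(old_opts)
```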
While more work is probably still needed to make ggplotplus easily and fully compatible with the rest of the “ggplot2niverse,” it should already work with both cowplot and patchwork, although its outputs first need to be converted. For example:
library(cowplot)
P = ggplot(iris, #<--PACK GRAPH INTO OBJECT.
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(
scale = "y",
name = "Sepal length (cm)") +
scale_fill_discrete(
labels = c("I. setosa", "I. versicolor", "I. virginica")
) +
theme_plus(
legend.text = element_text(face = "italic"),
begin_discrete = 0,
end_discrete = 1,
enable_coaching = FALSE) +
yaxis_title_plus() +
gridlines_plus()
cowplot::plot_grid(
ggplotplus_to_cowplot(P), #<--USE ggplotplus_to_cowplot ON THE PACKED OBJECT.
ggplotplus_to_cowplot(P)
)
A similar workflow is available for patchwork:
library(patchwork)
P = ggplot(iris, #<--PACK GRAPH INTO OBJECT.
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(
scale = "y",
name = "Sepal length (cm)") +
scale_fill_discrete(
labels = c("I. setosa", "I. versicolor", "I. virginica")
) +
theme_plus(
legend.text = element_text(face = "italic"),
begin_discrete = 0,
end_discrete = 1,
enable_coaching = FALSE) +
yaxis_title_plus() +
gridlines_plus()
(ggplotplus_to_patchwork(P) | #<--USE ggplotplus_to_patchwork() ON PACKED OBJECT.
ggplotplus_to_patchwork(P)) +
patchwork::plot_annotation(
title = "ggplotplus + patchwork"
)
If you encounter a compatibility issue with another package you depend on, please file an issue on the package’s GitHub repository!