To use the ggplotplus package, you’ll need to first install it from GitHub using the pak package (you only need to do this once for each release of the package):
# install.packages("pak") #IF NOT ALREADY INSTALLED
pak::pak("MAISRC/ggplotplus")
Then, load it alongside ggplot2:
# install.packages("ggplot2") #IF NOT ALREADY INSTALLED
library(ggplot2)
library(ggplotplus)
This guide introduces the ggplotplus package, a collection of tools developed by Dr. Alex Bajcz, Quantitative Ecologist at the Minnesota Aquatic Invasive Species Research Center (MAISRC) at the University of Minnesota.
The tools in this package are meant to provide an opinionated, Universal-Design-oriented update to ggplot2’s existing tools and defaults, with the goal of helping less-experienced and time-strapped ggplot2 users to make more accessible graphs more quickly and painlessly.
This guide offers an overview of how the tools in this package are intended to be used and is especially aimed at users who are relatively new to ggplot2 and relatively unfamiliar with its inner workings.
In a nutshell, ggplotplus’s tools overhaul the default design state of ggplot2’s graphics. While a typical one-line ggplot() call yields a functional graphic, it’ll often fall short of modern best practices in data visualization and graph design, especially with respect to universal accessibility. As such, many simple ggplot() calls do not immediately yield graphs fit for publication or sharing with a wide, global audience.
Of course, experienced users of ggplot2 can use its many powerful tools and toggles, like those in the theme(), scale_*(), and geom_*() family of functions, to redesign and customize their graphs to meet even the highest standard! But doing so requires hard-won knowledge of how these tools work and can be tedious to implement even when you have the knowledge.
Plus, even if one has the technical know-how, not everyone is familiar enough with the most-current tenets of data visualization to know what to change and what to change it to. The ggplotplus package is designed to help users of all experience levels start from a more accessible baseline without sacrificing ggplot2’s capacity for individuality, experimentation, and customization.
Accessibility is really the primary motivation for ggplotplus. Many of its changes to the default settings of ggplots are intended to improve compliance with modern accessibility standards; this guide attempts to highlight these improvements as a means of educating the reader on some of the finer points of modern accessible design, in case they are curious and motivated to learn more!
To be clear: ggplotplus is not a replacement for careful design choices, personal artistry, soliciting feedback, or understanding your audience’s needs and capabilities. Instead, it’s meant to be a new (and at least arguably better) starting point for your graphic design process. While the design choices baked into its tools ultimately reflect Dr. Bajcz’s professional judgments, they’re grounded in a thorough review of the last several decades of the data visualization literature, so they don’t only reflect his personal opinions about good graph design (though they also do!). Feel free to disagree with some or even most of its opinions; it’s designed to respect all your preferences, just as ggplot2 would.
The ggplotplus package is guided by three core principles:
Better Defaults:
Many default settings in ggplot2 are functional but sub-optimal in terms of design and accessibility. ggplotplus attempts to improve upon these by, among other things, adjusting colors, shapes, gridlines, spacing, text, etc. to yield cleaner, more accessible graphs at the outset with less fiddling with toggles.
Customization:
Almost all opinions effected by ggplotplus’s tools can be overridden by the user, either through standard ggplot2 syntax or using _plus function variants and their optional inputs. You’re rarely (if ever) stuck with any of the opinions baked into these tools (even though they do have these opinions for a reason!).
Modularity:
Each tool in ggplotplus is designed to be additive and used in concert, as appropriate. Sure, you can use just one (e.g., theme_plus()), but the intention is to call every relevant tool to achieve the full effect.
The package has one primary function and then several more that are most applicable only in certain situations (some of which are common, others of which are rarer):
| Function | Purpose |
|---|---|
| theme_plus() | An opinionated version of ggplot2’s base theme_gray() theme, with changes to the settings for size, spacing, legend placement, geom-specific defaults, and much more. This is the most critical function in the package and should be used in nearly every ggplot call. |
| yaxis_title_plus() | A function for relocating the y-axis title from its normal, less accessible location and orientation. Not often desirable in faceted plots, but often desirable in all other circumstances. |
| scale_continuous_plus() | Drop-in replacement for ggplot2’s scale_x/y/color/fill_continuous() functions. It sets axis breakpoints and limits so that the entire range of your graphed data are labeled. Useful for every continuous scale in your graph. |
| geom_point_plus() | An alternative version of ggplot2’s geom_point() function. It introduces access to a new palette of shapes designed to be more readily differentiated. Useful when point shape is being mapped to a discrete variable, especially one with more than ~3 levels. |
| gridlines_plus() | Adjusts the appearance of gridlines so they are as subtle and as infrequent as possible. Appropriate in the (relatively uncommon) instances where the usefulness of gridlines to an end user outweighs the cognitive load they add. |
Each of these functions is discussed in more depth in the sections that follow, along with examples.
ggplot for Comparison
To introduce you to the tools in this package, let’s start by creating a simple, one-line ggplot2 graph as a reference point. The code below generates a scatterplot of petal length vs. sepal length for iris flowers from three species. These data come from the classic iris data set, automatically included with every installation of R. Points in the scatterplot are colored according to species:
ggplot(data = iris, #<--THE DATA SET
       mapping = aes(x = Petal.Length, #<--MAPPING OUR AESTHETICS. WE'RE SAYING "*THIS* VARIABLE IN THE DATA SET SHOULD USE *THAT* VISUAL CHANNEL IN THE GRAPH". HERE, WE'VE MAPPED PETAL LENGTH TO HORIZONTAL (X) POSITION AND SEPAL LENGTH TO VERTICAL (Y) POSITION.
                     y = Sepal.Length)) +
  geom_point(mapping = aes(color = Species)) #<--WE ALSO MAP SPECIES TO THE VISUAL CHANNEL OF COLOR, BUT ONLY IN THIS ONE LAYER OF THE PLOT (THE ONLY LAYER, AT THIS POINT).
This graph is simple and reasonably effective; it’s perfectly suitable for exploration and informal sharing!
…However, for presentation or publication, it falls short (at least, in Dr. Bajcz’s professional opinion! YMMV) in several important respects. Some of these include:
Text size: The default text size for axis and legend text is too small for easy reading, especially for those with a notable amount of visual impairment (something that tends to rapidly advance with age, and the world’s population is aging). A common rule of thumb is to size text such that it feels almost too large to help ensure it’s large enough for someone with visual impairment. Said differently, one should err on the side of sizing important elements as big as they can get away with before doing so leads to other design problems.
Point size: The points are also quite small, which could hinder readability. The same mindset applies here—data elements should be as large and conspicuous as can be afforded.
Color palette: ggplot2’s default “rainbow” color palette lacks variance in luminance (how “bright” or “dark” the color is, aka how close to white/black it is). For a person with colorblindness (especially red-green colorblindness), the red and green shades are very difficult to distinguish. For those with no color vision at all, or for those using black-and-white viewing technologies (both of which are surprisingly common situations), all three colors used would be virtually indistinguishable.
Foreground/background contrast: While contrast (the ease with which nearby elements can be readily distinguished from one another) between the points and the background here is ok, it could be improved by using darker points and/or a lighter plot background. As noted above, it’d also help to have different color shades used for the points such that they contrasted better with each other. High contrast is critical for accessible and quickly interpretable graphs.
Whitespace and layout: Default ggplots can suffer from cramped layouts in some respects. When disparate elements (e.g., data points and axis labels) are very close together, it can be slower and more cognitively taxing to visually and mentally separate them. In particular, spacing in ggplots can often be tight between:
Axis titles and their corresponding axis labels.
Items within the legend(s).
Axis labels on densely labeled axes (not as much so here, but it’s a common issue).
Adding more void space (space not housing an element) between components helps readers visually parse and process them more easily and quickly, especially those with significant cognitive or visual impairment.
Axis readability: In ggplot2, axis and legend titles and labels default to the column names/values found in the data set unless the user sets custom text for them. These default title/label strings are often “computer-y” (e.g., they lack spaces, have unusual capitalization, or contain punctuation instead of spaces) rather than human-readable. Axis titles and labels should have standard spelling, spacing, and punctuation, and they should be human-interpretable. They should also contain units, if applicable, and avoid the use of obscure abbreviations when possible.
Gridlines: The research on the merits of gridlines is, admittedly, mixed! They:
Can lack contrast with the background and with nearby data elements.
Increase cognitive load (the amount of mental effort needed to integrate the available information to form a conclusion) by adding visual information to the plotting area that must be parsed and either integrated or (more likely) dismissed. When they’re not needed (and, arguably, they very often aren’t!), they constitute visual clutter that slows down interpretation.
Are unhelpful when rendered for a categorical variable (the default behavior in ggplot2, though not applicable to this graph).
On the flip side, gridlines are familiar to novice readers and can thus serve as a “foothold” for them to begin to understand a graph. They can also help readers estimate exact values when this is required (though, arguably, it very often isn’t!).
By and large, though, most data visualization experts agree that graphs are not the most efficient device for conveying exact values. Text, tables, and raw-data-file sharing are all superior! The adage to remember is that graphs ought to primarily be about “vibes!”
So, gridlines are often unnecessary. However, when a designer does deem them to be valuable, research has shown they can and should be as faint and as infrequent as possible to minimize their costs relative to their benefits. This advice does not generally correspond well to ggplot2’s default behaviors.
Incomplete axis labeling: Axes should generally include tick marks and labels at both ends of the data range. This aligns with reader expectations (especially those of novice readers) and aids in efficient comprehension of the data. Axes are essentially “number lines,” and so should be completely graduated by labels, just like number lines are. On this graph, though, both axes are missing one or more labels near the upper/lower limits of the data, which makes these axes feel visually “unfinished,” as though we just “forgot” to label all the way up to the ends!
Overplotting: When data elements partially or wholly overlap with one another (a very common problem in point-based graphs like scatterplots), it becomes hard to judge how many elements are present in a location. This limits the reader’s ability to gauge data density. This is a hard data-viz problem to fix, but ggplot2’s default settings do not attempt to address it at all; they simply stack any number of points on top of one another invisibly.
Vertically oriented y-axis title: Research has consistently shown that text rotated away from horizontal (0 degrees) is harder to read for virtually every human on Earth, but especially for readers with dyslexia, non-native readers of the language, users with motor or visual impairment, and those with certain technological limitations, among others. Despite this, the y-axis title is almost always rotated 90 degrees from horizontal in scientific graphs (not just in ggplots). It’s also printed in a relatively unprominent location (in the left margin), so it’s not one of the first elements a reader is likely to encounter. Since many graphs are “about” the y-axis variable, the y-axis title deserves a more prominent location so readers encounter it sooner.
Tick marks: The tick marks on the axes are small and easily lost visually, though they can provide many of the same benefits of gridlines with much less cognitive load (they are the “gradations” on the “number lines”). These could be more prominent.
Missing axis lines: To better visually represent “number lines,” and to also better demarcate the boundary between the plotting area and the axes areas, most graphs feature axis lines. However, ggplot2’s graphs tend to lack such lines by default.
Legend placement: ggplot2’s default legend location (in the right-hand margin) is often space-inefficient, creating needlessly large voids above/below it. Plus, it tends to be encountered by readers relatively late when placed in this location. Data visualization best practices are to either integrate a legend’s information into the plotting area directly (e.g., through direct labeling of lines) or else to place the legend somewhere both more prominent and more compact.
This list isn’t exhaustive, but it illustrates how even a simple ggplot could require many adjustments to meet accessibility, clarity, and design standards. It has been Dr. Bajcz’s personal experience that going from a base ggplot to a publishable one often requires several hundred lines of code-based adjustments! Not everyone has the time, patience, or wherewithal to do that much fiddling for every graph they make: hence, ggplotplus. It bundles together many of those hundreds of lines of code so that we can all just start much closer to where most of us would want to finish!
With many of the design challenges outlined above in mind, ggplotplus introduces the theme_plus() function, its core tool. A counterpart to ggplot2’s theme() function, theme_plus() is designed to work the same way. More experienced ggplot2 users will know theme() to be one of ggplot2’s workhorse functions; it exposes well over a hundred toggles for adjusting a graph’s appearance, and it’s one of the functions perhaps most responsible for ggplot2’s not inconsiderable learning curve.
The intention of theme_plus() is to rewrite many dozens of the “presets” inside of ggplot2’s default theme() so that you don’t have to! Just tack it on to any ggplot call, even with no inputs, to access its benefits:
#THE EXACT SAME GGPLOT COMMAND AS BEFORE...
ggplot(data = iris,
       mapping = aes(x = Petal.Length,
                     y = Sepal.Length)) +
  geom_point(mapping = aes(color = Species)) +
  theme_plus() #<--...BUT WITH ggplotplus'S theme_plus FUNCTION TACKED ON.
Hopefully, you can already see how big a difference this one function can make to the design of a graph! Let’s discuss some of these design improvements in greater detail:
Larger text throughout for better readability.
Increased spacing between key elements (like legend keys within the legend and between axis titles and their corresponding labels).
Thicker and longer axis ticks for greater visibility and easier distinguishing between them and data elements and between them and axis lines.
No gridlines by default (see later section on gridlines_plus() for how to selectively reintroduce them).
White background for highest contrast against most standard foreground elements (well, technically, it’s a very subtle, warm off-white, since true white can increase eyestrain with prolonged viewing).
Black axis lines at the bottom and left for visual anchoring (but no top or right borders, which typically communicate no information and thus are extraneous).
Legend moved above the plot as a horizontal stripe, which is more space-efficient and increases the likelihood it will be encountered early by readers.
All text rendered in black, rather than the default dark gray for some elements, to maximize contrast with the light background.
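To give a sense of how much fiddling this replaces, here is a rough sketch of how a few of the changes listed above might be approximated by hand using only ggplot2’s own theme(). The specific values (text size, tick length, and so on) are illustrative guesses, not the settings theme_plus() actually uses, and this covers only a small fraction of what the function adjusts:

```r
library(ggplot2)

# ILLUSTRATIVE VALUES ONLY. THESE ARE ASSUMPTIONS, NOT theme_plus()'S REAL SETTINGS.
manual_theme <- theme(
  text = element_text(size = 16, colour = "black"),            # LARGER, BLACK TEXT
  axis.text = element_text(colour = "black"),                  # AXIS LABELS DEFAULT TO GRAY
  axis.ticks = element_line(linewidth = 1, colour = "black"),  # THICKER TICKS
  axis.ticks.length = unit(8, "pt"),                           # LONGER TICKS
  axis.line = element_line(colour = "black"),                  # BOTTOM AND LEFT AXIS LINES
  panel.grid = element_blank(),                                # NO GRIDLINES
  panel.background = element_rect(fill = "white", colour = NA),# WHITE PLOTTING AREA
  legend.position = "top"                                      # LEGEND ABOVE THE PLOT
)

p_manual <- ggplot(data = iris,
                   mapping = aes(x = Petal.Length,
                                 y = Sepal.Length)) +
  geom_point(mapping = aes(color = Species)) +
  manual_theme

p_manual
```

Even this partial imitation takes eight theme settings, and it still leaves spacing, legend styling, point size, and color palettes untouched.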
Of course, everything is still customizable. If you don’t love a particular choice—say, the thickness of the axis lines or the legend’s location—you can adjust them, same as you normally would using theme(), except that you can pass the overrides straight into theme_plus() instead:
#THE EXACT SAME GGPLOT COMMAND AS BEFORE...
ggplot(data = iris,
       mapping = aes(x = Petal.Length,
                     y = Sepal.Length)) +
  geom_point(mapping = aes(color = Species)) +
  theme_plus(axis.line = element_line(linewidth = 0.75)) #<--WE CAN MAKE THE AXIS LINES A LITTLE THINNER IF WE WANT.
Here’s how you’d relocate the legend to its standard ggplot2 position. The theme will automatically apply some enhanced default styling specific to this position instead:
#THE EXACT SAME GGPLOT COMMAND AS BEFORE...
ggplot(data = iris,
       mapping = aes(x = Petal.Length,
                     y = Sepal.Length)) +
  geom_point(mapping = aes(color = Species)) +
  theme_plus(legend_pos = "right")
Like the rest of the ggplotplus toolkit, theme_plus() is meant to be a smart starting point, not a “final destination.” Its outputs won’t be ideal for every plot, person, or context (that’s not even remotely possible!), but it’s designed to save you time by making it so there are fewer adjustments you need to make to get your graphs to where they ultimately need to be: fully accessible for the entire audience you want!
As previously noted, the default color palette used by ggplot2 is not particularly accessible. Fortunately, the viridis palettes, which are available with every installation of ggplot2 (the viridisLite package that supplies them is installed as a dependency), include several that were specifically designed for near-universal accessibility!
The way they achieve this is by featuring continuous variance not just in hue (i.e., the base “color,” such as “red,” “green,” or “blue,” so what is conventionally thought about as “color”) but also in luminance (light vs. dark, aka “white” vs. “black”). This multidimensional variance makes the colors in these palettes more universally distinguishable from one another for a wide range of people in a wide range of circumstances.
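If you’d like to see the luminance difference in numbers, here is a minimal sketch. It assumes the viridisLite package is installed, and the luma() helper is a hypothetical convenience function written for this example; it uses a common rough approximation of grayscale brightness rather than a formal perceptual measure:

```r
# ROUGH "GRAYSCALE BRIGHTNESS" (LUMA) OF A COLOR ON A 0-255 SCALE, USING THE
# COMMON REC. 601 CHANNEL WEIGHTS. luma() IS A HELPER MADE UP FOR THIS EXAMPLE.
luma <- function(cols) {
  rgb_values <- grDevices::col2rgb(cols)               # 3 x n MATRIX OF R, G, B
  as.numeric(c(0.299, 0.587, 0.114) %*% rgb_values)    # WEIGHTED SUM PER COLOR
}

# ggplot2'S DEFAULT DISCRETE COLORS FOR 3 GROUPS: EVENLY SPACED HUES THAT ALL
# SHARE THE SAME CHROMA (100) AND LUMINANCE (65) IN HCL SPACE.
default_hues <- grDevices::hcl(h = c(15, 135, 255), c = 100, l = 65)

# THREE EVENLY SPACED COLORS FROM THE VIRIDIS PALETTE.
viridis_cols <- viridisLite::viridis(3)

# SPREAD BETWEEN THE DARKEST AND LIGHTEST COLOR IN EACH SET.
spread_default <- diff(range(luma(default_hues)))
spread_viridis <- diff(range(luma(viridis_cols)))

round(c(default = spread_default, viridis = spread_viridis))
```

The viridis colors span a much wider brightness range than the default hues do, which is why they tend to remain distinguishable even when hue information is lost (e.g., in grayscale).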
Thanks to enhancements made to ggplot2::theme() as of the package’s version 4.0.0, released September 11, 2025, one can now set default color palettes within a theme (this used to require using R’s global options and was relatively inflexible). For example, one can do something like:
###CODE TO BE ADDED TO A GGPLOT CALL TO SET A DEFAULT COLOR PALETTE
... +
  ggplot2::theme(
    palette.colour.discrete = disc,
    palette.fill.discrete = disc,
    palette.colour.continuous = cont,
    palette.fill.continuous = cont
  )
### WHERE disc AND cont ARE SETS OF COLORS PULLED FROM A SPECIFIC PALETTE, SUCH AS VIRIDIS.
This allows theme_plus() to set more accessible default color palettes while still allowing a user to easily override these defaults with their own preferences, either by replacing one of the palette.*.* inputs above within theme_plus() or by adding a scale_*_*() call to their plot as they normally might in order to specify desired colors.
theme_plus() uses the titular viridis palette for discrete variables and the cividis palette for continuous variables. By default, viridis ranges from dark purple to light yellow, passing through blue, teal, and green. This relatively large number of hues makes it relatively easy to produce several discrete colors that contrast visually with each other as well as with whatever background color you use.
However, theme_plus() excludes, by default, the lightest yellow hues from the viridis palette because these tend to lack contrast with other light-colored elements as well as with backgrounds in ggplots, which tend to be white. As a result, the remaining palette includes just the purple/blue/teal/green region. You can easily adjust this using the function’s begin_discrete and end_discrete input parameters, which take values ranging between 0 and 1:
begin_discrete = 0.28 would disable the dark purple portion of the color range, which is useful when you have many other dark elements or a dark background.
end_discrete = 1 would enable the light yellow portion, which is appropriate when you don’t have other light elements or a light background.
#SAME GRAPH AS BEFORE EXCEPT...
ggplot(data = iris,
       mapping = aes(x = Petal.Length,
                     y = Sepal.Length)) +
  geom_point(mapping = aes(color = Species)) +
  theme_plus(begin_discrete = 0.28, #<--DISALLOW DARK PURPLE REGION.
             end_discrete = 1) #<--ALLOW YELLOW REGION.
In this example, we’ve disabled the purple end of the palette and we’ve enabled the yellow region, which means we have some points that now lack contrast with the background. But no worries: that’s something we can fix!
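To get a feel for what trimming the palette does to the actual colors, you can generate them directly with viridisLite (assumed to be installed). This sketch assumes that begin_discrete and end_discrete behave like the begin and end arguments of viridisLite::viridis(), which this guide implies but which is not confirmed here:

```r
library(viridisLite)

# THREE COLORS FROM THE FULL PALETTE: DARK PURPLE THROUGH LIGHT YELLOW.
full_range <- viridis(3, begin = 0, end = 1)

# THREE COLORS STARTING 28% OF THE WAY IN, SKIPPING THE DARKEST PURPLES.
no_purple <- viridis(3, begin = 0.28, end = 1)

# PRINT THE HEX CODES SIDE BY SIDE FOR COMPARISON.
data.frame(full_range, no_purple)
```

Both sets end on the same light yellow, but the trimmed set starts on a blue rather than a dark purple, so the three colors are drawn from a narrower slice of the palette.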
To see the default color palette for a continuous variable, let’s temporarily map color to a numeric variable instead:
#A BASE GGPLOT2 GRAPH WITHOUT theme_plus() ON FOR REFERENCE
ggplot(iris,
       mapping = aes(x = Petal.Length,
                     y = Sepal.Length)) +
  geom_point(mapping = aes(color = Petal.Width)) #<--MAP COLOR TO A CONTINUOUS VARIABLE.
The default ggplot2 palette for continuous variables is “blues,” which is a reasonable palette in that it does vary in luminance, making it more broadly accessible. However, it doesn’t vary in hue, which can make nearby values harder to distinguish. As such, theme_plus() changes the palette to cividis, which has three distinct hues: blue, gray, and yellow:
ggplot(iris,
       mapping = aes(x = Petal.Length,
                     y = Sepal.Length)) +
  geom_point(mapping = aes(color = Petal.Width)) +
  theme_plus()
Because humans like to “chunk” data into discrete groups according to hue, even in circumstances where the underlying phenomenon is continuous and nothing “special” is happening at the points of hue transition, this palette’s low number of hues results in less “false binning” of the data than tends to happen when using palettes with more distinct hues (such as viridis). After all, blue, gray, and yellow points here are quite different, so the hue differences many will perceive map accurately to differences in the underlying data, and even relatively similar values are more easily distinguished from one another.
Another major feature of ggplot2’s 4.0.0 update was to allow users to set many geometry-layer-specific design settings within theme(). These used to have to be set in each individual geom_*() layer separately. For example, a user can now add a theme() call of the following form to their graphs:
###EXAMPLE ADJUSTMENT OF GEOMETRY-LAYER-SPECIFIC DESIGN SETTINGS FROM WITHIN THEME.
... +
  ggplot2::theme(geom = ggplot2::element_geom(pointsize = 5)) #SET THE DEFAULT POINT SIZE FOR ALL GEOMS USING POINTS TO 5.
This functionality allows theme_plus() to replace many of the default settings of many commonly used geometry layers automatically.
For example, in the above graph, we can see that theme_plus() has made:
The points larger and thus easier to parse.
The points hollow circles (by changing the default shape from R’s shape 16 to shape 21).
The latter change is important because shape 16 only supports the color aesthetic, whereas shape 21 supports both color (for the stroke/outline) and fill (for the interior). This gives us more flexibility in our design.
Specifically, by default, theme_plus() sets the interior color for points to "transparent", resulting in hollow circles. This allows partial overlaps between points to be more readily distinguishable, though, of course, total overlap would still not be discernible.
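For reference, here is roughly what the same idea looks like when set by hand in plain ggplot2, without theme_plus(). This is a manual approximation for illustration, not the package’s internals, and the point size is an arbitrary choice:

```r
library(ggplot2)

p_hollow <- ggplot(data = iris,
                   mapping = aes(x = Petal.Length,
                                 y = Sepal.Length)) +
  geom_point(mapping = aes(color = Species), #<--SPECIES DRIVES THE OUTLINE (STROKE) COLOR.
             shape = 21,                     #<--A CIRCLE WITH SEPARATE OUTLINE AND INTERIOR.
             fill = "transparent",           #<--HOLLOW INTERIOR, SO PARTIAL OVERLAPS SHOW THROUGH.
             size = 3)                       #<--ARBITRARY SIZE CHOSEN FOR THIS EXAMPLE.

p_hollow
```

The difference is that, done this way, the shape and fill settings would need repeating in every point layer of every graph, whereas a theme-level default applies them everywhere at once.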
There is something we can do with our design to address even “extreme overplotting,” i.e., when points would overlap fully. It first requires us to switch from mapping species to color to mapping it to fill instead. This will cause theme_plus() to pivot to making the strokes of the points black:
#SAME GRAPH AS BEFORE EXCEPT...
ggplot(data = iris,
       mapping = aes(x = Petal.Length,
                     y = Sepal.Length)) +
  geom_point(mapping = aes(fill = Species)) + #<--CHANGING TO MAPPING SPECIES TO FILL.
  theme_plus(begin_discrete = 0.28,
             end_discrete = 1)
Notice how this change fixes our earlier issue of our yellow points lacking contrast against the light background. This happens because the fill colors are now surrounded by black outlines, so our eyes will compare the fill color to the black (the most proximal other color) rather than to the background. A neat trick, based on how the science of vision works!
Even though dark purple wouldn’t have sufficient contrast against a black outline, that’s not actually a problem because the point strokes aren’t encoding information. As a result, if we wanted to make the colors being used even more distinct, we could restore the full range of the viridis palette:
#SAME GRAPH AS BEFORE EXCEPT...
ggplot(data = iris,
       mapping = aes(x = Petal.Length,
                     y = Sepal.Length)) +
  geom_point(mapping = aes(fill = Species)) +
  theme_plus(begin_discrete = 0, #<--EXCEPT BACK TO 0 TO INCLUDE PURPLE
             end_discrete = 1)
Purple points still look purple, whether we can distinguish their black outlines from their fill colors or not. Meanwhile, all points are easily distinguished from the background, so we can utilize the entire color space available to us in the palette!
Ok! Back to the issue of extreme overplotting here. There are two broadly effective options for addressing overplotting in graphs like this one. The first, as we’ve already seen, is to pick a shape with separate color and fill aesthetics, then map your variable of interest to the stroke color and set the fill to "transparent" so that partial overlap between the points is evident.
However, another, more aggressive option is to instead set the stroke to a color that contrasts well against the background (such as black for a white background), map your variable of interest to fill, and then make the points semi-transparent so that points stacked on top of one another “blend” into one another chromatically. This is easier to see than to read:
#SAME GRAPH AS BEFORE EXCEPT...
ggplot(data = iris,
       mapping = aes(x = Petal.Length,
                     y = Sepal.Length)) +
  geom_point(mapping = aes(fill = Species),
             alpha = 0.3) + #<--ADD IN STRONG SEMI-TRANSPARENCY VIA ALPHA.
  theme_plus(begin_discrete = 0,
             end_discrete = 1)
At alpha = 0.3, individual points maintain sufficient contrast with the background, but points in dense regions of the data space layer on top of each other, resulting in darker (or even blended) colors. This makes it possible for a reader to understand not just the position and group identity of points but their density as well.
Note: This approach also dims the black point strokes, so contrast could still be a concern for low-vision readers with the yellow fill colors. Additionally, variance in luminance will be reduced by using transparency, which can make the graph harder to view in grayscale. Color (and overplotting) in graphs is challenging to navigate! When overplotting is a serious concern, consider whether a scatterplot is the best way to represent your data, and whether you need to plot all your data to achieve your goals.
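The darkening in dense regions follows from how semi-transparent layers compound: each added layer lets through only a fraction (1 - alpha) of whatever is beneath it. As a small worked example, assuming identical points stacked exactly on top of one another, the combined opacity of n points is 1 - (1 - alpha)^n. The stacked_opacity() helper below is a made-up name for this illustration:

```r
# COMBINED OPACITY OF n IDENTICAL POINTS STACKED PERFECTLY ON TOP OF EACH OTHER.
# stacked_opacity() IS A HYPOTHETICAL HELPER WRITTEN FOR THIS EXAMPLE.
stacked_opacity <- function(alpha, n) {
  1 - (1 - alpha)^n
}

round(stacked_opacity(alpha = 0.3, n = 1:5), 2)
# [1] 0.30 0.51 0.66 0.76 0.83
```

So a single point at alpha = 0.3 is quite faint, but a stack of three is already about two-thirds opaque. The gains shrink with each added point, which is why very dense regions eventually become hard to tell apart from one another.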
ggplot2 (a Review)
In ggplot2, you can partially control the visual appearance of your graph using aesthetics, which connect visual channels (like position, color, size, or fill) to variables in your data set or set them to constant values.
There are three (partially overlapping) ways to assign aesthetics:
Global mapping: Inside aes() in the ggplot() call. These mappings apply to all layers (or “geoms”).
Local mapping: Inside aes() within a specific geom_*() call. These only apply to the layer they are included within.
Constants: Set inside or outside aes() within a geom or inside of aes() within ggplot(). This will fix the aesthetic to a single value, e.g., setting all colors to “red,” that will override any conflicting mapping.
This system gives you flexibility, but it can be confusing—especially for new users. Let’s clarify with an example that uses two geoms: a boxplot overlaid with jittered points (jittering randomly varies a point’s location data):
#NEW PLOT
ggplot(iris,
       mapping = aes(x = Species, #<--MAKE THIS SPECIES, A DISCRETE VARIABLE
                     y = Sepal.Length)) +
  geom_boxplot() + #<--A NEW GEOM = BOXPLOT
  geom_jitter(#<--SAME AS GEOM_POINT EXCEPT JITTERED.
    mapping = aes(fill = Species),
    alpha = 0.3) +
  theme_plus()
Here’s what’s happening:
Global mappings of x = Species and y = Sepal.Length apply to both the boxplot and the jittered points layers.
The fill aesthetic is mapped to Species, but only locally within the jittered points layer, so the boxplots are unaffected and don’t get species-specific fill colors.
The alpha aesthetic is set to a constant (0.3), again only locally for the jittered points. This means all points get this value, but the boxplots are unaffected.
If you map the same aesthetic in multiple places (e.g., globally and locally), the local mapping always wins—it overrides any global specification for that layer. Also, if you try to map an aesthetic to both a variable and a constant, the constant (usually) wins.
We bring all this up just to say that all of the same rules apply when using ggplotplus. You can:
Map aesthetics globally or locally.
Map aesthetics to variables in your data set or set them to constants.
Combine the approaches as needed–conflicts will resolve as they normally do in ggplot2.
Here’s a similar plot with a few tweaks to illustrate this further:
ggplot(iris,
       mapping = aes(x = Species,
                     y = Sepal.Length,
                     fill = Species, #<-- WE CAN SWITCH TO MAPPING FILL GLOBALLY SO IT SHOULD APPLY TO BOTH GEOMS.
                     color = Petal.Length)) + #<--WE ALSO MAP COLOR
  geom_boxplot(color = "blue") + #<--WE LOCALLY OVERRIDE COLOR, SO IT'LL BE BLUE FOR ALL BOXES INSTEAD OF LINKED TO PETAL LENGTH
  geom_jitter() + #<--SIMPLIFY HERE.
  theme_plus()
In this version:
Fill is now applied to both geoms, since it’s being mapped globally inside of ggplot().
Color is also mapped globally but overridden locally in the boxplot layer (so boxplot outlines are blue, regardless of Petal.Length).
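If you’d rather confirm these rules than eyeball them, one option is to inspect the data ggplot2 computes for each layer using its layer_data() function. The sketch below uses plain ggplot2 (no theme_plus()) and a simplified version of the plot in which color is mapped globally to Species:

```r
library(ggplot2)

p_check <- ggplot(iris,
                  mapping = aes(x = Species,
                                y = Sepal.Length,
                                color = Species)) + #<--GLOBAL MAPPING OF COLOR.
  geom_boxplot(color = "blue") + #<--LOCAL CONSTANT FOR LAYER 1.
  geom_jitter()                  #<--LAYER 2 INHERITS THE GLOBAL MAPPING.

# THE COLORS EACH LAYER ACTUALLY ENDS UP USING:
boxplot_colors <- unique(layer_data(p_check, 1)$colour)
jitter_colors  <- unique(layer_data(p_check, 2)$colour)

boxplot_colors
jitter_colors
```

The boxplot layer reports a single color (the constant), while the jittered points report one color per species, matching the precedence rules described above.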
Most common ggplot2 geoms have defaults adjusted by theme_plus(), though not all. To see a list of those receiving specific adjustments, run this command:
sort(names(geom_plus_defaults))
 [1] "abline"     "area"       "bar"        "boxplot"    "col"
 [6] "count"      "crossbar"   "curve"      "density"    "dotplot"
[11] "errorbar"   "freqpoly"   "histogram"  "hline"      "jitter"
[16] "line"       "linerange"  "point"      "point_plus" "pointrange"
[21] "ribbon"     "segment"    "smooth"     "tile"       "violin"
[26] "vline"
Returning to our earlier scatterplot:
ggplot(iris,
       mapping = aes(x = Petal.Length,
                     y = Sepal.Length)) +
  geom_point(mapping = aes(fill = Species),
             alpha = 0.3) +
  theme_plus(begin_discrete = 0,
             end_discrete = 1)
There’s still room for improvement in its design, especially with respect to the axis titles and labels. A key first step in redesigning these would be to make the axis titles more human-readable, intuitive, and complete, including by specifying units. You can do this using the scale_*_*() family of functions in ggplot2:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_x_continuous(name = "Petal length (cm)") #<--STANDARD WAY IN GGPLOT2 TO CHANGE THE X AXIS TITLE FOR A NUMERIC VARIABLE.
This is better! But the x-axis still lacks labels near the endpoints of the data. For example, a bunch of points have petal lengths below 2, the lowest labeled value. This is a common issue with ggplot2’s default breaks-determining process. How close are the purple points to 0? The yellow points to 8? It’s hard to tell without anchors on both ends. You can fix this using scale_continuous_plus(), which automatically adjusts breaks and limits to ensure label coverage near the ends of the scale:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",#<--SWITCH TO THE ggplotplus VARIANT, AND SPECIFY THE SCALE.
name = "Petal length (cm)")
The axis is now expanded to start with a break at 1 and to end with a break at 7, with breaks in between still chosen to be regular and “pretty,” just as in ggplot2.
As with the base scale_*_*() functions, you can pass arguments like name, expand, or labels to scale_continuous_plus(). However, you cannot manually set breaks or limits, since those are determined internally by the function.
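To see the idea behind “anchoring” both ends of an axis, here is a small base R sketch. It is not the algorithm scale_continuous_plus() uses internally (we are only assuming the general goal of having the outermost breaks enclose the data). It simply relies on pretty(), which returns evenly spaced round values that cover the range it is given.

```r
# Data range of the x variable used throughout this guide
x_range <- range(iris$Petal.Length)

# pretty() returns evenly spaced "round" values covering the full range,
# so the first break sits at or below the minimum and the last at or above the maximum
anchored_breaks <- pretty(x_range)

# Using the outermost breaks as the limits puts a label at both ends of the axis
anchored_limits <- range(anchored_breaks)

print(anchored_breaks)
print(anchored_limits)
```

In plain ggplot2, values like these could be supplied by hand via the breaks and limits arguments of scale_x_continuous(); scale_continuous_plus() exists so you don’t have to.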
If the resulting labels feel a little too frequent, scale_continuous_plus() contains a trick: you can set thin.labels to TRUE to convert every other label to an empty string ("") while retaining the tick marks:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) #<--WILL CONVERT EVERY OTHER LABEL TO AN EMPTY STRING.
This reduces cognitive load and increases void space while preserving the visual scaffolding the ticks provide. Optional, but a nice touch!
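If it helps to picture what “thinning” means, the base R sketch below blanks every other label in a vector of breaks. The helper name thin_every_other() is hypothetical, and which labels ggplotplus actually keeps (odd or even positions) is an assumption here; the point is only that the breaks, and therefore the tick marks, are untouched.

```r
# Hypothetical helper: keep the 1st, 3rd, 5th, ... labels and blank the rest
thin_every_other <- function(breaks) {
  labels <- as.character(breaks)
  labels[seq_along(labels) %% 2 == 0] <- ""  # blank the 2nd, 4th, ... labels
  labels
}

thinned <- thin_every_other(1:7)
print(thinned)
# returns "1" "" "3" "" "5" "" "7"
```

Because the output has the same length as the input, every break still gets a tick mark; only the text is removed.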
Applying this same logic to the y-axis is as easy as adding another scale_continuous_plus() call:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y", #<--ANOTHER FOR Y.
name = "Sepal length (cm)")
Here, the y-axis limits now extend down enough to add a break at 4, making the axis appear more complete and aiding interpretation.
The scale argument can also be set to "fill" or "color" when these aesthetics are mapped to continuous variables. At the same time, there are a few considerations theme_plus() makes that apply only to color bars, which are the legend components you see when you map color or fill to a continuous variable. Let’s map a continuous variable to fill to get such a legend:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width), #<--SWAP TO A CONTINUOUS VARIABLE.
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)")
The theme_plus()-driven changes here include longer and fatter tick marks of pure white for maximum contrast against the colors within (at least, usually–there’s no perfect hue to use for this purpose!), a black border to prevent any contrast issues with the background, and larger dimensions for easier reading and enhanced visibility.
Note that color bars can have the same issue as axes, where the ends don’t always get labels. We can just add a third call to scale_continuous_plus() to help with that:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width),
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
scale_continuous_plus(scale = "fill",
name = "Petal width (cm)") #<--ADD A THIRD FOR FILL
The left-hand side of the colorbar’s scale has expanded to include 0 so that it’s clearer to the reader exactly which color would correspond to a value of 0.
scale_continuous_plus() has another handy feature that is often of value for legend titles, specifically: set split_name to TRUE to break the legend title onto multiple lines at the spaces in the original entry:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width),
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
scale_continuous_plus(scale = "fill",
name = "Petal width (cm)",
split_name = TRUE #<--SPLIT THE LEGEND TITLE ONTO MULTIPLE LINES AT ITS SPACES.
)
You can still provide a vector of custom axis labels to scale_continuous_plus(), like you can with its ggplot2 counterparts, but it requires a little trial and error. Because scale_continuous_plus() must “experiment” to find the right limits and breaks for your scale, it may ultimately create breaks that fall outside the limits shown and are thus invisible. As such, you may need to provide more labels (including 1 or 2 blank labels on either side) than it appears you’d need. However, if you provide fewer labels than are strictly required, the function makes an educated guess about how to pad them to display properly:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width),
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
scale_continuous_plus(scale = "fill",
name = "Petal width (cm)",
split_name = TRUE,
labels = LETTERS[1:6] #ONLY 6 BREAKS ARE SHOWING, BUT ACTUALLY, THERE ARE 8, BEHIND THE SCENES! NO MATTER--THE FUNCTION PADS YOUR VECTOR WITH BLANK LABELS ON EITHER SIDE AS NEEDED.
)
To know how many labels you need to provide, you may need to render your plot first to see what breaks the function has chosen, then provide a vector of labels to match.
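If you would rather pad a label vector yourself than rely on the function’s guess, a sketch like the following may help. The helper pad_labels() is hypothetical, and splitting the blanks evenly between the two sides is our assumption; the padding scale_continuous_plus() applies internally may differ.

```r
# Hypothetical helper: pad a label vector with blanks on both sides
# until it matches the number of breaks the scale actually created
pad_labels <- function(labels, n_breaks) {
  n_missing <- n_breaks - length(labels)
  if (n_missing <= 0) {
    return(labels)            # nothing to pad
  }
  n_left  <- n_missing %/% 2  # blanks added before the first label
  n_right <- n_missing - n_left
  c(rep("", n_left), labels, rep("", n_right))
}

# Six visible labels, eight breaks behind the scenes
padded <- pad_labels(LETTERS[1:6], n_breaks = 8)
print(padded)
# returns "" "A" "B" "C" "D" "E" "F" ""
```

The n_breaks value still has to come from looking at the rendered plot, as described above.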
Our graph is nearly “perfect” (at least, in our estimation!), but one significant design issue remains: the y-axis title is still vertically oriented and tucked away in the plot’s left-hand margin. Let’s change that.
In base ggplot2, you can reorient the title to be horizontal, at least, using a few hacky adjustments involving line breaks and theme tweaks:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width),
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1,
axis.title.y = element_text( #<--CHANGE Y AXIS TITLE TO VERTICALLY JUSTIFIED AND HORIZONTAL.
vjust = 0.5,
angle = 0)) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal\nlength\n(cm)") + #<--INSERT LINE BREAKS USING \n TO BREAK TITLE ONTO MANY LINES FOR SPACE EFFICIENCY.
scale_continuous_plus(scale = "fill",
name = "Petal width (cm)",
split_name = TRUE,
labels = LETTERS[1:6]
)
This works reasonably well. It’s readable, and it’s relatively simple to implement, if you know how (but that’s a big if for many ggplot2 beginners!). But it has limitations:
It steals horizontal space in the graph’s “center row” from your data, much as a right-hand legend would, forcing a similar amount of wasted void space above and below it.
It keeps the label in a relatively unprominent location, where readers may not encounter it early.
It doesn’t scale well to longer axis titles or those with long, unbreakable words. If we want a descriptive, detail-rich title, it’s hard to achieve that with this approach.
To get around these challenges, ggplotplus includes yaxis_title_plus(), which “surgically” moves the y-axis title to above the y-axis line, left-justified to the plot margin:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width),
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1) + #<--NO THEME ADJUSTMENTS.
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") + #<--NO HACKS TO TITLE NEEDED.
scale_continuous_plus(scale = "fill",
name = "Petal width (cm)",
split_name = TRUE,
labels = LETTERS[1:6]
) +
yaxis_title_plus() #<--THIS HANDLES EVERYTHING!
This may seem a little radical at first, but this small adjustment has many potential benefits that data visualization experts have been championing for decades:
Prominence: Readers of left-to-right, top-to-bottom languages naturally start in the upper-left (this is sometimes called the “Z reading frame,” which is to say we read in a pattern that traces one or more Zs, starting at the top-left of an element), so the y-axis title is now one of the first elements many readers will encounter.
Readability: The title is horizontal, making it easier and faster to read for all kinds of people in all kinds of contexts.
Space efficiency: Instead of taking up room in the “data row” of the graph, the title sits in a narrow horizontal band above the plot, freeing up the core of the graphing area for your data to shine.
Pseudo-title element: Many graph design advocates recommend against using plot titles. The reason is that, at best, titles tend to do what good y-axis titles and figure captions already do, only worse. However, research does suggest novice graph readers find titles helpful as “footholds.” By moving the y-axis title to where a plot (sub)title might go, we allow it to serve a similar purpose but with no risk of being repetitive with a plot title.
No ambiguity: One hazard with moving the y-axis title to anywhere other than its “normal” location is that it may become unclear what it refers to. By anchoring it above the y-axis line and left-justifying it to the left edge of the y-axis labels, there’s little, if any, ambiguity as to its purpose. This is aided by theme_plus()’s default behavior to place any legend title at the top of the plot to the right rather than to the left of the legend keys.
Moving the y-axis title in this way is backed by decades of advocacy from the data visualization community. So while it’s optional, it’s not a fringe or new idea! Give it a try–you just might find you like it better too!
As noted above, if you’re using yaxis_title_plus() and theme_plus() together and your graph has a legend, the latter will default to a horizontal stripe at the top of the graph. By default, this stripe will sit in its own “row” above the relocated y-axis title.
Sometimes, this will be necessary to get both elements to fit without clipping into one another. However, if there’s enough room, you can set nudgeTopLegendDown to TRUE inside yaxis_title_plus() to nudge the legend down enough (through some trial and error) to sit parallel to the moved y-axis title:
#SAME GRAPH AS BEFORE, BUT NOW RENDERED AT A WIDTH/HEIGHT OF 9/6.
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width),
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1) + #<--NO THEME ADJUSTMENTS.
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") + #<--NO HACKS TO TITLE NEEDED.
scale_continuous_plus(scale = "fill",
name = "Petal width (cm)",
split_name = TRUE,
labels = LETTERS[1:6]
) +
yaxis_title_plus(nudgeTopLegendDown = TRUE) #<--MOVE TOP LEGEND(S) DOWN INTO SAME ROW AS Y AXIS TITLE.
One (arguably) controversial opinion coded into ggplotplus’s theme_plus() function is the total removal of gridlines. While some readers rely on them or expect them, many (probably the majority!) tend to find them distracting or visually cluttering in most situations. In fact, data viz experts often advise against using them most of the time.
If you’d like to restore them, however, you can. You could add them manually using theme_plus(); it’ll let you! But there’s an easier (and more opinionated) option: gridlines_plus(). This function selectively reintroduces only the major gridlines (not minor ones), only in directions mapped to numeric variables (not discrete ones), and renders them as faintly as possible:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Petal.Width),
alpha = 0.3) +
theme_plus(begin_discrete = 0,
end_discrete = 1) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
scale_continuous_plus(scale = "fill",
name = "Petal width (cm)",
split_name = TRUE,
labels = LETTERS[1:6]
) +
yaxis_title_plus(nudgeTopLegendDown = TRUE) +
gridlines_plus() #<--REINTRODUCE THOUGHTFUL GRIDLINES
These gridlines default to "gray90". Prior research suggests gridlines this faint relative to the background color will be just visible enough for those who need them while being faint enough to easily fade into the background for everyone else.
However, if you want to tweak them in some respect or another, gridlines_plus() allows adjustments to linetype, color, and linewidth for your convenience—no need to use theme_plus() to make those types of adjustments!
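For comparison, here is roughly what the manual route looks like in plain ggplot2. This is a sketch of the general approach, not the code gridlines_plus() runs internally: it uses ggplot2’s own theme_classic() and theme() rather than theme_plus(), and the linewidth value is an arbitrary choice for illustration.

```r
library(ggplot2)

manual_gridlines <- ggplot(iris,
                           mapping = aes(x = Petal.Length,
                                         y = Sepal.Length)) +
  geom_point() +
  theme_classic() +  # a built-in ggplot2 theme with no gridlines, as a stand-in starting point
  theme(
    # Reintroduce faint major gridlines only
    panel.grid.major = element_line(color = "gray90",
                                    linewidth = 0.5,
                                    linetype = "solid"),
    # Keep minor gridlines switched off
    panel.grid.minor = element_blank()
  )

# Run print(manual_gridlines) to draw the plot
```

Note that this manual version draws gridlines in both directions regardless of whether an axis is continuous or discrete; you would need to target panel.grid.major.x or panel.grid.major.y yourself to be more selective.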
If we had just one continuous axis, gridlines_plus() would automatically detect this and draw the gridlines in only the one relevant direction:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Species)) + #<--MAKE DISCRETE
geom_boxplot() + #<--CHANGE GEOM TYPE.
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
theme_plus() +
yaxis_title_plus() +
gridlines_plus()
Because gridlines are primarily beneficial for helping a reader discern exact values, there’s no need for them when an axis is discrete and thus has no exact values to read! Thus, we can simplify the presentation by omitting them in that direction automatically.
There might be situations wherein you have more than one continuous axis and want subtle gridlines, but you already know you don’t want them along one of those axes. For example, perhaps you have time data on the x-axis and, even though these data are technically continuous, you don’t want gridlines added along that dimension. No worries: set the notx or noty parameter inside gridlines_plus() to TRUE:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Species)) +
geom_boxplot() +
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
theme_plus() +
yaxis_title_plus() +
gridlines_plus(notx = TRUE) #<--REJECT GRIDLINES ALONG THIS SCALE
In this case, this functionally suppresses all gridlines because there is no other continuous axis.
Faceting, the splitting of a plot into small multiples based on one or more discrete variables, is a core ggplot2 feature, and it generally works with the tools provided by ggplotplus:
#EXAMPLE GRAPH TO SHOW FACETING FEATURES:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(
scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus() +
facet_grid(. ~ Species) #<--MAKE ONE SMALL MULTIPLE (PANEL) PER SPECIES (ACROSS COLUMNS)
As this example shows, many features in ggplotplus are compatible with faceting, though some may require some thought.
For example, using yaxis_title_plus() with facets works—the function places the relocated y-axis title below the facet strip labels. However, for some, this could look a little confusing or awkward. In general, we don’t recommend using yaxis_title_plus() when you’re faceting and have the facet strip labels at the top of the plot.
However, there are three potential workarounds. The first is simply to facet by rows instead of columns:
###INCREASE THE FIGURE HEIGHT A LITTLE TO MAKE ROOM.
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(
scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus() +
facet_grid(Species ~ .) #<--FACET ACROSS ROWS INSTEAD.
Another option would be to relocate the facet strip labels to the bottom of the graph so they aren’t competing with the relocated y-axis title for space:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(
scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus() +
facet_grid(. ~ Species,
switch = "x") #<--MOVE STRIP LABELS TO BOTTOM.
This honestly works better in some ways anyhow!
A third option would be to suppress the facet strip labels and instead only retain the legend, since we’re redundantly mapping Species here to both panel and fill:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
theme_plus(strip.text = element_blank()) +
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(
scale = "y",
name = "Sepal length (cm)") +
yaxis_title_plus() +
gridlines_plus() +
facet_grid(. ~ Species)
Faceting is a relatively complex feature of ggplot2. ggplotplus’s features are mostly designed to work with it, but some creativity (and future improvements) may still be needed!
Thus far, we’ve mostly focused on graphs that use color (hue, luminance, and saturation) to communicate differences in our data. ggplotplus is designed to revamp how ggplot2 approaches color to yield graphs that are more accessible and more interpretable while still having the aesthetic and engagement benefits of color.
However, any graph that uses color to communicate difference can only be so accessible and interpretable. A small but meaningful share of people are completely colorblind, and color perception weakens as we age. Additionally, many people still read and/or view graphs in contexts wherein colors may not be readily distinguished.
As such, data viz advocates would remind us there are other visual channels (ways of communicating difference in a graph) beyond color available to us, including many already baked into ggplot2!
One of these, point shape, is a useful and classic channel for communicating difference in scatterplots like those we’ve made so far.
However, ggplot2 only makes available the same 26 shapes available in base R:
pch_values = 0:25
#THE SHAPES AVAILABLE IN R/GGPLOT2
plot(pch_values,
rep(1, length(pch_values)),
pch = 0:25, cex = 2)
While there is certainly some variation between these shapes, many are perceptually similar, making them difficult to distinguish quickly. Research has suggested that humans more readily distinguish shapes when they vary strongly from each other along three axes:
Openness (how “full” or “empty” they are)
Spikiness (how “pointy” and “angular” they are versus how “rounded” they are) and
Intersectionality (how “crossed,” if at all, their interior elements are)
Notably, only the last five shapes of those bundled with R have separate outline (stroke) and interior (fill) channels, allowing them to bear separate outline and fill colors, and those same five shapes are also relatively similar with respect to openness and intersectionality, making them harder to distinguish from each other. This limits the versatility of this visual channel.
As such, ggplotplus introduces geom_point_plus(), a variant of ggplot2’s geom_point() layer that introduces access to nine new, intentionally crafted shapes that vary as much as possible along the three axes described above while also being able to bear separate stroke and fill colors:
geom_point_plus_shapes()
These shapes can be used to communicate difference whenever using color (alone) might be undesirable, insufficient, and/or unnecessary:
set.seed(123)
ggplot(
iris[sample(1:nrow(iris), 30, replace = FALSE), ], #<--RESTRICT DATA VOLUME TO REDUCE OVERPLOTTING FOR ILLUSTRATION.
mapping = aes(x = Petal.Length, y = Sepal.Length)
) +
geom_point_plus(
mapping = aes(shape = factor(round(Petal.Length))), #<--MAP SHAPE TO ANY CATEGORICAL VARIABLE WITH 9 OR FEWER LEVELS!
legend_title = "Petal length (binned)"
) + #<--SPECIFY NEW LEGEND TITLE HERE FOR CONVENIENCE.
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus()
As the example above shows, you can specify a new shape legend title inside geom_point_plus() using the legend_title parameter for convenience. You can also use the chosen_shapes parameter to specify which exact shapes from the new palette you want to use:
set.seed(123)
ggplot(
iris[sample(1:nrow(iris), 30, replace = FALSE), ],
mapping = aes(x = Petal.Length, y = Sepal.Length)
) +
geom_point_plus(
mapping = aes(shape = factor(round(Petal.Length))),
legend_title = "Petal length (binned)",
chosen_shapes = c("oval", "lotus", "plus", "flower", "octagon", "economy", "waffle") #<--SPECIFY SPECIFIC SHAPES YOU WANT.
) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus()
As previously noted, all these shapes have separate fill and color aesthetics, so these color-based aesthetics can be set to constants or even mapped to variables (although we wouldn’t necessarily recommend doing the latter in many cases):
set.seed(123)
ggplot(
iris[sample(1:nrow(iris), 30, replace = FALSE), ],
mapping = aes(x = Petal.Length, y = Sepal.Length)
) +
geom_point_plus(
mapping = aes(shape = factor(round(Petal.Length))),
legend_title = "Petal length (binned)",
chosen_shapes = c("oval", "lotus", "plus", "flower", "octagon", "economy", "waffle"),
fill = viridis::viridis(1,0.5,0.5) #<--GO AHEAD, SET COLOR/FILL TO CONSTANTS OR EVEN MAP THEM TO VARIABLES!
) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus()
There is also a geom_jitter_plus(), if you’d like to add a little variance to your point locations, e.g., to reduce overplotting:
set.seed(123)
ggplot(
iris[sample(1:nrow(iris), 30, replace = FALSE), ],
mapping = aes(x = Petal.Length, y = Sepal.Length)
) +
geom_jitter_plus(
mapping = aes(shape = factor(round(Petal.Length))),
legend_title = "Petal length (binned)",
chosen_shapes = c("oval", "lotus", "plus", "flower", "octagon", "economy", "waffle"),
fill = viridis::viridis(1,0.5,0.5),
width = 0.5, #<--USE WIDTH, HEIGHT, AND SEED TO CONTROL THE JITTERING.
height = 0.35
) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus()
Note that, at present, shape cannot be mapped globally to access these shapes–it must be mapped locally in geom_point_plus(). Functionality to access these points in different ways and geoms may be added in the future.
To respect the legacy of R graphics, geom_point_plus() also has access to the five base R shapes (pch 21:25) that can take both fill and color aesthetics. These are accessible via their standard pch numbers, e.g., 23, or by name: “circle”, “square”, “diamond”, “triangle_up”, and “triangle_down”. These can be mixed and matched with the other shapes added by ggplotplus:
set.seed(123)
ggplot(
iris[sample(1:nrow(iris), 30, replace = FALSE), ],
mapping = aes(x = Petal.Length, y = Sepal.Length)
) +
geom_point_plus(
mapping = aes(shape = factor(round(Petal.Length))),
legend_title = "Petal length (binned)",
chosen_shapes = c("oval", "21", "plus", "triangle_up", "octagon", "economy", "diamond"), #<--YOU CAN ACCESS BASE R SHAPES 21-25 VIA NAME OR NUMBER
fill = viridis::viridis(1,0.5,0.5)
) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus()
More radically, you can add your own shapes for use in scatterplots!
The first step to doing so is to generate a list of vertices for your shape. Shapes like those in ggplotplus are essentially “connect-the-dots puzzles,” wherein consecutive vertices on a coordinate plane are joined in order. Polygons formed after and within the first are treated as “holes” in the base shape. Coordinates must lie within [-1, 1], with a rough diameter of 0.4, in both the horizontal and vertical directions. The base shape should have a piece value of 1; all vertices in each hole should have a unique piece value >1.
For example, a basic 5-pointed star, with no holes, might have coordinates like these:
(test_star = data.frame(
x = c(
0.000,
0.118,
0.380,
0.190,
0.235,
0.000,
-0.235,
-0.190,
-0.380,
-0.118
),
y = c(
0.400,
0.124,
0.124,
-0.047,
-0.324,
-0.153,
-0.324,
-0.047,
0.124,
0.124
),
piece = 1
))
We can register this shape with ggplotplus for the session using add_shape_plus():
add_shape_plus(name = "star", #<--THE NAME TO USE TO ACCESS THIS SHAPE.
shape = test_star
)
Once registered, you can refer to this shape by name in any later calls:
set.seed(123)
ggplot(
iris[sample(1:nrow(iris), 30, replace = FALSE), ],
mapping = aes(x = Petal.Length, y = Sepal.Length)
) +
geom_point_plus(
mapping = aes(shape = factor(round(Petal.Length))),
legend_title = "Petal length (binned)",
chosen_shapes = c("oval", "21", "plus", "star", "octagon", "economy", "diamond"), #<--YOU CAN NOW ACCESS THE STAR SHAPE, IF IT'S BEEN REGISTERED
fill = viridis::viridis(1,0.5,0.5)
) +
scale_continuous_plus(scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus()
Feel free to get creative to make highly distinctive shapes! Generative AI models are, in our experience, very good at turning shape ideas into compatible lists of vertices and holes, if given the parameters described above.
Note that shape registrations last only for the session; you will need to re-register shapes each new session.
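The star above has no holes, so here is a sketch of how a shape with one hole might be laid out, along with a quick sanity check against the vertex rules described earlier. Both the “frame” shape and the check_shape() helper are hypothetical illustrations written in base R; the helper only tests the rules stated in this guide and is not part of ggplotplus, which may impose additional requirements (vertex order, for instance) that we don’t cover here.

```r
# Hypothetical "frame" shape: an outer square (piece 1) with a square hole (piece 2)
frame_shape <- data.frame(
  x     = c(-0.4,  0.4, 0.4, -0.4,   -0.2,  0.2, 0.2, -0.2),
  y     = c(-0.4, -0.4, 0.4,  0.4,   -0.2, -0.2, 0.2,  0.2),
  piece = c(rep(1, 4), rep(2, 4))  # base shape is piece 1; the hole gets its own value > 1
)

# Hypothetical helper: returns TRUE if the data frame follows the rules stated above
check_shape <- function(shape) {
  is.data.frame(shape) &&
    all(c("x", "y", "piece") %in% names(shape)) &&  # required columns
    all(abs(shape$x) <= 1) &&                       # x within [-1, 1]
    all(abs(shape$y) <= 1) &&                       # y within [-1, 1]
    (1 %in% shape$piece) &&                         # a base shape exists
    all(shape$piece >= 1)                           # holes use values > 1
}

frame_ok <- check_shape(frame_shape)
print(frame_ok)
```

If the check passes, the data frame could then be handed to add_shape_plus() in the same way as the star.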
One sort of strange function in ggplot2 is coord_flip(), which flips the graph’s coordinate system so that the x axis runs vertically and the y axis runs horizontally (it should be noted that this function is now deprecated in ggplot2 Version 4). ggplotplus’s tools try to work with this function, if you use it:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(
scale = "y",
name = "Sepal length (cm)") +
theme_plus() +
yaxis_title_plus() +
gridlines_plus() +
coord_flip() #<--CHECK THIS OUT--THE Y AXIS TITLE IS REALLY THE X AXIS TITLE!
Here’s a “final” version of our scatterplot using all of ggplotplus’s tools together:
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(
scale = "y",
name = "Sepal length (cm)") +
scale_fill_discrete(
labels = c("I. setosa", "I. versicolor", "I. virginica")
) +
theme_plus(
legend.text = element_text(face = "italic"),#<--MAKE THE TAXON NAMES ITALIC LIKE THEY OUGHT TO BE!
begin_discrete = 0,
end_discrete = 1) +
yaxis_title_plus() +
gridlines_plus()
I’d say that looks pretty nice! At worst, it’s an evidence-based starting point upon which to build something truly impactful and distinctive!
Just as with ggplot2, we recommend users use ggplot2::ggsave() to export all their graphs from R! Using RStudio’s Plots pane to export graphs makes it harder to control the resolution, width, and size of your exported graphs and to ensure consistency from one version of a graph to the next.
Specifically, we highly recommend specifying a dpi of at least 300 (the default) to ggsave() as well as a width and height relevant to the intended use of the graph.
For example, full page width in a typical scientific journal is approximately 7 inches, and a typical aspect ratio (width:height) is approximately 1.33, yielding a typical height around 5.26 inches. So, a reasonable ggsave() call might look like this:
###HYPOTHETICAL GGSAVE COMMAND
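p = ggplot(iris, mapping = aes(x = Petal.Length, y = Sepal.Length)) + geom_point() #<--PLACEHOLDER: ANY GGPLOT OBJECT YOU'VE STORED.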
ggplot2::ggsave(filename = "Myplot.png",
plot = p,
dpi = 600,
width = 7,
height = 5.26)
However, as any experienced ggplot2 user knows, a graph’s design may look great in RStudio’s Plots pane, when it’s been sized to one set of dimensions, and then look terrible when it’s saved at a different set of dimensions! In particular, font sizes and line widths often do not scale well across a range of output dimensions.
ggplotplus’s theme_plus() function recognizes this and exposes two inputs: export_width and export_height. When using it, specify the ultimate dimensions you intend to export your graph at (e.g., 7 x 5.26), and it’ll adjust the default font and line sizes (though not any custom sizes you’ve provided!) accordingly to look (reasonably) good at your output dimensions:
###PLOTTING WINDOW NOW RESCALED TO A WIDTH OF 7 INCHES X 5.26 INCHES
ggplot(iris,
mapping = aes(x = Petal.Length,
y = Sepal.Length)) +
geom_point(mapping = aes(fill = Species),
alpha = 0.3) +
scale_continuous_plus(
scale = "x",
name = "Petal length (cm)",
thin.labels = TRUE) +
scale_continuous_plus(
scale = "y",
name = "Sepal length (cm)") +
scale_fill_discrete(
labels = c("I. setosa", "I. versicolor", "I. virginica")
) +
theme_plus(
legend.text = element_text(face = "italic"),
begin_discrete = 0,
end_discrete = 1,
export_width = 7, #<--SET THE OUTPUT WIDTH AND HEIGHT YOU EXPECT AND SIZES OF ELEMENTS WILL ADJUST SOMEWHAT ACCORDINGLY (HERE, A LITTLE SMALLER)
export_height = 5.26) +
yaxis_title_plus() +
gridlines_plus()
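The 7 x 5.26 inch dimensions used above come from simple arithmetic, which you can adapt to other targets. The base R sketch below reproduces it and also shows the pixel dimensions implied by a given dpi; the variable names are ours, and the 600 dpi value simply mirrors the hypothetical ggsave() call shown earlier.

```r
export_width <- 7     # inches; approximate full page width in a typical journal
aspect_ratio <- 1.33  # width:height
export_dpi   <- 600   # dots per inch, as in the earlier ggsave() example

# Height follows from the width and the aspect ratio
export_height <- round(export_width / aspect_ratio, 2)

# Pixel dimensions of the saved file are inches multiplied by dpi
pixel_width  <- export_width * export_dpi
pixel_height <- round(export_height * export_dpi)

print(c(width_in = export_width, height_in = export_height))
print(c(width_px = pixel_width, height_px = pixel_height))
```

Passing the same width and height to both theme_plus() (as export_width and export_height) and ggsave() keeps the on-screen sizing and the exported file in agreement.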