Constructing Graphs

So far with ggplot2 we have seen many of its different tools and capabilities. We have not, however, discussed how to go about building a good graphic. This example comes from Hadley Wickham.

We will begin by looking at what every graph needs:

Data

Aesthetics

Then we will look at other features that we may want to add:

Stat transforms

Position Adjustments

Data

All graphs need data in the form of a data frame. Often ggplot2 performs operations on the data behind the scenes, such as counting observations for a bar chart or fitting a smoother, and creates a new data set in the background.
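One way to look at the data set that ggplot2 builds in the background is layer_data(), which returns the computed data for one layer of a plot. The sketch below uses the mpg data set that ships with ggplot2 rather than the flights data used elsewhere in this section: geom_bar() counts the rows in each class, so the computed data contains a count column that mpg itself does not have.

```r
library(ggplot2)

# A bar chart of car classes; we never supply the counts ourselves
p <- ggplot(mpg, aes(class)) + geom_bar()

# layer_data() returns the data frame ggplot2 computed for the first layer
built <- layer_data(p)

# One row per class, with a computed `count` column
head(built[, c("x", "count")])
```

The counts in `built` add up to the number of rows in mpg, because the default stat for geom_bar() tallies every observation.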

Aesthetic Mappings

Aesthetic mappings are defined with aes(). They describe how variables in the data are mapped to visual properties: we can map variables to the x and y positions, colour, size, and shape. Aesthetics can be specified in the initial ggplot() call, in individual layers, or both.
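A mapping created by aes() is just a record of which variable goes to which property; nothing is evaluated until the plot is built. A small sketch: the first two unnamed arguments become x and y, and the American spelling `color` is standardised to `colour`, which is why this section can use both spellings interchangeably. The variable names here are only recorded, so no data is needed.

```r
library(ggplot2)

# aes() only records the mapping; the variables are not looked up yet
m <- aes(dep_delay, arr_delay, color = carrier)

# The unnamed arguments become x and y, and `color` becomes `colour`
names(m)
```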

Examples of Aesthetics

We will use the code below to look at the various ways in which we can add aesthetics to a graph:

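library(ggplot2)
# `data` is assumed to be a flights data frame with dep_delay, arr_delay
# and carrier columns (for example, a subset of nycflights13::flights).

# 1. All mappings in the initial ggplot() call
ggplot(data, aes(dep_delay, arr_delay, colour = carrier)) +
    geom_point()
# 2. x and y in ggplot(), colour in the layer
ggplot(data, aes(dep_delay, arr_delay)) +
    geom_point(aes(colour = carrier))
# 3. Only x in ggplot(), y and colour in the layer
ggplot(data, aes(dep_delay)) +
    geom_point(aes(y = arr_delay, colour = carrier))
# 4. Only the data in ggplot(), all mappings in the layer
ggplot(data) +
    geom_point(aes(dep_delay, arr_delay, colour = carrier))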

1. The first graph is the scatter plot as we originally made it, with colour = carrier in the initial aes() call.

2. In the second graph, only the x and y variables are mapped in the initial aes() call; the colouring by carrier is added in the geom_point() layer.

3. In the third graph, the initial call maps only the x variable, and the geom_point() layer adds y and the colouring by carrier.

4. In the last graph, the initial call supplies only the data, and the geom_point() layer specifies all of the mappings.

Note that you cannot tell these graphs apart. All of them display the same data with the same aesthetics; only the place where the mappings are specified differs, and the end result is the same.
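One way to check that these forms really are equivalent is to compare the data each plot computes with ggplot2's layer_data(). A minimal sketch, assuming a small hypothetical data frame `toy` in place of `data`; its delay values are made up purely for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for `data`: made-up delays for three carriers
toy <- data.frame(
  dep_delay = c(-5, 0, 12, 30, 45, 60, 2, 8, 20),
  arr_delay = c(-10, 3, 15, 25, 50, 70, -1, 6, 18),
  carrier   = rep(c("AA", "DL", "UA"), each = 3)
)

# All mappings in ggplot() versus all mappings in the layer
p_plot  <- ggplot(toy, aes(dep_delay, arr_delay, colour = carrier)) + geom_point()
p_layer <- ggplot(toy) + geom_point(aes(dep_delay, arr_delay, colour = carrier))

d_plot  <- layer_data(p_plot)
d_layer <- layer_data(p_layer)

# Same positions and the same colour for every point
c(x = identical(d_plot$x, d_layer$x),
  y = identical(d_plot$y, d_layer$y),
  colour = identical(d_plot$colour, d_layer$colour))
```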

Aesthetics in Layers

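We can add, override, or remove aesthetic mappings in individual layers. Starting from a plot created with ggplot(data, aes(dep_delay, arr_delay)):

| Operation | Layer aesthetics | Result |
|---|---|---|
| Add | aes(color = carrier) | aes(dep_delay, arr_delay, color = carrier) |
| Override | aes(y = dep_delay) | aes(dep_delay, dep_delay) |
| Remove | aes(y = NULL) | aes(dep_delay) |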
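The three operations in the table can be checked directly by building each layer and inspecting its computed data. A sketch, assuming the same hypothetical `toy` data frame (made-up values) in place of `data`; geom_rug() is used for the "remove" case only because, unlike geom_point(), it can be drawn without a y aesthetic.

```r
library(ggplot2)

# Hypothetical stand-in for `data`: made-up delays for three carriers
toy <- data.frame(
  dep_delay = c(-5, 0, 12, 30, 45, 60, 2, 8, 20),
  arr_delay = c(-10, 3, 15, 25, 50, 70, -1, 6, 18),
  carrier   = rep(c("AA", "DL", "UA"), each = 3)
)

p <- ggplot(toy, aes(dep_delay, arr_delay))

# Add: the layer keeps x and y from ggplot() and adds colour
d_add <- layer_data(p + geom_point(aes(colour = carrier)))

# Override: the layer replaces y = arr_delay with y = dep_delay
d_override <- layer_data(p + geom_point(aes(y = dep_delay)))

# Remove: y = NULL drops the inherited y; geom_rug() only needs x
d_remove <- layer_data(p + geom_rug(aes(y = NULL)))

c(add      = "colour" %in% names(d_add),
  override = all(d_override$y == toy$dep_delay),
  remove   = !("y" %in% names(d_remove)))
```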

Below we will see some examples of this.

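library(ggplot2)
library(gridExtra)  # for grid.arrange()

# colour mapped in ggplot(): inherited by geom_point() and geom_smooth()
p1 <- ggplot(data, aes(dep_delay, arr_delay, colour = carrier)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE) +
    theme(legend.position = "none")

# colour mapped only in geom_point(): geom_smooth() fits a single line
p2 <- ggplot(data, aes(dep_delay, arr_delay)) +
    geom_point(aes(colour = carrier)) +
    geom_smooth(method = "lm", se = FALSE) +
    theme(legend.position = "none")

grid.arrange(p1, p2, ncol = 2)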

In the graph on the left, colour = carrier is mapped in the initial ggplot() call, so geom_smooth() inherits it and fits a separate line for each carrier. In the graph on the right, only the points are coloured by carrier, so the smoother is fitted to the entire data set, which was never split into groups.

Aesthetics specified in the initial ggplot() call carry through to every layer, while aesthetics specified inside a layer apply only to that layer.
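The inheritance rule can be seen in the grouping of the smooth layer. A sketch, assuming the same hypothetical `toy` data frame (made-up values) in place of `data`, and a small helper `n_groups()` defined here only for illustration; `formula = y ~ x` is passed just to silence geom_smooth()'s message about the formula it picked.

```r
library(ggplot2)

# Hypothetical stand-in for `data`: made-up delays for three carriers
toy <- data.frame(
  dep_delay = c(-5, 0, 12, 30, 45, 60, 2, 8, 20),
  arr_delay = c(-10, 3, 15, 25, 50, 70, -1, 6, 18),
  carrier   = rep(c("AA", "DL", "UA"), each = 3)
)

# colour in ggplot(): every layer, including the smoother, is split by carrier
p1 <- ggplot(toy, aes(dep_delay, arr_delay, colour = carrier)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# colour only in geom_point(): the smoother sees a single group
p2 <- ggplot(toy, aes(dep_delay, arr_delay)) +
  geom_point(aes(colour = carrier)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Hypothetical helper: distinct groups in the second (smooth) layer
n_groups <- function(p) length(unique(layer_data(p, 2)$group))
c(p1 = n_groups(p1), p2 = n_groups(p2))
```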

Settings vs Mappings

We can map an aesthetic to a variable, or we can set it to a constant.

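library(ggplot2)
library(gridExtra)  # for grid.arrange()

# Setting: colour is a fixed parameter outside aes()
p1 <- ggplot(data, aes(dep_delay, arr_delay)) +
      geom_point(color = "darkblue")
# Mapping: "darkblue" inside aes() is treated as data, not as a colour
p2 <- ggplot(data, aes(dep_delay, arr_delay)) +
      geom_point(aes(color = "darkblue"))
grid.arrange(p1, p2, ncol = 2)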

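In the first plot we set the colour aesthetic to the constant dark blue. In the second plot, putting "darkblue" inside aes() creates a new variable whose only value is the string "darkblue". Because that value is treated as data rather than as a colour, the default discrete colour scale assigns it the scale's first colour, a pinkish red.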
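The difference between setting and mapping shows up in the colours ggplot2 actually computes for the points. A sketch, assuming the same hypothetical `toy` data frame (made-up values) in place of `data`; the test only checks that the mapped colour is a single value that is not "darkblue", rather than asserting the exact hex code of the default scale.

```r
library(ggplot2)

# Hypothetical stand-in for `data`: made-up delays for three carriers
toy <- data.frame(
  dep_delay = c(-5, 0, 12, 30, 45, 60, 2, 8, 20),
  arr_delay = c(-10, 3, 15, 25, 50, 70, -1, 6, 18),
  carrier   = rep(c("AA", "DL", "UA"), each = 3)
)

p_set <- ggplot(toy, aes(dep_delay, arr_delay)) + geom_point(color = "darkblue")
p_map <- ggplot(toy, aes(dep_delay, arr_delay)) + geom_point(aes(color = "darkblue"))

d_set <- layer_data(p_set)
d_map <- layer_data(p_map)

unique(d_set$colour)  # the constant we set
unique(d_map$colour)  # a colour chosen by the default discrete scale
```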

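We could also map the value and then override the default scale with scale_color_identity(), which uses the mapped values themselves as colours, so the points are drawn in dark blue.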

library(ggplot2)
# scale_color_identity() uses the mapped value itself as the colour
ggplot(data, aes(dep_delay, arr_delay)) +
      geom_point(aes(color = "darkblue")) +
      scale_color_identity()
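With the identity scale in place, the mapped string passes straight through to the drawn colour. A short check, assuming the same hypothetical `toy` data frame (made-up values) in place of `data`.

```r
library(ggplot2)

# Hypothetical stand-in for `data`: made-up delays for three carriers
toy <- data.frame(
  dep_delay = c(-5, 0, 12, 30, 45, 60, 2, 8, 20),
  arr_delay = c(-10, 3, 15, 25, 50, 70, -1, 6, 18),
  carrier   = rep(c("AA", "DL", "UA"), each = 3)
)

p <- ggplot(toy, aes(dep_delay, arr_delay)) +
  geom_point(aes(color = "darkblue")) +
  scale_color_identity()

# The identity scale leaves the mapped value unchanged
unique(layer_data(p)$colour)
```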

Other Aesthetic Mappings

Mapping aesthetics to constant values is sometimes useful: it lets us distinguish between layers and label them in the legend.

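library(ggplot2)
ggplot(data, aes(dep_delay, arr_delay)) +
    geom_point() +
    # each smoother maps colour to a constant label naming its method
    geom_smooth(aes(color = "lm"), method = "lm", se = FALSE) +
    geom_smooth(aes(color = "loess"), method = "loess", se = FALSE) +
    labs(color = "Method")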

Here we have added two smoothing layers, each coloured by the name of its method, and labs(color = "Method") gives the legend its title.
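Because both constant labels are mapped through the same colour scale, the two smoothers receive different colours. A sketch using the built-in mtcars data instead of `data` (an assumption made so the example runs on its own; loess also needs a reasonable number of points); `formula = y ~ x` only silences geom_smooth()'s formula message.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(aes(color = "lm"), method = "lm", formula = y ~ x, se = FALSE) +
  geom_smooth(aes(color = "loess"), method = "loess", formula = y ~ x, se = FALSE) +
  labs(color = "Method")

# Layers 2 and 3 are the smoothers; each gets one colour from the shared scale
col_lm    <- unique(layer_data(p, 2)$colour)
col_loess <- unique(layer_data(p, 3)$colour)
c(lm = col_lm, loess = col_loess)
```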

Statistical Transforms

Often we want to do more than what we have seen so far, such as adding statistical summaries to a graph. A stat transforms the data, typically by summarising it in some way. Useful stats include smoothing and identity (which leaves the data unchanged). We usually do not call stats directly; instead, each geom calls its default stat.
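Each layer created by a geom_*() function carries a stat object, so one way to see which stat a geom calls by default is to inspect that field. A small sketch; reading `$stat` off a layer is an inspection trick rather than something you need when building plots.

```r
library(ggplot2)

# Each geom_*() layer carries a stat; take the first class name of each
stats <- c(
  point  = class(geom_point()$stat)[1],
  bar    = class(geom_bar()$stat)[1],
  smooth = class(geom_smooth()$stat)[1]
)
stats
```

geom_point() uses the identity stat, geom_bar() counts observations, and geom_smooth() fits a smoother, matching the description above.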

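library(ggplot2)
# Calling the stat directly, drawing its result with geom = "point"
# (fun replaces fun.y, which is deprecated since ggplot2 3.3.0)
ggplot(data, aes(carrier, dep_delay)) +
    geom_point() +
    stat_summary(geom = "point", fun = mean, color = "red", size = 3)

# Calling the geom and naming the stat it should use
ggplot(data, aes(carrier, dep_delay)) +
    geom_point() +
    geom_point(stat = "summary", fun = mean, color = "red", size = 3)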

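In the first version we add the summary layer with stat_summary() and ask for the mean; in the second we use geom_point() with stat = "summary". Both produce the same graph: the raw departure delays for each carrier, with a large red point at each carrier's mean.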
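To confirm that the two versions compute the same summary, and that it really is the per-carrier mean, we can compare their second layers with a direct calculation. A sketch, assuming the same hypothetical `toy` data frame (made-up values) in place of `data`; the means are compared after sorting so the check does not depend on row order.

```r
library(ggplot2)

# Hypothetical stand-in for `data`: made-up delays for three carriers
toy <- data.frame(
  dep_delay = c(-5, 0, 12, 30, 45, 60, 2, 8, 20),
  arr_delay = c(-10, 3, 15, 25, 50, 70, -1, 6, 18),
  carrier   = rep(c("AA", "DL", "UA"), each = 3)
)

p1 <- ggplot(toy, aes(carrier, dep_delay)) +
  geom_point() +
  stat_summary(geom = "point", fun = mean, color = "red", size = 3)
p2 <- ggplot(toy, aes(carrier, dep_delay)) +
  geom_point() +
  geom_point(stat = "summary", fun = mean, color = "red", size = 3)

# The red points (layer 2) should sit at each carrier's mean delay
means <- tapply(toy$dep_delay, toy$carrier, mean)
y1 <- layer_data(p1, 2)$y
y2 <- layer_data(p2, 2)$y
rbind(stat_summary = y1, geom_point = y2)
```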

Position Adjustments

We can use position adjustments to tweak the position of elements.

For example with bars:

position_stack(): stack overlapping bars on top of each other.

position_fill(): stack overlapping bars and scale each stack to a height of 1.

position_dodge(): place overlapping bars next to each other.

library(ggplot2)
library(gridExtra)  # for grid.arrange()

dplot <- ggplot(diamonds, aes(color, fill = cut)) +
    xlab(NULL) + ylab(NULL) + theme(legend.position = "none")

# position = "stack" is the default for bars, so geom_bar()
# is equivalent to geom_bar(position = "stack").
p1 <- dplot + geom_bar()
p2 <- dplot + geom_bar(position = "fill")
p3 <- dplot + geom_bar(position = "dodge")
grid.arrange(p1, p2, p3, ncol = 3)

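The three graphs show the differences between the position adjustments: stacked bars (the default), stacked bars scaled so that each column has a height of 1, and dodged bars placed side by side.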
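The effect of each position adjustment can also be read off the computed bar extents (ymin and ymax). A sketch using the same diamonds plot; the x positions are unclassed to plain numbers before grouping, since recent ggplot2 versions store discrete positions in a special numeric class.

```r
library(ggplot2)

base <- ggplot(diamonds, aes(color, fill = cut))

d_stack <- layer_data(base + geom_bar())
d_fill  <- layer_data(base + geom_bar(position = "fill"))
d_dodge <- layer_data(base + geom_bar(position = "dodge"))

# Stack: the top of each column is the number of diamonds of that colour
stack_tops <- tapply(d_stack$ymax, unclass(d_stack$x), max)
# Fill: every column is rescaled so that its top is at 1
fill_tops <- tapply(d_fill$ymax, unclass(d_fill$x), max)
# Dodge: bars sit side by side, so every bar starts at 0
dodge_bottoms <- unique(d_dodge$ymin)

list(stack = stack_tops, fill = fill_tops, dodge = dodge_bottoms)
```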
