# Constructing Graphs

So far we have seen many of the tools and capabilities that `ggplot2` offers. We have not, however, discussed how to go about building a good graphic. This example comes from Hadley Wickham.

We will begin by looking at what every graph needs:

- Data
- Aesthetics

Then we will look at other features that we may want to add:

- Stat transforms

## Data

Every graph needs data in the form of a data frame. Often ggplot performs operations on the data behind the scenes (for example, counting observations for a bar chart or fitting a smoother) and creates a new, transformed data set in the background.
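To see the data that ggplot2 creates in the background, you can pass a plot to `layer_data()`, which builds the plot and returns the data frame drawn by one layer. The sketch below assumes a tiny hypothetical data frame, `flights_small`, standing in for the flight data. Because `geom_bar()` uses `stat_count()`, the five raw rows are replaced by one row per bar holding a computed `count`.

```r
library(ggplot2)

# Hypothetical stand-in for the flight data: five flights, two carriers
flights_small <- data.frame(carrier = c("AA", "AA", "UA", "UA", "UA"))

p <- ggplot(flights_small, aes(carrier)) +
  geom_bar()

# layer_data() builds the plot and returns the data ggplot2 actually
# draws for layer 1; stat_count() turns five rows into one row per bar
bar_data <- layer_data(p)
print(bar_data[, c("x", "count")])
```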

## Aesthetic Mappings

Aesthetic mappings are defined by `aes()`. They describe how variables are mapped to visual properties. In short, we can map variables to the `x` and `y` positions, colour, size and shape. Aesthetics can be given in the initial `ggplot()` call or in individual layers.
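As a small sketch of mapping several variables to visual properties at once, the code below uses a hypothetical data frame, `flights_small`, with invented values. Each mapped aesthetic shows up as its own column in the data ggplot2 builds for the layer.

```r
library(ggplot2)

# Hypothetical flight records (values invented for illustration)
flights_small <- data.frame(
  dep_delay = c(-2, 5, 30, 12),
  arr_delay = c(-10, 3, 41, 8),
  carrier   = c("AA", "UA", "AA", "UA"),
  distance  = c(200, 950, 1400, 700)
)

# Map variables to position, colour, size and shape
p <- ggplot(flights_small, aes(x = dep_delay, y = arr_delay,
                               colour = carrier, size = distance,
                               shape = carrier)) +
  geom_point()

# Every mapped aesthetic becomes a column in the built layer data
point_data <- layer_data(p)
print(point_data[, c("x", "y", "colour", "size", "shape")])
```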

### Examples of Aesthetics

The code below shows four ways of adding the same aesthetics to a graph:

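```r
library(ggplot2)

# 1. All mappings in the initial ggplot() call
ggplot(data, aes(dep_delay, arr_delay, colour = carrier)) +
  geom_point()
# 2. Positions in ggplot(), colour in the layer
ggplot(data, aes(dep_delay, arr_delay)) +
  geom_point(aes(colour = carrier))
# 3. Only x in ggplot(), y and colour in the layer
ggplot(data, aes(dep_delay)) +
  geom_point(aes(y = arr_delay, colour = carrier))
# 4. Only the data in ggplot(), all mappings in the layer
ggplot(data) +
  geom_point(aes(dep_delay, arr_delay, colour = carrier))
```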

1. The first graph is the scatter plot as we first made it, with `colour = carrier` in the initial `aes()` call.

2. In the second graph we specify only the positions in the initial `aes()` call and then colour by carrier in the `geom_point()` layer.

3. In the third graph we start with just the `x` variable and add `y` and the colour by carrier in the `geom_point()` layer.

4. In the last graph we pass only the data to `ggplot()` and specify the positions and colouring in the `geom_point()` layer.

You cannot tell these graphs apart: all of them display the same data with the same aesthetics. Only the place where each mapping is specified differs, and the end result is the same.
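One way to check that the four specifications really are equivalent is to compare the data each one draws. The sketch below rebuilds the four plots on a small hypothetical data frame, `flights_small` (invented values), and compares the positions and colours of every version with the first.

```r
library(ggplot2)

# Hypothetical stand-in for the flight data (values invented for illustration)
flights_small <- data.frame(
  dep_delay = c(-2, 5, 30, 12),
  arr_delay = c(-10, 3, 41, 8),
  carrier   = c("AA", "UA", "AA", "UA")
)

# The four specifications from the code above
plots <- list(
  ggplot(flights_small, aes(dep_delay, arr_delay, colour = carrier)) +
    geom_point(),
  ggplot(flights_small, aes(dep_delay, arr_delay)) +
    geom_point(aes(colour = carrier)),
  ggplot(flights_small, aes(dep_delay)) +
    geom_point(aes(y = arr_delay, colour = carrier)),
  ggplot(flights_small) +
    geom_point(aes(dep_delay, arr_delay, colour = carrier))
)

# layer_data() builds each plot and returns the data drawn by layer 1
built <- lapply(plots, layer_data)
reference <- built[[1]]

# Compare positions and colours of each version with the first one
same_as_first <- vapply(built, function(d) {
  isTRUE(all.equal(d$x, reference$x)) &&
    isTRUE(all.equal(d$y, reference$y)) &&
    identical(d$colour, reference$colour)
}, logical(1))
print(same_as_first)
```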

#### Aesthetics in Layers

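We can add, override, or remove aesthetic mappings in a layer, depending on what we are doing. Starting from the plot aesthetics `aes(dep_delay, arr_delay)`:

| Operation | Layer aesthetics | Result |
|---|---|---|
| Add | `aes(color = carrier)` | `aes(dep_delay, arr_delay, color = carrier)` |
| Override | `aes(y = dep_delay)` | `aes(dep_delay, dep_delay)` |
| Remove | `aes(y = NULL)` | `aes(dep_delay)` |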
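The override and remove rows can be checked directly. In this sketch, which assumes a hypothetical data frame `flights_small` with invented values, a layer-level `y` replaces the plot-level `y`, and setting `colour = NULL` in a layer drops the plot-level colour mapping for that layer only (colour is removed here rather than `y`, because `geom_point()` cannot be drawn without a `y`).

```r
library(ggplot2)

# Hypothetical stand-in for the flight data (values invented for illustration)
flights_small <- data.frame(
  dep_delay = c(-2, 5, 30, 12),
  arr_delay = c(-10, 3, 41, 8),
  carrier   = c("AA", "UA", "AA", "UA")
)

base <- ggplot(flights_small, aes(dep_delay, arr_delay, colour = carrier))

# Override: the layer's y replaces the plot-level y
override_data <- layer_data(base + geom_point(aes(y = dep_delay)))

# Remove: colour = NULL drops the plot-level colour mapping in this layer,
# so every point falls back to the geom's default colour
remove_data <- layer_data(base + geom_point(aes(colour = NULL)))

print(override_data[, c("x", "y")])
print(unique(remove_data$colour))
```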

The code below compares adding `colour = carrier` in the initial call with adding it in the `geom_point()` layer.

```r
library(ggplot2)
library(gridExtra)

# Left: colour mapped in ggplot(), so it carries through to every layer
p1 <- ggplot(data, aes(dep_delay, arr_delay, colour = carrier)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")

# Right: colour mapped only in the geom_point() layer
p2 <- ggplot(data, aes(dep_delay, arr_delay)) +
  geom_point(aes(colour = carrier)) +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")

grid.arrange(p1, p2, ncol = 2)
```

In the graph on the left we map colour to carrier in the initial call to `ggplot()`, which gives a separate smooth for each carrier. In the graph on the right we colour only the points by carrier, so the smooth is fitted to the entire data set, which was not split into groups. In general, aesthetics given in the initial `ggplot()` call carry through to all the layers, while aesthetics given in `aes()` inside a layer apply only to that layer.
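To see this grouping in the built data, you can count how many groups the smooth layer was fitted to. The sketch assumes a hypothetical data frame, `flights_small`, with three invented flights per carrier; `formula = y ~ x` is passed only to silence the message `geom_smooth()` prints otherwise.

```r
library(ggplot2)

# Hypothetical stand-in for the flight data (values invented for illustration)
flights_small <- data.frame(
  dep_delay = c(-2, 5, 30, 12, 8, 20),
  arr_delay = c(-10, 3, 41, 8, 2, 25),
  carrier   = c("AA", "AA", "AA", "UA", "UA", "UA")
)

# Colour in ggplot(): the smooth inherits it and is split by carrier
p_grouped <- ggplot(flights_small, aes(dep_delay, arr_delay, colour = carrier)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Colour only in geom_point(): the smooth sees no grouping variable
p_overall <- ggplot(flights_small, aes(dep_delay, arr_delay)) +
  geom_point(aes(colour = carrier)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Layer 2 is the smooth; count the distinct groups it was fitted to
n_fits_grouped <- length(unique(layer_data(p_grouped, 2)$group))
n_fits_overall <- length(unique(layer_data(p_overall, 2)$group))
print(c(grouped = n_fits_grouped, overall = n_fits_overall))
```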

## Settings vs Mappings

We can map an aesthetic to a variable, or we can set it to a constant value.

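```r
library(ggplot2)
library(gridExtra)

# Setting: the colour is a constant given outside aes()
p1 <- ggplot(data, aes(dep_delay, arr_delay)) +
  geom_point(color = "darkblue")
# Mapping: "darkblue" is treated as a data value inside aes()
p2 <- ggplot(data, aes(dep_delay, arr_delay)) +
  geom_point(aes(color = "darkblue"))
grid.arrange(p1, p2, ncol = 2)
```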
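The difference between setting and mapping is visible in the colours ggplot2 actually draws. This sketch assumes a hypothetical data frame, `flights_small`, with invented values: the set version draws `"darkblue"`, while the mapped version passes the string through the default colour scale and draws that scale's first colour instead.

```r
library(ggplot2)

# Hypothetical stand-in for the flight data (values invented for illustration)
flights_small <- data.frame(
  dep_delay = c(-2, 5, 30, 12),
  arr_delay = c(-10, 3, 41, 8)
)

p_set <- ggplot(flights_small, aes(dep_delay, arr_delay)) +
  geom_point(color = "darkblue")        # setting: a constant colour
p_mapped <- ggplot(flights_small, aes(dep_delay, arr_delay)) +
  geom_point(aes(color = "darkblue"))   # mapping: "darkblue" is a data value

# The colour drawn for the points, after any scale has been applied
set_colours    <- unique(layer_data(p_set)$colour)
mapped_colours <- unique(layer_data(p_mapped)$colour)
print(c(set = set_colours, mapped = mapped_colours))
```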

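In the first plot we set the colour aesthetic to the constant dark blue. In the second plot we map colour to a new variable whose only value is the string "darkblue"; because it has just one value, the default colour scale gives it the scale's first colour, which is pinkish. We could also map the value and then override the default scale with `scale_color_identity()`, which uses the data values themselves as colours: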

```r
ggplot(data, aes(dep_delay, arr_delay)) +
  geom_point(aes(color = "darkblue")) +
  scale_color_identity()
```

## Other Aesthetic Mappings

Sometimes it is useful to map an aesthetic to a constant value inside `aes()`. This gives each layer its own entry in the legend and lets us distinguish between layers.

```r
ggplot(data, aes(dep_delay, arr_delay)) +
  geom_point() +
  geom_smooth(aes(color = "lm"), method = "lm", se = FALSE) +
  geom_smooth(aes(color = "loess"), method = "loess", se = FALSE) +
  labs(color = "Method")
```
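A quick way to confirm that each smooth gets its own colour is to look at the built data for the two smooth layers. The sketch below assumes a hypothetical data frame of 30 invented flights with a noisy linear relationship; `formula = y ~ x` only silences the message `geom_smooth()` prints otherwise.

```r
library(ggplot2)

set.seed(1)
# Hypothetical delays: 30 invented flights with a noisy linear relationship
flights_small <- data.frame(dep_delay = 1:30)
flights_small$arr_delay <- flights_small$dep_delay + rnorm(30, sd = 5)

p <- ggplot(flights_small, aes(dep_delay, arr_delay)) +
  geom_point() +
  geom_smooth(aes(color = "lm"), method = "lm", formula = y ~ x, se = FALSE) +
  geom_smooth(aes(color = "loess"), method = "loess", formula = y ~ x, se = FALSE) +
  labs(color = "Method")

# Layers 2 and 3 are the two smooths; each should have a single colour
lm_colour    <- unique(layer_data(p, 2)$colour)
loess_colour <- unique(layer_data(p, 3)$colour)
print(c(lm = lm_colour, loess = loess_colour))
```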

Here we have added two smooth layers, each coloured by the name of its method; `labs(color = "Method")` sets the title of the legend.

## Statistical Transforms

Often we want to go beyond what we have seen so far and add statistical features to a graph. A `stat` transforms the data, typically by computing a summary of some sort; useful ones include smoothers and the identity stat, which leaves the data unchanged. You usually do not call a stat directly; each `geom` calls one for you (for example, `geom_bar()` uses `stat_count()`).
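As a small check that a geom simply calls its stat, the sketch below builds the same linear smooth twice on a hypothetical noisy straight line (`df` and its values are invented): once with `geom_smooth()`, whose default stat is `"smooth"`, and once with `stat_smooth()`, whose default geom is `"smooth"`. The built layer data should match.

```r
library(ggplot2)

set.seed(42)
# Hypothetical data: a noisy straight line
df <- data.frame(x = 1:20)
df$y <- 2 * df$x + rnorm(20)

# geom_smooth() calls stat_smooth() for us ...
p_geom <- ggplot(df, aes(x, y)) + geom_smooth(method = "lm", formula = y ~ x)
# ... and stat_smooth() draws with the smooth geom by default
p_stat <- ggplot(df, aes(x, y)) + stat_smooth(method = "lm", formula = y ~ x)

smooth_geom <- layer_data(p_geom)
smooth_stat <- layer_data(p_stat)
same_layer  <- isTRUE(all.equal(smooth_geom, smooth_stat))
print(same_layer)
```

The stat replaces the 20 input rows with a grid of predicted values (80 points by default), which is the "new set in the background" mentioned earlier.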

```r
library(ggplot2)

# fun.y was deprecated in ggplot2 3.3.0; use fun instead
ggplot(data, aes(carrier, dep_delay)) +
  geom_point() +
  stat_summary(geom = "point", fun = "mean", color = "red", size = 3)

ggplot(data, aes(carrier, dep_delay)) +
  geom_point() +
  geom_point(stat = "summary", fun = "mean", color = "red", size = 3)
```
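One way to confirm what the summary layer computes is to compare its built data with means calculated by hand. The sketch assumes a hypothetical data frame, `flights_small`, with three invented departure delays per carrier, and uses `tapply()` for the hand-computed means.

```r
library(ggplot2)

# Hypothetical departure delays for two carriers (values invented)
flights_small <- data.frame(
  carrier   = c("AA", "AA", "AA", "UA", "UA", "UA"),
  dep_delay = c(10, 20, 30, 5, 15, 25)
)

p <- ggplot(flights_small, aes(carrier, dep_delay)) +
  geom_point() +
  stat_summary(geom = "point", fun = mean, color = "red", size = 3)

# Layer 2 holds one summarised row per carrier, ordered by x position
summary_data <- layer_data(p, 2)
summary_data <- summary_data[order(as.numeric(summary_data$x)), ]

# Means computed by hand, in the same (alphabetical) carrier order
by_hand <- tapply(flights_small$dep_delay, flights_small$carrier, mean)
print(cbind(stat = summary_data$y, by_hand = as.numeric(by_hand)))
```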

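In the first version we add the summary layer with `stat_summary()`; in the second we use `geom_point()` with `stat = "summary"`. Both ask for the mean departure delay of each carrier, drawn as a large red point, and both produce the same graph.

We can also use position adjustments to tweak the position of elements.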

For example, with bars:

- `position_stack()`: stack overlapping bars
- `position_fill()`: stack overlapping bars and scale each stack to a height of 1
- `position_dodge()`: place overlapping bars next to each other
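The effect of each adjustment shows up in the bar tops (`ymax`) that ggplot2 computes. The sketch below uses a hypothetical three-row data frame, `gems`, modelled on `diamonds` (two diamonds of colour D with different cuts, one of colour E), and a hypothetical helper, `bar_top()`, that builds the bar chart with a given position and returns the bar tops.

```r
library(ggplot2)

# Hypothetical diamonds-like data: colour D has two cuts, colour E has one
gems <- data.frame(
  color = c("D", "D", "E"),
  cut   = c("Fair", "Good", "Fair")
)

# Hypothetical helper: build the bar chart and return each bar's top
bar_top <- function(position) {
  p <- ggplot(gems, aes(color, fill = cut)) + geom_bar(position = position)
  layer_data(p)$ymax
}

stack_top <- bar_top("stack")  # the two D bars sit on top of each other
fill_top  <- bar_top("fill")   # each stack is rescaled to a height of 1
dodge_top <- bar_top("dodge")  # the two D bars sit side by side
print(list(stack = stack_top, fill = fill_top, dodge = dodge_top))
```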

```r
library(ggplot2)
library(gridExtra)

dplot <- ggplot(diamonds, aes(color, fill = cut)) +
  xlab(NULL) + ylab(NULL) + theme(legend.position = "none")
# position stack is the default for bars, so geom_bar()
# is equivalent to geom_bar(position = "stack").
p1 <- dplot + geom_bar()
p2 <- dplot + geom_bar(position = "fill")
p3 <- dplot + geom_bar(position = "dodge")
grid.arrange(p1, p2, p3, ncol = 3)
```

The three graphs show the differences between the position functions: stacked bars (left), stacks filled to a common height of 1 (middle), and dodged bars placed side by side (right).
