Tibbles

Previously we have worked with data in the form of

Vectors

Lists

Arrays

Data frames

What is a Tibble?

A tibble is a modern take on the data frame. It keeps the features of the original data frame that have held up well and drops those that have become outdated. Tibbles were introduced to R by Hadley Wickham as part of the tidyverse. We will use them throughout the tidyverse in place of the base data frames we just learned about.

Compared to Data Frames

tibble() never changes the type of its inputs.

No more worrying about strings being automatically converted to factors.

A tibble can have columns that are lists.

A tibble can have non-standard variable names.

Names can start with a number or contain spaces.

To refer to such a name, wrap it in backticks.

It only recycles vectors of length 1.

It never creates row names.
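As a rough illustration of the points above, the following sketch builds a small tibble and checks each behaviour in turn. The column names and values are made up for the example, and it assumes only that the tibble package is installed.

```r
library(tibble)

# Made-up example data: a character column, a non-syntactic name,
# and a length-1 value that is recycled to every row.
tb <- tibble(
  name      = c("a", "b", "c"),
  `2 score` = c(10, 20, 30),
  flag      = TRUE
)

class(tb$name)      # character: strings are not converted to factors
tb$`2 score`        # backticks are required for non-syntactic names
tb$flag             # the single TRUE was recycled to length 3
has_rownames(tb)    # FALSE: no row names were created

# Recycling a length-2 vector against a length-4 vector is an error
# in tibble(), whereas data.frame() would silently repeat it.
res <- tryCatch(tibble(x = 1:4, y = 1:2), error = function(e) e)
inherits(res, "error")
```

Wrapping the failing call in tryCatch() keeps the script running so the error can be inspected rather than stopping the session.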

library(tidyverse)
try <- tibble(x = 1:3, y = list(1:5, 1:10, 1:20))
try
## # A tibble: 3 × 2
##       x          y
##   <int>     <list>
## 1     1  <int [5]>
## 2     2 <int [10]>
## 3     3 <int [20]>

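We can see that y is displayed as a list column. A traditional data frame cannot store the list this way: data.frame() tries to split it into separate columns, and their lengths do not match. Flattening the pieces with c() and then converting (as_data_frame() is an older alias for as_tibble()) fails for the same reason: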

try <- as_data_frame(c(x = 1:3, y = list(1:5, 1:10, 1:20)))
try
## Error: Variables must be length 1 or 20. Problem variables: 'y1', 'y2'

We can use a non-standard name in our tibble as well:

names(data.frame(`crazy name` = 1))
## [1] "crazy.name"
names(tibble(`crazy name` = 1))
## [1] "crazy name"

Notice that data.frame() replaced the name we asked for. By default it rewrites every name into a syntactically valid one, so the space became a dot. The tibble kept the name exactly as written.
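The renaming is a default of data.frame() rather than a hard limit, and it can be switched off with the check.names argument. A small sketch, using the same made-up column name as above, shows this and how the column is then accessed with backticks:

```r
library(tibble)

# Base R keeps the name as written when check.names = FALSE.
base_df <- data.frame(`crazy name` = 1, check.names = FALSE)
names(base_df)          # "crazy name"

# In either structure, the column is then accessed with backticks.
tb <- tibble(`crazy name` = 1)
base_df$`crazy name`
tb$`crazy name`
```

The practical difference is that a tibble preserves names without any extra argument.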

Coercing into Tibbles

An existing object, such as a data frame or a list, can be coerced to a tibble with as_tibble(). It works much like as.data.frame(), but it is considerably faster.

l <- replicate(26, sample(100), simplify = FALSE)
names(l) <- letters

microbenchmark::microbenchmark(
  as_tibble(l),
  as.data.frame(l)
)
## Unit: microseconds
##              expr      min       lq      mean    median       uq      max
##      as_tibble(l)  309.250  327.099  376.2002  344.7265  386.004 1689.046
##  as.data.frame(l) 1390.507 1464.361 1614.3087 1543.3465 1690.608 3104.097
##  neval cld
##    100  a 
##    100   b

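microbenchmark() runs each expression many times (100 here, shown in the neval column) and summarizes how long the runs took. In this run the median time for as_tibble() was about 345 microseconds, against about 1,543 for as.data.frame(), so the tibble conversion was roughly four times faster. The gap matters most when an analysis repeats the conversion many times.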
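Coercion also applies to existing data frames. The sketch below is one possible illustration, using the mtcars data set that ships with R. Because tibbles do not keep row names, the rownames argument of as_tibble() is used to move them into an ordinary column; the column name "model" is chosen only for this example.

```r
library(tibble)

# mtcars stores the car models as row names.
cars_tb <- as_tibble(mtcars, rownames = "model")
cars_tb

# Converting back gives an ordinary data frame again.
cars_df <- as.data.frame(cars_tb)
class(cars_df)    # "data.frame"
```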

Tibbles vs Data Frames

In everyday use, there are two key differences between tibbles and data frames:

Printing.

Subsetting.

Printing

Tibbles print only the first 10 rows and only the columns that fit on the screen. Each column header also shows the column's data type.

You will not accidentally print too much.

tibble(
  a = lubridate::now() + runif(1e3) * 86400,
  b = lubridate::today() + runif(1e3) * 30,
  c = 1:1e3,
  d = runif(1e3),
  e = sample(letters, 1e3, replace = TRUE)
)
## # A tibble: 1,000 × 5
##                      a          b     c          d     e
##                 <dttm>     <date> <int>      <dbl> <chr>
## 1  2017-02-19 09:02:23 2017-03-09     1 0.02150370     f
## 2  2017-02-19 01:42:10 2017-03-09     2 0.08031493     k
## 3  2017-02-19 05:36:59 2017-03-08     3 0.11670172     u
## 4  2017-02-19 18:49:56 2017-03-09     4 0.24552337     h
## 5  2017-02-19 04:15:06 2017-03-05     5 0.11232662     b
## 6  2017-02-19 10:00:27 2017-03-09     6 0.52834632     m
## 7  2017-02-19 13:42:43 2017-03-16     7 0.78928491     v
## 8  2017-02-19 17:02:27 2017-03-16     8 0.80388276     h
## 9  2017-02-19 15:09:33 2017-03-19     9 0.45767339     d
## 10 2017-02-19 09:14:04 2017-02-25    10 0.18177950     t
## # ... with 990 more rows
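When the default display is too short, the print method can be asked for more. The following sketch assumes the tibble package is loaded and uses a made-up two-column tibble; n and width are arguments of the tibble print method, and glimpse() gives a transposed overview.

```r
library(tibble)

# Made-up data: 1,000 rows, two columns.
big <- tibble(id = 1:1000, value = runif(1000))

print(big, n = 15)          # show 15 rows instead of the default 10
print(big, width = Inf)     # show every column, however wide
glimpse(big)                # one line per column, with its type
```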

Subsetting

We can pull a single column out of a tibble in the ways we are already used to, by name or by position:

df$x

df[["x"]]

df[[1]]

We can also do this through a pipe (%>%), which we will learn about later. Inside a pipe, the placeholder . stands for the tibble:

df %>% .$x

df %>% .[["x"]]

df <- tibble(
  x = runif(5),
  y = rnorm(5)
)

df$x
## [1] 0.6227033 0.7363213 0.8551199 0.9173554 0.5542486
df[["x"]]
## [1] 0.6227033 0.7363213 0.8551199 0.9173554 0.5542486
df[[1]]
## [1] 0.6227033 0.7363213 0.8551199 0.9173554 0.5542486

The commands above should look familiar from our earlier work. With piping, or chaining, we can do the same:

df %>% .$x
## [1] 0.6227033 0.7363213 0.8551199 0.9173554 0.5542486
df %>% .[["x"]]
## [1] 0.6227033 0.7363213 0.8551199 0.9173554 0.5542486
df %>% .[[1]]
## [1] 0.6227033 0.7363213 0.8551199 0.9173554 0.5542486
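Where tibbles do differ from data frames in subsetting is in being stricter. The sketch below, using made-up data, illustrates two such differences: `$` does not partially match column names on a tibble, and `[` always returns another tibble rather than dropping to a vector.

```r
library(tibble)

# Made-up data, stored both as a base data frame and as a tibble.
base_df <- data.frame(xyz = 1:3, y = c("a", "b", "c"))
tb <- as_tibble(base_df)

# Partial matching: the data frame silently returns column xyz,
# while the tibble returns NULL (with a warning, suppressed here).
partial_base <- base_df$xy
partial_tb   <- suppressWarnings(tb$xy)
partial_base          # 1 2 3
is.null(partial_tb)   # TRUE

# Single-column subsetting with [ : vector versus tibble.
class(base_df[, "xyz"])   # "integer"
class(tb[, "xyz"])        # "tbl_df" "tbl" "data.frame"
```

To get a plain vector from a tibble, use `$` or `[[` with the full column name, as in the examples above.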