# Lists in R

In R, a list is a structure that can combine objects of different types. In this section we will learn how to create and work with lists.

## Creating Lists

A list is actually a vector, but it differs from the other types of vectors we have been using in this class.

Those other vectors are atomic vectors: every element of an atomic vector must be of the same type.

A list is a type of vector called a recursive vector.
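To see the difference concretely, the short sketch below puts the same three values into an atomic vector with c() and into a list with list(). The names v and l are only illustrative. It uses the base R predicates is.vector(), is.atomic(), and is.recursive(), and it borrows the double-bracket indexing that is explained later in this section.

```r
# An atomic vector holds one type only, so c() coerces everything to character
v <- c("Angela", 75, TRUE)
class(v)          # "character"
v[2]              # "75": the number has become a string

# A list keeps each element's own type
l <- list("Angela", 75, TRUE)
class(l[[2]])     # "numeric"
class(l[[3]])     # "logical"

# Both are vectors, but only the list is recursive
is.vector(v)      # TRUE
is.vector(l)      # TRUE
is.atomic(v)      # TRUE
is.atomic(l)      # FALSE
is.recursive(l)   # TRUE
```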

### An Example Database

We first consider a patient database in which we want to store each patient's:

- name
- amount of bill due
- a Boolean indicator of whether or not they have insurance

This gives us three types of information:

- character
- numeric
- logical

To create a list for one patient, we write:

```r
a <- list(name="Angela", owed="75", insurance=TRUE)
a
## $name
## [1] "Angela"
##
## $owed
## [1] "75"
##
## $insurance
## [1] TRUE
```

## Indexing Lists

With vectors, arrays, and matrices we saw that indexing works in much the same way, apart from the number of dimensions. A list, however, is very different. Notice that, unlike a typical vector, the list prints out in multiple parts, one for each named element. These names will help with indexing, as we will see below.

There is another easy way to create essentially the same list (the only difference is that here owed is stored as the number 75 rather than the character string "75"). Note that below we use double brackets and a character string, the element's name, to index:

```r
a.alt <- vector(mode="list")   # start with an empty list
a.alt[["name"]] <- "Angela"
a.alt[["owed"]] <- 75
a.alt[["insurance"]] <- TRUE
a.alt
## $name
## [1] "Angela"
##
## $owed
## [1] 75
##
## $insurance
## [1] TRUE
```

We could then create a list like this for all of our patients. Our database would then be a list of all of these individual lists.
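A minimal sketch of such a database, assuming each patient is stored as a list with the same three fields. The records for Ben and Dana are hypothetical values made up for illustration, and p1, p2, and db are only illustrative names. Because the database is itself a list, one set of double brackets selects a patient and a second set selects a field within that patient's record.

```r
# Hypothetical patient records (values made up for illustration)
p1 <- list(name="Angela", owed=75, insurance=TRUE)
p2 <- list(name="Ben", owed=120, insurance=FALSE)

# The database is a list whose elements are themselves lists
db <- list(p1, p2)
length(db)           # 2 patients

# The first [[ ]] picks a patient, the second picks a field of that patient
db[[2]][["name"]]    # "Ben"
db[[1]]$owed         # 75

# Append another hypothetical patient to the end of the database
db[[length(db) + 1]] <- list(name="Dana", owed=0, insurance=TRUE)
length(db)           # 3 patients
```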

## Operations on Lists

With vectors, arrays, and matrices, there was really only one way to index them. With lists there are several. Below are three different ways to index a list:

```r
a[["name"]]   # double brackets with the element's name
## [1] "Angela"
a[[1]]        # double brackets with the element's position
## [1] "Angela"
a$name        # dollar sign with the element's name
## [1] "Angela"
```

## Double vs Single Brackets

All of the above are ways to index data in a list. Notice that two of them use double brackets. Next we look at the difference between double and single brackets.

```r
a[1]
## $name
## [1] "Angela"
class(a[1])
## [1] "list"
```

With single brackets we get back a list containing only the name element.

```r
a[[1]]
## [1] "Angela"
class(a[[1]])
## [1] "character"
```

With double brackets we actually extract the value itself, which here is a character string. So single brackets return a list containing the indexed element(s), while double brackets return the element itself, with whatever class that element has. Depending on your goals, you may want single or double brackets.
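A short sketch of two practical consequences of this difference. It recreates the list a from above so the code runs on its own, and sub and field are only illustrative names. Single brackets can select several elements at once and still return a list, while double brackets return a single element and, unlike $, accept a name stored in a variable.

```r
a <- list(name="Angela", owed="75", insurance=TRUE)

# Single brackets can take several indices at once and always return a list
sub <- a[c("name", "insurance")]
class(sub)         # "list"
length(sub)        # 2

# Double brackets return one element, and the name may be stored in a variable
field <- "name"
a[[field]]         # "Angela"

# $ uses the text after it as the name, so this looks for an element called "field"
is.null(a$field)   # TRUE
```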

To add a new element to a list, we assign a value to a new name:

```r
a$age <- 27
a
## $name
## [1] "Angela"
##
## $owed
## [1] "75"
##
## $insurance
## [1] TRUE
##
## $age
## [1] 27
```

To delete an element from a list, we set it to NULL:

```r
a$owed <- NULL
a
## $name
## [1] "Angela"
##
## $insurance
## [1] TRUE
##
## $age
## [1] 27
```

## List Components and Values

To find out what information a list contains, we can use the names() function:

```r
names(a)
## [1] "name"      "insurance" "age"
```

## Unlisting

To get at the values themselves, we can unlist the list, which flattens it into a single atomic vector:

```r
a.un <- unlist(a)
a.un
##      name insurance       age
##  "Angela"    "TRUE"      "27"
class(a.un)
## [1] "character"
```

Because an atomic vector holds only one type, unlist() coerces all the values to a common type. If there is character data in the original list, everything in the unlisted vector will be character. If your list contained only numeric elements, the class would be numeric.

## Applying Functions to Lists

Just as with arrays and matrices, we can use apply-style functions with lists; specifically, lapply() and sapply(). With the original apply() function we could specify whether the function was applied to the rows or the columns. With lists, both functions apply the function to each element of the list. We will create the list n below:

```r
# Number list
n <- list(1:5, 6:37)
n
## [[1]]
## [1] 1 2 3 4 5
##
## [[2]]
##  [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
## [24] 29 30 31 32 33 34 35 36 37
```

This list contains two separate vectors of numbers. Let us see the results of using lapply() and sapply():

```r
lapply(n, median)
## [[1]]
## [1] 3
##
## [[2]]
## [1] 21.5
```

The lapply() function returns a list containing the median of each element of n.

```r
sapply(n, median)
## [1]  3.0 21.5
```

The sapply() function, by contrast, simplifies the result to a vector of medians.

## Recursive Lists

Earlier it was mentioned that a list is a recursive vector. This is because a list can contain other lists. For example, let us go back to our patient data:

```r
s <- list(name="Chandra", insurance=TRUE, age=36)
patients <- list(a, s)
patients
## [[1]]
## [[1]]$name
## [1] "Angela"
##
## [[1]]$insurance
## [1] TRUE
##
## [[1]]$age
## [1] 27
##
##
## [[2]]
## [[2]]$name
## [1] "Chandra"
##
## [[2]]$insurance
## [1] TRUE
##
## [[2]]$age
## [1] 36
```

## Final Notes on Lists

It is important to remember how to access these parts of a list. Many of you will want to use R for model building and regression. You will almost never want to present R's default printed output as it is. For example, R does not automatically report confidence intervals for a regression. The output of most regression functions in R is actually a list, which means I can extract the elements I want from it and build tables that display exactly the information I want. This is why we take the time to discuss how to find out what is in a list and how to access it.

## Example with Output of a List

```r
# No seed is set here, so your numbers will differ slightly from those shown
x <- rnorm(500, 10, 3)
y <- 3*x + rnorm(500, 0, 2)
fit <- lm(y ~ x)
fit
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept)            x
##     -0.3911       3.0224
```

R gave me back the coefficients but no other information. This is where knowing how to access lists is key:

```r
names(fit)
## [1] "coefficients"  "residuals"     "effects"       "rank"
## [5] "fitted.values" "assign"        "qr"            "df.residual"
## [9] "xlevels"       "call"          "terms"         "model"
```

R actually stores much more information than it displayed. Next I use summary(), which summarizes the information in this model:

```r
summary <- summary(fit)
summary
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.0785 -1.3921  0.0552  1.4027  6.3620
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -0.3911     0.3169  -1.234    0.218
## x             3.0224     0.0305  99.082   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.072 on 498 degrees of freedom
## Multiple R-squared:  0.9517, Adjusted R-squared:  0.9516
## F-statistic:  9817 on 1 and 498 DF,  p-value: < 2.2e-16
names(summary)
## [1] "call"          "terms"         "residuals"     "coefficients"
## [5] "aliased"       "sigma"         "df"            "r.squared"
## [9] "adj.r.squared" "fstatistic"    "cov.unscaled"
```

## Conclusion of Lists

R holds much more information about a regression than it ever displays unless I dig deeper. Understanding lists and how to access their elements means you can produce custom tables that look much more professional than R's default output.

# Quick Check Practice

```r
set.seed(1234)
x = rnorm(500, 10, 3)
y = runif(1)*x + rnorm(500, 0, 3.4)
model = lm(y ~ x)

# 1. Find the summary of model and assign it as summary.
# 2. What does list summary contain?
# 3. Extract the coefficients summary and assign this as coeff.
# 4. Print coeff.
# 5. What is the class of coeff?
# 6. From coeff extract the column of p-values.
```

One possible solution:

```r
# 1. Find the summary of model and assign it as summary.
summary = summary(model)
# 2. What does list summary contain?
names(summary)
# 3. Extract the coefficients summary and assign this as coeff.
coeff = summary$coefficients
# 4. Print coeff.
coeff
# 5. What is the class of coeff?
class(coeff)
# 6. From coeff extract the column of p-values (the 4th column, Pr(>|t|)).
coeff[, 4]
```


# On Your Own: Swirl Practice

In order to learn R you must do R. Follow the steps below in your RStudio console:

1. Run these commands to load swirl and pick the course:

```r
library(swirl)  # if needed, install it first with install.packages("swirl")
swirl()
```

You will be prompted to choose a course. Type whatever number is in front of 02 Getting Data. This will take you to a menu of lessons. For now we will use only lesson 4: type 4 to choose lapply and sapply, then follow the instructions until you are finished.

Once you have finished the lesson, come back to this course and continue.