Vectors in R

The fundamental data type in R is the vector. A vector is a sequence of data elements all of the same type.

Creating Vectors

There are various ways to create vectors, but one of the most common is the c() function, which concatenates (combines) its arguments into a single vector.

 x <- c(1,5,2, 6)
 x
## [1] 1 5 2 6
 is.vector(x)
## [1] TRUE

Note that c() does not sort anything: the values stay in the order in which they were entered.
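Because every element of a vector must share one type, c() converts mixed inputs to a common type. The short sketch below illustrates this; the names nums and mixed are only illustrative and do not appear elsewhere in this lesson.

```r
# All inputs are numbers, so the result is a numeric vector.
nums <- c(1, 5, 2, 6)

# Mixed inputs are coerced to the most flexible type, here character.
mixed <- c(1, "a", TRUE)

class(nums)    # "numeric"
class(mixed)   # "character"
mixed          # "1" "a" "TRUE"
```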

Vector Arithmetic

We can do arithmetic with vectors much as we did with single numbers. When we apply an operator to vectors, R works element by element, or “elementwise.”

 y <- c(1,6,4,8)
 x+y
## [1]  2 11  6 14

Notice that we did not add all of the values together. Instead, R added the first value of x to the first value of y, then the second values, and so on. Keep this in mind, because you must decide whether an elementwise operation is really what you want. The other arithmetic operators behave the same way:

 x*y
## [1]  1 30  8 48
 x/y
## [1] 1.0000000 0.8333333 0.5000000 0.7500000
x %% y
## [1] 0 5 2 6
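If a single total is what you want rather than an elementwise result, one option is to wrap the elementwise operation in sum(). A small sketch using the same x and y as above (the names total_sum and dot_product are only illustrative):

```r
x <- c(1, 5, 2, 6)
y <- c(1, 6, 4, 8)

# Elementwise addition returns a vector of the same length.
x + y                       # 2 11 6 14

# sum() collapses a vector to one number.
total_sum <- sum(x + y)     # 33

# Sum of the elementwise products (the dot product of x and y).
dot_product <- sum(x * y)   # 87
```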

Recycling

Recycling can cause serious problems in data analysis. Recycling is what R does when an operation involves vectors that do not have the same length. Let’s consider what happens:

 z<- c(1,2 ,6 ,8, 9, 10)
 
 x+z
## Warning in x + z: longer object length is not a multiple of shorter object
## length
## [1]  2  7  8 14 10 15

Notice that R does warn us, but you may not always catch this warning in a large data analysis. Knowing your data is very important. Intuition suggests that this operation should fail when the two vectors have different lengths.

R still performs the operation, behaving as if x had been extended to c(1, 5, 2, 6, 1, 5). The stored x itself is not changed. This is what we call recycling: the values of the shorter vector are repeated until it matches the length of the longer vector.

x+ z
## Warning in x + z: longer object length is not a multiple of shorter object
## length
## [1]  2  7  8 14 10 15
c(1 , 5, 2, 6 , 1, 5) + z
## [1]  2  7  8 14 10 15
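R only warns when the longer length is not a multiple of the shorter one. When it is a multiple, recycling happens silently. The sketch below shows that case and one possible guard; add_same_length() is a hypothetical helper written for this example, not a built-in R function.

```r
x <- c(1, 5, 2, 6)
z <- c(1, 2, 6, 8, 9, 10)

# Length 2 divides length 4, so R recycles c(1, 2) with no warning.
short <- c(1, 2)
silent <- short + x          # 2 7 3 8

# Hypothetical helper: stop instead of recycling when lengths differ.
add_same_length <- function(a, b) {
  stopifnot(length(a) == length(b))
  a + b
}

add_same_length(x, c(1, 6, 4, 8))   # 2 11 6 14
# add_same_length(x, z) would stop with an error rather than recycle.
```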

Functions on Vectors

So far we have applied functions to single values, but most functions in R also accept vectors. One of the simplest, length(), helps us spot the recycling problem we just encountered by telling us how many elements a vector has.

length(x)
## [1] 4
length(y)
## [1] 4
length(z)
## [1] 6

The length of a vector is very important when writing functions, which we will get to in a later unit. We can use any() and all() to answer logical questions about the elements of a vector.

any(x>3)
## [1] TRUE

We see that at least one element of x is greater than 3.

all(x>3)
## [1] FALSE

However, not all values of x are larger than 3.
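any() and all() only answer yes or no. To find out how many elements meet a condition, one common approach relies on the fact that TRUE counts as 1 and FALSE as 0 in arithmetic. A minimal sketch (the names n_above and share_above are only illustrative):

```r
x <- c(1, 5, 2, 6)

x > 3                       # FALSE TRUE FALSE TRUE

# Summing a logical vector counts the TRUE values.
n_above <- sum(x > 3)       # 2

# Averaging it gives the proportion of TRUE values.
share_above <- mean(x > 3)  # 0.5
```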

Other Functions for Vectors

There are various other functions that can be run on vectors, some of which you may have seen before:

mean() finds the arithmetic mean of a vector.

median() finds the median of a vector.

sd() and var() find the standard deviation and variance of a vector, respectively.

min() and max() find the minimum and maximum of a vector, respectively.

sort() returns a vector that is sorted.

summary() returns the five-number summary of a numeric vector (minimum, first quartile, median, third quartile, maximum) together with the mean.
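As a quick illustration, the functions listed above can be applied to the small vector x from earlier. The values in the comments follow from x = c(1, 5, 2, 6).

```r
x <- c(1, 5, 2, 6)

mean(x)      # 3.5
median(x)    # 3.5 (average of the two middle values, 2 and 5)
var(x)       # 5.666667 (that is, 17/3)
sd(x)        # 2.380476 (square root of the variance)
min(x)       # 1
max(x)       # 6
sort(x)      # 1 2 5 6

# sort() leaves x itself unchanged; it returns a sorted copy.
sort(x, decreasing = TRUE)   # 6 5 2 1

# Minimum, quartiles, median, mean, and maximum in one call.
summary(x)
```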

The which() Function

Some functions help us pull out the values we are interested in. For example, above we asked whether any elements of x were greater than 3. The which() function tells us the positions of the elements that are.

 which(x>3)
## [1] 2 4
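Two related base R functions, which.max() and which.min(), return the position of the largest and smallest element. The sketch below also shows how length() turns the positions from which() into a count. The names pos_max, pos_min, and n_hits are only illustrative.

```r
x <- c(1, 5, 2, 6)

# Positions (not values) of the elements greater than 3.
which(x > 3)                  # 2 4

# Position of the largest and smallest element.
pos_max <- which.max(x)       # 4
pos_min <- which.min(x)       # 1

# Counting the positions gives the number of matching elements.
n_hits <- length(which(x > 3))   # 2
```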

Quick Check Practice


#You will find out more about the runif command in a few weeks.
set.seed(1234)
x = runif(5000, 1, 8)


# Do Not Print X as it is a long vector
# 1. What is the length of x?
# 2. How many values in your vector x are below 2?
# 3. How many values in your vector x are above 7?
# 4. How many values in your vector x are below 3 or above 8?


# Do Not Print X as it is a long vector
# 1. What is the length of x?
length(x)
# 2. How many values in your vector x are below 2?
length(which(x<2))
# 3. How many values in your vector x are above 7?
length(which(x>7))
# 4. How many values in your vector x are below 3 or above 8?
length(which(x<3 | x>8))



Use your knowledge of indexing and functions that return Booleans.

On Your Own: Swirl Practice

In order to learn R you must do R. Follow the steps below in your RStudio console:

Run this command to pick the course:

swirl()

You will be prompted to choose a course. Type whatever number is in front of 02 Getting Data. This will then take you to a menu of lessons. For now we will just use lesson 1. Type 1 to choose Vectors, then follow all the instructions until you are finished.

Once you are finished with the lesson come back to this course and continue.

Indexing Vectors

We can select specific elements of a vector with square brackets:

x[i] returns the element at position i of the vector.

x[1] is the first element.

x[3] is the third element.

x[-3] is a vector with everything but the third element.

We can start off by checking what we have stored so far:

 ls()
## [1] "x" "y" "z"

Now that we see the vectors available, we can try indexing x:

 x[3]
## [1] 2
 x[-3]
## [1] 1 5 6

Note that x[3] returns the third element and x[-3] returns everything but the third element.
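Square brackets also accept a vector of positions, so several elements can be selected or dropped at once. A brief sketch with the same x (the variable names are only illustrative):

```r
x <- c(1, 5, 2, 6)

first_and_third <- x[c(1, 3)]   # 1 2
second_to_last  <- x[2:4]       # 5 2 6
drop_first_two  <- x[-c(1, 2)]  # 2 6

# length(x) gives the last position, whatever the vector's size.
last_value <- x[length(x)]      # 6
```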

Replacing Values

We have seen how to remove an element from a vector, and we can use the same indexing ideas to put it back. We start with the same vector x as before.

 x
## [1] 1 5 2 6
 x<-x[-3]
 x
## [1] 1 5 6

Now we have removed the third value. We can then add the original element back in by combining the pieces around it:

 x <- c(x[1:2], 2, x[3])
 x
## [1] 1 5 2 6
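Base R's append() offers another way to insert a value at a given position, and assigning to an index overwrites a value in place. A small sketch, assuming we start from the shortened vector c(1, 5, 6); the names x_restored and x_changed are only illustrative.

```r
x <- c(1, 5, 6)

# Insert the value 2 after position 2.
x_restored <- append(x, 2, after = 2)   # 1 5 2 6

# Overwrite the third element instead of inserting a new one.
x_changed <- x_restored
x_changed[3] <- 10                      # 1 5 10 6
```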

Indexing with Booleans

Earlier we used any(x > 3) and which(x > 3) to ask whether, and where, elements of x exceed 3. Indexing with the logical condition itself returns the values of those elements rather than their positions.

 x[x > 3]
## [1] 5 6
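Conditions can be combined with & (and) and | (or), and a logical index can also appear on the left of an assignment to replace the matching values. A minimal sketch with the same x (the names between, extremes, and capped are only illustrative):

```r
x <- c(1, 5, 2, 6)

# Elements strictly between 1 and 6.
between <- x[x > 1 & x < 6]     # 5 2

# Elements below 2 or above 5.
extremes <- x[x < 2 | x > 5]    # 1 6

# Replace every value above 3 with 3, working on a copy.
capped <- x
capped[capped > 3] <- 3         # 1 3 2 3
```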

Quick Check Practice


#You will find out more about the runif command in a few weeks.
set.seed(1234)
x = runif(5000, 1, 8)


# Do Not Print X as it is a long vector
# 1. How many times does the value 2 occur?
# 2. What are the 3400th through 3402nd elements in your vector?
# 3. What is the smallest value in x?
# 4. What is the largest value in x?


# Do Not Print X as it is a long vector
# 1. How many times does the value 2 occur?
length(which(x==2))
# 2. What are the 3400th through 3402nd elements in your vector?
x[3400:3402]
# 3. What is the smallest value in x?
min(x)
# 4. What is the largest value in x?
max(x)



Use your knowledge of indexing and functions that return Booleans.

More Ways to Create Vectors

There are multiple ways to create a vector, but R must know that the object exists before we can use it:

 w
## Error: object 'w' not found

We see that w does not exist yet, so we first create it as an empty object:

w <- NULL
w
## NULL

Now that w is defined as empty (NULL), we can assign values to it by position:

 w[1] <- 3
 w[2] <- 15
 w
## [1]  3 15

When we are working with large amounts of data, it can be helpful to tell R exactly how long the vector should be before filling it:

 w1 <- vector(length=2)
 w1[1] <- 3
 w1[2] <- 15
 
 w1
## [1]  3 15
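One detail worth knowing: vector(length = 2) creates a logical vector filled with FALSE, which R converts to numeric once a number is assigned. If the values will be numbers, numeric() is an alternative that starts with zeros. A sketch of preallocating and then filling in a loop (the names flags and squares are only illustrative):

```r
# The default mode of vector() is logical.
flags <- vector(length = 2)
flags            # FALSE FALSE
class(flags)     # "logical"

# Preallocate five numeric slots, then fill them one by one.
squares <- numeric(5)
for (i in seq_along(squares)) {
  squares[i] <- i^2
}
squares          # 1 4 9 16 25
```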

We can also create the same vector with concatenation or with sequences:

 w2 <-  c(3,15)
 w2
## [1]  3 15
 w3 <- seq(from=3, to=15, length.out=2)
 w3
## [1]  3 15
 w4 <- seq(from=3, to=15, by=12)
 w4
## [1]  3 15

Aside from these ways to create the specific vector (3, 15), there are a couple more ways to create vectors in general:

 w5 <- 3:10
 w5
## [1]  3  4  5  6  7  8  9 10
 w6 <- rep(8,6)
 w6
## [1] 8 8 8 8 8 8

We can also repeat a whole vector:

 w7 <- rep(c(1,2,3),3)
 w7
## [1] 1 2 3 1 2 3 1 2 3
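seq() and rep() take a few more arguments that are often useful. The sketch below shows a step size other than 1, seq_len() for counting from 1, and the each and times arguments of rep(). The variable names are only illustrative.

```r
# Step from 3 to 15 in increments of 4.
by_four <- seq(from = 3, to = 15, by = 4)        # 3 7 11 15

# The integers 1 through 4.
one_to_four <- seq_len(4)                        # 1 2 3 4

# Repeat each element twice before moving to the next.
each_twice <- rep(c(1, 2, 3), each = 2)          # 1 1 2 2 3 3

# Repeat each element a different number of times.
varied <- rep(c(1, 2, 3), times = c(3, 1, 2))    # 1 1 1 2 3 3
```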

Naming Vector Elements

With vectors it can be important to assign names to the values. Then, when making plots or looking for maximums and minimums, we can be given a meaningful name for what a value represents instead of its numerical position in the vector. For example, say that vector x represents the number of medications taken by 4 unique patients. We can use the names() function to assign names to the values:

x
## [1] 1 5 2 6
names(x)
## NULL
names(x) <- c("Patient A", "Patient B", "Patient C", "Patient D")
x
## Patient A Patient B Patient C Patient D 
##         1         5         2         6
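Once a vector has names, they can be used for indexing in place of positions. A brief sketch using the patient names from above (the names top_patient and plain are only illustrative):

```r
x <- c(1, 5, 2, 6)
names(x) <- c("Patient A", "Patient B", "Patient C", "Patient D")

# Select by name instead of by position.
x["Patient B"]                     # named element with value 5
x[c("Patient A", "Patient D")]     # values 1 and 6, names kept

# which.max() keeps the name, so we can ask who has the most medications.
top_patient <- names(which.max(x)) # "Patient D"

# unname() returns the bare values again.
plain <- unname(x)                 # 1 5 2 6
```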

On Your Own: Swirl Practice

In order to learn R you must do R. Follow the steps below in your RStudio console:

Run this command to pick the course:

swirl()

You will be prompted to choose a course. Type whatever number is in front of 02 Getting Data. This will then take you to a menu of lessons. For now we will just use lesson 3. Type 3 to choose vapply and tapply, then follow all the instructions until you are finished.

Once you are finished with the lesson come back to this course and continue.