Exercises for 1.2: Variables & the R workspace

Exercise 1.2.1: Introduction to variables

  • Create a variable called potato whose value corresponds to the number of potatoes you’ve eaten in the last week. Or something equally stupid. It doesn’t matter.
  • Print out the value of potato by typing the variable name.
  • Do it again using the print() function.
  • Calculate the square root of potato using the sqrt() function.
  • Print out the value of potato again to verify that the value of potato hasn’t changed.
  • Reassign the value of potato to potato * 2.
  • Print out the new value of potato to verify that it has changed.

Solution 1.2.1:

Okay, let’s create the potato variable and print it out using two different commands:

potato <- 20
potato
## [1] 20
print(potato)
## [1] 20

Let’s calculate the square root of potato and then verify that this hasn’t changed the value of potato itself:

sqrt(potato)
## [1] 4.472136
potato
## [1] 20

Now let’s double the number of potatoes stored in potato, and check that we have successfully changed the value:

potato <- potato * 2
potato
## [1] 40

Bonus 1.2.1:

R actually has several different ways of assigning variables. You can do assignments with a right-pointing arrow -> too. Here’s an example:

potato * 2 -> potato
potato
## [1] 80

Another possibility that you’ll see people doing sometimes is using = instead of <-. That’s totally legitimate, but there are some nitpicky people who feel that <- is the more appropriate one to use. The two behave identically except in special cases:

potato = potato * 2
potato
## [1] 160

Personally I prefer to use <- when making variable assignments, and to use = when specifying arguments within a function. I find it cleaner. If you want to use = for both purposes you can. Just don’t use <- within function calls, as in sqrt(x <- 10), unless you know what you’re doing: as a side effect of the call it creates (or overwrites) a variable x in your workspace. That behaviour can be very handy for experienced users, but will be confusing to novices.
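To make that warning a bit more concrete, here’s a small sketch of the difference between = and <- inside a function call. I’m using mean() purely for illustration (it isn’t part of the exercise), because its first argument happens to be called x:

```r
x <- 1

# `=` inside a call just names an argument; the x in the workspace is untouched
mean(x = c(2, 4))      # 3
x                      # still 1

# `<-` inside a call does a real assignment first, then passes the value on
mean(x <- c(10, 20))   # 15
x                      # now 10 20
```

The second call quietly replaced the old x, which is exactly the kind of surprise I’d rather you avoid for now.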

Oh, and since we’re talking about it, you can also use the assign() function:

assign( 'potato', potato*2 )
potato
## [1] 320

And one last warning: R also has two additional operators <<- and ->>. These operators skip the current environment and search each enclosing environment in the scope chain until they find an existing variable with the to-be-assigned name, and then reassign it within that environment. If no such variable exists anywhere, the assignment is made in the global environment. This is powerful but dangerous behaviour: don’t use these operators at all unless you actually understand what those last sentences mean!
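If you’re curious anyway, here’s a minimal sketch of what <<- does differently from <-. It uses functions, which we haven’t covered yet, and the names counter, bump_local and bump_outer are made up just for this illustration:

```r
counter <- 0

# Plain <- creates a new, local `counter` that vanishes when the function ends
bump_local <- function() {
  counter <- counter + 1
  invisible(NULL)
}

# <<- searches the enclosing environments and modifies the `counter` it finds there
bump_outer <- function() {
  counter <<- counter + 1
  invisible(NULL)
}

bump_local()
counter   # still 0
bump_outer()
counter   # now 1
```

Notice that bump_outer() changed a variable it was never handed as an argument. That’s the “dangerous” part.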

Exercise 1.2.2: Variables of different types

  • you’ve already created a numeric variable: now try making a character (string) variable and a logical variable
  • try creating (numeric) variables with “special” values: Inf (infinity), -Inf (minus infinity), NaN (“not a number”)
  • try creating a variable with a “missing” value NA
  • try creating a variable with a “non-existent” value NULL

Solution 1.2.2:

Here’s an example of creating a character variable, and two different ways of specifying a logical variable:

food <- "potato and carrot"
food
## [1] "potato and carrot"
isFoodGood <- TRUE
isFoodGood
## [1] TRUE
isThePartyRight <- 2+2 == 5
isThePartyRight
## [1] FALSE
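If you ever lose track of what type a variable has, you can ask R. This is only a quick sketch: class() isn’t something the exercise asks for, I’m adding it as a sanity check, and I’ve recreated the variables so the snippet runs on its own:

```r
potato <- 20
food <- "potato and carrot"
isFoodGood <- TRUE

class(potato)       # "numeric"
class(food)         # "character"
class(isFoodGood)   # "logical"
```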

You can specify an infinite number directly, or get one indirectly as the result of a calculation that yields an infinite answer:

x1 <- Inf
x1
## [1] Inf
x2 <- 1/0
x2
## [1] Inf

Negative infinity works the same way:

y1 <- -Inf
y1
## [1] -Inf
y2 <- -1/0
y2
## [1] -Inf

“Not a number” is essentially the same concept as “undefined” in mathematics. It corresponds to the results of mathematical calculations that mathematicians have deemed to be “meaningless”. The canonical example is 0/0:

z1 <- NaN
z1
## [1] NaN
z2 <- 0/0
z2
## [1] NaN

NA (not available) is something you’ll run into a lot. It’s how R refers to missing data. Here we’re talking about the statistical concept of “missingness”: notionally there does exist an actual value that should go here (e.g., what that person would have said if they’d answered the question, or the RT that would have been recorded if my machine hadn’t broken):

m <- NA
m
## [1] NA

NULL also refers to a kind of missingness, but it’s not a statistical concept at all. It’s more of a computing concept. It says that the variable in question either doesn’t exist, or it fundamentally has no value. It’s not like NA, in which the value does “exist” in some sense (you just don’t know it or didn’t obtain it), and it’s not like NaN, in which the “value” does exist and you know that it’s just not a meaningful number. NULL is the way a computer represents the fact that you know there is an absence of value of any kind. Anyway…

n <- NULL
n
## NULL
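Since NA, NaN and NULL are easy to mix up, here’s a short sketch of how R itself tells them apart. The variable names with_na and with_null are just ones I made up for the example:

```r
# Each check asks a different question about "missingness"
is.na(NA)        # TRUE
is.nan(0/0)      # TRUE
is.na(0/0)       # TRUE: NaN also counts as missing as far as is.na() is concerned
is.nan(NA)       # FALSE
is.null(NULL)    # TRUE

# NA occupies a slot in a vector; NULL occupies nothing at all
with_na <- c(1, NA, 3)
with_null <- c(1, NULL, 3)
length(with_na)     # 3
length(with_null)   # 2
length(NULL)        # 0

# Arithmetic with a missing value gives a missing answer
NA + 1           # NA
```

The length() results are the clearest way I know to see the difference: a missing value is still a value, whereas NULL really is nothing.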

Bonus 1.2.2:

There are a few niceties about character data that you need to know about. For instance, notice that we have to use " or ' to tell R that the enclosed data is a string (i.e., character data). What do we do if we need to include " or ' as actual characters in the string? There are a few solutions. First, if you only need to include single quotes or double quotes, then use the other one as your “quoting” character!

name <- "O'Malley"
name
## [1] "O'Malley"
quote <- '"It was the best of times, it was the blurst of times" -The Simpsons'
quote
## [1] "\"It was the best of times, it was the blurst of times\" -The Simpsons"

Okay, that probably doesn’t look like what you’d intended the quote to look like, right? But that’s actually misleading: the print() function in R (which is what is actually displaying the results even though all you did was type the variable name) is displaying the quote in a “logical” form similar to what you’d type. If you want to see a “raw” output, use the cat() function, like this:

cat(quote)
## "It was the best of times, it was the blurst of times" -The Simpsons

If you need to do both, then you can instruct R to treat the quote mark as an actual character by placing the escape character \ in front of it, like this:

annoying <- '"O\'Malley\'s Bar" is a Nick Cave song'
cat( annoying )
## "O'Malley's Bar" is a Nick Cave song

But what if you want to include an \ in the text? You have to escape that too, like this:

alsoAnnoying <- 'The escape character is \\'
cat( alsoAnnoying )
## The escape character is \
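One way to convince yourself that the escape character isn’t really stored in the string is to count the characters with nchar(). This is just a sketch, and the variable names backslash and quoted are my own:

```r
backslash <- "\\"
nchar(backslash)   # 1: two keystrokes, but only one stored character

quoted <- "\"hi\""
nchar(quoted)      # 4: the two quote marks plus h and i

print(quoted)      # shows the escaped form you would type
cat(quoted)        # shows the characters that are actually stored
```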

Exercise 1.2.3: Creating vectors

  • Create a numeric vector with three elements using c()
  • Create a character vector with three elements using c()
  • Create a numeric vector called age whose elements contain the ages of three people you know, where the names of each element correspond to the names of those people

Digression 1.2.3:

At this point I’m starting to get a little tired of typing all the variable names out over and over to get R to display the results, so I’m going to use a “trick”. If you enclose the entire command in parentheses, the result gets printed to screen. In other words, this doesn’t print anything out:

x<-1

but this does:

(x<-1)
## [1] 1

I’ll start using that trick a lot from now on.

Solution 1.2.3:

(numbers <- c(6,2,7))
## [1] 6 2 7
(words <- c("robocop","is","bleeding"))
## [1] "robocop"  "is"       "bleeding"
(age <- c( "dan"=36, "alex"=4, "fiona"=0 ))
##   dan  alex fiona 
##    36     4     0
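Naming the elements inside c() isn’t the only option. As a sketch of an alternative, you can create the vector first and attach the names afterwards with names(); age2 is a name I’ve invented so that the original age is left alone:

```r
age <- c(dan = 36, alex = 4, fiona = 0)

# Same data, but the names are attached in a second step
age2 <- c(36, 4, 0)
names(age2) <- c("dan", "alex", "fiona")

names(age2)            # "dan" "alex" "fiona"
identical(age, age2)   # TRUE
```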

Exercise 1.2.4: Indexing vectors

  • use “indexing by number” to get R to print out the first element of one of the vectors you created in Exercise 1.2.3.
  • use negative indices to get R to print out everything except the first element of that vector
  • use logical indexing to return all the ages of all people in age greater than (say) 25 (or some other number if that makes the results more interesting)
  • use indexing by name to return the age of one of the people whose ages you’ve stored in age

Solution 1.2.4:

words[1]
## [1] "robocop"
words[-1]
## [1] "is"       "bleeding"
age[age > 25]
## dan 
##  36
age["alex"] 
## alex 
##    4
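Logical indexing can look a bit like magic, so here’s a sketch that breaks it into two steps and shows a couple of variations on the other indexing methods. The intermediate variable is_over_25 is only there for illustration:

```r
age <- c(dan = 36, alex = 4, fiona = 0)

# Step 1: the comparison produces one TRUE/FALSE per element
is_over_25 <- age > 25
is_over_25               # TRUE for dan, FALSE for alex and fiona

# Step 2: indexing with that logical vector keeps only the TRUE elements
age[is_over_25]          # just dan

# You can also ask for several elements at once
age[c(1, 3)]             # dan and fiona, by position
age[c("dan", "fiona")]   # dan and fiona, by name
age[-c(1, 2)]            # drop the first two, leaving fiona
```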

Bonus 1.2.4:

Everything in R is case-sensitive. If you try to refer to the variable age as Age or AGE, R won’t be able to find it. The same is true for the names of the elements. So if I’d referred to "Alex" rather than "alex", R would be unable to find the relevant value. Here are some examples:

AGE[1]
age["Alex"]

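Unfortunately, I used R Markdown to write these exercises, and the newer versions of R Markdown are smart enough to detect errors in the code. So I can’t actually show you the results of the commands above, because the first one produces an error! (The second one doesn’t technically produce an error: R quietly returns NA, which is its way of saying it couldn’t find an element with that name.)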
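If you’d like to look at what happens without stopping your script, one option is to wrap the risky command in tryCatch(). This is only a sketch, and msg and wrong_case are names I’ve made up:

```r
age <- c(dan = 36, alex = 4, fiona = 0)

# Misspelling the variable name is an error; tryCatch() captures the message
msg <- tryCatch(AGE[1], error = function(e) conditionMessage(e))
msg                  # the error message, complaining that AGE can't be found

# Misspelling an element name is not an error: R hands back a missing value
wrong_case <- age["Alex"]
is.na(wrong_case)    # TRUE
```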

Exercise 1.2.5: Variables inside data frames

For this exercise, we’ll use one of the data frames that comes bundled with R, rather than trying to create a new one. The airquality data frame contains 153 cases and 6 variables. You can’t actually see it in the workspace because R is storing it in a “hidden” location (sort of).

  • Type airquality at the command line to see what it looks like. (I won’t include the output for this in the solution set because it’s 153 lines long!)
  • Use the $ method to print out the Wind variable in airquality
  • Print out the third element of the Wind variable

Solution 1.2.5:

airquality$Wind
##   [1]  7.4  8.0 12.6 11.5 14.3 14.9  8.6 13.8 20.1  8.6  6.9  9.7  9.2 10.9
##  [15] 13.2 11.5 12.0 18.4 11.5  9.7  9.7 16.6  9.7 12.0 16.6 14.9  8.0 12.0
##  [29] 14.9  5.7  7.4  8.6  9.7 16.1  9.2  8.6 14.3  9.7  6.9 13.8 11.5 10.9
##  [43]  9.2  8.0 13.8 11.5 14.9 20.7  9.2 11.5 10.3  6.3  1.7  4.6  6.3  8.0
##  [57]  8.0 10.3 11.5 14.9  8.0  4.1  9.2  9.2 10.9  4.6 10.9  5.1  6.3  5.7
##  [71]  7.4  8.6 14.3 14.9 14.9 14.3  6.9 10.3  6.3  5.1 11.5  6.9  9.7 11.5
##  [85]  8.6  8.0  8.6 12.0  7.4  7.4  7.4  9.2  6.9 13.8  7.4  6.9  7.4  4.6
##  [99]  4.0 10.3  8.0  8.6 11.5 11.5 11.5  9.7 11.5 10.3  6.3  7.4 10.9 10.3
## [113] 15.5 14.3 12.6  9.7  3.4  8.0  5.7  9.7  2.3  6.3  6.3  6.9  5.1  2.8
## [127]  4.6  7.4 15.5 10.9 10.3 10.9  9.7 14.9 15.5  6.3 10.9 11.5  6.9 13.8
## [141] 10.3 10.3  8.0 12.6  9.2 10.3 10.3 16.6  6.9 13.2 14.3  8.0 11.5
airquality$Wind[3]
## [1] 12.6
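The $ method isn’t the only way to get at a variable inside a data frame. Here’s a quick sketch of a few equivalent routes to the same number, plus a check on the size of the data frame; the variable wind is just a convenience I’ve added:

```r
dim(airquality)            # 153 rows (cases) and 6 columns (variables)

wind <- airquality$Wind    # pull the variable out into an ordinary vector
wind[3]                    # 12.6

airquality[["Wind"]][3]    # 12.6, using double square brackets instead of $
airquality[3, "Wind"]      # 12.6, using row and column indices together
```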

Exercise 1.2.6: Working with data frames

  • Create a new data frame called aq that includes only the first 10 cases. Hint: typing c(1,2,3,4,5,6,7,8,9,10) is tedious. R allows you to use 1:10 as a shorthand method!
  • Use logical indexing to print out all days (i.e., cases) in aq where the Ozone level was higher than 20. (Note how the output deals with the NA values)
  • Use subset() to do the same thing. Notice the difference in the output.
  • Create a TooWindy variable inside aq, which is a logical variable that is TRUE if Wind is greater than 10, and FALSE otherwise
  • Delete that variable

Solution 1.2.6:

(aq <- airquality[1:10,]) 
##    Ozone Solar.R Wind Temp Month Day
## 1     41     190  7.4   67     5   1
## 2     36     118  8.0   72     5   2
## 3     12     149 12.6   74     5   3
## 4     18     313 11.5   62     5   4
## 5     NA      NA 14.3   56     5   5
## 6     28      NA 14.9   66     5   6
## 7     23     299  8.6   65     5   7
## 8     19      99 13.8   59     5   8
## 9      8      19 20.1   61     5   9
## 10    NA     194  8.6   69     5  10
aq[3,"Wind"]
## [1] 12.6
aq[3,3]
## [1] 12.6
aq[ aq$Ozone > 20, ]
##      Ozone Solar.R Wind Temp Month Day
## 1       41     190  7.4   67     5   1
## 2       36     118  8.0   72     5   2
## NA      NA      NA   NA   NA    NA  NA
## 6       28      NA 14.9   66     5   6
## 7       23     299  8.6   65     5   7
## NA.1    NA      NA   NA   NA    NA  NA

The reason why we ended up with two rows with NA values everywhere is that there are two cases (5 and 10) where the Ozone level is missing. So R doesn’t know if it should include those cases or not. By default then, what it does is replace the whole row with NA values. This makes a certain kind of mechanical sense, but isn’t always what humans want. Usually you’d just want it to ignore those rows. If you run into that situation, it’s worth using the subset() function:

subset( aq, Ozone > 20 )
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 6    28      NA 14.9   66     5   6
## 7    23     299  8.6   65     5   7

The output here is probably a lot closer to what you want in most cases.
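To see where those NA rows come from, it helps to look at the logical vector itself. The sketch below does that and then shows two other ways of dropping the missing cases using plain indexing; ozone_high is a name I’ve invented, and which() and is.na() are standard R functions that weren’t part of the exercise:

```r
aq <- airquality[1:10, ]

# The comparison can't be decided for the two missing Ozone values, so it gives NA
ozone_high <- aq$Ozone > 20
ozone_high   # TRUE TRUE FALSE FALSE NA TRUE TRUE FALSE FALSE NA

# which() keeps only the positions that are definitely TRUE
aq[which(ozone_high), ]

# or you can spell out "not missing AND greater than 20"
aq[!is.na(aq$Ozone) & aq$Ozone > 20, ]
```

Both versions return cases 1, 2, 6 and 7, the same rows that subset() gave us.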

aq$TooWindy <- aq$Wind > 10
aq
##    Ozone Solar.R Wind Temp Month Day TooWindy
## 1     41     190  7.4   67     5   1    FALSE
## 2     36     118  8.0   72     5   2    FALSE
## 3     12     149 12.6   74     5   3     TRUE
## 4     18     313 11.5   62     5   4     TRUE
## 5     NA      NA 14.3   56     5   5     TRUE
## 6     28      NA 14.9   66     5   6     TRUE
## 7     23     299  8.6   65     5   7    FALSE
## 8     19      99 13.8   59     5   8     TRUE
## 9      8      19 20.1   61     5   9     TRUE
## 10    NA     194  8.6   69     5  10    FALSE
aq$TooWindy <- NULL
aq
##    Ozone Solar.R Wind Temp Month Day
## 1     41     190  7.4   67     5   1
## 2     36     118  8.0   72     5   2
## 3     12     149 12.6   74     5   3
## 4     18     313 11.5   62     5   4
## 5     NA      NA 14.3   56     5   5
## 6     28      NA 14.9   66     5   6
## 7     23     299  8.6   65     5   7
## 8     19      99 13.8   59     5   8
## 9      8      19 20.1   61     5   9
## 10    NA     194  8.6   69     5  10

Exercise 1.2.7: Creating factors

  • Create a factor corresponding to a categorical variable that can take on three levels: "male","female","other"

Solution 1.2.7:

factor( x = c("male","male","female","other","female") )
## [1] male   male   female other  female
## Levels: female male other
factor( x = c(3,3,1,2,1), labels=c("female","other","male") )
## [1] male   male   female other  female
## Levels: female other male

Bonus 1.2.7:

If there are legitimate levels that never appear, you can tell R that by being explicit about what the possible values are:

factor( x=c("male","male","female","female"), levels=c("female","other","male"))
## [1] male   male   female female
## Levels: female other male
factor( x=c(3,3,1,1), levels=c(1,2,3), labels=c("female","other","male"))
## [1] male   male   female female
## Levels: female other male
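Why bother declaring a level that never appears? Here’s a small sketch of what it buys you. I’ve stored the factor in a variable called sex (my own choice of name) so that we can inspect it:

```r
sex <- factor(
  x = c("male", "male", "female", "female"),
  levels = c("female", "other", "male")
)

levels(sex)       # "female" "other" "male"
nlevels(sex)      # 3
table(sex)        # female 2, other 0, male 2: the unused level is still counted
as.integer(sex)   # 3 3 1 1: the position of each value in the list of levels
```

Without the explicit levels, "other" would simply be absent from the table rather than showing up with a count of zero.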

Exercise 1.2.8: Removing variables

  • clear the workspace using RStudio’s “clear all” button
  • create a variable and then remove it using rm().
  • try to print out the value of the removed variable just to see what happens!

Solution 1.2.8:

The last of these commands produces an error, so I can’t show you the output. But this is what you should try:

(x <- 1)
rm(x)
x
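If you want to check whether a variable is still around without triggering an error, exists() will tell you. The sketch below uses a made-up variable called scratch_value, and again uses tryCatch() to capture the error message rather than letting it stop everything:

```r
scratch_value <- 1
exists("scratch_value")   # TRUE

rm(scratch_value)
exists("scratch_value")   # FALSE

# Asking for the removed variable is an error; here we capture the message instead
msg <- tryCatch(scratch_value, error = function(e) conditionMessage(e))
msg
```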