1 Getting started with R
1.1 TL;DR
- Each R command works like this:
command(arg1 = "text", arg2 = object, arg3 = TRUE, arg4 = FALSE)
- You can find help for each command using
?command
e.g. ?setwd
- Use the assignment operator
<-
to assign a value (e.g. character string, dataset, set of numbers) to an object
- Check the type and structure of an object with
str(object)
- Most functions come from a package. Use
library(packagename)
to load a package and
install.packages("packagename")
to install one.
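To make the points about assignment and str() a bit more concrete, here is a minimal sketch. The object name scores is made up for this illustration and is not used elsewhere in the chapter.

```r
# Assign a set of numbers to an object (the name 'scores' is hypothetical)
scores <- c(10, 20, 30)

# Print the type and structure of the object
str(scores)

# class() reports just the type, which we store so we can look at it
scores_class <- class(scores)
scores_class
```

str() prints a compact summary, while class() returns only the type, so the two are often used side by side.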
1.2 Installing R and RStudio
Just as with any piece of software, R needs to be installed. We will install two things:
R and RStudio.
R is freely distributed online, and you can download it from the R homepage.
At the top of the page – under the heading “Download and Install R” – you’ll see separate links for Windows users, Mac users, and Linux users for the most recent version. If you follow the relevant link, you’ll see that the online instructions are pretty self-explanatory. For our purposes, any of the recent versions of R are fine (for detailed instructions see e.g. here).
Note you need to download and install R before doing the same for RStudio (below).
We are also choosing to install a second piece of software: RStudio. We do this because using a modern piece of software like RStudio just makes everything much easier (it is not required though). Now download and install the Desktop version (direct link).
To illustrate what RStudio looks like, Figure 1.1 shows a screenshot of an R session in progress. In this screenshot, you can see that it’s running on a Mac, but it looks almost identical no matter what operating system you have.

Figure 1.1: An R session in progress running through RStudio. The picture shows RStudio running on a Mac, but the Windows interface is almost identical.
1.2.1 Getting Started
Now open up RStudio. The first thing you’ll see (assuming that you’re looking at the R console, that is) is a whole lot of text that doesn’t make much sense. It should look something like this:
R version 4.0.3 (2020-10-10) -- "Bunny-Wunnies Freak Out"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
>
Most of this text is pretty uninteresting, and when doing real data analysis you’ll never really pay much attention to it. The important part of it is this…
>
… which has a flashing cursor next to it. That’s the command prompt. When you see this, it means that R is ready for a command.
1.3 Typing commands at the R console
One of the easiest things you can do with R is use it as a simple calculator, so it’s a good place to start. For instance, try typing 10 + 20
, and hitting enter. When you do this, you’ve entered a command, and R will “execute” that command. What you see on screen now will be this:
> 10 + 20
[1] 30
In the following places of this manual, I will not include the >
symbol, to make your life easier and so you can simply copy the commands to your clipboard. This then looks like this:
10 + 20
[1] 30
The resulting output in this handbook (not on your computer) will have ##
in front of it.
We can of course also use R to conduct other simple calculations (Table 1.1).
Operation | Operator | Example Input | Example Output |
---|---|---|---|
addition | + | 10 + 2 | 12 |
subtraction | - | 9 - 3 | 6 |
multiplication | * | 5 * 5 | 25 |
division | / | 10 / 3 | 3.333333 |
power | ^ | 5 ^ 2 | 25 |
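The operators in the table can be combined in a single command. The sketch below shows the order in which R evaluates them, plus two further operators that are not in the table (%/% for whole-number division and %% for the remainder). The variable names are made up for this example.

```r
# Multiplication and division are carried out before addition and subtraction
without_brackets <- 1 + 2 * 4     # evaluated as 1 + (2 * 4)
with_brackets <- (1 + 2) * 4      # brackets force the addition to happen first

without_brackets
with_brackets

# Whole-number division and remainder (not listed in the table above)
whole_part <- 10 %/% 3
remainder <- 10 %% 3

whole_part
remainder
```

If you are ever unsure about the order of evaluation, adding brackets is the safest option.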
1.3.1 R is (a bit) flexible with spacing
Unlike in Stata, spaces are not normally an essential part of a command in R, so R is somewhat flexible with spacing. When we typed 10 + 20
before, we could equally have done this
10    + 20
[1] 30
or this
10+20
[1] 30
and I would get exactly the same answer.
If we do not see the >
symbol at the start of the line, it means R is not yet ready for us to enter a new command. There are two (main) reasons why this might be the case:
- the calculation of the previous command has not finished yet (so just wait)
- R is waiting for us to finish the current command (i.e. we have not finished a command yet)
In the second instance, we will see something like this:
> 10+
+
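Clearly we have forgotten to add a second argument (what to add to 10) when we hit Enter on 10+
, so a +
is displayed in the next line. R is waiting for the rest of the command, so we can simply type the missing part and hit Enter again: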
> 10 +
+ 20
[1] 30
1.4 Commands and Syntax
Now that we have used R as a simple calculator, we want to use some more advanced commands as well. To do so, we look at the general syntax of R. We will consider the “traditional” Base R here and the modern tidyverse, which we tackle a bit later.
1.4.1 R commands
Fundamentally, an R command consists of:
- the name of the command, which cannot contain spaces
- a set of round brackets ()
- a number of arguments, each of which begins with the name of the argument and an equals sign =, separated by commas if there are multiple arguments.
Conceptually this would be:
acommandname(argumentone = "textinput", argumenttwo = variableinput, argumentthree = TRUE, argumentfour = FALSE)
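To see this pattern with real commands, here is a small sketch using two functions that ship with base R, seq() and paste(). The values passed in are arbitrary and only serve as an illustration.

```r
# seq() builds a sequence; each argument is given as name = value
steps <- seq(from = 1, to = 10, by = 3)
steps

# paste() glues text together; 'sep' is the separator placed between the pieces
label <- paste("Adelie", "penguin", sep = "-")
label
```

Both calls follow the same shape: the command name, round brackets, and named arguments separated by commas.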
1.4.2 Installing and loading packages
Almost all of the functions you might want to use in R come in packages. A package is basically just a big collection of functions, data sets and other R objects that are all grouped together under a common name. Some packages are already installed when you put R on your computer, but the vast majority of R packages are out there on the internet, waiting for you to download, install and use them.
There’s a critical distinction between having a package installed on your computer, and having a package loaded in R. As of this writing, there are thousands of R packages freely available online. When you install R on your computer, you only get about 30 or so with the basic R installation. When you install a package, it is downloaded and installed on your computer. However, just because something is on your computer doesn’t mean R is using it right now. In order for R to be able to use one of your installed packages, that package must also be “loaded.”
A package must be installed before it can be loaded.
A package must be loaded before it can be used.
This two-step process might seem a little odd at first, but the designers of R had very good reasons to do it this way, and you get the hang of it pretty quickly.
So let’s download and install our first package, which is called palmerpenguins
and contains exactly what you think it contains: data on penguins 🐧. We are going to use this type of data in a minute.
To install this package, simply use the command install.packages()
. So here we go:
install.packages("palmerpenguins")
This command takes several options and inputs, but most of the time we’ll only need the first one, which takes the name of the package to be installed as a character vector - this is why we surround the package name with quotation marks " "
.
Now you will see some additional output, which you can ignore for now.
Provided everything installed as expected, we can now load the package using the simple command library()
:
library(palmerpenguins)
Fantastic, now we have the full functionality of the palmerpenguins
package at our disposal!
penguins
# A tibble: 344 x 8
species island bill_length_mm bill_depth_mm flipper_length_~ body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torge~ 39.1 18.7 181 3750
2 Adelie Torge~ 39.5 17.4 186 3800
3 Adelie Torge~ 40.3 18 195 3250
4 Adelie Torge~ NA NA NA NA
5 Adelie Torge~ 36.7 19.3 193 3450
6 Adelie Torge~ 39.3 20.6 190 3650
7 Adelie Torge~ 38.9 17.8 181 3625
8 Adelie Torge~ 39.2 19.6 195 4675
9 Adelie Torge~ 34.1 18.1 193 3475
10 Adelie Torge~ 42 20.2 190 4250
# ... with 334 more rows, and 2 more variables: sex <fct>, year <int>
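Because a package must be installed before it can be loaded, it can be handy to check first whether it is already on your computer. The following is only a sketch of one way to do this: requireNamespace() is a base R function, while the variable names pkg and is_installed are made up for this example.

```r
# Name of the package we want (taken from the text above)
pkg <- "palmerpenguins"

# requireNamespace() returns TRUE if the package is installed, FALSE otherwise
is_installed <- requireNamespace(pkg, quietly = TRUE)

if (is_installed) {
  # character.only = TRUE lets library() accept the name stored in 'pkg'
  library(pkg, character.only = TRUE)
} else {
  message("Package not installed yet. Run install.packages(\"", pkg, "\") first.")
}
```

The sketch deliberately does not install anything on its own, so you stay in control of what gets downloaded.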
1.5 Storing a number as a variable
One of the most important things to be able to do in R (or any programming language, for that matter) is to store information in variables. At a conceptual level you can think of a variable as a label for a certain piece of information, or even several different pieces of information. When doing statistical analysis in R all of your data will be stored as variables in R. However, before we delve into all the messy details of data sets and statistical analysis, let’s look at the very basics for how we create variables and work with them.
1.5.1 Variable assignment using <- and ->
Since we’ve been working with numbers so far, let’s start by creating variables to store our numbers. Suppose we’re trying to calculate how much money someone is going to make from a text book. Let’s create a variable called sales
. What we want to do is assign a value to the variable sales
, and that value should be 350
. We do this by using the assignment operator, which is <-
(Keyboard shortcut is Alt + -
). Here’s how we do it:
sales <- 350
When you hit enter, R doesn’t print out any output. It just gives you another command prompt. However, behind the scenes R has created a variable called sales
and given it a value of 350
. You can check that this has happened by asking R to print the variable on screen. And the simplest way to do that is to type the name of the variable and hit enter.
sales
[1] 350
Let’s see that in action:

Figure 1.2: A simple example of how to store a value in a variable, check it in the environment panel of RStudio and print the value of the variable.
So that’s nice to know. Anytime you can’t remember what R has got stored in a particular variable, you can just type the name of the variable and hit enter.
In addition to the <-
operator, we can also use ->
which just reverses the direction and =
. We’ll discuss the use of =
in Section 1.5.3. An example for using ->
is below.
350 -> sales
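To convince yourself that the three forms of assignment do the same job, you can try a sketch like the one below. The names sales_a, sales_b and sales_c are made up so that the three results can be compared.

```r
sales_a <- 350    # leftward arrow (the usual choice)
350 -> sales_b    # rightward arrow
sales_c = 350     # equals sign

# identical() checks whether two objects are exactly the same
same_ab <- identical(sales_a, sales_b)
same_ac <- identical(sales_a, sales_c)

same_ab
same_ac
```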
1.5.2 Doing calculations using variables
In addition to defining a sales
variable that counts the number of copies of a book that are sold, we also create a variable called royalty
, indicating how much money is earned per copy. Let’s say that the royalties are about $7 per book:
sales <- 350
royalty <- 7
The nice thing about variables (in fact, the whole point of having variables) is that we can do anything with a variable that we ought to be able to do with the information that it stores. That is, since R allows me to multiply 350
by 7
350 * 7
[1] 2450
it also allows us to multiply sales
by royalty
sales * royalty
[1] 2450
As far as R is concerned, the sales * royalty
command is the same as the 350 * 7
command. Not surprisingly, I can assign the output of this calculation to a new variable, which I’ll call revenue
. And when we do this, the new variable revenue
gets the value 2450
. So let’s do that, and then get R to print out the value of revenue
so that we can verify that it’s done what we asked:
revenue <- sales * royalty
revenue
[1] 2450
A slightly more subtle thing we can do is reassign the value of our variable, based on its current value. For instance, suppose the person selling the book receives a bonus of $550. The simplest way to capture this is by a command like this:
revenue <- revenue + 550
revenue
[1] 3000
In this calculation, R has taken the old value of revenue
(i.e., 2450) and added 550 to that value, producing a value of 3000. This new value is assigned to the revenue
variable, overwriting its previous value. Beware: this is not great style, because if you interrupt your work at some point, the variable name does not tell you whether the value is from before or after the bonus was added. My suggestion would be to use revenue_bonus
or something similar as a new variable name.
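Here is a short sketch of that suggestion. The variable name bonus is an addition for this example; the other names and numbers come from the text above.

```r
sales <- 350
royalty <- 7
bonus <- 550   # hypothetical name for the bonus from the example

# Keep the two stages in separate variables instead of overwriting 'revenue'
revenue <- sales * royalty
revenue_bonus <- revenue + bonus

revenue
revenue_bonus
```

With this version, revenue always holds the value before the bonus and revenue_bonus the value after it, no matter when you come back to your work.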
There are a few rules and a lot of conventions that govern how a variable can be named.
Before moving on, it’s worth noting that – in the same way that R allows us to put multiple operations together into a longer command, like 1 + 2*4
for instance – it also lets us put functions together and even combine functions with operators if we so desire. For example, the following is a perfectly legitimate command:
sqrt( 1 + abs(-8) )
[1] 3
When R executes this command, it starts out by calculating the value of abs(-8)
, which produces an intermediate value of 8
. Having done so, the command simplifies to sqrt( 1 + 8 )
. To solve the square root it first needs to add 1 + 8
to get 9
, at which point it evaluates sqrt(9)
, and so it finally outputs a value of 3
.
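If the nested version is hard to follow, the same calculation can be written one step at a time. This is just a sketch; the intermediate variable names are made up.

```r
# Step by step, from the inside out
inner <- abs(-8)          # absolute value of -8
total <- 1 + inner        # add 1
stepwise <- sqrt(total)   # take the square root

# The nested version does all of this in one command
nested <- sqrt(1 + abs(-8))

stepwise
nested
```

Both approaches give the same result; the stepwise one is longer but lets you inspect every intermediate value.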
1.5.3 Function arguments, their names and their defaults
For functions, we need to consider “named” arguments, and “default” values for arguments. Let’s consider the round()
function, which can be used to round some value to the nearest whole number. For example, I could type this:
round( 3.1415 )
[1] 3
Pretty straightforward, really. However, suppose I only wanted to round it to two decimal places: that is, I want to get 3.14
as the output. The round()
function supports this, by allowing you to input a second argument to the function that specifies the number of decimal places that you want to round the number to. In other words, I could do this:
round( 3.1415, 2 )
[1] 3.14
What’s happening here is that I’ve specified two arguments: the first argument is the number that needs to be rounded (i.e., 3.1415
), the second argument is the number of decimal places that it should be rounded to (i.e., 2
), and the two arguments are separated by a comma. In this simple example, it’s quite easy to remember which argument comes first and which one comes second, but for more complicated functions this is not easy. Fortunately, most R functions make use of argument names. For the round()
function, for example the number that needs to be rounded is specified using the x
argument, and the number of decimal points that you want it rounded to is specified using the digits
argument. Because we have these names available to us, we can specify the arguments to the function by name. We do so like this:
round( x = 3.1415, digits = 2 )
[1] 3.14
Notice that this is kind of similar in spirit to variable assignment (Section 1.5), except that I used =
here, rather than <-
. In both cases we’re specifying specific values to be associated with a label. It’s important that you use =
in this context.
As you can see, specifying the arguments by name involves a lot more typing, but it’s also a lot easier to read. Because of this, the commands in this book will usually specify arguments by name, since that makes it clearer to you what I’m doing. However, one important thing to note is that when specifying the arguments using their names, it doesn’t matter what order you type them in. But if you don’t use the argument names, then you have to input the arguments in the correct order. In other words, these three commands all produce the same output:
round( 3.1415, 2 )
round( x = 3.1415, digits = 2 )
round( digits = 2, x = 3.1415 )
but this one does not…
round( 2, 3.1415 )
How do you find out what the correct order is? There are a few different ways, but the easiest one is to look at the help documentation for the function (see Section 1.11). However, if you’re ever unsure, it’s probably best to actually type in the argument name.
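Besides the help page, base R also has the args() function, which shows the arguments of a function directly in the console. The sketch below uses it on round() and then stores the results of a few calls so they can be compared. The variable names are made up for this example.

```r
# Show the arguments of round() in the console
args(round)

# The argument names can also be pulled out as a character vector
round_argument_names <- names(formals(args(round)))

by_position <- round(3.1415, 2)            # correct order, no names
by_name <- round(digits = 2, x = 3.1415)   # names given, order reversed
swapped <- round(2, 3.1415)                # no names, wrong order

by_position
by_name
swapped
```

In the last call R rounds the number 2, not 3.1415, which is why the result is different.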
Okay, so that’s the first thing I said you’d need to know: argument names. The second thing you need to know about is default values. Notice that the first time I called the round()
function I didn’t actually specify the digits
argument at all, and yet R somehow knew that this meant it should round to the nearest whole number. How did that happen? The answer is that the digits
argument has a default value of 0
, meaning that if you decide not to specify a value for digits
then R will act as if you had typed digits = 0
. This is quite handy: the vast majority of the time when you want to round a number you want to round it to the nearest whole number, and it would be pretty annoying to have to specify the digits
argument every single time. On the other hand, sometimes you actually do want to round to something other than the nearest whole number, and it would be even more annoying if R didn’t allow this! Thus, by having digits = 0
as the default value, we get the best of both worlds.
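A quick way to check a default value is to call the function once without the argument and once with the default spelled out. This is a minimal sketch with made-up variable names.

```r
default_digits <- round(3.1415)              # 'digits' left out
explicit_zero <- round(3.1415, digits = 0)   # default written out

# Both calls behave the same way
same_result <- identical(default_digits, explicit_zero)
same_result

# Overriding the default
one_digit <- round(3.1415, digits = 1)
one_digit
```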
We can use the autocomplete ability in RStudio to help us with the argument names when we use the “tab” key on our keyboard.
See also Section 1.11 for more information on each command.
1.6 Storing many numbers as a vector
As with sales
above, we can also store more than one number in a variable. We can store a series of numbers in a vector.
1.6.1 Creating a vector
Let’s stick with the example of the book selling. Let’s suppose the book was not sold at all in January, and was sold 100 times in February, 200 times in March and 50 times in April. We store all this data in a variable called sales.by.month
. The simplest way to do this in R is to use the combine function, c()
. To do so, all we have to do is type all the numbers you want to store, separated with a comma, like this:
sales.by.month <- c(0, 100, 200, 50)
sales.by.month
[1] 0 100 200 50
To use the correct terminology here, we have a single variable here called sales.by.month
: this variable is a vector that consists of 4 elements.
1.6.2 Getting information out of a vector
Suppose I want to pull out the February sales data only. February is the second month of the year, so let’s try this:
sales.by.month[2]
[1] 100
And if we want to save this again:
february.sales <- sales.by.month[2]
february.sales
[1] 100
If we want to extract January and April, we need to use another vector that we create using the c()
command.
sales.by.month[c(1,4)]
[1] 0 50
If we want everything but January:
sales.by.month[-1]
[1] 100 200 50
But we could also use:
sales.by.month[2:4]
[1] 100 200 50
See Section 1.9 for more information on extracting information.
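The negative index and the c() trick can also be combined. The following sketch drops several elements at once and uses length() to pick the last element; the names on the left-hand side are made up.

```r
sales.by.month <- c(0, 100, 200, 50)

# Everything except January and April
without_jan_apr <- sales.by.month[-c(1, 4)]
without_jan_apr

# First and last element, without hard-coding the position of the last one
first_and_last <- sales.by.month[c(1, length(sales.by.month))]
first_and_last
```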
1.6.3 Altering the elements of a vector
Sometimes you’ll want to change the values stored in a vector. We can use the assignment operator again:
sales.by.month[3] <- 600
We can also add an element; either by using a specific placement, such as:
sales.by.month[5] <- 25
sales.by.month
[1] 0 100 600 50 25
Or using the append()
, edit()
or fix()
functions. I won’t discuss them in detail right now, but you can check them out on your own.
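As a small taste of append(), here is a sketch of how it could be used on the sales data. It starts from the four-element vector again, and the result names are made up for this example.

```r
sales.by.month <- c(0, 100, 600, 50)

# By default the new value goes at the end
at_end <- append(sales.by.month, 25)
at_end

# The 'after' argument controls where the new value is placed
in_middle <- append(sales.by.month, 25, after = 2)
in_middle
```

Note that append() returns a new vector and leaves sales.by.month itself unchanged unless you assign the result back to it.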
1.6.4 Useful things to know about vectors
You can use the length()
function to find out how many elements there are in a vector:
length( x = sales.by.month )
[1] 5
To calculate monthly revenue, we can multiply each element in the sales.by.month
vector by 7
. R makes this pretty easy, as the following example shows:
sales.by.month * 7
[1] 0 700 4200 350 175
In other words, when you multiply a vector by a single number, all elements in the vector get multiplied. The same is true for addition, subtraction, division and taking powers.
Suppose we wanted to know how much money the books are making per day, rather than per month. For that we need to do something slightly different. Firstly, I’ll create two new vectors:
days.per.month <- c(31, 28, 31, 30, 31)
profit <- sales.by.month * 7
We now want to divide every element of profit
by the corresponding element of days.per.month
:
profit / days.per.month
[1] 0.000000 25.000000 135.483871 11.666667 5.645161
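Since the result is a vector like any other, it can be stored and passed on to other functions. The sketch below saves the daily figures under the made-up name profit.per.day and rounds them to two decimal places with round().

```r
sales.by.month <- c(0, 100, 600, 50, 25)
days.per.month <- c(31, 28, 31, 30, 31)
profit <- sales.by.month * 7

# Element-wise division: first by first, second by second, and so on
profit.per.day <- profit / days.per.month

# round() is applied to every element of the vector
profit.per.day.rounded <- round(profit.per.day, digits = 2)
profit.per.day.rounded
```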
1.7 Storing text data
A lot of the time your data will be numeric in nature, but not always. Sometimes your data really needs to be described using text, not using numbers. We can save “hello” in a character string:
greeting <- "hello"
greeting
[1] "hello"
When interpreting this, it’s important to recognise that the quote marks here aren’t part of the string itself. They’re just something that we use to make sure that R knows to treat the characters that they enclose as a piece of text data, known as a character string.
R stores the entire word "hello"
as a single element: our greeting
variable is not a vector of five different letters. Rather, it has only the one element, and that element corresponds to the entire character string "hello"
. To illustrate this, if I actually ask R to find the first element of greeting
, it prints the whole string:
greeting[1]
[1] "hello"
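The difference between the number of elements and the number of letters can be checked directly. This sketch uses length() from earlier and the base R function nchar(), which counts characters; the result names are made up.

```r
greeting <- "hello"

n_elements <- length(greeting)    # how many elements the variable has
n_characters <- nchar(greeting)   # how many characters the string has

n_elements
n_characters
```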
Of course, there’s no reason why I can’t create a vector of character strings. For instance, if we were to continue with the example of my attempts to look at the monthly sales data for my book, one variable I might want would include the names of all 5 months used above. To do so, I could type in a command like this:
months <- c("January", "February", "March", "April", "May")
months
[1] "January" "February" "March" "April" "May"
This is a character vector containing 5 elements, each of which is the name of a month. So if I wanted R to tell me the name of the fourth month, all I would do is this:
months[4]
[1] "April"
1.8 Storing “true or false” data
A key concept that a lot of R relies on is the idea of a logical value. A logical value is an assertion about whether something is true or false. This is implemented in R in a pretty straightforward way. There are two logical values, namely TRUE
and FALSE
. Despite their simplicity, logical values are very useful things. Let’s see how they work.
1.8.1 Assessing mathematical truths
If I ask R to calculate 2 + 2
, it always gives the same answer:
2 + 2
[1] 4
Of course, so far R is just doing the calculations. I haven’t asked it to explicitly assert that \(2+2 = 4\) is a true statement. If I want R to make an explicit judgement, I can use a command like this:
2 + 2 == 4
[1] TRUE
What I’ve done here is use the equality operator, ==
, to force R to make a “true or false” judgement. Okay, let’s see what R thinks of the Party slogan:
2+2 == 5
[1] FALSE
Suppose I try to force R to believe that two plus two is five by making an assignment statement like 2 + 2 = 5
or 2 + 2 <- 5
. When I do this, here’s what happens:
2 + 2 = 5
Error in 2 + 2 = 5: target of assignment expands to non-language object
R doesn’t like this very much. It recognises that 2 + 2
is not a variable (that’s what the “non-language object” part is saying), and it won’t let you try to “reassign” it.
1.8.2 Logical operations
So now we’ve seen logical operations at work, but so far we’ve only seen the simplest possible example. You probably won’t be surprised to discover that we can combine logical operations with other operations and functions in a more complicated way, like this:
3*3 + 4*4 == 5*5
[1] TRUE
or this
sqrt( 25 ) == 5
[1] TRUE
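One caveat that the examples above do not show: numbers with decimals are stored with limited precision, so == can give a surprising answer for calculations that are not exact. The sketch below illustrates this with sqrt(2) and shows all.equal(), a base R function that compares numbers with a small tolerance. The variable names are made up.

```r
# Mathematically this is 2, but the stored result is very slightly off
exact_check <- sqrt(2)^2 == 2
exact_check

# all.equal() allows for a tiny difference; isTRUE() turns the result into TRUE/FALSE
tolerant_check <- isTRUE(all.equal(sqrt(2)^2, 2))
tolerant_check
```

For whole numbers like the ones used in this chapter, == works exactly as you would expect.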
Not only that, but as Table 1.2 illustrates, there are several other logical operators that you can use, corresponding to some basic mathematical concepts.
operation | operator | example input | answer |
---|---|---|---|
less than | < | 2 < 3 | TRUE |
less than or equal to | <= | 2 <= 2 | TRUE |
greater than | > | 2 > 3 | FALSE |
greater than or equal to | >= | 2 >= 2 | TRUE |
equal to | == | 2 == 3 | FALSE |
not equal to | != | 2 != 3 | TRUE |
Hopefully these are all pretty self-explanatory: for example, the less than operator <
checks to see if the number on the left is less than the number on the right. If it’s less, then R returns an answer of TRUE
:
99 < 100
[1] TRUE
but if the two numbers are equal, or if the one on the right is larger, then R returns an answer of FALSE
, as the following two examples illustrate:
100 < 100
[1] FALSE
100 < 99
[1] FALSE
In contrast, the less than or equal to operator <=
will do exactly what it says. It returns a value of TRUE
if the number of the left hand side is less than or equal to the number on the right hand side. So if we repeat the previous two examples using <=
, here’s what we get:
100 <= 100
[1] TRUE
100 <= 99
[1] FALSE
And at this point I hope it’s pretty obvious what the greater than operator >
and the greater than or equal to operator >=
do! Next on the list of logical operators is the not equal to operator !=
which – as with all the others – does what it says it does. It returns a value of TRUE
when things on either side are not identical to each other. Therefore, since \(2+2\) isn’t equal to \(5\), we get:
2 + 2 != 5
[1] TRUE
We’re not quite done yet. There are three more logical operations that are worth knowing about, listed in Table 1.3.
operation | operator | example input | answer |
---|---|---|---|
not | ! | !(1==1) | FALSE |
or | \| | (1==1) \| (2==3) | TRUE |
and | & | (1==1) & (2==3) | FALSE |
These are the not operator !
, the and operator &
, and the or operator |
. Like the other logical operators, their behaviour is more or less exactly what you’d expect given their names. For instance, if I ask you to assess the claim that “either \(2+2 = 4\) or \(2+2 = 5\)” you’d say that it’s true. Since it’s an “either-or” statement, all we need is for one of the two parts to be true. That’s what the |
operator does:
(2+2 == 4) | (2+2 == 5)
[1] TRUE
On the other hand, if I ask you to assess the claim that “both \(2+2 = 4\) and \(2+2 = 5\)” you’d say that it’s false. Since this is an and statement we need both parts to be true. And that’s what the &
operator does:
(2+2 == 4) & (2+2 == 5)
[1] FALSE
Finally, there’s the not operator, which is simple but annoying to describe in English. If I ask you to assess my claim that “it is not true that \(2+2 = 5\)” then you would say that my claim is true; because my claim is that “\(2+2 = 5\) is false.” And I’m right. If we write this as an R command we get this:
! (2+2 == 5)
[1] TRUE
In other words, since 2+2 == 5
is a FALSE
statement, it must be the case that !(2+2 == 5)
is a TRUE
one. Essentially, what we’ve really done is claim that “not false” is the same thing as “true.” Obviously, this isn’t really quite right in real life. But R lives in a much more black or white world: for R everything is either true or false. No shades of gray are allowed. We can actually see this much more explicitly, like this:
! FALSE
[1] TRUE
Of course, in our \(2+2 = 5\) example, we didn’t really need to use “not” !
and “equals to” ==
as two separate operators. We could have just used the “not equals to” operator !=
like this:
2+2 != 5
[1] TRUE
But there are many situations where you really do need to use the !
operator. We’ll see some later on.
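The three operators become more useful once they are applied to a variable rather than to fixed numbers. Here is a minimal sketch; the variable x and its value are made up for this example.

```r
x <- 7   # hypothetical value

# "and": both conditions have to hold
in_range <- (x > 5) & (x < 10)

# "or": one of the conditions is enough
outside <- (x < 5) | (x > 10)

# "not": flips TRUE to FALSE and vice versa
not_in_range <- !in_range

in_range
outside
not_in_range
```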
1.8.3 Storing and using logical data
Up to this point, I’ve introduced numeric data (in Sections 1.5 and 1.6) and character data (in Section 1.7). So you might not be surprised to discover that these TRUE
and FALSE
values that R has been producing are actually a third kind of data, called logical data. That is, when I asked R if 2 + 2 == 5
and it said [1] FALSE
in reply, it was actually producing information that we can store in variables. For instance, I could create a variable called isthiscorrect
, which would store R’s opinion:
isthiscorrect <- 2 + 2 == 5
isthiscorrect
[1] FALSE
Alternatively, you can assign the value directly, by typing TRUE
or FALSE
in your command. Like this:
isthiscorrect <- FALSE
isthiscorrect
[1] FALSE
1.8.4 Vectors of logicals
The next thing to mention is that you can store vectors of logical values in exactly the same way that you can store vectors of numbers (Section 1.6) and vectors of text data (Section 1.7). Again, we can define them directly via the c()
function, like this:
x <- c(TRUE, TRUE, FALSE)
x
[1] TRUE TRUE FALSE
or you can produce a vector of logicals by applying a logical operator to a vector. This might not make a lot of sense to you, so let’s unpack it slowly. First, let’s suppose we have a vector of numbers (i.e., a “non-logical vector”). For instance, we could use the sales.by.month
vector that we were using in Section 1.6. Suppose I wanted R to tell me, for each month of the year, whether I actually sold a book in that month. I can do that by typing this:
sales.by.month > 0
[1] FALSE TRUE TRUE TRUE TRUE
and again, I can store this in a vector if I want, as the example below illustrates:
any.sales.this.month <- sales.by.month > 0
any.sales.this.month
[1] FALSE TRUE TRUE TRUE TRUE
In other words, any.sales.this.month
is a logical vector whose elements are TRUE
only if the corresponding element of sales.by.month
is greater than zero. For instance, since I sold zero books in January, the first element is FALSE
.
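A logical vector can also be summarised. In the sketch below, sum() counts the TRUE values (R treats TRUE as 1 and FALSE as 0 in calculations) and which() returns the positions of the TRUE values. The two result names are made up.

```r
sales.by.month <- c(0, 100, 600, 50, 25)
any.sales.this.month <- sales.by.month > 0

# Number of months with at least one sale
n.months.with.sales <- sum(any.sales.this.month)
n.months.with.sales

# Positions of the months with at least one sale
positions.with.sales <- which(any.sales.this.month)
positions.with.sales
```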
1.9 Indexing vectors
One last thing to add before finishing up this chapter. So far, whenever I’ve had to get information out of a vector, all I’ve done is typed something like months[4]
; and when I do this R prints out the fourth element of the months
vector. In this section, I’ll show you two additional tricks for getting information out of the vector.
1.9.1 Extracting multiple elements
One very useful thing we can do is pull out more than one element at a time. In the previous example, we only used a single number (i.e., 4
) to indicate which element we wanted. Alternatively, we can use a vector. So, suppose I wanted the data for February, March and April. What I could do is use the vector c(2,3,4)
to indicate which elements I want R to pull out. That is, I’d type this:
sales.by.month[ c(2,3,4) ]
[1] 100 600 50
Notice that the order matters here. If I asked for the data in the reverse order (i.e., April first, then March, then February) by using the vector c(4,3,2)
, then R outputs the data in the reverse order:
sales.by.month[ c(4,3,2) ]
[1] 50 600 100
A second thing to be aware of is that R provides you with handy shortcuts for very common situations. For instance, suppose that I wanted to extract everything from the 2nd month through to the 8th month. One way to do this is to do the same thing I did above, and use the vector c(2,3,4,5,6,7,8)
to indicate the elements that I want. That works (R returns NA for positions 6 to 8, because our vector only has 5 elements)
sales.by.month[ c(2,3,4,5,6,7,8) ]
[1] 100 600 50 25 NA NA NA
but it’s kind of a lot of typing. To help make this easier, R lets you use 2:8
as shorthand for c(2,3,4,5,6,7,8)
, which makes things a lot simpler. First, let’s just check that this is true:
2:8
[1] 2 3 4 5 6 7 8
Next, let’s check that we can use the 2:8
shorthand as a way to pull out the 2nd through 8th elements of sales.by.month
:
sales.by.month[2:8]
[1] 100 600 50 25 NA NA NA
So that’s kind of neat.
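The NA values in the output appear because the vector only has five elements. If you want “everything from the 2nd element to the end” without asking for positions that do not exist, one option is to use length() as the upper end of the range. This is only a sketch, and the names last and from.second are made up.

```r
sales.by.month <- c(0, 100, 600, 50, 25)

# Position of the last element
last <- length(sales.by.month)

# From the 2nd element to the end, whatever the length of the vector
from.second <- sales.by.month[2:last]
from.second

# anyNA() reports whether a vector contains missing values
anyNA(sales.by.month[2:8])
anyNA(from.second)
```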
1.9.2 Logical indexing
At this point, I can introduce an extremely useful tool called logical indexing. In the last section, I created a logical vector any.sales.this.month
, whose elements are TRUE
for any month in which I sold at least one book, and FALSE
for all the others. However, that big long list of TRUE
s and FALSE
s is a little bit hard to read, so what I’d like to do is to have R select the names of the months
for which I sold any books. Earlier on, I created a vector months
that contains the names of each of the months. This is where logical indexing is handy. What I need to do is this:
months[ sales.by.month > 0 ]
[1] "February" "March" "April" "May"
To understand what’s happening here, it’s helpful to notice that sales.by.month > 0
is the same logical expression that we used to create the any.sales.this.month
vector in the last section. In fact, I could have just done this:
months[ any.sales.this.month ]
[1] "February" "March" "April" "May"
and gotten exactly the same result. In order to figure out which elements of months
to include in the output, what R does is look to see if the corresponding element in any.sales.this.month
is TRUE
. Thus, since element 1 of any.sales.this.month
is FALSE
, R does not include "January"
as part of the output; but since element 2 of any.sales.this.month
is TRUE
, R does include "February"
in the output. Note that there’s no reason why I can’t use the same trick to find the actual sales numbers for those months. The command to do that would just be this:
sales.by.month [ sales.by.month > 0 ]
[1] 100 600 50 25
In fact, we can do the same thing with text. Here’s an example. Suppose that – to continue the saga of the textbook sales – I later find out that the bookshop only had sufficient stocks for a few months of the year. They tell me that early in the year they had "high"
stocks, which then dropped to "low"
levels, and in fact for a couple of months they were "out"
of copies of the book before they were able to replenish them. Thus I might have a variable called stock.levels
which looks like this:
stock.levels <- c("high", "high", "low", "out", "out", "high",
                  "high", "high", "high", "high", "high", "high")
stock.levels
[1] "high" "high" "low" "out" "out" "high" "high" "high" "high" "high"
[11] "high" "high"
Thus, if I want to know the months for which the bookshop was out of my book, I could apply the logical indexing trick, but with the character vector stock.levels
, like this:
months[stock.levels == "out"]
[1] "April" "May"
Alternatively, if I want to know when the bookshop was either low on copies or out of copies, I could do this:
months[stock.levels == "out" | stock.levels == "low"]
[1] "March" "April" "May"
or this
months[stock.levels != "high" ]
[1] "March" "April" "May"
Either way, I get the answer I want.
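A third way to write the same selection, not used in the text, is the %in% operator, which tests each element against a set of values. The sketch below is only an illustration under made-up data: demo.months and demo.stock are hypothetical stand-ins for the vectors above.

```r
# Hypothetical data for illustration only
demo.months <- c("January", "February", "March", "April")
demo.stock <- c("high", "low", "out", "high")

# Two comparisons joined with | (or), as in the text
either.way <- demo.months[demo.stock == "out" | demo.stock == "low"]

# %in% is TRUE wherever the left-hand value appears in the right-hand set
with.in <- demo.months[demo.stock %in% c("out", "low")]

print(with.in)
print(identical(either.way, with.in))
```

This form tends to read better once there are more than two values to match, since the set grows instead of the chain of | comparisons.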
At this point, I hope you can see why logical indexing is such a useful thing. It’s a very basic, yet very powerful way to manipulate data.
1.11 Getting help
Obviously, this book is intended to be as helpful as possible, but it’s not even close to being a comprehensive guide, and there’s thousands of things it doesn’t cover. So where should you go for help?
It is very easy to get help on an R command; you simply type ?command, e.g. for the mutate command (this assumes the package that provides it, such as dplyr, is loaded):
?mutate
This will get us to the specific help file for the command.
If we are not entirely sure what the command is called, or are looking for a general concept, we can simply add another ?, e.g.
??test
This will give us a pretty extensive search. If we want to look for a phrase including spaces, then we need to wrap it in quotation marks:
??"t test"
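The ? and ?? shortcuts stand for the functions help() and help.search(), which can be handy to call directly in a script. The sketch below is a small illustration rather than part of the original text: it uses sqrt as the example topic, and it adds apropos(), another base R helper for finding objects by a fragment of their name.

```r
# ?sqrt is shorthand for help("sqrt"). Assigning the result instead of
# printing it means no help page is opened while this sketch runs.
page <- help("sqrt")

# apropos() lists objects on the search path whose names contain the text
candidates <- apropos("sqrt")
print(candidates)
```

Typing page at the console afterwards would display the help file, exactly as ?sqrt does.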
For some more information, please see here.
And trust me: Debugging is tough - but getting something tough right in the end is a great feeling :)

Figure 1.4: The various stages of your debugging journey.
1.12 Quitting R

Figure 1.5: The dialog box that shows up when you try to close RStudio.
There’s one last thing I should cover in this chapter: how to quit R, i.e. how to exit the program. Assuming you’re running R in the usual way (i.e., through RStudio or the default GUI on a Windows or Mac computer), then you can just shut down the application in the normal way. However, R also has a function, called q(), that you can use to quit, which is pretty handy if you’re running R in a terminal window.
Regardless of what method you use to quit R, when you do so for the first time R will probably ask you if you want to save the “workspace image.” We’ll talk a lot more about loading and saving data in Section ??, but I figured we’d better quickly cover this now, otherwise you’re going to get annoyed when you close R at the end of the chapter. If you’re using RStudio, you’ll see a dialogue box that looks like the one shown in Figure 1.5. If you’re using a text-based interface you’ll see this:
q()
## Save workspace image? [y/n/c]:
The y/n/c part here is short for “yes / no / cancel.” Type y if you want to save, n if you don’t, and c if you’ve changed your mind and you don’t want to quit after all.
What does this actually mean? What’s going on is that R wants to know if you want to save all those variables that you’ve been creating, so that you can use them later. This sounds like a great idea, so it’s really tempting to type y
or click the “Save” button. To be honest though, I very rarely do this, and it kind of annoys me a little bit… what R is really asking is if you want it to store these variables in a “default” data file, which it will automatically reload for you next time you open R. And quite frankly, if I’d wanted to save the variables, then I’d have already saved them before trying to quit. Not only that, I’d have saved them to a location of my choice, so that I can find it again later. So I personally never bother with this.
In fact, every time I install R on a new machine one of the first things I do is change the settings so that it never asks me again. You can do this in RStudio really easily: use the menu system to find the RStudio options; the dialogue box that comes up will give you an option to tell R never to whine about this again (see Figure 1.6). On a Mac, you can open this window by going to the “RStudio” menu and selecting “Preferences.” On a Windows machine you go to the “Tools” menu and select “Global Options.” Under the “General” tab you’ll see an option that reads “Save workspace to .RData on exit.” By default this is set to “ask.” If you want R to stop asking, change it to “never.”

Figure 1.6: The options window in RStudio. On a Mac, you can open this window by going to the “RStudio” menu and selecting “Preferences.” On a Windows machine you go to the “Tools” menu and select “Global Options.”
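For a terminal session, the same habits can be expressed in code. The sketch below is only one possible approach, under stated assumptions: demo.result is a made-up variable, the file lives in a temporary directory as a placeholder for a location of your choice, and quit.without.saving is a hypothetical helper name. It relies on save() and load() from base R and on the save argument of q().

```r
# Hypothetical variable standing in for work you want to keep
demo.result <- c(10, 20, 30)

# Save it to a file of your choosing (a temporary file here, as a placeholder)
save.path <- file.path(tempdir(), "demo-result.RData")
save(demo.result, file = save.path)

# Later, load() restores the variable under its original name
rm(demo.result)
load(save.path)
print(demo.result)

# Wrapped in a function so that running this sketch does not end the session.
# Calling quit.without.saving() would exit R without the save prompt.
quit.without.saving <- function() q(save = "no")
```

Saving explicitly like this keeps the decision about what to store, and where, separate from the act of quitting.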
1.13 Summary
Every book that tries to introduce basic programming ideas to novices has to cover roughly the same topics, and in roughly the same order. Mine is no exception, and so in the grand tradition of doing it just the same way everyone else did it, this chapter covered the following topics:
- Getting started. We downloaded and installed R and RStudio.
- Basic commands. We talked a bit about the logic of how R works and in particular how to type commands into the R console (Section @ref(#firstcommand)), and in doing so learned how to perform basic calculations using the arithmetic operators +, -, *, / and ^.
- Introduction to functions. We saw several different functions, three that are used to perform numeric calculations (sqrt(), abs(), round()), one that applies to text (nchar(); Section ??), and one that works on any variable (length(); Section 1.6.4). In doing so, we talked a bit about how argument names work, and learned about default values for arguments (Section 1.5.3).
- Introduction to variables. We learned the basic idea behind variables, and how to assign values to variables using the assignment operator <- (Section 1.5). We also learned how to create vectors using the combine function c() (Section 1.6).
- Data types. We learned the distinction between numeric, character and logical data, including the basics of how to enter and use each of them (Sections 1.5 to 1.8).
- Logical operations. We learned how to use the logical operators ==, !=, <, >, <=, >=, !, & and |, and how to use logical indexing (Section 1.9).
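As a compact recap of the operators listed above, the following sketch evaluates each one once. The numbers and the names arith, compare and logic are arbitrary choices for illustration.

```r
# Arithmetic operators: add, subtract, multiply, divide, raise to a power
arith <- c(2 + 3, 7 - 4, 2 * 5, 9 / 3, 2 ^ 3)

# Comparison operators each return TRUE or FALSE
compare <- c(3 == 3, 3 != 4, 2 < 5, 5 > 2, 4 <= 4, 4 >= 6)

# ! negates, & needs both sides to be TRUE, | needs at least one
logic <- c(!TRUE, TRUE & FALSE, TRUE | FALSE)

print(arith)
print(compare)
print(logic)
```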
We still haven’t arrived at anything that resembles a “data set,” of course. Maybe the next chapter will get us a bit closer…