- Each R command follows this pattern:
command(arg1 = "text", arg2 = object, arg3 = TRUE, arg4 = FALSE)
- You can find help for each command using ?commandname
- Use the assignment operator <- to assign a value (e.g. character string, dataset, set of numbers) to an object
- Check the type and structure of an object with
- Most functions come from a package. Use library(packagename) to load a package and install.packages("packagename") to install one.
Just as with any piece of software, R needs to be installed. We will install two things:
R and RStudio.
R is freely distributed online, and you can download it from the R homepage, which is:
At the top of the page – under the heading “Download and Install R” – you’ll see separate links for Windows users, Mac users, and Linux users for the most recent version. If you follow the relevant link, you’ll see that the online instructions are pretty self-explanatory. For our purposes, any of the recent versions of R are fine (for detailed instructions see e.g. here).
Note you need to download and install R before doing the same for RStudio (below).
We are also choosing to install a second piece of software: RStudio. We do this because using a modern piece of software like RStudio just makes everything much easier (it is not required though). Now download and install the Desktop version (direct link).
To illustrate what RStudio looks like, Figure 1.1 shows a screenshot of an R session in progress. In this screenshot, you can see that it’s running on a Mac, but it looks almost identical no matter what operating system you have.
Now open up RStudio where the first thing you’ll see (assuming that you’re looking at the R console, that is) is a whole lot of text that doesn’t make much sense. It should look something like this:
R version 4.0.3 (2020-10-10) -- "Bunny-Wunnies Freak Out"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>
Most of this text is pretty uninteresting, and when doing real data analysis you’ll never really pay much attention to it. The important part of it is this…
>
… which has a flashing cursor next to it. That’s the command prompt. When you see this, it means that R is ready for a command.
One of the easiest things you can do with R is use it as a simple calculator, so it’s a good place to start. For instance, try typing
10 + 20, and hitting enter. When you do this, you’ve entered a command, and R will “execute” that command. What you see on screen now will be this:
> 10 + 20
[1] 30
In the rest of this manual, I will not include the
> symbol, to make your life easier and so you can simply copy the commands to your clipboard. This then looks like this:
10 + 20
The resulting output in this handbook (not on your computer) will have
## in front of it.
We can of course also use R to conduct other simple calculations (Table 1.1).
|Operation||Operator||Example Input||Example Output|
|addition||+||10 + 2||12|
|subtraction||-||9 - 3||6|
|multiplication||*||5 * 5||25|
|division||/||10 / 3||3.333333|
|power||^||5 ^ 2||25|
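To try the operators from Table 1.1 yourself, you can type the following commands one at a time. This is only a small sketch; the last line is an extra example (not in the table) showing that R follows the usual order of operations.

```r
# Each line is one command; R prints the answer after you hit enter
10 + 2    # addition
9 - 3     # subtraction
5 * 5     # multiplication
10 / 3    # division (the answer is not a whole number)
5 ^ 2     # power

# Multiplication is carried out before addition, so this is 1 + 8
1 + 2 * 4
```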
Unlike in Stata, a space is not normally an essential part of a command, so R is somewhat flexible with spacing. When we typed
10 + 20 before, we could equally have typed
10+20
and we would get exactly the same answer.
If we do not see the
> symbol at the start of the line, it means R is not yet ready for us to use a command. There are two (main) reasons why this might be the case:
- the calculation of the previous command has not finished yet (so just wait)
- R is waiting for us to finish the current command (i.e. we have not finished a command yet)
In the second instance, we will see something like this:
> 10+
+
Clearly we forgot to add the second number (what to add to 10) when we hit Enter on
10+, so a
+ is displayed in the next line. If we now type the missing number and hit Enter, R completes the command:
> 10 +
+ 20
[1] 30
Now that we have used R as a simple calculator, we want to use some more advanced commands as well. To do so, we look at the general syntax of R. We start with the “traditional” Base R and tackle the modern tidyverse a bit later (Section ??).
Fundamentally, an R command consists of:
- the name of the command, which cannot contain spaces
- a set of round brackets ()
- a number of arguments, each of which begins with the name of the argument followed by an equal sign =, and which are separated by a comma , if there are multiple arguments.
Conceptually this would be:
acommandname(argumentone = "textinput", argumenttwo = variableinput, argumentthree = TRUE, argumentfour = FALSE)
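As a rough illustration of this pattern, here are three real Base R commands written with named arguments. The choice of round(), paste() and sort() is mine and purely illustrative; any function would follow the same shape.

```r
# Command name, round brackets, and named arguments separated by commas
round(x = 3.14159, digits = 2)

# Text input goes in quotation marks; sep is the argument that sets the separator
paste("R", "Studio", sep = "")

# Logical input is written as TRUE or FALSE (no quotation marks)
sort(x = c(3, 1, 2), decreasing = TRUE)
```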
Almost all of the functions you might want to use in R come in packages. A package is basically just a big collection of functions, data sets and other R objects that are all grouped together under a common name. Some packages are already installed when you put R on your computer, but the vast majority of R packages are out there on the internet, waiting for you to download, install and use them.
There’s a critical distinction between having a package installed on your computer, and having a package loaded in R. As of this writing, there are thousands of R packages freely available online. When you install R on your computer, you only get about 30 or so with the basic R installation. When you install a package, it is downloaded and installed on your computer. However, just because something is on your computer doesn’t mean R is using it right now. In order for R to be able to use one of your installed packages, that package must also be “loaded.”
A package must be installed before it can be loaded.
A package must be loaded before it can be used.
This two step process might seem a little odd at first, but the designers of R had very good reasons to do it this way, and you get the hang of it pretty quickly.
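The difference between "installed" and "loaded" can be checked from within R. The sketch below uses the stats package only because it ships with every R installation; the variable names pkg, is_installed and is_attached are hypothetical, chosen for this illustration.

```r
# "stats" is part of the basic R installation, so it is a safe example
pkg <- "stats"

# Installed? requireNamespace() returns TRUE if the package is found
# and its namespace can be loaded, and FALSE otherwise
is_installed <- requireNamespace(pkg, quietly = TRUE)

# Loaded with library()? Such packages appear in the search path
# under the name "package:<name>"
is_attached <- paste0("package:", pkg) %in% search()

is_installed
is_attached
```

A package that is installed but has not been loaded with library() in the current session would give TRUE for the first check and FALSE for the second.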
So let’s download and install our first package, which is called
palmerpenguins and contains exactly what you think it contains; data on Penguins 🐧. We are going to use this type of data in a minute.
To install this package, simply use the command
install.packages(). So here we go:
install.packages("palmerpenguins")
This command takes several options and inputs, but most of the time we’ll only need the first one, which takes the name of the package to be installed as a character vector - this is why we surround the package name with quotation marks.
Now you will see some additional output, which you can ignore for now.
Provided everything installed as expected, we can now load the package using the simple command
library(palmerpenguins)
Fantastic, now we have the full functionality of the
palmerpenguins package at our disposal! For example, typing the name of its penguins data set prints the first rows:
penguins
# A tibble: 344 x 8
   species island bill_length_mm bill_depth_mm flipper_length_~ body_mass_g
   <fct>   <fct>           <dbl>         <dbl>            <int>       <int>
 1 Adelie  Torge~           39.1          18.7              181        3750
 2 Adelie  Torge~           39.5          17.4              186        3800
 3 Adelie  Torge~           40.3          18                195        3250
 4 Adelie  Torge~           NA            NA                 NA          NA
 5 Adelie  Torge~           36.7          19.3              193        3450
 6 Adelie  Torge~           39.3          20.6              190        3650
 7 Adelie  Torge~           38.9          17.8              181        3625
 8 Adelie  Torge~           39.2          19.6              195        4675
 9 Adelie  Torge~           34.1          18.1              193        3475
10 Adelie  Torge~           42            20.2              190        4250
# ... with 334 more rows, and 2 more variables: sex <fct>, year <int>
One of the most important things to be able to do in R (or any programming language, for that matter) is to store information in variables. At a conceptual level you can think of a variable as a label for a certain piece of information, or even several different pieces of information. When doing statistical analysis in R all of your data will be stored as variables in R. However, before we delve into all the messy details of data sets and statistical analysis, let’s look at the very basics for how we create variables and work with them.
Since we’ve been working with numbers so far, let’s start by creating variables to store our numbers. Suppose we’re trying to calculate how much money someone is going to make from a text book. Let’s create a variable called
sales. What we want to do is assign a value to the variable
sales, and that value should be
350. We do this by using the assignment operator, which is
<- (Keyboard shortcut is
Alt + -). Here’s how we do it:
sales <- 350
When you hit enter, R doesn’t print out any output. It just gives you another command prompt. However, behind the scenes R has created a variable called
sales and given it a value of
350. You can check that this has happened by asking R to print the variable on screen. And the simplest way to do that is to type the name of the variable and hit enter.
Let’s see that in action:
sales
## [1] 350
So that’s nice to know. Anytime you can’t remember what R has got stored in a particular variable, you can just type the name of the variable and hit enter.
The assignment arrow also works in the other direction, although this form is rarely used:
350 -> sales
In addition to defining a
sales variable that counts the number of copies of a book that are sold, we also create a variable called
royalty, indicating how much money is earned per copy. Let’s say that the royalties are about $7 per book:
sales <- 350
royalty <- 7
The nice thing about variables (in fact, the whole point of having variables) is that we can do anything with a variable that we ought to be able to do with the information that it stores. That is, since R allows me to multiply
350 * 7
it also allows us to multiply
sales * royalty
As far as R is concerned, the
sales * royalty command is the same as the
350 * 7 command. Not surprisingly, I can assign the output of this calculation to a new variable, which I’ll call
revenue. And when we do this, the new variable
revenue gets the value
2450. So let’s do that, and then get R to print out the value of
revenue so that we can verify that it’s done what we asked:
revenue <- sales * royalty
revenue
## [1] 2450
A slightly more subtle thing we can do is reassign the value of our variable, based on its current value. For instance, suppose the person selling the book receives a bonus of $550. The simplest way to capture this is by a command like this:
revenue <- revenue + 550
revenue
## [1] 3000
In this calculation, R has taken the old value of
revenue (i.e., 2450) and added 550 to that value, producing a value of 3000. This new value is assigned to the
revenue variable, overwriting its previous value. Beware: this is not great style, though, because if you interrupt your work at some point, the variable name does not tell you whether its value is from before or after the person has received the bonus. My suggestion would be to use
revenue_bonus or something like this as a new variable name.
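A minimal sketch of that naming suggestion, using the numbers from the running example. The variable bonus is a hypothetical helper introduced only for this illustration.

```r
sales   <- 350   # copies sold
royalty <- 7     # dollars earned per copy
bonus   <- 550   # one-off bonus in dollars (hypothetical helper variable)

# Keep the value before the bonus ...
revenue <- sales * royalty

# ... and store the value after the bonus under a new name
revenue_bonus <- revenue + bonus

revenue
revenue_bonus
```

Both values stay available, so it is always clear which one includes the bonus.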
There are a few rules and a lot of conventions that govern how a variable can be named and you can find them in section ??.
Before moving on, it’s worth noting that – in the same way that R allows us to put multiple operations together into a longer command, like
1 + 2*4 for instance – it also lets us put functions together and even combine functions with operators if we so desire. For example, the following is a perfectly legitimate command:
sqrt( 1 + abs(-8) )
When R executes this command, it starts out by calculating the value of
abs(-8), which produces an intermediate value of
8. Having done so, the command simplifies to
sqrt( 1 + 8 ). To take the square root, it first needs to add
1 + 8 to get
9, at which point it evaluates
sqrt(9), and so it finally outputs a value of
3.
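The same evaluation can be spelled out one step at a time by storing each intermediate value. The variable names inner, total and result are hypothetical and only serve this sketch.

```r
inner  <- abs(-8)      # innermost function first: 8
total  <- 1 + inner    # then the addition inside the brackets: 9
result <- sqrt(total)  # finally the outer function: 3

result

# The nested one-line command gives the same answer
result == sqrt(1 + abs(-8))
```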
For functions, we need to consider “named” arguments, and “default” values for arguments. Let’s consider the
round() function, which can be used to round some value to the nearest whole number. For example, I could type this:
round( 3.1415 )
Pretty straightforward, really. However, suppose I only wanted to round it to two decimal places: that is, I want to get
3.14 as the output. The
round() function supports this, by allowing you to input a second argument to the function that specifies the number of decimal places that you want to round the number to. In other words, I could do this:
round( 3.1415, 2 )
What’s happening here is that I’ve specified two arguments: the first argument is the number that needs to be rounded (i.e.,
3.1415), the second argument is the number of decimal places that it should be rounded to (i.e.,
2), and the two arguments are separated by a comma. In this simple example, it’s quite easy to remember which argument comes first and which one comes second, but for more complicated functions this is not easy. Fortunately, most R functions make use of argument names. For the
round() function, for example, the number that needs to be rounded is specified using the
x argument, and the number of decimal points that you want it rounded to is specified using the
digits argument. Because we have these names available to us, we can specify the arguments to the function by name. We do so like this:
round( x = 3.1415, digits = 2 )
Notice that this is kind of similar in spirit to variable assignment (Section 1.5), except that I used
= here, rather than
<-. In both cases we’re specifying specific values to be associated with a label. It’s important that you use
= in this context.
As you can see, specifying the arguments by name involves a lot more typing, but it’s also a lot easier to read. Because of this, the commands in this book will usually specify arguments by name, since that makes it clearer to you what I’m doing. However, one important thing to note is that when specifying the arguments using their names, it doesn’t matter what order you type them in. But if you don’t use the argument names, then you have to input the arguments in the correct order. In other words, these three commands all produce the same output:
round( 3.1415, 2 )
round( x = 3.1415, digits = 2 )
round( digits = 2, x = 3.1415 )
but this one does not…
round( 2, 3.1415 )
How do you find out what the correct order is? There are a few different ways, but the easiest one is to look at the help documentation for the function (see Section 1.11). However, if you’re ever unsure, it’s probably best to actually type in the argument name.
Okay, so that’s the first thing I said you’d need to know: argument names. The second thing you need to know about is default values. Notice that the first time I called the
round() function I didn’t actually specify the
digits argument at all, and yet R somehow knew that this meant it should round to the nearest whole number. How did that happen? The answer is that the
digits argument has a default value of
0, meaning that if you decide not to specify a value for
digits then R will act as if you had typed
digits = 0. This is quite handy: the vast majority of the time when you want to round a number you want to round it to the nearest whole number, and it would be pretty annoying to have to specify the
digits argument every single time. On the other hand, sometimes you actually do want to round to something other than the nearest whole number, and it would be even more annoying if R didn’t allow this! Thus, by having
digits = 0 as the default value, we get the best of both worlds.
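A short sketch to check both ideas, argument names and default values, using round() as above. identical() is a Base R function that tests whether two values are exactly the same.

```r
# Leaving out digits is the same as asking for digits = 0
round(3.1415)
round(3.1415, digits = 0)
identical(round(3.1415), round(3.1415, digits = 0))

# With names, the order of the arguments does not matter
identical(round(x = 3.1415, digits = 2), round(digits = 2, x = 3.1415))

# Without names, the order does matter: this rounds the number 2 instead
round(2, 3.1415)
```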
We can use the autocomplete ability in RStudio to help us with the argument names (as seen in ??) when we use the “tab” key on our keyboard.
See also Section 1.11 for more information on each command.
Unlike with sales above, we can also store more than one number in a variable. We can store a series of numbers in a vector.
Let’s stick with the example of the book selling. Let’s suppose the book was not sold at all in January, but was sold 100 times in February, 200 times in March and 50 times in April. We store all this data in a variable called
sales.by.month. The simplest way to do this in R is to use the combine function,
c(). To do so, all we have to do is type all the numbers we want to store, separated by commas, like this:
sales.by.month <- c(0, 100, 200, 50)
sales.by.month
## [1]   0 100 200  50
To use the correct terminology here, we have a single variable here called
sales.by.month: this variable is a vector that consists of 4 elements.
Suppose I want to pull out the February sales data only. February is the second month of the year, so let’s try this:
sales.by.month[2]
## [1] 100
And if we want to save this again:
february.sales <- sales.by.month[2]
february.sales
## [1] 100
If we want to extract January and April, we need to use another vector that we create using the
c() function:
sales.by.month[c(1, 4)]
## [1]  0 50
If we want everything but January:
sales.by.month[-1]
## [1] 100 200  50
But we could also use:
sales.by.month[2:4]
## [1] 100 200  50
See Section 1.9 for more information on extracting information.
Sometimes you’ll want to change the values stored in a vector. We can use the assignment operator again, this time together with the position of the element we want to change:
sales.by.month[3] <- 600
We can also add an element by assigning a value to a position that does not exist yet, such as:
sales.by.month[5] <- 25
sales.by.month
## [1]   0 100 600  50  25
You can use the
length() function to find out how many elements there are in a vector:
length( x = sales.by.month )
## [1] 5
To calculate monthly revenue, we can multiply each element in the
sales.by.month vector by
7. R makes this pretty easy, as the following example shows:
sales.by.month * 7
## [1]    0  700 4200  350  175
In other words, when you multiply a vector by a single number, all elements in the vector get multiplied. The same is true for addition, subtraction, division and taking powers.
Suppose we wanted to know how much money the books are making per day, rather than per month. For this we need to do something slightly different. Firstly, I’ll create two new vectors:
days.per.month <- c(31, 28, 31, 30, 31)
profit <- sales.by.month * 7
We now want to divide every element of
profit by the corresponding element of
days.per.month:
profit / days.per.month
## [1]   0.000000  25.000000 135.483871  11.666667   5.645161
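To convince yourself that the division really happens element by element, you can compare one element of the result with the same calculation done by hand. This is a small sketch; daily is a hypothetical variable name.

```r
sales.by.month <- c(0, 100, 600, 50, 25)
days.per.month <- c(31, 28, 31, 30, 31)

# Multiplying by a single number applies to every element
profit <- sales.by.month * 7

# Dividing by a vector of the same length pairs the elements up
daily <- profit / days.per.month

# February by hand: 700 / 28
profit[2] / days.per.month[2]

# ... which matches the second element of the vector result
daily[2]
```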
A lot of the time your data will be numeric in nature, but not always. Sometimes your data really needs to be described using text, not using numbers. We can save “hello” in a character string:
greeting <- "hello"
greeting
## [1] "hello"
When interpreting this, it’s important to recognise that the quote marks here aren’t part of the string itself. They’re just something that we use to make sure that R knows to treat the characters that they enclose as a piece of text data, known as a character string.
R stores the entire word
"hello" as a single element: our
greeting variable is not a vector of five different letters. Rather, it has only the one element, and that element corresponds to the entire character string
"hello". To illustrate this, if I actually ask R to find the first element of
greeting, it prints the whole string:
greeting[1]
## [1] "hello"
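The difference between the number of elements and the number of letters can be checked with length(), introduced above, and nchar(), which counts the characters in a string. A minimal sketch:

```r
greeting <- "hello"

# One element: the whole string counts as a single piece of data
length(greeting)

# Five characters inside that one element
nchar(greeting)
```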
Of course, there’s no reason why I can’t create a vector of character strings. For instance, if we were to continue with the example of my attempts to look at the monthly sales data for my book, one variable I might want would include the names of all 5 months used above. To do so, I could type in a command like this:
months <- c("January", "February", "March", "April", "May")
months
## [1] "January"  "February" "March"    "April"    "May"
This is a character vector containing 5 elements, each of which is the name of a month. So if I wanted R to tell me the name of the fourth month, all I would do is this:
months[4]
## [1] "April"
A key concept that a lot of R relies on is the idea of a logical value. A logical value is an assertion about whether something is true or false. This is implemented in R in a pretty straightforward way. There are two logical values, namely
TRUE and
FALSE. Despite the simplicity, logical values are very useful things. Let’s see how they work.
If I ask R to calculate
2 + 2, it always gives the same answer:
2 + 2
## [1] 4
Of course, so far R is just doing the calculations. I haven’t asked it to explicitly assert that \(2+2 = 4\) is a true statement. If I want R to make an explicit judgement, I can use a command like this:
2 + 2 == 4
## [1] TRUE
By contrast, R flags a false statement as such:
2+2 == 5
## [1] FALSE
I can try to force R to believe that two plus two is five by making an assignment statement like
2 + 2 = 5 or
2 + 2 <- 5. When I do this, here’s what happens:
2 + 2 = 5
Error in 2 + 2 = 5: target of assignment expands to non-language object
R doesn’t like this very much. It recognises that
2 + 2 is not a variable (that’s what the “non-language object” part is saying), and it won’t let you try to “reassign” it.
So now we’ve seen logical operations at work, but so far we’ve only seen the simplest possible example. You probably won’t be surprised to discover that we can combine logical operations with other operations and functions in a more complicated way, like this:
3*3 + 4*4 == 5*5
## [1] TRUE
sqrt( 25 ) == 5
## [1] TRUE
Not only that, but as Table 1.2 illustrates, there are several other logical operators that you can use, corresponding to some basic mathematical concepts.
|Operation||Operator||Example Input||Answer|
|less than||<||2 < 3||TRUE|
|less than or equal to||<=||2 <= 2||TRUE|
|greater than||>||2 > 3||FALSE|
|greater than or equal to||>=||2 >= 2||TRUE|
|equal to||==||2 == 3||FALSE|
|not equal to||!=||2 != 3||TRUE|
Hopefully these are all pretty self-explanatory: for example, the less than operator
< checks to see if the number on the left is less than the number on the right. If it’s less, then R returns an answer of
TRUE:
99 < 100
## [1] TRUE
but if the two numbers are equal, or if the one on the right is larger, then R returns an answer of
FALSE, as the following two examples illustrate:
100 < 100
## [1] FALSE
100 < 99
## [1] FALSE
In contrast, the less than or equal to operator
<= will do exactly what it says. It returns a value of
TRUE if the number on the left hand side is less than or equal to the number on the right hand side. So if we repeat the previous two examples using
<=, here’s what we get:
100 <= 100
## [1] TRUE
100 <= 99
## [1] FALSE
And at this point I hope it’s pretty obvious what the greater than operator
> and the greater than or equal to operator
>= do! Next on the list of logical operators is the not equal to operator
!= which – as with all the others – does what it says it does. It returns a value of
TRUE when things on either side are not identical to each other. Therefore, since \(2+2\) isn’t equal to \(5\), we get:
2 + 2 != 5
## [1] TRUE
We’re not quite done yet. There are three more logical operations that are worth knowing about, listed in Table 1.3.
|Operation||Operator||Example Input||Answer|
|not||!||!(1==1)||FALSE|
|or|||||(1==1) | (2==3)||TRUE|
|and||&||(1==1) & (2==3)||FALSE|
These are the not operator
!, the and operator
&, and the or operator
|. Like the other logical operators, their behaviour is more or less exactly what you’d expect given their names. For instance, if I ask you to assess the claim that “either \(2+2 = 4\) or \(2+2 = 5\)” you’d say that it’s true. Since it’s an “either-or” statement, all we need is for one of the two parts to be true. That’s what the
| operator does:
(2+2 == 4) | (2+2 == 5)
## [1] TRUE
On the other hand, if I ask you to assess the claim that “both \(2+2 = 4\) and \(2+2 = 5\)” you’d say that it’s false. Since this is an and statement we need both parts to be true. And that’s what the
& operator does:
(2+2 == 4) & (2+2 == 5)
## [1] FALSE
Finally, there’s the not operator, which is simple but annoying to describe in English. If I ask you to assess my claim that “it is not true that \(2+2 = 5\)” then you would say that my claim is true; because my claim is that “\(2+2 = 5\) is false.” And I’m right. If we write this as an R command we get this:
! (2+2 == 5)
## [1] TRUE
In other words, since
2+2 == 5 is a
FALSE statement, it must be the case that
!(2+2 == 5) is a
TRUE one. Essentially, what we’ve really done is claim that “not false” is the same thing as “true.” Obviously, this isn’t really quite right in real life. But R lives in a much more black or white world: for R everything is either true or false. No shades of gray are allowed. We can actually see this much more explicitly, like this:
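! FALSE
## [1] TRUE
Of course, in the 2+2 example we did not really need the not operator at all, because the not equal to operator gives the same answer:
2+2 != 5
## [1] TRUE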
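The three operators can be summarised by trying every combination of true and false at once. The sketch below stores the combinations in two logical vectors, built with the combine function c(); the names a and b are hypothetical.

```r
# All four combinations of two logical values
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

# and: TRUE only where both a and b are TRUE
a & b

# or: TRUE where at least one of a and b is TRUE
a | b

# not: flips every value of a
!a
```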
Up to this point, I’ve introduced numeric data (in Sections 1.5 and 1.6) and character data (in Section 1.7). So you might not be surprised to discover that these
TRUE and
FALSE values that R has been producing are actually a third kind of data, called logical data. That is, when I asked R if
2 + 2 == 5 and it said
[1] FALSE in reply, it was actually producing information that we can store in variables. For instance, I could create a variable called
isthiscorrect, which would store R’s opinion:
isthiscorrect <- 2 + 2 == 5
isthiscorrect
## [1] FALSE
Alternatively, you can assign the value directly, by typing
FALSE in your command. Like this:
isthiscorrect <- FALSE
isthiscorrect
## [1] FALSE
The next thing to mention is that you can store vectors of logical values in exactly the same way that you can store vectors of numbers (Section 1.6) and vectors of text data (Section 1.7). Again, we can define them directly via the
c() function, like this:
x <- c(TRUE, TRUE, FALSE)
x
## [1]  TRUE  TRUE FALSE
or you can produce a vector of logicals by applying a logical operator to a vector. This might not make a lot of sense to you, so let’s unpack it slowly. First, let’s suppose we have a vector of numbers (i.e., a “non-logical vector”). For instance, we could use the
sales.by.month vector that we were using in Section 1.6. Suppose I wanted R to tell me, for each month of the year, whether I actually sold a book in that month. I can do that by typing this:
sales.by.month > 0
## [1] FALSE  TRUE  TRUE  TRUE  TRUE
and again, I can store this in a vector if I want, as the example below illustrates:
any.sales.this.month <- sales.by.month > 0
any.sales.this.month
## [1] FALSE  TRUE  TRUE  TRUE  TRUE
In other words,
any.sales.this.month is a logical vector whose elements are
TRUE only if the corresponding element of
sales.by.month is greater than zero. For instance, since I sold zero books in January, the first element is
FALSE.
One last thing to add before finishing up this chapter. So far, whenever I’ve had to get information out of a vector, all I’ve done is typed something like
months[4]; and when I do this R prints out the fourth element of the
months vector. In this section, I’ll show you two additional tricks for getting information out of the vector.
One very useful thing we can do is pull out more than one element at a time. In the previous example, we only used a single number (i.e.,
4) to indicate which element we wanted. Alternatively, we can use a vector. So, suppose I wanted the data for February, March and April. What I could do is use the vector
c(2,3,4) to indicate which elements I want R to pull out. That is, I’d type this:
sales.by.month[ c(2,3,4) ]
## [1] 100 600  50
Notice that the order matters here. If I asked for the data in the reverse order (i.e., April first, then March, then February) by using the vector
c(4,3,2), then R outputs the data in the reverse order:
sales.by.month[ c(4,3,2) ]
## [1]  50 600 100
A second thing to be aware of is that R provides you with handy shortcuts for very common situations. For instance, suppose that I wanted to extract everything from the 2nd month through to the 8th month. One way to do this is to do the same thing I did above, and use the vector
c(2,3,4,5,6,7,8) to indicate the elements that I want. That works just fine
sales.by.month[ c(2,3,4,5,6,7,8) ]
## [1] 100 600  50  25  NA  NA  NA
but it’s kind of a lot of typing. To help make this easier, R lets you use
2:8 as shorthand for
c(2,3,4,5,6,7,8), which makes things a lot simpler. First, let’s just check that this is true:
2:8
## [1] 2 3 4 5 6 7 8
Next, let’s check that we can use the
2:8 shorthand as a way to pull out the 2nd through 8th elements of
sales.by.month:
sales.by.month[2:8]
## [1] 100 600  50  25  NA  NA  NA
So that’s kind of neat.
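The NA values in this output appear because the vector only has five elements, so positions 6 to 8 do not exist. The following sketch shows one way to avoid them by letting length() supply the last position; picked is a hypothetical variable name, and is.na() is a Base R function that flags missing values.

```r
sales.by.month <- c(0, 100, 600, 50, 25)

# The shorthand 2:8 contains the same numbers as the long version
all(2:8 == c(2, 3, 4, 5, 6, 7, 8))

# Asking for positions that do not exist returns NA for each of them
picked <- sales.by.month[2:8]
is.na(picked)

# Using length() as the end point stops at the last existing element
sales.by.month[2:length(sales.by.month)]
```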
At this point, I can introduce an extremely useful tool called logical indexing. In the last section, I created a logical vector
any.sales.this.month, whose elements are
TRUE for any month in which I sold at least one book, and
FALSE for all the others. However, that big long list of
TRUEs and
FALSEs is a little bit hard to read, so what I’d like to do is to have R select the names of the
months for which I sold any books. Earlier on, I created a vector
months that contains the names of each of the months. This is where logical indexing is handy. What I need to do is this:
months[ sales.by.month > 0 ]
## [1] "February" "March"    "April"    "May"
To understand what’s happening here, it’s helpful to notice that
sales.by.month > 0 is the same logical expression that we used to create the
any.sales.this.month vector in the last section. In fact, I could have just done this:
months[ any.sales.this.month ]
## [1] "February" "March"    "April"    "May"
and gotten exactly the same result. In order to figure out which elements of
months to include in the output, what R does is look to see if the corresponding element in
any.sales.this.month is
TRUE. Thus, since element 1 of
any.sales.this.month is
FALSE, R does not include
"January" as part of the output; but since element 2 of
any.sales.this.month is
TRUE, R does include
"February" in the output. Note that there’s no reason why I can’t use the same trick to find the actual sales numbers for those months. The command to do that would just be this:
sales.by.month [ sales.by.month > 0 ]
## [1] 100 600  50  25
In fact, we can do the same thing with text. Here’s an example. Suppose that – to continue the saga of the textbook sales – I later find out that the bookshop only had sufficient stocks for a few months of the year. They tell me that early in the year they had
"high" stocks, which then dropped to
"low" levels, and in fact for one month they were
"out" of copies of the book for a while before they were able to replenish them. Thus I might have a variable called
stock.levels which looks like this:
stock.levels <- c("high", "high", "low", "out", "out", "high", "high", "high", "high", "high", "high", "high")
stock.levels
##  [1] "high" "high" "low"  "out"  "out"  "high" "high" "high" "high" "high"
## [11] "high" "high"
Thus, if I want to know the months for which the bookshop was out of my book, I could apply the logical indexing trick, but with the character vector
stock.levels, like this:
months[stock.levels == "out"]
## [1] "April" "May"
Alternatively, if I want to know when the bookshop was either low on copies or out of copies, I could do this:
months[stock.levels == "out" | stock.levels == "low"]
## [1] "March" "April" "May"
or this:
months[stock.levels != "high" ]
## [1] "March" "April" "May"
Either way, I get the answer I want.
At this point, I hope you can see why logical indexing is such a useful thing. It’s a very basic, yet very powerful way to manipulate data.
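Two further things you can do with a logical vector are sketched below, under the assumption that you want to count or locate the matching months. sum() and which() are Base R functions not used earlier in this chapter; sold is a hypothetical variable name.

```r
months <- c("January", "February", "March", "April", "May")
sales.by.month <- c(0, 100, 600, 50, 25)

sold <- sales.by.month > 0

# TRUE counts as 1 and FALSE as 0, so sum() counts the months with sales
sum(sold)

# which() returns the positions of the TRUE elements
which(sold)

# Conditions can be combined with & before indexing:
# months with some sales, but fewer than 500 copies
months[sales.by.month > 0 & sales.by.month < 500]
```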
Obviously, this book is intended to be as helpful as possible, but it’s not even close to being a comprehensive guide, and there are thousands of things it doesn’t cover. So where should you go for help?
It is very easy to get help on an R command; you simply type
?command, e.g. for the
round() command used earlier:
?round
This will get us to the specific help file for the command.
If we are not entirely sure what the command is called, or are looking for a general concept, we can simply add another
?:
??round
This will give us a pretty extensive search. If we want to look for a phrase including spaces, then we need to wrap it in quotation marks:
??"round numbers"
For some more information, please see here.
And trust me: Debugging is tough - but getting something tough right in the end is a great feeling :)
There’s one last thing I should cover in this chapter: how to quit R i.e. how to exit the program. Assuming you’re running R in the usual way (i.e., through RStudio or the default GUI on a Windows or Mac computer), then you can just shut down the application in the normal way. However, R also has a function, called
q() that you can use to quit, which is pretty handy if you’re running R in a terminal window.
Regardless of what method you use to quit R, when you do so for the first time R will probably ask you if you want to save the “workspace image.” We’ll talk a lot more about loading and saving data in Section ??, but I figured we’d better quickly cover this now otherwise you’re going to get annoyed when you close R at the end of the chapter. If you’re using RStudio, you’ll see a dialogue box that looks like the one shown in Figure 1.5. If you’re using a text based interface you’ll see this:
q()
## Save workspace image? [y/n/c]:
The y/n/c part here is short for “yes / no / cancel.” Type
y if you want to save,
n if you don’t, and
c if you’ve changed your mind and you don’t want to quit after all.
What does this actually mean? What’s going on is that R wants to know if you want to save all those variables that you’ve been creating, so that you can use them later. This sounds like a great idea, so it’s really tempting to type
y or click the “Save” button. To be honest though, I very rarely do this, and it kind of annoys me a little bit… what R is really asking is if you want it to store these variables in a “default” data file, which it will automatically reload for you next time you open R. And quite frankly, if I’d wanted to save the variables, then I’d have already saved them before trying to quit. Not only that, I’d have saved them to a location of my choice, so that I can find it again later. So I personally never bother with this.
In fact, every time I install R on a new machine one of the first things I do is change the settings so that it never asks me again. You can do this in RStudio really easily: use the menu system to find the RStudio option; the dialogue box that comes up will give you an option to tell R never to whine about this again (see Figure 1.6). On a Mac, you can open this window by going to the “RStudio” menu and selecting “Preferences.” On a Windows machine you go to the “Tools” menu and select “Global Options.” Under the “General” tab you’ll see an option that reads “Save workspace to .Rdata on exit.” By default this is set to “ask.” If you want R to stop asking, change it to “never.”
Every book that tries to introduce basic programming ideas to novices has to cover roughly the same topics, and in roughly the same order. Mine is no exception, and so in the grand tradition of doing it just the same way everyone else did it, this chapter covered the following topics:
- Getting started. We downloaded and installed R and RStudio.
- Basic commands. We talked a bit about the logic of how R works and in particular how to type commands into the R console (Section ??), and in doing so learned how to perform basic calculations using the arithmetic operators +, -, *, / and ^.
- Introduction to functions. We saw several different functions, three that are used to perform numeric calculations (sqrt(), abs() and round()), one that applies to text (nchar(); Section ??), and one that works on any variable (length(); Section 1.6.4). In doing so, we talked a bit about how argument names work, and learned about default values for arguments (Section 1.5.3).
- Introduction to variables. We learned the basic idea behind variables, and how to assign values to variables using the assignment operator <- (Section 1.5). We also learned how to create vectors using the combine function c().
- Data types. Learned the distinction between numeric, character and logical data; including the basics of how to enter and use each of them. (Sections 1.5 to 1.8)
- Logical operations. Learned how to use the logical operators ==, !=, <, >, <=, >=, !, & and |. And learned how to use logical indexing. (Section 1.9)
We still haven’t arrived at anything that resembles a “data set,” of course. Maybe the next Chapter will get us a bit closer…