I: Installing R & RStudio

Getting started

The primary pieces of software you are going to need for this class are

  • R
  • RStudio
  • Microsoft Office
    • I am going to assume you already have this, but we can meet to install it if not, it is free for UF students

There are also a few optional pieces of software you’ll need for extra credit lessons, but we will cover when needed.

Installing R

  • R is fantastic (hopefully you will see that throughout the course), but sometimes it can make things seem more complicated than they need to
    • The first time it does this is when trying to install it, there’s a bunch of options called “mirrors”
      • These are basically to reduce strain on the servers that you download from by using the closest location
    • The good news, however, is that a URL that automates this whole process for you came out recently
    • The even better news is that I’ve set up a little portal to that URL here, so you can download it without leaving this page
  1. Click the option for the OS you have (Windows/Mac/Linux)
  2. Then under the “latest release”
  1. For windows users, select “base” then “Download R…” (the top options)
  2. For mac users, select either apple silicon or intel options depending on how new your mac is
    • If you need to check which kind your mac is, hit the apple logo in the top left of your screen, then “About This Mac” then under “Processor” it will either intel or silicon. I can also help you with this in class if needed.
  1. R will then download, double click on the download when it’s finished and then follow the on-screen prompts
  2. R is now (hopefully) installed!
  • Note: If you’re using a work computer you may run into issues with administrator privileges, we can work on this individually

Installing RStudio

  • Technically, R is all you need to do all of our analyses. However, to make it accessible and usable, we also need a “development environment”
    • The reason of this, unlike a computer program like Stata or SAS, R is a programming language (same as Python, C++, etc.), that’s what we just installed
    • The easiest way to use programming languages is through a “development environment”
      • There are multiple “development environments” you can use for R. VSCode is a great option by Microsoft for using a variety of languages, but, the best option for R is RStudio as it is purpose built for the language (it also works with Python too)
  • I did try to create another little portal below, but, sadly Posit (the company who created RStudio) have disabled the way I do that. Instead…
  1. Go to this site and click the “Download RStudio Desktop for…” button underneath “2: Install RStudio” (we already did step 1)
  • This is simpler than installing R, Posit have a more sophisticated website which will automatically download the right version for your computer
  1. RStudio will then download, double click on the download when it’s finished and then follow the on-screen prompts
  • For mac users, this will just be drag n’ drop RStudio into your Applications folder
  1. RStudio is now (hopefully) installed!
  • Note: If you’re using a work computer you may run into issues with administrator privileges, we can work on this individually

Let’s See What We Just Installed

  • Hopefully, you should now be able to open RStudio on your computer (it should be the same place all your software is kept)
    • Go ahead and open it up!

By default, RStudio has 3-4 main frames:

  1. Top left: Script window (will be closed at first if you don’t have any scripts open)
  2. Bottom left: Console
  3. Top right: Environment / History / Connections
  4. Bottom right: Files / Plots / Packages / Help / Viewer

For today, we are mostly going to explore some basic features of R using the console, copying and pasting commands from the website rather than saving them in a script. Fear not, however, as there is already an .R script containing all these commands saved in your class folder that we will set up next week. - All an .R script does is save your code and pass it line-by-line to the console - After today, anything we want to save will be done through a script, anything we just need to run one time will be done in console

Basic R Commands

First, let’s try the traditional first command!

print("Hello, World!")
[1] "Hello, World!"

We can also use R like a basic calculator

1 + 1
[1] 2

Assignment

  • The first two commands we ran simply spat the output out in the console

    • This can be useful if you want to check something quickly or if we have our final output
  • More often, though, we want to save the output to our R Environment (top right panel)

  • To do this, we need to assign the output to an object”

R is a type of object-oriented programming environment. This means that R thinks of things in its world as objects, which are like virtual boxes in which we can put things: data, functions, and even other objects.

  • In R (for quirky reasons), the primary means of assignment is the arrow, <-, which is a less than symbol, <, followed by a hyphen, -.
    • You can use = (which is more common across other programming languages), and you may see this “in the wild”
    • But R traditionalists prefer <- for clarity and readability
    • Which you use is up to you! (I generally use <-)
## assign value to object x using <-
x <- 1

But’s where’s the output?

  • Check out the “Environment” tab on the top left panel
    • We see something called x has a value of 1
      • Now let’s call that object
## what's in x?
x
[1] 1

Note: the [1] is just the index (order) number, if we had more than 1 thing in our object, that would be more useful

Quick exercise

Using the arrow, assign the output of 1 + 1 to x. Next subtract 1 from x and reassign the result to x. Show the value in x.

Comments

For this section, let’s just open a blank .R script in RStudio (again, all these commands will be in a script in your class folder we set up next week)

  • Comments in R are set off using the hash or pound character at the beginning of the line: #
  • The comment character tells R to ignore the line

Quick exercise

Type the phrase “This is a comment” directly into the R console both with and without a leading “#”. What happens each time?

  • You may notice I (Ben, and therefore, now me) use two hashes
    • You can use only a single # for your comments if you like, R treats them all the same
    • If you’re typing longer comments ##' (two hashes and an apostrophe) is really useful in RStudio, as it automatically comments the next line (although this can be annoying at times too)
    • If you want to take your comments to the next level ##' in RStudio also enables some fun color coding options with @, [], **, and :
      • These can be useful in longer scripts to draw attention to specific points
      • Note: these only show up in RStudio
##' @Matt needs to work on this  
##' Matt needs to work on [this]
##' Matt *needs* to work on this
##' [Matt: needs to work on this]
  • Lastly, RStudio can comment/uncomment multiple lines of code you’ve already written
    • On the top menu bar select “Code” then “Commment/Uncomment Lines”
      • Also see the keyboard shortcut next to that option!
  • This is a big time saver!
## Try commenting/uncommenting the below line

# Matt <- "Hi"

Data types and structures

R uses variety of data types and structures to represent and work with data. There are many, but the major ones that you’ll use most often are:

  • logical
  • numeric (integer & double)
  • character
  • vector
  • matrix
  • list
  • dataframe

Let’s see what type of object x we created earlier is

typeof(x)
[1] "double"

What if we make it “1”?

x <- "1"
typeof(x)
[1] "character"

Understanding the nuanced differences between data types is not important right now. Just know that they exist and that you’ll gain an intuitive understanding of them as you become better aquainted with R.

Packages

  • User-submitted packages are a huge part of what makes R great
  • You may hear me use the phrases “base R” or “vanilla R” during class
    • What I mean by this is the R that comes as you download it with no packages loaded
    • While it’s powerful in and of itself — you can do everything you need with base R — most of your scripts will make use of one of more contributed packages = These will make your data analytic life much nicer. We’ll lean heavily on the tidyverse suite of packages this semester.

Installing packages from CRAN

  • Many contributed packages are hosted on the CRAN package repository. - What’s really nice about CRAN is that packages have to go through quite a few checks in order for CRAN to approve and host them. Checks include;
    • Making sure the package has documentation
    • Works on a variety of systems
    • Doesn’t try to do odd things to your computer
  • The upshot is that you should feel okay downloading these packages from CRAN

To download a package from CRAN, use:

install.packages("<package name>")

NOTE Throughout this course, if you see something in triangle brackets (<...>), that means it’s a placeholder for you to change accordingly.

Many packages rely on other packages to function properly. When you use install.packages(), the default option is to install all dependencies. By default, R will check how you installed R and download the right operating system file type.

Quick exercise

Install the tidyverse package, which is really a suite of packages that we’ll use throughout the semester. Don’t forget to use double quotation marks around the package name:

install.packages("tidyverse")

Loading package libraries

Package libraries can loaded in a number of ways, but the easiest it to write:

library("<library name>")

where "<library name>" is the name of the package/library. You will need to load these before you can use their functions in your scripts. Typically, they are placed at the top of the script file.

For example, let’s load the tidyverse library we just installed:

## load library (note quirk that you don't need quotes here)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Notice that when you load the tidyverse (which, again, is actually loading a number of other libraries), you see a lot of output. Not all packages are this noisy, but the information is useful here because it shows all the libraries that are now loaded and ready for you to use.

Help

I don’t have every R function and nuance memorized, so I certainly don’t expect that you will. With all the user-written packages, it would be difficult to keep up if I tried! When stuck, there are a few ways to get help.

Help files

In the console, typing a function name immediately after a question mark will bring up that function’s help file (in RStudio, you should see in the bottom right panel):

## get help file for function
?median

Two question marks will search for the command name in CRAN packages (again, in the bottom right facet):

## search for function in CRAN
??median

At first, using help files may feel like trying to use a dictionary to see how to spell a word — if you knew how to spell it, you wouldn’t need the dictionary! Similarly, if you knew what you needed, you wouldn’t need the help file. But over time, they will become more useful, particularly when you want to figure out an obscure option that will give you exactly what you need.

Package Website

  • While all R packages have to have help files, not all R packages have nice webpages. However, a lot of the main ones do
    • I find them much nicer than the CRAN helpfiles

For example, here’s another magic portal to the tidyverse’s dplyr website (you may spent a good amount of time here this semester)

Usually I Google something like “<package name> R” and the website comes up

You can find links to all the tidyverse packages here

Google it!

Google is a coder’s best friend. If you are having a problem, odds are a 1,000+ other people have too and at least one of them has been brave enough (people can be mean on the internet) to ask about it in a forum like StackOverflow, CrossValidated, or R-help mailing list.

If you are lucky, you’ll find the exact answer to your question. More likely, you’ll find a partial answer that you’ll need to modify for your needs. Sometimes, you’ll find multiple partial answers that, in combination, help you figure out a solution. It can feel overwhelming at first, particularly if it’s a way of problem-solving that’s different from what you’re used to. But it does become easier with practice.

Useful packages

We’re going to use a number of packages this semester. While we may need more than this list — and you almost certainly will in your own future work — let’s install these to get us started.

Quick exercise

Install the following packages using the install.packages() function:

  • devtools
  • knitr
  • `rmarkdown``
  • quarto

There’s no official assignment this week, just make sure you have been able to install everything covered in this lesson. If you haven’t, please set up a meeting with me ASAP!

If you really want something to do, start thinking about ideas for your final project!