Announcement

R basics

styles

(reading assignment)

Check out the style guide in Advanced R and the tidyverse style guide.

Arithmetic

R can perform all basic mathematical computations.

symbol      use
+           addition
-           subtraction
*           multiplication
/           division
^           power
%%          modulus (remainder after division)
exp()       exponential function
log()       natural logarithm
sqrt()      square root
round()     rounding
floor()     flooring
ceiling()   ceiling
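
As a quick illustration of the table above, the following sketch evaluates a few of these operators and functions. The input values are arbitrary and chosen only for illustration.

```r
17 %% 5          # modulus: remainder after dividing 17 by 5
## [1] 2
2^10             # power
## [1] 1024
exp(1)           # exponential function evaluated at 1, i.e. e
## [1] 2.718282
log(exp(2))      # the natural logarithm undoes exp()
## [1] 2
round(2.567, 2)  # round to 2 decimal places
## [1] 2.57
floor(-2.5)      # largest integer not above -2.5
## [1] -3
ceiling(-2.5)    # smallest integer not below -2.5
## [1] -2
```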

Objects

You can create an R object to save results of a computation or other command.

Example 1

x <- 3 + 5
x
## [1] 8
  • In most languages, assignment passes the value from right to left (e.g. with “=”). R, however, allows assignment in both directions, with “<-” and “->” (which is actually bad!). In this course, we encourage the use of “<-” or “=”. Some people prefer “=” over “<-” because “<-” can accidentally be split into the two operators “<” and “-”, as in “< -”.
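
The assignment forms mentioned here can be compared side by side. This is a minimal sketch; the object names a, b and c_val are made up for illustration.

```r
a <- 3 + 5        # leftward assignment with "<-"
b = 3 + 5         # "=" also assigns at the top level
3 + 5 -> c_val    # rightward assignment: legal, but harder to read
c(a, b, c_val)
## [1] 8 8 8
```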

Example 2

x < - 3 + 5
## [1] FALSE
x
## [1] 8
  • For naming conventions, stick with either “.” or “_” as the word separator (refer to the style guide).

Example 3

sum.result <- x + 5
sum.result
## [1] 13
  • Important: many names are already taken by built-in R functions. Make sure that you don’t overwrite them.

Example 4

sum(2:5)
## [1] 14
sum
## function (..., na.rm = FALSE)  .Primitive("sum")
sum <- 3 + 4 + 5
sum(5:8)
## [1] 26
sum
## [1] 12
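
If a built-in name has been masked as in Example 4, the built-in function is still reachable. The sketch below shows two ways to get back to it, assuming the masking object lives in the current workspace.

```r
sum <- 3 + 4 + 5     # masks the built-in name with a number
base::sum(5:8)       # the base:: prefix always reaches the built-in function
## [1] 26
rm(sum)              # delete our object; the built-in is visible again
sum
## function (..., na.rm = FALSE)  .Primitive("sum")
```
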
  • R is case-sensitive. “Math.7260” is different from “math.7260”.

Locating and deleting objects:

The commands “objects()” and “ls()” will provide a list of every object that you’ve created in a session.

objects()
## [1] "sum"        "sum.result" "x"
ls()
## [1] "sum"        "sum.result" "x"

The “rm()” and “remove()” commands let you delete objects (tip: always clean up your workspace as the first command).

rm(list=ls())  # clean up workspace

Vectors

Many commands in R generate a vector of output, rather than a single number.

The “c()” command: creates a vector containing a list of specific elements.

Example 1

c(7, 3, 6, 0)
## [1] 7 3 6 0
c(73:60)
##  [1] 73 72 71 70 69 68 67 66 65 64 63 62 61 60
c(7:3, 6:0)
##  [1] 7 6 5 4 3 6 5 4 3 2 1 0
c(rep(7:3, 6), 0)
##  [1] 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 0

Example 2 The command “seq()” creates a sequence of numbers.

seq(7)
## [1] 1 2 3 4 5 6 7
seq(3, 70, by = 6)
##  [1]  3  9 15 21 27 33 39 45 51 57 63 69
seq(3, 70, length = 6)
## [1]  3.0 16.4 29.8 43.2 56.6 70.0
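
A short supplementary sketch: the full name of the argument used above is length.out (length = 6 works through partial argument matching), and seq_len() is a related helper that behaves sensibly when the requested length is zero.

```r
seq(3, 70, length.out = 6)   # same call as above with the full argument name
## [1]  3.0 16.4 29.8 43.2 56.6 70.0
seq_len(4)                   # the integers 1, ..., 4
## [1] 1 2 3 4
seq_len(0)                   # empty, whereas 1:0 would give c(1, 0)
## integer(0)
```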

Operations on vectors

Use brackets to select elements of a vector.

x <- 73:60
x[2]
## [1] 72
x[2:5]
## [1] 72 71 70 69
x[-(2:5)]
##  [1] 73 68 67 66 65 64 63 62 61 60

Elements can also be accessed by “name”, which stays correct even if the order of the elements (or of columns/rows) changes.

y <- 1:3
names(y) <- c("do", "re", "mi")
y[3]
## mi 
##  3
y["mi"]
## mi 
##  3
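
To see why access by name is safer, the following sketch reverses the vector and repeats both lookups. The object name y_rev is made up for illustration.

```r
y <- c(do = 1, re = 2, mi = 3)
y_rev <- rev(y)   # reverse the order; names travel with their values
y_rev[3]          # position 3 now holds a different element
## do 
##  1
y_rev["mi"]       # the name still finds the same value
## mi 
##  3
```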

R commands on vectors

command                                         usage
sum()                                           sum over elements in vector
mean()                                          compute average value
sort()                                          sort elements in a vector
min(), max()                                    min and max values of a vector
length()                                        length of a vector
summary()                                       returns the min, Q1, median, mean, Q3, and max values of a vector
sample(x, size, replace = FALSE, prob = NULL)   takes a random sample from a vector with or without replacement
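
The commands in the table can be tried on a small vector. This is a minimal sketch with hypothetical data (scores); the output of sample() is random, so only its length is shown.

```r
scores <- c(73, 60, 68, 91, 85)    # hypothetical data
sum(scores)
## [1] 377
mean(scores)
## [1] 75.4
sort(scores)
## [1] 60 68 73 85 91
length(scores)
## [1] 5
draw <- sample(scores, size = 3)   # 3 values without replacement; result is random
length(draw)
## [1] 3
```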

Exercise Write a command to generate a random permutation of the numbers between 1 and 5 and save it to an object.

Matrix

The “matrix()” command creates a matrix from the given set of values.

matrix.example <- matrix(rnorm(100), nrow = 10, ncol = 10, byrow = TRUE)
matrix.example
##             [,1]        [,2]        [,3]         [,4]        [,5]        [,6]
##  [1,]  0.8355661 -0.48087083 -2.20572837  1.510554641 -0.59861246 -0.06874718
##  [2,]  1.5992211  0.03401918  1.80189564 -0.113455420  1.13700996  0.27726071
##  [3,]  0.9109775 -0.04390258  1.13523090  1.190677739 -0.33765936 -0.16697844
##  [4,]  0.1628932 -0.01428936  0.74356519 -1.312061346  0.26636641 -0.17877295
##  [5,] -0.6843322 -1.56893013  0.07663298  0.005529418  0.79987070 -0.03256946
##  [6,]  0.9899949  0.71070119  2.24731978 -0.897413550  0.97251780  0.41642828
##  [7,]  0.3514811 -0.17400926 -1.19848306 -0.458326860  0.60108663  0.04112164
##  [8,] -0.2875559  1.93461089  0.17775804  0.404335794 -0.36518636  0.48601172
##  [9,] -0.8629899 -1.35922304  0.40899403 -1.593003030  0.79208143  0.50700409
## [10,] -1.0119713 -1.15591971 -0.31448111 -0.920956136  0.06405575  1.59355910
##              [,7]       [,8]         [,9]       [,10]
##  [1,]  0.27726585  1.2310765 -1.477174352  1.44310950
##  [2,] -1.16486661 -1.6610572  0.032608476 -1.20098907
##  [3,]  0.08068534  0.2399388 -0.579789562  0.44034588
##  [4,]  0.48256782 -0.1400518 -1.919090778 -0.88090856
##  [5,] -1.04198144 -0.4201974 -0.754682301 -1.17164504
##  [6,]  0.56111808 -0.6417580 -2.138381019 -0.69683164
##  [7,] -1.16526091 -1.6192675  0.009634588  0.42265076
##  [8,]  0.99307683  0.6544635 -0.160829985 -0.04791906
##  [9,]  1.12384781 -0.7115467 -0.092345426  1.84028609
## [10,] -0.27000413  1.1670348 -0.166059145  0.51957273
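
The effect of the byrow argument is hard to see with random numbers. A small sketch with the values 1 to 6 makes the fill order visible.

```r
matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)   # fill row by row
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
matrix(1:6, nrow = 2, ncol = 3)                 # default byrow = FALSE fills column by column
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
```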

R commands on vector/matrix

command        usage
sum()          sum over elements in vector/matrix
mean()         compute average value
sort()         sort all elements in a vector/matrix
min(), max()   min and max values of a vector/matrix
length()       length of a vector/matrix
summary()      returns the min, Q1, median, mean, Q3, and max values of a vector (column by column for a matrix)
dim()          dimension of a matrix
cbind()        combine vector, matrix or data-frame arguments by columns
rbind()        combine vector, matrix or data-frame arguments by rows
names()        get or set names of an object
colnames()     get or set column names of a matrix-like object
rownames()     get or set row names of a matrix-like object

sum(matrix.example)
## [1] -1.82345
mean(matrix.example)
## [1] -0.0182345
sort(matrix.example)
##   [1] -2.205728372 -2.138381019 -1.919090778 -1.661057234 -1.619267538
##   [6] -1.593003030 -1.568930132 -1.477174352 -1.359223035 -1.312061346
##  [11] -1.200989070 -1.198483063 -1.171645037 -1.165260914 -1.164866613
##  [16] -1.155919712 -1.041981444 -1.011971334 -0.920956136 -0.897413550
##  [21] -0.880908560 -0.862989931 -0.754682301 -0.711546654 -0.696831644
##  [26] -0.684332238 -0.641757993 -0.598612461 -0.579789562 -0.480870833
##  [31] -0.458326860 -0.420197443 -0.365186363 -0.337659365 -0.314481111
##  [36] -0.287555876 -0.270004131 -0.178772954 -0.174009259 -0.166978437
##  [41] -0.166059145 -0.160829985 -0.140051822 -0.113455420 -0.092345426
##  [46] -0.068747184 -0.047919062 -0.043902581 -0.032569460 -0.014289358
##  [51]  0.005529418  0.009634588  0.032608476  0.034019179  0.041121635
##  [56]  0.064055752  0.076632979  0.080685335  0.162893196  0.177758043
##  [61]  0.239938796  0.266366408  0.277260707  0.277265846  0.351481051
##  [66]  0.404335794  0.408994027  0.416428284  0.422650764  0.440345880
##  [71]  0.482567817  0.486011722  0.507004094  0.519572734  0.561118078
##  [76]  0.601086630  0.654463536  0.710701188  0.743565190  0.792081427
##  [81]  0.799870700  0.835566052  0.910977533  0.972517795  0.989994890
##  [86]  0.993076826  1.123847810  1.135230900  1.137009965  1.167034812
##  [91]  1.190677739  1.231076535  1.443109501  1.510554641  1.593559103
##  [96]  1.599221093  1.801895644  1.840286087  1.934610889  2.247319778
summary(matrix.example)
##        V1                V2                 V3                V4         
##  Min.   :-1.0120   Min.   :-1.56893   Min.   :-2.2057   Min.   :-1.5930  
##  1st Qu.:-0.5851   1st Qu.:-0.98716   1st Qu.:-0.2167   1st Qu.:-0.9151  
##  Median : 0.2572   Median :-0.10896   Median : 0.2934   Median :-0.2859  
##  Mean   : 0.2003   Mean   :-0.21178   Mean   : 0.2873   Mean   :-0.2184  
##  3rd Qu.: 0.8921   3rd Qu.: 0.02194   3rd Qu.: 1.0373   3rd Qu.: 0.3046  
##  Max.   : 1.5992   Max.   : 1.93461   Max.   : 2.2473   Max.   : 1.5106  
##        V5                V6                V7                 V8         
##  Min.   :-0.5986   Min.   :-0.1788   Min.   :-1.16526   Min.   :-1.6611  
##  1st Qu.:-0.2372   1st Qu.:-0.0597   1st Qu.:-0.84899   1st Qu.:-0.6941  
##  Median : 0.4337   Median : 0.1592   Median : 0.17898   Median :-0.2801  
##  Mean   : 0.3332   Mean   : 0.2874   Mean   :-0.01236   Mean   :-0.1901  
##  3rd Qu.: 0.7979   3rd Qu.: 0.4686   3rd Qu.: 0.54148   3rd Qu.: 0.5508  
##  Max.   : 1.1370   Max.   : 1.5936   Max.   : 1.12385   Max.   : 1.2311  
##        V9                V10          
##  Min.   :-2.13838   Min.   :-1.20099  
##  1st Qu.:-1.29655   1st Qu.:-0.83489  
##  Median :-0.37292   Median : 0.18737  
##  Mean   :-0.72461   Mean   : 0.06677  
##  3rd Qu.:-0.10947   3rd Qu.: 0.49977  
##  Max.   : 0.03261   Max.   : 1.84029
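
The structural commands in the table (dim(), cbind(), rbind(), colnames(), rownames()) are not exercised above. A minimal sketch, using two made-up vectors a and b:

```r
a <- 1:3
b <- 4:6
by_col <- cbind(a, b)      # a and b become the columns
dim(by_col)
## [1] 3 2
colnames(by_col)           # column names are taken from the argument names
## [1] "a" "b"
by_row <- rbind(a, b)      # a and b become the rows
dim(by_row)
## [1] 2 3
rownames(by_row) <- c("first", "second")
by_row["second", 2]        # row selected by name, column by position
## [1] 5
```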

Comparison (logical operators)

symbol           use
!=               not equal
==               equal
>                greater
>=               greater or equal
<                smaller
<=               smaller or equal
is.na            is the value “Not Available”/missing?
complete.cases   returns a logical vector specifying which observations/rows have no missing values
is.finite        is the value finite?
all              are all values in a logical vector true?
any              is any value in a logical vector true?

test.vec <- 73:68
test.vec
## [1] 73 72 71 70 69 68
test.vec < 70
## [1] FALSE FALSE FALSE FALSE  TRUE  TRUE
test.vec > 70
## [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
test.vec[3] <- NA
test.vec
## [1] 73 72 NA 70 69 68
is.na(test.vec)
## [1] FALSE FALSE  TRUE FALSE FALSE FALSE
complete.cases(test.vec)
## [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
all(is.na(test.vec))
## [1] FALSE
any(is.na(test.vec))
## [1] TRUE
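
One consequence of missing values is worth a short sketch: comparisons and arithmetic involving NA return NA rather than FALSE or a number. The vector is redefined here so that the example is self-contained.

```r
test.vec <- c(73, 72, NA, 70, 69, 68)
test.vec > 70                  # comparing with NA yields NA, not FALSE
## [1]  TRUE  TRUE    NA FALSE FALSE FALSE
mean(test.vec)                 # NA propagates through arithmetic
## [1] NA
mean(test.vec, na.rm = TRUE)   # drop missing values first
## [1] 70.4
test.vec[!is.na(test.vec)]     # keep only the non-missing elements
## [1] 73 72 70 69 68
```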

Now let’s test the accuracy of doubles in R. Recall that double precision has a 52-bit fraction, which gives approximately \(\log_{10}(2^{52}) \approx 16\) significant decimal digits of precision.

test.exponent <- -(7:18)
10^test.exponent == 0
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
1 - 10^test.exponent == 1
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
7360 - 10^test.exponent == 7360
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
73600 - 10^test.exponent == 73600
##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
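
The same limit can be read off directly and has a practical consequence for equality tests. The sketch below uses the built-in constant .Machine$double.eps and the base function all.equal(), which compares numbers up to a small tolerance.

```r
.Machine$double.eps                 # gap between 1 and the next larger double, 2^-52
## [1] 2.220446e-16
0.1 + 0.2 == 0.3                    # exact comparison fails because of rounding
## [1] FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # comparison with a small tolerance
## [1] TRUE
```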

Other operators

%in%, match

test.vec
## [1] 73 72 NA 70 69 68
66 %in% test.vec
## [1] FALSE
match(66, test.vec, nomatch = 0)
## [1] 0
70 %in% test.vec
## [1] TRUE
match(70, test.vec, nomatch = 0)
## [1] 4
match(70, test.vec, nomatch = 0) > 0 # the implementation of %in%
## [1] TRUE
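
Both %in% and match() are vectorized over their first argument. A minimal sketch, redefining test.vec so that it runs on its own:

```r
test.vec <- c(73, 72, NA, 70, 69, 68)
c(66, 70, 73) %in% test.vec         # one logical value per element on the left
## [1] FALSE  TRUE  TRUE
match(c(66, 70, 73), test.vec)      # positions of first matches; default nomatch is NA
## [1] NA  4  1
which(test.vec %in% c(70, 73))      # positions in test.vec that hold 70 or 73
## [1] 1 4
```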

Control flow

These are the basic control-flow constructs of the R language. They function in much the same way as control statements in any Algol-like language (Algol is short for “Algorithmic Language”). They are all reserved words.

keyword   usage
if        if(cond) expr
if-else   if(cond) cons.expr else alt.expr
for       for(var in seq) expr
while     while(cond) expr
break     breaks out of the innermost for or while loop
next      halts the processing of the current iteration and advances the looping index
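
The constructs in the table can be combined as in the following sketch. The numbers are arbitrary; the loops only illustrate how if, for, while, next and break interact.

```r
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) {
    next              # skip even numbers
  }
  if (i > 7) {
    break             # leave the loop once an odd i exceeds 7
  }
  total <- total + i  # adds 1, 3, 5 and 7
}
total
## [1] 16

n <- 0
while (2^n < 1000) {  # smallest n with 2^n >= 1000
  n <- n + 1
}
n
## [1] 10
```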

Define a function

Read the Functions chapter of Advanced R by Hadley Wickham. We will revisit functions in more detail later.

DoNothing <- function() {
  return(invisible(NULL))
}
DoNothing()
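
A function that takes arguments follows the same pattern. The helper below is hypothetical (RescaleUnit is not part of R or of these notes) and is only a sketch of a function with one required argument and one default argument.

```r
# Hypothetical helper: rescale a numeric vector to the interval [0, 1]
RescaleUnit <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)            # rng[1] is the min, rng[2] the max
  return((x - rng[1]) / (rng[2] - rng[1]))
}
RescaleUnit(c(2, 4, 6, 10))
## [1] 0.00 0.25 0.50 1.00
```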

In general, try to avoid using loops in R (vectorize your code instead). If you have to loop, try a for loop first. A while loop can be dangerous: if its condition never becomes false, it runs forever, and R will not detect this for you.

DoBadThing <- function() {
  result <- NULL
  while(TRUE) {
    result <- c(result, rnorm(100))
  }
  return(result)
}
# DoBadThing()
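
For contrast, here is a sketch of the same small task written as a loop and as a vectorized expression. The object names are made up; the loop version preallocates its result rather than growing it with c() as DoBadThing() does.

```r
x <- 1:1000

# Loop version: preallocate the result, then fill it in
squares.loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares.loop[i] <- x[i]^2
}

# Vectorized version: one expression, no explicit loop
squares.vec <- x^2

identical(squares.loop, squares.vec)
## [1] TRUE
```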

Install packages

You can install R packages from several places (reference):

  • Comprehensive R Archive Network (CRAN)

    • Official R packages repository

    • Some level of code checks (cross-platform support, version control, etc.)

    • Most common place you will install packages

    • Pick a mirror location near you

    • install.packages("package_name")

  • GitHub

    • May get development version of a package

    • Almost zero level of code checks

    • Common place to develop a package before submitting to CRAN

      install.packages("devtools")
      library(devtools)
      install_github("tidyverse/ggplot2")

Load packages

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
require(tidyverse)
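
The two loading commands differ in how they fail, which the following sketch illustrates. The package name "notARealPackage123" is made up and is assumed not to be installed; it is used only to trigger the failure path.

```r
# require() returns FALSE (with a warning) when the package is missing
ok <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)
ok
## [1] FALSE

# library() signals an error instead, which can be caught with tryCatch()
res <- tryCatch(
  library("notARealPackage123", character.only = TRUE),
  error = function(e) "not installed"
)
res
## [1] "not installed"
```

Because library() stops with an error, a script cannot silently continue without a package it needs.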