Step into the R world

Getting Started with R

Qingyao Zhang

Department of Psychology, College of Teacher Education, Ningxia University

1 Preface

1.1 What is R

[Figure: R logo]

1.2 What Are R Packages

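  • An R package is essentially a collection of functions, usually bundled with their documentation and sometimes with example data.
  • Functions are the tools that do the work: computing, plotting, reading data, and so on. For example, read.csv() from the utils package is used as follows, with its arguments and their default values: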
read.csv(
  file,
  header = TRUE,
  sep = ",",
  quote = "\"",
  dec = ".",
  fill = TRUE,
  comment.char = "",
  ...
)
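A small sketch of calling read.csv(): arguments you leave out keep the defaults shown above, and package::function() names the package explicitly. The inline table and the text argument (which read.csv() passes on to read.table()) are assumptions chosen so the example runs without a file.

```r
# A tiny made-up table, given as text instead of a file
csv_text <- "id,score\n101,3\n102,5"

# Omitted arguments keep their defaults (header = TRUE, sep = ",")
dat1 <- read.csv(text = csv_text)

# The same call, naming the package (utils) and spelling out the defaults
dat2 <- utils::read.csv(text = csv_text, header = TRUE, sep = ",")
identical(dat1, dat2)
## [1] TRUE

# Changing a default changes the result: the first line is now read as data
dat3 <- read.csv(text = csv_text, header = FALSE)
nrow(dat1); nrow(dat3)
## [1] 2
## [1] 3
```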

1.3 What Is RStudio

  • RStudio is an integrated development environment (IDE) for R and Python.
  • It includes a console, a syntax-highlighting editor that supports direct code execution, and tools for plotting, history, debugging, and workspace management.
  • RStudio is available in open source and commercial editions and runs on the desktop (Windows, Mac, and Linux).
  • RStudio website: https://posit.co/products/open-source/rstudio/

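1.4 Using RStudio

  • Write and edit multi-line code in the Source panel.
  • Run the code in the Source panel line by line, from top to bottom (Ctrl+Enter, or Cmd+Enter on a Mac, runs the current line or selection).
  • Inspect the variables you have created in the Environment panel.
  • Check printed output in the Console panel.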

1.5 How to Learn R

2 Install

2.1 Install R and RStudio

2.2 Install Packages

  • Official (CRAN) version

    install.packages("devtools")
  • Development version (from GitHub)

    • Method 1: call the function with its package prefix, package::function()
    devtools::install_github(repo = "qyaozh/Keng",
                             dependencies = TRUE,
                             build_vignettes = TRUE)
    • Method 2: attach the package with library(), then call the function by name
    library(devtools)
    install_github(repo = "qyaozh/Keng",
                   dependencies = TRUE,
                   build_vignettes = TRUE)
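install.packages() downloads a package again even when it is already installed. A common pattern, sketched here with a hypothetical helper install_if_missing() (not part of devtools or any other package), checks first with requireNamespace():

```r
# install_if_missing() is a hypothetical helper written for this example
install_if_missing <- function(pkg) {
  # requireNamespace() returns FALSE, without an error, if pkg is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Return the installed version as a character string
  as.character(packageVersion(pkg))
}

# stats ships with R, so nothing is downloaded here
install_if_missing("stats")

# For a CRAN package, only the first call downloads and installs it:
# install_if_missing("devtools")
```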

3 Value and Type

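  • Integer (a whole number written with the suffix L; a plain 3 is stored as a double)
x <- 3L
x
## [1] 3
class(x)
## [1] "integer"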
  • Double (real, or decimal)
x <- 3.1415926
x
## [1] 3.141593
  • Character
x <- "Educational psychology"
x
## [1] "Educational psychology"
  • Factor
x <- c("grade1", "grade2", "grade3")
x
## [1] "grade1" "grade2" "grade3"
x <- factor(x)
x
## [1] grade1 grade2 grade3
## Levels: grade1 grade2 grade3
levels(x)
## [1] "grade1" "grade2" "grade3"
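labels(x)      # element names (here just the positions), not the level names
## [1] "1" "2" "3"
as.numeric(x)  # the integer codes R stores for each level
## [1] 1 2 3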
x <- ordered(x)
x
## [1] grade1 grade2 grade3
## Levels: grade1 < grade2 < grade3
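By default, factor() sorts the levels alphabetically. The levels argument fixes their order and labels renames them; adding ordered = TRUE allows comparisons. A sketch with made-up grade data:

```r
grade <- c("grade2", "grade1", "grade3", "grade1")   # made-up data

# levels = sets the order of the categories; labels = renames them
f <- factor(grade, levels = c("grade1", "grade2", "grade3"),
            labels = c("G1", "G2", "G3"))
f
## [1] G2 G1 G3 G1
## Levels: G1 G2 G3
as.integer(f)   # the codes follow the order of the levels
## [1] 2 1 3 1

# An ordered factor supports < and > comparisons
o <- factor(grade, levels = c("grade1", "grade2", "grade3"), ordered = TRUE)
o[1] > o[2]     # grade2 > grade1
## [1] TRUE
```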
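  • Date and date-time
x <- as.Date("2024-11-14", format = "%Y-%m-%d")
x
## [1] "2024-11-14"
# strptime() returns a date-time; the time zone shown depends on your system
x <- strptime("2024-11-14", format = "%Y-%m-%d")
x
## [1] "2024-11-14 CST"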
  • Logical
x <- (1 == 2); y <- (1 < 2)
x; y
## [1] FALSE
## [1] TRUE
  • Complex
x <- 1 + 2i
x
## [1] 1+2i
  • Special values
# empty object (no value at all)
NULL
# missing value
NA
# not a number, e.g., the result of 0/0
NaN
# infinity, e.g., the result of 1/0
Inf
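Because any comparison with NA gives NA, special values are tested with the is.*() functions instead. A short sketch:

```r
x <- c(1, NA, 0/0, 1/0)   # 0/0 gives NaN and 1/0 gives Inf
NA == NA                  # a comparison cannot detect NA
## [1] NA
is.na(x)                  # NaN counts as missing too
## [1] FALSE  TRUE  TRUE FALSE
is.nan(x)
## [1] FALSE FALSE  TRUE FALSE
is.infinite(x)
## [1] FALSE FALSE FALSE  TRUE
is.null(NULL)
## [1] TRUE

# Many functions return NA if any value is missing, unless na.rm = TRUE
mean(c(2, NA, 4))
## [1] NA
mean(c(2, NA, 4), na.rm = TRUE)
## [1] 3
```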

4 Variable and Vector

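  • A vector is an ordered set of values of the same type, much like a variable (a column) in a data set. In R, even a single value is a vector of length 1.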
  • c() combines its arguments to form a vector.
x <- 2
is.vector(x)
## [1] TRUE
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x
##  [1]  1  2  3  4  5  6  7  8  9 10
x <- c("莫", "听", "穿", "林", "打", "叶", "声")
x
## [1] "莫" "听" "穿" "林" "打" "叶" "声"
x <- paste0(x, collapse = "")
x
## [1] "莫听穿林打叶声"
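Vectors are what make R concise: arithmetic and comparisons apply to every element at once, and [] picks elements by position or by condition. A sketch with a made-up score variable:

```r
score <- c(3, 5, 2, 4)   # made-up scores of four participants
score * 10               # arithmetic works element by element
## [1] 30 50 20 40
score[2]                 # the 2nd element
## [1] 5
score[score > 3]         # the elements that meet a condition
## [1] 5 4
length(score); mean(score)
## [1] 4
## [1] 3.5
c(1, "a", TRUE)          # mixed values are converted to one common type
## [1] "1"    "a"    "TRUE"
```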

5 Symbols, signs, and operators

  • # Comment
# 1 + 1
1 + 1
## [1] 2
  • : Generate regular sequences (in model formulas, : denotes an interaction)
-5:5
##  [1] -5 -4 -3 -2 -1  0  1  2  3  4  5
  • ; Separate multiple commands on one line
1 + 1 ; 2 * 3
## [1] 2
## [1] 6
  • <- Assign
x <- 2 * 3
x
## [1] 6
  • $, [], [[]] Subset or extract elements

See the examples in Section 6.

  • - Negative numbers
x <- 2 * (-3)
x
## [1] -6
  • Logical and comparison operators
TRUE & FALSE
## [1] FALSE
TRUE | FALSE
## [1] TRUE
!FALSE
## [1] TRUE
1 == 2
## [1] FALSE
1 != 2
## [1] TRUE
  • Arithmetic Operators
1 + 2; 1 - 2; 2 * 3; 2/3
## [1] 3
## [1] -1
## [1] 6
## [1] 0.6666667
(2024 - 1949)/30
## [1] 2.5
2^3
## [1] 8
5 %% 2   # remainder (modulo)
## [1] 1
5 %/% 2  # integer division
## [1] 2
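%% and %/% are handy together, e.g., for picking even numbers, and operator precedence matters with negative numbers. A short sketch:

```r
x <- 1:10
x[x %% 2 == 0]             # even numbers have remainder 0 when divided by 2
## [1]  2  4  6  8 10

17 %/% 5; 17 %% 5          # 17 = 3 * 5 + 2
## [1] 3
## [1] 2
(17 %/% 5) * 5 + 17 %% 5   # quotient * divisor + remainder gives 17 back
## [1] 17

-2^2                       # ^ is applied before the minus sign
## [1] -4
(-2)^2
## [1] 4
```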

6 Data entry

6.1 list

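# sample.int() draws random numbers, so your depression scores will differ
id <- 1:10 + 100; gender <- rep(0:1, 5); depression <- sample.int(4, 10, TRUE)
# Only the first element is named; wrapping an assignment in () also prints it
(dat_list <- list(id = id, gender, depression))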
## $id
##  [1] 101 102 103 104 105 106 107 108 109 110
## 
## [[2]]
##  [1] 0 1 0 1 0 1 0 1 0 1
## 
## [[3]]
##  [1] 1 1 2 4 3 3 3 1 4 2
dat_list$id
##  [1] 101 102 103 104 105 106 107 108 109 110
dat_list[2]     # [] returns a sub-list
## [[1]]
##  [1] 0 1 0 1 0 1 0 1 0 1
dat_list[[2]]   # [[]] returns the element itself
##  [1] 0 1 0 1 0 1 0 1 0 1
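Unlike a vector, a list can hold elements of different types and lengths. A sketch with a made-up record, showing names(), $, [[]] and adding an element:

```r
person <- list(id = 101, name = "Li", scores = c(3, 5, 2))   # made-up record
names(person)
## [1] "id"     "name"   "scores"
person$scores[2]     # take an element, then index inside it
## [1] 5
person[["name"]]     # the same as person$name
## [1] "Li"
person$age <- 20     # assigning to a new name adds an element
length(person)
## [1] 4
```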

6.2 data.frame

dat_data.frame <- data.frame(id, gender, depression)
head(dat_data.frame)
##    id gender depression
## 1 101      0          1
## 2 102      1          1
## 3 103      0          2
## 4 104      1          4
## 5 105      0          3
## 6 106      1          3
str(dat_data.frame)
## 'data.frame':    10 obs. of  3 variables:
##  $ id        : num  101 102 103 104 105 106 107 108 109 110
##  $ gender    : int  0 1 0 1 0 1 0 1 0 1
##  $ depression: int  1 1 2 4 3 3 3 1 4 2

View the whole data set (RStudio opens it in a spreadsheet-style viewer):

View(dat_data.frame)

6.2.1 rownames and colnames

rownames(dat_data.frame)
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
colnames(dat_data.frame)
## [1] "id"         "gender"     "depression"
(colnames(dat_data.frame) <- c("id", "x", "y"))
## [1] "id" "x"  "y"
colnames(dat_data.frame)[2] <- "gender"
colnames(dat_data.frame)[3] <- "depression"
colnames(dat_data.frame)
## [1] "id"         "gender"     "depression"
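Renaming by position breaks if the column order changes. A sketch of renaming by the current name instead, on a small made-up data.frame (for a data.frame, names() and colnames() return the same thing):

```r
df <- data.frame(id = 101:103, x = c(0, 1, 0))   # made-up data

# Find the column called "x" and rename it, wherever it is
names(df)[names(df) == "x"] <- "gender"
names(df)
## [1] "id"     "gender"

# Row names can be replaced in the same way and then used for subsetting
rownames(df) <- c("a", "b", "c")
df["b", "gender"]
## [1] 1
```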

6.2.2 Subset (extract parts of) the data.frame

dat_data.frame$gender
##  [1] 0 1 0 1 0 1 0 1 0 1
dat_data.frame[2]     # one column, still a data.frame
##    gender
## 1       0
## 2       1
## 3       0
## 4       1
## 5       0
## 6       1
## 7       0
## 8       1
## 9       0
## 10      1
dat_data.frame[,2]    # one column, simplified to a vector
##  [1] 0 1 0 1 0 1 0 1 0 1
dat_data.frame[2,]
##    id gender depression
## 2 102      1          1
dat_data.frame[2,2]
## [1] 1
dat_data.frame[1:3,c(1,3)]
##    id depression
## 1 101          1
## 2 102          1
## 3 103          2
# select a row by position (1) or by row name ("1")
dat_data.frame[1, "depression"]; dat_data.frame["1", "depression"]
## [1] 1
## [1] 1
dat_data.frame[c(2,4), "depression"];dat_data.frame[c("2","4"), "depression"]
## [1] 1 4
## [1] 1 4
dat_data.frame[dat_data.frame$gender == 0,]
##    id gender depression
## 1 101      0          1
## 3 103      0          2
## 5 105      0          3
## 7 107      0          3
## 9 109      0          4
dat_data.frame[dat_data.frame$depression > 2,]
##    id gender depression
## 4 104      1          4
## 5 105      0          3
## 6 106      1          3
## 7 107      0          3
## 9 109      0          4
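Conditions can be combined with & (and) and | (or). The subset() function returns the same rows with less typing and can also keep only some columns. A sketch on fixed, made-up data (not the random data above):

```r
dat <- data.frame(id = 101:106,
                  gender = c(0, 1, 0, 1, 0, 1),
                  depression = c(1, 4, 3, 3, 2, 4))   # made-up values

# Rows where gender is 1 AND depression is above 3
dat[dat$gender == 1 & dat$depression > 3, ]
##    id gender depression
## 2 102      1          4
## 6 106      1          4

# subset() does the same; select = keeps only the listed columns
subset(dat, gender == 1 & depression > 3, select = c(id, depression))
##    id depression
## 2 102          4
## 6 106          4
```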

6.3 Matrix

dat_byrow <- c(101, 1, 2,  102, 0, 5,  103, 1, 4,  104, 1, 1,  105, 0, 3,
               106, 0, 5,  107, 0, 2,  108, 1, 4,  109, 0, 2,  110, 1, 2)
mat <- matrix(dat_byrow, byrow = TRUE, nrow = 10,
              dimnames = list(0:9, c("id", "gender", "depression")))
mat
##    id gender depression
## 0 101      1          2
## 1 102      0          5
## 2 103      1          4
## 3 104      1          1
## 4 105      0          3
## 5 106      0          5
## 6 107      0          2
## 7 108      1          4
## 8 109      0          2
## 9 110      1          2
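Unlike a data.frame, every cell of a matrix has the same type, and matrix() fills values column by column unless byrow = TRUE. A small sketch:

```r
m <- matrix(1:6, nrow = 2)   # filled column by column by default
m
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
dim(m)                       # number of rows and columns
## [1] 2 3
m[2, 3]                      # row 2, column 3
## [1] 6
t(m)                         # transpose: rows become columns
df <- as.data.frame(m)       # convert to a data.frame
names(df)
## [1] "V1" "V2" "V3"
```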

7 Working directory

  • The working directory is the folder where R reads and writes files by default, i.e., when you give a file name without a full path.
    • getwd() gets the current working directory; setwd() sets it.
  • R and Windows write file paths differently: R uses “/” (or a doubled “\\”), whereas Windows uses “\”. Change “\” to “/” when you copy a path from Windows File Explorer into R.
  • You can use RStudio’s Files panel to:
    • Go To Working Directory
    • Set As Working Directory

7.1 Advice on the working directory

  • Use getwd() or RStudio to get and go to R’s working directory.
  • Create a new folder (directory) named with your name and school ID under this directory (e.g., qingyaozhang2024040).
  • Go into the qingyaozhang2024040 file folder.
  • Use setwd() or RStudio to set the new folder qingyaozhang2024040 as your working directory.
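The steps above can also be done in code. A minimal sketch, assuming you practise inside R's temporary directory (tempdir()) so that nothing permanent is created; the folder name is the example from the list above:

```r
old_wd <- getwd()   # remember the current working directory

# file.path() joins folder names with "/" on every system
new_dir <- file.path(tempdir(), "qingyaozhang2024040")
dir.create(new_dir, showWarnings = FALSE)   # no warning if it already exists

setwd(new_dir)      # make the new folder the working directory
basename(getwd())
## [1] "qingyaozhang2024040"

setwd(old_wd)       # go back to where we started
```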

8 Import and Export Data

8.1 Use base R

  • Import .csv data
    • Step 1: Enter the data in Excel or SPSS.
    • Step 2: Export (save) the data as a .csv file.
    • Step 3: Import the data into R with read.csv().
dat <- read.csv(file)  # file: the path to the .csv file

Read a .csv file from R’s working directory:

dat <- read.csv("depress.csv")
head(dat)

Read a .csv file from a directory other than R’s working directory by giving its full path, e.g.,

dat <- read.csv("C:/Users/Yao/Documents/useR/depress.csv")
head(dat)
  • Export the data using write.csv().
write.csv(
  x,
  file = "", 
  row.names = TRUE
)

Example:

write.csv(dat, file = "dat.csv", row.names = FALSE)  # don't save R's row names as an extra column
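A quick way to check an export is to read the file straight back. A sketch using a temporary file and made-up data; it also shows why row.names = FALSE is usually wanted:

```r
dat <- data.frame(id = 101:103, depression = c(2, 5, 4))   # made-up data
path <- tempfile(fileext = ".csv")   # a temporary file, removed when R exits

write.csv(dat, file = path, row.names = FALSE)   # export
dat2 <- read.csv(path)                            # import it again
all.equal(dat, dat2)
## [1] TRUE

# With the default row.names = TRUE, the row names come back as a column "X"
write.csv(dat, file = path)
names(read.csv(path))
## [1] "X"          "id"         "depression"
```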

8.2 Other file formats and packages

  • .xls and .xlsx files: readxl package, read_excel(), read_xls(), read_xlsx()
  • .sav files (SPSS): haven package, read_spss(), read_sav(), write_sav()
  • .dta files (Stata): haven package, read_stata(), read_dta(), write_dta()

11 Thanks