Tidyverse Study Notes

Tidyverse

Published

October 13, 2025

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tibble)

1 c_across()

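According to the dplyr documentation, c_across() differs from c() in two ways: (1) it uses tidy select semantics, which makes selecting several columns more convenient; (2) it uses vctrs::vec_c(), which gives safer output.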
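To illustrate the second point, here is a minimal sketch (not taken from the dplyr documentation) that contrasts base c() with vctrs::vec_c() on inputs of mixed type. It assumes the vctrs package is installed; the variable names are illustrative only.

```r
library(vctrs)

# Base c() silently coerces the number to a character string.
base_result <- c(1, "a")
print(base_result)

# vec_c() refuses to combine a double with a character and raises an error,
# which is captured here so the script keeps running.
vctrs_result <- tryCatch(
  vec_c(1, "a"),
  error = function(e) e
)
print(inherits(vctrs_result, "error"))
```

Because c_across() relies on vec_c(), combining a numeric column and a character column within one row should fail in the same way rather than quietly produce a character vector.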

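Tidy select offers only a small vocabulary; help("select") lists 15 entries. The operators are :, !, & (documented together with |) and c(). The helpers are everything(), last_col(), group_cols(), starts_with(), ends_with(), contains(), matches(), num_range(), all_of(), any_of() and where().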
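As a rough sketch of how a few of these selections can be used inside c_across(), the example below computes the same row means in four ways. The data frame demo and the output column names are hypothetical and chosen so the expected means (3 and 4) are easy to verify by hand.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: the row means of x1..x3 are 3 and 4.
demo <- tibble(id = c("a", "b"), x1 = c(1, 2), x2 = c(3, 4), x3 = c(5, 6))

by_helper <- demo |>
  rowwise() |>
  mutate(
    # range of adjacent columns
    m_range  = mean(c_across(x1:x3)),
    # columns whose names start with "x"
    m_prefix = mean(c_across(starts_with("x"))),
    # columns named in a character vector
    m_names  = mean(c_across(all_of(c("x1", "x2", "x3"))))
  ) |>
  ungroup()

# where() selects by a predicate; here it picks the numeric columns and skips id.
by_type <- demo |>
  rowwise() |>
  mutate(m_numeric = mean(c_across(where(is.numeric)))) |>
  ungroup()

print(by_helper)
print(by_type)
```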

The following example compares c() and c_across() on a data.frame. Comments in the code mark whether each result is correct.

set.seed(20250927)
toy_dat <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
head(toy_dat)
##           x1         x2         x3
## 1 -2.7734224 -0.6661558  1.0882648
## 2 -1.1618233 -0.9076756  0.1483481
## 3 -0.3773254  0.1182323  1.0507643
## 4  0.2743375 -0.1521034  0.7412800
## 5  0.6070295  0.7295347 -0.5049492
## 6 -0.3429219 -0.1650072  2.3721901
toy_dat |>
  rowwise() |> 
  mutate(
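    # correct: the three values of the current row, listed explicitly
    y1 = mean(c(x1, x2, x3)),
    # incorrect: base R's `:` steps by 1 from x1 toward x3 and ignores x2
    y2 = mean(x1:x3),
    # incorrect: c() merely wraps that same sequence
    y3 = mean(c(x1:x3)),
    # correct: c_across() reads x1:x3 as a tidy selection of columns
    y4 = mean(c_across(x1:x3))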
  ) |> 
  head()
## # A tibble: 6 × 7
## # Rowwise: 
##       x1     x2     x3     y1     y2     y3     y4
##    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 -2.77  -0.666  1.09  -0.784 -1.27  -1.27  -0.784
## 2 -1.16  -0.908  0.148 -0.640 -0.662 -0.662 -0.640
## 3 -0.377  0.118  1.05   0.264  0.123  0.123  0.264
## 4  0.274 -0.152  0.741  0.288  0.274  0.274  0.288
## 5  0.607  0.730 -0.505  0.277  0.107  0.107  0.277
## 6 -0.343 -0.165  2.37   0.621  0.657  0.657  0.621
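The y2 and y3 columns are wrong because, outside c_across(), x1:x3 is ordinary base R: the colon operator builds a numeric sequence rather than selecting columns. A small sketch using the first row's values from head(toy_dat) above makes this visible (the name colon_seq is illustrative).

```r
# Row 1 values copied from the head(toy_dat) output.
x1 <- -2.7734224
x2 <- -0.6661558
x3 <- 1.0882648

# Base R's colon operator: start at x1 and step by 1 without passing x3.
# x2 plays no part in the result.
colon_seq <- x1:x3
print(colon_seq)

# Mean of the sequence; corresponds to y2 and y3 for row 1 (about -1.27).
print(mean(colon_seq))

# Mean of the three actual values; corresponds to y1 and y4 (about -0.784).
print(mean(c(x1, x2, x3)))
```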

The next example repeats the comparison on a tibble. The results are identical to the data.frame case.

toy_dat <- as_tibble(toy_dat)
head(toy_dat)
## # A tibble: 6 × 3
##       x1     x2     x3
##    <dbl>  <dbl>  <dbl>
## 1 -2.77  -0.666  1.09 
## 2 -1.16  -0.908  0.148
## 3 -0.377  0.118  1.05 
## 4  0.274 -0.152  0.741
## 5  0.607  0.730 -0.505
## 6 -0.343 -0.165  2.37
toy_dat |>
  rowwise() |> 
  mutate(
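    # correct: the three values of the current row, listed explicitly
    y1 = mean(c(x1, x2, x3)),
    # incorrect: base R's `:` steps by 1 from x1 toward x3 and ignores x2
    y2 = mean(x1:x3),
    # incorrect: c() merely wraps that same sequence
    y3 = mean(c(x1:x3)),
    # correct: c_across() reads x1:x3 as a tidy selection of columns
    y4 = mean(c_across(x1:x3))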
  ) |> 
  head()
## # A tibble: 6 × 7
## # Rowwise: 
##       x1     x2     x3     y1     y2     y3     y4
##    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 -2.77  -0.666  1.09  -0.784 -1.27  -1.27  -0.784
## 2 -1.16  -0.908  0.148 -0.640 -0.662 -0.662 -0.640
## 3 -0.377  0.118  1.05   0.264  0.123  0.123  0.264
## 4  0.274 -0.152  0.741  0.288  0.274  0.274  0.288
## 5  0.607  0.730 -0.505  0.277  0.107  0.107  0.277
## 6 -0.343 -0.165  2.37   0.621  0.657  0.657  0.621

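As the results show, under rowwise() the tidy select syntax x1:x3 gives the correct result only inside c_across(). Outside it, x1:x3 is evaluated as base R's colon operator, which builds a sequence from x1 toward x3 in steps of 1 and ignores x2. Listing the columns explicitly with c(x1, x2, x3) also works, but it does not scale to many columns.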
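One way to confirm this conclusion for every row, rather than by reading the first six, is to rebuild the data with the same seed and compare the columns directly. This is only a sketch; the object name check is an assumption for illustration.

```r
library(dplyr)

# Recreate the toy data exactly as in the examples above.
set.seed(20250927)
toy_dat <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))

check <- toy_dat |>
  rowwise() |>
  mutate(
    y1 = mean(c(x1, x2, x3)),
    y2 = mean(x1:x3),
    y3 = mean(c(x1:x3)),
    y4 = mean(c_across(x1:x3))
  ) |>
  ungroup()

# Explicit c() and c_across() agree on every row.
print(isTRUE(all.equal(check$y1, check$y4)))
# The two colon-based versions agree with each other.
print(identical(check$y2, check$y3))
# Whether the colon-based version matches the true row means.
print(isTRUE(all.equal(check$y1, check$y2)))
```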