第7讲迭代

class: center, middle, inverse, title-slide

.title[
# 量化金融与金融编程
]
.subtitle[
## L07 迭代
]
.author[
### <br>曾永艺
]
.institute[
### 厦门大学管理学院
]
.date[
### <br>2022-11-18
]

---

class: middle, hide_logo
background-image: url(imgs/logo-purrr.png)
background-size: 25%
background-position: 15% 50%

<div>
<style type="text/css">.xaringan-extra-logo {
width: 50px;
height: 50px;
z-index: 0;
background-image: url(imgs/logo-sm.png);
background-size: contain;
background-repeat: no-repeat;
position: absolute;
top:0.5em;right:0.5em;
}
</style>
<script>(function () {
  let tries = 0
  function addLogo () {
    if (typeof slideshow === 'undefined') {
      tries += 1
      if (tries < 10) {
        setTimeout(addLogo, 100)
      }
    } else {
      document.querySelectorAll('.remark-slide-content:not(.title-slide):not(.inverse):not(.hide_logo)')
        .forEach(function (slide) {
          const logo = document.createElement('div')
          logo.classList = 'xaringan-extra-logo'
          logo.href = null
          slide.appendChild(logo)
        })
    }
  }
  document.addEventListener('DOMContentLoaded', addLogo)
})()</script>
</div>

.pull-right.font200[

1. 控制流结构

2. 函数式编程

3. `purrr` 包

4. `purrr` 包 与 列表列

5. `purrr` -> `furrr`

]

---
layout: false
class: inverse, center, middle

# 1. 控制流结构

.font150[( _control-flow constructs in R_ )]

---
class: middle

.font150.bold[

> 日出日落，月圆月缺，年尾年头，这是“.red[循环]”；  
> 
> 上学还是就业，单身还是结婚，丁克还是生娃，这是“.red[分支]”；  
> 
> 不管是循环还是分支，都嵌入在生老病死的时间轴上，这是“.red[顺序]”； 
> 
> 所谓尽人事听天命，想来就是心平气和地接受顺序结构，小心翼翼地制定循环结构，在关键时刻控制好分支结构，就这样度过一生罢。 
> <br><br>
>                       .right[—— 大鹏志，转引自《学R》]

]

---
layout: true

### >> 1.1 `for` 循环结构

---

.full-width[.content-box-blue.bold.font120[`for` 循环属于命令式编程（_imperative programming_）中的重复执行范式]]
--

.pull-left.code90[

```r
library(tidyverse)
set.seed(1234)
df <- tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10)
)
```
]

.pull-right.code90[

```r
# 计算各列的均值
mean(df$a)
mean(df$b)
mean(df$c)
```

```
#> [1] -0.383
#> [1] -0.118
#> [1] -0.388
```
]

.code90[

```r
# for循环的三个组成部分
output <- vector("double", ncol(df))  # 1. output    输出
for(i in seq_along(df)) {             # 2. sequence  循环序列
  output[[i]] <- mean(df[[i]])        # 3. body      循环体
}
output
```

```
#> [1] -0.383 -0.118 -0.388
```
]

---

.full-width[.content-box-blue.bold.font120[`for` 循环的三种模式]]

- `for(x in xs)`：逐个元素循环，当你只关注副作用（如作图、存盘）时，这是最有用的模式
  
--

- `for(nm in names(xs))`：逐个名字循环，在循环体中用 `xs[[nm]]` 得到命名向量 `xs` 元素的值。你还可以让输出的名字和输入的名字对应起来：
  
> .code80[

```r
output <- vector("list", length(xs))
*names(output) <- names(xs)
for(nm in names(xs)) {
  output[[nm]] <- .f(xs[[nm]], ...)
}
```
]

- `for(i in seq_along(xs))`：逐个数值索引循环，这是最通用的模式。下面语句会给出元素的名字和取值：
  
> .code80[

```r
for(i in seq_along(xs)) {
  name <- names(xs)[[i]]
  value <- xs[[i]]
  # ...
}
```
]

---

.full-width[.content-box-blue.bold.font120[特殊情况：_“就地”_修改]]

.code100[

```r
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
```

```r
# output（输出）已备
# 在body（循环体）中直接调用`[[<-`完成“就地”修改 x           #]#]
for(i in seq_along(df)) {
* df[[i]] <- rescale01(df[[i]])  # 不要使用[]
}
df
```

```
#> # A tibble: 10 × 3
#>       a      b     c
#>   <dbl>  <dbl> <dbl>
#> 1 0.332 0.153  0.782
#> 2 0.765 0      0.473
#> 3 1     0.0651 0.498
#> # … with 7 more rows
```
]

---

.full-width[.content-box-blue.bold.font120[特殊情况：事前无法确定_输出_的长度]]

.pull-left.code90[

```r
# **低效的做法**
set.seed(1234)
means <- c(0, 1, 2)
*out <- double()  # 空的实数向量
for(i in seq_along(means)) {
  n <- sample(100, 1)
  print(glue::glue("L#{i}: n={n}"))
* out <- c(out,
*          rnorm(n, means[[i]]))
}
```

```
#> L#1: n=28
#> L#2: n=79
#> L#3: n=2
```

```r
str(out, vec.len = 2.5)
```

```
#>  num [1:109] 0.312 0.314 0.359 ...
```
]

.pull-right.code90[

```r
# **更好的做法**
# 使用更复杂的数据结构，完成循环后再处理
set.seed(1234)
means <- c(0, 1, 2)
*out <- vector("list", length(means))
for(i in seq_along(means)) {
  n <- sample(100, 1)
  print(glue::glue("L#{i}: n={n}"))
* out[[i]] <- rnorm(n, means[[i]])
}
*out <- unlist(out)
# purrr::flatten_dbl(out)
str(out, vec.len = 2.5)
```

```
#> L#1: n=28
#> L#2: n=79
#> L#3: n=2
#>  num [1:109] 0.312 0.314 0.359 ...
```

]

---
layout: false

### >> .gray[1.1 `for` 循环结构 +] 1.2 `if` 分支结构

.full-width[.content-box-blue.bold.font120[特殊情况：事前无法确定_循环_的次数 -> `while()` / `repeat` + `break`]]

.code90[

```r
flip <- function() sample(c("T", "H"), 1)
```
]

.pull-left.code90[

```r
set.seed(111)
nheads <- 0
flips <- character()
*while(nheads < 3) {
  H_T <- flip()
* if(H_T == "H") {
    nheads <- nheads + 1
* } else {
    nheads <- 0  # 重新计数
  }
  flips <- c(flips, H_T)
}
flips
```

```
#>  [1] "H" "T" "H" "T" "T" "T" "T"
#>  [8] "T" "T" "H" "H" "H"
```
]

.pull-right.code90[

```r
set.seed(111)
nheads <- 0
flips <- character()
*repeat {
  H_T <- flip()
  if(H_T == "H") {
    nheads <- nheads + 1
  } else {
    nheads <- 0  # 重新计数
  }
  flips <- c(flips, H_T)
* if(nheads >= 3) break
}
flips
```

```
#>  [1] "H" "T" "H" "T" "T" "T" "T"
#>  [8] "T" "T" "H" "H" "H"
```
]

---
layout: false
class: inverse, center, middle

# 2. 函数式编程

.font150[( _functional programming_ )]

---
layout: false

### >> 2.1 R 语言与函数式编程

.full-width[.content-box-blue.bold.font120[R 语言的核心其实是一种函数式编程（_functional programming_）语言，可通过提供向量化函数和函数式编程工具，避免直接调用 `for` 循环，减少代码重复并提高代码的可读性]]

.pull-left.code80[

```r
# 定义函数，计算数据框每列的均值
col_mean <- function(df) {
  out <- vector("double", length(df))
  for(i in seq_along(df)) {
    out[i] <- mean(df[[i]])
  }
  out
}
# 调用函数
col_mean(df)
```

```
#> [1] 0.572 0.258 0.524
```

.font110.bold[🤔 该如何一般化 `col_`.red[`mean`]`()`？]

.font110.bold[🤩 定义 `col_median()`、`col_sd()` ...？]

.font110.bold[🙅 NOPE! 更好的做法是 -> ]
]

.pull-right.code85[

```r
# ... 使用函数式编程
# 函数作为参数 -> 泛函
*col_summary <- function(df, .fun) {
  out <- vector("double", length(df))
  for(i in seq_along(df)) {
*   out[i] <- .fun(df[[i]])
  }
  out
}
```

```r
col_summary(df, mean)
```

```
#> [1] 0.572 0.258 0.524
```

```r
col_summary(df, sd)
```

```
#> [1] 0.290 0.313 0.329
```
]

---
layout: false

### >> 2.2 `apply` 族函数 & `Reduce()` 等高阶函数

.font95[
- `apply(X, MARGIN, FUN, ...)`    
  `sweep(x, MARGIN, STATS, FUN = "-", check.margin = TRUE, ...)`
  
- `lapply(X, FUN, ...)`    
  `sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)`    
  `vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)`    
  `rapply(object, f, classes = "ANY", deflt = NULL, how = c("unlist", "replace", "list"), ...)`   
  `replicate(n, expr, simplify = "array")`    
  `mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)`    
  `eapply(env, FUN, ..., all.names = FALSE, USE.NAMES = TRUE)` 
  
- `tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)`    
  `by(data, INDICES, FUN, ..., simplify = TRUE)`    
  `aggregate(x, by, FUN, ..., simplify = TRUE, drop = TRUE)` ...
  
- `Reduce(f, x, init, right = FALSE, accumulate = FALSE)`    
  `Filter(f, x)`    
  `Find(f, x, right = FALSE, nomatch = NULL)`    
  `Map(f, ...)`    
  `Position(f, x, right = FALSE, nomatch = NA_integer_)`
]

---
layout: false
class: inverse, center, middle

# 3. `purrr` 包 <sup>.font60[v0.3.5]</sup>

.font150[( _Functional Programming Tools_ )]

---
layout: true

### >> 3.1 `purrr` 包的 `map` 族函数

---

.full-width[.content-box-blue.bold.font120[`map` 族函数：_"Learn it once, use it everywhere!" - Jenny Bryan_]]

.font100[
- **参数统一**：`map()`、`map_*()`、`modify()` 和 `walk()` 函数的第1个参数 `.x` 为输入向量（包括原子向量和列表），第2个参数 `.f` 为函数，第3个参数 `...` 为传递给 `.f` 的额外参数。
]

.font100[
- **返回结果一目了然**：`map()`、`map_*()`、`modify()` 和 `walk()` 的返回结果均为与输入向量等长的向量，`map()` 返回列表，`map_*()` 返回向量的类由函数名中的后缀决定，`modify()` 返回与输入向量同类的输出，而 `walk()` 则不可见地返回输入向量。
]

.font100[
- **一通百通**：输入扩展至2个向量（如 `map2(.x, .y, .f, ...)`）和多个向量构成的列表 `.l`（如 `pmap(.l, .f, ...)`），还可同时应用于向量元素和索引（如 `imap(.x, .f, ...)`）。
]

---

.pull-left[

```r
map(df, mean)         # 返回列表
```

```
#> $a
#> [1] 0.572
#> 
#> $b
#> [1] 0.258
#> 
#> $c
#> [1] 0.524
```

```r
map_dbl(df, mean)       # 返回实数向量
# df %>% map_dbl(mean)  # 支持管道操作
```

```
#>     a     b     c 
#> 0.572 0.258 0.524
```

]

.pull-right[

.full-width[.content-box-blue.bold.font120[和 `for` 循环比]]

.font100[

- 用函数式编程函数来完成循环，更加简洁

- 关注完成运算的函数（如 `mean`），而非准备性步骤（如定义输出 `output`）

- 适合用 `%>%` 将不同函数链接起来解决问题

- -> 代码更加易读、易写、易用

]

<br>

.full-width[.content-box-blue.bold.font120[和自行编写的 `col_summary()` 比]]

.font100[

- 后台用 C 编写，效率更高

- 参数 `.f` 支持函数、公式、字符|整数向量

- 结果保留元素的名称

]

.font150.center[👍 ❤️ ✌️]
]

---

.full-width[.content-box-blue.bold.font120[`map` 族函数的参数 `.f` 支持快捷写法]]

.pull-left.code90[

```r
# 匿名函数
models <- mtcars %>% 
  split(.$cyl) %>%  # 得到3个命名列表
* map(function(df)
*       lm(mpg ~ wt, data = df))
# map(\(df) lm(mpg ~ wt, data = df))

# 将匿名函数改写为单侧公式
models <- mtcars %>% 
  split(.$cyl) %>% 
* map(~ lm(mpg ~ wt, data = .x))

str(models, max.level = 1)
```

```
#> List of 3
#>  $ 4:List of 12
#>   ..- attr(*, "class")= chr "lm"
#>  $ 6:List of 12
#>   ..- attr(*, "class")= chr "lm"
#>  $ 8:List of 12
#>   ..- attr(*, "class")= chr "lm"
```
]

.pull-right.code100[

```r
# 单侧公式
models %>% 
  map(summary) %>% 
* map_dbl(~ .x$r.squared)
```

```
#>     4     6     8 
#> 0.509 0.465 0.423
```

```r
# 直接使用 字符向量 提取元素
# 结果同上，从略
models %>% 
  map(summary) %>% 
* map_dbl("r.squared")

# 若你知道元素的具体位置，也可以直接用
# 整数提取元素，但不推荐
```
]

---

.full-width[.content-box-blue.bold.font120[多个输入：.red[`map2(.x, .y, .f, ...)`]、`pmap()` 和 `imap()`]]

.pull-left.code90[

```r
# 1个输入用map()
mu <- list(5, 10, -3)
mu %>% map(rnorm, n = 5)
```

```r
# 2个输入呢？
mu <- list(5, 10, -3)
*sigma <- list(1, 5, 10)
```

```r
# 坚持用map()！
set.seed(1234)
seq_along(mu) %>% 
  map(~ rnorm(5, mu[[.]], 
              sigma[[.]])) %>% 
  str()
```

```
#> List of 3
#>  $ : num [1:5] 3.79 5.28 6.08 2.65 5.43
#>  $ : num [1:5] 12.53 7.13 7.27 7.18 5.55
#>  $ : num [1:5] -7.77 -12.98 -10.76 -2.36 6.59
```

]

.pull-right.code90[

```r
# 还是改用map2()吧，:)
set.seed(1234)
map2(mu, sigma, rnorm, n = 5) %>% 
  str() # 结果相同
```

```
#> List of 3
#>  $ : num [1:5] 3.79 5.28 6.08 2.65 5.43
#>  $ : num [1:5] 12.53 7.13 7.27 7.18 5.55
#>  $ : num [1:5] -7.77 -12.98 -10.76 -2.36 6.59
```

]

---

.full-width[.content-box-blue.bold.font120[多个输入：`map2(.x, .y, .f, ...)`、.red[`pmap(.l, .f, ...)`] 和 `imap()`]]

.pull-left.code110[

```r
# 我还想让抽样样本数n也有所不同！
n <- list(1, 2, 3)
```

```r
# 默认为位置匹配
args1 <- list(n, mu, sigma)
args1 %>%
  pmap(rnorm) %>% str()
```

```
#> List of 3
#>  $ : num 4.03
#>  $ : num [1:2] 4.46 3.74
#>  $ : num [1:3] -8.24 -7.97 -21.06
```

]

.pull-right.code100[

```r
# 使用命名参数列表，匹配.f函数的参数名
# 也可用数据框作为.l参数的取值
args2 <- list(mean = mu, 
              sd = sigma, 
              n = n)
args2 %>% 
  pmap(rnorm) %>% str()
```

]

---

.full-width[.content-box-blue.bold.font120[多个输入：`map2()`、`pmap()` 和 .red[`imap(.x, .f, ...)`]]]

.pull-left.font120[
`imap()`为 _indexed map_ 函数

- 当输入向量 `.x` 的元素有名称时，它是 `map2(.x, names(.x), ...)` 的简便写法

- 当输入向量 `.x` 的元素没有名称时，它 `map2(.x, seq_along(.x), ...)` 的简便写法

- 在你需要同时基于 `.x` 的取值和名称/索引进行计算时，`imap(.x, .f, ...)` 很有用
]

.pull-right.code85[

```r
# 元素没有名称，.y 为元素位置
imap_chr(sample(LETTERS[1:4]), 
         ~ paste0(.y, " -> ", .x))
```

```
#> [1] "1 -> B" "2 -> D" "3 -> C" "4 -> A"
```

```r
# 元素有名称，.y 为元素名
lst <- map(1:4, ~ sample(1000, 10))
names(lst) <- paste0("#", 1:4)
imap_chr(
  lst, 
  ~ glue::glue(
    "样本{.y} 的最大值为 {max(.x)}")
)
```

```
#>                      #1 
#> "样本#1 的最大值为 962" 
#>                      #2 
#> "样本#2 的最大值为 976" 
#>                      #3 
#> "样本#3 的最大值为 942" 
#>                      #4 
#> "样本#4 的最大值为 877"
```
]

---

.full-width[.content-box-blue.bold.font120[不同输出：.red[`modify(.x, .f, ...)`] 和 `walk(.x, .f, ...)`]]

.pull-left.code100[

.full-width[.content-box-blue.bold[`modify()`、`modify2()` 和 `imodify()` 总是返回与输入向量 `.x` 的类 `class` 相同的向量]]

```r
df <- data.frame(
  x = 1:3,
  y = 6:4
)

modify(df, ~ .x * 2)
```

```
#>   x  y
#> 1 2 12
#> 2 4 10
#> 3 6  8
```

]

.pull-right.code100[

.full-width[.content-box-blue.bold[但尽管 `modify()`、`modify2()` 和 `imodify()` 函数名中含有 `modify`，但它们并不会“原地修改”输入向量 `.x`，而只是返回修改后的版本——如果你想永久保留修改，就你必须手动将返回结果赋值给变量]]

```r
df
```

```
#>   x y
#> 1 1 6
#> 2 2 5
#> 3 3 4
```

```r
df <- modify(df, ~ .x * 2)
```

]

---

.full-width[.content-box-blue.bold.font120[不同输出：`modify(.x, .f, ...)` 和 .red[`walk(.x, .f, ...)`]]]

.pull-left.code85[

.full-width[.content-box-blue.bold[`walk()`、`walk2()`、`iwalk()` 和 `pwalk()`：调用函数不是为了函数的返回值，而是函数的“副作用”（如数据存盘）；这些函数都会不可见地返回第 1 个输入项]]

```r
# ggsave(filename, plot=last_plot(), 
#   device = NULL, path = NULL, ...)

tmp <- tempdir()
gs <- mtcars %>% 
  split(.$cyl) %>% 
  map(~ ggplot(., aes(wt, mpg)) + 
        geom_point())
fs <- str_c("cyl-", names(gs), ".pdf")

walk2(fs, gs, ggsave, path = tmp)
list.files(tmp, pattern = "^cyl-")
```

```
#> [1] "cyl-4.pdf" "cyl-6.pdf" "cyl-8.pdf"
```

]

.pull-right.code80[

```r
# 参数位置匹配并使用 ... 来传递参数
# 更推荐的做法是：
walk2(
  fs, gs, 
  \(fs, gs) ggsave(fs, gs, path = tmp)
)
```

]

---

.full-width[.content-box-blue.bold.font110[`map_dfr(.x, .f, ..., .id = NULL)` 和 `map_dfc(.x, .f, ...)`]]

.font110[
- 应用 `dplyr` 包的 `bind_rows()` 或 `bind_cols()` 对 `map(.x, .f, ...)` 输出的向量进行行/列的合并，返回数据框（感兴趣的同学可看下这两个函数的源代码）。
]

.full-width[.content-box-blue.bold.font110[`lmap(.x, .f, ...)`、`lmap_if(.x, .p, .f, ..., .else = NULL)` 和 `lmap_at(.x, .at, .f, ...)`]]

.font110[
- 类似于 `map*()` 函数，但只适用于输入并返回 `list` 或 `data.frame` 的函数 `.f`，也就是说 `.f` 应用于 `.x` 的**列表元素**而非元素（即 `.x[i]`，而非 `.x[[i]]`）。
]

.full-width[.content-box-blue.bold.font110[`map_if(.x, .p, .f, ..., .else = NULL)`、`map_at(.x, .at, .f, ...)` 和 `map_depth(.x, .depth, .f, ..., .ragged = FALSE)`]]

.font110[
- `map_if()` 将 `.f`（`.else`）应用于 `.x` 中断言函数 `.p` 取值为 `TRUE`（`FALSE`）的元素；
- `map_at()` 将 `.f` 应用于 `.x` 中 `.at` 参数（名称或位置向量）所指定的元素；
- `map_depth()` 将 `.f` 应用于嵌套向量 `.x` 中 `.depth` 参数所指定深度的元素；
- `modify()` 和 `lmap()` 也有这类**条件应用**的变体函数。
]

---
layout: true

### >> 3.2 其它 `purrr` 包函数

---

.full-width[.content-box-blue.bold.font110[`reduce(.x, .f, ..., .init, .dir = c("forward", "backward"))` 和 `accumulate(.x, .f, ..., .init, .dir = c("forward", "backward"))`]]

.pull-left.code85[

```r
dfs <- list(
  age = tibble(name = "Jo", age = 30),
  sex = tibble(name = c("Jo", "An"), 
               sex = c("M", "F")),
  trt = tibble(name = "An", 
               treatment = "A")
)
dfs %>% reduce(full_join)
```

```
#> # A tibble: 2 × 4
#>   name    age sex   treatment
#>   <chr> <dbl> <chr> <chr>    
#> 1 Jo       30 M     <NA>     
#> 2 An       NA F     A
```

```r
1:10 %>% accumulate(`+`)
```

```
#>  [1]  1  3  6 10 15 21 28 36 45 55
```
]

.pull-right.font100[

- `reduce()` 和 `accumulate()` 支持的是二元函数，即有两个输入项的函数（及运算符）
- 还有`reduce2()` 和 `accumulate2()`函数

]

---

.full-width[.content-box-blue.bold.font120[支持断言函数（_predicate functions_）的泛函]]

.pull-left.code85[

```r
# keep(.x, .p, ...) | discard()
iris %>% 
  keep(is.numeric) %>% 
  str(vec.len = 1)
```

```
#> 'data.frame':	150 obs. of  4 variables:
#>  $ Sepal.Length: num  5.1 4.9 ...
#>  $ Sepal.Width : num  3.5 3 ...
#>  $ Petal.Length: num  1.4 1.4 ...
#>  $ Petal.Width : num  0.2 0.2 ...
```

```r
# compact(.x, .p = identity)
# remove all NULLs
```

```r
# every(.x, .p, ...) | some()
list(1:5, letters) %>% 
  some(is_character)
```

```
#> [1] TRUE
```
]
--

.pull-right.code85[

```r
set.seed(1234)

(x <- sample(9))
```

```
#> [1] 6 5 4 1 8 2 7 9 3
```

```r
# detect(.x, .f, ..., 
#   .dir = c("forward", "backward"), 
#   .right = NULL,.default = NULL)
# detect_index()
x %>% detect(~ . > 2)
```

```
#> [1] 6
```

```r
# head_while(.x, .p, ...)|tail_while()
x %>% head_while(~ . > 2)
```

```
#> [1] 6 5 4
```
]

---

.full-width[.content-box-blue.bold.font110[`safely(.f, otherwise = NULL, quiet = TRUE)`、`quietly()` 和 `possibly()`]]

- 这些函数就是所谓的函数运算符（_function operator_）——以函数作为参数输入，并返回函数。
- `safely()` 会调整函数的行为（副词），让调整后的函数不再抛出错误，而是返回包含两个元素 `result` 和 `error` 的列表；通过 `otherwise` 参数设定错误时默认值，`possibly()` 总是成功；而 `quietly()` 则会捕捉命令的结果、输出、警告和消息。

.pull-left.code70[

```r
x <- list(1, 10, "a")
y <- x %>% map(safely(log)); str(y)
```

```
#> List of 3
#>  $ :List of 2
#>   ..$ result: num 0
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: num 2.3
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: NULL
#>   ..$ error :List of 2
#>   .. ..$ message: chr "non-numeric argument to mathematical function"
#>   .. ..$ call   : language .Primitive("log")(x, base)
#>   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
```

]

.pull-right.code70[

```r
y <- transpose(y); str(y, give.attr = FALSE)
```

```
#> List of 2
#>  $ result:List of 3
#>   ..$ : num 0
#>   ..$ : num 2.3
#>   ..$ : NULL
#>  $ error :List of 3
#>   ..$ : NULL
#>   ..$ : NULL
#>   ..$ :List of 2
#>   .. ..$ message: chr "non-numeric argument to mathematical function"
#>   .. ..$ call   : language .Primitive("log")(x, base)
```

```r
is_ok <- y$error %>% map_lgl(is_null)
y$result[is_ok] %>% flatten_dbl()
```

```
#> [1] 0.0 2.3
```
]

---

.full-width[.content-box-blue.bold.font120[`pluck(.x, ..., .default = NULL)` 和 `chuck(.x, ...)`：用名称或位置列表来选择 `.x` 中的一个元素或其属性（+ `attr_getter()`）]]

.full-width[.content-box-blue.bold.font120[`flatten(.x)`、`flatten_*(.x)` 和 `transpose(.l, .names = NULL)`：改变列表的形状]]

.full-width[.content-box-blue.bold.font120[`append(x, values, after = length(x))`、`prepend(x, values, before = NULL)` 和 `splice(...)`：将新元素 `values` 和列表 `x` 合并]]

.full-width[.content-box-blue.bold.font120[`list_modify(.x, ...)` 和 `list_merge(.x, ...)`：根据 `...` 的取值修改/合并 `.x`]]

.full-width[.content-box-blue.bold.font120[`set_names(x, nm = x, ...)`：以更灵活的方式设置列表 `x` 的元素名]]

---

.full-width[.content-box-blue.bold.font120[`as_vector(.x, .type = NULL)`、`simplify(.x, .type = NULL)` 和 `simplify_all(.x, .type = NULL)`：将列表 `.x` 转化为向量]]

.full-width[.content-box-blue.bold.font120[`array_branch(array, margin = NULL)` 和 `array_branch(array, margin = NULL)`：将数组转化为列表]]

.full-width[.content-box-blue.bold.font120[`has_element(.x, .y)`：列表 `.x` 是否包含元素 `.y`]]

.full-width[.content-box-blue.bold.font120[`vec_depth(x)`：计算向量 `x` 的深度]]

.full-width[.content-box-blue.bold.font120[`cross(.l, .filter = NULL)` 和 `cross*()`：生成列表元素的组合]]

.full-width[.content-box-blue.bold.font120[`auto_browse()`、`insistently()`、`slowly()`、`compose()`、`partial()`、`lift*()`、`rerun()`、`negate()`：_函数运算符_，修改函数的行为]]

---
layout: false
class: inverse, center, middle

# 4. `purrr` 包 与 列表列

.font150[( `purrr` _and list columns_ )]

---

.bold.font120[

- 将结果存入 .font110[列表列] 中，可直接提取结果并加以分析（而无需重新计算结果）

- 你已经知道如何操作数据表，你可以将你掌握的知识/流程/工具直接应用于由结果构成的表格

]

---

### >> 4.1 步骤1：生成列表列（list column）

.pull-left.code95[

```r
gapminder::gapminder
```

```
#> # A tibble: 1,704 × 6
#>   country     conti…¹  year lifeExp
#>   <fct>       <fct>   <int>   <dbl>
#> 1 Afghanistan Asia     1952    28.8
#> 2 Afghanistan Asia     1957    30.3
#> 3 Afghanistan Asia     1962    32.0
#> 4 Afghanistan Asia     1967    34.0
#> 5 Afghanistan Asia     1972    36.1
#> 6 Afghanistan Asia     1977    38.4
#> # … with 1,698 more rows, 2 more
#> #   variables: pop <int>,
#> #   gdpPercap <dbl>, and
#> #   abbreviated variable name
#> #   ¹continent
```

]

.pull-right.code90[

```r
by_cnty <- gapminder::gapminder %>% 
* tidyr::nest(
*   data = -c(country, continent))
by_cnty
```

```
*#> # A tibble: 142 × 3
#>   country     continent data             
*#>   <fct>       <fct>     <list>           
#> 1 Afghanistan Asia      <tibble [12 × 4]>
#> 2 Albania     Europe    <tibble [12 × 4]>
#> 3 Algeria     Africa    <tibble [12 × 4]>
#> # … with 139 more rows
```

]

---

### >> 4.2 步骤2：处理列表列

.pull-left.code90[

```r
# 将线性回归模型lm应用于data列的每个元素，
# 回归结果（列表）存为新的列表列model
by_cnty <- by_cnty %>% 
* mutate(
*   model = map(
      data, 
      ~ lm(lifeExp ~ year, 
           data = .x)
    )
  )
by_cnty
```

```
#> # A tibble: 142 × 4
#>   country     continent data              model 
#>   <fct>       <fct>     <list>            <list>
#> 1 Afghanistan Asia      <tibble [12 × 4]> <lm>  
#> 2 Albania     Europe    <tibble [12 × 4]> <lm>  
#> 3 Algeria     Africa    <tibble [12 × 4]> <lm>  
#> 4 Angola      Africa    <tibble [12 × 4]> <lm>  
#> 5 Argentina   Americas  <tibble [12 × 4]> <lm>  
#> 6 Australia   Oceania   <tibble [12 × 4]> <lm>  
#> # … with 136 more rows
```

]

.pull-right.code100[

```r
# 提取model列表列的第1个元素
by_cnty %>% pluck("model", 1)
# by_cnty$model[[1]]
```

```
#> 
#> Call:
#> lm(formula = lifeExp ~ year, data = .x)
#> 
#> Coefficients:
#> (Intercept)         year  
#>    -507.534        0.275
```

]

---

### >> 4.3 步骤3：简化列表列

.pull-left.code95[

```r
# 还是用mutate + map_*提取信息
by_cnty %>% 
  mutate(
    coef_year = map_dbl(
      model, ~ coef(.x)[["year"]]
    )
  ) %>% 
  select(-data, -model)
```

```
#> # A tibble: 142 × 3
#>   country     continent coef_year
#>   <fct>       <fct>         <dbl>
#> 1 Afghanistan Asia          0.275
#> 2 Albania     Europe        0.335
#> 3 Algeria     Africa        0.569
#> 4 Angola      Africa        0.209
#> 5 Argentina   Americas      0.232
#> 6 Australia   Oceania       0.228
#> # … with 136 more rows
```

]

.pull-right.code90[

```r
# 使用broom包，更强大、也更方便
# glance() | tidy() | augment()
by_cnty %>%
* mutate(
    res = map(model, 
*             broom::glance)) %>%
* tidyr::unnest(res) %>%
  select(-c(data, model))
```

```
#> # A tibble: 142 × 14
#>   country  conti…¹ r.squ…² adj.r…³ sigma stati…⁴  p.value    df logLik   AIC   BIC devia…⁵
#>   <fct>    <fct>     <dbl>   <dbl> <dbl>   <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>
#> 1 Afghani… Asia      0.948   0.942 1.22    181.  9.84e- 8     1 -18.3  42.7  44.1   15.0  
#> 2 Albania  Europe    0.911   0.902 1.98    102.  1.46e- 6     1 -24.1  54.3  55.8   39.3  
#> 3 Algeria  Africa    0.985   0.984 1.32    662.  1.81e-10     1 -19.3  44.6  46.0   17.5  
#> 4 Angola   Africa    0.888   0.877 1.41     79.1 4.59e- 6     1 -20.0  46.1  47.5   19.8  
#> 5 Argenti… Americ…   0.996   0.995 0.292  2246.  4.22e-13     1  -1.17  8.35  9.80   0.854
#> 6 Austral… Oceania   0.980   0.978 0.621   481.  8.67e-10     1 -10.2  26.4  27.9    3.85 
#> # … with 136 more rows, 2 more variables: df.residual <int>, nobs <int>, and abbreviated
#> #   variable names ¹continent, ²r.squared, ³adj.r.squared, ⁴statistic, ⁵deviance
```

]

---

### >> 4.4 [进一步的（探索性）分析]

.code100[

```r
by_cnty %>% 
  mutate(res = map(model, broom::glance)) %>% 
  unnest(res) %>% 
* ggplot(aes(continent, r.squared, colour = continent)) +
*   geom_jitter(width = 0.3) +
*   theme(legend.position = "none")
```

]

---
layout: false
class: inverse, center, middle
background-image: url(imgs/sxc.png), url(imgs/purrr-furrr.png)
background-size: 100%, 40%
background-position: 0% 100%, 50% 45%

# &nbsp;&nbsp;&nbsp; .font50[( `purrr` -> `furrr` <sup>.font60[v0.3.1]</sup> )]

---

### >> `furrr`: _Apply Mapping Functions in Parallel using Futures_

.pull-left.code85[

```r
library(tictoc) # for timing R scripts

by_cnty <- gapminder::gapminder %>% 
  tidyr::nest(
    data = -c(country, continent))

slow_lm <- function(...) {
  Sys.sleep(0.1)
  lm(...)
}

tic()
by_cnty %>% 
  mutate(
    model = map(
      data, 
      ~ slow_lm(lifeExp ~ year, 
                data = .x))
  ) -> gc1
toc()
```

```
#> 15.55 sec elapsed
```
]

.pull-right.code90[

```r
*library(furrr)

*plan(multisession, workers = 4)

tic()
by_cnty %>% 
  mutate(
*   model = future_map(
      data, 
      ~ slow_lm(lifeExp ~ year, 
                data = .x)
    )
  ) -> gc2
toc()
```

```
#> 8.69 sec elapsed
```

```r
identical(gc1, gc2)
```

```
#> [1] FALSE
```
]

---
layout: false
class: inverse, center, middle

# 课后作业

---
class: middle

### 课后作业

.font120[

1\. 根据第七讲课程讲义的打印稿，在 📑 _Rmd_中键入并完成代码的运行

2\. **复习** 📖 [_R for Data Science_](https://r4ds.had.co.nz/) 一书的第21章和第25章的内容（为中文翻译版的第16章和第19章，其中第19章被大量删减），并（结队）**完成（自选）课后练习**。

3\. 下载（打印） 📰 [**{{`purrr` 包的cheatsheet}}**](https://posit.co/wp-content/uploads/2022/10/purrr.pdf) 并阅读之

4\. **完成第七讲的课后练习** 👩‍💻

> .code90[

```r
remotes::install_github("qfwr2022/qfwr")
library(qfwr)
qfwr_ex("L07")
```
]

5\. 抽出时间来**自学** 📖 [_R for Data Science_](https://r4ds.had.co.nz/) 一书关于 [字符串](https://r4ds.had.co.nz/strings.html)、[因子](https://r4ds.had.co.nz/factors.html) 和 [日期-时间](https://r4ds.had.co.nz/dates-and-times.html) 等三种重要数据类型以及重要数据处理函数的内容（为中文翻译版的第10章、第11章和第12章）。我也会在课程平台上挂出对应的 [{{课件}}](https://qfwr2022.netlify.app/weeks/week-x.html) 供同学们参考。

]

---
class: center middle
background-image: url(imgs/xaringan.png)
background-size: 12%
background-position: 50% 40%

<br><br><br><br><br><br><br>
<hr color='#f00' size='2px' width='80%'>
<br>
.Large.red[_**本网页版讲义的制作由 R 包 [{{`xaringan`}}](https://github.com/yihui/xaringan) 赋能！**_]