class: center, middle, inverse, title-slide .title[ # 量化金融与金融编程 ] .subtitle[ ##
L11 制表 ] .author[ ###
曾永艺 ] .institute[ ### 厦门大学管理学院 ] .date[ ###
2022-12-05 ] --- class: middle, hide_logo background-image: url(imgs/tabulation.png) background-size: 15em background-position: 90% 50%
-- - ### .font120[
] .bold[制表难题] -- - ### .bold[`gt` 包] - .bold.font110[_The Grammar of Tabulation_] - .bold.font110[创建 `gt` 表格对象] - .bold.font110[增加 / 修改 / 移除 表格组件] - .bold.font110[操作表格的 行 / 列] - .bold.font110[数据替换、数据格式化、数据变图] -- - ### .bold[`modelsummary` 包] - .bold.font110[`datasummary()` & `datasummary_*()`] - .bold.font110[`modelsummary()` & `modelplot()`] -- - ### .bold[`kableExtra` 包 and beyond ...] --- class: middle .pull-left[ .font500.red[难题] ] -- .pull-right.font200[ - 各式各样的表格 - 便捷 vs. 美观 - 可复现 vs. 交互式 - 多种格式 / 媒介输出 - ... ] --- class: inverse, center, middle background-image: url(imgs/logo-gt.svg), url(imgs/sxc.png) background-size: 10%, 100% background-position: 30% 40%, 0% 100% # 1. [`gt`](https://gt.rstudio.com/) <sup>.font60[v0.8.0]</sup> .font120[(Easily Create Presentation-Ready Display Tables)] --- layout: true ### .bold[1.1 The .red[G]rammar of .red[T]abulation] --- .font150[表格组件] <img src="imgs/gt_parts_of_a_table.svg" width="75%" style="display: block; margin: auto;" /> --- .font150[制表流程] <br> <img src="imgs/gt_workflow_diagram.svg" width="100%" style="display: block; margin: auto;" /> --- layout: true ### .bold[1.2 .red[创建] gt 表格对象] --- ```r library(tidyverse) library(gt) ``` -- ```r gtcars ``` ``` #> # A tibble: 47 × 15 #> mfr model year trim bdy_s…¹ hp hp_rpm trq trq_rpm mpg_c mpg_h drive…² trsmn #> <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> #> 1 Ford GT 2017 Base… coupe 647 6250 550 5900 11 18 rwd 7a #> 2 Ferrari 458 Sp… 2015 Base… coupe 597 9000 398 6000 13 17 rwd 7a #> 3 Ferrari 458 Sp… 2015 Base conver… 562 9000 398 6000 13 17 rwd 7a #> 4 Ferrari 458 It… 2014 Base… coupe 562 9000 398 6000 13 17 rwd 7a #> 5 Ferrari 488 GTB 2016 Base… coupe 661 8000 561 3000 15 22 rwd 7a #> 6 Ferrari Califo… 2015 Base… conver… 553 7500 557 4750 16 23 rwd 7a #> # … with 41 more rows, 2 more variables: ctry_origin <chr>, msrp <dbl>, and abbreviated #> # variable names ¹bdy_style, ²drivetrain ``` -- ```r ?gtcars # 查看gt包内置数据集gtcars的帮助文档,或 help(gtcars) ``` -- ```r gtcars %>% select(mfr, model, year, hp, trq, trsmn, msrp) %>% slice(2:3, 42:43, 38:39, 46:47) -> gtcars_sm ``` --- .pull-left.code85[ ```r gtcars_sm %>% * gt(rowname_col = "model", * groupname_col = "mfr") -> tbl_1 # gt( # data, # rowname_col = "rowname", # groupname_col = # dplyr::group_vars(data), # process_md = FALSE, # caption = NULL, # rownames_to_stub = FALSE, # auto_align = TRUE, # id = NULL, # locale = NULL, # row_group.sep = # getOption("gt.row_group.sep", # " - ") # ) # class(tbl_1) # 为 "gt_tbl" "list" # View(tbl_1) # 查看数据结构 ``` ] -- .pull-right.code90[ ```r tbl_1 # print to display ```
year
hp
trq
trsmn
msrp
Ferrari
458 Speciale
2015
597
398
7a
291744
458 Spider
2015
562
398
7a
263553
Porsche
718 Cayman
2017
300
280
6m
53900
911
2016
350
287
7m
84300
Mercedes-Benz
AMG GT
2016
503
479
7a
129900
SL-Class
2016
329
354
7am
85050
Rolls-Royce
Dawn
2016
563
575
8a
335000
Wraith
2016
624
590
8a
304350
] --- layout: true ### .bold[1.3 .red[增加 / 修改] / 移除 表格组件] --- .pull-left[ .code85[ ```r tbl_1 %>% * tab_header( * title = * md("**Deluxe** automobiles"), * subtitle = * md("`gtcars` is an R dataset") * ) # opt_align_table_header( # data, # a <gt_tbl> # align = c("left","center","right")) ``` ] .code85[ ```r # 11个用于设置表格组件格式的便捷选项函数: *ls("package:gt") %>% str_subset("^opt") ``` ``` [1] "opt_align_table_header" "opt_all_caps" [3] "opt_css" "opt_footnote_marks" [5] "opt_horizontal_padding" "opt_row_striping" [7] "opt_stylize" "opt_table_font" [9] "opt_table_lines" "opt_table_outline" [11] "opt_vertical_padding" ``` ] ] .pull-right[
Deluxe
automobiles
gtcars
is an R dataset
year
hp
trq
trsmn
msrp
Ferrari
458 Speciale
2015
597
398
7a
291744
458 Spider
2015
562
398
7a
263553
Porsche
718 Cayman
2017
300
280
6m
53900
911
2016
350
287
7m
84300
Mercedes-Benz
AMG GT
2016
503
479
7a
129900
SL-Class
2016
329
354
7am
85050
Rolls-Royce
Dawn
2016
563
575
8a
335000
Wraith
2016
624
590
8a
304350
] --- .pull-left.code80[ ```r tbl_1 %>% tab_header( title = md("**Deluxe** automobiles"), subtitle = md("`gtcars` is an R dataset") ) %>% * tab_spanner( label = "performance", columns = c(hp, trq, trsmn) ) # tab_spanner(data, label, columns, # spanners = NULL, level = NULL, # id = label, gather = TRUE, # replace = FALSE) # # ... via delimited names *# tab_spanner_delim( # data, *# delim, # columns = everything(), # split = c("last", "first") # ) ``` ] .pull-right[
Deluxe
automobiles
gtcars
is an R dataset
year
performance
msrp
hp
trq
trsmn
Ferrari
458 Speciale
2015
597
398
7a
291744
458 Spider
2015
562
398
7a
263553
Porsche
718 Cayman
2017
300
280
6m
53900
911
2016
350
287
7m
84300
Mercedes-Benz
AMG GT
2016
503
479
7a
129900
SL-Class
2016
329
354
7am
85050
Rolls-Royce
Dawn
2016
563
575
8a
335000
Wraith
2016
624
590
8a
304350
] --- .pull-left.code90[ ```r gtcars_sm %>% gt(rowname_col = "model") %>% * tab_stubhead( label = "MODEL" ) %>% * tab_row_group( label = md("**SuperPowerful**"), * rows = hp > 500 ) %>% * tab_options( row_group.default_label = "Powerful", row_group.font.weight = "bold" ) # rows: Can either be a vector of # of row captions provided in c(), # a vector of row indices, or a # helper function focused on # selections. ``` ] .pull-right[
MODEL
mfr
year
hp
trq
trsmn
msrp
SuperPowerful
458 Speciale
Ferrari
2015
597
398
7a
291744
458 Spider
Ferrari
2015
562
398
7a
263553
AMG GT
Mercedes-Benz
2016
503
479
7a
129900
Dawn
Rolls-Royce
2016
563
575
8a
335000
Wraith
Rolls-Royce
2016
624
590
8a
304350
Powerful
718 Cayman
Porsche
2017
300
280
6m
53900
911
Porsche
2016
350
287
7m
84300
SL-Class
Mercedes-Benz
2016
329
354
7am
85050
] --- .pull-left[ .code90[ ```r tbl_1 %>% * tab_footnote( # 可用管道连接多个 footnote = "厂商建议零售价", * locations = cells_column_labels( columns = msrp ) ) %>% * opt_footnote_marks( marks = "numbers" ) ``` ] .code85[ ``` # 14个用于定位的辅助函数(cells_group()已弃用): ``` ``` [1] "cells_body" "cells_column_labels" [3] "cells_column_spanners" "cells_footnotes" [5] "cells_grand_summary" "cells_group" [7] "cells_row_groups" "cells_source_notes" [9] "cells_stub" "cells_stub_grand_summary" [11] "cells_stub_summary" "cells_stubhead" [13] "cells_summary" "cells_title" ``` ] ] .pull-right[
year
hp
trq
trsmn
msrp
1
Ferrari
458 Speciale
2015
597
398
7a
291744
458 Spider
2015
562
398
7a
263553
Porsche
718 Cayman
2017
300
280
6m
53900
911
2016
350
287
7m
84300
Mercedes-Benz
AMG GT
2016
503
479
7a
129900
SL-Class
2016
329
354
7am
85050
Rolls-Royce
Dawn
2016
563
575
8a
335000
Wraith
2016
624
590
8a
304350
1
厂商建议零售价
] --- .pull-left.code80[ ```r tbl_1 %>% tab_header( title = "Deluxe automobiles" ) %>% * tab_style( * style = list( * cell_borders(weight=px(3)), * cell_text(style = "italic") * ), * locations = cells_body( columns = hp, * rows = hp > median(hp) ) ) %>% * tab_style_body( # new in v0.8.0 * style = cell_fill(color = "#F9E3D6"), * columns = where(is.numeric), # rows = everything(), # values = NULL, pattern = NULL, * fn = \(x) x >= 200000, # targets = c("cell","row","column"), # extents = c("body", "stub") ) ``` ] .pull-right[
Deluxe automobiles
year
hp
trq
trsmn
msrp
Ferrari
458 Speciale
2015
597
398
7a
291744
458 Spider
2015
562
398
7a
263553
Porsche
718 Cayman
2017
300
280
6m
53900
911
2016
350
287
7m
84300
Mercedes-Benz
AMG GT
2016
503
479
7a
129900
SL-Class
2016
329
354
7am
85050
Rolls-Royce
Dawn
2016
563
575
8a
335000
Wraith
2016
624
590
8a
304350
] --- .pull-left.code85.font90[ ```r tbl_1 %>% tab_header( title = md("**Deluxe** automobiles"), subtitle = md("`gtcars` is an R dataset") ) %>% * tab_options( # 166 opts in total! * table.background.color = * "lightcyan", * table.width = pct(100), * heading.align = "left", * heading.subtitle.font.size = * "100%", * column_labels.text_transform = * "uppercase", * row_group.background.color = * "lightblue", * data_row.padding = px(5) * ) ``` ] .pull-right[
Deluxe
automobiles
gtcars
is an R dataset
year
hp
trq
trsmn
msrp
Ferrari
458 Speciale
2015
597
398
7a
291744
458 Spider
2015
562
398
7a
263553
Porsche
718 Cayman
2017
300
280
6m
53900
911
2016
350
287
7m
84300
Mercedes-Benz
AMG GT
2016
503
479
7a
129900
SL-Class
2016
329
354
7am
85050
Rolls-Royce
Dawn
2016
563
575
8a
335000
Wraith
2016
624
590
8a
304350
] -- .code85[ ``` 4个未提及的 tab_*() 函数: ``` ``` [1] "tab_caption" "tab_info" "tab_source_note" "tab_stub_indent" ``` ] --- layout: true ### .bold[1.3 增加 / 修改 / .red[移除] 表格组件] --- .pull-left.code.font90[ ```r # 移除表格组件函数,New in gt v0.8.0 tbl_1 %>% tab_header( title = md("**Deluxe** automobiles"), subtitle = md("`gtcars` is an R dataset") ) %>% tab_footnote( # 可用管道连接多个 footnote = "厂商建议零售价", locations = cells_column_labels( columns = msrp ) ) %>% rm_header() %>% rm_footnotes() ``` .code90[ ``` # 6个移除表格组件的函数: ``` ``` [1] "rm_caption" "rm_footnotes" [3] "rm_header" "rm_source_notes" [5] "rm_spanners" "rm_stubhead" ``` ] ] .pull-right[
year
hp
trq
trsmn
msrp
Ferrari
458 Speciale
2015
597
398
7a
291744
458 Spider
2015
562
398
7a
263553
Porsche
718 Cayman
2017
300
280
6m
53900
911
2016
350
287
7m
84300
Mercedes-Benz
AMG GT
2016
503
479
7a
129900
SL-Class
2016
329
354
7am
85050
Rolls-Royce
Dawn
2016
563
575
8a
335000
Wraith
2016
624
590
8a
304350
] --- layout: true ### .bold[1.4 操作表格的.red[列] / 行] --- .pull-left.code85[ ```r tbl_1 %>% tab_header( title = md("**Deluxe** Autos") ) %>% * cols_merge( columns = c(hp, trq, trsmn), hide_columns = c(trq, trsmn), pattern = "{1}-{2}-{3}" ) %>% * cols_align( align = "left", columns = hp ) %>% * cols_label(hp = md("_hp-trq-trsmn_")) ``` ``` # 13个操作表格列的函数: ``` ``` [1] "cols_align" "cols_align_decimal" [3] "cols_hide" "cols_label" [5] "cols_merge" "cols_merge_n_pct" [7] "cols_merge_range" "cols_merge_uncert" [9] "cols_move" "cols_move_to_end" [11] "cols_move_to_start" "cols_unhide" [13] "cols_width" ``` ] .pull-right[
Deluxe
Autos
year
hp-trq-trsmn
msrp
Ferrari
458 Speciale
2015
597-398-7a
291744
458 Spider
2015
562-398-7a
263553
Porsche
718 Cayman
2017
300-280-6m
53900
911
2016
350-287-7m
84300
Mercedes-Benz
AMG GT
2016
503-479-7a
129900
SL-Class
2016
329-354-7am
85050
Rolls-Royce
Dawn
2016
563-575-8a
335000
Wraith
2016
624-590-8a
304350
] --- layout: true ### .bold[1.4 操作表格的列 / .red[行]] --- .pull-left.code85[ ```r tbl_1 %>% tab_header( title = md("**Deluxe** Autos") ) %>% * row_group_order( groups = c("Ferrari", "Mercedes-Benz")) %>% * summary_rows( groups = c("Ferrari", "Porsche"), columns = c(hp, trq, msrp), fns = list('均价' = ~ mean(.)) ) %>% * grand_summary_rows( columns = c(hp, trq, msrp), * fns = list( * '最低价' = ~ min(.), * '最高价' = ~ max(.)), * formatter = fmt_number, * decimals = 0, use_seps = FALSE ) %>% tab_options( row_group.as_column = TRUE ) ``` ] .pull-right[
Deluxe
Autos
year
hp
trq
trsmn
msrp
Ferrari
458 Speciale
2015
597
398
7a
291744
458 Spider
2015
562
398
7a
263553
均价
—
579.50
398.00
—
277,648.50
Mercedes-Benz
AMG GT
2016
503
479
7a
129900
SL-Class
2016
329
354
7am
85050
Porsche
718 Cayman
2017
300
280
6m
53900
911
2016
350
287
7m
84300
均价
—
325.00
283.50
—
69,100.00
Rolls-Royce
Dawn
2016
563
575
8a
335000
Wraith
2016
624
590
8a
304350
最低价
—
300
280
—
53900
最高价
—
624
590
—
335000
] --- layout: true ### .bold[1.5 .red[数据替换]、数据格式化、数据变图] --- .pull-left[ .code90[ ```r # 数据替换函数,New in gt v0.6.0 tbl_1 %>% tab_header( title = md("**Deluxe** automobiles") ) %>% * sub_values( # Add in gt v0.8.0 * columns = msrp, values = NULL, pattern = NULL, * fn = function(x) x >= 300000, * replacement = "≥300000" ) ``` ] .code90[ ``` # 5个数据替换函数: ``` ``` [1] "sub_large_vals" "sub_missing" [3] "sub_small_vals" "sub_values" [5] "sub_zero" ``` ] ] .pull-right[
Deluxe
automobiles
year
hp
trq
trsmn
msrp
Ferrari
458 Speciale
2015
597
398
7a
291744
458 Spider
2015
562
398
7a
263553
Porsche
718 Cayman
2017
300
280
6m
53900
911
2016
350
287
7m
84300
Mercedes-Benz
AMG GT
2016
503
479
7a
129900
SL-Class
2016
329
354
7am
85050
Rolls-Royce
Dawn
2016
563
575
8a
≥300000
Wraith
2016
624
590
8a
≥300000
] --- layout: true ### .bold[1.5 数据替换、.red[数据格式化]、数据变图] --- .pull-left[ .code85[ ```r tbl_1 %>% tab_header( title = md("**Deluxe** automobiles") ) %>% * fmt_currency( * columns = msrp, decimals = 0, * currency = "CNY", scale_by = 6.7 * ) ``` ] .code90[ ``` # 18个数据格式化函数: ``` ``` [1] "fmt" "fmt_bytes" [3] "fmt_currency" "fmt_date" [5] "fmt_datetime" "fmt_duration" [7] "fmt_engineering" "fmt_fraction" [9] "fmt_integer" "fmt_markdown" [11] "fmt_missing" "fmt_number" [13] "fmt_partsper" "fmt_passthrough" [15] "fmt_percent" "fmt_roman" [17] "fmt_scientific" "fmt_time" ``` ] ] .pull-right[
Deluxe
automobiles
year
hp
trq
trsmn
msrp
Ferrari
458 Speciale
2015
597
398
7a
¥1,954,685
458 Spider
2015
562
398
7a
¥1,765,805
Porsche
718 Cayman
2017
300
280
6m
¥361,130
911
2016
350
287
7m
¥564,810
Mercedes-Benz
AMG GT
2016
503
479
7a
¥870,330
SL-Class
2016
329
354
7am
¥569,835
Rolls-Royce
Dawn
2016
563
575
8a
¥2,244,500
Wraith
2016
624
590
8a
¥2,039,145
] --- .pull-left[ .code85[ ```r tbl_1 %>% * data_color( columns = msrp, * colors = scales::col_numeric( * palette = paletteer::paletteer_d( palette = "ggsci::red_material" ) %>% as.character(), domain = NULL ), # alpha, # apply_to = c("fill","text"), # autocolor_text = TRUE, # contrast_algo = c("apca","wcag") ) ``` ] .code90[ ``` # 6个信息备查函数: ``` ``` [1] "info_currencies" "info_date_style" [3] "info_google_fonts" "info_locales" [5] "info_paletteer" "info_time_style" ``` ] ] .pull-right[
year
hp
trq
trsmn
msrp
Ferrari
458 Speciale
2015
597
398
7a
291744
458 Spider
2015
562
398
7a
263553
Porsche
718 Cayman
2017
300
280
6m
53900
911
2016
350
287
7m
84300
Mercedes-Benz
AMG GT
2016
503
479
7a
129900
SL-Class
2016
329
354
7am
85050
Rolls-Royce
Dawn
2016
563
575
8a
335000
Wraith
2016
624
590
8a
304350
] --- layout: true ### .bold[1.5 数据替换、数据格式化、.red[数据变图]] --- .pull-left.code90[ ```r gtcars_sm %>% gt(rowname_col = "mfr") %>% tab_header( title = md("**Deluxe** Autos") ) %>% * text_transform( # 大杀器! locations = cells_stub(), * fn = function(x) { * local_image( * glue::glue("imgs/{x}.png"), * height = 20 * ) * } ) %>% tab_options( table.font.size = "80%" ) ``` ] .pull-right[
Deluxe
Autos
model
year
hp
trq
trsmn
msrp
458 Speciale
2015
597
398
7a
291744
458 Spider
2015
562
398
7a
263553
718 Cayman
2017
300
280
6m
53900
911
2016
350
287
7m
84300
AMG GT
2016
503
479
7a
129900
SL-Class
2016
329
354
7am
85050
Dawn
2016
563
575
8a
335000
Wraith
2016
624
590
8a
304350
] -- .code100[ ``` # 3个在表格中插入图片的辅助函数: ``` ``` [1] "ggplot_image" "local_image" "web_image" ``` ] --- layout: false class: inverse, center, middle background-image: url(imgs/logo-modelsummary.png), url(imgs/sxc.png) background-size: 10%, 100% background-position: 17% 40%, 0% 100% # 2. [`modelsummary`](https://vincentarelbundock.github.io/modelsummary/) <sup>.font60[v1.2.0]</sup> .font120[(**Summary** Tables and Plots for _Statistical Models_ and _Data_)] --- layout: true ### .bold[2.1 .red[`datasummary()`] & `datasummary_*()`] --- ```r library(modelsummary) library(palmerpenguins) set.seed(123) penguins %>% rename( # 原变量名太长,太占宝贵的幻灯片空间,重命名下 b_len = bill_length_mm, b_dep = bill_depth_mm, f_len = flipper_length_mm, b_mss = body_mass_g ) %>% arrange(runif(n = n())) %>% print() -> pgs # 短的数据集名 ``` ``` #> # A tibble: 344 × 8 #> species island b_len b_dep f_len b_mss sex year #> <fct> <fct> <dbl> <dbl> <int> <int> <fct> <int> #> 1 Adelie Torgersen 45.8 18.9 197 4150 male 2008 #> 2 Gentoo Biscoe NA NA NA NA <NA> 2009 #> 3 Chinstrap Dream 54.2 20.8 201 4300 male 2008 #> 4 Chinstrap Dream 52 19 197 4150 male 2007 #> 5 Adelie Dream 32.1 15.5 188 3050 female 2009 #> 6 Gentoo Biscoe 44.5 15.7 217 4875 <NA> 2009 #> 7 Adelie Dream 36.4 17 195 3325 female 2007 #> 8 Chinstrap Dream 49.8 17.3 198 3675 female 2009 #> 9 Chinstrap Dream 51.4 19 201 3950 male 2009 #> 10 Adelie Torgersen 42.5 20.7 197 4500 male 2007 #> # … with 334 more rows ``` --- .font120[`datasummary()` 基于 `tables` 包提供的公式接口来创建汇总统计表并进行改进(如一致的函数参数设置、特定用途的便捷函数、可将表格导出为多种格式)] ```r datasummary( formula, # 1. 双侧公式:rows ~ columns data, # 2. 数据框或tibble output = "default", # 3. 表格类型如:"default", "html", "markdown", "latex", # "latex_tabular", "data.frame", "modelsummary_list", # "gt", "kableExtra", "huxtable", "flextable", "DT", "jupyter" # 也可直接输出为.docx, .html, .tex, .md, .txt, .png, .jpg文件 fmt = 2, # 4. 整数,format(round(x, fmt), nsmall=fmt); # 传递给sprintf()的字符参数,如"%.3f"; # 返回格式化字符串的函数 # NULL,允许用户在 glue 格式的字符串中使用函数 title = NULL, notes = NULL, align = NULL, # 5. 按列给出对齐方式,如"lcrd" (d为小数点对齐) add_columns = NULL, # 6. 和主表格相同行数的数据框或tibble add_rows = NULL, # 7. 和主表格相同列数的数据框或tibble sparse_header = TRUE, # 8. 如果全部列有着相同的标签,是否删除这一列标签 escape = TRUE, # 8. 是否需要转义某些 LaTeX/HTML 特殊字符 ... # 9. 传递给底层制表函数(如kableExtra::kbl())的参数 ) ``` --- .pull-left[ ```r # 定义个便捷函数,方便调用,节省空间 ds <- function(...) { datasummary(data = pgs, ..., output = "gt") %>% gt::tab_options(table.font.size='70%') } ds(f_len ~ Mean) # modelsummary包内置的数据汇总统计量函数, # 如:Mean, SD, Min, Max, Median, P25 ... ```
Mean
f_len
200.92
```r # 公式中也可使用自定义的函数 MinMax <- function(x) { glue::glue('[{min(x, na.rm = TRUE)}, {max(x, na.rm = TRUE)}]') } ds(f_len ~ MinMax) ```
MinMax
f_len
[172, 231]
] -- .pull-right[ ```r # 用 + 连接多个变量 / 多个汇总统计量函数 ds(f_len + b_mss ~ Mean + SD) ```
Mean
SD
f_len
200.92
14.06
b_mss
4201.75
801.95
```r # 用 \* 表示套嵌 \# \* 左侧或右侧的变量必须为因子变量、逻辑变量 \# 或字符变量 ds(f_len + b_mss ~ sex \* (Mean + SD)) ```
female
male
Mean
SD
Mean
SD
f_len
197.36
12.50
204.51
14.55
b_mss
3862.27
666.17
4545.68
787.63
] --- .pull-left[ ```r # 在此需要用 Factor() 将整型变量转化为因子变量 ds(f_len ~ Factor(year) * (Mean + SD)) ```
2007
2008
2009
Mean
SD
Mean
SD
Mean
SD
f_len
196.88
13.92
202.80
13.90
202.81
13.68
```r # 用 = 来重命名变量名或汇总统计量名 ds(island * (`FlipperLen(mm)` = f_len) ~ (Mean + (Std.Dev = SD))) ```
island
Mean
Std.Dev
Biscoe
FlipperLen(mm)
209.71
14.14
Dream
FlipperLen(mm)
193.07
7.51
Torgersen
FlipperLen(mm)
191.20
6.23
] -- .pull-right[ ```r # 用 * 给分析函数附加额外处理,如增加参数 ... ds(f_len ~ * (mean + sd) * Arguments(na.rm=TRUE)) ```
mean
sd
f_len
200.92
14.06
```r # ... 去除空白行 / 空白列 ds(island \* species \* b_mss ~ (Mean + SD) \* DropEmpty()) ```
island
species
Mean
SD
Biscoe
Adelie
b_mss
3709.66
487.73
Gentoo
b_mss
5076.02
504.12
Dream
Adelie
b_mss
3688.39
455.15
Chinstrap
b_mss
3733.09
384.34
Torgersen
Adelie
b_mss
3706.37
445.11
] --- .pull-left[ ```r # All():选择全部数值型(字符型...)变量 ds( * All(pgs %>% select(-year)) ~ * Mean + SD + Histogram ) ```
Mean
SD
Histogram
b_len
43.92
5.46
▁▅▆▆▆▇▇▂▁
b_dep
17.15
1.97
▃▄▄▄▇▆▇▅▂▁
f_len
200.92
14.06
▂▅▇▄▁▄▄▂▁
b_mss
4201.75
801.95
▁▄▇▅▄▄▃▃▂▁
] -- .pull-right[ ```r # 频数分布表 ds( species * sex + * Heading("ALL", nearData=FALSE) * 1 ~ * N + Percent() ) ```
species
sex
N
Percent
Adelie
female
73
21.22
male
73
21.22
Chinstrap
female
34
9.88
male
34
9.88
Gentoo
female
58
16.86
male
61
17.73
ALL
344
100.00
] --- layout: true ### .bold[2.1 `datasummary()` & .red[`datasummary_*()`]] --- ```r # datasummary_skim( # inspired by the `skimr` package for R # data, type = "numeric", output = "default", fmt = "%.1f", histogram = TRUE, # title = NULL, notes = NULL, align = NULL, escape = TRUE, ... # ) datasummary_skim(penguins, fmt = "%.2f", output = 'gt') %>% # 利用 gt 包中的相关函数对 <gt_tbl> 表格进行后续操作,如格式化 tab_style(list(cell_fill(color = "#F9E3D6"), cell_text(weight = "bold")), locations = cells_column_labels()) %>% tab_style(cell_text(weight = "bold"), locations = cells_body(columns = 1)) %>% tab_options(table.width = '90%', table.font.size = '85%') ```
Unique (#)
Missing (%)
Mean
SD
Min
Median
Max
bill_length_mm
165
1
43.92
5.46
32.10
44.45
59.60
bill_depth_mm
81
1
17.15
1.97
13.10
17.30
21.50
flipper_length_mm
56
1
200.92
14.06
172.00
197.00
231.00
body_mass_g
95
1
4201.75
801.95
2700.00
4050.00
6300.00
year
3
0
2008.03
0.82
2007.00
2008.00
2009.00
--- .pull-left[ ```r datasummary_skim( * penguins, type = "categorical" ) # html的默认输出为 kableExtra 格式的表格 ``` <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> </th> <th style="text-align:right;"> N </th> <th style="text-align:right;"> % </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> species </td> <td style="text-align:left;"> Adelie </td> <td style="text-align:right;"> 152 </td> <td style="text-align:right;"> 44.2 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:left;"> Chinstrap </td> <td style="text-align:right;"> 68 </td> <td style="text-align:right;"> 19.8 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:left;"> Gentoo </td> <td style="text-align:right;"> 124 </td> <td style="text-align:right;"> 36.0 </td> </tr> <tr> <td style="text-align:left;"> island </td> <td style="text-align:left;"> Biscoe </td> <td style="text-align:right;"> 168 </td> <td style="text-align:right;"> 48.8 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:left;"> Dream </td> <td style="text-align:right;"> 124 </td> <td style="text-align:right;"> 36.0 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:left;"> Torgersen </td> <td style="text-align:right;"> 52 </td> <td style="text-align:right;"> 15.1 </td> </tr> <tr> <td style="text-align:left;"> sex </td> <td style="text-align:left;"> female </td> <td style="text-align:right;"> 165 </td> <td style="text-align:right;"> 48.0 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:left;"> male </td> <td style="text-align:right;"> 168 </td> <td style="text-align:right;"> 48.8 </td> </tr> </tbody> </table> ] -- .pull-right[ ```r # datasummary_correlation( # data, output = "default", # method = "pearson", fmt = 2, # align = NULL, add_rows = NULL, # add_columns = NULL, # title = NULL, notes = NULL, # escape = TRUE, ... # ) datasummary_correlation(pgs) ``` <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> b_len </th> <th style="text-align:right;"> b_dep </th> <th style="text-align:right;"> f_len </th> <th style="text-align:right;"> b_mss </th> <th style="text-align:right;"> year </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> b_len </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> . </td> <td style="text-align:right;"> . </td> <td style="text-align:right;"> . </td> <td style="text-align:right;"> . </td> </tr> <tr> <td style="text-align:left;"> b_dep </td> <td style="text-align:right;"> -.24 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> . </td> <td style="text-align:right;"> . </td> <td style="text-align:right;"> . </td> </tr> <tr> <td style="text-align:left;"> f_len </td> <td style="text-align:right;"> .66 </td> <td style="text-align:right;"> -.58 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> . </td> <td style="text-align:right;"> . </td> </tr> <tr> <td style="text-align:left;"> b_mss </td> <td style="text-align:right;"> .60 </td> <td style="text-align:right;"> -.47 </td> <td style="text-align:right;"> .87 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> . </td> </tr> <tr> <td style="text-align:left;"> year </td> <td style="text-align:right;"> .05 </td> <td style="text-align:right;"> -.06 </td> <td style="text-align:right;"> .17 </td> <td style="text-align:right;"> .04 </td> <td style="text-align:right;"> 1 </td> </tr> </tbody> </table> ```r *# method: a character or function *# 可参见帮助文档中的示例 ``` ] --- ```r *# install.packages("estimatr") # 要安装这个包才会报告均值差异检验的结果,即 dinm = TRUE # datasummary_balance( # formula, data, output = "default", fmt = 1, # title = NULL, notes = NULL, align = NULL, stars = FALSE, # add_columns = NULL, add_rows = NULL, # dinm = TRUE, dinm_statistic = "std.error", escape = TRUE, ... # ) datasummary_balance(~ sex, data = penguins %>% select(-year, -species, -island), fmt = 2, stars = TRUE) ``` <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">female (N=165)</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">male (N=168)</div></th> <th style="empty-cells: hide;border-bottom:hidden;" colspan="2"></th> </tr> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Mean </th> <th style="text-align:right;"> Std. Dev. </th> <th style="text-align:right;"> Mean </th> <th style="text-align:right;"> Std. Dev. </th> <th style="text-align:right;"> Diff. in Means </th> <th style="text-align:right;"> Std. Error </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> bill_length_mm </td> <td style="text-align:right;"> 42.10 </td> <td style="text-align:right;"> 4.90 </td> <td style="text-align:right;"> 45.85 </td> <td style="text-align:right;"> 5.37 </td> <td style="text-align:right;"> 3.76*** </td> <td style="text-align:right;"> 0.56 </td> </tr> <tr> <td style="text-align:left;"> bill_depth_mm </td> <td style="text-align:right;"> 16.43 </td> <td style="text-align:right;"> 1.80 </td> <td style="text-align:right;"> 17.89 </td> <td style="text-align:right;"> 1.86 </td> <td style="text-align:right;"> 1.47*** </td> <td style="text-align:right;"> 0.20 </td> </tr> <tr> <td style="text-align:left;"> flipper_length_mm </td> <td style="text-align:right;"> 197.36 </td> <td style="text-align:right;"> 12.50 </td> <td style="text-align:right;"> 204.51 </td> <td style="text-align:right;"> 14.55 </td> <td style="text-align:right;"> 7.14*** </td> <td style="text-align:right;"> 1.49 </td> </tr> <tr> <td style="text-align:left;"> body_mass_g </td> <td style="text-align:right;"> 3862.27 </td> <td style="text-align:right;"> 666.17 </td> <td style="text-align:right;"> 4545.68 </td> <td style="text-align:right;"> 787.63 </td> <td style="text-align:right;"> 683.41*** </td> <td style="text-align:right;"> 79.89 </td> </tr> </tbody> </table> --- ```r # datasummary_crosstab( # formula, statistic = 1 ~ 1 + N + Percent("row"), data, # output = "default", fmt = 1, title = NULL, notes = NULL, # align = NULL, add_columns = NULL, add_rows = NULL, # sparse_header = TRUE, escape = TRUE, ... # ) datasummary_crosstab(species ~ sex * island, data = penguins, output = "gt") ```
species
female
male
All
Biscoe
Dream
Torgersen
Biscoe
Dream
Torgersen
Adelie
N
22
27
24
22
28
23
152
% row
14.5
17.8
15.8
14.5
18.4
15.1
100.0
Chinstrap
N
0
34
0
0
34
0
68
% row
0.0
50.0
0.0
0.0
50.0
0.0
100.0
Gentoo
N
58
0
0
61
0
0
124
% row
46.8
0.0
0.0
49.2
0.0
0.0
100.0
All
N
80
61
24
83
62
23
344
% row
23.3
17.7
7.0
24.1
18.0
6.7
100.0
--- layout: true ### .bold[2.2 .red[`modelsummary()`] & `modelplot()`] --- [{{下载 guerry.csv}}](data/Guerry.csv) ```r (guerry <- read_csv('data/Guerry.csv')) ``` ``` #> # A tibble: 86 × 24 #> ...1 dept Region Department Crime_pers Crime…¹ Liter…² Donat…³ Infants Suici…⁴ MainC…⁵ #> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> #> 1 1 1 E Ain 28870 15890 37 5098 33120 35039 2:Med #> 2 2 2 N Aisne 26226 5521 51 8901 14572 12831 2:Med #> 3 3 3 C Allier 26747 7925 13 10973 17044 114121 2:Med #> # … with 83 more rows, 13 more variables: Wealth <dbl>, Commerce <dbl>, Clergy <dbl>, #> # Crime_parents <dbl>, Infanticide <dbl>, Donation_clergy <dbl>, Lottery <dbl>, #> # Desertion <dbl>, Instruction <dbl>, Prostitutes <dbl>, Distance <dbl>, Area <dbl>, #> # Pop1831 <dbl>, and abbreviated variable names ¹Crime_prop, ²Literacy, ³Donations, #> # ⁴Suicides, ⁵MainCity ``` ```r # 将4个回归模型的结果“打包”为一个列表对象 models <- list( "M1" = lm(Donations ~ Clergy, data = guerry), "M2" = lm(Donations ~ Commerce, data = guerry), "M3" = lm(Donations ~ Literacy, data = guerry), "M4" = lm(Donations ~ Commerce + Literacy + Clergy, data = guerry) ) ``` --- .pull-left.code80[ ```r # modelsummary( # models, output = "default", fmt = 3, # estimate = "estimate", # statistic = "std.error", # vcov = NULL, # conf_level = 0.95, # exponentiate = FALSE, # stars = FALSE, # shape = term + statistic ~ model, # coef_map = NULL, coef_omit = NULL, # coef_rename = NULL, # gof_map = NULL, gof_omit = NULL, # group_map = NULL, # add_columns = NULL, # add_rows = NULL, # align = NULL, # notes = NULL, title = NULL, # escape = TRUE, ... # ) modelsummary(models, output = "gt") %>% gt::opt_vertical_padding(scale = 0.8) ``` ] .pull-right[
M1
M2
M3
M4
(Intercept)
6210.669
4122.293
8759.068
96.107
(1269.581)
(1202.868)
(1559.363)
(3365.879)
Clergy
19.914
35.369
(25.372)
(25.623)
Commerce
68.998
89.652
(24.296)
(30.978)
Literacy
-42.886
40.912
(36.362)
(45.007)
Num.Obs.
86
86
86
86
R2
0.007
0.088
0.016
0.111
R2 Adj.
-0.005
0.077
0.005
0.079
AIC
1739.9
1732.7
1739.1
1734.4
BIC
1747.3
1740.0
1746.5
1746.7
Log.Lik.
-866.966
-863.338
-866.574
-862.210
F
0.616
8.065
1.391
3.420
RMSE
5779.42
5540.68
5753.14
5468.51
] --- .pull-left[ ```r modelsummary( models, output = "gt", * fmt = 2, stars = TRUE ) ```
M1
M2
M3
M4
(Intercept)
6210.67***
4122.29***
8759.07***
96.11
(1269.58)
(1202.87)
(1559.36)
(3365.88)
Clergy
19.91
35.37
(25.37)
(25.62)
Commerce
69.00**
89.65**
(24.30)
(30.98)
Literacy
-42.89
40.91
(36.36)
(45.01)
Num.Obs.
86
86
86
86
R2
0.007
0.088
0.016
0.111
R2 Adj.
-0.005
0.077
0.005
0.079
AIC
1739.9
1732.7
1739.1
1734.4
BIC
1747.3
1740.0
1746.5
1746.7
Log.Lik.
-866.966
-863.338
-866.574
-862.210
F
0.616
8.065
1.391
3.420
RMSE
5779.42
5540.68
5753.14
5468.51
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
] -- .pull-right[ ```r modelsummary( models, output = "gt", fmt = 2, * estimate = * "{estimate}{stars}<br>[{std.error}]", * statistic = NULL, * vcov = "robust", * coef_omit = "Intercept", * gof_omit = "^(R2|AIC|BIC|Log.Lik.)$" ) ```
M1
M2
M3
M4
Clergy
19.91<br>[35.01]
35.37<br>[28.49]
Commerce
69.00*<br>[29.76]
89.65*<br>[44.30]
Literacy
-42.89<br>[31.14]
40.91<br>[52.40]
Num.Obs.
86
86
86
86
R2 Adj.
-0.005
0.077
0.005
0.079
F
0.324
5.376
1.897
3.575
RMSE
5779.42
5540.68
5753.14
5468.51
Std.Errors
HC3
HC3
HC3
HC3
] --- layout: true ### .bold[2.2 `modelsummary()` & .red[`modelplot()`]] --- .pull-left.code80[ ```r # modelplot( # models, # conf_level = 0.95, # coef_map = NULL, coef_omit = NULL, # coef_rename = NULL, # vcov = NULL # exponentiate = FALSE, # add_rows = NULL, *# facet = FALSE, *# draw = TRUE, *# background = NULL, # ... # ) modelplot( models, vcov = "robust", coef_omit = "(Intercept)", background = list( geom_vline( xintercept = 0, linewidth = 1.5, color = 'orange', alpha = 0.5) ) ) + labs(title = "Models comparison") ``` ] .pull-right[ <!-- --> ] --- .pull-left.code80[ ```r # 1. fit # models <- list(...) # 2. summarize *mfun <- function(x) { * modelplot(models, conf_level = x, * draw = FALSE) %>% * mutate(.width = x) %>% * filter(term != "(Intercept)") *} *dat <- map_dfr(c(.80, .90, .95), mfun) # 3. plot dat %>% ggplot( aes(x = estimate, y = term, xmin = conf.low, xmax = conf.high, color = model) ) + ggdist::geom_pointinterval( #install it position = "dodge", interval_size_range = c(1, 3), fatten_point = .1 ) ``` ] .pull-right[ <!-- --> ] --- layout: false class: inverse, center, middle # 3. `kableExtra` and beyond ... --- layout: false class: inverse, center, middle background-image: url(imgs/logo-kableExtra.svg) background-size: 4%, 100% background-position: 5% 3% .left.font80[   [`kableExtra`](http://haozhu233.github.io/kableExtra/) <sup>.font60[v1.3.4]</sup>: Construct Complex Table with `kable` and Pipe Syntax] <iframe src="http://haozhu233.github.io/kableExtra/" width="100%" height="580px" data-external="1"></iframe> --- layout: false class: inverse, center, middle background-image: url(imgs/logo-flextable.svg) background-size: 4% background-position: 5% 3% .left.font80[   [`flextable`](https://ardata-fr.github.io/flextable-book/) <sup>.font60[v0.8.3]</sup>: Functions for Tabular Reporting] <iframe src="https://ardata-fr.github.io/flextable-book/" width="100%" height="580px" data-external="1"></iframe> --- layout: false class: inverse, center, middle background-image: url(imgs/logo-gtsummary.png) background-size: 5% background-position: 5% 3% .left.font80[   [`gtsummary`](https://www.danieldsjoberg.com/gtsummary/) <sup>.font60[v1.6.2]</sup>: Presentation-Ready Data Summary and Analytic Result Tables] <iframe src="https://www.danieldsjoberg.com/gtsummary/" width="100%" height="580px" data-external="1"></iframe> --- class: center middle background-image: url(imgs/xaringan.png) background-size: 12% background-position: 50% 40% <br><br><br><br><br><br><br> <hr color='#f00' size='2px' width='80%'> <br> .Large.red[_**本网页版讲义的制作由 R 包 [{{`xaringan`}}](https://github.com/yihui/xaringan) 赋能!**_]