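[R] Scatter plot of home run rate versus strikeout rate for players with 5,000 or more career at-bats ("Rによるセイバーメトリクス入門" (技術評論社), pp. 62-63)
The solid blue line in the scatter plot is a smoothed curve.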
> library(tidyverse)
> library(ggplot2)
> library(Lahman)
> Batting |> group_by(playerID) |>
+ summarize(tAB = sum(AB, na.rm = TRUE),
+ tHR = sum(HR, na.rm = TRUE),
+ tSO = sum(SO, na.rm = TRUE)) -> long_careers
> Batting_5000 <- filter(long_careers, tAB >= 5000)
> print(as.data.frame(head(Batting_5000)))
   playerID   tAB tHR  tSO
1 aaronha01 12364 755 1383
2 abreubo01  8480 288 1840
3 abreujo02  5494 261 1218
4 adamssp01  5557   9  223
5 adcocjo01  6606 336 1059
6 alfoned01  5385 146  617
> ggplot(Batting_5000,
+ aes(x = tHR / tAB, y = tSO / tAB)) + geom_point() + geom_smooth() -> g
> print(g)
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
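The plot maps two per-at-bat rates, tHR / tAB on the x axis and tSO / tAB on the y axis, and computes them inside aes(). As a minimal sketch of what those expressions evaluate to, the snippet below computes the same rates as explicit columns in base R. It uses only the six rows printed by head(Batting_5000) above, typed in by hand, so it runs without the Lahman package. The names sample_careers, hr_rate and so_rate are introduced here for illustration and do not appear in the original code.

```r
# Assumption: these are just the six rows shown in the head() output above,
# entered by hand; the real Batting_5000 table has many more players.
sample_careers <- data.frame(
  playerID = c("aaronha01", "abreubo01", "abreujo02",
               "adamssp01", "adcocjo01", "alfoned01"),
  tAB = c(12364, 8480, 5494, 5557, 6606, 5385),
  tHR = c(755, 288, 261, 9, 336, 146),
  tSO = c(1383, 1840, 1218, 223, 1059, 617)
)

# Rates per at-bat: the same quantities that aes() maps to x and y.
# hr_rate and so_rate are hypothetical column names used only in this sketch.
sample_careers$hr_rate <- sample_careers$tHR / sample_careers$tAB
sample_careers$so_rate <- sample_careers$tSO / sample_careers$tAB

# Show each player's position in the scatter plot as (hr_rate, so_rate).
print(sample_careers[, c("playerID", "hr_rate", "so_rate")], digits = 3)
```

Storing the rates as columns in the full table would let the plot call use aes(x = hr_rate, y = so_rate) directly, which also gives the axes shorter default labels.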

