[R]Apacheのログファイルの日付列から日付時刻型ベクトルを作成する
tidyverseパッケージを読み込むと読み込まれるlubridateパッケージのparse_date_time関数を使う。以下は15行からなるサンプルのApacheのログファイルを読み込み、日付列(4列目)を日付時刻型ベクトルに変換した例。
> library(tidyverse)
> options(readr.show_progress = FALSE, readr.show_col_types = FALSE)
> tib <- read_log("access.log")
> dim(tib)
[1] 15 9
> tib[, 1:6] |> as.data.frame()
X1 X2 X3 X4 X5 X6
1 192.168.12.2 NA NA 06/Dec/2024:23:51:07 +0900 GET / HTTP/1.1 200
2 192.168.12.2 NA NA 06/Dec/2024:23:51:07 +0900 GET /icons/ubuntu-logo.png HTTP/1.1 200
3 192.168.12.2 NA NA 06/Dec/2024:23:51:07 +0900 GET /favicon.ico HTTP/1.1 404
4 192.168.12.4 NA NA 06/Dec/2024:23:53:49 +0900 GET / HTTP/1.1 200
5 192.168.12.4 NA NA 06/Dec/2024:23:53:49 +0900 GET /icons/ubuntu-logo.png HTTP/1.1 200
6 192.168.12.4 NA NA 06/Dec/2024:23:53:49 +0900 GET /favicon.ico HTTP/1.1 404
7 192.168.12.4 NA NA 06/Dec/2024:23:54:23 +0900 GET / HTTP/1.1 200
8 192.168.12.4 NA NA 06/Dec/2024:23:54:23 +0900 GET /icons/ubuntu-logo.png HTTP/1.1 200
9 192.168.12.4 NA NA 06/Dec/2024:23:54:23 +0900 GET /favicon.ico HTTP/1.1 404
10 192.168.12.4 NA NA 06/Dec/2024:23:55:14 +0900 408
11 192.168.12.4 NA NA 06/Dec/2024:23:56:56 +0900 GET / HTTP/1.1 200
12 192.168.12.4 NA NA 06/Dec/2024:23:56:56 +0900 GET /icons/ubuntu-logo.png HTTP/1.1 200
13 192.168.12.4 NA NA 06/Dec/2024:23:56:56 +0900 GET /favicon.ico HTTP/1.1 404
14 192.168.12.4 NA NA 06/Dec/2024:23:57:04 +0900 GET / HTTP/1.1 200
15 192.168.12.4 NA NA 06/Dec/2024:23:57:55 +0900 408
> Sys.timezone()
[1] "Asia/Tokyo"
> parse_date_time(tib$X4, "%d/%m/%Y:%H:%M:%S %z")
[1] "2024-12-06 14:51:07 UTC" "2024-12-06 14:51:07 UTC" "2024-12-06 14:51:07 UTC"
[4] "2024-12-06 14:53:49 UTC" "2024-12-06 14:53:49 UTC" "2024-12-06 14:53:49 UTC"
[7] "2024-12-06 14:54:23 UTC" "2024-12-06 14:54:23 UTC" "2024-12-06 14:54:23 UTC"
[10] "2024-12-06 14:55:14 UTC" "2024-12-06 14:56:56 UTC" "2024-12-06 14:56:56 UTC"
[13] "2024-12-06 14:56:56 UTC" "2024-12-06 14:57:04 UTC" "2024-12-06 14:57:55 UTC"
> parse_date_time(tib$X4, "%d/%m/%Y:%H:%M:%S %z", tz = "Asia/Tokyo")
Date in ISO8601 format; converting timezone from UTC to "Asia/Tokyo".
[1] "2024-12-06 23:51:07 JST" "2024-12-06 23:51:07 JST" "2024-12-06 23:51:07 JST"
[4] "2024-12-06 23:53:49 JST" "2024-12-06 23:53:49 JST" "2024-12-06 23:53:49 JST"
[7] "2024-12-06 23:54:23 JST" "2024-12-06 23:54:23 JST" "2024-12-06 23:54:23 JST"
[10] "2024-12-06 23:55:14 JST" "2024-12-06 23:56:56 JST" "2024-12-06 23:56:56 JST"
[13] "2024-12-06 23:56:56 JST" "2024-12-06 23:57:04 JST" "2024-12-06 23:57:55 JST"
> dttm <- parse_date_time(tib$X4, "%d/%m/%Y:%H:%M:%S %z", tz = "Asia/Tokyo")
Date in ISO8601 format; converting timezone from UTC to "Asia/Tokyo".
> class(dttm)
[1] "POSIXct" "POSIXt"
上記のとおりparse_date_time関数は読み込んだ日付時刻を強制的にUTCと判断して変換する。そのため、読み込む際には明示的にタイムゾーンを指定する必要がある。変換された日付時刻型ベクトルのクラスはPOSIXctである。