eBPF on Kernel debugging
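Jumping on the bandwagon to learn one of the newest technologies, eBPF (extended Berkeley Packet Filter). I originally wanted to see whether it would help with debugging the kernel, but after a closer look I found that eBPF is aimed more at observing kernel behavior and collecting statistics about it, and is of limited help for kernel debugging. Since the time is already spent, here are some brief notes, focused mainly on the parts that are useful for debugging.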
Introduction
What eBPF can do
eBPF currently supports the program types checked first below; these notes only look at kprobe and tracepoint. The data structures (maps) through which a userspace app exchanges data with the in-kernel eBPF program are the BPF_MAP_TYPE_* entries listed after them.

bool is_socket = strncmp(event, "socket", 6) == 0; // a network packet filter
bool is_kprobe = strncmp(event, "kprobe/", 7) == 0; // determine whether a kprobe should fire or not
bool is_kretprobe = strncmp(event, "kretprobe/", 10) == 0; // determine whether a kretprobe should fire or not
bool is_tracepoint = strncmp(event, "tracepoint/", 11) == 0; // determine whether a tracepoint should fire or not
bool is_xdp = strncmp(event, "xdp", 3) == 0; // a network packet filter run from the device-driver receive path
bool is_perf_event = strncmp(event, "perf_event", 10) == 0; // determine whether a perf event handler should fire or not
bool is_cgroup_skb = strncmp(event, "cgroup/skb", 10) == 0; // a network packet filter for control groups
bool is_cgroup_sk = strncmp(event, "cgroup/sock", 11) == 0; // a network packet filter for control groups that is allowed to modify socket options
bool is_sockops = strncmp(event, "sockops", 7) == 0; // a program for setting socket parameters
bool is_sk_skb = strncmp(event, "sk_skb", 6) == 0; // a network packet filter for forwarding packets between sockets
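
The checks above pick a program type from the prefix of a name. The following is a minimal sketch of that prefix matching as a table lookup, assuming `event` holds a section-style name such as `kprobe/sys_execve`. The `prog_kind` enum, the `rules` table, and `classify_section` are hypothetical names introduced for illustration only; they are not part of the kernel or of bcc.

```c
#include <string.h>

/* Hypothetical labels for illustration; the kernel's own constants
 * are the BPF_PROG_TYPE_* values. */
enum prog_kind {
    PROG_UNKNOWN = 0,
    PROG_SOCKET,
    PROG_KPROBE,
    PROG_KRETPROBE,
    PROG_TRACEPOINT,
    PROG_XDP,
    PROG_PERF_EVENT,
    PROG_CGROUP_SKB,
    PROG_CGROUP_SOCK,
    PROG_SOCKOPS,
    PROG_SK_SKB
};

struct prefix_rule {
    const char *prefix;
    enum prog_kind kind;
};

/* Same prefixes as the strncmp() checks in the listing above. */
static const struct prefix_rule rules[] = {
    { "socket",      PROG_SOCKET },
    { "kprobe/",     PROG_KPROBE },
    { "kretprobe/",  PROG_KRETPROBE },
    { "tracepoint/", PROG_TRACEPOINT },
    { "xdp",         PROG_XDP },
    { "perf_event",  PROG_PERF_EVENT },
    { "cgroup/skb",  PROG_CGROUP_SKB },
    { "cgroup/sock", PROG_CGROUP_SOCK },
    { "sockops",     PROG_SOCKOPS },
    { "sk_skb",      PROG_SK_SKB },
};

/* Return the kind whose prefix matches the start of `event`,
 * or PROG_UNKNOWN when no prefix matches. */
enum prog_kind classify_section(const char *event)
{
    size_t i;

    for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
        if (strncmp(event, rules[i].prefix, strlen(rules[i].prefix)) == 0)
            return rules[i].kind;
    }
    return PROG_UNKNOWN;
}
```

For the two types these notes care about, `classify_section("kprobe/sys_execve")` yields `PROG_KPROBE` and `classify_section("tracepoint/block/block_rq_complete")` yields `PROG_TRACEPOINT`.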

BPF_MAP_TYPE_HASH: a hash table
BPF_MAP_TYPE_ARRAY: an array map, optimized for fast lookup speeds, often used for counters
BPF_MAP_TYPE_PROG_ARRAY: an array of file descriptors corresponding to eBPF programs; used to implement jump tables and sub-programs to handle specific packet protocols
BPF_MAP_TYPE_PERCPU_ARRAY: a per-CPU array, used to implement histograms of latency
BPF_MAP_TYPE_PERF_EVENT_ARRAY: stores pointers to struct perf_event, used to read and store perf event counters
BPF_MAP_TYPE_CGROUP_ARRAY: stores pointers to control groups
BPF_MAP_TYPE_PERCPU_HASH: a per-CPU hash table
BPF_MAP_TYPE_LRU_HASH: a hash table that only retains the most recently used items
BPF_MAP_TYPE_LRU_PERCPU_HASH: a per-CPU hash table that only retains the most recently used items
BPF_MAP_TYPE_LPM_TRIE: a longest-prefix match trie, good for matching IP addresses to a range
BPF_MAP_TYPE_STACK_TRACE: stores stack traces
BPF_MAP_TYPE_ARRAY_OF_MAPS: a map-in-map data structure
BPF_MAP_TYPE_HASH_OF_MAPS: a map-in-map data structure
BPF_MAP_TYPE_DEVMAP: for storing and looking up network device references
BPF_MAP_TYPE_SOCKMAP: stores and looks up sockets and allows socket redirection with BPF helper functions
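
To make the longest-prefix-match entry in the list above more concrete, here is a small userspace sketch of the lookup semantics only. It is not the kernel's trie: it uses a plain linear scan over a rule table, and `cidr_rule`, `prefix_mask`, and `lpm_lookup` are hypothetical names invented for this illustration. Addresses are assumed to be IPv4 in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* One rule: an IPv4 network (host byte order), its prefix length,
 * and an arbitrary value attached to it. */
struct cidr_rule {
    uint32_t addr;
    uint8_t prefixlen;   /* 0..32 */
    int value;
};

/* Mask with the top `prefixlen` bits set; a zero-length prefix matches all. */
static uint32_t prefix_mask(uint8_t prefixlen)
{
    return prefixlen == 0 ? 0u : 0xffffffffu << (32 - prefixlen);
}

/* Return the matching rule with the longest prefix, or NULL if none matches.
 * Linear scan for clarity; a real trie avoids visiting every rule. */
const struct cidr_rule *lpm_lookup(const struct cidr_rule *rules, size_t n,
                                   uint32_t ip)
{
    const struct cidr_rule *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        uint32_t mask;

        if (rules[i].prefixlen > 32)
            continue;   /* ignore malformed rules */
        mask = prefix_mask(rules[i].prefixlen);
        if ((ip & mask) != (rules[i].addr & mask))
            continue;
        if (best == NULL || rules[i].prefixlen > best->prefixlen)
            best = &rules[i];
    }
    return best;
}
```

With rules for 10.0.0.0/8 and 10.1.0.0/16, an address such as 10.1.2.3 matches both, and the /16 rule wins because its prefix is longer.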
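Why use eBPF
- Q. A kprobe kernel module can be written directly, so why wrap it in eBPF?
- A. eBPF code runs in an in-kernel virtual machine and is checked by a verifier before it is loaded, so it should not panic the running kernel.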
Useful tools
bcc: installation instructions can be found under BPF Compiler Collection (BCC) in the references below; alternatively, install the snap package directly.

snap install bcc
# and the tools are prefixed by bcc, e.g. sudo bcc.biotop

/*
* This program traces functions and frequency counts them with their entire
* stack trace, summarized in-kernel for efficiency.
*/
sudo /usr/share/bcc/tools/stackcount -K hrtimer_init_sleeper

/*
* trace probes functions you specify and displays trace messages if a particular
* condition is met. You can control the message format to display function
* arguments and return values.
*/
/*
* "retval": "PT_REGS_RC(ctx)",
* "arg1": "PT_REGS_PARM1(ctx)",
* "arg2": "PT_REGS_PARM2(ctx)",
* "arg3": "PT_REGS_PARM3(ctx)",
* "arg4": "PT_REGS_PARM4(ctx)",
* "arg5": "PT_REGS_PARM5(ctx)",
* "arg6": "PT_REGS_PARM6(ctx)",
* "$uid": "(unsigned)(bpf_get_current_uid_gid() & 0xffffffff)",
* "$gid": "(unsigned)(bpf_get_current_uid_gid() >> 32)",
* "$pid": "(unsigned)(bpf_get_current_pid_tgid() & 0xffffffff)",
* "$tgid": "(unsigned)(bpf_get_current_pid_tgid() >> 32)",
* "$cpu": "bpf_get_smp_processor_id()"
*/
sudo /usr/share/bcc/tools/trace '::sys_execve "%s", arg1'
PID COMM FUNC -
4402 bash sys_execve /usr/bin/man
4411 man sys_execve /usr/local/bin/less
4411 man sys_execve /usr/bin/less
4410 man sys_execve /usr/local/bin/nroff
4410 man sys_execve /usr/bin/nroff
4409 man sys_execve /usr/local/bin/tbl
4409 man sys_execve /usr/bin/tbl
4408 man sys_execve /usr/local/bin/preconv
4408 man sys_execve /usr/bin/preconv
4415 nroff sys_execve /usr/bin/locale
4416 nroff sys_execve /usr/bin/groff
4418 groff sys_execve /usr/bin/grotty
4417 groff sys_execve /usr/bin/troff
^C
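
The `$pid`, `$tgid`, `$uid`, and `$gid` aliases listed in the comment above are plain bit operations on 64-bit values. The sketch below reproduces that unpacking in ordinary C so the layout is easy to see; `task_ids`, `cred_ids`, `split_pid_tgid`, and `split_uid_gid` are hypothetical names for illustration, and the input values are assumed to have the same layout as the helper return values named in the comment. Note that the kernel's tgid is what userspace normally calls the process ID, while the kernel's pid identifies a single thread.

```c
#include <stdint.h>

struct task_ids {
    uint32_t pid;    /* lower 32 bits: kernel pid (thread ID) */
    uint32_t tgid;   /* upper 32 bits: thread group ID (process ID) */
};

struct cred_ids {
    uint32_t uid;    /* lower 32 bits */
    uint32_t gid;    /* upper 32 bits */
};

/* Split a value laid out like the result of bpf_get_current_pid_tgid(). */
struct task_ids split_pid_tgid(uint64_t pid_tgid)
{
    struct task_ids ids;

    ids.pid = (uint32_t)(pid_tgid & 0xffffffffu);
    ids.tgid = (uint32_t)(pid_tgid >> 32);
    return ids;
}

/* Split a value laid out like the result of bpf_get_current_uid_gid(). */
struct cred_ids split_uid_gid(uint64_t uid_gid)
{
    struct cred_ids ids;

    ids.uid = (uint32_t)(uid_gid & 0xffffffffu);
    ids.gid = (uint32_t)(uid_gid >> 32);
    return ids;
}
```

For a single-threaded process the two halves of the pid/tgid value are equal; they differ only for the additional threads of a multi-threaded process.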

sudo /usr/share/bcc/tools/trace 't:block:block_rq_complete "sectors=%d", args->nr_sector' -T
TIME PID COMM FUNC -
01:23:51 0 swapper/0 block_rq_complete sectors=8
01:23:55 10017 kworker/u64: block_rq_complete sectors=1
01:23:55 0 swapper/0 block_rq_complete sectors=8
^C

sudo /usr/share/bcc/tools/trace 'r::__kmalloc (retval == 0) "kmalloc failed!"'
# Trace returns from __kmalloc which returned a null pointer

/*
* This program traces functions, tracepoints, or USDT probes that match a
* specified pattern, and when Ctrl-C is hit prints a summary of their count
* while tracing.
*/
sudo /usr/share/bcc/tools/funccount 'vfs_*'
Tracing... Ctrl-C to end.
^C
FUNC COUNT
vfs_create 1
vfs_rename 1
vfs_fsync_range 2
vfs_lock_file 30
vfs_fstatat 152
vfs_fstat 154
vfs_write 166
vfs_getattr_nosec 262
vfs_getattr 262
vfs_open 264
vfs_read 470
Detaching...

sudo /usr/share/bcc/tools/funccount 't:block:*'
Tracing 19 functions for "t:block:*"... Hit Ctrl-C to end.
^C
FUNC COUNT
block:block_rq_complete 7
block:block_rq_issue 7
block:block_getrq 7
block:block_rq_insert 7
Detaching...
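
The funccount runs above select functions with a wildcard such as `vfs_*`. As a rough userspace illustration of what such a pattern selects, the sketch below counts the names in a list that match a shell-style wildcard. It uses POSIX `fnmatch()` as a stand-in; how funccount itself performs the matching is not covered in these notes, so this is an assumption about the observable behavior, not a description of the tool's internals. `count_matches` is a hypothetical helper.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Count how many of the `n` names match the shell-style `pattern`. */
size_t count_matches(const char *pattern, const char *const names[], size_t n)
{
    size_t i;
    size_t hits = 0;

    for (i = 0; i < n; i++) {
        /* fnmatch() returns 0 when the name matches the pattern. */
        if (fnmatch(pattern, names[i], 0) == 0)
            hits++;
    }
    return hits;
}
```

Given the names `vfs_read`, `vfs_write`, `do_sys_open`, and `vfs_open`, the pattern `vfs_*` selects three of the four.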

/*
* This program traces hard interrupts (irqs), and stores timing statistics
* in-kernel for efficiency.
*/
sudo /usr/share/bcc/tools/hardirqs
# -d : distribution histogram
sudo /usr/share/bcc/tools/hardirqs -d

/*
* This program traces soft interrupts (irqs), and stores timing statistics
* in-kernel for efficiency.
*/
sudo /usr/share/bcc/tools/softirqs
# -d : distribution histogram
sudo /usr/share/bcc/tools/softirqs -d
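
The `-d` option of hardirqs and softirqs prints a distribution histogram instead of totals. A common way to keep such a distribution cheaply is power-of-two bucketing, where each sample increments the counter for floor(log2(value)). The sketch below shows that idea in plain C; it is assumed to be similar in spirit to what the tools do, not a copy of their code, and `HIST_SLOTS`, `log2_slot`, and `hist_add` are hypothetical names.

```c
#include <stdint.h>

#define HIST_SLOTS 64   /* one slot per possible bit position of a 64-bit value */

/* Return floor(log2(v)) for v >= 1, and 0 for v == 0.
 * Slot 0 covers 0..1, slot 1 covers 2..3, slot 2 covers 4..7, and so on. */
unsigned int log2_slot(uint64_t v)
{
    unsigned int slot = 0;

    while (v > 1) {
        v >>= 1;
        slot++;
    }
    return slot;
}

/* Record one sample (for example a latency) in the histogram. */
void hist_add(uint64_t hist[HIST_SLOTS], uint64_t value)
{
    hist[log2_slot(value)]++;
}
```

Only the small array of counters has to be kept and read out, no matter how many samples are recorded, which is the point of summarizing in-kernel.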
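References
- eBPF 簡史 (A Brief History of eBPF) - a very comprehensive Chinese-language introduction to eBPF.
- Linux Enhanced BPF (eBPF) Tracing Tools - the home base for eBPF; all of the eBPF learning resources can be found here.
- BPF Compiler Collection (BCC) - a toolkit written in Python that neatly integrates the in-kernel program and the userspace app into Python, greatly lowering the barrier to using eBPF.
- bcc Tutorial - detailed descriptions of the BCC tools.
- bcc Reference Guide - the BCC API documentation.
- A dynamic tracer for Linux - one-liner style programs, more compact than BCC; it can currently use kprobe/kretprobe and other debugging facilities provided by the kernel.
- BPF samples in Kernel - a collection of BPF examples kept in the Linux kernel tree.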