diff --git a/HOWTO.md b/HOWTO.md index 123647c..e5ef6d5 100644 --- a/HOWTO.md +++ b/HOWTO.md @@ -1,16 +1,17 @@ +### Disclaimer + +Like many other debugging tools, **bpfvv** may help you better understand **what** is happening with the verification of your BPF program but it is up to you to figure out **why** it is happening. + # How to use bpfvv -> [!WARNING] -> The bpfvv app is in early stages of development, and you should expect -> bugs, UI inconveniences and significant changes from week to week. -> -> If you're working with BPF and you think this tool (or a better -> version of it) would be useful, feel free to use it and don't be shy -> to report issues and request features via github. Thanks! +The tool itself is hosted here: https://libbpf.github.io/bpfvv/ -Go here: https://libbpf.github.io/bpfvv/ +You can load a log by pasting it into the text box or choosing a local file. -Load a log by pasting it into the text box or choosing a file. +You can also use the `url` query parameter to link to a raw log file, for example: +``` +https://libbpf.github.io/bpfvv/?url=https://gist.githubusercontent.com/theihor/e0002c119414e6b40e2192bd7ced01b1/raw/866bcc155c2ce848dcd4bc7fd043a97f39a2d370/gistfile1.txt +``` The app expects BPF verifier log of `BPF_LOG_LEVEL1`[^1]. This is a log that you get when your BPF program has failed verification on load @@ -42,14 +43,32 @@ lot of information about the interpreted state of the program on each instruction. The app parses the log and re-constructs program states in order to display potentially useful information in interactive way. 
-There are two main views of the program: -* (on the left) formatted log, instruction stream -* (on the right) program state: known values of registers and stack slots -Screenshot 2025-04-25 at 4 19 21 PM +## UI overview + +There are three main views of the program: +* (on the left) C source view +* (in the middle) interactive instruction stream +* (on the right) program state: known values of registers and stack slots for the *selected log line* + +The left and right views are collapsible -## What's in the log +https://github.com/user-attachments/assets/758d650b-22f1-49f0-ab46-ae1a089667a8 -Notice that the displayed text has different content than the raw log. +### Top bar + +The top bar contains basic app controls such as: +* clear current log +* load an example log +* load a local file +* link to this howto doc + +https://github.com/user-attachments/assets/4d3f8aa0-cb9d-46e0-ae46-a1224c7a5600 + +### The instruction stream + +The main view of the log is the interactive instruction stream. + +Notice that the displayed text has content different from the raw log. For example, consider this line: ``` 1: (7b) *(u64 *)(r10 -24) = r2 ; R2_w=1 R10=fp0 fp-24_w=1 @@ -70,157 +89,152 @@ interactive features. Notable example is call instructions. For example, consider the following raw log line: ``` -23: (85) call bpf_map_lookup_elem#1 ; R0=map_value_or_null(id=3,map=eventmap,ks=4,vs=2452) +7: (85) call bpf_probe_read_user#112 ``` It is displayed like this: ``` -r0 = call bpf_map_lookup_elem#1(r1, r2, r3, r4, r5) +r0 = bpf_probe_read_user(dst: r1, size: r2, unsafe_ptr: r3) ``` +If bpfvv is aware of a helper signature, it knows the number and names of arguments and displays them in the format `name: reg`. +For known helpers its name is also a link to documentation for that helper. + Notice also that the lines not recognized by the parser are greyed out. If you notice an unrecognized instruction, please submit a bug report. 
-### Subprogram calls +#### Data dependencies + +The app computes a use-def analysis [^2] and you can interactively view dependencies between the instructions. + +The concept is simple. Every instruction may read some slots (registers, stack, memory) and write to others. +Knowing these it is possible to determine, for a given slot, where its value came from, from what slot, and at what instruction. + +You can view the results of this analysis by clicking on some instruction operands (registers and stack slots). + +The selected slot is identified by a box around it. This selection changes the log view, greying out "irrelevant" instructions, and leaving only data-dependent instructions in the foreground. + +On the left side of the instruction stream are the lines visualizing the dependencies. The lines are interactive and can be used for navigation. + +https://github.com/user-attachments/assets/82ae80d6-314e-47bf-9892-f5dded4b9944 + +#### Subprogram calls When there is a subprogram call in the log instruction stream, the -stack frames are tracked by the app when computing state. When subprogram -call is detected there is indentation and comments in the main log view to -visualize it. +stack frames are tracked by the app when computing state. When a subprogram +call is detected it is visualized in the main log view. -Screenshot 2025-04-25 at 4 35 14 PM +https://github.com/user-attachments/assets/14b2302e-9814-4d9a-ae94-e176727fd11a +### The state panel -## What can you do? +The state panel displays the current state of the program based on the loaded log, with the current state determined by the line selected in the instruction stream view. -### Step through the instruction stream +Remember that the verifier log is a trace through the program. +This means that a particular instruction may be visited more than once, and the state at the same instruction (but a different point of execution) is usually also different. 
And so a log line roughly represents a particular point of the program execution, as interpreted by the BPF verifier. -The most basic feature of the visualizer is "stepping" through the -log, similar to what you'd do in a debugger. +The verifier reports changes in the program state like this: +``` +1: (7b) *(u64 *)(r10 -24) = r2 ; R2_w=1 R10=fp0 fp-24_w=1 +``` +After the semicolon `;`, there are expressions showing relevant register and stack slot states. The visualizer accumulates this information from all the prior instructions, and in the state panel this accumulated state is displayed. -You can select a line by clicking on it, or by navigating with arrows -(you can also use pgup, pgdown, home and end). The selected line has -light-blue background. +The header of the state panel shows the context of the state: log line number, C line number, program counter (PC) and the stack frame index. -When a line is selected, current state of known values is displayed in -the panel on the right. By moving the selected line up/down the log, -you can see how the values change with each instruction. +The known values of the registers and stack slots are displayed in a table. -In the "state panel", the values that are written by selected -instruction are marked with light-red background and the previous -value is also often displayed, for example: +The background color of a row in the state panel indicates that the relevant value has been affected by the selected instruction. +Rows marked with a red background indicate a "write" and the previous value is also often displayed, for example: ``` r6 scalar(id=1) -> 0 ``` -Means that current instruction changes the value of `r6` from -`scalar(id=1)` to `0`. +This means that the current instruction changes the value of `r6` from `scalar(id=1)` to `0`. -The values that are read by current instruction have light-green -background. +The values that are read by the current instruction have a blue background. 
-Note that for "update" instructions (such as `r1 += 8`), the slot -will be marked as written. +Note that for "update" instructions (such as `r1 += 8`), the slot will be marked as written. -#### Sometimes a value of a slot has changed, but it's not highlighted as a write. Is that a bug? +This then allows you to "step through" the instruction stream and watch how the values are changing, similar to classic debugger UIs. +You can click on the lines that interest you, or use arrow keys to navigate. -Currently the visualizer only considers writes derived from the instructions -themselves. For example, `r1 = r2` is a write by definition, or a call would -scratch some registers. +https://github.com/user-attachments/assets/c6b5b5b1-30fb-4309-a90a-1832a0a33502 -But remember that we are looking at the BPF verifier log. BPF verifier -simulates execution of a program, which requires maintaining and continuously -updating a virtual state of the program. This means that whenever the verifier -gains some knowledge about a value (which is not necesarily a write instruction), -it will update it. +#### The rows in the state panel are clickable! -For example when processing conditional jumps such as `if (r2 == 0) goto pc+6`, -the verifier usually explores both branches. But in both cases it gained information -about r2: it's either 0 or not. And so while there was no explicit write into r2, -it's value is known (and has changed) after the jump instruction, when you look at -it in the verifier log. +It is sometimes useful to jump to the source of a particular slot value from the selected instruction, even if the slot is not relevant to that instruction. -Going forward the visualizer will likely treat all value updates as writes, -as it is useful to know at what point verifier inferred a particular value. 
+https://github.com/user-attachments/assets/8f5d03cc-54a5-426b-8428-c8b11f4ccf11 -### View data dependencies +### The C source view -The app computes a use-def analysis [^2] and you can interactively -view dependencies between the instructions. +The C source view panel (on the left) shows reconstructed C source lines. -The concept is simple. Every instruction may read some slots -(registers, stack, memory) and write to others. Knowing these sets -(verifier log contains enough information to compute them), it is -possible to determine for a slot used by current instruction, where -its value came from (from what slot in what instruction). +A raw verifier log might contain source line information, and bpfvv attempts to reconstruct the source code and associate it with the instructions. +Here is how it looks in the raw log: +``` +1800: R1=scalar() R10=fp0 +; int rb_print(rbtree_t __arg_arena *rbtree) @ rbtree.bpf.c:507 +1800: (b4) w0 = -22 ; R0_w=0xffffffea +; if (unlikely(!rbtree)) @ rbtree.bpf.c:517 +1801: (15) if r1 == 0x0 goto pc+132 ; R1=scalar(umin=1) +``` -You can view the results of this analysis by clicking on some -instruction operands (registers and stack slots). +The original source code is not available in the log of course. So bpfvv doesn't have enough information to even format it properly. -The selected slot is identified by a box. This selection changes the -log view, greying out "irrelevant" instructions, and leaving only -data-dependent instructions in the foreground. -Screenshot 2025-04-25 at 4 50 11 PM +However, it allows you to see a rough correspondence between BPF instructions and the original C source code. -#### What's clickable? 
+Be aware though that this information is noisy and may be inaccurate, since it took a long path to reach the visualizer: +* the compiler generated DWARF with line info, which is already "best-effort" +* DWARF was transformed into BTF with line data +* BTF was processed by the verifier and available information was dumped interleaved with the program trace -Registers r0-r9 and explicit stack accesses such as `*(u32 *)(r10 -8)`. +https://github.com/user-attachments/assets/3e8c52f0-3823-4d5f-abbd-f7c2d8e31d19 -r10 (stack frame pointer) is not clickable because it's effectively a -constant [^3]. +### The bottom panel -Note that the stack slots may be accessed indirectly: if say `r6 = fp-64` -and then you do `*(u32 *)(r6 -8)` it's equivalent to `*(u32 *)(r10 -72)`. -The visualizer does not show such dependencies (yet). Although state values -are tracked correctly. +The bottom panel shows the original log text for the selected line and for the currently hovered line. +It is sometimes useful to check the source of the information displayed by the visualizer. -#### How deep is the displayed dependency chain? -It depends, but usually not deep. +## Not frequently asked questions -The problem with showing all dependencies is that it's too much -information, which renders it useless. +### What exactly do "read" and "written" values mean here? -Currently the upstream instruction is highlighted if it's an -unambiguous dependency. For example: -``` -42: r1 = 13 -43: r7 = 0 -44: r2 = r1 -``` +Here are a couple of trivial examples: +* `r1 = 0` this is a write to `r1` +* `r2 = r3` this is a read of `r3` and write to `r2` +* `r2 += 1` this is a read of `r2` and write to `r2`, aka an update -Instruction 42 is an unambiguous dependency of instruction 44, because -r1 is the only read slot, and there were no modifications to it along -the way. 
+Here are a couple of more complicated examples: +* `*(u64 *)(r10 -32) = r1` this is a read of `r1` and a write to `fp-32` + * `r10` is effectively constant[^3], as it is always a pointer to the top of a BPF stack frame, so stores to `r10-offset` are writes to the stack slots, identified by `fp-off` or `fp[frame]-off` in the visualizer +* `r1 = *(u64 *)(r2 -8)` this is a write to `r1` and a read of `r2`, however it may also be a read of the stack, if `r2` happens to contain a pointer to the stack slot -All such direct dependencies up the chain are shown. +Most instructions have intrinsic "read" and "write" sets, defined by their semantics. However, context also matters, as you can see from the last example. -However, when more than one value is read in the upstream instruction, -the UI will stop highlighting at that instruction. +The visualizer takes a few important things into account when determining data dependencies: +* it is aware of scratch and callee-saved register semantics of subprogram/helper calls +* it is aware of the stack frames: we enter new stack memory in a subprogram, and pop back on exit +* it is aware of indirect stack slot access and basic pointer arithmetic -Consider an example: -``` -42: r1 = r2 -43: r3 = *(u32 *)(r10 -16) -44: r1 += r3 -45: *(u32 *)(r10 -64) = r1 -``` +### Side effects? -If you select `r1` at instruction 45, only instruction 44 will be -highlighted, even though 42 and 43 are its transitive dependencies -(`r1 += r3` reads both `r1` and `r3`). +One counterintuitive thing about data dependencies in the context of BPF verification is that instructions which don't do any arithmetic or memory stores can still change the program state. -The reason for this UI behavior is that showing all dependencies (both -r1 and r3 and in turn all their dependencies) may very quickly cover -most of the instructions. This is especially true for call -instructions, which read up to 5 registers. +Remember, we are looking at the BPF verifier log. 
+The BPF verifier simulates the execution of a program, which requires maintaining a virtual state of the program. +This means that whenever the verifier gains some knowledge about a value (which is not necessarily an intrinsic write instruction), it will update the program state. -On the other hand the app can't know what the user is looking for, and -there is no point in guessing. So, for an instruction like `r1 += r3`, -the user must choose specific operand (r1 or r3 in this case) to -expand the dependency chain further. +For example, when processing conditional jumps such as `if (r2 == 0) goto pc+6`, +the verifier usually explores both branches. But in both cases it gained information +about `r2`: it's either 0 or not. And so while there was no explicit write into `r2`, +its value is known (and has changed) after the jump instruction, when you look at +it in the verifier log. -#### Note on memory stores and loads +https://github.com/user-attachments/assets/94d271e2-f033-439b-8554-d9f8a66b4143 + +### What if we write to memory or a BPF arena? Currently non-stack memory access is a "black hole" from the point of view of use-def analysis in this app. The reason is that it's @@ -235,6 +249,28 @@ dependencies. If you see `*(u32 *)(r8 +0)` down the instruction stream, even if value of r8 hasn't changed, the analysis does not recognize these slots as "the same". +**Unless** `r8` contains a pointer to a stack slot. +In that case you can click both on the register to see where its value came from, and on the dereference expression to see where the stack slot value came from. + +https://github.com/user-attachments/assets/f345ec63-b91d-411c-b1d2-3890ed8f1c99 + +### An instruction is highlighted as a dependency, but I don't understand why. Is that a bug? + +Probably not[^4]. + +The visualizer has a single source of information: the verifier log. +The log contains two streams of information: the instructions and the associated state change, as reported by the verifier. 
+ +Some of the state that the visualizer computes is derived from the instructions themselves. +However, the state reported by the verifier always takes precedence. + +Since the values in the context of the visualizer are just strings, if the verifier reported a slightly different string, we treat it as an update. +For example, you might see something like this: +``` +r8 ptr_or_null_node_data(id=9,ref_obj_id=9,off=16) -> ptr_node_data(ref_obj_id=9,off=16) +``` + +The verifier reported a different value, and that's what bpfvv shows. ## Footnotes @@ -247,3 +283,5 @@ the browser will not be happy to render it. [^2]: https://en.wikipedia.org/wiki/Use-define_chain [^3]: https://docs.cilium.io/en/latest/reference-guides/bpf/architecture/ + +[^4]: But maybe yes... If you suspect a bug, please report. diff --git a/README.md b/README.md index 8a828e4..b60daa4 100644 --- a/README.md +++ b/README.md @@ -1,21 +1,17 @@ -> [!WARNING] -> The bpfvv app is in early stages of development, and you should expect -> bugs, UI inconveniences and significant changes from week to week. -> -> If you're working with BPF and you think this tool (or a better -> version of it) would be useful, feel free to use it and don't be shy -> to report issues and request features via github. Thanks! - [![CI](https://github.com/libbpf/bpfvv/actions/workflows/ci.yml/badge.svg)](https://github.com/libbpf/bpfvv/actions/workflows/ci.yml) **bpfvv** stands for BPF Verifier Visualizer https://libbpf.github.io/bpfvv/ -This project is an experiment about visualizing Linux Kernel BPF verifier log to help BPF programmers with debugging verification failures. +BPF Verifier Visualizer is a tool to analyze Linux Kernel BPF verifier logs. + +The goal of bpfvv is to help BPF programmers debug verification failures. The user can load a text file, and the app will attempt to parse it as a verifier log. Successfully parsed lines produce a state which is then visualized in the UI. 
You can think of this as a primitive debugger UI, except it interprets a log and not a runtime state of a program. +For more information on how to use **bpfvv** see the [HOWTO.md](https://github.com/libbpf/bpfvv/blob/master/HOWTO.md) + ## Development - Fork the website repo: https://github.com/libbpf/bpfvv.git