Beginning with eBPF - Hello World and What Actually Happens When You Trace a Syscall

                
Christian Lehnert • 2026-05-12 • ~10 min read

Why eBPF and Why Now

eBPF is the most consequential change to the Linux kernel in the last
decade, and most engineers I talk to outside the networking and
observability worlds have not yet figured out why they should care. The
shorthand is this. eBPF lets you run sandboxed programs inside the
kernel, attached to specific events such as syscall entries, kernel
function calls, network packet arrivals, or scheduler decisions, and
collect or act on data with overhead measured in tens of nanoseconds.
The programs cannot crash the kernel because they are verified by a
checker before they load, they cannot loop forever because the verifier
proves termination, and they cannot read or write arbitrary memory
because the verifier proves bounded access.

The result is a programmable kernel without the risks of writing a
kernel module. The use cases that have moved into production already
include Cilium for Kubernetes networking, which has replaced iptables
in much of the cloud-native world; Falco for runtime security
monitoring; bpftrace for ad-hoc performance investigation; and a
growing ecosystem of observability tools that attach to running
production systems without restarts.

The career relevance for any serious Linux engineer in 2026 is
straightforward. The next decade of Linux operations will be built on
this technology. The engineers who learn it now will spend the rest of
the decade reading and writing eBPF programs as routinely as they
currently read and write shell scripts. The engineers who avoid it
will be at the mercy of tools they cannot inspect.

This post is the first step. The canonical hello world. What it does,
how it works, and what is actually happening inside the kernel when
you run it.

The Code

The hello world is small enough to read in one screen. One file, two
languages. A Python wrapper that handles loading and a string of
restricted C that runs in the kernel.

#!/usr/bin/python3
from bcc import BPF

program = r"""
int hello(void *ctx) {
    bpf_trace_printk("Hello World!");
    return 0;
}
"""

b = BPF(text=program)
syscall = b.get_syscall_fnname("execve")
b.attach_kprobe(event=syscall, fn_name="hello")

print("eBPF program attached. Running... Press Ctrl+C to stop.")
b.trace_print()

Twelve lines of Python. Four lines of C. And the result is that every
time any process anywhere on the system calls the execve syscall, which
is what happens whenever you run any program, the kernel prints "Hello
World!" through a tracing pipe that the Python script reads and
displays.

On Debian or Ubuntu, the prerequisites install with one command.

sudo apt install bpfcc-tools linux-headers-$(uname -r) python3-bpfcc

And the program runs with root privileges, which are required because
loading eBPF programs is a privileged operation.

sudo python3 hello_ebpf.py

Open a second terminal. Run ls or date or any other command. The
first terminal prints Hello World! for each one. Press Ctrl+C to
detach the probe and exit.

The whole experience takes ninety seconds end to end. The implications
are larger.

What Actually Happens

The twelve lines of Python and four lines of C produce a sequence of
events at multiple layers of the system, and understanding that
sequence is the difference between using eBPF as a magic black box and
using it as a real engineering tool.

Step one. The C source is compiled to BPF bytecode by Clang inside
BCC at runtime. BCC is short for BPF Compiler Collection. It bundles
an LLVM and Clang frontend that takes C source as a string and
produces BPF bytecode for the kernel. This is the reason BCC requires
kernel headers on the host. The headers are needed by Clang to resolve
kernel type definitions referenced by your C code. It is also the
reason BCC programs feel slow to start, because they compile the C
source again on every invocation.
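
To make "BPF bytecode" less abstract: each instruction is a fixed
8-byte record holding an opcode, two 4-bit register fields, a 16-bit
offset, and a 32-bit immediate. The sketch below decodes two
instructions that are equivalent to the C statement return 0. The
bytes are hand-assembled for illustration and are not captured from
what BCC emits for the hello program, and the name decode_insn is my
own.

```python
import struct

# struct bpf_insn on a little-endian host:
# u8 opcode, u8 regs (dst = low nibble, src = high nibble), s16 off, s32 imm
INSN = struct.Struct("<BBhi")

def decode_insn(raw):
    """Decode one 8-byte BPF instruction into a dict of its fields."""
    opcode, regs, off, imm = INSN.unpack(raw)
    return {"opcode": opcode, "dst": regs & 0x0F, "src": regs >> 4,
            "off": off, "imm": imm}

# Hand-assembled equivalent of `return 0;` (illustrative, not BCC output).
bytecode = bytes.fromhex("b700000000000000"   # 0xb7: mov64 r0, 0
                         "9500000000000000")  # 0x95: exit

insns = [decode_insn(bytecode[i:i + 8]) for i in range(0, len(bytecode), 8)]
for insn in insns:
    print(insn)
```

To compare against the real thing, BCC's dump_func method on the BPF
object should hand back the compiled bytes for a named function such
as hello.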

Step two. The bytecode is loaded into the kernel through the bpf
syscall. Userspace cannot directly write to kernel memory. It can,
however, ask the kernel through a well-defined syscall to install a
program. The bpf syscall is the entry point for this. It takes the
bytecode and a description of what the program is supposed to do, and
the kernel takes ownership of validation and loading.
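
The "description of what the program is supposed to do" is a struct
that userspace fills in and passes to the syscall. The sketch below
builds the leading fields of that struct for a program load with
ctypes, as I understand the layout in linux/bpf.h. It stops short of
making the call. The class name ProgLoadAttr and the two-instruction
program are assumptions for illustration. BCC does all of this for
you through libbpf.

```python
import ctypes

BPF_PROG_LOAD = 5         # command number in <linux/bpf.h>
BPF_PROG_TYPE_KPROBE = 2  # program type in <linux/bpf.h>

class ProgLoadAttr(ctypes.Structure):
    """Leading fields of union bpf_attr as used by BPF_PROG_LOAD."""
    _fields_ = [
        ("prog_type", ctypes.c_uint32),
        ("insn_cnt", ctypes.c_uint32),
        ("insns", ctypes.c_uint64),      # pointer to the instruction array
        ("license", ctypes.c_uint64),    # pointer to a NUL-terminated string
        ("log_level", ctypes.c_uint32),
        ("log_size", ctypes.c_uint32),
        ("log_buf", ctypes.c_uint64),    # pointer to a buffer for verifier output
        ("kern_version", ctypes.c_uint32),
        ("prog_flags", ctypes.c_uint32),
    ]

# Hand-assembled `mov64 r0, 0; exit` (illustrative, not BCC output).
raw = bytes.fromhex("b700000000000000" "9500000000000000")
insns = ctypes.create_string_buffer(raw, len(raw))
license_buf = ctypes.create_string_buffer(b"GPL")
log_buf = ctypes.create_string_buffer(4096)

attr = ProgLoadAttr(
    prog_type=BPF_PROG_TYPE_KPROBE,
    insn_cnt=len(raw) // 8,
    insns=ctypes.addressof(insns),
    license=ctypes.addressof(license_buf),
    log_level=1,
    log_size=len(log_buf),
    log_buf=ctypes.addressof(log_buf),
)

# The real call would be syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr)).
# It is deliberately not made here: it needs privileges and a suitable kernel.
print(f"cmd={BPF_PROG_LOAD} prog_type={attr.prog_type} "
      f"insn_cnt={attr.insn_cnt} attr_size={ctypes.sizeof(attr)}")
```

The log buffer is where the verifier writes its explanation when it
rejects a program, which is why the load request carries it.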

Step three. The kernel BPF verifier examines the bytecode and proves
properties about it. This is the technical heart of eBPF and the
reason it can be trusted to run in kernel space at all. The verifier
walks every possible execution path through the program and proves
several things. It proves that the program terminates because there
are no unbounded loops. It proves that every memory access is in
bounds. It proves that the program stays within the fixed stack limit
of 512 bytes. It proves that the program does not call any function
outside a small allowed set called BPF helpers. If any of these proofs
fails, the kernel rejects the program and returns an error.

The verifier is doing genuinely sophisticated static analysis here.
The first time you write an eBPF program that the verifier rejects,
you see error messages that are unlike anything else in the Linux
ecosystem. The verifier explains in detail which register held what
value at which instruction and why the path was rejected. Learning to
read these messages is part of becoming fluent.
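
For a rough intuition about the termination check only: older
verifiers simply rejected any program that contained a backward jump.
The toy below applies that single rule to raw bytecode. It is not how
the kernel verifier is implemented. It tracks no register state, it
ignores 16-byte wide instructions, and it knows nothing about the
bounded loops that newer kernels accept. The function name toy_check
and both sample programs are made up for illustration.

```python
import struct

INSN = struct.Struct("<BBhi")  # opcode, regs, off, imm (struct bpf_insn)
BPF_JMP, BPF_JMP32 = 0x05, 0x06  # instruction classes (low 3 opcode bits)
BPF_CALL, BPF_EXIT = 0x85, 0x95  # JMP-class opcodes that are not jumps

def toy_check(bytecode):
    """Return a list of complaints; an empty list means the toy rules passed."""
    count = len(bytecode) // 8
    problems = []
    for pc in range(count):
        opcode, _regs, off, _imm = INSN.unpack_from(bytecode, pc * 8)
        in_jump_class = (opcode & 0x07) in (BPF_JMP, BPF_JMP32)
        if not in_jump_class or opcode in (BPF_CALL, BPF_EXIT):
            continue
        target = pc + 1 + off  # offsets are relative to the next instruction
        if not 0 <= target < count:
            problems.append(f"insn {pc}: jump out of range to {target}")
        elif target <= pc:
            problems.append(f"insn {pc}: back-edge to insn {target}")
    return problems

# mov64 r0, 0 ; exit
straight = bytes.fromhex("b700000000000000" "9500000000000000")
# mov64 r0, 0 ; ja -2 (jumps back to insn 0) ; exit
looping = bytes.fromhex("b700000000000000" "0500feff00000000" "9500000000000000")

print(toy_check(straight))
print(toy_check(looping))
```

The real verifier reports far more than this, but the shape is the
same: an instruction index and the reason the path was refused.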

Step four. If the verifier accepts the program, the kernel
JIT-compiles the BPF bytecode to native machine code for the host
architecture. On x86_64 and arm64 the JIT is enabled by default on
most distribution kernels. The result is that your eBPF program runs
at full native speed, not through an interpreter. This is the reason
eBPF overhead is measured in tens of nanoseconds rather than
microseconds.
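
Whether the JIT is active on a given machine is visible through the
net.core.bpf_jit_enable sysctl. A minimal sketch for reading it
follows. The function name jit_status is my own, and on kernels built
with the JIT always on the value is fixed at 1.

```python
from pathlib import Path

JIT_SYSCTL = Path("/proc/sys/net/core/bpf_jit_enable")

def jit_status(path=JIT_SYSCTL):
    """Describe the bpf_jit_enable sysctl, or report that it is unreadable."""
    meanings = {
        0: "disabled (interpreter)",
        1: "enabled",
        2: "enabled with debug output",
    }
    try:
        value = int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return "unknown (sysctl not readable)"
    return meanings.get(value, f"unexpected value {value}")

print("BPF JIT:", jit_status())
```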

Step five. The Python code calls attach_kprobe to register the
function as a kprobe handler for execve. A kprobe is a dynamic
instrumentation point that the kernel can place on almost every
internal function. Attaching a kprobe means asking the kernel to call
your function every time the named kernel function is entered. There
is no kernel rebuild and no reboot. The kprobes subsystem patches the
probed instruction in memory at runtime, either with a breakpoint or
through the ftrace hook at function entry, and restores the original
instruction when the probe is detached.
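
The "named kernel function" is not literally called execve. The
symbol name carries an architecture-specific prefix, which is why the
script asks get_syscall_fnname for it instead of hard-coding a name.
The sketch below imitates that lookup against text in the
/proc/kallsyms format. The prefix list is a shortened illustration
and not BCC's exact table, the addresses in the sample are invented,
and resolve_syscall_symbol is a hypothetical name.

```python
# Prefixes tried in order; illustrative, not BCC's exact table.
PREFIXES = ("__x64_sys_", "__arm64_sys_", "sys_")

def resolve_syscall_symbol(name, kallsyms_text):
    """Return the first symbol that matches a known syscall prefix plus name."""
    symbols = set()
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            symbols.add(fields[2])  # columns: address, type, symbol name
    for prefix in PREFIXES:
        if prefix + name in symbols:
            return prefix + name
    return None

# Invented sample lines in /proc/kallsyms format.
sample = """\
ffffffff812f1a40 T __x64_sys_execve
ffffffff812f1aa0 T __x64_sys_execveat
ffffffff810a2b10 T do_exit
"""

print(resolve_syscall_symbol("execve", sample))
```

On a real machine the same function can be fed the contents of
/proc/kallsyms.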

Step six. From then on, every execve syscall anywhere on the system
triggers your handler. The kernel reaches the execve implementation,
hits the kprobe, calls your verified and JIT-compiled function, and
then continues with the normal execve work. Your function calls
bpf_trace_printk, which is a BPF helper that writes a string to the
kernel trace buffer, readable at /sys/kernel/debug/tracing/trace_pipe.

Step seven. The Python script reads from the tracing pipe and
prints each line to stdout. The trace_print method is a thin loop
around reading that pipe. The strings that the kernel wrote during
each kprobe invocation flow out of the pipe into your terminal.
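
Each line in the pipe carries more than the message: the task name,
PID, CPU, and a timestamp come along for free. BCC offers
trace_fields to split these for you. The sketch below does the same
with a regular expression against one sample line, so the structure
is visible. The sample line is made up, and the exact column layout
varies between kernel versions, so treat the pattern as an
assumption.

```python
import re

LINE = re.compile(
    r"^\s*(?P<task>.+?)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s+bpf_trace_printk:\s(?P<msg>.*)$"
)

def parse_trace_line(line):
    """Split one trace_pipe line into its fields, or return None if it does not match."""
    match = LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "task": fields["task"],
        "pid": int(fields["pid"]),
        "cpu": int(fields["cpu"]),
        "ts": float(fields["ts"]),
        "msg": fields["msg"],
    }

# Made-up sample in the shape recent kernels print.
sample = "              ls-4242    [001] d..31  1234.567890: bpf_trace_printk: Hello World!"
print(parse_trace_line(sample))
```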

The total path from a process running ls in one terminal to the
text appearing in the other terminal involves a kernel syscall entry,
a verified BPF program running natively in kernel context, a write
to a kernel tracing buffer, a userspace read from that buffer through
the tracefs interface, and a print to stdout. The entire round trip
completes in roughly the same time it takes the process being traced
to finish its own work.

The Honest Limits

Hello world is hello world. The full picture has caveats worth naming
before you start writing your second eBPF program.

Root is required to load eBPF programs in the general case.
There are mechanisms to relax this for specific use cases, including
the CAP_BPF capability (Linux 5.8 and later, usually combined with
CAP_PERFMON for tracing programs) and unprivileged BPF for a few
program types, but the practical answer for any serious work is that
you need either root or specific capabilities granted to a service
account.
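
To see which capabilities a process actually holds, the kernel
publishes the effective set as a hex bitmask in /proc/PID/status. A
minimal sketch for checking the two BPF-related bits follows. The
bit numbers are taken from linux/capability.h, the status excerpt is
made up, and the helper names are my own.

```python
CAP_SYS_ADMIN = 21  # bit numbers from <linux/capability.h>
CAP_PERFMON = 38
CAP_BPF = 39

def effective_caps(status_text):
    """Extract the effective capability bitmask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")

def has_cap(mask, bit):
    """Return True if the given capability bit is set in the mask."""
    return bool((mask >> bit) & 1)

# Made-up excerpt: a service account holding only CAP_PERFMON and CAP_BPF.
sample = "Name:\tmytracer\nCapEff:\t000000c000000000\n"

mask = effective_caps(sample)
print("CAP_BPF:", has_cap(mask, CAP_BPF))
print("CAP_PERFMON:", has_cap(mask, CAP_PERFMON))
print("CAP_SYS_ADMIN:", has_cap(mask, CAP_SYS_ADMIN))
```

For the current process, pass in the contents of /proc/self/status.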

BCC depends on runtime compilation, which has tradeoffs. The BCC
approach used in this hello world is convenient but it requires LLVM
and kernel headers on every machine where the program runs. This is
fine for a development workstation. It is increasingly painful for
production deployment across a fleet. The modern alternative is BPF
CO-RE, which stands for Compile Once Run Everywhere. CO-RE programs
are written with libbpf in C, compiled to BPF bytecode on a build
machine, and shipped as binaries that work across kernel versions
without local recompilation, provided the target kernel exposes BTF
type information. BCC remains excellent for learning and
ad-hoc work. CO-RE is the right approach for anything that has to be
deployed at scale.
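
CO-RE leans on the running kernel describing its own data types, in
a format called BTF. A quick way to tell whether a machine can run
CO-RE binaries is to look for that description in sysfs. This is a
minimal sketch; the function name kernel_has_btf is my own, and a
missing file does not rule out BTF being supplied some other way.

```python
from pathlib import Path

def kernel_has_btf(path="/sys/kernel/btf/vmlinux"):
    """Return True if the kernel publishes its type information at the given path."""
    return Path(path).exists()

print("kernel BTF available:", kernel_has_btf())
```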

The verifier is strict, and its strictness is the entire point.
Every eBPF programmer eventually writes a program that they believe
is correct and that the verifier rejects. Sometimes the program is
genuinely incorrect. Sometimes the verifier is unable to prove
correctness even though the program is sound. Either way, the
verifier is non-negotiable. You either restructure the code until the
verifier accepts it, or you do not load the program.

Kernel version matters. eBPF has evolved rapidly across recent
kernel releases. Features available in 6.x are not available in older
kernels. Programs written for one kernel version may not load on
another. The maturity has improved substantially in recent years, but
version awareness remains part of the craft.
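
In practice, version awareness often starts as a small gate at the
top of a tool. The sketch below parses a release string and compares
it against a few minimum versions. The table lists upstream release
numbers as I remember them. Distribution kernels backport features,
so treat it as a starting point and not as a source of truth. The
names FEATURE_MIN, parse_release, and supported are hypothetical.

```python
import re

# Upstream minimum versions; distro kernels with backports may differ.
FEATURE_MIN = {
    "bounded loops": (5, 3),
    "CAP_BPF": (5, 8),
    "BPF ring buffer": (5, 8),
}

def parse_release(release):
    """Turn a release string such as '6.8.0-45-generic' into (6, 8)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised release string: {release!r}")
    return int(match.group(1)), int(match.group(2))

def supported(release):
    """Map each feature name to whether the given release meets its minimum."""
    version = parse_release(release)
    return {name: version >= minimum for name, minimum in FEATURE_MIN.items()}

print(supported("5.4.0-150-generic"))
print(supported("6.8.0-45-generic"))
```

On a live system the release string comes from platform.release()
or uname -r.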

What Comes Next

This hello world is the canonical first step in a learning path that
extends in several directions. Each one is worth a separate post in
its own right.

The first direction is tracing. The hello world attaches a kprobe
to a syscall. The same mechanism extends to any kernel function, to
tracepoints which are stable predeclared instrumentation points, and
to uprobes which attach to userspace function entries. The tools
bpftrace and bcc-tools build on this foundation to provide an entire
shell of useful one-liners and scripts for production performance
investigation.
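
As a taste of that first direction, here is the same hello world
hung on a tracepoint instead of a kprobe. This is a sketch that
assumes BCC's TRACEPOINT_PROBE macro and the syscalls:sys_enter_execve
tracepoint are available on your kernel. The wrapper function run is
my own, and it is defined but not called so that the snippet can be
read and imported without root.

```python
program = r"""
TRACEPOINT_PROBE(syscalls, sys_enter_execve) {
    bpf_trace_printk("Hello from a tracepoint!");
    return 0;
}
"""

def run():
    """Load the program and stream its output; needs root and BCC installed."""
    from bcc import BPF  # imported lazily so the snippet loads without BCC
    b = BPF(text=program)  # BCC attaches TRACEPOINT_PROBE functions on load
    print("Tracepoint attached. Press Ctrl+C to stop.")
    b.trace_print()
```

Call run() from a root shell to try it. There is no attach call and
no symbol lookup, because the tracepoint name is part of the kernel's
stable interface.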

The second direction is networking. Most production eBPF today is
networking eBPF. Cilium uses it for Kubernetes networking. Katran uses
it for layer-4 load balancing. The XDP framework attaches eBPF
programs at the network driver level, before packets even reach the
kernel networking stack, and is the basis for high-performance packet
processing at tens of millions of packets per second per CPU core.

The third direction is security. Falco and Tetragon use eBPF to
monitor system calls and kernel events for security-relevant
behavior. The runtime security space is shifting toward eBPF as the
foundation because it provides observability with overhead low enough
to run continuously in production.

The fourth direction is observability. The Pixie project uses
eBPF for automatic application observability without code changes.
Companies like Datadog and New Relic have integrated eBPF-based
profiling into their products. The pattern is the same in each case.
Attach an eBPF program to the right kernel hook, collect data, ship
it to userspace for analysis, and you have instrumentation that no
amount of application-level code could provide.

The learning path through these four directions is roughly the next
year of part-time study for any engineer who decides to commit to it.
The hello world above is approximately the first hour of that year.

Closing

eBPF is the kind of technology that arrives quietly and then becomes
the substrate that everything else is built on. The cloud-native
networking world is already there. The observability world is moving
quickly. The security world is following. The general Linux
operations world has not yet noticed how comprehensive the shift is
going to be.

If you have read this far, you have an opportunity to be early. The
hello world above runs on any modern Linux machine with three packages
installed. The next post in this series goes deeper into bpftrace and
the one-liners that will change how you debug production performance
problems. The post after that addresses the move from BCC to libbpf
and CO-RE for production deployment.

The first step is the twelve lines of Python and four lines of C
above. Run them. Read the output. Read the kernel documentation. The
career trajectory of every serious Linux engineer in the next decade
goes through this technology. Now is the time to start.

The code from this post is in a small repository at the usual place
on my homelab git server. Pull it, modify it, break it, and learn
from the verifier's error messages. That last part, in particular, is
where the real understanding happens.

Tagged: #linux #ebpf #kernel #sysadmin

                