Ingero - GPU Causal Observability
Featured in: awesome-ebpf · awesome-observability · awesome-sre-tools · awesome-cloud-native · awesome-profiling · Awesome-GPU · awesome-devops-mcp-servers · MCP Registry · Glama · mcpservers.org
Version: 0.8.2.13
The only GPU observability tool your AI assistant can talk to.
"What caused the GPU stall?" → "forward() at train.py:142 - cudaMalloc spiking 48ms during CPU contention. 9,829 calls, 847 scheduler preemptions."
Ingero is a production-grade eBPF agent that traces the full chain - from Linux kernel events through CUDA API calls to your Python source lines - with <2% overhead, zero code changes, and one binary.
Quick Start
# Install (Linux amd64 — see below for arm64/Docker)
VERSION=0.8.2
curl -fsSL "https://github.com/ingero-io/ingero/releases/download/v${VERSION}/ingero_${VERSION}_linux_amd64.tar.gz" | tar xz
sudo mv ingero /usr/local/bin/
# Trace your GPU workload
sudo ingero trace
# Diagnose what happened
ingero explain --since 5m
- The "Why": Correlate a `cudaStreamSync` spike with `sched_switch` events - the host kernel preempted your thread.
- The "Where": Map CUDA calls back to Python source lines in your PyTorch `forward()` pass.
- The "Hidden Kernels": Trace the CUDA Driver API to see kernel launches by cuBLAS/cuDNN that bypass standard profilers.
No ClickHouse, no PostgreSQL, no MinIO - just one statically linked Go binary and embedded SQLite.
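To illustrate the embedded-SQLite design, the sketch below builds and queries a small trace table with Python's built-in `sqlite3`. The schema (`cuda_events` with `op` and `duration_us` columns) is a hypothetical stand-in for illustration, not Ingero's actual database layout:

```python
import sqlite3

# Hypothetical schema for illustration only; Ingero's actual trace
# table layout is not documented in this README.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cuda_events (op TEXT, duration_us REAL)")
conn.executemany(
    "INSERT INTO cuda_events VALUES (?, ?)",
    [("cudaLaunchKernel", 5.2), ("cudaLaunchKernel", 12.1),
     ("cudaMalloc", 125.0), ("cudaMalloc", 2100.0)],
)

# Per-operation rollup: call count and mean latency in microseconds,
# the kind of aggregate a diagnosis pass can derive from raw events.
rows = conn.execute(
    "SELECT op, COUNT(*), AVG(duration_us) "
    "FROM cuda_events GROUP BY op ORDER BY op"
).fetchall()
for op, count, avg_us in rows:
    print(f"{op}: {count} calls, {avg_us:.1f} µs avg")
```

Keeping everything in one file is what lets the whole agent ship as a single binary with no external datastore to operate.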
See a real AI investigation session - an AI assistant diagnosing GPU training issues on A100 and GH200 using only Ingero's MCP tools. No shell access, no manual SQL - just questions and answers.
What It Does
Ingero uses eBPF to trace GPU workloads at three layers, reads system metrics from /proc, and assembles causal chains that explain root causes:
- CUDA Runtime uprobes - traces `cudaMalloc`, `cudaFree`, `cudaLaunchKernel`, `cudaMemcpy`, `cudaMemcpyAsync`, and `cudaStreamSync`/`cudaDeviceSynchronize` via uprobes on `libcudart.so`
- CUDA Driver uprobes - traces `cuLaunchKernel`, `cuMemcpy`, `cuMemcpyAsync`, `cuCtxSynchronize`, and `cuMemAlloc` via uprobes on `libcuda.so`. Captures kernel launches from cuBLAS/cuDNN that bypass the runtime API.
- Host tracepoints - traces `sched_switch`, `sched_wakeup`, `mm_page_alloc`, `oom_kill`, and `sched_process_exec`/`exit`/`fork` for CPU scheduling, memory pressure, and process lifecycle
- System context - reads CPU utilization, memory usage, load average, and swap from `/proc` (no eBPF, no root needed)
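For intuition, the `/proc` sampling can be sketched in a few lines. The file formats below mirror the standard Linux `/proc/loadavg` and `/proc/meminfo` layouts; which fields Ingero actually samples is an assumption here:

```python
# Minimal sketch of /proc-based system context (no eBPF, no root).
# The file formats are standard Linux; the specific fields Ingero
# samples are an assumption for illustration.

def parse_loadavg(text: str) -> float:
    """Return the 1-minute load average from /proc/loadavg."""
    return float(text.split()[0])

def parse_meminfo(text: str) -> dict:
    """Return MemTotal/MemAvailable (kB) from /proc/meminfo."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("MemTotal", "MemAvailable"):
            fields[key] = int(rest.split()[0])
    return fields

# Sample contents in the standard formats:
load1 = parse_loadavg("3.20 2.91 2.45 4/1024 4821\n")
mem = parse_meminfo("MemTotal:       16384000 kB\n"
                    "MemAvailable:    4587520 kB\n")
used_pct = 100 * (1 - mem["MemAvailable"] / mem["MemTotal"])
print(f"Load {load1} | Mem {used_pct:.0f}% used")
```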
The causal engine correlates events across layers by timestamp and PID to produce automated root cause analysis with severity ranking and fix recommendations.
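The correlation step can be illustrated with a toy join: match each CUDA event to host scheduler events from the same PID that fall inside a small time window. The window size and event shapes here are illustrative assumptions, not Ingero's internals:

```python
# Toy causal correlation: join CUDA events with host scheduler events
# by PID and timestamp proximity. Window size and event fields are
# illustrative assumptions, not Ingero's actual engine.
WINDOW_NS = 1_000_000  # 1 ms

cuda_events = [
    {"ts": 10_000_000, "pid": 4821, "op": "cudaMalloc", "dur_us": 8400},
    {"ts": 25_000_000, "pid": 4821, "op": "cudaLaunchKernel", "dur_us": 6},
]
sched_events = [
    {"ts": 9_600_000, "pid": 4821, "event": "sched_switch"},
    {"ts": 40_000_000, "pid": 1234, "event": "sched_switch"},
]

def correlate(cuda, sched, window=WINDOW_NS):
    """Pair each CUDA event with same-PID host events within the window."""
    chains = []
    for c in cuda:
        causes = [s for s in sched
                  if s["pid"] == c["pid"] and abs(s["ts"] - c["ts"]) <= window]
        if causes:
            chains.append((c["op"], [s["event"] for s in causes]))
    return chains

chains = correlate(cuda_events, sched_events)
print(chains)
```

Here only the slow `cudaMalloc` picks up a nearby `sched_switch` from its own PID, which is the kind of pairing that then feeds severity ranking.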
$ sudo ingero trace
Ingero Trace - Live CUDA Event Stream
Target: PID 4821 (python3)
Library: /usr/lib/x86_64-linux-gnu/libcudart.so.12
CUDA probes: 14 attached
Driver probes: 10 attached
Host probes: 7 attached
System: CPU [████████░░░░░░░░░░░░] 47% | Mem [██████████████░░░░░░] 72% (11.2 GB free) | Load 3.2 | Swap 0 MB
CUDA Runtime API Events: 11,028
┌──────────────────────┬────────┬──────────┬──────────┬──────────┬─────────┐
│ Operation │ Count │ p50 │ p95 │ p99 │ Flags │
├──────────────────────┼────────┼──────────┼──────────┼──────────┼─────────┤
│ cudaLaunchKernel │ 11,009 │ 5.2 µs │ 12.1 µs │ 18.4 µs │ │
│ cudaMalloc │ 12 │ 125 µs │ 2.1 ms │ 8.4 ms │ ⚠ p99 │
│ cudaDeviceSynchronize│ 7 │ 684 µs │ 1.2 ms │ 3.8 ms │ │
└──────────────────────┴────────┴──────────┴──────────┴──────────┴─────────┘
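The p50/p95/p99 columns are latency percentiles over the observed calls. A minimal nearest-rank computation looks like the following; nearest-rank is one common convention, and whether Ingero uses it or an interpolating variant is an assumption:

```python
import math

# Nearest-rank percentile over call durations in microseconds.
# This is one common convention; the exact method behind the
# p50/p95/p99 columns above is an assumption here.
def percentile(samples, p):
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

durations_us = list(range(1, 101))  # 1..100 µs, uniform for clarity
p50, p95, p99 = (percentile(durations_us, p) for p in (50, 95, 99))
print(f"p50={p50} µs  p95={p95} µs  p99={p99} µs")
```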
Tools (7)
- `get_check` - Retrieves system health and check status.
- `get_trace_stats` - Returns statistics from the current trace session.
- `get_causal_chains` - Analyzes and returns causal chains for GPU latency.
- `get_stacks` - Retrieves stack traces for observed events.
- `run_demo` - Executes a demonstration trace.
- `get_test_report` - Generates a test report for the current environment.
- `run_sql` - Executes a SQL query against the internal trace database.

Configuration
{"mcpServers": {"ingero": {"command": "ingero", "args": ["mcp"]}}}
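Once registered, an MCP client drives the server over JSON-RPC 2.0 on its stdio. A sketch of the request a client would send to invoke `get_trace_stats`; the empty `arguments` object is an assumption about that tool's input schema:

```python
import json

# MCP tool invocation is JSON-RPC 2.0 over the server's stdio.
# The tool name comes from the Tools list above; the empty
# "arguments" object is an assumption about its schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "get_trace_stats", "arguments": {}},
}
wire = json.dumps(request)
print(wire)
```

An AI assistant's MCP client builds exactly this kind of message under the hood, which is how "just questions and answers" turn into tool calls.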