Ingero MCP Server

Local setup required. This server has to be cloned and prepared on your machine before you register it in Claude Code.
1. Set the server up locally

Run this once to clone and prepare the server before adding it to Claude Code.

Run in terminal
git clone https://github.com/ingero-io/ingero
cd ingero

Then follow the repository README for any remaining dependency or build steps before continuing.

2. Register it in Claude Code

After the local setup is done, run this command to point Claude Code at the built server.

Run in terminal
claude mcp add ingero -- node "<FULL_PATH_TO_INGERO>/dist/index.js"

Replace <FULL_PATH_TO_INGERO> with the absolute path to the folder you cloned and built in step 1.

README.md

Ingero - GPU Causal Observability

Featured in: awesome-ebpf · awesome-observability · awesome-sre-tools · awesome-cloud-native · awesome-profiling · Awesome-GPU · awesome-devops-mcp-servers · MCP Registry · Glama · mcpservers.org

Version: 0.8.2.13

The only GPU observability tool your AI assistant can talk to.

"What caused the GPU stall?" → "forward() at train.py:142 - cudaMalloc spiking 48ms during CPU contention. 9,829 calls, 847 scheduler preemptions."

Ingero is a production-grade eBPF agent that traces the full chain - from Linux kernel events through CUDA API calls to your Python source lines - with <2% overhead, zero code changes, and one binary.

Quick Start

# Install (Linux amd64 — see below for arm64/Docker)
VERSION=0.8.2
curl -fsSL "https://github.com/ingero-io/ingero/releases/download/v${VERSION}/ingero_${VERSION}_linux_amd64.tar.gz" | tar xz
sudo mv ingero /usr/local/bin/

# Trace your GPU workload
sudo ingero trace

# Diagnose what happened
ingero explain --since 5m

  • The "Why": Correlate a cudaStreamSynchronize spike with sched_switch events - the host kernel preempted your thread.
  • The "Where": Map CUDA calls back to Python source lines in your PyTorch forward() pass.
  • The "Hidden Kernels": Trace the CUDA Driver API to see kernel launches by cuBLAS/cuDNN that bypass standard profilers.

No ClickHouse, no PostgreSQL, no MinIO - just one statically linked Go binary and embedded SQLite.

See a real AI investigation session - an AI assistant diagnosing GPU training issues on A100 and GH200 using only Ingero's MCP tools. No shell access, no manual SQL - just questions and answers.

What It Does

Ingero uses eBPF to trace GPU workloads at three layers, reads system metrics from /proc, and assembles causal chains that explain root causes:

  1. CUDA Runtime uprobes - traces cudaMalloc, cudaFree, cudaLaunchKernel, cudaMemcpy, cudaMemcpyAsync, cudaStreamSynchronize / cudaDeviceSynchronize via uprobes on libcudart.so
  2. CUDA Driver uprobes - traces cuLaunchKernel, cuMemcpy, cuMemcpyAsync, cuCtxSynchronize, cuMemAlloc via uprobes on libcuda.so. Captures kernel launches from cuBLAS/cuDNN that bypass the runtime API.
  3. Host tracepoints - traces sched_switch, sched_wakeup, mm_page_alloc, oom_kill, sched_process_exec/exit/fork for CPU scheduling, memory pressure, and process lifecycle
  4. System context - reads CPU utilization, memory usage, load average, and swap from /proc (no eBPF, no root needed)
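Layer 4 needs no eBPF at all: it is plain file reads under /proc. A minimal Python sketch of that idea (illustrative only; the real agent is a single Go binary):

```python
# Sketch: read load average and memory usage from /proc, the way a
# no-root system-context layer can. Illustrative only; not Ingero's
# actual implementation.

def read_loadavg():
    """Return the 1-minute load average from /proc/loadavg."""
    with open("/proc/loadavg") as f:
        return float(f.read().split()[0])

def read_meminfo():
    """Return (MemTotal, MemAvailable) in kB from /proc/meminfo."""
    fields = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            fields[key] = int(value.strip().split()[0])  # values are in kB
    return fields["MemTotal"], fields["MemAvailable"]

if __name__ == "__main__":
    load1 = read_loadavg()
    total_kb, avail_kb = read_meminfo()
    used_pct = 100 * (total_kb - avail_kb) / total_kb
    print(f"Load {load1:.2f} | Mem {used_pct:.0f}% used")
```

Because these are ordinary file reads, this layer works without root, unlike the uprobe and tracepoint layers.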

The causal engine correlates events across layers by timestamp and PID to produce automated root cause analysis with severity ranking and fix recommendations.
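The timestamp-and-PID correlation step can be sketched as follows; the event fields (ts_ns, pid, name) and the 1 ms window are illustrative assumptions, not Ingero's internal schema:

```python
# Sketch of cross-layer correlation: pair each CUDA event with host
# scheduler events on the same PID inside a small time window.
# Field names and the window size are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Event:
    ts_ns: int   # monotonic timestamp in nanoseconds
    pid: int
    name: str    # e.g. "cudaMalloc" or "sched_switch"

def correlate(cuda_events, host_events, window_ns=1_000_000):
    """For each CUDA event, collect host events on the same PID within
    +/- window_ns, forming one link of a candidate causal chain."""
    by_pid = {}
    for ev in host_events:
        by_pid.setdefault(ev.pid, []).append(ev)
    chains = []
    for cuda in cuda_events:
        nearby = [h for h in by_pid.get(cuda.pid, [])
                  if abs(h.ts_ns - cuda.ts_ns) <= window_ns]
        if nearby:
            chains.append((cuda, nearby))
    return chains

cuda = [Event(5_000_000, 4821, "cudaMalloc")]
host = [Event(4_600_000, 4821, "sched_switch"),   # within the window
        Event(9_000_000, 4821, "sched_switch"),   # too far away in time
        Event(5_100_000, 999,  "sched_switch")]   # wrong PID
for c, causes in correlate(cuda, host):
    print(c.name, "<-", [h.name for h in causes])
```

Grouping host events by PID first keeps the matching step linear in the number of CUDA events rather than quadratic over all event pairs.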

$ sudo ingero trace

  Ingero Trace  -  Live CUDA Event Stream
  Target: PID 4821 (python3)
  Library: /usr/lib/x86_64-linux-gnu/libcudart.so.12
  CUDA probes: 14 attached
  Driver probes: 10 attached
  Host probes: 7 attached

  System: CPU [████████░░░░░░░░░░░░] 47% | Mem [██████████████░░░░░░] 72% (11.2 GB free) | Load 3.2 | Swap 0 MB

  CUDA Runtime API                                               Events: 11,028
  ┌──────────────────────┬────────┬──────────┬──────────┬──────────┬─────────┐
  │ Operation            │ Count  │ p50      │ p95      │ p99      │ Flags   │
  ├──────────────────────┼────────┼──────────┼──────────┼──────────┼─────────┤
  │ cudaLaunchKernel     │ 11,009 │ 5.2 µs   │ 12.1 µs  │ 18.4 µs  │         │
  │ cudaMalloc           │     12 │ 125 µs   │ 2.1 ms   │ 8.4 ms   │ ⚠ p99  │
  │ cudaDeviceSynchronize│      7 │ 684 µs   │ 1.2 ms   │ 3.8 ms   │         │
  └──────────────────────┴────────┴──────────┴──────────┴──────────┴─────────┘

Tools (7)

get_check - Retrieves system health and check status.
get_trace_stats - Returns statistics from the current trace session.
get_causal_chains - Analyzes and returns causal chains for GPU latency.
get_stacks - Retrieves stack traces for observed events.
run_demo - Executes a demonstration trace.
get_test_report - Generates a test report for the current environment.
run_sql - Executes a SQL query against the internal trace database.
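Under MCP's JSON-RPC 2.0 transport, a client invokes one of these tools with a tools/call request. The "tools/call", "name", and "arguments" fields come from the MCP specification; the argument key "query" is an assumption about run_sql's input schema:

```python
import json

# Shape of an MCP tool invocation over JSON-RPC 2.0. The "query"
# argument name is a hypothetical stand-in for run_sql's real schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "run_sql",
        "arguments": {"query": "SELECT count(*) FROM events"},
    },
}
print(json.dumps(request, indent=2))
```

In practice an MCP client such as Claude Code builds and sends this message for you; the sketch only shows what crosses the wire.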

Configuration

claude_desktop_config.json
{"mcpServers": {"ingero": {"command": "ingero", "args": ["mcp"]}}}

Try it

Analyze the current GPU latency and explain the root cause using causal chains.
Run a SQL query to find all cudaMalloc calls that took longer than 5ms.
Get the trace statistics for the last 5 minutes of GPU activity.
Identify if any CPU scheduling preemptions are causing my current GPU stalls.
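The run_sql prompt above boils down to a plain SQLite query. A sketch against an in-memory database with a hypothetical schema (the table and column names, events/op/duration_ns, are assumptions; Ingero's real schema may differ):

```python
import sqlite3

# Hypothetical schema standing in for Ingero's embedded SQLite store;
# the real table and column names may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (op TEXT, duration_ns INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    ("cudaMalloc", 125_000),       # 125 us
    ("cudaMalloc", 8_400_000),     # 8.4 ms -> should match
    ("cudaLaunchKernel", 5_200),   # 5.2 us
])

# "Find all cudaMalloc calls that took longer than 5 ms" (5 ms = 5,000,000 ns)
rows = conn.execute(
    "SELECT op, duration_ns FROM events"
    " WHERE op = 'cudaMalloc' AND duration_ns > 5000000"
).fetchall()
print(rows)
```

With durations stored in nanoseconds, thresholds like "5 ms" become integer comparisons with no unit conversion at query time.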

Frequently Asked Questions

What are the key features of Ingero?

Traces CUDA Runtime and Driver APIs via eBPF uprobes. Correlates GPU events with host kernel tracepoints like sched_switch. Maps CUDA calls back to specific Python source lines. Provides automated root cause analysis with severity ranking. Operates with less than 2% overhead using a single binary.

What can I use Ingero for?

Diagnosing GPU training stalls in PyTorch models. Identifying hidden kernel launches from cuBLAS or cuDNN. Correlating CPU contention with GPU memory allocation spikes. Automated performance debugging of AI training workloads.

How do I install Ingero?

Install Ingero by running: curl -fsSL "https://github.com/ingero-io/ingero/releases/download/v0.8.2/ingero_0.8.2_linux_amd64.tar.gz" | tar xz && sudo mv ingero /usr/local/bin/

What MCP clients work with Ingero?

Ingero works with any MCP-compatible client including Claude Desktop, Claude Code, Cursor, and other editors with MCP support.
