RootCause 🧭
AI-native SRE for Kubernetes incidents.
RootCause is a local-first MCP server that turns natural-language requests into evidence-backed incident analysis, Kubernetes diagnostics, and safer operations.
Built in Go as a single binary, RootCause is optimized for low-friction local workflows using your existing kubeconfig identity.
🚀 Quick Start | 🌐 Client Setup | 🛠️ Tools | 🧩 Skills | 🔒 Safety | ⚙️ Config | 🏗️ Architecture | 🤝 Contributing
Why RootCause 💡
RootCause is built for SRE/operator workflows where speed matters, but unsafe automation is unacceptable.
- 🚀 Stop context-switching: investigate incidents, rollout risk, Helm/Terraform/AWS signals, and remediation from one MCP server.
- 🧠 AI-powered diagnostics: evidence-first analysis with RCA, timelines, and action-oriented next checks.
- 💸 Built-in cost optimization: combine resource usage, workload best-practice checks, Terraform plan analysis, and cloud context for optimization decisions.
- 🔒 Enterprise-ready guardrails: role/namespace policy enforcement, redaction, read-only mode, destructive tool controls, and mutation preflight.
- ⚡ Zero learning curve: ask natural-language operational questions and use provided prompt templates for common SRE flows.
- 🌐 Universal compatibility: works with MCP-compatible clients across Claude, Cursor, Copilot, Codex, and more.
- 🏭 Production-grade workflow: single Go binary, kubeconfig-native auth, deterministic structured outputs, and broad test coverage.
Why teams choose it
| Need | RootCause answer |
|---|---|
| "What changed and why did this break?" | rootcause.incident_bundle, rootcause.change_timeline, rootcause.rca_generate |
| "Is it safe to restart or roll out now?" | k8s.restart_safety_check, k8s.best_practice, k8s.safe_mutation_preflight |
| "Is my platform ecosystem healthy?" | k8s.*_detect + k8s.diagnose_* for ArgoCD/Flux/cert-manager/Kyverno/Gatekeeper/Cilium |
| "Can I standardize SRE responses?" | Prompt templates + structured output from shared render/evidence pipeline |
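For readers wiring RootCause into their own client, the sketch below shows how one of the tools in the table could be invoked over MCP. The `tools/call` method and the `name`/`arguments` params come from the MCP specification; the argument keys (`namespace`, `name`) and their values are assumptions for illustration, not RootCause's documented input schema.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need unique ids within a session


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names below are hypothetical; check the tool's input schema
# (returned by the MCP `tools/list` method) for the real ones.
request = build_tool_call(
    "k8s.restart_safety_check",
    {"namespace": "payments", "name": "checkout-api"},
)
print(json.dumps(request, indent=2))
```

In day-to-day use the MCP client builds this message for you from a natural-language prompt; the sketch is only meant to show what sits underneath.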
What Can You Do?
Ask your AI assistant in natural language:
- "Why did this deployment fail after rollout?"
- "Is this workload safe to restart right now?"
- "Why are ArgoCD apps out of sync?"
- "Is Flux healthy in this cluster?"
- "Why are certs failing to renew?"
- "Before patch/apply, is this mutation safe?"
RootCause keeps its depth-first model: evidence-first diagnosis, root-cause analysis, and remediation flow instead of raw tool sprawl.
Power users can map these prompts to concrete tools in this README (Complete Feature Set, Toolchains, and Tools sections).
Use Cases
Incident response
- Build end-to-end incident evidence with `rootcause.incident_bundle`
- Generate probable causes with `rootcause.rca_generate`
- Export timeline and postmortem artifacts for follow-up
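The incident-response flow above can be sketched as a small pipeline. The tool names are the ones this README lists; everything else (the `call_tool` callable, the argument keys, and the shape of the results) is a hypothetical stand-in, so treat this as an outline of the ordering rather than RootCause's actual API.

```python
from typing import Any, Callable, Dict, List

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def investigate(call_tool: ToolCaller, namespace: str) -> Dict[str, Any]:
    """Collect evidence first, then ask for probable causes, then export."""
    # Argument keys ("namespace", "bundle") are assumptions for illustration.
    bundle = call_tool("rootcause.incident_bundle", {"namespace": namespace})
    rca = call_tool("rootcause.rca_generate", {"bundle": bundle})
    postmortem = call_tool("rootcause.postmortem_export", {"bundle": bundle})
    return {"bundle": bundle, "rca": rca, "postmortem": postmortem}


# A fake caller that records the call order, standing in for a real MCP client.
calls: List[str] = []


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    calls.append(name)
    return {"tool": name}


result = investigate(fake_call_tool, "payments")
print(calls)
```

Injecting the caller keeps the ordering testable without a cluster; a real client would replace `fake_call_tool` with its own MCP transport.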
Safe operations before mutation
- Evaluate rollout/restart risk with `k8s.restart_safety_check` and `k8s.best_practice`
- Run `k8s.safe_mutation_preflight` before apply/patch/delete/scale operations
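The preflight-before-mutation idea can be illustrated as a fail-closed gate on the client side. This is a minimal sketch under stated assumptions: the `call_tool` callable, the `target` keys, and the `safe`/`reasons` result fields are all hypothetical, and RootCause's real preflight output may look different.

```python
from typing import Any, Callable, Dict


class UnsafeMutation(Exception):
    """Raised when the preflight result does not allow the mutation."""


def guarded_mutation(
    call_tool: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    mutate: Callable[[], Any],
    target: Dict[str, Any],
) -> Any:
    """Run the preflight tool and only execute `mutate` if it passes."""
    report = call_tool("k8s.safe_mutation_preflight", target)
    # "safe" and "reasons" are hypothetical result fields. A missing "safe"
    # key is treated as a failure so the gate fails closed.
    if not report.get("safe", False):
        raise UnsafeMutation("; ".join(report.get("reasons", ["no reason given"])))
    return mutate()


# Stub callers standing in for a real MCP client.
def allow(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    return {"safe": True}


def deny(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    return {"safe": False, "reasons": ["PodDisruptionBudget would be violated"]}


target = {"operation": "scale", "namespace": "payments", "name": "checkout-api"}
print(guarded_mutation(allow, lambda: "scaled", target))
try:
    guarded_mutation(deny, lambda: "scaled", target)
except UnsafeMutation as exc:
    print(f"blocked: {exc}")
```

Treating an absent or malformed result as unsafe is a deliberate choice here: a gate that defaults to "allow" offers little protection when the check itself breaks.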
Ecosystem-specific health checks
- ArgoCD: detect installation and diagnose sync/health drift
- Flux: detect controllers and diagnose reconciliation failures
- cert-manager / Kyverno / Gatekeeper / Cilium: detect footprint and diagnose control-plane or policy issues
Feature Highlights
| Area | RootCause Capability |
|---|---|
| Incident analysis | rootcause.incident_bundle, rootcause.rca_generate, rootcause.change_timeline, rootcause.postmortem_export, rootcause.capabilities |
| Kubernetes resilience | k8s.restart_safety_check, k8s.best_practice, k8s.safe_mutation_preflight |
| Ecosystem diagnostics | ArgoCD/Flux/cert-manager/Kyverno/Gatekeeper/Cilium via *_detect and diagnose_* tools |
| Deployment safety | Automatic preflight before k8s mutating operations |
| Helm operations | Chart search/list/get, release diff, rollback advisor, template apply/uninstall flows |
| Terraform analysis | Module/provider search + terraform.debug_plan for impact/risk analysis |
| Service mesh & scaling | Linkerd/Istio/Karpenter diagnostics with shared evidence model |
Complete Feature Set
| Category | Representative capabilities |
|---|---|
| Kubernetes core (k8s.*) | CRUD, logs/events, graph-based debug flows, restart safety, best-practice scoring, mutation preflight |
| Ecosystem diagnostics | ArgoCD, Flux, cert-manager, Kyverno, Gatekeeper, Cilium via `*_detec |
Tools (5)
- `rootcause.incident_bundle`: Builds an end-to-end incident evidence bundle.
- `rootcause.rca_generate`: Generates probable causes for an incident.
- `k8s.restart_safety_check`: Evaluates the risk of restarting a workload.
- `k8s.safe_mutation_preflight`: Runs a safety check before applying, patching, or deleting resources.
- `terraform.debug_plan`: Analyzes a Terraform plan for impact and risk.
Environment Variables
- `KUBECONFIG`: Path to the kubeconfig file for cluster access.
Configuration
{"mcpServers": {"rootcause": {"command": "rootcause"}}}
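If a client already has other MCP servers configured, the entry above needs to be merged into the existing file rather than pasted over it. A minimal sketch, assuming the client reads a JSON document with a top-level `mcpServers` object and honors an optional per-server `env` map (common among MCP clients, but check yours); the `other` server and the kubeconfig path are placeholders.

```python
import json
from typing import Optional


def add_rootcause(config: dict, kubeconfig: Optional[str] = None) -> dict:
    """Return a copy of a client config with a `rootcause` server entry added."""
    merged = json.loads(json.dumps(config))  # cheap deep copy for JSON data
    entry = {"command": "rootcause"}
    if kubeconfig:
        # Passing KUBECONFIG through `env` is an assumption about the client.
        entry["env"] = {"KUBECONFIG": kubeconfig}
    merged.setdefault("mcpServers", {})["rootcause"] = entry
    return merged


existing = {"mcpServers": {"other": {"command": "other-server"}}}
print(json.dumps(add_rootcause(existing, "/path/to/kubeconfig"), indent=2))
```

Leaving `kubeconfig` unset produces exactly the one-line entry shown above, in which case the server relies on whatever kubeconfig identity its environment already provides.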