RootCause MCP Server

Local setup required. This server has to be cloned and prepared on your machine before you register it in Claude Code.
1. Set the server up locally

Run this once to clone and prepare the server before adding it to Claude Code.

Run in terminal
git clone https://github.com/yindia/rootcause
cd rootcause

Then follow the repository README for any remaining dependency or build steps before continuing.

2. Register it in Claude Code

After the local setup is done, run this command to point Claude Code at the built server.

Run in terminal
claude mcp add rootcause -- node "<FULL_PATH_TO_ROOTCAUSE>/dist/index.js"

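Replace <FULL_PATH_TO_ROOTCAUSE> with the absolute path to the folder you prepared in step 1, so that the command points at the dist/index.js file inside it. The README below describes RootCause as a single Go binary, so if dist/index.js does not exist after the build, check the repository README for the actual entry point.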

README.md

AI-native SRE for Kubernetes incidents.

RootCause 🧭

RootCause is a local-first MCP server that turns natural-language requests into evidence-backed incident analysis, Kubernetes diagnostics, and safer operations.

Built in Go as a single binary, RootCause is optimized for low-friction local workflows using your existing kubeconfig identity.



Why RootCause 💡

RootCause is built for SRE/operator workflows where speed matters, but unsafe automation is unacceptable.

  • 🚀 Stop context-switching: investigate incidents, rollout risk, Helm/Terraform/AWS signals, and remediation from one MCP server.
  • 🧠 AI-powered diagnostics: evidence-first analysis with RCA, timelines, and action-oriented next checks.
  • 💸 Built-in cost optimization: combine resource usage, workload best-practice checks, Terraform plan analysis, and cloud context for optimization decisions.
  • 🔒 Enterprise-ready guardrails: role/namespace policy enforcement, redaction, read-only mode, destructive tool controls, and mutation preflight.
  • ⚡ Zero learning curve: ask natural-language operational questions and use provided prompt templates for common SRE flows.
  • 🌐 Universal compatibility: works with MCP-compatible clients across Claude, Cursor, Copilot, Codex, and more.
  • 🏭 Production-grade workflow: single Go binary, kubeconfig-native auth, deterministic structured outputs, and broad test coverage.

Why teams choose it

| Need | RootCause answer |
| --- | --- |
| "What changed and why did this break?" | rootcause.incident_bundle, rootcause.change_timeline, rootcause.rca_generate |
| "Is it safe to restart or roll out now?" | k8s.restart_safety_check, k8s.best_practice, k8s.safe_mutation_preflight |
| "Is my platform ecosystem healthy?" | k8s.*_detect + k8s.diagnose_* for ArgoCD/Flux/cert-manager/Kyverno/Gatekeeper/Cilium |
| "Can I standardize SRE responses?" | Prompt templates + structured output from shared render/evidence pipeline |

What Can You Do?

Ask your AI assistant in natural language:

  • "Why did this deployment fail after rollout?"
  • "Is this workload safe to restart right now?"
  • "Why are ArgoCD apps out of sync?"
  • "Is Flux healthy in this cluster?"
  • "Why are certs failing to renew?"
  • "Before patch/apply, is this mutation safe?"

RootCause keeps its depth-first model: evidence-first diagnosis, root-cause analysis, and remediation flow instead of raw tool sprawl.

Power users can map these prompts to concrete tools in this README (Complete Feature Set, Toolchains, and Tools sections).

Use Cases

Incident response

  • Build end-to-end incident evidence with rootcause.incident_bundle
  • Generate probable causes with rootcause.rca_generate
  • Export timeline and postmortem artifacts for follow-up

Safe operations before mutation

  • Evaluate rollout/restart risk with k8s.restart_safety_check and k8s.best_practice
  • Run k8s.safe_mutation_preflight before apply/patch/delete/scale operations
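As a rough illustration of what an MCP client sends when it runs these checks, the sketch below builds JSON-RPC 2.0 `tools/call` requests for the two tools named above. The tool names come from this README; the argument names (`namespace`, `kind`, `name`, `operation`) and the example workload are assumptions made for illustration, not the server's documented schema. Consult the tool's input schema (for example via `tools/list`) for the real parameters.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call(name, arguments):
    """Build an MCP JSON-RPC 2.0 `tools/call` request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical target workload; the argument keys are assumptions.
target = {"namespace": "payments", "kind": "Deployment", "name": "checkout"}

requests = [
    # 1. Assess restart risk for the workload.
    tool_call("k8s.restart_safety_check", target),
    # 2. Run the preflight before the intended mutation.
    tool_call("k8s.safe_mutation_preflight", {**target, "operation": "patch"}),
]

for request in requests:
    print(json.dumps(request))
```

In normal use the AI client constructs these requests for you from a natural-language prompt; the sketch only shows the ordering (risk check first, preflight second, mutation last).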

Ecosystem-specific health checks

  • ArgoCD: detect installation and diagnose sync/health drift
  • Flux: detect controllers and diagnose reconciliation failures
  • cert-manager / Kyverno / Gatekeeper / Cilium: detect footprint and diagnose control-plane or policy issues

Feature Highlights

| Area | RootCause Capability |
| --- | --- |
| Incident analysis | rootcause.incident_bundle, rootcause.rca_generate, rootcause.change_timeline, rootcause.postmortem_export, rootcause.capabilities |
| Kubernetes resilience | k8s.restart_safety_check, k8s.best_practice, k8s.safe_mutation_preflight |
| Ecosystem diagnostics | ArgoCD/Flux/cert-manager/Kyverno/Gatekeeper/Cilium via *_detect and diagnose_* tools |
| Deployment safety | Automatic preflight before k8s mutating operations |
| Helm operations | Chart search/list/get, release diff, rollback advisor, template apply/uninstall flows |
| Terraform analysis | Module/provider search + terraform.debug_plan for impact/risk analysis |
| Service mesh & scaling | Linkerd/Istio/Karpenter diagnostics with shared evidence model |

Complete Feature Set

| Category | Representative capabilities |
| --- | --- |
| Kubernetes core (k8s.*) | CRUD, logs/events, graph-based debug flows, restart safety, best-practice scoring, mutation preflight |
| Ecosystem diagnostics | ArgoCD, Flux, cert-manager, Kyverno, Gatekeeper, Cilium via `*_detec

Tools (5)

  • rootcause.incident_bundle: Builds an end-to-end incident evidence bundle.
  • rootcause.rca_generate: Generates probable causes for an incident.
  • k8s.restart_safety_check: Evaluates the risk of restarting a workload.
  • k8s.safe_mutation_preflight: Runs a safety check before applying, patching, or deleting resources.
  • terraform.debug_plan: Analyzes a Terraform plan for impact and risk.

Environment Variables

  • KUBECONFIG: Path to the kubeconfig file for cluster access.
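To check which kubeconfig a tool is likely to pick up, the sketch below follows the usual kubectl convention: KUBECONFIG may list several files joined by the OS path separator, and `~/.kube/config` is the fallback when the variable is unset. Whether RootCause itself honors a multi-path KUBECONFIG is an assumption here; the helper name `kubeconfig_paths` is hypothetical.

```python
import os
from pathlib import Path


def kubeconfig_paths(env=None):
    """Return candidate kubeconfig paths in kubectl-style lookup order."""
    env = os.environ if env is None else env
    raw = env.get("KUBECONFIG", "")
    # KUBECONFIG can hold several paths (":" on Unix, ";" on Windows).
    paths = [Path(p).expanduser() for p in raw.split(os.pathsep) if p]
    if not paths:
        # Conventional default location when the variable is unset or empty.
        paths = [Path.home() / ".kube" / "config"]
    return paths


if __name__ == "__main__":
    for path in kubeconfig_paths():
        status = "found" if path.is_file() else "missing"
        print(f"{path} ({status})")
```

Running it before starting the server is a quick way to confirm that the identity you expect is the one the server will use.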

Configuration

claude_desktop_config.json
{"mcpServers": {"rootcause": {"command": "rootcause"}}}
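If you want the server to use a specific kubeconfig rather than the one in your shell environment, MCP client configs of this shape generally accept an `env` map per server. The sketch below generates the entry above with an optional KUBECONFIG override; the helper name `mcp_config` and the example path are illustrative assumptions, and the `env` key should be confirmed against your client's documentation.

```python
import json


def mcp_config(command="rootcause", kubeconfig=None):
    """Build an mcpServers entry for RootCause, optionally pinning KUBECONFIG."""
    server = {"command": command}
    if kubeconfig:
        # Passed to the server process as an environment variable.
        server["env"] = {"KUBECONFIG": kubeconfig}
    return {"mcpServers": {"rootcause": server}}


if __name__ == "__main__":
    # Hypothetical path; substitute your own kubeconfig file.
    print(json.dumps(mcp_config(kubeconfig="/home/me/.kube/staging"), indent=2))
```

Called with no arguments, the helper reproduces the minimal configuration shown above.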

Try it

Why did this deployment fail after rollout?
Is this workload safe to restart right now?
Why are ArgoCD apps out of sync?
Before I apply this change, is this mutation safe?
Why are certs failing to renew?

Frequently Asked Questions

What are the key features of RootCause?

  • Evidence-first incident analysis and root-cause generation
  • Kubernetes mutation preflight checks for safe operations
  • Ecosystem diagnostics for ArgoCD, Flux, cert-manager, and service meshes
  • Built-in cost optimization and best-practice scoring
  • Kubeconfig-native authentication for secure local workflows

What can I use RootCause for?

  • Automating incident evidence collection and postmortem artifact generation
  • Validating rollout and restart safety before performing cluster mutations
  • Diagnosing reconciliation failures in GitOps controllers like ArgoCD and Flux
  • Analyzing Terraform plans to identify potential infrastructure risks
  • Standardizing SRE responses using prompt templates and structured diagnostic outputs

How do I install RootCause?

Install RootCause by running: go install github.com/yindia/rootcause@latest

What MCP clients work with RootCause?

RootCause works with any MCP-compatible client including Claude Desktop, Claude Code, Cursor, and other editors with MCP support.
