MCP Prometheus MCP Server


Add it to Claude Code

Run this in a terminal.

claude mcp add mcp-prometheus -- uv run mcp_prometheus/main.py
README.md

Prometheus-based monitoring MCP server for system metrics and health checks.

MCP Prometheus 📈

A Prometheus-based MCP server for monitoring. The entry point is main.py.

Quick Start 🚀

cd d:\MCPTools
uv sync
uv run python mcp_prometheus/main.py

Project Structure 🧩

mcp_prometheus/
  main.py
  core/
    config.py
    runtime.py
    server.py
    time_utils.py
  domain/
    checks.py
  infra/
    prom_client.py
  tools/
    catalog.py
    alerts_runner.py
    checks_runner.py
    promql.py
  utils/
    query_utils.py
    summarize.py

Tools Summary 🛠️

| Tool | Purpose | Notes |
| --- | --- | --- |
| list_checks | List the registered checks | Returns id, name, description |
| list_environments | Look up the Prometheus URL for each environment | prod/dev_test/dr |
| list_servers | List servers based on recent up samples | Deduplicated by (instance, job) |
| list_process_groups | List process groups | Based on process_monitoring |
| get_alerts | Query active Prometheus alerts | Based on /api/v1/alerts; supports label/state filters |
| run_check | Run a single check | Recommended default |
| run_all_checks | Run all checks in parallel | step fixed at 5m |
| run_promql | Run a user-supplied PromQL query directly | Requires approved=True |

`run_check` Input Guide 🧭

Required

  • check_id

Time range

  • Relative: hours, minutes, days
  • Absolute: start_time_utc_iso, end_time_utc_iso
  • End offset: end_offset_minutes, end_offset_hours, end_offset_days
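The time-range parameters above could be turned into a query window roughly as follows. This is a minimal sketch, not the server's actual logic: the helper names `resolve_window` and `_parse_utc` are hypothetical, and the semantics (the end offset shifts the window's end back from "now", and the relative duration then looks back from that end) are an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def _parse_utc(text: str) -> datetime:
    """Parse an ISO-8601 string; treat naive values as UTC (assumption)."""
    dt = datetime.fromisoformat(text.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def resolve_window(
    now: datetime,
    hours: float = 0,
    minutes: float = 0,
    days: float = 0,
    start_time_utc_iso: Optional[str] = None,
    end_time_utc_iso: Optional[str] = None,
    end_offset_minutes: float = 0,
    end_offset_hours: float = 0,
    end_offset_days: float = 0,
) -> Tuple[datetime, datetime]:
    """Return (start, end) in UTC. Hypothetical helper for illustration."""
    if start_time_utc_iso and end_time_utc_iso:
        # Absolute window: both bounds are given explicitly.
        start = _parse_utc(start_time_utc_iso)
        end = _parse_utc(end_time_utc_iso)
    else:
        # Relative window: shift the end back by the offset, then look back.
        end = now - timedelta(
            days=end_offset_days, hours=end_offset_hours, minutes=end_offset_minutes
        )
        start = end - timedelta(days=days, hours=hours, minutes=minutes)
    if start >= end:
        raise ValueError("window must satisfy start < end")
    return start, end
```

Under these assumptions, `hours=24` with `end_offset_hours=1` describes the 24 hours that ended one hour ago.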

Target filters

  • server_name
  • instance (e.g. host-or-ip:9100)

Filter rules:

  • If both server_name and instance are given, they are combined with AND.
  • If only one is given, only that label is applied.

`run_promql` Guardrails 🔒

  • approved=False: does not execute the query; returns a confirmation message
  • approved=True: executes the query

Modes:

  • instant=True -> /api/v1/query
  • instant=False -> /api/v1/query_range
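The approval gate and the mode switch can be pictured as a small planning step that runs before any HTTP call. The following is a sketch under stated assumptions: `plan_promql_request`, its return shape, and the `step="5m"` default are hypothetical. Only the two endpoint paths and their `query`, `start`, `end`, `step` parameters come from the Prometheus HTTP API.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def plan_promql_request(
    base_url: str,
    promql: str,
    approved: bool = False,
    instant: bool = True,
    start: Optional[str] = None,
    end: Optional[str] = None,
    step: str = "5m",  # assumed default, not confirmed by the README
) -> Dict[str, str]:
    """Decide whether and where to send a query; hypothetical helper."""
    if not approved:
        # Guardrail: nothing is executed until the caller confirms.
        return {
            "status": "needs_approval",
            "message": f"Re-run with approved=True to execute: {promql}",
        }
    if instant:
        path, params = "/api/v1/query", {"query": promql}
    else:
        if start is None or end is None:
            raise ValueError("range queries need both start and end")
        path = "/api/v1/query_range"
        params = {"query": promql, "start": start, "end": end, "step": step}
    return {"status": "ready", "url": f"{base_url.rstrip('/')}{path}?{urlencode(params)}"}


print(plan_promql_request("http://localhost:9090", "up", approved=True))
```

Returning a plan instead of performing the request keeps the guardrail easy to test without a running Prometheus.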

Usage Examples 📌

1) Average CPU for a specific server (last 24 hours)

{
  "check_id": "cpu_avg_pct",
  "hours": 24,
  "instance": "10.23.12.11:9100",
  "environment": "prod"
}

2) Disk usage for a specific server (by mountpoint)

{
  "check_id": "disk_used_pct_by_mount",
  "hours": 24,
  "server_name": "CMS AP #1",
  "environment": "prod"
}

3) Run a custom PromQL query (instant)

{
  "promql": "up",
  "approved": true,
  "instant": true,
  "environment": "prod"
}

CHECKS Catalog ✅

Source: domain/checks.py (CHECKS)

System / Resource

  • cpu_avg_pct: CPU average usage (%) by instance/server_name
  • cpu_peak_pct: window peak CPU usage (%) over selected range
  • mem_used_pct: memory used ratio (%)
  • mem_swap_used_pct: swap used ratio (%)
  • load15_avg: 15-minute load average
  • cpu_iowait_pct: CPU iowait ratio (%)

Disk / Filesystem

  • disk_used_pct_by_mount: filesystem used (%) by mountpoint/device (0-100 scale)
  • disk_used_top5_pct: top 5 filesystem usage (%)
  • disk_inodes_used_pct: inode usage (%)
  • fs_readonly: readonly filesystem indicator (1=readonly)
  • disk_io_busy_pct: disk I/O busy ratio (%)

Availability

  • up: target liveness (1=up, 0=down)

Network / TCP

  • net_in_bytes: inbound throughput (bytes/sec)
  • net_out_bytes: outbound throughput (bytes/sec)
  • net_errs_per_sec: RX+TX network errors per second
  • tcp_retrans_per_sec: TCP retransmit segments per second
  • tcp_established: established TCP connections
  • tcp_time_wait: TIME_WAIT TCP sockets
  • tcp_inuse: in-use TCP sockets
  • tcp_orphan: orphan TCP sockets

Process Monitoring

  • proc_cpu_pct: process group CPU usage (%)
  • proc_mem_bytes: process group memory usage (bytes)
  • proc_count: process group process count

PostgreSQL

  • pg_up: PostgreSQL exporter up state (1=up, 0=down)
  • pg_qps: PostgreSQL transactions/sec (commit + rollback)
  • pg_cache_hit_pct: PostgreSQL buffer cache hit ratio (%)
  • pg_active_conn: active PostgreSQL connections

Environment Variables Summary ⚙️

PROM_ENV_URLS={"prod":"http://...:9090","dev_test":"http://...:9090","dr":"http://...:9090"}
PROM_URL=http://...:9090
PROM_BEARER_TOKEN=
PROM_TIMEOUT_SEC=15

ALERT_WARN_PCT=85
ALERT_CRIT_PCT=95
ALERT_SUSTAIN_MINUTES=5

PROM_MAX_SAMPLES_PER_SERIES=5000
PROM_MAX_PARALLEL_CHECKS=6
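For readers wiring these variables into their own tooling, the block below sketches one way to load them into a typed object. It is not the project's core/config.py: the `Settings` class, `load_settings`, and `_num` are hypothetical, and treating the values listed above as defaults is an assumption (they may be examples only).

```python
import json
import os
from dataclasses import dataclass, field
from typing import Callable, Dict, Mapping, TypeVar

T = TypeVar("T", int, float)


@dataclass(frozen=True)
class Settings:
    env_urls: Dict[str, str] = field(default_factory=dict)
    prom_url: str = ""
    bearer_token: str = ""
    timeout_sec: float = 15.0
    warn_pct: float = 85.0
    crit_pct: float = 95.0
    sustain_minutes: int = 5
    max_samples_per_series: int = 5000
    max_parallel_checks: int = 6


def _num(env: Mapping[str, str], key: str, default: T, cast: Callable[[str], T]) -> T:
    # Unset or blank variables fall back to the assumed default.
    raw = env.get(key, "").strip()
    return cast(raw) if raw else default


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables listed above; hypothetical loader for illustration."""
    return Settings(
        env_urls=json.loads(env.get("PROM_ENV_URLS") or "{}"),
        prom_url=env.get("PROM_URL", ""),
        bearer_token=env.get("PROM_BEARER_TOKEN", ""),
        timeout_sec=_num(env, "PROM_TIMEOUT_SEC", 15.0, float),
        warn_pct=_num(env, "ALERT_WARN_PCT", 85.0, float),
        crit_pct=_num(env, "ALERT_CRIT_PCT", 95.0, float),
        sustain_minutes=_num(env, "ALERT_SUSTAIN_MINUTES", 5, int),
        max_samples_per_series=_num(env, "PROM_MAX_SAMPLES_PER_SERIES", 5000, int),
        max_parallel_checks=_num(env, "PROM_MAX_PARALLEL_CHECKS", 6, int),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests.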

Environment selection priority:

  1. environment
  2. env_hint
  3. PROM_URL fallback
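The priority order above can be sketched as a small lookup. `resolve_prom_url` is a hypothetical name, and two behaviors here are assumptions rather than documented facts: an unknown environment name falls through to the next candidate, and a missing PROM_URL raises an error.

```python
import json
from typing import Mapping, Optional


def resolve_prom_url(
    env_vars: Mapping[str, str],
    environment: Optional[str] = None,
    env_hint: Optional[str] = None,
) -> str:
    """Pick a Prometheus base URL; hypothetical helper for illustration."""
    urls = json.loads(env_vars.get("PROM_ENV_URLS") or "{}")
    # 1. explicit environment, 2. env_hint: first known name wins.
    for name in (environment, env_hint):
        if name and name in urls:
            return urls[name]
    # 3. fall back to the single default URL.
    fallback = env_vars.get("PROM_URL", "")
    if not fallback:
        raise LookupError("no matching environment and PROM_URL is not set")
    return fallback


cfg = {
    "PROM_ENV_URLS": '{"prod": "http://prod:9090", "dr": "http://dr:9090"}',
    "PROM_URL": "http://default:9090",
}
print(resolve_prom_url(cfg, environment="prod", env_hint="dr"))
```

The host names in `cfg` are placeholders for the elided URLs in the summary above.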

Operational Tips 💡

  • When producing reports, state the % unit explicitly.
  • To check a single server, use the instance or server_name filter.
  • disk_used_pct_by_mount values are on a 0-100 scale. (0.8 = 0.8%)

Tools (8)

list_checks: Returns a list of registered diagnostic checks.
list_environments: Retrieves Prometheus URLs for different environments.
list_servers: Lists servers based on recent up status.
list_process_groups: Lists monitored process groups.
get_alerts: Retrieves active Prometheus alerts with optional filtering.
run_check: Executes a single diagnostic check.
run_all_checks: Executes all diagnostic checks in parallel.
run_promql: Executes a custom PromQL query.

Environment Variables

PROM_ENV_URLS: JSON map of environment names to Prometheus URLs
PROM_URL: Default Prometheus server URL
PROM_BEARER_TOKEN: Authentication token for Prometheus API
PROM_TIMEOUT_SEC: Timeout in seconds for API requests

Configuration

claude_desktop_config.json
{"mcpServers": {"prometheus": {"command": "uv", "args": ["run", "mcp_prometheus/main.py"], "env": {"PROM_URL": "http://localhost:9090"}}}}

Try it

→ Check the CPU usage for the production server instance 10.23.12.11:9100 over the last 24 hours.
→ List all active alerts in the production environment.
→ Run all diagnostic checks for the current environment and summarize the results.
→ Execute a PromQL query to get the current 'up' status for all targets in the dev_test environment.
→ What is the disk usage percentage by mountpoint for the CMS AP #1 server?

Frequently Asked Questions

What are the key features of MCP Prometheus?

  • Query system metrics and resource usage via predefined diagnostic checks.
  • Support for custom PromQL execution with safety guardrails.
  • Multi-environment monitoring support (prod, dev_test, dr).
  • PostgreSQL health monitoring, including cache hit ratios and connection stats.
  • Parallel execution of diagnostic checks for efficient troubleshooting.

What can I use MCP Prometheus for?

  • Quickly diagnosing high CPU or memory usage on specific production instances.
  • Monitoring PostgreSQL database performance and connection health.
  • Automated health reporting across multiple infrastructure environments.
  • Investigating disk space issues by checking mountpoint usage percentages.

How do I install MCP Prometheus?

Install MCP Prometheus by running: cd d:\MCPTools && uv sync && uv run python mcp_prometheus/main.py

What MCP clients work with MCP Prometheus?

MCP Prometheus works with any MCP-compatible client including Claude Desktop, Claude Code, Cursor, and other editors with MCP support.
