Prometheus-based monitoring MCP server for system metrics and health checks.
MCP Prometheus
An MCP server for Prometheus-based monitoring.
The entry point is main.py.
Quick Start
cd d:\MCPTools
uv sync
uv run python mcp_prometheus/main.py
Project Structure
mcp_prometheus/
  main.py
  core/
    config.py
    runtime.py
    server.py
    time_utils.py
  domain/
    checks.py
  infra/
    prom_client.py
  tools/
    catalog.py
    alerts_runner.py
    checks_runner.py
    promql.py
  utils/
    query_utils.py
    summarize.py
Tools Summary
| Tool | Purpose | Notes |
|---|---|---|
| `list_checks` | List registered checks | Returns id, name, description |
| `list_environments` | List Prometheus URLs per environment | prod/dev_test/dr |
| `list_servers` | List servers based on recent `up` | Deduplicated by (instance, job) |
| `list_process_groups` | List process groups | Based on process_monitoring |
| `get_alerts` | List active Prometheus alerts | Based on /api/v1/alerts; supports label/state filters |
| `run_check` | Run a single check | Recommended default |
| `run_all_checks` | Run all checks in parallel | step=5m (fixed) |
| `run_promql` | Run a user-supplied PromQL query directly | Requires approved=True |
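The label/state filtering noted for `get_alerts` could look roughly like the sketch below. It is not the project's code: `filter_alerts` and the sample payload are hypothetical, and the only thing taken from Prometheus itself is the general shape of a `/api/v1/alerts` response (`data.alerts[]` entries with `labels` and `state`).

```python
from typing import Any, Optional


def filter_alerts(
    payload: dict[str, Any],
    state: Optional[str] = None,
    labels: Optional[dict[str, str]] = None,
) -> list[dict[str, Any]]:
    """Return alerts from a /api/v1/alerts payload that match state and labels."""
    alerts = payload.get("data", {}).get("alerts", [])
    wanted = labels or {}
    matched = []
    for alert in alerts:
        # State filter: active alerts are reported as "pending" or "firing".
        if state is not None and alert.get("state") != state:
            continue
        # Label filter: every requested label must match exactly.
        alert_labels = alert.get("labels", {})
        if all(alert_labels.get(key) == value for key, value in wanted.items()):
            matched.append(alert)
    return matched


# Hypothetical response body, shaped like the Prometheus alerts API output.
sample = {
    "status": "success",
    "data": {
        "alerts": [
            {"labels": {"alertname": "HighCPU", "instance": "10.23.12.11:9100"}, "state": "firing"},
            {"labels": {"alertname": "DiskFull", "instance": "10.23.12.11:9100"}, "state": "pending"},
        ]
    },
}

firing = filter_alerts(sample, state="firing")
print([alert["labels"]["alertname"] for alert in firing])
```

Filtering on the client side keeps the request to Prometheus unchanged; whether the real tool filters before or after the API call is not stated in this README.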
`run_check` Input Guide
Required
- check_id
Time range
- Relative: hours, minutes, days
- Absolute: start_time_utc_iso, end_time_utc_iso
- End offset: end_offset_minutes, end_offset_hours, end_offset_days
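One way to read the time-range parameters above is sketched below. The semantics are assumptions, not confirmed by this README: absolute bounds are assumed to take precedence when both are given, and the end offset is assumed to shift the end of a relative window back from the current time. `resolve_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def _parse_utc(value: str) -> datetime:
    """Parse an ISO 8601 string; naive values are treated as UTC."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def resolve_window(
    now: datetime,
    *,
    hours: float = 0,
    minutes: float = 0,
    days: float = 0,
    start_time_utc_iso: Optional[str] = None,
    end_time_utc_iso: Optional[str] = None,
    end_offset_minutes: float = 0,
    end_offset_hours: float = 0,
    end_offset_days: float = 0,
) -> tuple[datetime, datetime]:
    """Return (start, end) in UTC for the requested window."""
    if start_time_utc_iso and end_time_utc_iso:
        # Absolute mode (assumed to win over relative parameters).
        start = _parse_utc(start_time_utc_iso)
        end = _parse_utc(end_time_utc_iso)
    else:
        # Relative mode: the window ends at now minus the end offset.
        offset = timedelta(
            days=end_offset_days, hours=end_offset_hours, minutes=end_offset_minutes
        )
        span = timedelta(days=days, hours=hours, minutes=minutes)
        end = now - offset
        start = end - span
    if start >= end:
        raise ValueError("time window is empty: start must be before end")
    return start, end


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
last_day = resolve_window(now, hours=24)
day_ending_an_hour_ago = resolve_window(now, hours=24, end_offset_hours=1)
```

Passing `now` in explicitly, rather than reading the clock inside the function, makes the window logic easy to test.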
Target filter
- server_name
- instance (e.g. host-or-ip:9100)
Filter rules:
- If both server_name and instance are given, they are combined with AND.
- If only one is given, only that label is applied.
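The filter rules map naturally onto PromQL label matchers, since comma-separated matchers inside `{...}` are ANDed. The sketch below is illustrative only: `build_selector` is a hypothetical helper, and it assumes the filters are applied as exact-match (`=`) matchers on labels named `server_name` and `instance`.

```python
from typing import Optional


def _escape(value: str) -> str:
    """Escape backslashes, double quotes, and newlines for a PromQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def build_selector(
    metric: str,
    server_name: Optional[str] = None,
    instance: Optional[str] = None,
) -> str:
    """Build a PromQL selector; matchers listed together are ANDed by PromQL."""
    matchers = []
    if server_name:
        matchers.append(f'server_name="{_escape(server_name)}"')
    if instance:
        matchers.append(f'instance="{_escape(instance)}"')
    if not matchers:
        # No filter given: select every series of the metric.
        return metric
    return metric + "{" + ",".join(matchers) + "}"


both = build_selector("up", server_name="CMS AP #1", instance="10.23.12.11:9100")
only_instance = build_selector("up", instance="10.23.12.11:9100")
print(both)
print(only_instance)
```

Escaping the values matters here because server names such as "CMS AP #1" are user-supplied strings that end up inside a query.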
`run_promql` Guardrails
- approved=False: does not execute the query; returns a confirmation message
- approved=True: executes the query
Mode:
- instant=True -> /api/v1/query
- instant=False -> /api/v1/query_range
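The guardrail and mode switch can be sketched as a function that plans the request without sending it. `plan_promql_request`, the returned dictionary shape, and the confirmation text are hypothetical; the endpoint paths and the `query`, `start`, `end`, and `step` parameter names follow the Prometheus HTTP API. The `5m` default step is borrowed from `run_all_checks` and is an assumption for this tool.

```python
from typing import Any, Optional


def plan_promql_request(
    promql: str,
    approved: bool = False,
    instant: bool = True,
    start: Optional[str] = None,
    end: Optional[str] = None,
    step: str = "5m",  # assumption: same step as run_all_checks
) -> dict[str, Any]:
    """Decide whether and how a user-supplied PromQL query would be executed."""
    if not approved:
        # Guardrail: never execute without explicit approval.
        return {
            "executed": False,
            "message": f"Confirm execution of query {promql!r} by passing approved=True.",
        }
    if instant:
        return {"executed": True, "path": "/api/v1/query", "params": {"query": promql}}
    if start is None or end is None:
        raise ValueError("range queries need both start and end")
    return {
        "executed": True,
        "path": "/api/v1/query_range",
        "params": {"query": promql, "start": start, "end": end, "step": step},
    }


blocked = plan_promql_request("up")
instant_plan = plan_promql_request("up", approved=True, instant=True)
range_plan = plan_promql_request(
    "up",
    approved=True,
    instant=False,
    start="2024-01-01T00:00:00Z",
    end="2024-01-02T00:00:00Z",
)
```

Defaulting `approved` to `False` means a caller that forgets the flag gets a confirmation prompt instead of an executed query.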
Usage Examples
1) CPU average for a specific server (last 24 hours)
{
"check_id": "cpu_avg_pct",
"hours": 24,
"instance": "10.23.12.11:9100",
"environment": "prod"
}
2) Disk usage for a specific server (by mountpoint)
{
"check_id": "disk_used_pct_by_mount",
"hours": 24,
"server_name": "CMS AP #1",
"environment": "prod"
}
3) Run a user-supplied PromQL query (instant)
{
"promql": "up",
"approved": true,
"instant": true,
"environment": "prod"
}
CHECKS Catalog
Source: domain/checks.py (CHECKS)
System / Resource
- cpu_avg_pct: CPU average usage (%) by instance/server_name
- cpu_peak_pct: window peak CPU usage (%) over selected range
- mem_used_pct: memory used ratio (%)
- mem_swap_used_pct: swap used ratio (%)
- load15_avg: 15-minute load average
- cpu_iowait_pct: CPU iowait ratio (%)
Disk / Filesystem
- disk_used_pct_by_mount: filesystem used (%) by mountpoint/device (0-100 scale)
- disk_used_top5_pct: top 5 filesystem usage (%)
- disk_inodes_used_pct: inode usage (%)
- fs_readonly: readonly filesystem indicator (1=readonly)
- disk_io_busy_pct: disk I/O busy ratio (%)
Availability
- up: target liveness (1=up, 0=down)
Network / TCP
- net_in_bytes: inbound throughput (bytes/sec)
- net_out_bytes: outbound throughput (bytes/sec)
- net_errs_per_sec: RX+TX network errors per second
- tcp_retrans_per_sec: TCP retransmit segments per second
- tcp_established: established TCP connections
- tcp_time_wait: TIME_WAIT TCP sockets
- tcp_inuse: in-use TCP sockets
- tcp_orphan: orphan TCP sockets
Process Monitoring
- proc_cpu_pct: process group CPU usage (%)
- proc_mem_bytes: process group memory usage (bytes)
- proc_count: process group process count
PostgreSQL
- pg_up: PostgreSQL exporter up state (1=up, 0=down)
- pg_qps: PostgreSQL transactions/sec (commit + rollback)
- pg_cache_hit_pct: PostgreSQL buffer cache hit ratio (%)
- pg_active_conn: active PostgreSQL connections
Environment Variables Summary
PROM_ENV_URLS={"prod":"http://...:9090","dev_test":"http://...:9090","dr":"http://...:9090"}
PROM_URL=http://...:9090
PROM_BEARER_TOKEN=
PROM_TIMEOUT_SEC=15
ALERT_WARN_PCT=85
ALERT_CRIT_PCT=95
ALERT_SUSTAIN_MINUTES=5
PROM_MAX_SAMPLES_PER_SERIES=5000
PROM_MAX_PARALLEL_CHECKS=6
Environment selection priority:
1. environment
2. env_hint
3. PROM_URL fallback
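The priority order can be sketched as a small lookup. This is not the project's implementation: `resolve_prom_url` is a hypothetical helper, and it assumes that `environment` and `env_hint` are keys into the `PROM_ENV_URLS` JSON map and that an unknown key falls through to the next option rather than raising.

```python
import json
import os
from typing import Mapping, Optional


def resolve_prom_url(
    environment: Optional[str] = None,
    env_hint: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a Prometheus URL: environment, then env_hint, then PROM_URL."""
    env = os.environ if environ is None else environ
    # PROM_ENV_URLS is a JSON object mapping environment names to URLs.
    env_urls = json.loads(env.get("PROM_ENV_URLS") or "{}")
    for candidate in (environment, env_hint):
        # Assumption: unknown names are skipped instead of treated as errors.
        if candidate and candidate in env_urls:
            return env_urls[candidate]
    fallback = env.get("PROM_URL")
    if fallback:
        return fallback
    raise ValueError("no Prometheus URL configured")


# Hypothetical settings used only for this example.
settings = {
    "PROM_ENV_URLS": '{"prod": "http://prod.example:9090", "dr": "http://dr.example:9090"}',
    "PROM_URL": "http://localhost:9090",
}
print(resolve_prom_url(environment="prod", environ=settings))
print(resolve_prom_url(env_hint="dr", environ=settings))
print(resolve_prom_url(environ=settings))
```

Accepting the environment as a plain mapping keeps the function independent of the process environment, which helps when testing different configurations.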
Operational Tips
- State the % unit explicitly in report output.
- For single-server checks, use the instance or server_name filter.
- disk_used_pct_by_mount values are on a 0-100 scale (0.8 = 0.8%).
Tools (8)
- list_checks: Returns a list of registered diagnostic checks.
- list_environments: Retrieves Prometheus URLs for different environments.
- list_servers: Lists servers based on recent up status.
- list_process_groups: Lists monitored process groups.
- get_alerts: Retrieves active Prometheus alerts with optional filtering.
- run_check: Executes a single diagnostic check.
- run_all_checks: Executes all diagnostic checks in parallel.
- run_promql: Executes a custom PromQL query.
Environment Variables
- PROM_ENV_URLS: JSON map of environment names to Prometheus URLs
- PROM_URL: Default Prometheus server URL
- PROM_BEARER_TOKEN: Authentication token for Prometheus API
- PROM_TIMEOUT_SEC: Timeout in seconds for API requests
Configuration
{
  "mcpServers": {
    "prometheus": {
      "command": "uv",
      "args": ["run", "mcp_prometheus/main.py"],
      "env": {"PROM_URL": "http://localhost:9090"}
    }
  }
}