bare-metal talos · gitops by argocd · solo enterprise agentgateway v2026.6.3

Every LLM request through one gateway — traced, priced, and budgeted.

This page is the live map of the lab: what's deployed and in what order, where the endpoints are, which CRDs do the work, and a captured session in which a dollar-aware budget blocks real OpenAI traffic with a 429. Everything here is actual config from the repo and actual output captured from the cluster.

13 ArgoCD apps, all Synced · 3 LLM providers · 2 Talos nodes · $0.02 total demo spend

#What's deployed

Everything lands via GitOps: edit a manifest, push to main, and ArgoCD auto-syncs with prune and self-heal. Self-heal reverts manual kubectl edits by design.
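A quick convergence check after a push is to list every Application that is not both Synced and Healthy. This is a minimal sketch that assumes the standard Argo CD status fields .status.sync.status and .status.health.status. list_unsynced is a hypothetical helper, not part of the repo:

```shell
# list_unsynced (hypothetical helper): reads "NAME SYNC HEALTH" rows on stdin
# and prints each app that is not both Synced and Healthy.
# kubectl prints <none> for a missing field, which also counts as not converged.
list_unsynced() {
  awk '$2 != "Synced" || $3 != "Healthy" { print $1 }'
}

# Usage:
#   kubectl -n argocd get applications.argoproj.io --no-headers \
#     -o custom-columns=NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status \
#     | list_unsynced
```

Empty output means every app has converged.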

AgentGateway Enterprise

agentgateway-system · v2026.6.3

Control plane + 4 gateways (LLM proxy, DGX Spark, Grok, virtual MCP) with ext-auth, rate-limiter, and ext-cache extensions.

kagent Enterprise

kagent · 0.4.7

Agent controller + kmcp. Every ModelConfig routes through the gateway — agents never hold real provider keys.

Solo UI

agentgateway-system · management 0.4.7

One shared dashboard for both products: kagent at /, AgentGateway at /age/. Autoauth IdP built in.

Langfuse

langfuse · 1.5.35

LLM observability with two isolated projects: gateway traces (via otel-fanout-collector) land in the agentgateway project; kagent agent-runtime traces (via kagent-otel-collector) land in a separate kagent project. Both also feed the Solo ClickHouse dashboards.

Vault + External Secrets

vault / external-secrets

All provider keys live in Vault, synced to K8s Secrets by ESO. Dev-mode Vault: wiped on reboot, re-seeded by one script.

MCP servers

agentgateway-system

Demo MCP servers (everything, website-fetcher) behind a virtual MCP gateway, plus a drone MCP server on its own dedicated /drone endpoint.

Drone agents NEW

kagent · gpt-5.5

Four kagent agents fly the same live Tello/RMTT drone, one per agentgateway MCP tool mode: Standard (/drone), Search, Code, and CodeSearch. The MCP server runs a 5s keepalive (telemetry stays live and the drone auto-reconnects), reports connected/stale freshness, exposes a live get_battery, and retries selectively on flaky WiFi. All four agents ship with the drone-operations skill and canned prompts.

Local LLM — DGX Spark

172.16.10.173:8000 · vLLM

Qwen3.6-35B served locally, fronted by its own gateway at /spark.

AI Budgets NEW

agentgateway-system · v2026.6.3

Dollar- and token-denominated spend limits with Audit/Block enforcement. The demo below trips a real 429.
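The Langfuse card's two isolated projects depend on how Langfuse authenticates ingestion. Its OpenTelemetry endpoint (/api/public/otel) uses HTTP Basic auth built from a project's public key and secret key, and each key pair belongs to exactly one project. One plausible reading of the split is that each collector exports with a different project's key pair. The langfuse-otel-auth secret synced into agentgateway-system fits that reading, but the collector configs aren't shown here. langfuse_basic_auth is a hypothetical helper, and PK/SK are placeholders:

```shell
# langfuse_basic_auth PK SK (hypothetical helper): prints the base64 "pk:sk"
# token that goes into "Authorization: Basic <token>" for Langfuse's OTLP endpoint.
# tr strips the newline that base64 appends (and any line wrapping).
langfuse_basic_auth() {
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Usage: see which project a key pair belongs to via Langfuse's public API
#   curl -s -u "$PK:$SK" http://172.16.10.155:30300/api/public/projects
# Header a collector would send:
#   Authorization: Basic $(langfuse_basic_auth "$PK" "$SK")
```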

#Reference architecture

Three views of the same platform: how LLM traffic moves, how telemetry and secrets flow, and how Git becomes cluster state. Everything shown is running.

[Diagrams: 1 · LLM traffic, one gateway per concern; 2 · observability & secrets, one collector, one vault; 3 · GitOps delivery, the only write path]

#Deployment order — ArgoCD sync waves

The wave numbers aren't decoration: they're the actual argocd.argoproj.io/sync-wave annotations that sequence the platform.
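The waves are ordinary annotations, so the order can be listed straight from the cluster. by_wave is a hypothetical helper. It treats a missing annotation as wave 0 (Argo CD's default) and sorts numerically, so negative waves come first:

```shell
# by_wave (hypothetical helper): reads "WAVE NAME" rows on stdin and prints
# them in sync order. A missing annotation (<none>) counts as wave 0.
by_wave() {
  awk '{ w = ($1 == "<none>") ? 0 : $1; print w, $2 }' | sort -k1,1n -k2,2
}

# Usage:
#   kubectl -n argocd get applications.argoproj.io --no-headers \
#     -o custom-columns='WAVE:.metadata.annotations.argocd\.argoproj\.io/sync-wave,NAME:.metadata.name' \
#     | by_wave
```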

#Endpoints

Bare-metal cluster: LoadBalancer services never receive an external IP, so every endpoint is a NodePort on the worker (172.16.10.155). Plain HTTP, lab only. The worker gets its address from DHCP, so if a URL goes dead, check the current IP with kubectl get nodes -o wide.
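If the worker's DHCP lease changes, the new address can be pulled out of kubectl get nodes -o wide instead of being read by eye. This sketch assumes the worker is the only node whose ROLES column shows <none>, because the Talos control-plane node carries the control-plane role. worker_ip is a hypothetical helper:

```shell
# worker_ip (hypothetical helper): reads `kubectl get nodes -o wide --no-headers`
# rows on stdin and prints INTERNAL-IP (column 6) of the first node with no role.
worker_ip() {
  awk '$3 == "<none>" { print $6; exit }'
}

# Usage:
#   WORKER=$(kubectl get nodes -o wide --no-headers | worker_ip)
#   curl -s -o /dev/null -w '%{http_code}\n' "http://$WORKER:30854/"
```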

| Service | URL | Notes |
|---|---|---|
| Solo UI | http://172.16.10.155:30854 | kagent at / · AgentGateway at /age/ |
| OpenAI proxy | http://172.16.10.155:30160/openai | gpt-5.5 via AgentgatewayBackend |
| Budget demo | http://172.16.10.155:30160/budget-demo | same backend, budget-enforced route |
| DGX Spark | http://172.16.10.155:31944/spark | local Qwen3.6-35B (vLLM) |
| xAI Grok | http://172.16.10.155:31397/grok | grok-4.3, TLS to api.x.ai |
| Virtual MCP | http://172.16.10.155:31606/mcp | MCP aggregation gateway (demo servers) |
| Drone MCP (Standard) | http://172.16.10.155:31606/drone | all 15 drone tools directly |
| Drone MCP (Search) | http://172.16.10.155:31606/drone-search | meta-tools get_tool · invoke_tool |
| Drone MCP (Code) | http://172.16.10.155:31606/drone-code | single run_code (JS sandbox, 15s) |
| Drone MCP (CodeSearch) | http://172.16.10.155:31606/drone-codesearch | get_tool + run_code |
| Langfuse | http://172.16.10.155:30300 | traces + cost analytics |
| Vault UI | http://172.16.10.155:31495 | dev mode, token root |

Smoke test: one completion per provider

```shell
# OpenAI (cloud), through the gateway
curl -s http://172.16.10.155:30160/openai/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-5.5","messages":[{"role":"user","content":"say ok"}],"max_tokens":5}'

# xAI Grok
curl -s http://172.16.10.155:31397/grok/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"grok-4.3","messages":[{"role":"user","content":"say ok"}],"max_tokens":5}'

# DGX Spark: local Qwen via vLLM
curl -s http://172.16.10.155:31944/spark/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"Qwen/Qwen3.6-35B-A3B-FP8","messages":[{"role":"user","content":"say ok"}],"max_tokens":5}'
```
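The MCP endpoints don't take chat completions. They speak JSON-RPC. This sketch lists what each drone mode exposes and assumes the gateway implements MCP's Streamable HTTP transport. The flow is: initialize, capture the Mcp-Session-Id response header, send notifications/initialized, then call tools/list. mcp_session_id, mcp_post and mcp_tools are hypothetical helpers. The protocolVersion value is one published revision, and the server may negotiate a different one:

```shell
MCP_ACCEPT='Accept: application/json, text/event-stream'

# mcp_session_id (hypothetical): reads HTTP response headers on stdin and prints
# the Mcp-Session-Id value (header names are case-insensitive; strip the CR).
mcp_session_id() {
  awk 'tolower($1) == "mcp-session-id:" { sub(/\r$/, "", $2); print $2; exit }'
}

# mcp_post URL SESSION_ID JSON (hypothetical): one JSON-RPC POST within a session.
mcp_post() {
  curl -s "$1" -H 'Content-Type: application/json' -H "$MCP_ACCEPT" \
    -H "Mcp-Session-Id: $2" -d "$3"
}

# mcp_tools URL (hypothetical): initialize a session, then print tools/list.
mcp_tools() {
  local url=$1 sid
  sid=$(curl -s -D - -o /dev/null "$url" \
    -H 'Content-Type: application/json' -H "$MCP_ACCEPT" \
    -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"curl","version":"0.1"}}}' \
    | mcp_session_id)
  mcp_post "$url" "$sid" '{"jsonrpc":"2.0","method":"notifications/initialized"}' >/dev/null
  mcp_post "$url" "$sid" '{"jsonrpc":"2.0","id":2,"method":"tools/list"}'
}

# Usage:
#   mcp_tools http://172.16.10.155:31606/drone-search
```

Based on the endpoint table, tools/list should return:

- /drone: all 15 drone tools
- /drone-search: only get_tool and invoke_tool
- /drone-code: a single run_code
- /drone-codesearch: get_tool plus run_code

The reply may arrive as plain JSON or as an SSE data: line.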

#Gateway configs — listeners, routes, rules

The full running configuration of each gateway, straight from config/: what it listens on, which HTTPRoute rules attach, where the backend points, and which policies apply.

#CRD explorer

The custom resources that make the platform go — each with the real manifest from this repo and the command to inspect it. The two budget kinds shipped in v2026.6.3.

#The budget demo — watch money say no

Two EnterpriseAgentgatewayBudget entries enforce on the isolated /budget-demo route, so tripping them can't affect kagent's default model on /openai. The output below comes from the real session, captured 2026-07-02.

demo-usd-audit · Audit
$5/day

Realized USD spend, computed per request from the model cost catalog. Logs when exceeded — never blocks. Roll out budgets in this mode first.

demo-token-block · Block
2,000 tokens/day

The circuit breaker. Counts input + output tokens; once spent, the gateway returns 429 at admission until spend ages out of the rolling 24-hour window.
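The two cards describe an admission check against spend in a rolling window. The sketch below models that behavior, not the gateway's implementation, under these assumptions:

- Spend records are "epoch amount" pairs.
- The limit counts as reached at spend >= limit.
- The output strings are labels chosen for this sketch, not the gateway's log fields.

window_spend and admit are hypothetical helpers:

```shell
# window_spend NOW (hypothetical): reads "EPOCH_SECONDS AMOUNT" spend records on
# stdin and prints the total inside the rolling 24 h window ending at NOW.
window_spend() {
  awk -v now="$1" '$1 > now - 86400 { total += $2 } END { print total + 0 }'
}

# admit ACTION LIMIT SPENT (hypothetical): admission decision for the next request.
#   Audit: always let the request through, but log once SPENT >= LIMIT.
#   Block: answer 429 once SPENT >= LIMIT (assumed threshold semantics).
admit() {
  if awk -v s="$3" -v l="$2" 'BEGIN { exit !(s >= l) }'; then
    case "$1" in
      Block) echo "429 blocked" ;;
      *)     echo "200 logged" ;;
    esac
  else
    echo "200 allowed"
  fi
}

# Usage (token-records.txt is a hypothetical record file):
#   spent=$(window_spend "$(date +%s)" < token-records.txt)
#   admit Block 2000 "$spent"
```

Audit mode is just the same check with the 429 branch replaced by a log line. That is why it is the safe way to roll a budget out first.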

[Diagram: how a request clears (or doesn't clear) the budget]
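To reproduce the session, a loop like this keeps sending completions until the route answers 429. It assumes the budget route accepts the same OpenAI-style /v1/chat/completions path as /openai and uses gpt-4o, as in the captured log. BUDGET_URL and trip_budget are hypothetical names. Every iteration is a real, billed OpenAI call, so keep MAX small:

```shell
# Assumed path: the budget route mirrors the /openai layout.
BUDGET_URL=${BUDGET_URL:-http://172.16.10.155:30160/budget-demo/v1/chat/completions}

# trip_budget [MAX] (hypothetical): send up to MAX completions and stop at the
# first 429. Returns 0 if the budget tripped, 1 if it never did.
trip_budget() {
  local max=${1:-5} i=1 code
  while [ "$i" -le "$max" ]; do
    code=$(curl -s -o /dev/null -w '%{http_code}' "$BUDGET_URL" \
      -H 'Content-Type: application/json' \
      -d '{"model":"gpt-4o","messages":[{"role":"user","content":"write a 500-word story"}]}')
    echo "request $i -> HTTP $code"
    [ "$code" = 429 ] && return 0
    i=$((i + 1))
  done
  return 1
}

# Usage:
#   trip_budget 6
```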

Proxy log: the block, and the per-request USD cost

```text
info  llm::cost  loaded model catalog  providers=3 models=80
info  request route=agentgateway-system/budget-demo http.status=200
      gen_ai.request.model=gpt-4o gen_ai.usage.input_tokens=21 gen_ai.usage.output_tokens=683
      agw.ai.usage.cost.total=0.0068825
warn  budget  budget exceeded; blocking...
      budget_id=agentgateway-system/budget-demo/demo-token-block
      budget_action="BLOCK" budget_unit="TOKENS" budget_limit=2000
      budget_window="DAILY" phase="admission" outcome="over_limit_block"
```
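The logged cost can be checked by hand. Assume the catalog prices gpt-4o at $2.50 per million input tokens and $10.00 per million output tokens, which is OpenAI's list price. Those rates reproduce the logged agw.ai.usage.cost.total exactly:

21 × 2.50 / 10⁶ + 683 × 10.00 / 10⁶ = 0.0000525 + 0.00683 = 0.0068825

request_cost is a hypothetical helper for repeating the arithmetic:

```shell
# request_cost IN OUT IN_PER_M OUT_PER_M (hypothetical helper): USD cost of one
# request from token counts and per-million-token prices.
request_cost() {
  awk -v i="$1" -v o="$2" -v pin="$3" -v pout="$4" \
    'BEGIN { printf "%.7f\n", i * pin / 1e6 + o * pout / 1e6 }'
}

# The logged gpt-4o request, at the assumed $2.50 / $10.00 per-million rates:
#   request_cost 21 683 2.50 10.00    # 0.0068825, matching agw.ai.usage.cost.total
```

That one request also used 21 + 683 = 704 tokens, about a third of demo-token-block's 2,000-token daily allowance. A few similar requests would therefore be expected to reach the 429.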

#Runbook — the commands that fix this lab

Battle-tested on real incidents. Full detail lives in CLAUDE.md.

Vault wiped after a node reboot (ClusterSecretStore Degraded)

Vault runs dev-mode with in-memory storage — every restart erases it. The synced K8s Secrets survive, and the script re-seeds Vault from them. Never regenerate Langfuse creds while the databases hold data keyed to the old ones.

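Recover:

```shell
# 1. Re-seed the (empty) dev-mode Vault from the surviving K8s Secrets
./scripts/configure-vault.sh
# 2. Bump the force-sync annotation so ESO revalidates the store and re-syncs each ExternalSecret now
ts=$(date +%s)
kubectl annotate clustersecretstore vault force-sync="$ts" --overwrite
kubectl annotate externalsecret -n agentgateway-system \
  openai-secret xai-secret langfuse-otel-auth force-sync="$ts" --overwrite
kubectl annotate externalsecret -n langfuse langfuse-secrets force-sync="$ts" --overwrite
```
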
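After re-seeding, confirm the store is Ready before chasing individual secrets. kubectl wait works on any resource that exposes status conditions. not_ready is a hypothetical filter over a custom-columns listing of each ExternalSecret's Ready condition:

```shell
# not_ready (hypothetical helper): reads "NAMESPACE NAME READY" rows on stdin
# and prints namespace/name for every ExternalSecret whose Ready is not True.
not_ready() {
  awk '$3 != "True" { print $1 "/" $2 }'
}

# Usage, once configure-vault.sh has run:
#   kubectl wait --for=condition=Ready clustersecretstore/vault --timeout=60s
#   kubectl get externalsecret -A --no-headers \
#     -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status' \
#     | not_ready
```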
App syncs Succeeded but stays OutOfSync forever

Three causes seen here: two apps owning the same resources (fixed with directory.exclude), a lone zero-value Helm field that ArgoCD normalizes away, and client-side diff false positives under ServerSideApply (fixed with compare-options: ServerSideDiff=true). Diagnose with the CLI in core mode:

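
```shell
# --core talks to the cluster directly and uses the kubeconfig's current
# namespace, which must be argocd
kubectl config set-context --current --namespace=argocd
# plain --core diffs can lie; --server-side-generate shows the real picture
argocd app diff argocd/<app> --core --server-side-generate
```
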
Force a sync after pushing

```shell
# hard refresh: re-read Git and bypass the manifest cache; auto-sync then applies the change
kubectl -n argocd annotate app <name> argocd.argoproj.io/refresh=hard --overwrite
```

"All my data is gone" in both UIs

Almost certainly ENABLE_MOCK_UI=true on the shared frontend — it renders sample data for both products and hides the real thing. Keep it false; the data is safe in Postgres (kagent) and ClickHouse (AgentGateway). Hard-refresh the browser after reverting.

Langfuse down — ClickHouse OOMKilled

resourcesPreset: small (768Mi) is not enough. Set explicit ClickHouse resources (request 1Gi / limit 3Gi) in langfuse.yaml, then delete the stuck langfuse-web pod to skip its backoff.
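To confirm that OOMKill is the cause before editing langfuse.yaml, check each pod's last termination reason. oom_pods is a hypothetical filter. The jsonpath in the usage line reads the standard containerStatuses[].lastState.terminated.reason field:

```shell
# oom_pods (hypothetical helper): reads "POD REASON..." rows on stdin and
# prints each pod whose containers last terminated with OOMKilled.
oom_pods() {
  awk '{ for (f = 2; f <= NF; f++) if ($f == "OOMKilled") { print $1; next } }'
}

# Usage:
#   kubectl -n langfuse get pods -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
#     | oom_pods
```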

Drone shows a stale battery / disconnect & reconnect after charging

The Tello leaves SDK mode (stops streaming state) about 15s after its last command, so a server that sends the SDK's "command" only once freezes on the last reading. The fix is a 5s keepalive that re-sends "command": telemetry stays live, and the drone auto-reconnects when powered back on (no pod restart). get_state reports connected/stale/state_age_s so a dead stream never masquerades as live. get_battery actively queries for a guaranteed-current reading. If the drone doesn't reconnect, it likely came back on a different IP; set a DHCP reservation pinning its MAC to 172.16.10.168. On flaky WiFi, idempotent commands (keepalive, battery?, land, emergency) retry about twice. Movement never retries, because a timed-out move may already have run and a retry would move the drone twice.
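The retry rule is the subtle part, so here it is as a shell sketch. The real MCP server isn't shown on this page and isn't necessarily written in shell. is_idempotent, send_with_retry and keepalive are hypothetical names. SENDER stands for whatever actually delivers a Tello SDK command and exits 0 on success:

```shell
# is_idempotent CMD (hypothetical): succeeds for Tello commands that are safe
# to resend. 'battery?' is quoted so ? is literal, not a glob.
is_idempotent() {
  case "$1" in
    command|'battery?'|land|emergency) return 0 ;;
    *) return 1 ;;
  esac
}

# send_with_retry SENDER CMD (hypothetical): idempotent commands get one try
# plus two retries; movement gets exactly one attempt, since a timed-out move
# may already have run.
send_with_retry() {
  local sender=$1 cmd=$2 attempts=1 i=1
  is_idempotent "$cmd" && attempts=3
  while [ "$i" -le "$attempts" ]; do
    "$sender" "$cmd" && return 0
    i=$((i + 1))
  done
  return 1
}

# keepalive SENDER (hypothetical): re-send "command" every 5 s so the drone
# stays in SDK mode and keeps streaming state.
keepalive() {
  while :; do
    send_with_retry "$1" command || echo "keepalive: no reply" >&2
    sleep 5
  done
}
```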

kagent dashboard / tracing is empty (but AgentGateway tracing works)

kagent OTEL tracing is OFF by default (otel.tracing.enabled=false). AgentGateway (AGE) tracing is a separate gateway-policy pipeline, which is why one works and the other doesn't. To fix it:

1. Enable otel.tracing in kagent-enterprise.yaml, pointed at kagent-otel-collector.
2. Refresh the root app-of-apps.
3. Run kubectl -n kagent rollout restart deploy/kagent-controller, which re-renders the agent pods with the OTEL env.

The config lands in the kagent-controller ConfigMap.
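To confirm the rollout picked up the config, check which kagent deployments carry OTEL env vars in their pod template. This sketch assumes the injected variables follow the OpenTelemetry OTEL_* naming convention. missing_otel is a hypothetical filter, and non-agent deployments without tracing config will also appear in its output:

```shell
# missing_otel (hypothetical helper): reads "DEPLOYMENT ENV_NAME..." rows on
# stdin and prints deployments whose pod template has no OTEL_* variable.
missing_otel() {
  awk '{ found = 0
         for (f = 2; f <= NF; f++) if ($f ~ /^OTEL_/) found = 1
         if (!found) print $1 }'
}

# Usage:
#   kubectl -n kagent get deploy -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.template.spec.containers[*].env[*].name}{"\n"}{end}' \
#     | missing_otel
```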

Langfuse accepts traces but none show up (any project)

The langfuse-clickhouse PVC is 100% full, so the worker drops every write (Cannot reserve … not enough space). Expand the PVC. If the resize stalls at Resizing with "cannot expand volume before replica scheduling success", the Longhorn volume on this single-node cluster is degraded because it wants 3 replicas. Patch volume.longhorn.io <id> to numberOfReplicas: 1 so it goes healthy, then set spec.size. Don't bump the chart's persistence.size: a StatefulSet's volumeClaimTemplates are immutable, so that change wedges the ArgoCD sync.
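Before expanding anything, confirm from inside the ClickHouse pod that the disk really is full. full_mounts is a hypothetical filter over POSIX df -P output. The pod name, the longhorn-system namespace (Longhorn's default) and the volume id are placeholders to fill in:

```shell
# full_mounts THRESHOLD (hypothetical helper): reads POSIX `df -P` output on
# stdin and prints "MOUNT USE%" for filesystems at or above THRESHOLD percent.
full_mounts() {
  awk -v t="$1" 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= t) print $6, $5 }'
}

# Usage (<clickhouse-pod> and <id> are placeholders):
#   kubectl -n langfuse exec <clickhouse-pod> -- df -P | full_mounts 90
# If the PVC resize stalls, let the Longhorn volume go healthy on one replica:
#   kubectl -n longhorn-system patch volumes.longhorn.io <id> --type merge \
#     -p '{"spec":{"numberOfReplicas":1}}'
```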