bare-metal talos · gitops by argocd · solo enterprise agentgateway v2026.6.3
This page is the live map of the lab: what's deployed and in what order, where the endpoints are, which CRDs do the work, and a replayable session where a token budget blocks real OpenAI traffic with a 429 while a USD budget audits it. Everything here is the actual config from the repo and actual captured output from the cluster.
Everything lands via GitOps: edit a manifest, push to main, ArgoCD
auto-syncs with prune + self-heal. Manual kubectl edits get reverted on purpose.
Control plane + 4 gateways (LLM proxy, DGX Spark, Grok, virtual MCP) with ext-auth, rate-limiter, and ext-cache extensions.
Agent controller + kmcp. Every ModelConfig routes through the gateway — agents never hold real provider keys.
One shared dashboard for both products: kagent at /, AgentGateway at /age/. Autoauth IdP built in.
LLM observability with two isolated projects: gateway traces (via otel-fanout-collector) land in the agentgateway project; kagent agent-runtime traces (via kagent-otel-collector) land in a separate kagent project. Both also feed the Solo ClickHouse dashboards.
All provider keys live in Vault, synced to K8s Secrets by ESO. Dev-mode Vault: wiped on reboot, re-seeded by one script.
Demo MCP servers (everything, website-fetcher) behind a virtual MCP gateway, plus a drone MCP server on its own dedicated /drone endpoint.
Four kagent agents fly the same live Tello/RMTT drone, one per agentgateway MCP tool mode: Standard (/drone), Search (/drone-search), Code (/drone-code), and CodeSearch (/drone-codesearch). The MCP server runs a 5s keepalive (telemetry stays live and auto-reconnects), reports connected/stale freshness, exposes a live get_battery, and does selective retry on flaky WiFi. All four agents ship the drone-operations skill plus canned prompts.
Qwen3.6-35B served locally, fronted by its own gateway at /spark.
Dollar- and token-denominated spend limits with Audit/Block enforcement. Demo below trips a real 429.
Three views of the same platform: how LLM traffic moves, how telemetry and secrets flow, and how Git becomes cluster state. Everything shown is running.
1 · llm traffic — one gateway per concern
2 · observability & secrets — one collector, one vault
3 · gitops delivery — the only write path
The wave numbers aren't decoration: they're the actual argocd.argoproj.io/sync-wave
annotations that sequence the platform.
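The same ordering can be read back from the cluster by sorting the Application objects on that annotation. A minimal sketch, assuming the Applications live in the argocd namespace; the function name is hypothetical:

```bash
# List ArgoCD Applications in sync-wave order (lowest wave first), tab-separated.
# Assumption: all Applications live in the argocd namespace.
apps_by_wave() {
  kubectl -n argocd get applications.argoproj.io \
    -o jsonpath='{range .items[*]}{.metadata.annotations.argocd\.argoproj\.io/sync-wave}{"\t"}{.metadata.name}{"\n"}{end}' |
    sort -n   # numeric sort on the wave column; negative waves come first
}
```

An Application without the annotation prints an empty wave, which sort -n ranks as 0, the same default ArgoCD uses.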
Bare-metal cluster: LoadBalancers never get an external IP, so everything is a NodePort
on the worker (172.16.10.155). Plain HTTP — lab only. Heads-up: the worker is on
DHCP, so verify with kubectl get nodes -o wide if a URL goes dead.
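When a URL goes dead, the worker's current address can be pulled straight from node status rather than read off kubectl get nodes -o wide by eye. A sketch, assuming the worker is the only node without the node-role.kubernetes.io/control-plane label (drop the -l selector if workloads run on a control-plane node); both function names are hypothetical:

```bash
# Print the worker's current InternalIP (DHCP-assigned, so it can change).
# Assumption: the worker is the node WITHOUT the control-plane role label.
worker_ip() {
  kubectl get nodes -l '!node-role.kubernetes.io/control-plane' \
    -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}'
}

# Build a lab URL from a NodePort and an optional path, e.g. lab_url 30160 /openai
lab_url() {
  local ip
  ip=$(worker_ip) || return
  printf 'http://%s:%s%s\n' "$ip" "$1" "${2:-}"
}
```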
| Service | URL | Notes |
|---|---|---|
| Solo UI | http://172.16.10.155:30854 | kagent at / · AgentGateway at /age/ |
| OpenAI proxy | http://172.16.10.155:30160/openai | gpt-5.5 via AgentgatewayBackend |
| Budget demo | http://172.16.10.155:30160/budget-demo | same backend, budget-enforced route |
| DGX Spark | http://172.16.10.155:31944/spark | local Qwen3.6-35B (vLLM) |
| xAI Grok | http://172.16.10.155:31397/grok | grok-4.3, TLS to api.x.ai |
| Virtual MCP | http://172.16.10.155:31606/mcp | MCP aggregation gateway (demo servers) |
| Drone MCP — Standard | http://172.16.10.155:31606/drone | all 15 drone tools directly |
| Drone MCP — Search | http://172.16.10.155:31606/drone-search | meta-tools get_tool · invoke_tool |
| Drone MCP — Code | http://172.16.10.155:31606/drone-code | single run_code (JS sandbox, 15s) |
| Drone MCP — CodeSearch | http://172.16.10.155:31606/drone-codesearch | get_tool + run_code |
| Langfuse | http://172.16.10.155:30300 | traces + cost analytics |
| Vault UI | http://172.16.10.155:31495 | dev mode, login token: root |
# OpenAI (cloud), through the gateway
curl -s http://172.16.10.155:30160/openai/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-5.5","messages":[{"role":"user","content":"say ok"}],"max_tokens":5}'
# xAI Grok
curl -s http://172.16.10.155:31397/grok/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"grok-4.3","messages":[{"role":"user","content":"say ok"}],"max_tokens":5}'
# DGX Spark: local Qwen via vLLM
curl -s http://172.16.10.155:31944/spark/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"Qwen/Qwen3.6-35B-A3B-FP8","messages":[{"role":"user","content":"say ok"}],"max_tokens":5}'
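The three checks can be folded into one pass that prints only the HTTP status per gateway, which is quicker to scan after a reboot or an IP change. A sketch reusing the same endpoints, models, and payload; the function name and the LAB_HOST override are hypothetical:

```bash
# Smoke-test the three chat gateways; prints "<port/path> <HTTP status>" per line.
smoke_llm_gateways() {
  local host=${LAB_HOST:-172.16.10.155} entry path model code
  for entry in \
    "30160/openai gpt-5.5" \
    "31397/grok grok-4.3" \
    "31944/spark Qwen/Qwen3.6-35B-A3B-FP8"; do
    read -r path model <<<"$entry"
    # -w prints only the status code; "|| true" keeps 000 on connection failure
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 30 \
      "http://$host:$path/v1/chat/completions" \
      -H 'Content-Type: application/json' \
      -d "{\"model\":\"$model\",\"messages\":[{\"role\":\"user\",\"content\":\"say ok\"}],\"max_tokens\":5}" || true)
    printf '%s %s\n' "$path" "$code"
  done
}
```

A 200 on every line means the gateway, its backend, and the provider credentials all worked; 000 means nothing answered on that NodePort.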
The full running configuration of each gateway, straight from config/:
what it listens on, which HTTPRoute rules attach, where the backend points, and which
policies apply.
The custom resources that make the platform go — each with the real manifest from this repo and the command to inspect it. The two budget kinds shipped in v2026.6.3.
Two EnterpriseAgentgatewayBudget entries apply to the isolated
/budget-demo route, so tripping them can't touch kagent's default model on
/openai. The log below is from the real session, captured 2026-07-02.
Audit budget (USD): realized spend, computed per request from the model cost catalog. It logs when exceeded and never blocks, so roll out budgets in this mode first.
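Block budget (tokens), the circuit breaker: counts input + output tokens; once the limit is used up, the gateway returns 429 at admission until spend ages out of the rolling 24-hour window. In the demo, demo-token-block caps the route at 2000 tokens (budget_limit=2000 in the log).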
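The Block budget's admission rule can be modelled in a few lines: sum the tokens of requests inside the trailing 24 hours and refuse the next request once that sum reaches the limit. This is a hypothetical sketch for intuition, not agentgateway's code; it assumes the limit is compared with >= and that each request is charged at its completion time. The gateway's real accounting (including how budget_window="DAILY" sets window boundaries) may differ.

```bash
# Rolling-window token budget check (illustrative model only).
# stdin: one "epoch_seconds tokens" line per completed request
# args:  limit now [window_seconds, default 86400 = 24h]
budget_admit() {
  local limit=$1 now=$2 window=${3:-86400}
  awk -v limit="$limit" -v now="$now" -v window="$window" '
    BEGIN { spent = 0 }
    $1 > now - window { spent += $2 }   # spend older than the window has aged out
    END {
      verdict = (spent >= limit + 0) ? "BLOCK" : "ALLOW"   # BLOCK = HTTP 429
      print verdict
    }'
}
```

With the demo's 2000-token limit, three requests of 704 tokens (21 input + 683 output, as in the captured request) total 2112, so the fourth is blocked: printf '100 704\n200 704\n250 704\n' | budget_admit 2000 300 prints BLOCK.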
how a request clears (or doesn't clear) the budget
info llm::cost loaded model catalog providers=3 models=80
info request route=agentgateway-system/budget-demo http.status=200 gen_ai.request.model=gpt-4o gen_ai.usage.input_tokens=21 gen_ai.usage.output_tokens=683 agw.ai.usage.cost.total=0.0068825
warn budget budget exceeded; blocking... budget_id=agentgateway-system/budget-demo/demo-token-block budget_action="BLOCK" budget_unit="TOKENS" budget_limit=2000 budget_window="DAILY" phase="admission" outcome="over_limit_block"
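The agw.ai.usage.cost.total figure can be checked by hand: realized cost is input tokens times the input price plus output tokens times the output price. Assuming the catalog carries gpt-4o at OpenAI's list price of $2.50 per million input tokens and $10.00 per million output tokens (the catalog itself isn't shown here), the arithmetic reproduces the logged value exactly. The helper name is hypothetical:

```bash
# Per-request cost in USD; prices are USD per 1M tokens.
request_cost() {
  awk -v in_tok="$1" -v out_tok="$2" -v in_price="$3" -v out_price="$4" \
    'BEGIN { printf "%.7f\n", (in_tok * in_price + out_tok * out_price) / 1e6 }'
}

# Token counts from the captured request; prices are the assumed gpt-4o list price.
request_cost 21 683 2.50 10.00   # prints 0.0068825
```

Against the token budget, the same request counts as 704 tokens (21 + 683).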
Battle-tested on real incidents. Full detail lives in CLAUDE.md.
Vault runs in dev mode with in-memory storage, so every restart erases it. The synced K8s Secrets survive, and the script re-seeds Vault from them. Never regenerate Langfuse creds while the databases hold data keyed to the old ones.
# re-seed the in-memory Vault from the surviving K8s Secrets
./scripts/configure-vault.sh
# bump a force-sync annotation so ESO reconciles the store and each ExternalSecret right away
kubectl annotate clustersecretstore vault force-sync=$(date +%s) --overwrite
kubectl annotate externalsecret -n agentgateway-system \
  openai-secret xai-secret langfuse-otel-auth force-sync=$(date +%s) --overwrite
kubectl annotate externalsecret -n langfuse langfuse-secrets force-sync=$(date +%s) --overwrite
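After the re-seed and force-sync, it is worth confirming that every ExternalSecret actually reconciled. A sketch that lists any ExternalSecret whose Ready condition is not True, reading ESO's standard status.conditions; the function name is hypothetical:

```bash
# List ExternalSecrets (namespace/name) whose Ready condition is not "True".
unsynced_externalsecrets() {
  kubectl get externalsecrets.external-secrets.io -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' |
    awk '$2 != "True" { print $1 }'   # empty output means everything synced
}
```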
Three causes of apps stuck OutOfSync seen here: two apps owning the same resources (fixed with directory.exclude), a lone zero-value Helm field that ArgoCD normalizes away, and client-side diff false positives under ServerSideApply (fixed with the annotation argocd.argoproj.io/compare-options: ServerSideDiff=true). Diagnose with the CLI in core mode:
# kubeconfig namespace must be argocd; plain --core can lie, so use --server-side-generate
argocd app diff argocd/<app> --core --server-side-generate
# force a hard refresh of the app
kubectl -n argocd annotate app <name> argocd.argoproj.io/refresh=hard --overwrite
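To find which app to feed into the diff, a sketch that lists every Application whose sync status is not Synced, alongside its health (status.sync.status and status.health.status are standard Application fields; the function name is hypothetical):

```bash
# Print "name<TAB>sync<TAB>health" for every Application that is not Synced.
out_of_sync_apps() {
  kubectl -n argocd get applications.argoproj.io \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.sync.status}{"\t"}{.status.health.status}{"\n"}{end}' |
    awk -F'\t' '$2 != "Synced"'
}
```

Each name it prints is a candidate for argocd app diff argocd/<app> --core --server-side-generate.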
Almost certainly ENABLE_MOCK_UI=true on the shared frontend — it renders sample data for both products and hides the real thing. Keep it false; the data is safe in Postgres (kagent) and ClickHouse (AgentGateway). Hard-refresh the browser after reverting.
ClickHouse's resourcesPreset: small (768Mi memory limit) is not enough. Set explicit ClickHouse resources (request 1Gi / limit 3Gi) in langfuse.yaml, then delete the stuck langfuse-web pod to skip its backoff.
The Tello leaves SDK mode (stops streaming state) ~15s after its last command, so a server that sends the SDK's "command" message only once freezes on the last reading. Fixed with a 5s keepalive that re-sends "command": telemetry stays live and the drone auto-reconnects when powered back on (no pod restart). get_state reports connected/stale/state_age_s so a dead stream never masquerades as live; get_battery actively queries for a guaranteed-current reading. If it doesn't reconnect, the drone likely came back on a different IP; set a DHCP reservation pinning its MAC to 172.16.10.168. Flaky WiFi: idempotent commands (keepalive, battery?, land, emergency) retry ~2×; movement never retries (a timed-out move may have already run, so retrying would double-move).
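The retry rule is a classification of commands into idempotent and non-idempotent, and the freshness report is a threshold on state age. A minimal sketch of both, with hypothetical function names; the staleness cutoff (STALE_AFTER_S, default 10) is an assumption, since the server's real value isn't stated here:

```bash
# Retry budget per Tello SDK command: idempotent commands get ~2 retries;
# anything else (movement such as "forward 50") gets none, because a
# timed-out move may already have executed and a retry would double-move.
max_retries() {
  case "$1" in
    command | 'battery?' | land | emergency) echo 2 ;;
    *) echo 0 ;;
  esac
}

# Label telemetry by state age in seconds (fractions allowed).
# STALE_AFTER_S is an assumed threshold, not the drone MCP server's value.
freshness() {
  awk -v age="$1" -v limit="${STALE_AFTER_S:-10}" \
    'BEGIN { s = (age + 0 <= limit + 0) ? "connected" : "stale"; print s }'
}
```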
kagent OTEL tracing is OFF by default (otel.tracing.enabled=false) — AGE tracing is a separate gateway-policy pipeline, which is why one works and the other doesn't. Enable otel.tracing in kagent-enterprise.yaml pointed at kagent-otel-collector, refresh the root app-of-apps, then kubectl -n kagent rollout restart deploy/kagent-controller (it re-renders agent pods with the OTEL env). The config lands in the kagent-controller ConfigMap.
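The restart-and-verify step can be wrapped so it fails loudly if the controller doesn't come back. A sketch using standard kubectl rollout commands; the function name and the 180s timeout are assumptions, and the final grep only checks that the kagent-controller ConfigMap mentions OTEL at all, since its exact keys aren't listed here:

```bash
# Run after refreshing the root app-of-apps: restart the controller, wait for
# the rollout, then loosely confirm the rendered ConfigMap carries OTEL settings.
verify_kagent_tracing() {
  kubectl -n kagent rollout restart deploy/kagent-controller &&
    kubectl -n kagent rollout status deploy/kagent-controller --timeout=180s &&
    kubectl -n kagent get configmap kagent-controller -o yaml | grep -i 'otel'
}
```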
langfuse-clickhouse PVC is 100% full: the worker drops every write (Cannot reserve … not enough space). Expand the PVC. If it stalls at Resizing with cannot expand volume before replica scheduling success, the Longhorn volume is degraded because it still asks for 3 replicas on this single-node cluster; patch volume.longhorn.io <id> to numberOfReplicas: 1 so it goes healthy, then set spec.size. Don't bump the chart's persistence.size (the StatefulSet's volume claim template is immutable, so the change wedges the ArgoCD sync).
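The replica fix and the expansion can be scripted together. A sketch, assuming Longhorn runs in its default longhorn-system namespace and that the Longhorn volume is named after the PV bound to the PVC (the usual Longhorn CSI behaviour). The function name, its arguments, and the langfuse default namespace are assumptions; the PVC name and target size must come from the cluster. The note's final "set spec.size" step isn't spelled out, so this sketch only re-applies the PVC size request after the replica fix:

```bash
# Hypothetical helper: drop a degraded Longhorn volume to 1 replica so it can go
# healthy on a single node, then (re)request the larger size on the PVC.
# Usage: fix_full_longhorn_pvc <pvc-name> <new-size, e.g. 20Gi> [namespace]
fix_full_longhorn_pvc() {
  local pvc=$1 size=$2 ns=${3:-langfuse} vol
  # The Longhorn volume name is the PV name bound to the PVC.
  vol=$(kubectl -n "$ns" get pvc "$pvc" -o jsonpath='{.spec.volumeName}') || return
  [ -n "$vol" ] || { echo "no bound volume for $pvc" >&2; return 1; }

  kubectl -n longhorn-system patch volumes.longhorn.io "$vol" --type merge \
    -p '{"spec":{"numberOfReplicas":1}}' || return

  # Expand through the PVC itself, never through the chart's persistence.size.
  kubectl -n "$ns" patch pvc "$pvc" --type merge \
    -p "{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"$size\"}}}}"
}
```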