# Observability
This page focuses solely on collecting and visualizing metrics for Semantic Router using Prometheus and Grafana; the deployment method itself (Docker Compose vs Kubernetes) is covered in docker-quickstart.md.
## 1. Metrics & Endpoints Summary
| Component | Endpoint | Notes |
|---|---|---|
| Router metrics | :9190/metrics | Prometheus format (flag: --metrics-port) |
| Router health (future probe) | :8080/health | HTTP readiness/liveness candidate |
| Envoy metrics (optional) | :19000/stats/prometheus | If you enable Envoy |
Dashboard JSON: deploy/llm-router-dashboard.json.
Primary source file exposing metrics: src/semantic-router/cmd/main.go (uses promhttp).
## 2. Docker Compose Observability
The Compose stack bundles prometheus, grafana, semantic-router, (optional) envoy, and mock-vllm.
Key files:
- `config/prometheus.yaml`
- `config/grafana/datasource.yaml`
- `config/grafana/dashboards.yaml`
- `deploy/llm-router-dashboard.json`
Start (with testing profile example):
```bash
CONFIG_FILE=/app/config/config.testing.yaml docker compose --profile testing up --build
```
Access:
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (admin/admin)
Expected Prometheus targets (a minimal scrape config sketch follows below):

- `semantic-router:9190`
- `envoy-proxy:19000` (optional)
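For reference, the Compose-side Prometheus configuration usually amounts to a couple of static scrape jobs. The snippet below is a minimal sketch of what `config/prometheus.yaml` can look like; the job names and the 15s interval are illustrative assumptions, while the targets and the Envoy metrics path match the table above. Check the file in the repo for the authoritative version.

```yaml
# Sketch of a Compose-side Prometheus scrape config (illustrative).
global:
  scrape_interval: 15s                       # assumed interval, adjust to taste

scrape_configs:
  - job_name: semantic-router
    static_configs:
      - targets: ['semantic-router:9190']    # router metrics port
  - job_name: envoy
    metrics_path: /stats/prometheus          # Envoy's Prometheus stats endpoint
    static_configs:
      - targets: ['envoy-proxy:19000']       # optional, only if Envoy is enabled
```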
## 3. Kubernetes Observability
This guide adds a production-ready Prometheus + Grafana stack to the existing Semantic Router Kubernetes deployment. It includes manifests for collectors, dashboards, data sources, RBAC, and ingress so you can monitor routing performance in any cluster.
> Namespace: All manifests default to the `vllm-semantic-router-system` namespace to match the core deployment. Override it with Kustomize if you use a different namespace.
### What Gets Installed
| Component | Purpose | Key Files |
|---|---|---|
| Prometheus | Scrapes Semantic Router metrics and stores them with persistent retention | prometheus/ (rbac.yaml, configmap.yaml, deployment.yaml, pvc.yaml, service.yaml) |
| Grafana | Visualizes metrics using the bundled LLM Router dashboard and a pre-configured Prometheus datasource | grafana/ (secret.yaml, configmap-*.yaml, deployment.yaml, pvc.yaml, service.yaml) |
| Ingress (optional) | Exposes the UIs outside the cluster | ingress.yaml |
| Dashboard provisioning | Automatically loads deploy/llm-router-dashboard.json into Grafana | grafana/configmap-dashboard.yaml |
Prometheus is configured to discover the semantic-router-metrics service (port 9190) automatically. Grafana provisions the same LLM Router dashboard that ships with the Docker Compose stack.
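Discovery hinges on the metrics Service carrying the expected name and port name. The sketch below shows what such a Service could look like; the selector label is an assumption and must match your core deployment, while the Service name (`semantic-router-metrics`) and port name (`metrics`) are the values the scrape config keeps.

```yaml
# Hypothetical semantic-router-metrics Service; the selector label is assumed
# and must match the labels on your Semantic Router pods.
apiVersion: v1
kind: Service
metadata:
  name: semantic-router-metrics
  namespace: vllm-semantic-router-system
spec:
  selector:
    app: semantic-router        # assumption: adjust to your pod labels
  ports:
    - name: metrics             # must be "metrics" for the relabel rule to keep it
      port: 9190
      targetPort: 9190
```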
### 1. Prerequisites
- Deployed Semantic Router workload via `deploy/kubernetes/`
- A Kubernetes cluster (managed, on-prem, or kind)
- `kubectl` v1.23+
- Optional: an ingress controller (NGINX, ALB, etc.) if you want external access
### 2. Directory Layout
```
deploy/kubernetes/observability/
├── README.md
├── kustomization.yaml              # (created in the next step)
├── ingress.yaml                    # optional HTTPS ingress examples
├── prometheus/
│   ├── configmap.yaml              # Scrape config (Kubernetes SD)
│   ├── deployment.yaml
│   ├── pvc.yaml
│   ├── rbac.yaml                   # SA + ClusterRole + binding
│   └── service.yaml
└── grafana/
    ├── configmap-dashboard.yaml    # Bundled LLM router dashboard
    ├── configmap-provisioning.yaml # Datasource + provider config
    ├── deployment.yaml
    ├── pvc.yaml
    ├── secret.yaml                 # Admin credentials (override in prod)
    └── service.yaml
```
### 3. Prometheus Configuration Highlights
- Uses `kubernetes_sd_configs` to enumerate endpoints in `vllm-semantic-router-system`
- Keeps 15 days of metrics by default (`--storage.tsdb.retention.time=15d`)
- Stores metrics in a `PersistentVolumeClaim` named `prometheus-data`
- RBAC rules grant read-only access to Services, Endpoints, Pods, Nodes, and EndpointSlices
#### Scrape configuration snippet
```yaml
scrape_configs:
  - job_name: semantic-router
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - vllm-semantic-router-system
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: semantic-router-metrics
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: metrics
        action: keep
```
Modify the namespace or service name if you changed them in your primary deployment.
### 4. Grafana Configuration Highlights
- Stateful deployment backed by the `grafana-storage` PVC
- Datasource provisioned automatically, pointing to `http://prometheus:9090` (see the provisioning sketch below)
- Dashboard provider watches `/var/lib/grafana-dashboards`
- Bundled `llm-router-dashboard.json` is identical to `deploy/llm-router-dashboard.json`
- Admin credentials pulled from the `grafana-admin` secret (default `admin`/`admin`; change this!)
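As an illustration of the datasource half of the provisioning ConfigMap, a minimal sketch is shown below. The file and key names are assumptions; the URL matches the in-cluster Prometheus Service.

```yaml
# Sketch of a Grafana datasource provisioning file (e.g. mounted from
# grafana/configmap-provisioning.yaml); names are illustrative.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090   # in-cluster Prometheus service
    isDefault: true
```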
#### Updating credentials
```bash
kubectl create secret generic grafana-admin \
  --namespace vllm-semantic-router-system \
  --from-literal=admin-user=monitor \
  --from-literal=admin-password='pick-a-strong-password' \
  --dry-run=client -o yaml | kubectl apply -f -
```
Remove or overwrite the committed secret.yaml when you adopt a different secret management approach.
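For context, a Grafana Deployment typically consumes that secret through the standard `GF_SECURITY_ADMIN_*` environment variables. The fragment below is a hedged sketch; the container name and image tag are illustrative, while the secret name and keys match the command above.

```yaml
# Fragment of a Grafana container spec reading credentials from the
# grafana-admin secret; container/image details are illustrative.
containers:
  - name: grafana
    image: grafana/grafana:latest   # pin a specific tag in production
    env:
      - name: GF_SECURITY_ADMIN_USER
        valueFrom:
          secretKeyRef:
            name: grafana-admin
            key: admin-user
      - name: GF_SECURITY_ADMIN_PASSWORD
        valueFrom:
          secretKeyRef:
            name: grafana-admin
            key: admin-password
```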
### 5. Deployment Steps

#### 5.1. Create the Kustomization
Create deploy/kubernetes/observability/kustomization.yaml (see below) to assemble all manifests. This guide assumes you keep Prometheus & Grafana in the same namespace as the router.
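A kustomization that simply lists the manifests from the directory layout above could look like the sketch below; adjust the namespace and drop `ingress.yaml` if you want to keep the stack private.

```yaml
# deploy/kubernetes/observability/kustomization.yaml (sketch based on the
# directory layout above; adapt resources/namespace to your setup).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: vllm-semantic-router-system
resources:
  - prometheus/rbac.yaml
  - prometheus/configmap.yaml
  - prometheus/pvc.yaml
  - prometheus/deployment.yaml
  - prometheus/service.yaml
  - grafana/secret.yaml
  - grafana/configmap-provisioning.yaml
  - grafana/configmap-dashboard.yaml
  - grafana/pvc.yaml
  - grafana/deployment.yaml
  - grafana/service.yaml
  - ingress.yaml   # optional: remove for a private stack
```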
#### 5.2. Apply manifests

```bash
kubectl apply -k deploy/kubernetes/observability/
```
Verify pods:
```bash
kubectl get pods -n vllm-semantic-router-system
```
You should see prometheus-... and grafana-... pods in Running state.
#### 5.3. Integration with the core deployment
- Deploy or update Semantic Router (`kubectl apply -k deploy/kubernetes/`).
- Deploy the observability stack (`kubectl apply -k deploy/kubernetes/observability/`).
- Confirm the metrics service (`semantic-router-metrics`) has endpoints:

  ```bash
  kubectl get endpoints semantic-router-metrics -n vllm-semantic-router-system
  ```

- The Prometheus target should transition to UP within ~15 seconds.
#### 5.4. Accessing the UIs
> Optional Ingress: If you prefer to keep the stack private, delete `ingress.yaml` from `kustomization.yaml` before applying.
- Port-forward (quick check):

  ```bash
  kubectl port-forward svc/prometheus 9090:9090 -n vllm-semantic-router-system
  kubectl port-forward svc/grafana 3000:3000 -n vllm-semantic-router-system
  ```

  Prometheus → http://localhost:9090, Grafana → http://localhost:3000

- Ingress (production): Customize `ingress.yaml` with real domains, TLS secrets, and your ingress class before applying. Replace `*.example.com` and configure HTTPS certificates via cert-manager or your provider (a hedged example follows below).
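As a starting point, an NGINX-style Ingress for the two UIs might look like the sketch below. The hostnames, ingress class, and TLS secret name are placeholders you must replace; the backend Service names and ports match the port-forward commands above.

```yaml
# Illustrative Ingress for Grafana and Prometheus; hosts, class, and TLS
# secret are placeholders (replace *.example.com with real domains).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: observability
  namespace: vllm-semantic-router-system
spec:
  ingressClassName: nginx            # assumption: your controller's class
  tls:
    - hosts: [grafana.example.com, prometheus.example.com]
      secretName: observability-tls  # e.g. issued by cert-manager
  rules:
    - host: grafana.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: { name: grafana, port: { number: 3000 } }
    - host: prometheus.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: { name: prometheus, port: { number: 9090 } }
```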
### 6. Verifying Metrics Collection
- Open Prometheus (port-forward or ingress) → Status → Targets → ensure the `semantic-router` job is green.
- Query `rate(llm_model_completion_tokens_total[5m])`; it should return data after traffic.
- Open Grafana, log in with the admin credentials, and confirm the LLM Router Metrics dashboard exists under the Semantic Router folder.
- Generate traffic to Semantic Router (classification or routing requests). Key panels should start populating:
- Prompt Category counts
- Token usage rate per model
- Routing modifications between models
- Latency histograms (TTFT, completion p95)
### 7. Dashboard Customization
- Duplicate the provisioned dashboard inside Grafana to make changes while keeping the original as a template.
- Update Grafana provisioning (`grafana/configmap-provisioning.yaml`) to point to alternate folders or add new providers (a sample provider config is sketched below).
- Add additional dashboards by extending `grafana/configmap-dashboard.yaml` or mounting a different ConfigMap.
- Incorporate Kubernetes cluster metrics (CPU/memory) by adding another datasource or deploying kube-state-metrics + node exporters.
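For reference, the dashboard-provider half of the provisioning config generally looks like the sketch below. The provider name is an assumption; the folder name and watched path match what the guide describes.

```yaml
# Sketch of a Grafana dashboard provider (part of
# grafana/configmap-provisioning.yaml); names are illustrative.
apiVersion: 1
providers:
  - name: semantic-router                  # assumed provider name
    folder: Semantic Router                # folder shown in the Grafana UI
    type: file
    options:
      path: /var/lib/grafana-dashboards    # directory watched by Grafana
```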
### 8. Best Practices

#### Resource Sizing
- Prometheus: increase CPU/memory with higher scrape cardinality or retention > 15 days.
- Grafana: start with `500m` CPU / `1Gi` RAM; scale replicas horizontally when concurrent viewers exceed a few dozen (a sample resources block follows below).
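Expressed as Kubernetes requests and limits, that Grafana starting point might look like the fragment below; the requests mirror the numbers above, while the limits are assumptions to tune for your cluster.

```yaml
# Illustrative resources block for the Grafana container; limits are assumed.
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: "1"
    memory: 2Gi
```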
#### Storage

- Use SSD-backed storage classes for Prometheus when the retention window is large.
- Increase `prometheus/pvc.yaml` (default 20Gi) and `grafana/pvc.yaml` (default 10Gi) to match retention requirements (see the PVC sketch below).
- Enable volume snapshots or backups for dashboards and alert history.
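A PVC resize usually comes down to the `storage` request. Below is a hedged sketch of `prometheus/pvc.yaml` bumped above the 20Gi default and pinned to an SSD-backed class; the class name and the 50Gi figure are assumptions for your cluster.

```yaml
# Sketch of prometheus/pvc.yaml with a larger size; storageClassName is
# illustrative and cluster-specific.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: vllm-semantic-router-system
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: fast-ssd    # assumption: pick your SSD-backed class
  resources:
    requests:
      storage: 50Gi             # sized for longer retention
```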
#### Security

- Replace the demo `grafana-admin` secret with credentials stored in your preferred secret manager.
- Restrict ingress access with network policies, OAuth proxies, or SSO integrations.
- Enable Grafana role-based access control and API keys for automation.
- Scope Prometheus RBAC to only the namespaces you need. If metrics run in multiple namespaces, list them in the scrape config (see the snippet below).
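If the router's metrics live in more than one namespace, the only change to the scrape job shown earlier is the `namespaces.names` list; the second namespace below is an illustrative placeholder, and the RBAC rules must cover it as well.

```yaml
# Extend the existing scrape job to additional namespaces (names illustrative).
kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
        - vllm-semantic-router-system
        - another-router-namespace   # assumption: second namespace to scrape
```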
#### Maintenance
- Monitor Prometheus disk usage; prune retention or scale PVC before it fills up.
- Back up Grafana dashboards or store them in Git (already done through this ConfigMap).
- Roll upgrades separately: update Prometheus and Grafana images via `kustomization.yaml` patches.
- Consider adopting the Prometheus Operator (`ServiceMonitor` + `PodMonitor`) if you already run kube-prometheus-stack. A sample `ServiceMonitor` is in `website/docs/tutorials/observability/observability.md`; a hedged sketch also follows below.
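If you go the Operator route, a `ServiceMonitor` roughly equivalent to the scrape config above could look like this sketch; the label selector is an assumption and must match the labels on the `semantic-router-metrics` Service.

```yaml
# Hedged ServiceMonitor sketch for kube-prometheus-stack users; the
# matchLabels selector is an assumption.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: semantic-router
  namespace: vllm-semantic-router-system
spec:
  selector:
    matchLabels:
      app: semantic-router      # assumption: label on semantic-router-metrics
  endpoints:
    - port: metrics             # named port on the Service
      interval: 15s
```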
## 4. Key Metrics (Sample)
| Metric | Type | Description |
|---|---|---|
| `llm_category_classifications_count` | counter | Number of category classification operations |
| `llm_model_completion_tokens_total` | counter | Tokens emitted per model |
| `llm_model_routing_modifications_total` | counter | Model switch / routing adjustments |
| `llm_model_completion_latency_seconds` | histogram | Completion latency distribution |
| `process_cpu_seconds_total` / `process_resident_memory_bytes` | standard | Runtime resource usage |
Use typical PromQL patterns:
```
rate(llm_model_completion_tokens_total[5m])
histogram_quantile(0.95, sum by (le) (rate(llm_model_completion_latency_seconds_bucket[5m])))
```
## 5. Troubleshooting
| Symptom | Likely Cause | Check | Fix |
|---|---|---|---|
| Target DOWN (Docker) | Service name mismatch | Prometheus /targets | Ensure the semantic-router container is running |
| Target DOWN (K8s) | Label/selector mismatch | kubectl get ep semantic-router-metrics | Align labels or ServiceMonitor selector |
| No new tokens metrics | No traffic | Generate chat/completions via Envoy | Send test requests |
| Dashboard empty | Datasource URL wrong | Grafana datasource settings | Point to http://prometheus:9090 (Docker) or cluster Prometheus |
| Large 5xx spikes | Backend model unreachable | Router logs | Verify vLLM endpoints configuration |