Posts tagged "vllm"

May 19, 2026 ai

My AI Lab on an RTX 3090: What Runs, What Broke, and What Surprised Me

Full architecture breakdown of a personal GPU inference lab on an RTX 3090: vLLM, SGLang, and llama.cpp benchmarks, a Go model gateway with OpenTelemetry, Qwen Scope SAE activations in Grafana, and the FlashInfer investigation that inverted my benchmark results.

ai inference vllm sglang