My AI Lab on an RTX 3090: What Runs, What Broke, and What Surprised Me
A full architecture breakdown of a personal GPU inference lab on an RTX 3090: vLLM, SGLang, and llama.cpp benchmarks, a Go model gateway with OpenTelemetry, Qwen Scope SAE activations in Grafana, and the FlashInfer investigation that inverted my benchmark results.