Reports

Real-LLM evaluation, capability, and stress campaigns. Every claim is backed by runs against real models (a remote gateway and on-device Ollama) and graded with pass^k and grounded process checks.
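As a rough illustration of what a pass^k grade measures, here is a minimal sketch. It assumes the common definition popularized by τ-bench: pass^k is the probability that *all* k independent trials of the same task succeed. From n trials with c successes, it is estimated per task as C(c, k) / C(n, k) and then averaged over tasks. The function names `pass_hat_k` and `mean_pass_hat_k` are hypothetical and are not taken from these reports' tooling.

```python
from math import comb


def pass_hat_k(num_trials: int, num_successes: int, k: int) -> float:
    """Estimate P(all k i.i.d. trials succeed) for one task.

    Assumes the tau-bench-style estimator C(c, k) / C(n, k); this is an
    illustrative sketch, not the reports' actual grading code.
    """
    if not 0 <= num_successes <= num_trials:
        raise ValueError("num_successes must be in [0, num_trials]")
    if not 1 <= k <= num_trials:
        raise ValueError("k must be in [1, num_trials]")
    # Fraction of size-k subsets of the n trials that contain only successes.
    # math.comb returns 0 when k > num_successes, so such tasks score 0.
    return comb(num_successes, k) / comb(num_trials, k)


def mean_pass_hat_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-task estimate over (num_trials, num_successes) pairs."""
    return sum(pass_hat_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical suite: three tasks, 8 trials each.
    suite = [(8, 8), (8, 6), (8, 2)]
    for k in (1, 2, 4, 8):
        print(f"pass^{k} = {mean_pass_hat_k(suite, k):.3f}")
```

With k = 1 this reduces to the mean success rate. As k grows, tasks that pass only some of the time are penalized sharply. This is why pass^k is a stricter reliability measure than a single-run pass rate.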