Projects
Two projects, in different stages. Plus the small things that keep the lab honest.
Live
KVWarden
Tenant fairness on shared inference.
Quiet-tenant TTFT at 1.14× solo, 26× better than FIFO
KVWarden is a scheduler and cache-pressure experiment for shared LLM inference. The first public result is narrow on purpose: a quiet tenant stays near solo latency while a flooder pushes the system. The harness is public; the plots do not hide the quiet tenant in an aggregate.
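To show why a per-tenant view matters, here is a minimal sketch of how an aggregate can hide a starved quiet tenant. It is not the KVWarden harness: the function names, the sample layout, and every number below are hypothetical, invented for illustration only.

```python
from statistics import median

def ttft_by_tenant(samples):
    """Group (tenant, ttft_ms) samples into one list of TTFTs per tenant."""
    grouped = {}
    for tenant, ttft_ms in samples:
        grouped.setdefault(tenant, []).append(ttft_ms)
    return grouped

def ratio_to_solo(samples, solo_median_ms):
    """Median TTFT under contention divided by each tenant's solo median."""
    grouped = ttft_by_tenant(samples)
    return {t: median(v) / solo_median_ms[t] for t, v in grouped.items()}

# Invented data, NOT measurements: a flooder issues most of the requests
# and is served quickly, while the quiet tenant's few requests wait.
samples = [("flooder", 100)] * 8 + [("quiet", 2000), ("quiet", 2200)]
solo = {"flooder": 100, "quiet": 100}  # assumed solo medians, in ms

# The aggregate median is dominated by the flooder's many fast requests.
aggregate = median(ttft for _, ttft in samples)

# The per-tenant ratio exposes the quiet tenant's slowdown.
ratios = ratio_to_solo(samples, solo)

print(f"aggregate median TTFT: {aggregate} ms")
for tenant, ratio in sorted(ratios.items()):
    print(f"{tenant}: {ratio:.2f}x solo")
```

In this made-up run the aggregate median looks the same as the solo baseline, while the quiet tenant's own ratio is far above 1×. Reporting the ratio per tenant is what keeps that gap visible.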
In research
Weft
Tenant-fair LLM inference on Apple Silicon.
Weft is an early thread on local inference scheduling, with no public artifact yet. The aim is to keep tenants honest under load and make measurements easy to reproduce, on a class of hardware that is increasingly shared among agents on the same machine.
Probe window: 2026-05-19 → 2026-06-16.