Projects

Two projects at different stages, plus the small things that keep the lab honest.

Live

KVWarden

Tenant fairness on shared inference.

Quiet-tenant TTFT (time to first token) at 1.14× solo, 26× better than FIFO

KVWarden is a scheduler and cache-pressure experiment for shared LLM inference. The first public result is narrow on purpose: a quiet tenant stays near solo latency while a flooder pushes the system. The harness is public; the plots do not hide the quiet tenant in an aggregate.

In research

Weft

Tenant-fair LLM inference on Apple Silicon.

Weft is an early thread on local inference scheduling. No public artifact yet. The aim is to keep tenants honest under load and make measurements easy to reproduce, on a class of hardware that agents on the same machine increasingly share.

Probe window: 2026-05-19 → 2026-06-16.

Tools and experiments

Smaller things, mostly the scaffolding behind the public work.

RSS for new entries: /rss.xml