Live

KVWarden

Tenant fairness on shared inference.

Quiet-tenant TTFT at 1.14x of solo, 26x better than FIFO

KVWarden is a scheduler and cache-pressure experiment for shared inference. The first public result is deliberately narrow: a quiet tenant's time to first token (TTFT) stays near its solo value while a flooder pushes the system.

The useful part is not the logo or the promise. It is the harness, the plots, and the refusal to hide the quiet tenant in an aggregate.
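To make the point about aggregates concrete, here is a minimal sketch of a per-tenant report next to a pooled one. It is not KVWarden's harness: the function names (`ttft_ratio`, `per_tenant_report`, `aggregate_ratio`), the tenant labels, and every sample value are hypothetical and invented purely for illustration. The ratio is median TTFT under contention divided by median TTFT when the tenant runs alone.

```python
from statistics import median


def ttft_ratio(shared_ms, solo_ms):
    """Median TTFT under contention divided by median TTFT when running alone."""
    return median(shared_ms) / median(solo_ms)


def per_tenant_report(shared, solo):
    """One ratio per tenant, so no tenant is folded into another's numbers."""
    return {tenant: ttft_ratio(shared[tenant], solo[tenant]) for tenant in shared}


def aggregate_ratio(shared, solo):
    """Pool every request across tenants before taking the median."""
    pooled_shared = [ms for samples in shared.values() for ms in samples]
    pooled_solo = [ms for samples in solo.values() for ms in samples]
    return median(pooled_shared) / median(pooled_solo)


# Hypothetical, made-up samples in milliseconds (not measured results).
# The flooder issues 97 requests, the quiet tenant only 3.
solo = {"quiet": [100.0] * 3, "flooder": [100.0] * 3}
shared = {"quiet": [1000.0] * 3, "flooder": [120.0] * 97}

report = per_tenant_report(shared, solo)
pooled = aggregate_ratio(shared, solo)

print(report)  # the quiet tenant's slowdown is visible per tenant
print(pooled)  # the pooled figure is dominated by the flooder's request count
```

In this toy data the pooled ratio tracks the flooder because it contributes almost all of the samples, while the quiet tenant's tenfold slowdown only shows up in the per-tenant view. That is the failure mode a per-tenant plot is meant to avoid.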

The standalone site remains the source of truth for the live waitlist and project detail until the full project page migrates here.