independent inference research

coconutlabs

Schedulers, systems notes, and reproducible measurements for shared inference.

KVWarden Gate 2: 1.14× of solo TTFT under load. 26× better than FIFO.

Coconut Labs works on the shared layer of inference: scheduling, fairness, cache pressure, and the measurements that keep claims honest.

The lab is small by design. Fewer abstractions between the benchmark, the note, and the code.

The quiet tenant should still have a name.

Projects

Live

KVWarden

1.14× of solo TTFT, 26× better than FIFO

Tenant fairness on shared inference. A quiet tenant stays visible when a flooder arrives.
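To make the quiet-tenant idea concrete, here is a minimal sketch that contrasts plain FIFO with a generic per-tenant round-robin. This is not KVWarden's scheduler; the function names, the tenant labels, and the 50-request flood are all assumptions chosen for illustration. It only shows why arrival-order service buries a quiet tenant behind a flooder, while taking turns per tenant does not.

```python
from collections import OrderedDict, deque


def fifo_order(requests):
    """Serve requests strictly in arrival order."""
    return list(requests)


def round_robin_order(requests):
    """Serve one request per tenant per turn (hypothetical fair policy)."""
    # One queue per tenant, kept in order of each tenant's first arrival.
    queues = OrderedDict()
    for tenant, req_id in requests:
        queues.setdefault(tenant, deque()).append((tenant, req_id))

    order = []
    while queues:
        # Iterate over a snapshot so drained tenants can be removed safely.
        for tenant in list(queues):
            order.append(queues[tenant].popleft())
            if not queues[tenant]:
                del queues[tenant]
    return order


# Assumed workload: a flooder submits 50 requests just before
# a quiet tenant submits a single one.
arrivals = [("flooder", i) for i in range(50)] + [("quiet", 0)]

fifo_pos = fifo_order(arrivals).index(("quiet", 0))
fair_pos = round_robin_order(arrivals).index(("quiet", 0))
print(fifo_pos, fair_pos)  # prints "50 1"
```

Under FIFO the quiet request waits behind all 50 flooder requests; under round-robin it is served second. A real scheduler would also have to account for token counts and cache pressure, which this sketch ignores.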

In research

mlxd

Tenant-fair Apple Silicon inference

A scheduler and admission layer on top of mlx_lm.server. It restores tenant identity first, then works toward fairness on unified-memory hardware.

Recent research

Index

2026-04-19 · research note

Tenant fairness on shared inference

A KVWarden Gate 2 note: 53.9 ms quiet TTFT, 61.5 ms under flooder pressure, and 26× better tail behavior than FIFO.
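The headline 1.14× figure can be checked from the two TTFT numbers quoted in this note. The sketch below assumes the 53.9 ms quiet figure is the solo baseline; the variable names are illustrative.

```python
# TTFT figures quoted in the Gate 2 note, in milliseconds.
quiet_ttft_ms = 53.9    # quiet tenant, assumed to be the solo baseline
loaded_ttft_ms = 61.5   # same tenant under flooder pressure

# Slowdown relative to solo; 1.0 would mean no degradation at all.
slowdown = loaded_ttft_ms / quiet_ttft_ms
print(f"{slowdown:.2f}x of solo")  # prints "1.14x of solo"
```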

the lab

Two engineers, close to the work.

Coconut Labs is intentionally small. The work happens in the open at github.com/coconut-labs and shows up here when there is a result worth standing behind.

How we work

Building something at this layer? Write us.

info@coconutlabs.org

latest note · 2026-04-19 (13 weeks ago)

14 commits this week

kvwarden gate 2 · 1.14× solo · 26× better than fifo

4 repos tracked

6 rfc open

updated 56d ago