independent inference research
coconutlabs
Schedulers, systems notes, and reproducible measurements for shared inference.
KVWarden Gate 2: TTFT under load at 1.14× the solo baseline. 26× better than FIFO.
Coconut Labs works on the shared layer of inference: scheduling, fairness, cache pressure, and the measurements that keep claims honest.
The lab is small by design. Fewer abstractions between the benchmark, the note, and the code.
The quiet tenant should still have a name.
Projects
KVWarden
1.14× of solo, 26× better than FIFO
Tenant fairness on shared inference. A quiet tenant stays visible when a flooder arrives.
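A toy sketch of why arrival-order queues hide the quiet tenant. This is not KVWarden's scheduler: it swaps in plain per-tenant round-robin as a stand-in, and the tenant names, request counts, and function names are all made up for illustration. It only compares where one quiet request lands in the service order under FIFO and under per-tenant turns.

```python
from collections import OrderedDict, deque


def fifo_order(requests):
    """Serve requests strictly in arrival order."""
    return list(requests)


def round_robin_order(requests):
    """Serve one request per tenant per turn (illustrative stand-in policy).

    Tenants take turns in the order they first arrived.
    """
    queues = OrderedDict()
    for tenant, req_id in requests:
        queues.setdefault(tenant, deque()).append((tenant, req_id))

    order = []
    while queues:
        # Iterate over a snapshot so empty queues can be dropped mid-pass.
        for tenant in list(queues):
            order.append(queues[tenant].popleft())
            if not queues[tenant]:
                del queues[tenant]
    return order


# Hypothetical workload: a flooder submits 50 requests, then a quiet tenant submits one.
requests = [("flooder", i) for i in range(50)] + [("quiet", 0)]

fifo_pos = fifo_order(requests).index(("quiet", 0))
fair_pos = round_robin_order(requests).index(("quiet", 0))
print(f"quiet tenant served at position {fifo_pos} under FIFO, {fair_pos} under round-robin")
```

Under FIFO the quiet request waits behind the entire flood. With per-tenant turns it is served right after the flooder's first request. A real scheduler also has to account for cache pressure and request cost, which this sketch ignores.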
mlxd
Tenant-fair Apple Silicon inference
A scheduler + admission layer on top of mlx_lm.server. Restoring tenant identity first, then fairness on unified-memory hardware.
the lab
Two engineers, close to the work.
Coconut Labs is intentionally small. The work happens in the open at github.com/coconut-labs and shows up here when there is a result worth standing behind.
Building something at this layer? Write us.