Notes from the runtime.
Design decisions, architecture notes, and field reports from building llmdot.
-
Managed-by-default: why CPU is the headline path, not the fallback
Most .NET inference stories start with a GPU and treat CPU as the leftover. llmdot starts the other way around — and the resulting deployment story is the actual product advantage.
-
Four execution templates for every 1–8B model we care about
How a small config-driven design collapses the modern decoder zoo into four execution templates — and why that matters for a .NET runtime that wants to stay small.
-
Why we picked GGUF as the ingestion format for .NET
GGUF is what the open-model community actually publishes. For a .NET inference runtime, picking it as the primary format eliminates a whole class of problems before any code is written.