Name: llmdot
Author: Cognisoc

What it is

llmdot is a native .NET runtime for local language model inference, built around the GGUF model format. It executes major decoder-only transformer and hybrid architectures in the 1–8B parameter range — including multimodal variants — through architecture-agnostic execution templates resolved from GGUF metadata at load time.

The default path is pure managed code with zero native runtime dependencies, focused on CPU-first execution. Optional packages provide GPU acceleration through thin backend adapters.
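
The load-time resolution described above can be illustrated with a short sketch, written in Python for brevity because the byte layout is the same from any language. It reads the GGUF header and metadata key/value section (version 2 or later, little-endian) and looks up the standard `general.architecture` key. The `TEMPLATES` table, its template names, and `resolve_template` are hypothetical illustrations, not llmdot's actual API.

```python
import struct

GGUF_MAGIC = b"GGUF"

# GGUF metadata value type ids mapped to struct formats for fixed-size scalars.
_SCALARS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(stream, fmt):
    """Read one little-endian scalar described by a struct format string."""
    size = struct.calcsize(fmt)
    data = stream.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of GGUF data")
    return struct.unpack(fmt, data)[0]


def _read_string(stream):
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(stream, "<Q")
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of GGUF data")
    return data.decode("utf-8")


def _read_value(stream, value_type):
    if value_type in _SCALARS:
        return _read(stream, _SCALARS[value_type])
    if value_type == _STRING:
        return _read_string(stream)
    if value_type == _ARRAY:
        item_type = _read(stream, "<I")
        count = _read(stream, "<Q")
        return [_read_value(stream, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {value_type}")


def read_gguf_metadata(stream):
    """Return (version, tensor_count, metadata dict) from a binary stream."""
    if stream.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version = _read(stream, "<I")
    if version < 2:
        raise ValueError("GGUF v1 used 32-bit counts; not handled in this sketch")
    tensor_count = _read(stream, "<Q")
    kv_count = _read(stream, "<Q")
    metadata = {}
    for _ in range(kv_count):
        key = _read_string(stream)
        value_type = _read(stream, "<I")
        metadata[key] = _read_value(stream, value_type)
    return version, tensor_count, metadata


# Hypothetical registry: architecture name -> execution template name.
TEMPLATES = {"llama": "dense-decoder", "qwen2": "dense-decoder"}


def resolve_template(metadata):
    """Pick a template from the `general.architecture` metadata key."""
    arch = metadata.get("general.architecture")
    if arch not in TEMPLATES:
        raise KeyError(f"no execution template for architecture {arch!r}")
    return TEMPLATES[arch]
```

To try it, open a model with `open(path, "rb")`, pass the file object to `read_gguf_metadata`, and hand the resulting dictionary to `resolve_template`. Keeping the lookup table separate from the parser means a new model family only needs a new table entry in this sketch.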

What it is for

llmdot is for .NET developers shipping local, private, or offline AI features without fighting the inference stack — desktop, edge, server, and worker workloads where packaging simplicity, deployment predictability, and platform portability matter as much as raw throughput. If you have ever thought “I just want to load a GGUF file in my ASP.NET Core app and stream tokens”, llmdot is built for you.

What it is not

It is not the fastest inference engine on every hardware target.
It is not a replacement for vendor-optimized GPU runtimes for large-scale serving.
It does not require ONNX conversion or proprietary model packaging.
It does not target frontier-scale (70B+) models as an early milestone.
It does not target NPUs: they are typically programmed through vendor graph compilers rather than exposed as general-purpose programmable compute. See the architecture doc for the reasoning.

Status

Pre-alpha. The specification, architecture, and execution template design are stable. Implementation is in active development. Do not use in production yet.

Track progress in the roadmap.

Supported runtimes

Target frameworks are net8.0, net9.0, and net10.0 — net8.0 LTS is the compatibility floor. The codebase ships with Nullable enabled, warnings as errors, and LangVersion=13.0.
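
For a consuming project, the same build settings could be expressed in an MSBuild project file along these lines. This is a minimal sketch using standard .NET SDK properties; it is an assumption about how such a project might be configured, not a copy of llmdot's own project files.

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <!-- net8.0 is the compatibility floor; newer targets build alongside it. -->
    <TargetFrameworks>net8.0;net9.0;net10.0</TargetFrameworks>
    <Nullable>enable</Nullable>
    <TreatWarningsAsErrors>true</TreatWarningsAsErrors>
    <LangVersion>13.0</LangVersion>
  </PropertyGroup>
</Project>
```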

Optional GPU backends target Metal on Apple Silicon and Vulkan on Linux and Windows. Both are dispatched per-operation through an IComputeBackend contract; CPU remains the default fallback.
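
The per-operation dispatch with CPU fallback can be sketched as follows. Only the name IComputeBackend comes from the text above; its members are not documented here, so the `Backend` class, the `supports` and `run` methods, the `dispatch` function, and the operation names are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


class Backend:
    """Hypothetical compute backend: a name plus the operations it implements."""

    def __init__(self, name: str, kernels: Dict[str, Callable[..., Vector]]):
        self.name = name
        self.kernels = kernels

    def supports(self, op: str) -> bool:
        return op in self.kernels

    def run(self, op: str, *args) -> Vector:
        return self.kernels[op](*args)


def dispatch(op: str, backends: Sequence[Backend], cpu: Backend, *args) -> Tuple[str, Vector]:
    """Run `op` on the first optional backend that supports it, else on the CPU.

    Returns the name of the backend that ran the operation and its result.
    """
    for backend in backends:
        if backend.supports(op):
            return backend.name, backend.run(op, *args)
    return cpu.name, cpu.run(op, *args)


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def scale(a: Vector, k: float) -> Vector:
    return [x * k for x in a]


# The CPU backend implements every operation; the pretend GPU backend only "add".
cpu = Backend("cpu", {"add": add, "scale": scale})
gpu = Backend("gpu", {"add": add})
```

With these definitions, `dispatch("add", [gpu], cpu, a, b)` runs on the pretend GPU backend, while `dispatch("scale", [gpu], cpu, a, 2.0)` falls back to the CPU because the GPU backend does not list that operation. Deciding per operation rather than per model lets a partially implemented backend still accelerate the operations it does cover.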

License & contributing

MIT-licensed. Design feedback is welcome — please read the vision and architecture documents before opening an issue. The areas most useful to contribute to right now are GGUF quantization format coverage, managed kernel optimization, tokenizer correctness across BPE variants, and test fixtures for additional model families.

Explore the features, read how the engine works, see where it fits, or jump to the quickstart. llmdot is one project in the Cognisoc local-inference stack.
