LLM pruning and knowledge distillation on HPC.
Joint research with HLRS, AMD, and HPE on pruning and distilling LLMs at scale on AMD MI300A hardware inside Germany's secure national HPC infrastructure.
- HLRS
- AMD
- HPE
North Star
We research the systems around frontier LLMs: retrieval, small-model optimization, and runtime verification. Together they form the layer that makes AI grounded, auditable, and production-ready.
Each entry is a working artifact: a paper, a project, an engine, a library, or an open model. The thread runs from theory to deployable systems.
Dennis, our co-founder, laid the foundation for runtime LLM auditability during the EU-funded FFplus project.
High-performance multi-vector retrieval that treats the document, not the chunk, as the unit of search. The pipeline plumbing is much simpler, and a new quantization scheme delivers a strong efficiency-to-recall trade-off.
Structured pruning of LLMs and encoders, with distillation back into smaller, deployment-ready students. Built for cost, latency, and recoverable quality.
Specialist small models for the boring parts of production AI: compression, PII detection, relation extraction, agent routing, privacy-aware classification.
We work with universities, infrastructure partners, and enterprise labs on the hard parts of retrieval, small-model optimization, and runtime verification. If your research or product touches that surface, we want to talk.