For some firms, AI on the public internet isn't an option.
We build AI inside your perimeter.
Data sovereignty, regulatory categorisation, controlled-environment requirements. When client confidentiality or patient data can't leave the building, on-premises AI is the answer.
Speak to a consultant

Who chooses on-prem
Four sector-specific drivers. Each is a real conversation we have with clients in 2026.
Legal. Client confidentiality and legal professional privilege. Client data cannot flow to a third-party model provider, full stop. On-prem AI is the only option.
Healthcare. Patient data, NHS DSPT, MHRA medical-device categorisation. Data residency requirements rule out most cloud AI options for clinical-decision-support use cases.
Financial services. Regulatory data, trading data, customer financial records. FCA expectations on data control and operational resilience make on-prem the lower-risk choice for many workloads.
Government and defence. Classified or controlled unclassified data. Often a mandate rather than a preference.
When on-prem makes economic sense
Pulled from our internal research dossier (May 2026). Refreshed every six months.
| Workload profile | Recommended deployment | Break-even threshold | Rationale |
|---|---|---|---|
| Low-volume occasional use | API | Not relevant | API per-token cost is trivially low. Dedicated GPU infrastructure has no path to amortise at this volume. |
| Sustained moderate use (10k–100k tokens/day) | Cloud / API | Break-even ~3–6 months | Cost crosses over to favour self-hosted at sustained moderate volumes, but the API path remains the lower-risk default unless sovereignty drives otherwise. |
| High-volume sustained use (>1M tokens/day) | On-prem | Break-even within 6–12 months | GPU TCO beats per-token API pricing materially. Self-hosted infrastructure amortises and operations costs become predictable. |
| Data-sovereignty critical | On-prem regardless of volume | Economics irrelevant | Driver is non-economic. Legal, regulatory or contractual posture makes the decision before TCO enters the conversation. |
| Sandbox / prototyping | API | Not relevant | Speed and flexibility matter more than cost. Engineer iteration time dominates total spend at this stage. |
Source: Dossier D.4 — AI Engineering Market State, last updated 2026-05-13.
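The crossover logic behind the table is simple: cumulative per-token API spend versus upfront GPU capital plus monthly running costs. A minimal sketch, with purely illustrative prices rather than our dossier figures (the actual break-even point is highly sensitive to negotiated API rates, hardware choice, and utilisation):

```python
# Illustrative break-even sketch: cumulative API spend vs amortised GPU TCO.
# All prices below are hypothetical placeholders, not dossier figures.

def breakeven_months(tokens_per_day: int,
                     api_price_per_mtok: float = 10.0,    # $/1M tokens (assumed)
                     gpu_capex: float = 60_000.0,         # server purchase (assumed)
                     gpu_opex_monthly: float = 1_500.0):  # power + ops (assumed)
    """Return months until self-hosting is cheaper, or None if no crossover in 5 years."""
    api_monthly = tokens_per_day * 30 / 1_000_000 * api_price_per_mtok
    for month in range(1, 61):
        api_cum = api_monthly * month
        onprem_cum = gpu_capex + gpu_opex_monthly * month
        if onprem_cum < api_cum:
            return month
    return None  # API spend never overtakes on-prem TCO at this volume

print(breakeven_months(50_000))      # moderate volume: no crossover under these assumptions
print(breakeven_months(50_000_000))  # high sustained volume: crossover within months
```

Under these placeholder numbers, low and moderate volumes never amortise the hardware, while high sustained volume crosses over quickly, which is the shape of the table above even where the specific thresholds differ.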
What we deploy
Model selection
We select open-source models based on capability, licence terms, and operational sustainability. Our current shortlist is in the engineering dossier — happy to walk through what fits your use case.
Infrastructure
GPU sizing, networking, isolation, monitoring. Designed for your data classification, your performance needs, and your operations team's capability.
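As a first-pass sizing heuristic, serving memory scales with parameter count, numeric precision, and an overhead factor for KV cache and activations. A rough sketch, using an assumed overhead multiplier and standing in for the load testing that actually settles the question:

```python
# Rough GPU memory estimate for serving an open-weight model.
# The 1.3x overhead multiplier is an assumed rule of thumb, not a guarantee.

def serving_vram_gb(params_b: float, bytes_per_param: float = 2.0,
                    kv_overhead: float = 1.3) -> float:
    """params_b: model size in billions of parameters.
    bytes_per_param: 2.0 for FP16/BF16 weights, ~0.5 for 4-bit quantisation.
    kv_overhead: multiplier covering KV cache and activations (assumed)."""
    return params_b * bytes_per_param * kv_overhead

print(round(serving_vram_gb(70), 1))       # 70B model at FP16
print(round(serving_vram_gb(70, 0.5), 1))  # same model, 4-bit quantised
```

The gap between those two numbers is why quantisation choice, not just model choice, drives the hardware bill.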
Operational layer
Monitoring, model updates, capacity planning, evaluation cadence. The agent doesn't go stale because someone forgot about it.
Compliance-first scoping
Every engagement starts with the regulatory and contractual posture, not the model. The data classification, the legal basis, the supervisory expectations — those frame the architecture before a GPU is sized.
Senior engineering throughout
The consultant scoping the deployment is the engineer specifying the infrastructure. No handovers to junior delivery teams. Production-grade decisions made by people accountable for the production outcome.
Audit trail by design
Every prompt, every response, every model update logged from day one. Built for the audit you'll inevitably face, not retrofitted under regulator pressure.
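The shape of such a record can be sketched as an append-only entry where each log line carries the model version and a hash chained to the previous entry, so tampering is detectable. Field names and the chaining scheme here are illustrative, not a mandated format:

```python
# Minimal sketch of a tamper-evident audit record for one model interaction.
# Field names and the hash-chaining scheme are illustrative assumptions.
import hashlib, json, time

def audit_record(prompt: str, response: str, model_version: str, prev_hash: str) -> dict:
    entry = {
        "ts": time.time(),               # wall-clock timestamp of the interaction
        "model_version": model_version,  # pins which weights produced the response
        "prompt": prompt,                # full content stays inside the perimeter
        "response": response,
        "prev_hash": prev_hash,          # chains entries so deletions are detectable
    }
    # Hash the canonicalised entry so any later edit changes the chain.
    entry["entry_hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    return entry

rec = audit_record("What is our exposure?", "Summary text.",
                   "example-model-2026-05", prev_hash="0" * 64)
print(rec["entry_hash"])
```

Because the deployment is on-prem, the full prompt and response can be retained in the log without leaving the perimeter, which is exactly what a regulator will ask to see.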
On-prem AI fits inside a managed IT relationship
On-premises AI is not a side project. It runs on infrastructure that has to be patched, monitored, backed up, scaled, and retired — the same disciplines that already govern the rest of your IT estate. The cleanest engagements are the ones where AI sits inside an existing managed IT relationship, not bolted on as a separate stack with separate vendors and separate accountability.
Where we deliver the underlying managed IT, the AI workload inherits the same monitoring, the same change control, the same incident response, and the same senior engineering bench. Where another partner runs the IT, we work alongside them and document the boundary precisely so neither side ends up holding a stranded responsibility when something breaks.
See Managed IT Services

Model and infrastructure recommendations on this page reflect our May 2026 market view. We re-evaluate every six months.