Large Language Model Development

Utility and bust: scale in future AI provision

Key takeaways

Large Language Model (LLM) development and hosting technology has matured and diffused, making it widely accessible.

Our investigation and experiments show that it is now possible for virtually any company to host and customise near state-of-the-art models. However, costs are such that it is not realistic to offer full-scale, open-source models at prices that are competitive with the hyperscale providers.

This paper demonstrates what can be done with commodity hardware, and discusses the implications of the change in AI technology that made this demonstration possible.

Three futures for AI technology are described:

  • a future where scale dominates,
  • a future where scale is important,
  • and a future where scale is irrelevant.


Current evidence points to the ‘scale matters’ future rather than the ‘scale dominates’ future, but there are also indications that scale may become even less important. Download our white paper to find out more about how it is possible for virtually any company to host and customise near-state-of-the-art models.

Enterprise LLM Development: Key Questions Answered

What is model distillation in large language model development?

Model distillation is a technique in which a smaller “student” model is trained to replicate the behaviour of a larger “teacher” large language model (LLM). It reduces model size and computing requirements while retaining most of the original performance.

In enterprise environments, model distillation enables cost‑efficient deployment of LLMs on mid‑range GPU infrastructure rather than hyperscale clusters. When combined with quantisation and parameter‑efficient tuning, it significantly lowers inference costs and latency. Distillation is especially valuable for domain‑specific applications where extremely large models are unnecessary.
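As an illustration, the core training signal in distillation can be sketched in a few lines: the student is trained to minimise the divergence between its output distribution and the teacher's temperature-softened distribution. The logits and temperature below are illustrative values, not figures from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    # Soften logits with a temperature; higher T spreads probability mass
    # across classes, exposing the teacher's "dark knowledge" to the student.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between the softened teacher and student distributions --
    # the quantity the student minimises during distillation training.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A student that matches the teacher has (near) zero loss; a mismatched
# student incurs a positive loss that training drives down.
teacher = [2.0, 1.0, 0.1]
aligned = distillation_loss(teacher, [2.0, 1.0, 0.1])
mismatched = distillation_loss(teacher, [0.1, 1.0, 2.0])
```

In practice this term is typically blended with the ordinary task loss on ground-truth labels, but the divergence term above is what transfers the teacher's behaviour.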

The full Thought Leadership piece explains how distillation is reshaping the economics of LLM infrastructure and influencing enterprise AI strategy. Download the report for benchmarks and implementation guidance.

What is the difference between model distillation and quantisation?

Model distillation and quantisation are both techniques used to optimise large language models, but they improve efficiency in different ways. Distillation reduces model size by training a smaller model to imitate a larger one. Quantisation reduces memory usage by lowering numerical precision (for example, from 16‑bit to 8‑bit).

Distillation alters the model’s architecture and number of parameters, whereas quantisation changes how those parameters are represented. Used together, these techniques can substantially reduce GPU requirements and inference costs without causing significant performance degradation.
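A minimal sketch of symmetric 8-bit quantisation makes the contrast concrete: the number of parameters is unchanged, only their numeric representation becomes coarser. The weight values below are illustrative, not taken from any real model.

```python
def quantise_int8(weights):
    # Symmetric per-tensor quantisation: map float weights to signed 8-bit
    # integers via a single scale factor. Memory per weight drops to 1 byte
    # (versus 2 bytes for 16-bit floats) at the cost of precision.
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantise(q, scale):
    # Recover approximate float values for use in computation.
    return [qi * scale for qi in q]

weights = [0.52, -1.27, 0.003, 0.91]
q, scale = quantise_int8(weights)
restored = dequantise(q, scale)
```

The rounding error per weight is bounded by half the scale factor, which is why well-calibrated quantisation typically costs little accuracy. Real deployments use per-channel scales and calibration data, but the principle is the same.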

The Thought Leadership piece explores how combining distillation and quantisation enables near‑state‑of‑the‑art performance on more accessible hardware. Download the full paper for detailed technical benchmarks.

Is building a private LLM more cost‑effective than using API‑based models?

Building a private LLM can be more cost‑effective at scale, particularly for sustained workloads with high token volumes and stringent data‑governance requirements. However, the total cost of ownership depends on factors such as infrastructure investment, engineering expertise, and utilisation rates.

API‑based models provide rapid deployment and elasticity, but long‑term usage fees can exceed the cost of operating a fine‑tuned, self‑hosted model. Enterprises must assess GPU capital expenditure, MLOps maturity, compliance risk, and opportunities for strategic differentiation.
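The trade-off can be sketched as a simple break-even model: API costs scale linearly with token volume, while a self-hosted deployment is dominated by a fixed infrastructure cost. All prices, hours, and overhead multipliers below are hypothetical placeholders, not the paper's benchmark figures.

```python
def monthly_api_cost(tokens_per_month, price_per_million):
    # Pay-per-token pricing: cost scales linearly with usage.
    return tokens_per_month / 1_000_000 * price_per_million

def monthly_self_hosted_cost(gpu_hourly_rate, hours=730, ops_overhead=1.3):
    # Roughly fixed cost: GPU rental (or amortised capex) over a month,
    # with a multiplier for engineering and MLOps effort (assumed value).
    return gpu_hourly_rate * hours * ops_overhead

def break_even_tokens(price_per_million, gpu_hourly_rate, **kwargs):
    # Monthly token volume above which self-hosting becomes cheaper.
    fixed = monthly_self_hosted_cost(gpu_hourly_rate, **kwargs)
    return fixed / price_per_million * 1_000_000

# Illustrative comparison at a hypothetical $2 per million tokens and a
# hypothetical $2.50/hour GPU: below this volume, the API is cheaper.
threshold = break_even_tokens(price_per_million=2.0, gpu_hourly_rate=2.5)
```

Sustained high-volume workloads sit well above such a threshold, which is where self-hosted and distilled models become attractive; bursty or low-volume workloads favour the API's elasticity.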

The full report compares API pricing scenarios with self‑hosted and distilled model strategies under varying utilisation assumptions. Download the analysis for detailed cost modelling.


Download our thought leadership paper

Complete the form to receive your copy.

The Controller of the personal data is GFT Group. The data entered in the form will be processed to maintain contact and analyse interest in our materials. You can withdraw any consent given at any time. For additional information or to exercise your rights, visit the privacy notice.

Got questions? We’re happy to help.

Dean Clark
Chief Technology Officer