Buy a workstation-class box you own. Stop renting a meter that never stops.
A third-party AI API charges you per token, so the bill grows with use. An open-weight model on a box you own is mostly a fixed up-front cost; after that, inference costs roughly the electricity the box draws. For sustained daily work, owning is the cheaper position over time.
A rented meter compounds. An owned box settles.
The two models behave in opposite ways over time. A metered API starts near zero and rises with use, and it rises fastest exactly when AI is working hardest for you. An owned box costs more on day one and then flattens, because once the hardware is paid for, running inference on it is close to the cost of the power it draws.
So the answer depends on how you use AI. For light, occasional, bursty use, a public API is cheap and convenient and there is no reason to own anything. For long-running agents, automations, and sustained heavy daily usage, the metered bill overtakes the owned box and keeps going, while the owned box keeps saving.
The entry box pays for itself in about seven months, then keeps saving.
Picture the two costs as lines over time. The metered API line starts low and rises with every token. The owned box line starts higher, at the price of the hardware, then barely moves. Early on, renting looks cheaper. Then the two lines cross, and from that point the owned box is ahead and the gap widens every month.
For the entry-tier box on a sustained workload, that crossover lands at about month seven. After that, every month is a bill the metered API would still be sending and the box is not. Measured per output token, open-weight inference runs roughly 10x to 30x cheaper than the proprietary APIs, which is why the box pays back so quickly.
Assumes 10M output tokens per month. Operating cost is power only and excludes labour. Hardware prices approx. June 2026.
Modelled 36-month TCO. Strix Halo payback at approx. month 7, DGX Spark payback at approx. month 17. Rates as of June 2026. Hosted-API pricing; self-hosting economics differ.
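The crossover described above can be sketched in a few lines of Python. This is a minimal illustration that assumes both cost lines are straight: the API bill is volume times a per-token price, and the owned box is the hardware price plus a flat monthly power cost. The 10M tokens per month and the hardware prices come from this page. The API price of $30 per million output tokens and the $10 per month power cost are hypothetical placeholders, not the inputs behind the modelled chart, and the function names are made up for this example.

```python
import math

def breakeven_month(hardware_cost, api_cost_per_month, power_cost_per_month):
    """First month in which the owned box's cumulative cost is at or below
    the metered API's cumulative cost. Returns None if that never happens."""
    net_saving = api_cost_per_month - power_cost_per_month
    if net_saving <= 0:
        return None  # usage too light: the meter never overtakes the box
    return math.ceil(hardware_cost / net_saving)

def cumulative_costs(months, hardware_cost, api_cost_per_month, power_cost_per_month):
    """Cumulative cost of each option at the end of months 1..months."""
    api = [api_cost_per_month * m for m in range(1, months + 1)]
    owned = [hardware_cost + power_cost_per_month * m for m in range(1, months + 1)]
    return api, owned

# From this page: 10M output tokens per month, approximate hardware prices.
MILLION_TOKENS_PER_MONTH = 10
ENTRY_BOX_PRICE = 2000.0     # "starts under $2,000"
STEP_UP_BOX_PRICE = 4699.0   # "approx $4,699"

# Hypothetical placeholders (assumptions, not figures from this page).
API_PRICE_PER_MILLION = 30.0   # dollars per million output tokens
POWER_COST_PER_MONTH = 10.0    # dollars of electricity per month

api_monthly = MILLION_TOKENS_PER_MONTH * API_PRICE_PER_MILLION

entry_payback = breakeven_month(ENTRY_BOX_PRICE, api_monthly, POWER_COST_PER_MONTH)
step_up_payback = breakeven_month(STEP_UP_BOX_PRICE, api_monthly, POWER_COST_PER_MONTH)
print(entry_payback, step_up_payback)  # 7 17

# Light, occasional use: 0.2M tokens per month never reaches breakeven here.
light_use = breakeven_month(ENTRY_BOX_PRICE, 0.2 * API_PRICE_PER_MILLION, POWER_COST_PER_MONTH)
print(light_use)  # None

# 36-month totals for the entry box under the same placeholder inputs.
api_line, owned_line = cumulative_costs(36, ENTRY_BOX_PRICE, api_monthly, POWER_COST_PER_MONTH)
print(api_line[-1], owned_line[-1])  # 10800.0 2360.0
```

The light-use case returns None because the monthly API bill is smaller than the power cost alone, which matches the point that occasional use is better served by a public API. Swapping in your own volume, API rate, and electricity cost gives a first estimate. It leaves out labour and the annual licence, so treat it as a rough lower bound on cost of ownership.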
Start with one box. Add compute and nodes as you grow.
On-prem pricing has three moving parts: the box itself, the compute you add to it, and the nodes you scale out to. You start where your workload is today and step up only when load demands it. The integration does not change as you climb the ladder, so each step builds on the last instead of replacing it.
-
The entry box
A workstation-class box you own, in the low thousands. It is sized for a small regulated firm running real daily work, and it is the box that pays for itself in about seven months on a sustained workload.
AMD Strix Halo 128GB class. Starts under $2,000 (approx. June 2026 pricing, and rising).
-
The step-up box
A higher-throughput box for heavier workloads, still hardware you own.
NVIDIA DGX Spark. Approx. $4,699 (approx. June 2026 pricing, and rising).
-
Compute add-on
Add GPU capacity to a box as your throughput needs grow, without changing your integration. Priced per unit of compute added. Talk to us for a figure sized to your workload.
-
Per-node scale-out
When one box is not enough, scale to multiple nodes. Priced per node, so cost tracks capacity. Talk to us for a figure sized to your deployment.
Software, updates, new-model compatibility, and support, on a yearly licence.
On top of the hardware, an annual Hive licence keeps the software current and supported. It covers ongoing software updates, compatibility with new open-weight models as they ship, and direct support from the team that builds it. The hardware is a one-time purchase you own. The licence is the yearly cost of keeping it current, so the box you bought runs newer and stronger models over time rather than ageing out.
Not ready to own hardware yet? Start on Hive Cloud.
If you are not ready to buy a box, Hive Cloud runs the same software on a hosted, region-locked instance at a fraction of typical hosted-API pricing. It is the fastest way to start and the lowest up-front cost. Talk to us for a figure sized to your usage.
Pinterest cut AI costs roughly 90 percent and raised accuracy roughly 30 percent by running and customising the open-weight Qwen3-VL model on its own infrastructure.
as of June 2026
Results reflect Pinterest's specific implementation. Your outcomes will depend on your hardware, model choice, and workload.
On-prem pricing covers software licensing and support. Hardware procurement, security hardening, and regulatory validation are the customer's responsibility. Hive Cloud deployments run on third-party infrastructure and do not carry the same data-residency properties as the on-prem edition.
Frequently asked questions
How long does an on-prem Hive box take to pay for itself?
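It depends on your inference volume. The model on this page assumes 10M output tokens per month and power-only operating costs. On that basis the entry box breaks even against a metered API at about month seven, and the step-up box at about month 17. Lighter use pushes the breakeven later.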
What is included in the annual Hive licence?
Continuous software updates, validation and availability of new supported open-weight models, the platform that runs them on your hardware, and direct support from the team that builds it.