In recent years, many organisations have focused on GenAI pilots to demonstrate feasibility, with 2024 and 2025 centred on testing possibilities. By 2026, the mandate is clear: leaders must shift from experimentation to delivering measurable ROI, sound unit economics, and predictable costs. AI's future depends on moving from innovation initiatives to products that drive sustainable margins.
The FinOps Foundation notes that AI and ML growth is accelerating, introducing higher-cost resources and new spending models such as tokens, GPUs, and managed AI services. FinOps is key to managing these costs. The Foundation cites Gartner’s $644B GenAI spending estimate in 2025 and IDC’s projection that by 2027, 75% of organisations will integrate GenAI with FinOps processes.
AI challenges traditional cloud cost models. While classic applications focus on CPU, memory, and traffic, GenAI introduces new cost drivers, including token usage, retrieval latency, vector search, GPU time for fine-tuning, and hidden data costs across pipelines and storage. Token pricing can be particularly unexpected, as organisations pay for both input and output, with costs rising rapidly as usage grows.
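To make the token cost driver concrete, here is a minimal sketch of a monthly token-spend estimate. The per-million-token prices and volumes are hypothetical placeholders, not any provider's actual rates; the point is simply that input and output are billed separately and both scale with usage:

```python
def monthly_token_cost(input_tokens: int, output_tokens: int,
                       price_in_per_m: float, price_out_per_m: float) -> float:
    """Return the monthly cost given token volumes and per-million-token prices.

    Input and output tokens are priced separately, which is why output-heavy
    workloads (long generated answers) can surprise teams used to flat pricing.
    """
    return (input_tokens / 1_000_000) * price_in_per_m \
         + (output_tokens / 1_000_000) * price_out_per_m


# Hypothetical example: 10M input and 10M output tokens per month,
# at $2.50 / $10.00 per million tokens (illustrative figures only).
cost = monthly_token_cost(10_000_000, 10_000_000, 2.50, 10.00)
print(f"${cost:.2f}")  # → $125.00
```

Note how the output side dominates here even though the volumes are equal; tracking the input/output mix per feature is part of what makes AI cost forecasting different from classic capacity planning.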
For this reason, FinOps for AI is becoming essential. The FinOps for AI working group provides practical guidance on scope, forecasting, optimisation, and unit economics, and encourages a value-driven approach that links AI spending to outcomes and business value rather than cost savings alone (https://www.finops.org/wg/unlocking-ai-business-value-with-finops).
AI will also enhance FinOps. We will see more 'AI for FinOps' solutions, such as agents that answer cost questions, detect anomalies, recommend actions, and automate guardrails. The Foundation already provides examples, including routing queries to cost-effective models, prompt compression, token estimation, and anomaly detection to reduce waste.
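As a rough illustration of the anomaly-detection idea, the sketch below flags days whose spend deviates sharply from the mean of the series. Real "AI for FinOps" agents would use richer signals and models; the z-score threshold here is an arbitrary assumption chosen for the example:

```python
from statistics import mean, stdev


def flag_cost_anomalies(daily_spend: list[float], threshold: float = 2.5) -> list[int]:
    """Return indices of days whose spend deviates more than `threshold`
    standard deviations from the mean of the series.

    A deliberately simple baseline; production systems would use seasonal
    models or learned forecasts rather than a flat z-score.
    """
    mu, sigma = mean(daily_spend), stdev(daily_spend)
    if sigma == 0:
        return []
    return [i for i, s in enumerate(daily_spend) if abs(s - mu) / sigma > threshold]


# Ten quiet days followed by a spend spike (hypothetical dollar figures).
spend = [100.0] * 10 + [500.0]
print(flag_cost_anomalies(spend))  # → [10]
```

Even this crude guardrail, wired to an alert, catches the classic GenAI failure mode of a runaway loop or a misrouted workload quietly burning tokens overnight.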
Microsoft Foundry is adding model routing and transparent token pricing for various models. Microsoft’s Foundry blog details token pricing per million tokens, showing that a 'mini' model can cost much less than the full model for input and output.
A customer support assistant using 10 million input and output tokens per month incurs high costs if all traffic goes to the most expensive model. Routing 70% of requests to a smaller model and 30% to the full model can reduce token costs by over 50% without compromising quality for most interactions. This shows FinOps’ practical value in managing cost-to-serve per chat, case, or segment.
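The routing arithmetic can be sketched as follows. The $1 and $10 per-million-token prices are hypothetical, chosen only to illustrate how a 70/30 split moves the blended cost:

```python
def blended_cost(total_tokens_m: float, frac_small: float,
                 price_small: float, price_full: float) -> float:
    """Blended monthly cost when `frac_small` of traffic goes to the
    cheaper model and the remainder to the full model.

    Prices are per million tokens; total_tokens_m is in millions.
    """
    return total_tokens_m * (frac_small * price_small
                             + (1 - frac_small) * price_full)


# Hypothetical prices: $1/M for the small model, $10/M for the full model.
baseline = blended_cost(10, 0.0, 1.0, 10.0)  # all traffic on the full model
routed = blended_cost(10, 0.7, 1.0, 10.0)    # 70% routed to the small model
savings = 1 - routed / baseline
print(f"${baseline:.0f} -> ${routed:.0f} ({savings:.0%} saved)")  # → $100 -> $37 (63% saved)
```

Under these assumed prices the 70/30 split saves 63%, consistent with the "over 50%" claim; the exact figure depends entirely on the real price gap between the two models.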
The data aspect is also key. GenAI needs strong data foundations and analytics. Microsoft Foundry, for example, helps teams monitor and manage consumption by exposing costs as capacity units, with pooling and dashboards. In 2026, CFOs will ask, “How much does this AI feature cost per transaction?” The answer will include tokens, GPU time, and Foundry capacity supporting the system.
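One simple way to answer that CFO question is to blend every cost component into a single per-transaction figure. All dollar amounts and the transaction volume below are hypothetical, and real attribution would pull these numbers from billing exports rather than constants:

```python
def cost_per_transaction(token_cost: float, gpu_cost: float,
                         capacity_cost: float, transactions: int) -> float:
    """Blend token spend, GPU time, and platform capacity into one
    cost-to-serve figure per transaction (chat, case, or API call)."""
    return (token_cost + gpu_cost + capacity_cost) / transactions


# Hypothetical monthly figures: $125 tokens, $300 GPU fine-tuning,
# $200 platform capacity, across 50,000 customer interactions.
unit_cost = cost_per_transaction(125.0, 300.0, 200.0, 50_000)
print(f"${unit_cost:.4f} per interaction")  # → $0.0125 per interaction
```

Tracked over time and per feature, this is the unit-economics number that turns an AI demo into something a finance team can forecast.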
In 2026, AI success will be measured not only by impressive demonstrations but by unit economics and business value. Teams that adopt FinOps for AI early will scale AI with confidence, driving tangible business outcomes and setting a new standard for operational excellence.
