
FinOps for AI & AI for FinOps are knocking on our doors

In recent years, many organisations have focused on GenAI pilots to demonstrate feasibility, with 2024 and 2025 centred on testing possibilities. By 2026, the mandate is clear: leaders must shift their emphasis from experimentation to delivering measurable ROI, strong unit economics, and predictable costs. AI's future depends on moving from innovation initiatives to products that drive sustainable margins.

The FinOps Foundation notes that AI and ML growth is accelerating, introducing higher-cost resources and new spending models such as tokens, GPUs, and managed AI services. FinOps is key to managing these costs. The Foundation cites Gartner’s estimate of $644B in GenAI spending for 2025 and IDC’s projection that, by 2027, 75% of organisations will integrate GenAI with FinOps processes.
AI challenges traditional cloud cost models. While classic applications focus on CPU, memory, and traffic, GenAI introduces new cost drivers, including token usage, retrieval latency, vector search, GPU time for fine-tuning, and hidden data costs across pipelines and storage. Token pricing can catch organisations off guard: they pay for both input and output tokens, and costs rise rapidly as usage grows.
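As a rough illustration of how input and output tokens combine into a monthly bill, here is a minimal sketch. The per-million-token prices are assumptions chosen for illustration, not actual vendor rates:

```python
# Sketch: estimating monthly token spend for a GenAI feature.
# Prices are illustrative assumptions, not real vendor pricing.

def monthly_token_cost(input_tokens: int, output_tokens: int,
                       input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars: tokens are billed per million, input and output separately."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Example: 10M input + 10M output tokens per month at assumed rates
cost = monthly_token_cost(10_000_000, 10_000_000,
                          input_price_per_m=2.50, output_price_per_m=10.00)
print(f"${cost:,.2f}/month")  # 10 * 2.50 + 10 * 10.00 = $125.00/month
```

Note that output tokens are typically priced several times higher than input tokens, so verbose responses drive the bill faster than long prompts do.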
For this reason, FinOps for AI is becoming essential. The FinOps for AI working group provides practical guidance on scope, forecasting, optimisation, and unit economics, and encourages a value-driven approach that links AI spending to outcomes and business value rather than just cost savings (https://www.finops.org/wg/unlocking-ai-business-value-with-finops).
AI will also enhance FinOps. We will see more 'AI for FinOps' solutions, such as agents that answer cost questions, detect anomalies, recommend actions, and automate guardrails. The Foundation already provides examples, including routing queries to cost-effective models, prompt compression, token estimation, and anomaly detection to reduce waste.
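As a toy illustration of the anomaly-detection idea, the sketch below flags days whose spend deviates sharply from a trailing window. Real 'AI for FinOps' tools use far richer models; the data, window size, and threshold here are invented:

```python
# Sketch: flagging daily-spend anomalies with a rolling mean/std-dev check.
# Window, threshold, and the sample data are illustrative assumptions.
from statistics import mean, stdev

def spend_anomalies(daily_spend: list, window: int = 7, z: float = 3.0) -> list:
    """Return indices of days whose spend deviates > z std-devs from the trailing window."""
    flags = []
    for i in range(window, len(daily_spend)):
        history = daily_spend[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(daily_spend[i] - mu) > z * sigma:
            flags.append(i)
    return flags

spend = [100, 102, 98, 101, 99, 103, 100, 420, 101]  # day 7 is a usage spike
print(spend_anomalies(spend))  # → [7]
```

Catching a spike like this within a day, rather than at month-end invoice review, is exactly the kind of guardrail the Foundation's examples describe.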


Microsoft Foundry is adding model routing and transparent token pricing for various models. Microsoft’s Foundry blog details token pricing per million tokens, showing that a 'mini' model can cost much less than the full model for input and output.
A customer support assistant using 10 million input and output tokens per month incurs high costs if all traffic goes to the most expensive model. Routing 70% of requests to a smaller model and 30% to the full model can reduce token costs by over 50% without compromising quality for most interactions. This shows FinOps’ practical value in managing cost-to-serve per chat, case, or segment.
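The arithmetic behind that claim can be sketched as follows. The prices, and the roughly 5x gap between the two models, are assumptions chosen only to illustrate the mechanics of routing:

```python
# Sketch: cost impact of routing traffic between a full model and a cheaper
# 'mini' model. All prices are illustrative assumptions, not real rates.

FULL_PRICE_PER_M = 10.00   # assumed blended $/1M tokens, full model
MINI_PRICE_PER_M = 2.00    # assumed blended $/1M tokens, mini model
MONTHLY_TOKENS_M = 20      # 10M input + 10M output tokens per month

# Baseline: every request served by the most expensive model
baseline = MONTHLY_TOKENS_M * FULL_PRICE_PER_M

# Routed: 30% stays on the full model, 70% goes to the mini model
routed = MONTHLY_TOKENS_M * (0.3 * FULL_PRICE_PER_M + 0.7 * MINI_PRICE_PER_M)

savings_pct = 100 * (1 - routed / baseline)
print(f"baseline ${baseline:.2f}, routed ${routed:.2f}, savings {savings_pct:.0f}%")
# → baseline $200.00, routed $88.00, savings 56%
```

Under these assumed prices the blended cost drops by 56%, consistent with the "over 50%" figure above; the exact saving depends on the real price gap and the share of traffic the smaller model can handle without quality loss.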
The data aspect is also key. GenAI needs strong data foundations and analytics. Microsoft Foundry, for example, helps teams monitor and manage consumption by exposing costs as capacity units, with pooling and dashboards. In 2026, CFOs will ask, “How much does this AI feature cost per transaction?” The answer will include tokens, GPU time, and Foundry capacity supporting the system.
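A minimal sketch of that unit-economics question, with every monthly figure invented purely for illustration:

```python
# Sketch: "cost per transaction" for one AI feature, combining assumed
# token, GPU, and provisioned-capacity costs. All numbers are illustrative.

def cost_per_transaction(token_cost: float, gpu_cost: float,
                         capacity_cost: float, transactions: int) -> float:
    """Blend the monthly cost components into a single per-transaction figure."""
    return (token_cost + gpu_cost + capacity_cost) / transactions

unit_cost = cost_per_transaction(token_cost=1_250.0,    # monthly token spend
                                 gpu_cost=800.0,         # fine-tuning GPU hours
                                 capacity_cost=450.0,    # provisioned capacity units
                                 transactions=50_000)
print(f"${unit_cost:.3f} per transaction")  # 2500 / 50000 = $0.050
```

Tracking this single blended number per chat, case, or segment is what lets a CFO compare an AI feature's cost-to-serve against the value it generates.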
In 2026, AI success will be measured not only by impressive demonstrations but by unit economics and business value. Teams that adopt FinOps for AI early will scale AI with confidence, driving tangible business outcomes and setting a new standard for operational excellence.

