Phase 3 of Intelligent Cloud Modernisation: Native integration of AI workloads

 In Phase 1, we refactor applications to become cloud-native. In Phase 2, we modernise data so GenAI can work with trusted, fresh, and governed information. But even after these two phases, many organisations still treat AI like a side experiment. A small team builds a model, deploys it somewhere, and the rest of the platform stays the same. This is where real adoption stalls.

Phase 3 is the moment when AI stops being a PoC and becomes a native workload inside the platform. It means training, inference, embeddings, retrieval, and orchestration are treated like first-class production components, with the same standards you apply to any critical service: scalability, security, observability, and lifecycle management.


1. Make inference a standard platform capability

Most GenAI systems fail in production not because the model is weak, but because inference is never engineered the way a real service is. Inference brings latency, burst traffic, and cost challenges; deploying it as a hand-rolled VM endpoint leads to ongoing reliability and cost problems.
Native integration means inference is deployed using scalable patterns: containerised endpoints and clear network boundaries.
Use Azure Machine Learning managed online endpoints for hosting models with autoscaling and versioning, and expose them securely through Azure API Management. For GenAI, many teams integrate Azure OpenAI as a managed service and build a model gateway pattern in front of it for routing, throttling, and policy enforcement.
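As a rough sketch (not a definitive implementation), here is what standing up a managed online endpoint can look like with the azure-ai-ml Python SDK. The endpoint name, model reference, and workspace coordinates are placeholders; APIM and autoscale policies would sit in front of this:

```python
# pip install azure-ai-ml azure-identity
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment
from azure.identity import DefaultAzureCredential

# Workspace coordinates are placeholders; substitute your own.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# A managed online endpoint gives you a stable, authenticated scoring URL.
endpoint = ManagedOnlineEndpoint(name="churn-scoring", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deployments are versioned units behind the endpoint, so you can roll
# out a new model version and shift traffic to it gradually.
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="churn-scoring",
    model="azureml:churn-model:3",  # assumes a registered model version
    instance_type="Standard_DS3_v2",
    instance_count=2,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Route all traffic to the new deployment once it is healthy.
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```

The deployment-behind-endpoint split is what makes safe rollouts possible: a new model version gets its own deployment, and traffic only shifts once it proves healthy.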

2. Embed embeddings + retrieval into the architecture (not as an add-on)

If GenAI answers without context, it will hallucinate. Phase 2 modernised the data, but Phase 3 is where retrieval becomes architectural. This is where you implement RAG: embedding generation, vector indexing, hybrid search, relevance tuning, and citations.
Use Azure OpenAI embeddings plus Azure AI Search vector indexes. Treat this as a shared retrieval service used by multiple applications, not a one-off implementation within a single chatbot. When retrieval is a shared platform capability, you avoid duplication, reduce cost, and improve governance.
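A minimal sketch of such a shared retrieval function, assuming an Azure AI Search index named docs-index with a vector field called contentVector and an embedding deployment named text-embedding-3-small (all placeholders):

```python
# pip install openai azure-search-documents
from openai import AzureOpenAI
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# Endpoints, keys, and the deployment/index names are placeholders.
aoai = AzureOpenAI(
    azure_endpoint="https://<your-openai>.openai.azure.com",
    api_key="<aoai-key>",
    api_version="2024-02-01",
)
search = SearchClient(
    endpoint="https://<your-search>.search.windows.net",
    index_name="docs-index",
    credential=AzureKeyCredential("<search-key>"),
)

def retrieve(question: str, k: int = 5):
    """Embed the question and run a hybrid (keyword + vector) query."""
    emb = aoai.embeddings.create(
        input=[question],
        model="text-embedding-3-small",  # your embedding deployment name
    ).data[0].embedding

    results = search.search(
        search_text=question,  # keyword leg of the hybrid query
        vector_queries=[VectorizedQuery(
            vector=emb,
            k_nearest_neighbors=k,
            fields="contentVector",  # assumes a vector field of this name
        )],
        select=["title", "content", "url"],
        top=k,
    )
    # Return passages with their sources so answers can carry citations.
    return [(r["title"], r["content"], r["url"]) for r in results]
```

Because the function returns passages together with their source URLs, every application that consumes the shared service gets citations for free.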

3. Introduce orchestration and agent workflows carefully

Agents (AI systems that take actions on your behalf) are powerful, but they also introduce risk. Instead of building agents that can do everything, design bounded agents with limited scope, tightly controlled tools, clear permissions, and robust auditability.
Native integration means agent orchestration becomes part of your platform patterns: tool calling, workflow management, and event-driven triggers.
Use Azure AI Studio/Prompt Flow (or your preferred orchestration layer) to define multi-step workflows and connect actions via Azure Functions, Logic Apps, or Event Grid. When you do this correctly, agents become reliable automation workers, not unpredictable bots.
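To make "bounded" concrete, here is a hedged Python sketch of a single-tool agent built on Azure OpenAI tool calling. The deployment name, the tool, and the audit mechanism are illustrative stand-ins; in practice Prompt Flow or your orchestrator would own the workflow around this loop:

```python
# pip install openai
# A deliberately bounded agent: one tool, an explicit allowlist, and an
# audit line for every action. All names here are illustrative.
import json
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-openai>.openai.azure.com",
    api_key="<aoai-key>",
    api_version="2024-02-01",
)

def get_order_status(order_id: str) -> str:
    # Hypothetical narrow action: read-only lookup against your order API.
    return json.dumps({"order_id": order_id, "status": "shipped"})

# The allowlist IS the agent's capability surface; nothing else is callable.
TOOLS = {"get_order_status": get_order_status}

TOOL_SPECS = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the shipping status of a single order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    response = client.chat.completions.create(
        model="gpt-4o",  # your chat deployment name
        messages=messages,
        tools=TOOL_SPECS,
    )
    msg = response.choices[0].message
    if msg.tool_calls:
        messages.append(msg)
        for call in msg.tool_calls:
            fn = TOOLS.get(call.function.name)
            if fn is None:
                # Refuse anything off the allowlist, keeping the protocol valid.
                content = json.dumps({"error": "tool not permitted"})
            else:
                args = json.loads(call.function.arguments)
                print(f"AUDIT tool={call.function.name} args={args}")  # audit trail
                content = fn(**args)
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": content}
            )
        response = client.chat.completions.create(model="gpt-4o", messages=messages)
    return response.choices[0].message.content
```

The key design choice is that the allowlist, not the model, decides what the agent can do: anything the model asks for that is not in the dictionary is refused and still leaves an audit record.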

4. Build MLOps as a product pipeline, not a data science pipeline

In GenAI solutions, models drift, prompts evolve, retrieval needs tuning, and guardrails change. Without lifecycle management, AI quickly becomes fragile.
Native integration means MLOps is part of the delivery system: versioning, testing, deployment, rollback, monitoring, and evaluation.
Use Azure Machine Learning for model registry, pipelines, and controlled deployments. Combine it with GitHub Actions or Azure DevOps for CI/CD so model changes follow the same discipline as code.
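As a sketch of one step in such a pipeline, here is how a CI job (GitHub Actions or Azure DevOps) might register a model version with the azure-ai-ml SDK after evaluation passes; the model name and path are hypothetical:

```python
# pip install azure-ai-ml azure-identity
# A step you might run from CI after tests and evaluation pass.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Register an immutable, versioned model artifact; the registry becomes
# the single source of truth that deployments reference by name:version.
model = ml_client.models.create_or_update(Model(
    name="support-summariser",  # hypothetical model name
    path="./artifacts/model",   # folder produced by the training job
    type=AssetTypes.CUSTOM_MODEL,
    description="Summarisation model, evaluated in CI run <run-id>",
))
print(f"Registered {model.name}:{model.version}")

# Rollback is then just re-pointing a deployment at the previous
# version, e.g. model="azureml:support-summariser:<previous-version>".
```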

5. Add AI observability and cost controls from the start

AI workloads require monitoring new operational metrics: token usage, latency by prompt type, retrieval quality, model errors, and drift (changes in model performance over time). Monitoring only CPU and memory misses the real issues.
Use Application Insights and Azure Monitor for end-to-end tracing and latency measurement, plus Azure Cost Management for unit economics (cost per request, cost per prediction). This is also where FinOps for AI begins, because AI workloads can explode in cost if left uncontrolled.
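A sketch of what this can look like at the code level, assuming the azure-monitor-opentelemetry distro and an Azure OpenAI-style response object with a usage field; the attribute names are our own convention, not a standard:

```python
# pip install azure-monitor-opentelemetry
import time
from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace

# Wires OpenTelemetry traces and metrics into Application Insights.
configure_azure_monitor(connection_string="<app-insights-connection-string>")
tracer = trace.get_tracer("genai.inference")

def traced_completion(prompt_type: str, call_model):
    """Wrap a model call so every request emits latency and token telemetry."""
    with tracer.start_as_current_span("llm.completion") as span:
        span.set_attribute("genai.prompt_type", prompt_type)
        start = time.perf_counter()
        response = call_model()  # your actual model call
        span.set_attribute(
            "genai.latency_ms", (time.perf_counter() - start) * 1000
        )
        # Azure OpenAI responses carry token usage; record it for FinOps.
        span.set_attribute("genai.tokens.prompt", response.usage.prompt_tokens)
        span.set_attribute(
            "genai.tokens.completion", response.usage.completion_tokens
        )
        return response
```

With token counts attached to every trace, cost per request becomes a query in Azure Monitor rather than a monthly surprise on the bill.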

Final thought

Phase 3 is where AI becomes real. Not as a lab experiment, but as a platform capability. When inference, retrieval, orchestration, and lifecycle management are integrated natively, AI can scale safely across teams and products. Without Phase 3, organisations stay stuck in pilots. With it, they start building AI features like any other digital capability: reliable, governed, and ready for growth.
