
Phase 1 of Intelligent Cloud Modernisation: Cloud-native refactoring for AI-Native

When organisations tell me they're already in the cloud, I like to ask, 'Are you cloud-hosted or cloud-native?' The distinction matters a great deal for AI. If you move workloads to Azure without changing them (lift-and-shift), you keep legacy characteristics: tight coupling, local state, vertical scaling, and manual processes. That setup works for everyday tasks, but it often can't keep up with AI inference, agent workflows, or real-time data pipelines.

That's why, in our Intelligent Cloud Modernisation for AI-Native approach, cloud-native refactoring comes first. This isn't a complete rewrite all at once. Refactoring means carefully reshaping the system so it can scale, adapt, and safely add intelligence. Cloud-native, in practice, means building robust, manageable systems with containers, microservices, declarative infrastructure, and automation.
Below is the journey, with examples from the Microsoft ecosystem that can accelerate each step.


1) Understand what we have (application discovery + dependency mapping)

Before we refactor, we must see the real architecture. Most legacy apps have hidden dependencies, and AI workloads will quickly expose them.
A practical Microsoft starting point is Azure Migrate for discovery and assessment of servers and workloads, and Azure Monitor + Application Insights to learn runtime behaviour (hot paths, slow dependencies, failure points). If the estate is large, Microsoft Defender for Cloud can also reveal risky configurations that must be corrected before modernising.
In bigger projects, teams often use internal copilots or agents to summarise what they find and create a clear refactoring backlog. Microsoft Copilot Studio can help by connecting to documentation and producing consistent assessments. This doesn't replace engineers, but it speeds up the analysis.

2) Break the monolith into deployable components (modularisation)

Refactoring begins when we carve the system into smaller parts: API layer, business services, data access, and background jobs. This is not only for engineering elegance; it enables independent scaling for AI inference or data ingestion.
In the Microsoft ecosystem, the target runtime is often Azure Kubernetes Service (AKS) or Azure Container Apps. For modern API exposure, Azure API Management brings governance, throttling, versioning, and security to your services. Microsoft’s Well-Architected guidance also recommends containerisation and separation of concerns to improve scaling and reliability.
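To make the separation of concerns concrete, here is a minimal sketch in Python. The names (`OrderService`, `OrderRepository`) and the in-memory repository are purely illustrative, not part of any Microsoft guidance; the point is that the API layer, business service, and data access depend on interfaces, so each piece can be containerised and scaled independently.

```python
from dataclasses import dataclass
from typing import Protocol


# Data-access boundary: the business service depends on this interface,
# not on a concrete database, so each layer can be deployed and scaled alone.
class OrderRepository(Protocol):
    def find(self, order_id: str) -> dict: ...


@dataclass
class OrderService:
    """Business layer: pure logic, no HTTP or storage details."""
    repo: OrderRepository

    def order_total(self, order_id: str) -> float:
        order = self.repo.find(order_id)
        return sum(item["price"] * item["qty"] for item in order["items"])


# Stand-in repository; in a real system this would be its own module
# backed by a managed database, and the API layer (behind Azure API
# Management) would be a separate deployable container.
class InMemoryOrders:
    def __init__(self) -> None:
        self._orders = {"o1": {"items": [{"price": 10.0, "qty": 3}]}}

    def find(self, order_id: str) -> dict:
        return self._orders[order_id]


if __name__ == "__main__":
    service = OrderService(repo=InMemoryOrders())
    print(service.order_total("o1"))  # 30.0
```

The design choice worth noting: because `OrderService` only sees a `Protocol`, swapping the in-memory stand-in for a real data service changes nothing in the business layer.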

3) Externalise state (make compute stateless)

AI workloads scale horizontally. If sessions or files are pinned to a single VM, scaling becomes expensive and fragile. The goal is to move state into managed services.
For Azure refactoring, this usually means Azure Cache for Redis for session state and caching, and Azure Cosmos DB, Azure SQL Database, or SQL Managed Instance, depending on the workload. Files move into Azure Blob Storage. Once compute is stateless, AKS or Container Apps can scale quickly, and AI endpoints can burst when demand spikes.
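The pattern can be sketched in a few lines of Python. This is an assumption-laden illustration, not production code: `SessionStore` and `handle_request` are invented names, and the in-memory store stands in for Azure Cache for Redis (whose real `redis` client exposes the same basic `get`/`set` operations).

```python
from typing import Optional, Protocol


class SessionStore(Protocol):
    """Boundary for externalised state: any replica can read/write through it."""
    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Local stand-in for tests; in Azure this role is played by
    Azure Cache for Redis, reached via the redis client library."""
    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def handle_request(store: SessionStore, session_id: str) -> int:
    """A stateless handler: all state lives in the store, so any AKS pod
    or Container Apps replica can serve the next request for this session."""
    count = int(store.get(session_id) or 0) + 1
    store.set(session_id, str(count))
    return count
```

Because the handler owns no state, the platform is free to add or remove replicas under load, which is exactly what bursting AI endpoints need.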

4) Make changes safe and repeatable (CI/CD + IaC)

Refactoring is risky if releases are manual. AI-native platforms require continuous evolution, so Phase 1 must include automation.
Here, Microsoft tooling includes GitHub Actions or Azure DevOps Pipelines for CI/CD, and Bicep or Terraform for infrastructure-as-code. This creates consistent environments (DEV/TEST/PROD), repeatable deployments, and supports the Build–Run–Evolve model.
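As a rough illustration of what "automation plus IaC" looks like in practice, here is a sketch of a GitHub Actions workflow that deploys a Bicep template. Every name here is a placeholder (the resource group, template path, parameter, and secret are invented for illustration); treat it as a shape to adapt, not a drop-in pipeline.

```yaml
# Illustrative only: build-on-push workflow that logs in to Azure and
# applies infrastructure-as-code. Names and secrets are placeholders.
name: build-and-deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Deploy infrastructure (IaC)
        run: |
          az deployment group create \
            --resource-group my-rg \
            --template-file infra/main.bicep \
            --parameters env=prod
```

The same template can be parameterised per environment, which is how DEV/TEST/PROD stay consistent rather than drifting apart.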

5) Add observability by design (monitoring and tracing)

AI brings new failure modes: slow inference, retrieval delays, and dependency bottlenecks. Refactoring isn't finished if you can't see what's going on.
Set up Azure Monitor, Application Insights, and Log Analytics to track key metrics like latency, error rates, and saturation. For AKS, add Container Insights. This follows cloud architecture best practices by making monitoring and resilience core parts of your system, especially as you add AI features later.
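To show what "track latency, error rates, and saturation" actually means, here is a small Python sketch that computes two of those signals from raw request records. In production, Application Insights and Log Analytics compute these for you; the function and field names below are invented for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    status: int


def golden_signals(requests: list[Request]) -> dict:
    """Compute p95 latency (nearest-rank) and error rate from raw requests.
    Illustrative only: monitoring platforms derive these from telemetry."""
    durations = sorted(r.duration_ms for r in requests)
    p95_index = math.ceil(0.95 * len(durations)) - 1  # nearest-rank p95
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "p95_latency_ms": durations[p95_index],
        "error_rate": errors / len(requests),
    }
```

The reason p95 matters more than the average for AI workloads: one slow inference call in twenty is invisible in a mean but very visible to users, and tail latency is usually where retrieval and model bottlenecks show up first.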

6) Prepare for AI integration (clean boundaries and model-ready interfaces)

Even in Phase 1, we should design for future AI. That means clean APIs, event-driven interfaces, and the ability to plug in model endpoints later without rewriting everything.
Microsoft tools that help with this include Event Grid for event-driven integration, Service Bus for reliable messaging, and Azure Functions for lightweight orchestration. These prepare your system for future GenAI patterns, like RAG and agents, where workflows use tools and services in a controlled way.
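A "model-ready interface" can be as simple as an abstraction the business code calls instead of a vendor SDK. The sketch below is an assumption: `ModelEndpoint`, `EchoModel`, and `summarise_ticket` are invented names demonstrating the boundary, so that a hosted model (for example, an Azure OpenAI deployment) can be plugged in later without rewriting the caller.

```python
from typing import Protocol


class ModelEndpoint(Protocol):
    """Clean boundary for future AI: the app depends on this interface,
    never on a specific model SDK."""
    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Placeholder implementation used until a real endpoint exists."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


def summarise_ticket(model: ModelEndpoint, ticket_text: str) -> str:
    # Business code calls the boundary, not a vendor client directly,
    # so swapping in a real model endpoint later is a one-line change.
    return model.complete(f"Summarise this support ticket: {ticket_text}")
```

This is the same dependency-inversion idea that makes the event-driven pieces (Event Grid, Service Bus, Functions) composable later: the workflow knows about contracts, not implementations.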
How do agents fit in? At this stage, you can start using 'engineering agents.' Copilot-based assistants can help generate scaffolding, CI/CD templates, infrastructure modules, or even update documentation automatically. They don't replace refactoring, but they make the process smoother and faster.

Conclusion
Cloud-native refactoring turns cloud-hosted technical debt into a platform ready for AI. Without this step, later phases like data modernisation and AI integration become more difficult, slower, and more expensive.
