When organisations tell me they're already in the cloud, I like to ask, 'Are you cloud-hosted or cloud-native?' The difference matters a great deal for AI. If you move workloads to Azure without changing them (lift-and-shift), you keep legacy characteristics: tight coupling, local state, vertical scaling, and manual processes. That setup works for everyday tasks, but it often can't keep up with AI inference, agent workflows, or real-time data pipelines.
That's why, in our Intelligent Cloud Modernisation for AI-Native approach, cloud-native refactoring comes first. This isn't a complete rewrite all at once. Refactoring means deliberately reshaping the system so it can scale, adapt, and safely add intelligence. Cloud-native usually means building robust, manageable systems with containers, microservices, declarative infrastructure, and automation.
Below is the journey, with examples from the Microsoft ecosystem that can accelerate each step.
1) Understand what we have (application discovery + dependency mapping)
Before we refactor, we must see the real architecture. Most legacy apps have hidden dependencies, and AI workloads will quickly expose them.
A practical Microsoft starting point is Azure Migrate for discovering and assessing servers and workloads, plus Azure Monitor and Application Insights to understand runtime behaviour (hot paths, slow dependencies, failure points). If the estate is large, Microsoft Defender for Cloud can also reveal risky configurations that must be corrected before modernising.
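To make that concrete, here is a minimal sketch of querying that telemetry with the azure-monitor-query Python SDK to surface the slowest dependencies. The workspace ID is a placeholder, and the table and column names (AppDependencies, DurationMs) assume a workspace-based Application Insights schema; adjust the KQL to your own workspace.

```python
# pip install azure-monitor-query azure-identity
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient, LogsQueryStatus

# Placeholder -- substitute your Log Analytics workspace ID.
WORKSPACE_ID = "<log-analytics-workspace-id>"

# KQL over the AppDependencies table: the slowest external calls over the
# last seven days are prime refactoring candidates.
QUERY = """
AppDependencies
| summarize avg_ms = avg(DurationMs), calls = count() by Target, Name
| top 10 by avg_ms desc
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(WORKSPACE_ID, QUERY, timespan=timedelta(days=7))

if response.status == LogsQueryStatus.SUCCESS:
    for table in response.tables:
        for row in table.rows:
            print(dict(zip(table.columns, row)))
```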
In bigger projects, teams often use internal copilots or agents to summarise what they find and create a clear refactoring backlog. Microsoft Copilot Studio can help by connecting to documentation and producing consistent assessments. This doesn't replace engineers, but it speeds up the analysis.
2) Break the monolith into deployable components (modularisation)
Refactoring begins when we carve the system into smaller parts: API layer, business services, data access, and background jobs. This is not only for engineering elegance; it enables independent scaling for AI inference or data ingestion.
In the Microsoft ecosystem, the target runtime is often Azure Kubernetes Service (AKS) or Azure Container Apps. For modern API exposure, Azure API Management adds governance, throttling, versioning, and security in front of services. Microsoft's Well-Architected guidance likewise recommends containerisation and separation of concerns to improve scaling and reliability.
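As an illustration of the boundary, here is a small Python sketch (FastAPI is my choice for the example, not something the article prescribes) of an API layer that only validates and routes, delegating to a separate business-service function that can later be swapped for a model endpoint without touching the API layer.

```python
# pip install fastapi uvicorn
from fastapi import FastAPI
from pydantic import BaseModel


class ScoreRequest(BaseModel):
    text: str


class ScoreResponse(BaseModel):
    score: float


def score_text(text: str) -> float:
    """Business-service layer: today a heuristic, later an AI model call.

    Keeping this behind its own interface means the API layer does not
    change when the implementation moves to a model served elsewhere.
    """
    return min(len(text) / 100.0, 1.0)  # stand-in logic


app = FastAPI(title="scoring-service")


@app.post("/v1/score", response_model=ScoreResponse)
def score(req: ScoreRequest) -> ScoreResponse:
    # API layer: validation and routing only; no business logic here.
    return ScoreResponse(score=score_text(req.text))
```

Placed behind Azure API Management, an endpoint like this picks up throttling, versioning, and security policies without code changes.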
3) Externalise state (make compute stateless)
AI workloads scale horizontally. If sessions or files are stuck in one VM, scaling becomes expensive and fragile. The goal is to move the state into managed services.
For Azure refactoring, this usually means Azure Cache for Redis for session state and caching, and Azure Cosmos DB, Azure SQL Database, or SQL Managed Instance, depending on the workload. Files move into Azure Blob Storage. Once compute is stateless, AKS or Container Apps can scale quickly, and AI endpoints can burst when demand spikes.
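Here is a minimal sketch of what externalised state looks like in code, using the Python redis and azure-storage-blob SDKs; the hostnames, keys, and container names are placeholders.

```python
# pip install redis azure-storage-blob azure-identity
import json

import redis
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Session state lives in Azure Cache for Redis, not on the VM or pod.
# Hostname and access key are placeholders.
r = redis.Redis(
    host="<cache-name>.redis.cache.windows.net",
    port=6380,
    ssl=True,
    password="<access-key>",
)
r.setex("session:42", 3600, json.dumps({"user": "alice", "cart": ["sku-1"]}))

# Files live in Blob Storage, so any replica can read or write them.
blob_service = BlobServiceClient(
    account_url="https://<account>.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)
container = blob_service.get_container_client("uploads")
container.upload_blob("report.pdf", b"...bytes...", overwrite=True)
```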
4) Make changes safe and repeatable (CI/CD + IaC)
Refactoring is risky if releases are manual. AI-native platforms require continuous evolution, so Phase 1 must include automation.
Here, Microsoft tooling includes GitHub Actions or Azure DevOps Pipelines for CI/CD, and Bicep or Terraform for infrastructure-as-code. This gives you consistent environments (DEV/TEST/PROD) and repeatable deployments, and it underpins the Build–Run–Evolve model.
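The tools named above are Bicep or Terraform; purely to keep the examples in one language, the sketch below expresses the same idea with Pulumi's Python SDK (a deliberate substitution, not the approach the article recommends). Either way, the point is identical: environments are declared in versioned code, not clicked together in the portal.

```python
# pip install pulumi pulumi-azure-native
import pulumi
from pulumi_azure_native import resources, storage

# One definition, stamped out per environment (dev/test/prod) via config.
env = pulumi.Config().get("env") or "dev"

rg = resources.ResourceGroup(f"app-{env}-rg")

sa = storage.StorageAccount(
    f"app{env}st",
    resource_group_name=rg.name,
    sku=storage.SkuArgs(name=storage.SkuName.STANDARD_LRS),
    kind=storage.Kind.STORAGE_V2,
)

pulumi.export("storage_account", sa.name)
```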
5) Add observability by design (monitoring and tracing)
AI brings new failure modes: slow inference, retrieval delays, and dependency bottlenecks. Refactoring isn't finished if you can't see what's going on.
Set up Azure Monitor, Application Insights, and Log Analytics to track key metrics like latency, error rates, and saturation. For AKS, add Container Insights. This follows cloud architecture best practices by making monitoring and resilience core parts of your system, especially as you add AI features later.
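As a sketch of observability by design, the azure-monitor-opentelemetry package wires a Python service into Application Insights, and explicit spans make the AI-specific steps visible rather than just total latency. The connection string is a placeholder, and handle_request is a hypothetical handler.

```python
# pip install azure-monitor-opentelemetry
from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace

# Send traces, metrics, and logs to Application Insights. The connection
# string would normally come from configuration, not code.
configure_azure_monitor(
    connection_string="InstrumentationKey=<key>;IngestionEndpoint=<endpoint>",
)

tracer = trace.get_tracer(__name__)


def handle_request(prompt: str) -> str:
    # Separate spans around retrieval and inference let slow inference and
    # retrieval delays show up as distinct operations in App Insights.
    with tracer.start_as_current_span("retrieve_context"):
        context = "..."  # e.g. a vector-store lookup
    with tracer.start_as_current_span("model_inference") as span:
        span.set_attribute("prompt.chars", len(prompt))
        return f"answer based on {context}"
```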
6) Prepare for AI integration (clean boundaries and model-ready interfaces)
Even in Phase 1, we should design for future AI. That means clean APIs, event-driven interfaces, and the ability to plug in model endpoints later without rewriting everything.
Microsoft tools that help with this include Event Grid for event-driven integration, Service Bus for reliable messaging, and Azure Functions for lightweight orchestration. These prepare your system for future GenAI patterns, like RAG and agents, where workflows use tools and services in a controlled way.
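For example, here is a minimal sketch of publishing a domain event to Service Bus with the Python SDK; the connection string, queue name, and event shape are placeholders. Publishing an event instead of calling a consumer directly keeps the boundary clean: tomorrow the subscriber can be a RAG indexing pipeline or an agent tool, and this producer does not change.

```python
# pip install azure-servicebus
import json

from azure.servicebus import ServiceBusClient, ServiceBusMessage

# Placeholder connection string and queue name.
CONN_STR = "<service-bus-connection-string>"
QUEUE = "documents-to-index"

# A small domain event; the subscriber decides what to do with it.
event = {"doc_id": "1234", "action": "reindex"}

with ServiceBusClient.from_connection_string(CONN_STR) as client:
    with client.get_queue_sender(QUEUE) as sender:
        sender.send_messages(ServiceBusMessage(json.dumps(event)))
```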
How do agents fit in? At this stage, you can start using 'engineering agents.' Copilot-based assistants can help generate scaffolding, CI/CD templates, and infrastructure modules, or keep documentation up to date automatically. They don't replace refactoring, but they make the process smoother and faster.
Conclusion
Cloud-native refactoring turns cloud-hosted technical debt into a platform ready for AI. Without this step, later phases like data modernisation and AI integration become more difficult, slower, and more expensive.
