
Private doesn't mean invisible - What enterprise AI chats really mean

Recent posts

AI Native cloud reference architecture on Microsoft Azure

After 17 years working with cloud technology, I’ve seen a clear pattern. AI projects rarely fail because the model is weak. More often, the problem is that the platform was built for traditional applications, not for AI. GenAI and agents add extra demands on the architecture. AI also brings unpredictable traffic and new security and governance challenges. Here’s a reference architecture I use when designing AI-native platforms on Microsoft Azure. It’s not a strict blueprint, but a practical structure to keep teams aligned and prevent surprises as the solution grows.

User and API entry layer

Start with a clear entry point. Focus on predictable performance, strong security, and access control. On Azure, many teams use Azure Front Door or Application Gateway for incoming traffic, then add Azure API Management to manage API exposure, throttling, authentication, and versioning. A common mistake is exposing AI endpoints directly to the internet. It might seem quick for a proof of concept, bu...
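To make the "never expose AI endpoints directly" point concrete, here is a minimal sketch of the gateway idea in plain Python. All names (`VALID_API_KEYS`, `gateway_handle`, the stubbed backend) are invented for illustration; in a real deployment this role belongs to Azure Front Door and API Management, not hand-rolled code.

```python
from dataclasses import dataclass

# Hypothetical keys that API Management would normally manage for us.
VALID_API_KEYS = {"team-a-key", "team-b-key"}

@dataclass
class Response:
    status: int
    body: str

def call_model_backend(prompt: str) -> str:
    # Stand-in for the real inference service that sits *behind* the gateway.
    return f"model-answer-for:{prompt}"

def gateway_handle(headers: dict, prompt: str) -> Response:
    """Authenticate before a request ever reaches the model endpoint."""
    key = headers.get("x-api-key")
    if key not in VALID_API_KEYS:
        # Reject at the edge: the model backend is never touched.
        return Response(401, "missing or invalid API key")
    return Response(200, call_model_backend(prompt))
```

The point of the sketch is placement: authentication, throttling, and versioning belong in the entry layer, so the model endpoint itself is never reachable from the internet.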

Azure Governance that scales: guardrails for fast and safe delivery

For large organizations, Azure success depends on solid governance, clear requirements, planned initiatives, and business priorities. Start with a clear hierarchy to apply rules consistently across the organization, not just to individual projects.

First, I set up core elements: management groups, subscriptions, resource groups, and then resources. This structure is practical and important for scaling access and compliance controls. Management groups matter if you have multiple subscriptions and want a uniform baseline. I keep them shallow, three to four levels, since more are hard to manage. Azure allows up to six (excluding the tenant root and subscription level). Assignments at higher levels cascade down, so hierarchy matters.

I use subscriptions as boundaries for billing and scaling. Splitting development, testing, and production into separate subscriptions isolates costs and risks. A dedicated subscription for shared network services, such as ExpressRoute or Virtual WAN, simp...
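The cascading behaviour of assignments is the key property of the hierarchy, and it can be sketched in a few lines of Python. The scope names and policy names below are invented for illustration; this only models the inheritance rule, not Azure Policy itself.

```python
from dataclasses import dataclass, field

@dataclass
class Scope:
    """A node in the management-group hierarchy (group, subscription, ...)."""
    name: str
    policies: set = field(default_factory=set)
    children: list = field(default_factory=list)

def effective_policies(scope: Scope, inherited=frozenset()) -> dict:
    """Return {scope_name: effective policy set}, including inherited ones."""
    combined = set(inherited) | scope.policies
    result = {scope.name: combined}
    for child in scope.children:
        result.update(effective_policies(child, combined))
    return result

# Hypothetical hierarchy: root group -> platform group -> prod subscription.
root = Scope("contoso-root", {"deny-public-ip"})
platform = Scope("platform", {"require-tags"})
prod = Scope("prod-sub")
platform.children.append(prod)
root.children.append(platform)
```

Running `effective_policies(root)` shows why assignment level matters: `prod-sub` ends up with both `deny-public-ip` and `require-tags` without either being assigned to it directly.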

What a company needs to be able to deliver Cloud AI Native solutions

Cloud AI-Native delivery means turning AI from a basic demonstration into a scalable platform. This requires modern cloud infrastructure, up-to-date and well-organized data, engineering practices suitable for operating AI at scale, and processes to ensure AI is used safely and responsibly. So, what does a company actually need to do to make this work?

Build platforms, not just projects

A company must design and build reusable foundations. Reliable frameworks have to support products and teams, rather than creating isolated projects. This means the company must be able to create reference architectures, standard templates, clear approaches, and clear processes for how teams work. Security, cost control, and operational monitoring must be built into the platform design from the start, not added later.

Modernise applications, not just move them

A company must migrate from lift-and-shift systems to cloud-native ones. This calls for skills in refactoring, containerisation, breaking mon...

Phase 5 of Intelligent Cloud Modernisation: Build-run-evolve of AI-Native solutions

By Phase 5, most organisations have working systems. Applications are refactored, data modernised, AI integrated, and governance established. It is tempting to think the journey is over. But AI-Native platforms are not classic IT. You don’t deploy and forget. Models drift, prompts evolve, embeddings go stale, costs shift, and user expectations change quickly. This is why Phase 5 is a continual Build–Run–Evolve cycle.

In the image I use for this phase, the cycle is simple: Build → Run → Evolve. Behind this simplicity lies a serious message: AI requires automation and operational discipline on par with software engineering.

Build: Focus on making delivery repeatable, not dependent on individual effort. In AI projects, ‘heroic delivery’ is common: one team member deploys the model, another fixes the pipeline, and a few keep the platform alive. This does not scale. Build means we standardise how we build and release everything: infrastructure, applications, data pipelines, prompts, models, policie...

Phase 4 of Intelligent Cloud Modernisation: AI brings more risks (Governance for AI)

During the first three phases of the AI-Native journey, we focus on major engineering tasks. We refactor applications to be cloud-native, update data for RAG and real-time context, and make AI workloads a core part of the platform. At this point, many organizations are excited because the technology is working. Demos are impressive, agents respond, and models help users. However, a new challenge appears: intelligence also brings risk. Without proper governance, the same AI that creates value can also cause harm. This leads to Phase 4, where governance becomes as important as architecture and data. It is not just a compliance task, but a practical way to enable safe AI scaling.

1. Data security and access control become more critical than ever

GenAI systems reveal information in ways that traditional applications never could. While a report or dashboard only shows what it is meant to, a GenAI assistant can combine data and produce unexpected answers. If access controls are weak, se...
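A common mitigation for the access-control risk is to filter retrieved documents by the caller's entitlements *before* they reach the model context. Here is a minimal sketch; the document set, ACL groups, and function names are invented for illustration.

```python
# Hypothetical corpus: each document carries an access-control list
# of groups allowed to see it.
DOCS = [
    {"id": "salary-2024", "acl": {"hr"}, "text": "..."},
    {"id": "handbook",    "acl": {"hr", "eng", "sales"}, "text": "..."},
]

def retrieve_for_user(user_groups, docs=DOCS):
    """Return only documents whose ACL overlaps the caller's groups.

    The filter runs at retrieval time, so restricted content can never
    be blended into an answer for an unauthorised user.
    """
    return [d for d in docs if d["acl"] & set(user_groups)]
```

The essential property is that the assistant never sees what the user is not allowed to see, so it cannot "combine data and produce unexpected answers" from restricted sources.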

Phase 3 of Intelligent Cloud Modernisation: Native integration of AI workloads

In Phase 1, we refactor applications to become cloud-native. In Phase 2, we modernise data so GenAI can work with trusted, fresh, and governed information. But even after these two phases, many organisations still treat AI like a side experiment. A small team builds a model, deploys it somewhere, and the rest of the platform stays the same. This is where real adoption stalls.

Phase 3 is the moment when AI stops being a PoC and becomes a native workload inside the platform. It means training, inference, embeddings, retrieval, and orchestration are treated as first-class production components, with the same standards you apply to any critical service: scalability, security, observability, and lifecycle management.

1. Make inference a standard platform capability

Most GenAI systems fail in production not because the model is weak, but because inference is not engineered as a real service. Inference presents latency, burst traffic, and cost challenges. Deploying it as a VM...
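To make the burst-traffic point tangible, here is one classic load-shedding building block for an inference front end: a token bucket. The class and parameters are illustrative, not a prescribed implementation; in practice this lives in the gateway or service mesh.

```python
class TokenBucket:
    """Toy token-bucket limiter: absorb short bursts, shed sustained overload."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = 0.0

    def allow(self, now: float) -> bool:
        """Admit one request if a token is available at time `now` (seconds)."""
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last) * self.refill_per_sec,
        )
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Rejecting excess requests at admission keeps latency bounded for the requests you do serve, which matters far more for GPU-backed inference than for a stateless web tier.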