
Cloud Modernization for AI: Data and Workflows (Pill 2 of 5 / Cloud Pills)

AI is everywhere! But for an AI system to work, run, and actually assist us, we have to provide it with fuel: data. To build a working AI system, you must first ensure that your data workflows run smoothly and scale with the high demand of the AI components. The challenge is not just providing access to all your data in a cheap data store. The real challenge is to modernise and optimise the entire lifecycle of your data, from ingestion to storage and processing, so that the system performs well and bottlenecks are kept under control. Vendors like Microsoft Azure provide a set of tools, cloud services, and data infrastructure designed to complement AI solutions.


One of the first challenges that needs to be addressed is data storage. AI systems require access to large amounts of data in different formats, both structured and unstructured. A storage strategy that can handle the payload generated by AI applications is required to avoid performance issues and bottlenecks. The lack of a unified access layer leads to data inconsistency and latency, a common challenge for companies whose datasets sit in multiple locations but must all be exposed to the AI system. For this purpose, a solution like Microsoft Fabric can provide scalable storage for large datasets and a unified data governance layer under the same roof.
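As a minimal sketch, landing a raw file in a Fabric lakehouse could look like the following, assuming access through OneLake's ADLS Gen2-compatible endpoint; the workspace, lakehouse, and file names are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# OneLake exposes an ADLS Gen2-compatible endpoint, so the standard Data Lake
# SDK can be used. Workspace, lakehouse, and file names below are placeholders.
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)

file_system = service.get_file_system_client("MyWorkspace")  # Fabric workspace
file_client = file_system.get_file_client(
    "MyLakehouse.Lakehouse/Files/raw/customers.parquet"
)

# Land a raw dataset in the lakehouse so every downstream consumer reads
# from the same governed location.
with open("customers.parquet", "rb") as data:
    file_client.upload_data(data, overwrite=True)
```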

Once storage is in place, the next challenge is data ingestion and transformation (ETL/ELT). Inside most organisations, data is fragmented across disparate sources, while AI models rely on clean, high-quality data. Services like Azure Data Factory, or Data Factory in Microsoft Fabric, can move, transform, and aggregate all the required data into one common repository, integrating with storage solutions such as Azure SQL Database, Azure Cosmos DB, and external SaaS applications. By setting up pipelines that cleanse, standardise, and aggregate information in near real time, organisations can ensure their AI models are trained on consistent and reliable data.
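As a rough sketch, a copy pipeline could be defined with the Azure Data Factory SDK for Python along these lines; the subscription, resource group, factory, and dataset names are placeholders, and the referenced datasets are assumed to already exist in the factory:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference, BlobSource, BlobSink,
)

# Placeholder subscription; the factory and datasets are assumed to exist.
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<SUBSCRIPTION_ID>")

# Copy raw data into a curated location where it can be cleansed and aggregated.
copy_raw_to_curated = CopyActivity(
    name="CopyRawToCurated",
    inputs=[DatasetReference(type="DatasetReference", reference_name="RawBlobDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="CuratedBlobDataset")],
    source=BlobSource(),
    sink=BlobSink(),
)

pipeline = PipelineResource(activities=[copy_raw_to_curated])
adf_client.pipelines.create_or_update("my-rg", "my-data-factory", "IngestRawData", pipeline)

# Trigger a one-off run; in practice this would be scheduled or event-driven.
run = adf_client.pipelines.create_run("my-rg", "my-data-factory", "IngestRawData", parameters={})
print(run.run_id)
```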

Another critical aspect of optimising data workflows is the processing layer, where AI models extract insights from vast datasets. Traditional data warehouses often struggle with the high volume and high velocity of AI workloads. Microsoft Fabric, a unified analytics platform, helps overcome these challenges by providing an integrated environment where data engineers and AI developers can collaborate efficiently. By combining the power of Synapse Data Warehouse with Spark-based big data analytics, organisations can run AI-driven queries at scale without performance degradation.
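For illustration, the kind of Spark-based aggregation such a processing layer would run might look like this PySpark sketch, assuming it runs in a Fabric notebook where a `SparkSession` named `spark` is already provided; table and column names are illustrative:

```python
from pyspark.sql import functions as F

# Read a Delta table from the lakehouse (name is a placeholder).
events = spark.read.table("lakehouse_events")

# Aggregate raw events into per-customer daily features for model training.
daily_features = (
    events
    .where(F.col("event_date") >= "2025-01-01")
    .groupBy("customer_id", "event_date")
    .agg(
        F.count("*").alias("event_count"),
        F.sum("amount").alias("total_amount"),
    )
)

# Persist the aggregated features so AI workloads query a precomputed table.
daily_features.write.mode("overwrite").saveAsTable("customer_daily_features")
```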

Leveraging real-time data streaming and event-driven architectures further enhances the efficiency of AI workflows. Many AI applications, such as fraud detection and predictive maintenance, require continuous data ingestion and real-time inference. Azure Stream Analytics and Azure Event Hubs allow businesses to process streaming data from IoT devices, transactional systems, and web applications with minimal latency. This capability ensures that AI models always work with the latest data, enabling faster and more accurate decision-making.
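As a small example, publishing a telemetry event to Azure Event Hubs with the Python SDK could look like the sketch below; the connection string, hub name, and payload are placeholders:

```python
import json
from azure.eventhub import EventHubProducerClient, EventData

# Placeholder connection string and hub name for illustration.
producer = EventHubProducerClient.from_connection_string(
    conn_str="<EVENT_HUBS_CONNECTION_STRING>",
    eventhub_name="telemetry",
)

with producer:
    batch = producer.create_batch()
    batch.add(EventData(json.dumps({"device_id": "pump-42", "vibration": 0.87})))
    # Once sent, the event is available to downstream consumers such as
    # Azure Stream Analytics for near real-time inference.
    producer.send_batch(batch)
```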

Finally, organisations must optimise AI workloads for performance and cost efficiency. Running large-scale AI workloads on Azure requires balancing on-demand resources with reserved capacity to avoid overspending. Azure Cost Management helps track and optimise expenses by analysing resource utilisation patterns and recommending cost-saving measures such as Spot VMs for non-critical workloads.
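A hedged sketch of pulling daily month-to-date costs with the Azure Cost Management SDK for Python, as a starting point for the utilisation reviews mentioned above; the subscription scope is a placeholder and could equally be a resource group or management group:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.costmanagement import CostManagementClient
from azure.mgmt.costmanagement.models import (
    QueryDefinition, QueryDataset, QueryAggregation,
)

# Placeholder scope; Cost Management queries are scope-based, not subscription-bound.
scope = "/subscriptions/<SUBSCRIPTION_ID>"
client = CostManagementClient(DefaultAzureCredential())

query = QueryDefinition(
    type="ActualCost",
    timeframe="MonthToDate",
    dataset=QueryDataset(
        granularity="Daily",
        aggregation={"totalCost": QueryAggregation(name="Cost", function="Sum")},
    ),
)

result = client.query.usage(scope=scope, parameters=query)
for row in result.rows:
    print(row)  # daily cost figures to feed into utilisation and spend reviews
```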


Optimising data workflows for AI is not just about improving storage and processing speeds. It requires a holistic approach that integrates storage efficiency, seamless data pipelines, scalable processing, and cost optimisation. By leveraging Microsoft Azure’s AI and data services, businesses can create a robust, AI-ready infrastructure that accelerates innovation while ensuring operational sustainability.
