
Cloud Modernization for AI: Data and Workflows (Pill 2 of 5 / Cloud Pills)

AI is everywhere! But for an AI system to work, run, and actually assist us, we need to provide its fuel: data. Before you can rely on an AI system, you must first ensure that your data workflows run smoothly and can scale to the demand the AI components place on them. The challenge is not just providing access to all your data in a cheap data store. The real challenge is to modernise and optimise the entire lifecycle of your data, from ingestion to storage and processing, so that you end up with a system that performs well and keeps bottlenecks under control. Vendors like Microsoft Azure provide a set of tools, cloud services, and data infrastructure designed to complement AI solutions.


One of the first challenges that needs to be addressed is data storage. AI systems require access to large amounts of data in different formats (structured and unstructured). A storage strategy that can handle the payload generated by AI applications is required to avoid performance issues and bottlenecks, and the lack of a unified access layer can lead to data inconsistency and latency. This is a common challenge for companies whose datasets live in multiple locations but must all be exposed to the AI system. For this purpose, a solution like Microsoft Fabric can be used to consolidate large datasets into a single, scalable storage layer with unified data governance under the same roof.
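
To make this more concrete, here is a minimal Python sketch of what a unified access layer can look like in practice: reading a file from a Fabric lakehouse through OneLake's ADLS Gen2-compatible endpoint. The workspace, lakehouse, and file names are placeholders, and the snippet assumes the azure-identity and azure-storage-file-datalake packages are available.

# Minimal sketch: read a file from a Fabric lakehouse via OneLake's
# ADLS Gen2-compatible endpoint. All names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# OneLake exposes the same API surface as Azure Data Lake Storage Gen2.
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)

# The file system maps to the Fabric workspace; the path points inside a lakehouse.
workspace = service.get_file_system_client("my-workspace")                        # placeholder
file_client = workspace.get_file_client("Sales.Lakehouse/Files/raw/orders.csv")   # placeholder

raw_bytes = file_client.download_file().readall()
print(f"Downloaded {len(raw_bytes)} bytes")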

Once the data is stored, the next challenge is data ingestion and transformation (ETL/ELT). Inside most organisations the data is fragmented across disparate sources, while AI models rely on clean, high-quality, consistent data. Services such as Azure Data Factory, or Data Factory within Microsoft Fabric, enable organisations to automate data movement and transformation, integrating with storage solutions such as Azure SQL Database, Azure Cosmos DB, and external SaaS applications. By setting up data pipelines that cleanse, standardise, and aggregate information in near real time, organisations can ensure their AI models are trained on consistent and reliable data.
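
As an illustration, the sketch below defines and triggers a simple copy pipeline with the azure-mgmt-datafactory management SDK. It assumes the Data Factory and the two datasets ("RawOrders" and "CuratedOrders") already exist; subscription, resource group, and factory names are placeholders, not a definitive implementation.

# Minimal sketch: define and run a copy pipeline in Azure Data Factory.
# Assumes the factory and both datasets already exist; names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference, BlobSource, BlobSink,
)

subscription_id = "<subscription-id>"   # placeholder
resource_group = "rg-data-platform"     # placeholder
factory_name = "adf-ai-ingestion"       # placeholder

adf = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# A single copy activity that moves raw data into the curated zone.
copy = CopyActivity(
    name="CopyRawToCurated",
    inputs=[DatasetReference(type="DatasetReference", reference_name="RawOrders")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="CuratedOrders")],
    source=BlobSource(),
    sink=BlobSink(),
)

pipeline = PipelineResource(activities=[copy])
adf.pipelines.create_or_update(resource_group, factory_name, "IngestOrders", pipeline)

# Trigger an on-demand run of the pipeline.
run = adf.pipelines.create_run(resource_group, factory_name, "IngestOrders")
print(f"Started pipeline run {run.run_id}")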

Another critical aspect of optimising data workflows is the processing layer, where AI models extract insights from vast datasets. Traditional data warehouses often struggle to handle the high-volume, high-velocity nature of AI workloads. Microsoft Fabric, a unified analytics platform, helps overcome these challenges by providing an integrated environment where data engineers and AI developers can collaborate efficiently. By combining the power of Synapse Data Warehouse with Spark-based big data analytics, organisations can run AI-driven queries at scale without performance degradation.
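
The sketch below shows the kind of Spark-based aggregation that runs over lakehouse tables in this processing layer. Table and column names are placeholders; inside a Fabric or Synapse notebook the spark session is already provided, so building one is only needed when running the code elsewhere.

# Minimal sketch: a Spark aggregation over a lakehouse table, turning
# high-volume transactional data into per-customer features for AI models.
# Table and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ai-feature-aggregation").getOrCreate()

# Read a Delta table from the lakehouse (placeholder name).
orders = spark.read.table("sales_lakehouse.orders")

# Aggregate transactions into per-customer features for model training.
features = (
    orders
    .groupBy("customer_id")
    .agg(
        F.count("*").alias("order_count"),
        F.sum("amount").alias("total_spent"),
        F.max("order_date").alias("last_order_date"),
    )
)

features.write.mode("overwrite").saveAsTable("sales_lakehouse.customer_features")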

Leveraging real-time data streaming and event-driven architectures further enhances the efficiency of AI workflows. Many AI applications, such as fraud detection and predictive maintenance, require continuous data ingestion and real-time inference. Azure Stream Analytics and Azure Event Hubs allow businesses to process streaming data from IoT devices, transactional systems, and web applications with minimal latency. This capability ensures that AI models always work with the latest data, enabling faster and more accurate decision-making.
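
For the streaming side, here is a minimal sketch of publishing telemetry events to Azure Event Hubs so that a downstream consumer (for example, a Stream Analytics job) can process them in near real time. The connection string, hub name, and payload are placeholders, and the snippet assumes the azure-eventhub package.

# Minimal sketch: publish a telemetry event to Azure Event Hubs.
# Connection string, hub name, and payload are placeholders.
import json
from azure.eventhub import EventHubProducerClient, EventData

producer = EventHubProducerClient.from_connection_string(
    conn_str="<event-hub-namespace-connection-string>",  # placeholder
    eventhub_name="device-telemetry",                     # placeholder
)

reading = {"device_id": "sensor-042", "temperature": 71.3, "vibration": 0.02}

with producer:
    # Batch events and send them to the hub for real-time consumers.
    batch = producer.create_batch()
    batch.add(EventData(json.dumps(reading)))
    producer.send_batch(batch)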

Finally, organisations must optimise AI workloads for performance and cost efficiency. Running large-scale AI workloads on Azure requires balancing on-demand resources with reserved capacity to avoid overspending. Azure Cost Management helps track and optimise expenses by analysing resource utilisation patterns and recommending cost-saving measures such as Spot VMs for non-critical workloads.
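
As a rough illustration of how such utilisation analysis can be automated, the sketch below queries month-to-date actual cost per service through the azure-mgmt-costmanagement SDK; this is the kind of signal that can justify moving non-critical workloads to Spot VMs. The subscription ID is a placeholder, and the field names follow the Cost Management query API as I understand it, so treat this as an assumption-laden sketch rather than a reference implementation.

# Minimal sketch: month-to-date actual cost grouped by service name.
# Subscription ID is a placeholder; assumes azure-mgmt-costmanagement.
from azure.identity import DefaultAzureCredential
from azure.mgmt.costmanagement import CostManagementClient
from azure.mgmt.costmanagement.models import (
    QueryDefinition, QueryDataset, QueryAggregation, QueryGrouping,
)

scope = "/subscriptions/<subscription-id>"   # placeholder

client = CostManagementClient(DefaultAzureCredential())

query = QueryDefinition(
    type="ActualCost",
    timeframe="MonthToDate",
    dataset=QueryDataset(
        aggregation={"totalCost": QueryAggregation(name="Cost", function="Sum")},
        grouping=[QueryGrouping(type="Dimension", name="ServiceName")],
    ),
)

result = client.query.usage(scope=scope, parameters=query)
for row in result.rows:
    print(row)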


Optimising data workflows for AI is not just about improving storage and processing speeds. It requires a holistic approach integrating storage efficiency, seamless data pipelines, scalable processing, and cost optimisation. By leveraging Microsoft Azure’s AI and data services, businesses can create a robust, AI-ready infrastructure that accelerates innovation while ensuring operational sustainability.
