Skip to main content

Snowflake VS Azure Synapse | 7 reasons why you should choose Snowflake OR Synapse on Azure

Microsoft supports two large data warehouse and data analytics solutions inside (1) Azure - Azure Synapse + Data Lake and (2) Snowflake. Both of them are two mature products that are similar from many points of view. 

More than one time, I was in a situation when customers were asking: 'What approach should I have? Should I go with an Azure Synapse centric approach or Snowflake on top of Azure?" 

There are a lot of factors that can influence the decision like business, team skills, data strategy, compliance, data model or expected features. The next section of the article covers the differences between Snowflake on Azure and Azure Synapse centric from different dimensions. Based on the public features publicly available on the market at the end of 2021.

Both solutions provide:

  • A separate compute and storage pricing
  • Compliant with ANSI-SQL​
  • Semi-structure and structure data sources​
  • Data virtualization support
  • Native support for Pause/Resume and Scale of the compute​
LAST EDIT: After I finished the article, I realized that it is long, and most of you would like to see the conclusion. Let's start with the ending, and if you want to drill down, you can read the rest of the article (smile).

Final Thoughts
Both approaches offer similar experience from the data warehouse capabilities. There are small flavours of each of them, like Secondary Index available only on Snowflake or Stored Procedures supported only on Azure Synapse. This is not a stopper in most cases because there are workarounds for them. 

Below you can find a direct comparison between the two approaches. 
Scale: 1 (lowest) -5 (highest)

Snowflake is the big winner if you look for a solution provided as SaaS, where cost economics can be kept simple, with reduced effort on the ops and management side.

The most significant impact is at Governance, Security, Automation, Scalability and Integration, where Azure Synapse Centric solution is 3 steps ahead. You can do it also using Snowflake, but the Azure Synapse approach provides native support for data governance, key vaults, WAF, integration with native Azure Monitor, policy management and Azure DevOps and GitHub. Lack of integration with a CI/CD and Source Control system is a big minus for large systems.

Dimensions

The security dimension is covered in detail in the second part of the article. I dedicated an entire section because of the complexity and the high number of different perspectives.

Indexing

  • Snowflake on Azure: No support for manual indexing, driven by 'perform by default paradigm. ​
  • Azure Synapse Centric: Typical indexing experience and support for automatically indexes data. Partitioning data on 'disk' empowers the MPP backend.​

Integration

  • Snowflake on Azure: Well integrated with Azure ecosystem​
  • Azure Synapse Centric: Strong integration with Azure ​

Data Sharing

  • Snowflake on Azure: Build-in inside Snowflake​
  • Azure Synapse Centric: Provided through Azure Data Sharing capabilities​

Queries

  • Snowflake on Azure: Cross-database queries.SnowPipesprovide roughly the same functionality as Synapse Pipelines.​
  • Azure Synapse Centric: Cross-database queries in some instances (e.g., Serverlessinstances). Synapse Pipelines allow for trigger-based file loads.​

Cost

  • Snowflake on Azure: Compute by the 'credit', align with product tier​
  • Azure Synapse Centric: Per hour at various DWU levels​

Scale

  • Snowflake on Azure: 't-shirt' sizes which correspond to the quantity ofVMutilizedfor compute​
  • Azure Synapse Centric: Data Warehouse Unit (DWU) for compute​

Data Governance

  • Snowflake on Azure: No direct support. Only using 3rdpartiestools​
  • Azure Synapse Centric: Build-in support of data governance​

Security Dimension

There are 11 security criteria that I took into account. Covering security at different layers and tools and mechanisms to do the governance and monitoring. Azure Synapse centric solution is well integrated with the Microsoft Azure ecosystem and provides better E2E security. At the data layer, both offer mature security and governance features. 

Data Security

  • Snowflake on Azure: Encryption at REST, hierarchical key model, automatic key rotation, yearly re-keying, tri-secret secure, time travel, fail-safe. Additional security data provided by Azure storage services used by Snowflake.
  • Azure Synapse Centric: Encryption at REST, complete secrets management using Azure Key Vault, time travel, fail-safe, data segregation, at-rest data protection, in-transit data protection, data redundancy

Application Security

  • Snowflake on Azure: Managed using RBAC and DAC at Snowflake level. IaaS centric at Azure services level.
  • Azure Synapse Centric: Managed using RBAC and DAC, supported by Azure PaaS services.

Endpoint Security

  • Snowflake on Azure: Provided by Azure Services used by Snowflake.
  • Azure Synapse Centric: Full Azure Security features include IAM, Private Endpoints and WAF (Application Firewall).

Access Control

  • Snowflake on Azure: Build-in support for RBAC and DAC. Integration with Azure AD as an external Identity Provider is available.
  • Azure Synapse Centric: Full support for RBAC (using Azure AD RBAC) and DAC.

Network Security

  • Snowflake on Azure: Provided by the native Azure Services behind Snowflake.
  • Azure Synapse Centric: Managed virtual network for all serviced used by Azure Synapse.

Perimeter Security

  • Snowflake on Azure: Provided by the native Azure Services behind Snowflake.
  • Azure Synapse Centric: Native integration with Azure perimeter features (e.g. Private Link, VNET, VPN).

Physical Security
  • Snowflake on Azure: Microsoft designs, build and operate data centers in a way that strictly controls physical access to the areas where your data is stored.
  • Azure Synapse Centric: Microsoft designs, build and operate data centres in a way that strictly controls physical access to the areas where your data is stored.
Operations Management
  • Snowflake on Azure: Limited capability. A combination between native Snowflake features and Azure IaaS services.
  • Azure Synapse Centric: Collect, analyze, scale, automate, backup and recovery capabilities for all Azure Services, governed by Azure WAF
Security Audit and Tracking
  • Snowflake on Azure: Security audits at IAM, application and Snowflake services level. Not fully integrated with Azure Services audit.
  • Azure Synapse Centric: Audit at all levels, integrated into a common repository and dashboard for tracking and monitoring capabilities. 
Monitoring and Recommendations
  • Snowflake on Azure: Limited capability without 3rd parties support. Not fully integrated with Azure IaaS services used by Snowflake.
  • Azure Synapse Centric: Consolidated monitoring experience in one storage and dashboard powered by powerful AI and ML services to improve security. 
Policy Management
  • Snowflake on Azure: Policy management capability at network, account, user and data level. No direct integration with policy capabilities provided by Azure for
  • Azure Synapse Centric: Policy management capabilities at data, application and infrastructure level, fully integrated with Azure Services. 
Core Features
An Azure Synapse centric solution supports data governance with Pureview and native integration with Azure Services for ELT, ETL and Spark. CI/CD pipelines and Source Control are supported by GitHub and Azure DevOps integration.
Snowflake has significant advantages when pause and resume are required, and secondary index or in-memory capability is essential. The most significant advantage of Snowflake is the SaaS offer, where there are no maintenance and operational cost of the platform itself. 




Pricing
The devil is in the details is true when we talk about pricing estimations. Nevertheless, the order of magnitude is the same. The running cost of a large solution running on top of Azure Synapse or Snowflake is similar. Running costs cannot be considered a significant differential factor.  

Computation
  • Snowflake on Azure: Per-second basic (credit-based)
  • Azure Synapse Centric: Per-hour basic​ (execution calculated per second)​
Storage
  • Snowflake on Azure: Per TB/month​
  • Azure Synapse Centric: Per GB/month​
Tiers
  • Snowflake on Azure: 4 Computation tiers​
  • Azure Synapse Centric: One tier is available.​ Indirect tiers/size available for integration services (e.g., SQL Pool, Spark Pool)​
Saving plans
  • Snowflake on Azure: No public information is available​
  • Azure Synapse Centric: Up to 28% using Synapse Commit Units​​
Overall cost
  • Snowflake on Azure: Cost per query per hour is higher but includes ETL capability. Additional cost is generated by Azure Services that are required to run the solution. ​
  • Azure Synapse Centric: Azure Synapse cost per query per hour is lower. The additional cost is generated by Azure Services that are required around Synapse.​

Comments

Popular posts from this blog

Windows Docker Containers can make WIN32 API calls, use COM and ASP.NET WebForms

After the last post , I received two interesting questions related to Docker and Windows. People were interested if we do Win32 API calls from a Docker container and if there is support for COM. WIN32 Support To test calls to WIN32 API, let’s try to populate SYSTEM_INFO class. [StructLayout(LayoutKind.Sequential)] public struct SYSTEM_INFO { public uint dwOemId; public uint dwPageSize; public uint lpMinimumApplicationAddress; public uint lpMaximumApplicationAddress; public uint dwActiveProcessorMask; public uint dwNumberOfProcessors; public uint dwProcessorType; public uint dwAllocationGranularity; public uint dwProcessorLevel; public uint dwProcessorRevision; } ... [DllImport("kernel32")] static extern void GetSystemInfo(ref SYSTEM_INFO pSI); ... SYSTEM_INFO pSI = new SYSTEM_INFO(...

ADO.NET provider with invariant name 'System.Data.SqlClient' could not be loaded

Today blog post will be started with the following error when running DB tests on the CI machine: threw exception: System.InvalidOperationException: The Entity Framework provider type 'System.Data.Entity.SqlServer.SqlProviderServices, EntityFramework.SqlServer' registered in the application config file for the ADO.NET provider with invariant name 'System.Data.SqlClient' could not be loaded. Make sure that the assembly-qualified name is used and that the assembly is available to the running application. See http://go.microsoft.com/fwlink/?LinkId=260882 for more information. at System.Data.Entity.Infrastructure.DependencyResolution.ProviderServicesFactory.GetInstance(String providerTypeName, String providerInvariantName) This error happened only on the Continuous Integration machine. On the devs machines, everything has fine. The classic problem – on my machine it’s working. The CI has the following configuration: TeamCity .NET 4.51 EF 6.0.2 VS2013 It see...

Navigating Cloud Strategy after Azure Central US Region Outage

 Looking back, July 19, 2024, was challenging for customers using Microsoft Azure or Windows machines. Two major outages affected customers using CrowdStrike Falcon or Microsoft Azure computation resources in the Central US. These two outages affected many people and put many businesses on pause for a few hours or even days. The overlap of these two issues was a nightmare for travellers. In addition to blue screens in the airport terminals, they could not get additional information from the airport website, airline personnel, or the support line because they were affected by the outage in the Central US region or the CrowdStrike outage.   But what happened in reality? A faulty CrowdStrike update affected Windows computers globally, from airports and healthcare to small businesses, affecting over 8.5m computers. Even if the Falson Sensor software defect was identified and a fix deployed shortly after, the recovery took longer. In parallel with CrowdStrike, Microsoft provi...