Skip to main content

Snowflake VS Azure Synapse | 7 reasons why you should choose Snowflake OR Synapse on Azure

Microsoft supports two large data warehouse and data analytics solutions inside (1) Azure - Azure Synapse + Data Lake and (2) Snowflake. Both of them are two mature products that are similar from many points of view. 

More than one time, I was in a situation when customers were asking: 'What approach should I have? Should I go with an Azure Synapse centric approach or Snowflake on top of Azure?" 

There are a lot of factors that can influence the decision like business, team skills, data strategy, compliance, data model or expected features. The next section of the article covers the differences between Snowflake on Azure and Azure Synapse centric from different dimensions. Based on the public features publicly available on the market at the end of 2021.

Both solutions provide:

  • A separate compute and storage pricing
  • Compliant with ANSI-SQL​
  • Semi-structure and structure data sources​
  • Data virtualization support
  • Native support for Pause/Resume and Scale of the compute​
LAST EDIT: After I finished the article, I realized that it is long, and most of you would like to see the conclusion. Let's start with the ending, and if you want to drill down, you can read the rest of the article (smile).

Final Thoughts
Both approaches offer similar experience from the data warehouse capabilities. There are small flavours of each of them, like Secondary Index available only on Snowflake or Stored Procedures supported only on Azure Synapse. This is not a stopper in most cases because there are workarounds for them. 

Below you can find a direct comparison between the two approaches. 
Scale: 1 (lowest) -5 (highest)

Snowflake is the big winner if you look for a solution provided as SaaS, where cost economics can be kept simple, with reduced effort on the ops and management side.

The most significant impact is at Governance, Security, Automation, Scalability and Integration, where Azure Synapse Centric solution is 3 steps ahead. You can do it also using Snowflake, but the Azure Synapse approach provides native support for data governance, key vaults, WAF, integration with native Azure Monitor, policy management and Azure DevOps and GitHub. Lack of integration with a CI/CD and Source Control system is a big minus for large systems.

Dimensions

The security dimension is covered in detail in the second part of the article. I dedicated an entire section because of the complexity and the high number of different perspectives.

Indexing

  • Snowflake on Azure: No support for manual indexing, driven by 'perform by default paradigm. ​
  • Azure Synapse Centric: Typical indexing experience and support for automatically indexes data. Partitioning data on 'disk' empowers the MPP backend.​

Integration

  • Snowflake on Azure: Well integrated with Azure ecosystem​
  • Azure Synapse Centric: Strong integration with Azure ​

Data Sharing

  • Snowflake on Azure: Build-in inside Snowflake​
  • Azure Synapse Centric: Provided through Azure Data Sharing capabilities​

Queries

  • Snowflake on Azure: Cross-database queries.SnowPipesprovide roughly the same functionality as Synapse Pipelines.​
  • Azure Synapse Centric: Cross-database queries in some instances (e.g., Serverlessinstances). Synapse Pipelines allow for trigger-based file loads.​

Cost

  • Snowflake on Azure: Compute by the 'credit', align with product tier​
  • Azure Synapse Centric: Per hour at various DWU levels​

Scale

  • Snowflake on Azure: 't-shirt' sizes which correspond to the quantity ofVMutilizedfor compute​
  • Azure Synapse Centric: Data Warehouse Unit (DWU) for compute​

Data Governance

  • Snowflake on Azure: No direct support. Only using 3rdpartiestools​
  • Azure Synapse Centric: Build-in support of data governance​

Security Dimension

There are 11 security criteria that I took into account. Covering security at different layers and tools and mechanisms to do the governance and monitoring. Azure Synapse centric solution is well integrated with the Microsoft Azure ecosystem and provides better E2E security. At the data layer, both offer mature security and governance features. 

Data Security

  • Snowflake on Azure: Encryption at REST, hierarchical key model, automatic key rotation, yearly re-keying, tri-secret secure, time travel, fail-safe. Additional security data provided by Azure storage services used by Snowflake.
  • Azure Synapse Centric: Encryption at REST, complete secrets management using Azure Key Vault, time travel, fail-safe, data segregation, at-rest data protection, in-transit data protection, data redundancy

Application Security

  • Snowflake on Azure: Managed using RBAC and DAC at Snowflake level. IaaS centric at Azure services level.
  • Azure Synapse Centric: Managed using RBAC and DAC, supported by Azure PaaS services.

Endpoint Security

  • Snowflake on Azure: Provided by Azure Services used by Snowflake.
  • Azure Synapse Centric: Full Azure Security features include IAM, Private Endpoints and WAF (Application Firewall).

Access Control

  • Snowflake on Azure: Build-in support for RBAC and DAC. Integration with Azure AD as an external Identity Provider is available.
  • Azure Synapse Centric: Full support for RBAC (using Azure AD RBAC) and DAC.

Network Security

  • Snowflake on Azure: Provided by the native Azure Services behind Snowflake.
  • Azure Synapse Centric: Managed virtual network for all serviced used by Azure Synapse.

Perimeter Security

  • Snowflake on Azure: Provided by the native Azure Services behind Snowflake.
  • Azure Synapse Centric: Native integration with Azure perimeter features (e.g. Private Link, VNET, VPN).

Physical Security
  • Snowflake on Azure: Microsoft designs, build and operate data centers in a way that strictly controls physical access to the areas where your data is stored.
  • Azure Synapse Centric: Microsoft designs, build and operate data centres in a way that strictly controls physical access to the areas where your data is stored.
Operations Management
  • Snowflake on Azure: Limited capability. A combination between native Snowflake features and Azure IaaS services.
  • Azure Synapse Centric: Collect, analyze, scale, automate, backup and recovery capabilities for all Azure Services, governed by Azure WAF
Security Audit and Tracking
  • Snowflake on Azure: Security audits at IAM, application and Snowflake services level. Not fully integrated with Azure Services audit.
  • Azure Synapse Centric: Audit at all levels, integrated into a common repository and dashboard for tracking and monitoring capabilities. 
Monitoring and Recommendations
  • Snowflake on Azure: Limited capability without 3rd parties support. Not fully integrated with Azure IaaS services used by Snowflake.
  • Azure Synapse Centric: Consolidated monitoring experience in one storage and dashboard powered by powerful AI and ML services to improve security. 
Policy Management
  • Snowflake on Azure: Policy management capability at network, account, user and data level. No direct integration with policy capabilities provided by Azure for
  • Azure Synapse Centric: Policy management capabilities at data, application and infrastructure level, fully integrated with Azure Services. 
Core Features
An Azure Synapse centric solution supports data governance with Pureview and native integration with Azure Services for ELT, ETL and Spark. CI/CD pipelines and Source Control are supported by GitHub and Azure DevOps integration.
Snowflake has significant advantages when pause and resume are required, and secondary index or in-memory capability is essential. The most significant advantage of Snowflake is the SaaS offer, where there are no maintenance and operational cost of the platform itself. 




Pricing
The devil is in the details is true when we talk about pricing estimations. Nevertheless, the order of magnitude is the same. The running cost of a large solution running on top of Azure Synapse or Snowflake is similar. Running costs cannot be considered a significant differential factor.  

Computation
  • Snowflake on Azure: Per-second basic (credit-based)
  • Azure Synapse Centric: Per-hour basic​ (execution calculated per second)​
Storage
  • Snowflake on Azure: Per TB/month​
  • Azure Synapse Centric: Per GB/month​
Tiers
  • Snowflake on Azure: 4 Computation tiers​
  • Azure Synapse Centric: One tier is available.​ Indirect tiers/size available for integration services (e.g., SQL Pool, Spark Pool)​
Saving plans
  • Snowflake on Azure: No public information is available​
  • Azure Synapse Centric: Up to 28% using Synapse Commit Units​​
Overall cost
  • Snowflake on Azure: Cost per query per hour is higher but includes ETL capability. Additional cost is generated by Azure Services that are required to run the solution. ​
  • Azure Synapse Centric: Azure Synapse cost per query per hour is lower. The additional cost is generated by Azure Services that are required around Synapse.​

Comments

Popular posts from this blog

Windows Docker Containers can make WIN32 API calls, use COM and ASP.NET WebForms

After the last post , I received two interesting questions related to Docker and Windows. People were interested if we do Win32 API calls from a Docker container and if there is support for COM. WIN32 Support To test calls to WIN32 API, let’s try to populate SYSTEM_INFO class. [StructLayout(LayoutKind.Sequential)] public struct SYSTEM_INFO { public uint dwOemId; public uint dwPageSize; public uint lpMinimumApplicationAddress; public uint lpMaximumApplicationAddress; public uint dwActiveProcessorMask; public uint dwNumberOfProcessors; public uint dwProcessorType; public uint dwAllocationGranularity; public uint dwProcessorLevel; public uint dwProcessorRevision; } ... [DllImport("kernel32")] static extern void GetSystemInfo(ref SYSTEM_INFO pSI); ... SYSTEM_INFO pSI = new SYSTEM_INFO(...

How to audit an Azure Cosmos DB

In this post, we will talk about how we can audit an Azure Cosmos DB database. Before jumping into the problem let us define the business requirement: As an Administrator I want to be able to audit all changes that were done to specific collection inside my Azure Cosmos DB. The requirement is simple, but can be a little tricky to implement fully. First of all when you are using Azure Cosmos DB or any other storage solution there are 99% odds that you’ll have more than one system that writes data to it. This means that you have or not have control on the systems that are doing any create/update/delete operations. Solution 1: Diagnostic Logs Cosmos DB allows us activate diagnostics logs and stream the output a storage account for achieving to other systems like Event Hub or Log Analytics. This would allow us to have information related to who, when, what, response code and how the access operation to our Cosmos DB was done. Beside this there is a field that specifies what was th...

Cloud Myths: Cloud is Cheaper (Pill 1 of 5 / Cloud Pills)

Cloud Myths: Cloud is Cheaper (Pill 1 of 5 / Cloud Pills) The idea that moving to the cloud reduces the costs is a common misconception. The cloud infrastructure provides flexibility, scalability, and better CAPEX, but it does not guarantee lower costs without proper optimisation and management of the cloud services and infrastructure. Idle and unused resources, overprovisioning, oversize databases, and unnecessary data transfer can increase running costs. The regional pricing mode, multi-cloud complexity, and cost variety add extra complexity to the cost function. Cloud adoption without a cost governance strategy can result in unexpected expenses. Improper usage, combined with a pay-as-you-go model, can result in a nightmare for business stakeholders who cannot track and manage the monthly costs. Cloud-native services such as AI services, managed databases, and analytics platforms are powerful, provide out-of-the-shelve capabilities, and increase business agility and innovation. H...