Microsoft Azure supports two large data warehouse and data analytics solutions: (1) Azure Synapse + Data Lake and (2) Snowflake. Both are mature products that are similar in many respects.
More than once, customers have asked me: "What approach should I take? Should I go with an Azure Synapse centric approach or Snowflake on top of Azure?"
Many factors can influence the decision, such as business needs, team skills, data strategy, compliance, data model, or expected features. The next section of the article compares Snowflake on Azure and an Azure Synapse centric approach across several dimensions, based on the features publicly available at the end of 2021.
Both solutions provide:
- Separate compute and storage pricing
- ANSI SQL compliance
- Support for semi-structured and structured data sources
- Data virtualization support
- Native support for pausing, resuming, and scaling compute
The security dimension is covered in detail in the second part of the article. I dedicated an entire section to it because of its complexity and the many perspectives involved.
Indexing
- Snowflake on Azure: No support for manual indexing, driven by the 'perform by default' paradigm.
- Azure Synapse Centric: Typical indexing experience, with support for automatically indexing data. Partitioning data on 'disk' empowers the MPP backend.
Integration
- Snowflake on Azure: Well integrated with Azure ecosystem
- Azure Synapse Centric: Strong integration with Azure
Data Sharing
- Snowflake on Azure: Built into Snowflake
- Azure Synapse Centric: Provided through Azure Data Sharing capabilities
Queries
- Snowflake on Azure: Cross-database queries. Snowpipe provides roughly the same functionality as Synapse Pipelines.
- Azure Synapse Centric: Cross-database queries in some instances (e.g., Serverless instances). Synapse Pipelines allow for trigger-based file loads.
Cost
- Snowflake on Azure: Compute billed by the 'credit', aligned with the product tier
- Azure Synapse Centric: Per hour at various DWU levels
Scale
- Snowflake on Azure: 'T-shirt' sizes, which correspond to the number of VMs used for compute
- Azure Synapse Centric: Data Warehouse Unit (DWU) for compute
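To make the t-shirt sizing model more concrete, here is a minimal Python sketch. The credits-per-hour figures follow the doubling pattern that Snowflake documents for standard warehouses; that mapping is brought in from Snowflake's public documentation, not from this comparison. `PRICE_PER_CREDIT_USD` and `hourly_cost` are hypothetical names, and the price is a placeholder, since the real price per credit depends on edition and region.

```python
# Credits consumed per hour by warehouse size (standard warehouses).
# Each step up the t-shirt scale doubles the compute and the credit burn.
CREDITS_PER_HOUR = {
    "XS": 1, "S": 2, "M": 4, "L": 8,
    "XL": 16, "2XL": 32, "3XL": 64, "4XL": 128,
}

# Hypothetical placeholder price; check your own contract or price list.
PRICE_PER_CREDIT_USD = 3.00


def hourly_cost(size: str, price_per_credit: float = PRICE_PER_CREDIT_USD) -> float:
    """Return the cost of running one warehouse of the given size for one hour."""
    return CREDITS_PER_HOUR[size] * price_per_credit


if __name__ == "__main__":
    for size in CREDITS_PER_HOUR:
        print(f"{size:>3}: {hourly_cost(size):8.2f} USD/hour")
```

The Synapse side would need a similar table keyed by DWU level, filled in from the Azure price list for your region.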
Data Governance
- Snowflake on Azure: No direct support; available only through third-party tools
- Azure Synapse Centric: Built-in support for data governance
Security Dimension
I took 11 security criteria into account, covering security at different layers as well as the tools and mechanisms used for governance and monitoring. The Azure Synapse centric solution is well integrated with the Microsoft Azure ecosystem and provides better end-to-end (E2E) security. At the data layer, both offer mature security and governance features.
Data Security
- Snowflake on Azure: Encryption at rest, hierarchical key model, automatic key rotation, yearly re-keying, Tri-Secret Secure, Time Travel, Fail-safe. Additional data security is provided by the Azure storage services used by Snowflake.
- Azure Synapse Centric: Encryption at rest, complete secrets management using Azure Key Vault, time travel, fail-safe, data segregation, at-rest data protection, in-transit data protection, data redundancy
Application Security
- Snowflake on Azure: Managed using RBAC and DAC at Snowflake level. IaaS centric at Azure services level.
- Azure Synapse Centric: Managed using RBAC and DAC, supported by Azure PaaS services.
Endpoint Security
- Snowflake on Azure: Provided by Azure Services used by Snowflake.
- Azure Synapse Centric: Full Azure security features, including IAM, Private Endpoints, and WAF (Web Application Firewall).
Access Control
- Snowflake on Azure: Built-in support for RBAC and DAC. Integration with Azure AD as an external Identity Provider is available.
- Azure Synapse Centric: Full support for RBAC (using Azure AD RBAC) and DAC.
Network Security
- Snowflake on Azure: Provided by the native Azure Services behind Snowflake.
- Azure Synapse Centric: Managed virtual network for all services used by Azure Synapse.
Perimeter Security
- Snowflake on Azure: Provided by the native Azure Services behind Snowflake.
- Azure Synapse Centric: Native integration with Azure perimeter features (e.g. Private Link, VNET, VPN).
Physical Security
- Snowflake on Azure: Microsoft designs, builds, and operates data centers in a way that strictly controls physical access to the areas where your data is stored.
- Azure Synapse Centric: Microsoft designs, builds, and operates data centers in a way that strictly controls physical access to the areas where your data is stored.
Operations
- Snowflake on Azure: Limited capability. A combination of native Snowflake features and Azure IaaS services.
- Azure Synapse Centric: Collect, analyze, scale, automate, backup and recovery capabilities for all Azure Services, governed by Azure WAF
Audit
- Snowflake on Azure: Security audits at IAM, application and Snowflake services level. Not fully integrated with Azure Services audit.
- Azure Synapse Centric: Audit at all levels, integrated into a common repository and dashboard for tracking and monitoring capabilities.
Monitoring
- Snowflake on Azure: Limited capability without third-party support. Not fully integrated with Azure IaaS services used by Snowflake.
- Azure Synapse Centric: Consolidated monitoring experience in one storage and dashboard, powered by AI and ML services to improve security.
Policy Management
- Snowflake on Azure: Policy management capability at network, account, user and data level. No direct integration with policy capabilities provided by Azure for
- Azure Synapse Centric: Policy management capabilities at data, application and infrastructure level, fully integrated with Azure Services.
Compute Pricing
- Snowflake on Azure: Per-second basis (credit-based)
- Azure Synapse Centric: Per-hour basis (execution calculated per second)
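Both compute models come down to an hourly rate that is metered per second, so a short run can be costed the same way on either side. The Python sketch below shows the arithmetic under stated assumptions: `run_cost` is a hypothetical helper, the hourly rates are placeholders, and the 60-second minimum applied to the Snowflake example comes from Snowflake's public billing documentation, not from this comparison.

```python
def run_cost(run_seconds: float, hourly_rate: float, min_billed_seconds: float = 0) -> float:
    """Cost of one run when an hourly rate is metered per second.

    min_billed_seconds models a minimum charge per start (0 means none).
    """
    billed_seconds = max(run_seconds, min_billed_seconds)
    return hourly_rate * billed_seconds / 3600


if __name__ == "__main__":
    # Hypothetical rates: 4 credits/hour at 3.00 USD per credit vs. a 10.00 USD/hour DWU level.
    snowflake_rate = 4 * 3.00
    synapse_rate = 10.00

    for seconds in (30, 600, 3600):
        sf = run_cost(seconds, snowflake_rate, min_billed_seconds=60)
        syn = run_cost(seconds, synapse_rate)
        print(f"{seconds:>5}s  Snowflake: {sf:6.2f} USD  Synapse: {syn:6.2f} USD")
```

With real rates plugged in, the same helper makes it easy to see how much idle time between pause and resume drives the bill on either platform.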
Storage Pricing
- Snowflake on Azure: Per TB/month
- Azure Synapse Centric: Per GB/month
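Because one price is quoted per TB and the other per GB, the two need to be normalized before they can be compared. The sketch below is a minimal Python illustration; the function names and both prices are hypothetical, and whether a vendor counts 1 TB as 1000 GB or 1024 GB is an assumption exposed as a parameter.

```python
def per_tb_to_per_gb(price_per_tb_month: float, gb_per_tb: int = 1000) -> float:
    """Convert a per-TB monthly price into a per-GB monthly price."""
    return price_per_tb_month / gb_per_tb


def monthly_storage_cost(stored_gb: float, price_per_gb_month: float) -> float:
    """Monthly storage bill for a given volume at a per-GB price."""
    return stored_gb * price_per_gb_month


if __name__ == "__main__":
    # Hypothetical list prices, for illustration only.
    snowflake_per_gb = per_tb_to_per_gb(40.0)  # quoted per TB/month
    synapse_per_gb = 0.023                     # quoted per GB/month

    stored_gb = 5_000
    print(f"Snowflake: {monthly_storage_cost(stored_gb, snowflake_per_gb):.2f} USD/month")
    print(f"Synapse:   {monthly_storage_cost(stored_gb, synapse_per_gb):.2f} USD/month")
```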
Tiers
- Snowflake on Azure: 4 Computation tiers
- Azure Synapse Centric: One tier is available. Indirect tiers/size available for integration services (e.g., SQL Pool, Spark Pool)
Discounts
- Snowflake on Azure: No public information is available
- Azure Synapse Centric: Up to 28% using Synapse Commit Units
Overall Cost
- Snowflake on Azure: Cost per query per hour is higher but includes ETL capability. Additional cost is generated by Azure Services that are required to run the solution.
- Azure Synapse Centric: Azure Synapse cost per query per hour is lower. The additional cost is generated by Azure Services that are required around Synapse.