Snowflake fundamentals & comparison with other cloud services

varunkumar inbaraj
5 min read · Aug 11, 2020

Cloud-based data warehouses

Play with Snowflake Trial

You need to provide some basic user details, and choosing a cloud provider is mandatory, since Snowflake is cloud agnostic and sits on top of AWS, GCP, and Azure.

Available regions by cloud platform:

GCP : US and Europe

AWS : US, Canada, EU, Asia

Azure : US, Canada, EU, Southeast Asia, and Switzerland

Try the 30-day free trial, which includes $400 worth of free usage on sign-up.

https://trial.snowflake.com/

User Management

Security and account administrators (i.e. users with the SECURITYADMIN role or higher) can create and manage Snowflake users through SQL or the web interface.

Refer : https://docs.snowflake.com/en/user-guide/admin-user-management.html#user-roles
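
For example, here is a minimal sketch using the Snowflake Connector for Python; the user name, password, role, and account identifier below are purely illustrative, and it assumes your session runs with the SECURITYADMIN role.

```python
# Minimal sketch: create a user and grant an existing role via SQL.
# Assumes the session runs as SECURITYADMIN; all names, the password,
# and the account identifier are illustrative placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    user="ADMIN_USER",
    password="********",
    account="xy12345.us-east-1",   # placeholder account identifier
    role="SECURITYADMIN",
)
cur = conn.cursor()

# Create a new user and grant an existing role to it.
cur.execute("""
    CREATE USER IF NOT EXISTS analyst_1
        PASSWORD = 'TempPassw0rd!'
        DEFAULT_ROLE = ANALYST
        MUST_CHANGE_PASSWORD = TRUE
""")
cur.execute("GRANT ROLE ANALYST TO USER analyst_1")

cur.close()
conn.close()
```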

Connecting to Snowflake

Snowflake allows connections from third-party tools and technologies, as well as the Snowflake-provided clients in the Snowflake ecosystem.

Like other cloud platforms, Snowflake can be accessed through a CLI, connectors, and drivers.

Refer : https://docs.snowflake.com/en/user-guide-connecting.html
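
A minimal connection sketch using the Snowflake Connector for Python (pip install snowflake-connector-python); the account identifier, warehouse, database, and credentials are placeholders.

```python
# Minimal connection sketch with the Snowflake Connector for Python.
# All identifiers and credentials below are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    user="YOUR_USER",
    password="YOUR_PASSWORD",
    account="xy12345.us-east-1",   # placeholder account identifier
    warehouse="COMPUTE_WH",
    database="DEMO_DB",
    schema="PUBLIC",
)

cur = conn.cursor()
cur.execute("SELECT CURRENT_VERSION(), CURRENT_WAREHOUSE()")
print(cur.fetchone())   # quick sanity check that the session works

cur.close()
conn.close()
```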

Data into Snowflake

Snowflake supports loading data from files staged in any of the following locations, regardless of the cloud platform that hosts your Snowflake account:

  • Internal stages
  • Amazon S3
  • Google Cloud Storage
  • Microsoft Azure blob storage

Steps to load

Resources : https://docs.google.com/document/d/1Q5xfsC522qCTbIZHBbHUjk058ZpgDqZuC0_JagtQhAo/edit?usp=sharing

Refer : https://docs.snowflake.com/en/user-guide-data-load.html
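
As a sketch of the typical flow for an internal stage (upload a local file with PUT, then COPY INTO the target table), again via the Python connector; the table, stage, file format, and file path are illustrative.

```python
# Sketch of a load into an internal stage: create a file format and stage,
# PUT a local CSV into the stage, then COPY INTO the table.
# Table, stage, format, and file names are illustrative.
import snowflake.connector

conn = snowflake.connector.connect(
    user="YOUR_USER", password="YOUR_PASSWORD",
    account="xy12345.us-east-1",
    warehouse="COMPUTE_WH", database="DEMO_DB", schema="PUBLIC",
)
cur = conn.cursor()

cur.execute("CREATE TABLE IF NOT EXISTS sales (id INT, amount NUMBER(10,2), sold_on DATE)")
cur.execute("CREATE FILE FORMAT IF NOT EXISTS csv_fmt TYPE = CSV SKIP_HEADER = 1")
cur.execute("CREATE STAGE IF NOT EXISTS sales_stage FILE_FORMAT = (FORMAT_NAME = 'csv_fmt')")

# Upload a local file to the internal stage (PUT only works for internal stages).
cur.execute("PUT file:///tmp/sales.csv @sales_stage AUTO_COMPRESS = TRUE")

# Load the staged file(s) into the target table.
cur.execute("COPY INTO sales FROM @sales_stage FILE_FORMAT = (FORMAT_NAME = 'csv_fmt')")

cur.close()
conn.close()
```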

Unloading from Snowflake

You can unload data to cloud storage or an internal stage. Any valid delimiter is supported (the default is a comma, i.e. CSV), along with other formats such as JSON and Parquet. Supported compression methods include gzip, bzip2, Brotli, and Zstandard.

Refer : https://docs.snowflake.com/en/user-guide/data-unload-considerations.html
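
A sketch of the reverse flow: unload a table to an internal stage as gzip-compressed CSV, then pull the files down locally with GET. Connection details and object names are illustrative.

```python
# Sketch of unloading: COPY INTO an internal stage as gzip-compressed CSV,
# then GET the result files to a local directory. All names are illustrative.
import snowflake.connector

conn = snowflake.connector.connect(
    user="YOUR_USER", password="YOUR_PASSWORD",
    account="xy12345.us-east-1",
    warehouse="COMPUTE_WH", database="DEMO_DB", schema="PUBLIC",
)
cur = conn.cursor()

cur.execute("""
    COPY INTO @sales_stage/unload/
    FROM sales
    FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
    OVERWRITE = TRUE
""")

# Download the unloaded files (GET works only for internal stages).
cur.execute("GET @sales_stage/unload/ file:///tmp/unload/")

cur.close()
conn.close()
```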

Note : Cloud providers apply data egress charges in either of the following use cases:

  • Data is transferred from one region to another within the same cloud platform.
  • Data is transferred out of the cloud platform.

Data Sharing

Secure Data Sharing enables sharing selected objects in a database in your account with other Snowflake accounts.

No actual data is copied or transferred between accounts. All sharing is accomplished through Snowflake’s unique services layer and metadata store. Hence the only charges to consumers are for the compute resources (i.e. virtual warehouses) used to query the shared data.

Refer : https://docs.snowflake.com/en/user-guide/data-sharing-intro.html
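
A provider-side sketch of setting up a share with the Python connector; it assumes the ACCOUNTADMIN role, and the share, database, table, and consumer account locator are illustrative.

```python
# Provider-side sketch: create a share, grant object privileges to it,
# and add a consumer account. Assumes ACCOUNTADMIN; names are illustrative.
import snowflake.connector

conn = snowflake.connector.connect(
    user="YOUR_USER", password="YOUR_PASSWORD",
    account="xy12345.us-east-1", role="ACCOUNTADMIN",
)
cur = conn.cursor()

cur.execute("CREATE SHARE IF NOT EXISTS sales_share")
cur.execute("GRANT USAGE ON DATABASE demo_db TO SHARE sales_share")
cur.execute("GRANT USAGE ON SCHEMA demo_db.public TO SHARE sales_share")
cur.execute("GRANT SELECT ON TABLE demo_db.public.sales TO SHARE sales_share")

# Add the consumer account (placeholder account locator).
cur.execute("ALTER SHARE sales_share ADD ACCOUNTS = ab67890")

cur.close()
conn.close()
```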

Snowflake Credits

Snowflake credits are used to pay for the consumption of resources on Snowflake. A credit is a unit of measure, and it is consumed only when a customer is using resources, such as when a virtual warehouse is running, the cloud services layer is performing work, or serverless features are in use.

Refer : https://docs.snowflake.com/en/user-guide/credits.html#virtual-warehouse-credit-usage
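
As a back-of-the-envelope illustration of how warehouse credits accrue, using the per-hour credit rates from the warehouse credit usage table (X-Small = 1 through X-Large = 16) and per-second billing with a 60-second minimum:

```python
# Rough sketch of warehouse credit consumption. Credit rates per hour
# follow the warehouse credit usage table (X-Small = 1, Small = 2,
# Medium = 4, Large = 8, X-Large = 16); billing is per second with a
# 60-second minimum each time a warehouse starts or resumes.
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16}

def credits_consumed(size: str, seconds_running: float) -> float:
    """Approximate credits for one continuous run of a warehouse."""
    billed_seconds = max(seconds_running, 60)   # 60-second minimum
    return CREDITS_PER_HOUR[size.upper()] * billed_seconds / 3600

# e.g. a Medium warehouse running for 30 minutes consumes about 2 credits.
print(credits_consumed("MEDIUM", 30 * 60))
```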

Parameter Management

Snowflake has three types of parameters that can be set for your account:

  • Account parameters that affect your entire account.
  • Session parameters that default to users and their sessions.
  • Object parameters that default to objects (warehouses, databases, schemas, and tables).

All parameters have default values, which can be overridden at the account level. To override default values at the account level, you must be an account administrator (i.e. user granted the ACCOUNTADMIN role).

Refer : https://docs.snowflake.com/en/user-guide/admin-account-management.html#viewing-parameters-for-your-account
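
A sketch of inspecting account-level parameters and overriding one; it assumes the ACCOUNTADMIN role, and TIMEZONE is just an example parameter.

```python
# Sketch: view account-level parameters and override one.
# Assumes the ACCOUNTADMIN role; TIMEZONE is only an illustration.
import snowflake.connector

conn = snowflake.connector.connect(
    user="YOUR_USER", password="YOUR_PASSWORD",
    account="xy12345.us-east-1", role="ACCOUNTADMIN",
)
cur = conn.cursor()

# Print parameters whose current value differs from the default.
for name, value, default, *rest in cur.execute("SHOW PARAMETERS IN ACCOUNT"):
    if value != default:
        print(name, value, default)

# Override a parameter at the account level.
cur.execute("ALTER ACCOUNT SET TIMEZONE = 'UTC'")

cur.close()
conn.close()
```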

Comparing with other cloud services

Category 1 : Reporting

All four of the top platforms (Snowflake, AWS, Azure, and GCP) provide ready integration with tools such as Tableau, QlikView, and Google Data Studio, which can then be leveraged for building reports and for ad hoc data analysis.

Category 2 : Component Used

Unlike Amazon and Google Cloud, the Snowflake platform provides data warehousing capabilities that internally use the Amazon, Google, or Azure platform. It doesn’t provide any hardware resources of its own; instead, the entire setup is physically deployed on the customer’s selected underlying cloud.

The platform is largely focused on providing an extremely fast and optimized query processing engine. This means that, unlike the other cloud platforms, the customer has to prepare its data externally and then load it into Snowflake. For customers who already have dedicated ETL tools that can source and transform data, Snowflake can provide an extremely fast and scalable query processing engine.

Customers would source the data from on-premise systems or the cloud into a staging area using one of the pipeline products in the market. They would then need to process this data using dedicated hardware and, finally, load it into Snowflake.

Category 3 : Speed: Snowflake benchmarks faster than BigQuery

Snowflake edged out BigQuery in terms of raw speed, with queries taking, on average, 12.74 seconds. Meanwhile, BigQuery clocked in at 14.43 seconds per query, on average. In other words, Snowflake was faster in these tests.

If you’ve been comparing Snowflake and BigQuery’s performance, you might have seen somewhat different results. This is partly due to the different methodologies used in those benchmark tests, and partly because these results reflect the most recent benchmark data available.

Category 4 : Machine Learning and Advanced Analytics

A key characteristic of the Snowflake platform is its integration with a number of third-party data science specialists, rather than providing any dedicated machine learning component of its own. This architecture allows Snowflake customers to leverage the platform as a core data storage engine while piping the data into specialized third-party tools via the Partner Connect platform.

Unlike Snowflake, GCP has Datalab, BigQuery, Dataproc, Dataflow, Dataprep, and TensorFlow, while AWS has S3/RDS, SageMaker, and third-party source connectors.

Category 5 : Cost

The key point when comparing Snowflake with the other clouds on pricing is that they’re billed somewhat differently. Because BigQuery is billed per query, you really do only pay for what you use. You don’t pay for idle time on BigQuery the way you would with Snowflake. This means that, even though Snowflake is cheaper by the query on average, if your workflow doesn’t include a lot of continuous use of your data warehouse, you might find that a BigQuery-based setup is actually cheaper.

If you have very large data but a spiky workload (i.e. you run lots of queries occasionally, with high idle time), BigQuery will probably be the cheaper and easier option for you. If you have a steadier, more continuous usage pattern for your queries and the data you’re working with, it may be more cost-effective to go with Snowflake, since you’ll be able to cram more queries into the hours you’re paying for.
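
To make the spiky-versus-steady argument concrete, here is a rough break-even sketch. The prices and workload numbers are illustrative assumptions only, not current list prices; check the pricing pages for your region and edition.

```python
# Rough break-even sketch comparing a per-query billing model (BigQuery
# on-demand, priced per TB scanned) with a per-second warehouse model
# (Snowflake, priced per credit). Prices below are illustrative
# assumptions only, not current list prices.
BQ_PRICE_PER_TB = 5.00        # assumed on-demand $/TB scanned
SF_PRICE_PER_CREDIT = 2.00    # assumed $/credit (Standard edition)
SF_CREDITS_PER_HOUR = 4       # assumed Medium warehouse

def bigquery_cost(tb_scanned_per_month: float) -> float:
    return tb_scanned_per_month * BQ_PRICE_PER_TB

def snowflake_cost(warehouse_hours_per_month: float) -> float:
    return warehouse_hours_per_month * SF_CREDITS_PER_HOUR * SF_PRICE_PER_CREDIT

# Spiky workload: 2 TB scanned, warehouse busy only ~10 hours a month.
print(bigquery_cost(2), snowflake_cost(10))    # 10.0 vs 80.0 -> BigQuery cheaper
# Steady workload: 40 TB scanned, warehouse busy ~20 hours a month.
print(bigquery_cost(40), snowflake_cost(20))   # 200.0 vs 160.0 -> Snowflake cheaper
```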
