Increase Your Snowflake Performance with New Larger Warehouses
In today’s competitive landscape, many businesses need to store and process ever-larger amounts of data. From ad hoc use cases to large data science modeling initiatives to complex ETL workloads, customers are looking for better ways to process several terabytes, or even petabytes, of data.
As our customers have turned to Snowflake to handle larger and more complex workloads, they have asked us for warehouse compute capabilities that let them meet stringent requirements for ETL pipeline performance and ad hoc query response times.
To help customers meet these challenges, Snowflake is proud to announce the general availability of 5X-Large and 6X-Large warehouses on AWS. These new warehouse sizes let customers provision even larger compute clusters that are roughly double and quadruple, respectively, the size of our previous largest warehouse (4X-Large), so that large workloads finish faster.
Recently, we ran the TPC-DS industry benchmark with these new sizes. The entire power run consists of running 99 queries against the 100 TB scale TPC-DS database. Out of the box, all of these queries execute on a 5XL warehouse in a warm run of 1,789 seconds (the difference between a warm and cold run is ~14%, or 1,789s vs. 2,037s). This is a ~52% improvement over our previous run on a 4XL (3,760s) and a ~25% improvement from when we first introduced 5XL in public preview late last year. As we continue to double the compute resources, we have been able to maintain near-linear scalability through 5XL across these benchmarks. This is especially important because it allows our customers to maintain similar price-performance across their query workloads.
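The percentages above can be reproduced from the quoted timings. The following is a minimal sketch of that arithmetic, not Snowflake's benchmarking methodology. It assumes the 5XL has exactly twice the compute of a 4XL (the announcement says "roughly double"), and the function names are made up for illustration. Because the 4XL figure comes from an earlier run, the scaling ratio is only indicative.

```python
# Timings quoted in the text, in seconds.
WARM_5XL = 1789   # 5XL warm power run
COLD_5XL = 2037   # 5XL cold power run
WARM_4XL = 3760   # earlier 4XL run

def pct_improvement(old_s, new_s):
    """Percentage reduction in runtime going from old_s to new_s."""
    return (old_s - new_s) / old_s * 100

def scaling_efficiency(old_s, new_s, compute_ratio):
    """Observed speedup divided by the increase in compute (1.0 = linear)."""
    return (old_s / new_s) / compute_ratio

improvement = pct_improvement(WARM_4XL, WARM_5XL)
cold_penalty = (COLD_5XL - WARM_5XL) / WARM_5XL * 100
# Assumption: a 5XL has exactly 2x the compute of a 4XL.
efficiency = scaling_efficiency(WARM_4XL, WARM_5XL, compute_ratio=2.0)

print(f"4XL -> 5XL runtime improvement: {improvement:.1f}%")   # 52.4%
print(f"Cold vs. warm run on 5XL:       {cold_penalty:.1f}%")  # 13.9%
print(f"Scaling efficiency:             {efficiency:.2f}")     # 1.05
```

An efficiency close to 1.0 means that doubling the compute roughly halves the runtime, so the total compute consumed per run stays about the same. That is the sense in which price-performance is maintained.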
Customer success stories
These larger warehouses enable users to analyze billions of rows of data and get answers to their questions faster. For example, data scientists can run several ad hoc queries to make sense of a given data set, data analysts can find useful insights that were previously hidden, and database administrators can load and process data to meet tighter SLAs.
Beyond the benchmarks, we have seen several real-world use cases that have either been unlocked or sped up by these new warehouse sizes. A large media company uses 5XL warehouses to run a backfill that reprocesses incoming data, with a soft SLA of 3 hours. Its existing 4XLs were taking several hours to process the data, which slowed down its downstream analysis. By upsizing to 5XLs, the company can now comfortably meet its SLA, allowing it to run further testing and analysis in a more reasonable time. As the company’s data grows, it is considering using these larger warehouses more widely.
Another major customer moved to 5XLs because its weekend ETL jobs were too large to run on its existing 4XLs, resulting in out-of-memory (OOM) errors. It had been batching these jobs across multiple 4XLs to get them to run. However, to reduce complexity, the company needed clusters with the same or more CPUs but fewer network connections to avoid throttling its queries, and the 5XLs fit the bill.
Key benefits of 5XL and 6XL warehouses:
Improved performance on larger workloads
Transparent resizing between warehouse sizes without additional maintenance
Visibility into and management control over query performance
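As an illustration of resizing between warehouse sizes, the sketch below builds the SQL statement that changes a warehouse's size. It assumes the standard ALTER WAREHOUSE ... SET WAREHOUSE_SIZE syntax with the X4LARGE, X5LARGE, and X6LARGE size keywords. The warehouse name etl_wh and the helper resize_statement are hypothetical. The sketch only produces the statement text; running it against Snowflake is left to whichever client you already use.

```python
# Short labels used in this article mapped to WAREHOUSE_SIZE keywords.
SIZE_KEYWORDS = {"4XL": "X4LARGE", "5XL": "X5LARGE", "6XL": "X6LARGE"}

def resize_statement(warehouse, size):
    """Return an ALTER WAREHOUSE statement that resizes `warehouse`.

    `warehouse` must be a simple unquoted identifier; `size` is one of
    the short labels in SIZE_KEYWORDS.
    """
    if size not in SIZE_KEYWORDS:
        raise ValueError(f"unsupported size: {size!r}")
    if not warehouse.isidentifier():
        raise ValueError(f"not a simple identifier: {warehouse!r}")
    return f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = {SIZE_KEYWORDS[size]};"

# Hypothetical warehouse name, for illustration only.
statement = resize_statement("etl_wh", "5XL")
print(statement)  # ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = X5LARGE;
```

Validating the warehouse name before interpolating it keeps the helper from producing a malformed or unintended statement when the name comes from user input.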
Source: Snowflake, Author: Bharath Sitaraman