areto consulting blog

Deep Performance Analysis with the New “query operator stats” in Snowflake

Blogbeitraege Featured Images 15

Snowflake users can finally dig deep into query stats with this new function. This data was only available in the query profile visualization, and now you can get it in your SQL world with GET_QUERY_OPERATOR_STATS(). Let’s review the new function, and how to use it to get deeper insights while benchmarking.

The old ways

Snowflake classic query profile interactive visualization


GET_QUERY_OPERATOR_STATS – Snowflake Documentation

select * from table ( get_query_operator_stats ($ lqid )); +…

select *
from table(get_query_operator_stats(last_query_id()));
Query stats for a random query

What’s missing: Going beyond one query

Using the new query stats to analyze a whole run of TPCH benchmarks


Sample Data: TPC-H – Snowflake Documentation

create or replace table query_sessions2
select *, sum(is_session_start) over(order by start_time) session_id
from (
select query_id, query_text, query_type, warehouse_size, start_time, total_elapsed_time
        , bytes_scanned, rows_produced, compilation_time, execution_time
        , queued_provisioning_time, query_tag
        , regexp_substr(query_text, '--Query(..)', 1,1,'',1)::int qid
        , iff(qid < lag(qid) over(order by start_time) ,1,0) is_session_start
from table(information_schema.query_history(result_limit=>1000))
where bytes_scanned > 1000
and query_text like '%--Query%'
and start_time > '2023-02-06 18:00'
order by start_time desc
  • Look for all queries in the recent query history within the classic information_schema.query_history.
  • It looks for all the queries annotated with a comment like --Query11. This is how we find the TPCH related queries, and it helps it to extract the id of each query for our convenience.
  • We are only interested in queries that scanned more than 1000 bytes (make sure to turn off the results cache to get the queries to actually run every time).
  • Then the query uses a combination of each query id and lag() to identify each different benchmark run, and give each run a different session_id.
select warehouse_size, session_id, query_tag, sum(total_elapsed_time), sum(bytes_scanned)
    , count(*), min(start_time) start_time
from query_sessions2
group by 1,2, 3
order by session_id;
The numbers for each run of the TPCH benchmarks

Let’s go deeper — get all the query stats

How to get GET_QUERY_OPERATOR_STATS() for multiple queries?

create or replace table query_operator_stats_cache2
select 1::integer qid, 1::integer session_id, ''::string query_tag, *
from table(get_query_operator_stats(last_query_id()))
-- just for the schema
limit 0;
  query_id string;
  query_tag string;
  qid int;
  session_id int;
  c1 cursor for
select query_id, qid, session_id, query_tag
from query_sessions2 
where query_id not in (select query_id from query_operator_stats_cache2);
open c1;
for record in c1 do
fetch c1 into query_id, qid, session_id, query_tag;
insert into query_operator_stats_cache2
select :qid, :session_id, :query_tag, * from table(get_query_operator_stats(:query_id));  
end for;
return query_id;

Analyzing the table scans

create or replace view analyze_table_scans2
select qid, query_id, session_id, query_tag
 , operator_attributes:table_name::string tablename
 , array_size(operator_attributes:columns) table_ncols
 , execution_time_breakdown:overall_percentage::float time_percent
 , execution_time_breakdown:local_disk_io::float local_disk_io
 , execution_time_breakdown:processing::float processing
 , execution_time_breakdown:synchronization::float synchronization
 , operator_statistics:io:bytes_scanned::int bytes_scanned
 , operator_statistics:io:percentage_scanned_from_cache::float percentage_scanned_from_cache
 , operator_statistics:network:network_bytes::int network_bytes
 , operator_statistics:pruning:partitions_scanned::int scanned 
 , operator_statistics:pruning:partitions_total::int total
 , scanned/total ratio
from query_operator_stats_cache2
where operator_type = 'TableScan';
Detailed stats for each table scanned within each query
select session_id, query_tag, avg(percentage_scanned_from_cache)
from analyze_table_scans2
group by 1, 2
order by session_id;
Average percentage scanned from cache for each benchmark run
select qid, tablename, percentage_scanned_from_cache, bytes_scanned
from analyze_table_scans2
where session_id=1
order by percentage_scanned_from_cache;
Find out which tables had to bring data back from cold storage in the wh-Small 2nd run

Improving performance: Partition pruning

select sum(scanned) partitions_scanned, sum(total) partitions_total, partitions_scanned/partitions_total ratio
from analyze_table_scans2
order by qid, session_id, tablename;
Almost no partition pruning
create table lineitem as
select *
from snowflake_sample_data.tpch_sf10.lineitem
order by l_shipdate;

create table orders as
select *
from snowflake_sample_data.tpch_sf10.orders
order by o_orderdate;
Sorting and re-materializing in a small cold wh: 34.3s
create view part as select *
from snowflake_sample_data.tpch_sf10.part;
create view supplier as select *
from snowflake_sample_data.tpch_sf10.supplier;
create view partsupp as select *
from snowflake_sample_data.tpch_sf10.partsupp;
create view nation as select *
from snowflake_sample_data.tpch_sf10.nation;
create view region as select *
from snowflake_sample_data.tpch_sf10.region;
create view customer asselect *
from snowflake_sample_data.tpch_sf10.customer;
select session_id
    , sum(scanned) partitions_scanned
    , sum(total) partitions_total
    , partitions_scanned/partitions_total ratio
    , count(distinct qid) queries
from analyze_table_scans2
group by 1
order by 1
Session 4 to 7 had 13% less total partitions, and scanned 36% less partitions
select warehouse_size, session_id, query_tag, sum(total_elapsed_time), sum(bytes_scanned)
    , count(*), min(start_time) start_time
from query_sessions2
group by 1,2, 3
order by session_id;
Session 4 was 18% faster than session 0: Pruning wins.

Lessons learned

  • You can use the new get_query_operator_stats() to get deep stats for your queries.
  • Within these deep stats you can find information like the # of partitions scanned by each query.
  • Storing tables ordered by date can help a lot with pruning — as in this specific case.
  • Snowflake should re-materialize the shared sample tpch_sf10 tables, so anyone benchmarking them can get faster results with no effort.
  • Even when Snowflake had not optimized these tables for the benchmark, Mimoune Djouallah found out that Snowflake was the fastest of the benchmarked candidates. Snowflake is fast — and with small changes we can make it even faster.
  • Note that any benchmarks shown here reflect an arbitrary set up at an arbitrary point of time. Your numbers might vary.

Source: Felipe Hoffa on Medium

Blog-Beitrag teilen

areto Marketing

+49 221 66 95 75-0