Tool Mastery


londonhanson

How to Improve Snowflake Read and Write Speed

Tips and tricks for improving performance when writing to or reading from Snowflake within Designer.

Environment Details

  • Alteryx Designer
  • Snowflake
  • Simba Snowflake Driver
    • Version 2.25.1.1018
  • ODBC Data Sources Manager
Optimization Tips

1. Limit the data. Snowflake is a columnar database, so the width of the data (number of columns) affects performance more than the number of rows. Use the Visual Query Builder, a custom query, or a Select Tool to deselect unnecessary columns.

2. Use an Auto-Field Tool after an Input Data Tool for faster downstream processing. The Auto-Field Tool finds the smallest possible data type for each column and converts the column to it, reducing the overall size of the data.


3. Remove or disable all Browse Tools. Browse Tools write temporary files of the data, so disabling or removing them shortens the total execution time. To disable, navigate to: Workflow Configuration > Runtime > Disable All Browse Tools.


4. Explicitly state the database name. If the database is not included in a Snowflake DSN or DSN-less connection string, the primary keys for the entire account are retrieved. These metadata queries are time-consuming, so specifying the database allows for more efficient querying. Example, where "CS_DB" is the database name:
snowbl:DSN=Simba Snowflake;UID=cs_snf-00; PWD=__EncPwd1__;Staging=local; Method=user|||CS_DB.tablename
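As an illustration of the pattern above, the database-qualified table reference follows the `|||` separator at the end of the connection string; this sketch assumes that separator convention, and the names are hypothetical:

```python
# Append a database-qualified table name after the "|||" separator,
# matching the example connection string above. Names are hypothetical.
def qualify_table(conn, database, table):
    """Return a connection string with an explicit database.table reference."""
    return f"{conn}|||{database}.{table}"

conn = "snowbl:DSN=Simba Snowflake;UID=cs_snf-00;PWD=__EncPwd1__;Staging=local;Method=user"
print(qualify_table(conn, "CS_DB", "tablename"))
```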

5. Decrease the logging level. Logging adds to the driver's execution time. When not actively debugging an error, lower the logging verbosity within the driver configuration (ODBC Manager > Select Connection > Configure > Tracing, levels 0-6, with 6 being the most verbose):

0 = Off

1 = Fatal

2 = Error

3 = Warning

4 = Info

5 = Debug

6 = Trace
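The level numbers above can be captured in a small lookup, a minimal sketch:

```python
# Simba Snowflake driver log levels (0-6), as listed above.
LOG_LEVELS = {0: "Off", 1: "Fatal", 2: "Error", 3: "Warning",
              4: "Info", 5: "Debug", 6: "Trace"}

def describe(level):
    """Return the name of a driver log level; raises KeyError on unknown values."""
    return LOG_LEVELS[level]

# When not actively debugging, a low level such as 1 (Fatal) keeps logging overhead minimal.
print(describe(1))
```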
 

6. Use Bulk-Loading. Bulk-loading stages, or prepares, the data in another location before writing it to Snowflake all at once. Staging can be done locally or through Amazon S3 buckets; the three types of local staging are User, Table, and Internal Named (Snowflake).

7. Adjust the chunk size in the Bulk Load. The number of load operations that run in parallel cannot exceed the number of data files to be loaded. Snowflake (Snowflake Whitepapers) recommends data files of roughly 100-250 MB (or larger) compressed for an optimal number of parallel operations.

8. If the above tips do not help, upgrading to Designer version 2022.1+ is recommended. Those versions fix defect TDCB-3986, in which each chunk of data issued its own PUT and COPY API commands, greatly improving write speed.
 
9. Change the Bulk-Loading staging method. Use an Internal Named stage or Amazon S3 buckets if configured, as this allows the preparation and processing of the staging files to be done in Snowflake or Amazon, respectively. Often, these will have more available space and processing power than the client machine.

10. Use In-DB Tools whenever possible. Much like the staging methods mentioned in step 9, In-DB Tools allow the processing to be done in the database rather than on the client machine.

Additional Resources