Tool Mastery


londonhanson

How to Improve Snowflake Read and Write Speed

Tips and tricks for improving performance when writing to or reading from Snowflake within Designer.

Environment Details

  • Alteryx Designer
  • Snowflake
  • Simba Snowflake Driver
    • Version 2.25.1.1018
  • ODBC Data Sources Manager
Optimization Tips

1. Limit the data. Snowflake is a columnar database, so the width of the data (number of columns) affects performance more than the number of rows. Use the Visual Query Builder, a custom query, or a Select Tool to deselect unnecessary columns.

2. Use an Auto-Field Tool after an Input Data Tool for faster downstream processing. The Auto-Field Tool finds the smallest possible data type for each column and converts the column to it, reducing the overall size of the data.


3. Remove or disable all Browse Tools. Browse Tools write temporary files of the data, so disabling or removing them shortens the total execution time. To disable, navigate to: Workflow Configuration > Runtime > Disable All Browse Tools.


4. Explicitly state the database name. If the database is not included in a Snowflake DSN or DSN-less connection string, the primary keys for the entire account are retrieved. These metadata queries are time-consuming, so specifying the database allows for more efficient querying. Example, where "CS_DB" is the database name:
snowbl:DSN=Simba Snowflake;UID=cs_snf-00; PWD=__EncPwd1__;Staging=local; Method=user|||CS_DB.tablename
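As an illustration of the pattern above, the database-qualified table reference follows the `|||` separator at the end of the connection string; this sketch assumes that separator convention, and the names are hypothetical:

```python
# Append a database-qualified table name after the "|||" separator,
# matching the example connection string above. Names are hypothetical.
def qualify_table(conn, database, table):
    """Return a connection string with an explicit database.table reference."""
    return f"{conn}|||{database}.{table}"

conn = "snowbl:DSN=Simba Snowflake;UID=cs_snf-00;PWD=__EncPwd1__;Staging=local;Method=user"
print(qualify_table(conn, "CS_DB", "tablename"))
```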

5. Decrease the logging level. Logging adds to the driver's execution time. When not actively debugging an error, lower the logging verbosity within the driver configuration (ODBC Manager > Select Connection > Configure > Tracing, levels 0-6, with 6 being the most verbose):

0 = Off

1 = Fatal

2 = Error

3 = Warning

4 = Info

5 = Debug

6 = Trace
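The level numbers above can be captured in a small lookup, a minimal sketch:

```python
# Simba Snowflake driver log levels (0-6), as listed above.
LOG_LEVELS = {0: "Off", 1: "Fatal", 2: "Error", 3: "Warning",
              4: "Info", 5: "Debug", 6: "Trace"}

def describe(level):
    """Return the name of a driver log level; raises KeyError on unknown values."""
    return LOG_LEVELS[level]

# When not actively debugging, a low level such as 1 (Fatal) keeps logging overhead minimal.
print(describe(1))
```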
 

6. Use Bulk-Loading. Bulk-loading stages, or prepares, the data in another location before writing it to Snowflake all at once. Staging can be done locally or through Amazon S3 buckets; the three types of local staging are User, Table, and Internal Named (Snowflake).

7. Adjust the chunk size in the Bulk Load. The number of load operations that run in parallel cannot exceed the number of data files to be loaded. Snowflake (Snowflake Whitepapers) recommends data files of roughly 100-250 MB (or larger) compressed for an optimal number of parallel operations.

8. If the above tips do not help, upgrading to Designer version 2022.1+ is recommended. Those versions fix defect TDCB-3986, in which each chunk of data issued its own PUT and COPY API commands, greatly improving write speed.
 
9. Change the Bulk-Loading staging method. Use an Internal Named stage or Amazon S3 buckets if configured, as this allows the preparation and processing of the staging files to be done in Snowflake or Amazon, respectively. Often, these will have more available space and processing power than the client machine.

10. Use In-DB Tools whenever possible. Much like the staging methods mentioned in step 9, In-DB Tools allow the processing to be done in the database rather than on the client machine.

Additional Resources