Alteryx Designer Cloud Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Cloud.
SOLVED

I would like to partition and cluster a BigQuery output table in Dataprep. How can I do this, since you cannot partition a table after it has already been created? I am evaluating SQL scripts for this, but the options for when to run them don't seem applicable.

jvogeler
5 - Atom
 
1 REPLY
Trifacta_Alumni
Alteryx Alumni (Retired)

Hi Jennifer, this is a great question, and there's more than one answer depending on what works best for your process. You can create an empty partitioned table before you execute your Dataprep job and append your transformed data to it. In this case, you would click Add a SQL Script on the Run Job page and create the table before data is ingested.
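A pre-job script along those lines might look like the sketch below. The project, dataset, and column names here are placeholders, not anything from your flow; substitute your own schema:

```sql
-- Hypothetical example: create the empty partitioned, clustered table
-- before the Dataprep job runs, so the job can append into it.
CREATE TABLE IF NOT EXISTS `my_project.my_dataset.sales_data_db`
(
  order_id      STRING,
  customer_city STRING,
  last_updated  TIMESTAMP
)
PARTITION BY DATE(last_updated)
CLUSTER BY customer_city;
```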


Alternatively, a cleaner option might be to partition and cluster the table after it has been created, by adding a new SQL script that runs after your Dataprep job has completed.


You can create a partitioned and clustered table from a query result. For example, once you have finished transforming your data in the Dataprep UI, click the Run button in the upper right-hand corner to go to the Run Job page. Toward the bottom of the page, locate the Add New SQL Script button. Add a SQL script that creates a new table from your existing table (the output of this job) and defines your partition column. Make sure you set the script to Run After Data Publish (located just above the Validate SQL button) before you click Add.


In my example below, I created a flow with a recipe. After transforming my data, I'm going to write my table to BigQuery and call it sales_data_db. Then I'm going to click on Add SQL Script. Here I'm going to create a new table with a partition on the date of the last-updated timestamp and cluster it by customer city name. My new table will be called sales_cluster_by_city.
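The post-job script for that step could be sketched as follows. The column names (last_updated, customer_city) and the project/dataset qualifiers are assumptions based on the description above, not taken from the actual flow:

```sql
-- Hypothetical example: build the partitioned, clustered copy of the
-- Dataprep output table directly from a query result (CTAS).
CREATE TABLE `my_project.my_dataset.sales_cluster_by_city`
PARTITION BY DATE(last_updated)
CLUSTER BY customer_city
AS
SELECT *
FROM `my_project.my_dataset.sales_data_db`;
```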


Let's say you also want to drop the original table. Then you'll add another SQL script; this one drops the original table.
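That second script can be as simple as the line below (again, the project and dataset names are placeholders):

```sql
-- Drop the intermediate, non-partitioned output table.
DROP TABLE IF EXISTS `my_project.my_dataset.sales_data_db`;
```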


Your Run Job page should then list both SQL scripts, each set to Run After Data Publish.


Keep in mind that every time this job is run, you will be creating a new table, creating another table with a partition and cluster, and dropping the first table created in this pipeline. You'll have to identify a process that works best for you and your use case, but this is one way to create a partitioned and clustered output table from Dataprep.