Alteryx Designer Cloud Discussions

SOLVED

How come when I run a job, the following two issues happen?

- The column names are replaced with "column1, column2, etc."
- The rows get shuffled out of order.

See screenshots of the data prior to running and after.

3 REPLIES
Trifacta_Alumni
Alteryx Alumni (Retired)

Hi @Kimberly Nguyen,

Question 1: Column names not included in your output

You are seeing this behavior because of your CSV publishing settings. By default, when you create a CSV on GCS, Dataprep writes a multipart file, and with that setting the output does not include a header row. Multipart output is the default because it is more reliable in the event of network interruptions and faster, since the parts can be written in parallel.
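To make the multipart idea concrete, here is a minimal Python sketch of what a downstream consumer has to do with headerless part files: stitch them together and supply the header once. It is only an illustration, not how Dataprep writes or merges files. The part contents, the column names `id` and `fruit`, and the helper `merge_parts` are all hypothetical.

```python
import csv
import io


def merge_parts(parts, header):
    """Concatenate headerless CSV parts into one CSV text with a single header row.

    `parts` is a list of CSV strings standing in for the individual part files.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)  # the header is written exactly once, up front
    for part in parts:
        for row in csv.reader(io.StringIO(part)):
            if row:  # skip any empty lines inside a part
                writer.writerow(row)
    return out.getvalue()


# Hypothetical part files, each written independently and without a header.
parts = ["1,apple\n2,banana\n", "3,cherry\n"]
merged = merge_parts(parts, ["id", "fruit"])
print(merged)
```

Because each part is written on its own, no single part "owns" the header, which is one way to see why a header row fits more naturally with single-file output.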

If you want to ensure that your output CSV files contain headers, you can modify your publishing settings to create a single CSV file with a header. Here's how:

1. Navigate to the flow that contains the output object you are trying to run. Click on the output object. From the details panel on the right side of the screen, click the tab that reads "Destinations".

2. The "Destinations" tab shows your configured output actions. Click on the button that reads "Edit" next to the "Manual destinations" header.

3. On the "Publishing settings" screen, hover over your configured publishing action. Click the "Edit" button that appears on the right side of the screen to modify your CSV creation settings.

4. On the right side of the screen, click the link that reads "More options" to display additional CSV publishing settings.

5. Notice that the "Multiple File" option is currently selected. To include headers, check the box that reads "Include headers as first row on creation". This will also change your output to produce a single file instead of multiple files.

6. Click "Update" followed by "Save Settings" to configure this output to produce a single CSV with headers. From this point forward, whenever you run a job, your output will contain headers and exist as one CSV file.

Question 2: Rows are shuffled in the output

This occurs because Dataflow is a distributed processing engine. At any given point during job execution, different chunks of your data may be processed by different worker nodes, so the order of rows can change by the time your final output is produced. Additionally, Dataflow itself does not include a native sort function. However, if you need your output dataset to contain rows in a specific order, you can use the RANK function inside a "New formula" transformation to sort your data. This would be the final step in your recipe. Here's how that function would look in Dataprep:
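As a rough illustration of the idea outside Dataprep (this is Python, not Dataprep or Wrangle syntax), the sketch below shuffles a few rows to stand in for distributed processing, computes a rank over an ordering column, and sorts by that rank. The column `order_id` and the helper `rank` are hypothetical, and the tie handling shown (ties share a rank, the next rank skips ahead) is an assumption about how a typical RANK function behaves.

```python
import random


def rank(values):
    """Return the rank of each value: ties share a rank, the next rank skips ahead."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)  # keep the first position seen
    return [first_position[value] for value in values]


# Hypothetical rows; order_id is the column we want the output ordered by.
rows = [
    {"order_id": 101, "item": "a"},
    {"order_id": 102, "item": "b"},
    {"order_id": 103, "item": "c"},
]

# Stand-in for a distributed engine returning rows in arbitrary order.
random.Random(0).shuffle(rows)

# Attach the rank as a new column, then sort by it as the final step.
for row, row_rank in zip(rows, rank([r["order_id"] for r in rows])):
    row["rank"] = row_rank
restored = sorted(rows, key=lambda r: r["rank"])
print([r["item"] for r in restored])
```

The rank only restores a meaningful order if the ordering column distinguishes the rows, which is why a unique ID is the usual choice.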

Let me know if this helps! If it does, please mark the answer as "Best" so that other users know your question has been resolved. :)

This worked, thank you. One question: what if I don't have a "unique ID" column to sort by in the RANK function? Is there another way to add a row index?

Trifacta_Alumni
Alteryx Alumni (Retired)

If your source data originates from a file, you can use the $sourcerownumber metadata parameter to sort your records in the RANK function.
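The same idea can be sketched in Python: record each row's position as it is read from the file, then sort by that position after processing. This is only an analogy for $sourcerownumber, not its implementation. The column name `source_row_number`, the sample data, and the choice to start numbering at 1 for the first data row are assumptions made for the sketch.

```python
import csv
import io
import random

# Hypothetical source file contents.
source = "name,city\nAna,Lisbon\nBo,Oslo\nCy,Cairo\n"

# Tag every row with its position in the source as it is read.
reader = csv.DictReader(io.StringIO(source))
rows = [dict(row, source_row_number=n) for n, row in enumerate(reader, start=1)]

# Stand-in for processing that returns rows in arbitrary order.
random.Random(7).shuffle(rows)

# Sorting by the recorded position restores the original file order.
restored = sorted(rows, key=lambda r: r["source_row_number"])
print([r["name"] for r in restored])
```

Because the position is captured at read time, it works even when no column in the data is unique.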