Alteryx Designer Cloud Discussions

474b9d412fbd7c578209 · ‎09-30-2021

See screenshots of the data prior to running and after.

Trifacta_Alumni · ‎09-30-2021

Hi @Kimberly Nguyen,

Question 1: Column names not included in your output

You are seeing this behavior because of your CSV publishing settings. By default, when you create a CSV on GCS, Dataprep writes a multipart file, and with that setting the output does not include a header row. Multipart output is the default because it is more reliable if the network is interrupted and faster, since the parts can be written in parallel.

If you want to ensure that your output CSV files contain headers, you can modify your publishing settings to create a single CSV file with a header. Here's how:

1. Navigate to the flow that contains the output object you are trying to run. Click on the output object. From the details panel on the right side of the screen, click the tab that reads "Destinations".

2. The "Destinations" tab shows your configured output actions. Click on the button that reads "Edit" next to the "Manual destinations" header.

3. On the "Publishing settings" screen, hover over your configured publishing action. Click the "Edit" button that appears on the right side of the screen to modify your CSV creation settings.

4. On the right side of the screen, click the link that reads "More options" to display additional CSV publishing settings.

5. Notice that the "Multiple File" option is currently selected. To include headers, check the box that reads "Include headers as first row on creation". This will also change your output to produce a single file instead of multiple files.

6. Click "Update" followed by "Save Settings" to configure this output to produce a single CSV with headers. From this point forward, whenever you run a job, your output will contain headers and exist as one CSV file.
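If you would rather keep the multipart output, the parts can also be stitched together downstream. The following is only a minimal Python sketch of that idea, not something Dataprep provides: the function name `merge_csv_parts`, the part file names, and the header list are all hypothetical, and it assumes every part is headerless and shares the same column order.

```python
import csv


def merge_csv_parts(part_paths, header, out_file):
    """Write `header` once, then append every row from each headerless part.

    Parts are read in sorted path order. This is an assumption made for
    illustration; it does not guarantee the original row order.
    """
    writer = csv.writer(out_file)
    writer.writerow(header)
    rows_written = 0
    for path in sorted(part_paths):
        with open(path, newline="") as part:
            for row in csv.reader(part):
                writer.writerow(row)
                rows_written += 1
    return rows_written
```

You would call it with the list of downloaded part files, the column names you expect, and an open output file, for example `merge_csv_parts(paths, ["id", "name"], open("merged.csv", "w", newline=""))`. Changing the publishing settings as described above is simpler when a single file is acceptable.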

Question 2: Rows are shuffled in the output

This occurs because Dataflow is a distributed processing engine. At any given point in your job execution, different chunks of your data could be processed by different worker nodes. This changes the order of rows in your dataset when your final output is produced. Additionally, Dataflow itself does not include a native sort function.

However, if you need to ensure that your output dataset contains rows in a specific order, you can use the RANK function inside a "New formula" transformation to sort your data. This would be the final step in your recipe. Here's how that function would look in Dataprep:

Let me know if this helps! If it does, please mark the answer as "Best" so that other users know your question has been resolved. :)

474b9d412fbd7c578209 · ‎09-30-2021

This worked, thank you. One question: what if I don't have a "unique ID" column to sort by in the RANK? Is there another way to add a row index?

Trifacta_Alumni · ‎09-30-2021

If your source data originates from a file, you can use the $sourcerownumber metadata parameter to sort your records in the RANK function.
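To see why a source row number is enough to recover the original order, here is a small Python sketch. It is only an illustration of the idea, not how Dataprep or Dataflow work internally: all function names are hypothetical, the 1-based numbering is chosen for the example, and the shuffled chunks merely simulate workers finishing in an arbitrary order.

```python
import random


def add_row_numbers(rows):
    """Pair each row with its 1-based position in the source."""
    return [(i, row) for i, row in enumerate(rows, start=1)]


def process_in_chunks(numbered_rows, chunk_size, seed=0):
    """Simulate distributed workers that return chunks in arbitrary order."""
    chunks = [
        numbered_rows[i:i + chunk_size]
        for i in range(0, len(numbered_rows), chunk_size)
    ]
    random.Random(seed).shuffle(chunks)
    return [item for chunk in chunks for item in chunk]


def restore_order(numbered_rows):
    """Sort by the carried row number and drop it again."""
    return [row for _, row in sorted(numbered_rows, key=lambda pair: pair[0])]


rows = ["alpha", "bravo", "charlie", "delta", "echo"]
shuffled = process_in_chunks(add_row_numbers(rows), chunk_size=2)
print(restore_order(shuffled) == rows)
```

Because every row carries its own position, the order can be rebuilt no matter which chunk finishes first, and no unique ID column is needed.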


How come the following two issues happen when I run a job?

- The column names are replaced with "column1, column2, etc."
- The rows get shuffled out of order.