Alteryx Designer Cloud Discussions


Is there a way to append new data from multiple files before starting the flow?

For example, one file has the first half of the year, and now I want to append the second half without manually merging the files in Excel ahead of time.

Trifacta_Alumni
Alteryx Alumni (Retired)

Hi Mr. Starke, are the files you would like to append stored locally on your computer (e.g. Excel files you are uploading to Trifacta), or are they in a database or Hadoop file system?

I'm working with a bunch of CSV files that are currently on my desktop. I originally started a flow with just one of the files, but now I realize that I should probably combine them before doing all of my calculations in Wrangler. Can I make this work just by uploading the files from my desktop, or should I store them in Hadoop? I'm not sure what would be best, or whether this is even possible.

My company has the enterprise edition of Wrangler, if that makes a difference.

btw, just call me Toni :)

Trifacta_Alumni
Alteryx Alumni (Retired)

You got it, Toni!

There are two ways to approach this.

Option 1:

  1. Import all of your CSV files into the same Flow
  2. Edit the Recipe of one of the files you would like to union
  3. (Optional) If your recipe already has steps, select the first step, click the more options button (the three-dot icon to the right), and choose insert step before. This makes sure the rest of your steps are applied to all of the appended datasets.
  4. Choose the Union transform from the Choose a transformation box
  5. Select the datasets you would like to union together
  6. Select Align by name if all of the columns have names and you want them matched by name, or Align by position if you want them matched by position instead. (Note: you can also match columns manually.)
  7. Add to recipe when all set
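
To make the difference between the two alignment modes in step 6 concrete, here is a rough plain-Python sketch of what appending tables does. It is not how Trifacta implements the Union transform: the helpers `read_rows`, `union_by_name`, and `union_by_position` are hypothetical, the sample rows are made up, and filling unmatched columns with empty strings is an assumption of this sketch (Trifacta's handling of unmatched columns may differ).

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into (header, rows), where rows is a list of string lists."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, list(reader)


def union_by_name(tables):
    """Append tables, matching columns by header name (analogous to Align by name).

    Output columns are the first table's columns, followed by any new names
    found in later tables. Assumption of this sketch: a value missing from a
    table is filled with an empty string.
    """
    columns = []
    for header, _ in tables:
        for name in header:
            if name not in columns:
                columns.append(name)
    out = []
    for header, rows in tables:
        for row in rows:
            record = dict(zip(header, row))
            out.append([record.get(name, "") for name in columns])
    return columns, out


def union_by_position(tables):
    """Append tables, matching columns by position (analogous to Align by position).

    Assumes every table has the same number of columns; the first table's
    header names the output columns and later headers are ignored.
    """
    columns = tables[0][0]
    out = [row for _, rows in tables for row in rows]
    return columns, out


# Hypothetical first-half file, then a second-half file with two columns swapped.
h1 = read_rows("date,region,sales\n2023-01-31,East,100\n")
h2 = read_rows("region,date,sales\nWest,2023-07-31,250\n")

print(union_by_name([h1, h2])[1])
# [['2023-01-31', 'East', '100'], ['2023-07-31', 'West', '250']]
print(union_by_position([h1, h2])[1])
# [['2023-01-31', 'East', '100'], ['West', '2023-07-31', '250']]
```

Because the second file lists `region` before `date`, aligning by position drops the region value into the date column, while aligning by name keeps each value under the right header. When every file has a header row, matching by name is usually the safer choice.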

Option 2:

  1. Upload all of the files to the same directory in Hadoop
  2. In Trifacta, import datasets into your Flow
  3. Locate the directory in HDFS
  4. Click the '+' sign next to the directory, then import & add to flow
  5. This will union (append) all of the datasets together before you even edit the recipe.
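
The idea behind importing a whole directory can be illustrated with a short Python sketch. It reads from a local folder rather than HDFS (talking to HDFS would need a Hadoop client, which is not shown), and `union_directory` is a hypothetical helper, not a Trifacta or Hadoop API. It assumes every file starts with the same header row and reads files in filename order.

```python
import csv
from pathlib import Path


def union_directory(directory, pattern="*.csv"):
    """Append every file matching `pattern` in `directory` into one table.

    Hypothetical local-filesystem stand-in for importing a whole directory.
    Files are read in sorted filename order, and every file is assumed to
    start with the same header row; a mismatch raises ValueError.
    """
    header = None
    rows = []
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path.name}: header {file_header} differs from {header}")
            rows.extend(reader)  # data rows only; the header was consumed above
    return header, rows
```

Because the file list is rebuilt on every call, dropping a new month's file into the folder is enough for the next run to include it, which mirrors why a directory import suits a flow you plan to re-run regularly.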

Hope this helps!

Thanks so much! I think I'm going to go with Option 1 for now. It seems more straightforward for the short term.

Option 2 will be great if I decide that I want to run this flow every month without having to keep adding new files to the flow manually, but I'm not there yet. Really appreciate the thorough response.