Alteryx Designer Cloud Discussions

4c278664ee974166e813 · ‎09-12-2022

Having a problem with automatic concatenation. when I import a whole folder with multiple files in it it concats automatically.

About the files in folder:

most of the columns are same same in the files, but some files might have an extra column that other files does not contain. Also its possible that one common column is not in one specific file. Below image explains better the problem and under the image i have desired outcome described.

on the left you can see 1,2,3 which indicates the files. what u see above the firs black line is the data from file 1 and so on.

Looking at the image, Priority column is not part of the file 2 data (data between 2 black line). so instead of Priority Trifacta put the next column values under the wrong header. So header indexes are mixed and the concatenation is wrong.

I was hoping that Trifacta could detect this autoamtically and the column that is not part of the files would automatically be null values.

Same applies to the header in blue circle. Jira ID field is not part of the 1 and 3 data set but its part of second file. So I need new column called JIra ID and files with no such column should have null values.

AMiller_Tri · ‎09-13-2022

Hi @Giorgi Gobronidze? , thank you for your question!

I've noticed you've already discussed the matter with @Apeksha Prasad? in this thread.

To summarize Apeksha's last suggestion for future reference -

___

It is going to produce incorrect data. To handle this situation, we suggest the below ways:

#1. Correct the table structure before importing the data in Trifacta and use the parameterization functionality to upload or union as a first step in the recipe of your flow.

#2. Import the data into the flow as individual data sets and create recipes for each data set. Prepare the table structure as needed. Union them in a branched-out recipe to get the final result.

___

See what Apeksha is suggesting: instead of using the input parameterization feature, use the union tool from within the Transformer Grid itself (or create a union step from the Flow Canvas). Or fix all the schemas so they're shared, and only then use input parameterization.

That's because when columns are missing or being added in the middle of the schema (aka - 1st table has columns A|B|C and table 2 has either columns A|C or A|D|B|C), Trifacta can't "guess" what you, the user - expect it to do. The input parameterization method is meant for files with a shared, static schema.

An exception would be if the 1st table has a schema of A|B|C and a later table has a schema of A|B. So, the last column would simply yield empty\null values; but that's the extent of flexibility that parameterized input can have.

To your point, input parameterization doesn't have the logic to search for similarity in column names between headers; that ability is reserved only for the manual union tool (where there's the option to "auto-align")

However, that's an excellent idea for a feature! Please post it in the ideas section on this website, and if it gets enough traction from the community - it might get added in future versions of the product!

Regarding the matter, might I ask how many tables you're looking to union?

As it stands, our current suggestion is to use the manual union tool - and align the columns precisely as you see fit. You'd be surprised how helpful the union tool can be, even if it's not entirely "automatic". Only columns that don't align in name will be dropped to the lower section, and you'll be able to fix them in no time!

Please see documentation regarding the union tool here and input parameterization here

I hope this helps; please feel free to ask any further questions.

Thanks,

Amit.

4c278664ee974166e813 · ‎09-13-2022

Could you please explain step 1 in more detail. Where do I do that and how. Maybe attach the image. Im pretty new to tool.

To tour question :how many tables you're looking to union?

Number of tables will increase every month per new project.

AMiller_Tri · ‎09-13-2022

Hi @Giorgi Gobronidze? ,

Step 1 - we're referring to the option of fixing the structure of the files outside of Trifacta. E.g., in Excel\the database where the tables reside.

The "step 2" refers to something similar, but instead of fixing the structure outside of Trifacta - import all the files to a flow in Trifacta, fix the structures of all the files separately and then follow up with a union step.

Alternatively, you can use the union step from the get-go and the Union's UI to navigate the different schemas.

4c278664ee974166e813 · ‎09-13-2022

well, unfortunately to fix the4 structure outside of the Trifacta is out of option.

Alteryx Designer Cloud Discussions

Concat data