Alteryx Designer Cloud Discussions

4c278664ee974166e813 · ‎09-09-2022

Hello,

I have a folder with multiple files inside (most of the columns are the same, but some files might have an extra column that the other files don't have).

When I import the folder into Trifacta from an S3 bucket, it automatically concatenates the files, which is great, but there is one problem: it repeats the column names.

How can I fix it?

APrasad_Tri · ‎09-11-2022

Hi @Giorgi Gobronidze, thank you for reaching out!

This is happening because the column naming conventions differ between files. For example, date_added and Date Added are not the same name, so Trifacta treats them as two different values.

To clean up the column entries, add a step after import that deletes the rows containing text. For example, we know that the date_added field should contain only date-related values and that #_of_occurence should be an integer value, so we can delete any row that has text in the #_of_occurence column.
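As a rough illustration of the idea above (outside Trifacta, in plain Python), the sketch below treats date_added and Date Added as the same name after normalization and drops any data row that is really a repeated header line. The helper names `normalize_name` and `drop_repeated_headers` and the sample rows are hypothetical, made up for this example; this is not a Trifacta API.

```python
import re

def normalize_name(name):
    """Lower-case a column name and turn runs of spaces/punctuation into '_'."""
    return re.sub(r"[^0-9a-z#]+", "_", name.strip().lower()).strip("_")

def drop_repeated_headers(rows, header):
    """Keep only rows that are not a repeat of the header line.

    Rows and header are compared after normalization, so a header written
    as 'Date Added' still matches 'date_added'.
    """
    norm_header = [normalize_name(h) for h in header]
    return [r for r in rows if [normalize_name(c) for c in r] != norm_header]

# Hypothetical sample: the second row is a header line from another file
# that ended up in the data after concatenation.
header = ["date_added", "#_of_occurence"]
rows = [
    ["2022-09-01", "3"],
    ["Date Added", "# of occurence"],
    ["2022-09-02", "5"],
]
clean = drop_repeated_headers(rows, header)
```

In Trifacta itself the equivalent would be a recipe step that deletes the rows where the integer column contains text, as described above.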

Let me know if this resolves the issue and feel free to get back to us for any issues.

Best,

Apeksha Prasad

4c278664ee974166e813 · ‎09-12-2022

The problem was caused by one of the files having an extra column that is not shared with the other files. I think I have to concatenate them manually first to fix the problem.

4c278664ee974166e813 · ‎09-12-2022

Hey Apeksha,

I can't seem to fix the problem. As I mentioned above, some files have one extra column or fewer columns.

Here are some examples.

File 1 has columns:

A B C D

File 2 has columns:

A B C H D

Or, a second example:

File 1 columns:

A B C D E

File 2 columns:

A B D E

This causes the problem mentioned above. Any idea how this can be solved?

APrasad_Tri · ‎09-12-2022

Hi @Giorgi Gobronidze,

I see the problem you are trying to resolve. When we do a union, we need the same table structure across all tables. That means the column names, the order of the columns, and the total number of columns must all match. For example:

Table 1 has ABC columns

Table 2 has ABCD columns

Table 3 has ABD columns

A union or parameterization will pick Table 1 and add the data from all the other tables beneath it as:

ABC

ABCD

ABD

This is going to produce incorrect data. To handle this situation, we suggest one of the following approaches:

#1. Correct the table structure before importing the data into Trifacta, then use the parameterization functionality to upload or union the files as the first step in the recipe of your flow.

#2. Import the data into the flow as individual datasets and create a recipe for each dataset. Prepare the table structure as needed, then union them in a branched-out recipe to get the final result.

Please let us know if this helps.
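For approach #1, one possible way to correct the table structure before import is to align the files by column name rather than by position. The sketch below is only an illustration in plain Python using the standard csv module; the function name `union_by_name` is hypothetical, and the two in-memory strings stand in for the real files (File 1 with A B C D, File 2 with A B C H D). Columns missing from a file are filled with empty strings.

```python
import csv
import io

def union_by_name(csv_texts):
    """Concatenate CSV tables by column name instead of by position.

    Returns (columns, rows), where columns is the union of all column
    names in first-seen order and missing values are empty strings.
    """
    columns = []
    records = []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        records.extend(reader)
    rows = [[rec.get(col, "") for col in columns] for rec in records]
    return columns, rows

# Hypothetical stand-ins for the two files from the first example.
file1 = "A,B,C,D\n1,2,3,4\n"
file2 = "A,B,C,H,D\n5,6,7,8,9\n"

columns, table = union_by_name([file1, file2])
```

Writing the result back out with `csv.writer` would give a single file with a consistent structure, which could then be imported or parameterized as usual.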

Thanks,

Apeksha Prasad


Importing and concatenation