Hello,
I have a folder containing multiple files. Most of the columns are the same, but some files have an extra column that the other files do not have.
When I import the folder into Trifacta from an S3 bucket, it automatically concatenates the files, which is great, but there is one problem: it repeats the column names.
How can I fix it?
Hi @Giorgi Gobronidze, thank you for reaching out!
This is happening because the naming conventions of the columns are different. For example, date_added and Date Added are not the same name, so Trifacta treats them as two different values.
To clean up the column entries, please add a step after import that deletes the rows containing text where it does not belong. For example, we know that the date_added field should contain only date-related values and that #_of_occurence should be an integer value, so we clean up by deleting any row that has text in the #_of_occurence column.
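To illustrate the logic of that cleanup step, here is a minimal Python sketch that runs outside Trifacta. It is not Trifacta recipe syntax. The helper names `is_integer` and `drop_non_integer_rows` and the sample rows are hypothetical; only the column names come from the example above.

```python
def is_integer(value: str) -> bool:
    """Return True if the string can be parsed as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def drop_non_integer_rows(rows, column):
    """Keep only the rows whose value in `column` parses as an integer."""
    return [row for row in rows if is_integer(row[column])]


# Hypothetical sample: the second row is a repeated header that ended up as data.
rows = [
    {"date_added": "2021-01-05", "#_of_occurence": "3"},
    {"date_added": "date_added", "#_of_occurence": "#_of_occurence"},
    {"date_added": "2021-01-06", "#_of_occurence": "7"},
]

cleaned = drop_non_integer_rows(rows, "#_of_occurence")
```

After this runs, `cleaned` holds only the two rows with numeric counts, and the repeated header row is gone.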
Let me know if this resolves the issue and feel free to get back to us for any issues.
Best,
Apeksha Prasad
The problem is caused by one of the files having an extra column that is not shared with the other files. I think I have to concatenate them manually first to fix it.
Hey Apeksha,
I can't seem to fix the problem. As I mentioned above, some files have one extra column, or fewer columns, than the others.
Let me give some examples.
File 1 has columns:
A B C D
File 2 has columns:
A B C H D
Or, as a second example:
File 1 columns:
A B C D E
File 2 columns:
A B D E
This causes the problem mentioned above. Any idea how this can be solved?
Hi @Giorgi Gobronidze,
I see the problem you are trying to resolve. When we do a union, we need the same table structure across all tables. This means the column names, the sequence of the columns, and the total number of columns must be the same. For example:
Table 1 has ABC columns
Table 2 has ABCD columns
Table 3 has ABD columns
Union or parameterization will pick table 1 and add the data from all the other tables beneath it as:
ABC
ABCD
ABD
This is going to produce incorrect data. To handle this situation, we suggest one of the following approaches:
#1. Correct the table structure before importing the data into Trifacta, then use the parameterization functionality to upload or union the files as the first step in the recipe of your flow.
#2. Import the data into the flow as individual datasets and create a recipe for each dataset. Prepare the table structure as needed, then union them in a branched-out recipe to get the final result.
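As a rough illustration of approach #1, the table structure could be corrected with a small script before the files are uploaded. The sketch below uses only Python's standard `csv` module and assumes the files are comma-separated with a header row. The function names (`read_table`, `union_columns`, `align_and_union`) and the sample data are hypothetical, and filling missing columns with an empty string is an assumption, not a Trifacta requirement.

```python
import csv
import io


def read_table(text):
    """Parse CSV text into (header, rows), where rows are dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(text))
    header = list(reader.fieldnames or [])
    return header, list(reader)


def union_columns(headers):
    """Collect every column name once, in order of first appearance."""
    merged = []
    for header in headers:
        for name in header:
            if name not in merged:
                merged.append(name)
    return merged


def align_and_union(texts, fill=""):
    """Write all tables under one shared header, filling missing columns with `fill`."""
    tables = [read_table(text) for text in texts]
    columns = union_columns([header for header, _ in tables])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval=fill, lineterminator="\n")
    writer.writeheader()
    for _, rows in tables:
        writer.writerows(rows)
    return out.getvalue()


# Hypothetical contents matching the first example in this thread.
file_1 = "A,B,C,D\n1,2,3,4\n"
file_2 = "A,B,C,H,D\n5,6,7,8,9\n"

combined = align_and_union([file_1, file_2])
```

Here `combined` has the single header `A,B,C,D,H`. The row from file 1 gets an empty value for H, and the row from file 2 is matched by column name rather than by position, so its D value stays under D. Approach #2 does the same alignment inside Trifacta, one recipe per dataset, before the union.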
Please do let us know if this helps.
Thanks,
Apeksha Prasad