Importing and concatenation

Hello,

 

I have a folder containing multiple files. Most of the columns are the same, but some files may have an extra column that the other files lack.

 

When I import the folder into Trifacta from an S3 bucket, it automatically concatenates the files, which is great, but there is one problem: it repeats the column names.

 

How can I fix it?

 

 

APrasad_Tri
Alteryx Alumni (Retired)

Hi @Giorgi Gobronidze, thank you for reaching out!

 

This happens because the columns are named differently across the files. For example, Trifacta treats date_added and Date Added as two different values.

 

To clean up the column entries, add a step after import that deletes the rows containing text. For example, we know that the date_added field should contain only date values and that #_of_occurence should be an integer, so we can delete any row that has text in the #_of_occurence column.
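To illustrate the cleanup idea outside Trifacta, here is a minimal Python sketch. It assumes the concatenated data is available as CSV text in which a second file's header was appended as an ordinary row. The sample data and the helper name is_int are hypothetical, and this is not Trifacta recipe syntax.

```python
import csv
import io

# Hypothetical concatenated data: the second file's header row
# ended up in the middle of the data as an ordinary row.
raw = """date_added,#_of_occurence
2021-01-05,3
Date Added,# of occurence
2021-02-11,7
"""


def is_int(value: str) -> bool:
    """Return True if the value parses as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


rows = list(csv.DictReader(io.StringIO(raw)))

# Keep only rows whose count column holds an integer; the stray
# header row contains text there and is dropped.
clean = [row for row in rows if is_int(row["#_of_occurence"])]

for row in clean:
    print(row["date_added"], row["#_of_occurence"])
```

The same test could be applied to any column with a known type, such as checking that date_added parses as a date.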

 

Let me know if this resolves the issue, and feel free to get back to us with any further questions.

 

Best,

Apeksha Prasad

The problem occurs because one of the files has an extra column that is not shared with the other files. I think I have to concatenate them manually first to fix it.

Hey Apeksha,

 

I can't seem to fix the problem. As I mentioned above, some files have one extra column or fewer columns than the others.

 

Let me give some examples.

 

File 1 has columns:

 

A B C D

 

File 2 has columns:

A B C H D

 

 

Or, a second example:

 

File 1 columns:

 

A B C D E

 

File 2 columns:

 

A B D E

 

This causes the problem mentioned above. Any idea how it can be solved?

APrasad_Tri
Alteryx Alumni (Retired)

Hi @Giorgi Gobronidze,

 

I see the problem you are trying to resolve. When we do a union, all tables need the same structure: the column names, the order of the columns, and the total number of columns must match. For example:

Table 1 has ABC columns

Table 2 has ABCD columns

Table 3 has ABD columns

 

Union or parameterization will take Table 1 as the base and add the data from all tables beneath it, as follows:

ABC

ABCD

ABD
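As a rough illustration of why this goes wrong, the Python sketch below stacks three small tables by position under the first table's header. It is an assumption about how a purely positional union behaves (extra values are cut off here), not Trifacta's actual implementation, and the sample values are made up.

```python
# Each table is (header, rows). Sample values are hypothetical.
tables = [
    (["A", "B", "C"], [["a1", "b1", "c1"]]),
    (["A", "B", "C", "D"], [["a2", "b2", "c2", "d2"]]),
    (["A", "B", "D"], [["a3", "b3", "d3"]]),
]


def positional_union(tables):
    """Stack rows under the first table's header, ignoring column names."""
    header = tables[0][0]
    stacked = []
    for _, rows in tables:
        for row in rows:
            # Pad short rows and cut long ones to the base width (assumed behavior).
            fitted = (row + [""] * len(header))[: len(header)]
            stacked.append(dict(zip(header, fitted)))
    return stacked


stacked = positional_union(tables)
for record in stacked:
    print(record)
```

In this sketch the value from Table 3's column D ends up under column C, and Table 2's column D value is lost, because only positions are compared.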

 

Stacking the tables this way produces incorrect data. To handle this situation, we suggest one of the following approaches:

 

#1. Correct the table structure before importing the data into Trifacta, then use the parameterization functionality to upload or union the files as the first step in the recipe of your flow.

 

#2. Import the data into the flow as individual data sets and create recipes for each data set. Prepare the table structure as needed. Union them in a branched-out recipe to get the final result.
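For approach #1, one way to give every file the same structure before import is to align the columns by name and fill the gaps with empty values. The Python sketch below shows the idea on the A B C D / A B C H D example from the question. The function name union_by_name, the sample values, and the choice of an empty string as filler are all assumptions for illustration.

```python
# Each table is (header, rows). Sample values are hypothetical.
file_1 = (["A", "B", "C", "D"], [["a1", "b1", "c1", "d1"]])
file_2 = (["A", "B", "C", "H", "D"], [["a2", "b2", "c2", "h2", "d2"]])


def union_by_name(tables, fill=""):
    """Align tables on column names; missing columns get the fill value."""
    # Collect every column name in first-seen order.
    columns = []
    for header, _ in tables:
        for name in header:
            if name not in columns:
                columns.append(name)

    aligned = []
    for header, rows in tables:
        for row in rows:
            record = dict(zip(header, row))
            aligned.append({name: record.get(name, fill) for name in columns})
    return columns, aligned


columns, aligned = union_by_name([file_1, file_2])
print(columns)
for record in aligned:
    print(record)
```

Once every file is rewritten with the full column list in the same order, the files share one structure and can be stacked safely. Note that this matches names exactly, so date_added and Date Added would still count as different columns unless they are renamed first.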

 

Please do let us know if this helps.

 

Thanks,

Apeksha Prasad