Alteryx Designer Cloud Discussions

404c8e6ac806a3c05f90 · ‎10-09-2020

Hi everyone!

I'm wrangling a dataset that most columns are divided in multiple lines. In the image I posted, for example, the number 13.7500 and 13.9024 should be in the same column, but in the first and second row respectively. And the values between these numbers should be on separated columns. So, each blue box in the image represents a single row.

Anyone knows how to wrangle this?

Thanks in advance!

AMiller_Tri · ‎10-09-2020

Hi @Daniel Sacramento? , thanks for posting this question.

If you could elaborate a little on the issue.

are the values in each blue box currently in 1 field each, and you'd like them separated into columns? Which means - you're currently having 2 records, but with 1 column - instead of multiple?
or something else?
If you could send a screenshot of how the data looks like within the transformer page it could give me some context.

In general - it looks like selecting the multiple spaces between 2 values, then hitting the suggestion option of "split upon" should solve this, just wanting to make sure I understand

Thanks,

Amit.

404c8e6ac806a3c05f90 · ‎10-12-2020

Hi Amit, thanks for your answer!

Your first point is correct. All the records are in "column1". After the wrangling, the blue box should be "flattened", so each value corresponds to a column.

I will post another screenshot, I hope this helps!

AMiller_Tri · ‎10-13-2020

Thanks for the clarification @Daniel Sacramento? .

So now we see that, not only all the values are in 1 column - but what you refer to as "1 record" is spread across multiple records.

Is this how Trifacta shows the data to you, the moment you open the file?

The reason I'm asking that is because defaultly - Trifacta create a new line each time it's bumped into "newline" (\n).

So, you'd like it to drop to new line only... after "a few newLines".

In order to understand better how your file is structured, you can -

click "remove structure" from the file, once imported into Trifacta. That way, you'll have fine-grained control as to when a new line is being created.
Send the file here (or at least a portion, or only these 2 records I'm seeing), so we'll get a better look at it, and understand what's causing it to drop lines the way it does.

Thanks,

Amit.

404c8e6ac806a3c05f90 · ‎10-13-2020

Thanks again @Amit Miller? .

When I import a new file into Trifacta, by default I uncheck the "Detect file structure", or something along those lines. I will send you the file here. When you open it, you will see that the first rows are informations about the measurement and the data itself starts at row 54.

Best regards.

AMiller_Tri · ‎10-13-2020

Thanks for the file @Daniel Sacramento?

You don't necessarily need to "unstructure" the file. Simply create a rownum() column (out of the $sourcerownum metadata parameter) and then remove all records which are of rownum=53 and lower (where the numbers start).

Now, this has been very interesting - there are 4 possible solutions (at least). I will present you with all of them. Guessing the first one is the most straight-forward.

The main goal here is to make sure all relevant numbers are in the same record.

In order to do that, we observe how many records actually belong to the same original row.

As per your explanation, each "original row" is composed out of 5 records, presented in Trifacta.

In order to emphasis that each 5 consequent records are "1 row", we can create this column:

ceiling(rownumber() / 5)

ordered by: $sourcerownum

so the first 5 records will have value of 1, next 5 records will have value of 2, etc.

So now, each 5 records have same value in rownum column.

Now we can use the pivot() transformation to change the structure of the table.

rowlabel - rownumcolumn
values - list(numbersColumn)

now for each record, we have all the numbers we need! (in an array)

option A

Now, do

"mergearrayelements"
split column by delimiter (get help through suggestions). Delimiter = ` +` (unlimited number of spaces, because it changes between 3 to 5 spaces)
in "advanced", decide how many columns to create (spoiler - you need ~25))

Option B

instead of mergingArrayElements - create a new array, with each element representing a number.

so we'll do "extract matches to array" from the array we've created, while the element pattern to extract will be `{number}`. Using Trifacta Pattern to our aid!

Now that we have an array of 25 numbers - we can use "unnest", and you can manually write all the elements you want to extract (so elements 0-24).

Options C+D are a bit less straightforward (revolving around range() function, creating objects, and unpivoting the table))

If you're curious, I can elaborate on those aswell, either through here or through a short zoom session.

Hope this makes sense, feel free to ask questions!

Thanks,

Amit.

404c8e6ac806a3c05f90 · ‎10-19-2020

Sorry for the delayed response, I was working on another project. But I can't thank you enough @Amit Miller? , very helpful and clever solution! I used the option A and worked like a charm!

AMiller_Tri · ‎10-19-2020

Great, glad to hear that @Daniel Sacramento? !

Happy wrangling.

Alteryx Designer Cloud Discussions

How to turn multiple lines in a single row?