Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Cloud Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Cloud.
SOLVED

If the pattern in the main column is different in every rows, how to solve and add the similar values in columns?

TrifactaUsers
10 - Fireball

Working on Healthcare Import data annually, and most of the columns are comparably in good state such as date, total amount, price, origin etc... Beside those values, the main column consisting the important information such as dosage, package No., type, trademark, manufacturer, usage sector etc... are all written in 1 cell of a column for relating rows and the great problem is each row is different than others around 300-500 paths respectively. Example:

Column A

Row 1 - Dosage, Package No, Type, Trademark, Manufacturer, Usage

Row 2 - Trademark, Size, Type, Dosage, Usage

Row 3 - Package No, Usage, Dosage, Type

Row 4 - Manufacturer, Usage, Size, Dosage

 

Alternative version:

Row 1 - A, B, C, D, E

Row 2 - B, E, D, C, J

Row 3 - L, M, P, A, .....

 

Most of the row details are delimited by colon ",". And when spliting by delimitor the details are splited but you can see that the column which i want name Dosage has trademark, package No, Manufacturer details in it.

 

Is there a solution to make them in correct column positions?

8 REPLIES 8
Trifacta_Alumni
Alteryx Alumni (Retired)

Hi Tsolmon, Do you have a way of identifying what each row value is, whether it's a pattern, lookup table, etc.? For example, if each component of a row was a unique cell value, do you have a way of identifying that a dosage is a dosage, a package number is a package number and not a trademark, a manufacturer is a manufacturer and not a type, etc.?

If you could provide an example row, that would help me as well!

TrifactaUsers
10 - Fireball

Hello, Mr. David

 

Thanks for reply,

In attached please find the sample data for reference.

In sample file, there are English and Local versions plus the expected result when using Trifacta.

 

Please kindly note that data entrying source is man made, not a scanned bar code. Because of it, there are many peoples making data entry to the main system by each of their perspectives and output data is similar to this or more complex.

 

There is no identifier for noticing patterns and dosage is not a type or package etc.. But when seeing it, it's identifiable.

 

My process after split and extract process:

  1. Standardize the patterns. Example: CAP, caps, CapsUle, etc = Capsule (Stacked here)
  2. And putting the correct values in correct columns. (Stacked here)

 

Am i need to export file to Excel etc to need a manual wrangle and again to import to Trifacta and so on?

 

Any reply would be appreciated.

 

Trifacta_Alumni
Alteryx Alumni (Retired)

Hi Tsolmon, thanks for sending this!

 

I'll use your product column as an example, and then hopefully it will give you an idea of how you might apply this to the rest of your data

 

Here are the steps I would recommend (there are quite possibly many solutions out there, this is the one that comes to mind for me).

 

Step 1: Create a lookup table mapping each possible value to a number representing the order you would like the resulting columns to be in.

 

Step 2: In Trifacta, Extract all of the values in the product column between the commas as a list.

This step will look like this:

Transformation: Extractlist

Col: Product

On: `{any}+`

Delimiter: ','

As: Product_List

This will create a list of all of the values present in each cell.

 

Step 3: Flatten the list

This step will take each value in the Product_List array and create a new unique row for it, duplicating the values from the other columns

Transformation: Flatten

Col: Product_List

Step 4: Now, you want to do a lookup on the now Flattened Product_List column. Before doing the lookup I trimmed leading and trailing whitespace, then made both the extractlist_product column and the product column in the lookup table uppercase.

This will return a number for each value in the product_list column listed under the Mapping column

 

Step 5: Pivot the Mapping column, use the Any() function to return the Product values, group by all of the other columns in your dataset to return the correct number of rows.

Transformation: Pivot

Column: Mapping

Functions: Any(Product_List)

Group by: Product, columnA, columnB, etc.

which will give the following result:

 

This there may need to be some extra work done, and the lookup table may need to be more complex (especially for the indication column), but hopefully this will serve as a guide for how you could possibly accomplish this.

 

Let me know if you have any additional questions!

David

TrifactaUsers
10 - Fireball

Hello, David

 

Thanks for the very detailed guide.

Followed the all steps and managed.

 

It gives me more options and insights for manipulating in wrangle process. I would like ask one more question about the lookup table manually created, in Pro or Enterprise versions is manually created lookup table necessary or there is other solutions to manipulate it in the wrangling process?

 

Thanks for your kind support on guiding,

 

 

Trifacta_Alumni
Alteryx Alumni (Retired)

Hi Tsolmon,

There aren't any differences in the functions or transformations available between the free, pro and enterprise versions. For more info on how they differ you can look at Triafcta's products page on the website https://www.trifacta.com/products/.

 

Another way to accomplish the above without having to create a lookup table outside of Trifacta would be the following:

 

Transformation: Derive

Formula: CASE([IN(extractlist_Product, ["Medicine"]), 1, IN(extractlist_Product, ["injection", "solution for injection", "drops"]), 2, ...., 5)

As: Mapping

 

What the above says is if the current row in extractlist_Product is in the array containing "Medicine", output 1, if it's in the array containing "injection", "solution for injection", or "drops", output 2 and so on.

 

It might be easier for you to do it this way, it is a matter of personal preference.

 

But to your original question, this would be available in all versions.

 

Best,

David

 

TrifactaUsers
10 - Fireball

Hello, David

Thanks for the alternate option.

 

Please check the following issues:

  1. The standardize function (clustering) i am confusing about

On above figure i couldn't find the function. I have tried to change the types of column to String, Date/Time, # etc... to find the STANDARDIZE function but no success.

But on the youtube video there is STANDARDIZE function below the LOOKUP function

 

2. The PATTERNS option is not available on Free Wrangler

 

It's also confusing, and that is the reason i thought that Free, PRO, Enterprise versions are different in functions. And some hypothesis that this pattern is locked and is it somehow influencing the notifying patterns in the process?

 

 

Trifacta_Alumni
Alteryx Alumni (Retired)

Hi Tsolmon, the video you are referencing is from a much older version of the product. The standardize feature was deprecated after that version. We have tried to hide all the videos that make reference to that feature but apparently we have missed some. Apologies for that being confusing!

 

As for Patterns, yes, I forgot that feature did not make it into the free version this past release. So yes, there is actually a slight difference in the wrangling capabilities of Free vs Pro vs. Enterprise for this current release. Again, apologies for the confusion!

 

TrifactaUsers
10 - Fireball

Hello, David

 

Thanks for the clarification.

Now it's clear.

 

Thanks for your support on guided steps.