Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Cloud Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Cloud.
SOLVED

Merging list values of a column ( technology) when another column have similar values but not the same ( company name).

AntonioS
5 - Atom

Hi team. Great Job. One business case that i'm working. I need to merge the values of one column ( technology) when companies name are similar ( company column). Example:

Let's say : 1st column ( company name) values that are similar ( amc, amc sl, amc automotive ) and 2nd column ( technology). There is a list of technology used in the company.

 

The outcome that i would like to have it's having only one row with a company name: amc sl , and the technology values merged, and sorted. ("autodesk inventor", ...)

 

Any help for this? thanks

4 REPLIES 4
Trifacta_Alumni
Alteryx Alumni (Retired)

Hi @Antonio S? This is a good question. I will put down the details for multiple scenarios to solve this. Here's the first scenario. The assumption i am making is all companies are unique ex: companies like amc, amc sl, amc automative will stay unique.

 

So first step is to create an array column of unique Technology grouped by Company as below using UNIQUE function:

 

Second step is to Pivot the data with Row labels as Company because we want grouped by Company and under values we take

Any(unique_Technology) and UNIQUE(Company,100)

Third we will concatenate the array columns using function ARRAYCONCAT()

Finally use ARRAYSORT function to sort the merged array column.

Hope this helps.

 

Best

Vardan

 

Trifacta_Alumni
Alteryx Alumni (Retired)

The issue with similar company names is really the business rule you want to formulate to identify similar named companies. In my hypothetical example Google, Google inc and Google drive belong to one entity Google. You could use a cluster cleaning feature in Trifacta to uncover similar sounding/written clusters and standardize them. This blog is helpful to know more about cluster cleaning https://www.trifacta.com/blog/trifacta-for-data-quality-introducing-smart-cleaning/

 

However you could probably run into scenarios that may need some kind of hardcoding and using doublemetaphone functions.

In my hypothetical scenario I can extract first four alphabets and use that column as a group by Company column in my above answer. But the demerit is this solution may not be scalable and accurate 100%.

 

I hope this somewhat helps your cause. In any case it would help me to know your business case in more details from the complexity and scalability and come with a better response.

 

Best,

Vardan

AntonioS
5 - Atom

Hi @Vardan Negi? thanks for your answer. Unique Function helped for 2 things #1 to clean list and duplicates in the array , creating unique values, and #2 helped for grouping the unique companies as your example. Great.

 

Another thing that you recommended is very useful. The cluster function and cluster cleaning helped with this case of different companies names. I arranged and standardise with similar written companies ( in my case company names: amc and amc sl are the same company, only difference is second one is legal name, and first name is the company name).

 

As you said i ran this different scenarios, and i had to check the clusters to standardise the data.

Clustering is quite accurate , running some manual standardisation help me to accomplish the job.

 

+1

Thanks for the answer.

Trifacta_Alumni
Alteryx Alumni (Retired)

Glad you found it helpful.