Alteryx Designer Cloud Discussions

AntonioS · ‎11-20-2019

Hi team. Great Job. One business case that i'm working. I need to merge the values of one column ( technology) when companies name are similar ( company column). Example:

Let's say : 1st column ( company name) values that are similar ( amc, amc sl, amc automotive ) and 2nd column ( technology). There is a list of technology used in the company.

The outcome that i would like to have it's having only one row with a company name: amc sl , and the technology values merged, and sorted. ("autodesk inventor", ...)

Any help for this? thanks

Trifacta_Alumni · ‎11-20-2019

Hi @Antonio S? This is a good question. I will put down the details for multiple scenarios to solve this. Here's the first scenario. The assumption i am making is all companies are unique ex: companies like amc, amc sl, amc automative will stay unique.

So first step is to create an array column of unique Technology grouped by Company as below using UNIQUE function:

Second step is to Pivot the data with Row labels as Company because we want grouped by Company and under values we take

Any(unique_Technology) and UNIQUE(Company,100)

Third we will concatenate the array columns using function ARRAYCONCAT()

Finally use ARRAYSORT function to sort the merged array column.

Hope this helps.

Best

Vardan

Trifacta_Alumni · ‎11-20-2019

The issue with similar company names is really the business rule you want to formulate to identify similar named companies. In my hypothetical example Google, Google inc and Google drive belong to one entity Google. You could use a cluster cleaning feature in Trifacta to uncover similar sounding/written clusters and standardize them. This blog is helpful to know more about cluster cleaning https://www.trifacta.com/blog/trifacta-for-data-quality-introducing-smart-cleaning/

However you could probably run into scenarios that may need some kind of hardcoding and using doublemetaphone functions.

In my hypothetical scenario I can extract first four alphabets and use that column as a group by Company column in my above answer. But the demerit is this solution may not be scalable and accurate 100%.

I hope this somewhat helps your cause. In any case it would help me to know your business case in more details from the complexity and scalability and come with a better response.

Best,

Vardan

AntonioS · ‎11-23-2019

Hi @Vardan Negi? thanks for your answer. Unique Function helped for 2 things #1 to clean list and duplicates in the array , creating unique values, and #2 helped for grouping the unique companies as your example. Great.

Another thing that you recommended is very useful. The cluster function and cluster cleaning helped with this case of different companies names. I arranged and standardise with similar written companies ( in my case company names: amc and amc sl are the same company, only difference is second one is legal name, and first name is the company name).

As you said i ran this different scenarios, and i had to check the clusters to standardise the data.

Clustering is quite accurate , running some manual standardisation help me to accomplish the job.

+1

Thanks for the answer.

Trifacta_Alumni · ‎11-26-2019

Glad you found it helpful.

Alteryx Designer Cloud Discussions

Merging list values of a column ( technology) when another column have similar values but not the same ( company name).