Alteryx Designer Cloud Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Cloud.
SOLVED

How do I normalize a code and descriptor (two columns) to select only a single variation among many descriptors for a given code?

ahouhoulis
5 - Atom

For example, my data set includes many rows of transactions with the same vendors over time, each vendor represented by a unique Vendor Number. Each vendor's name, however, may vary over the years, so I end up with multiple distinct Vendor Names for any given distinct Vendor Number.

 

The end result I need is a single Vendor Name for any given Vendor Number. In the absence of a Vendor Table to pull from, I'd like to pick the most recently-paid transaction of a given Vendor Number and use its Vendor Name only.

 

Thoughts? I know how to do this in SQL, but I'm trying to leverage Trifacta's native functions and practices.

 

3 REPLIES 3
AMiller_Tri
Alteryx Alumni (Retired)

Hi @Adam Houhoulis? , thank you for your question!

You can achieve the desired result in a few recipe steps:

1.Create a ROWNUMBER() column

First, we'd like to identify which record has the most up-to-date vendor name for each ID. To do this, we'll use the window function ROWNUMBER(). We'll sort by date (descending, hence a minus ('-') sign before the column) and group by company ID:

 

2.erase values that aren't the current company name

Now we ensure that all the values in the company name column that aren't in the row with the rownum==1 will be replaced with NULL values. We'll do this by using the IF() function:

Note that instead of creating the 'rownum' column and using it, we could use the MAXDATE() function to indicate which record has the most up-to-date company name:

Although it saves us a step, I think the ROWNUMBER() approach is more readable and would recommend it.

 

3.fill in the blanks

Now, for each company ID, we have one company name; the other values of this column are empty for each group.

We can use the FILL() function to fill in the gap:

  • grouping by company_id column
  • reverse-sorting by date

 

Now for each company_id, the company name will always be the same; the most up-to-date company name.

Note that this is dynamic; the newest company name will be chosen each time you run the job.

 

also, these steps can be done in 1 recipe step via nesting all the functions:

I would recommend splitting the process into at least 2 steps to make the process more readable.

 

I hope this makes sense; please feel free to ask any additional questions.

 

Thanks,

Amit.

 

ahouhoulis
5 - Atom

Excellent! Thank you so much @Amit Miller? !

AMiller_Tri
Alteryx Alumni (Retired)

You're very much welcome @Adam Houhoulis? ! =]