Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Cloud Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Cloud.
SOLVED

How do I normalize a code and descriptor (two columns) to select only a single variation among many descriptors for a given code?

ahouhoulis
5 - Atom

For example, my data set includes many rows of transactions with the same vendors over time, each vendor represented by a unique Vendor Number. Each vendor's name, however, may vary over the years, so I end up with multiple distinct Vendor Names for any given distinct Vendor Number.

 

The end result I need is a single Vendor Name for any given Vendor Number. In the absence of a Vendor Table to pull from, I'd like to pick the most recently-paid transaction of a given Vendor Number and use its Vendor Name only.

 

Thoughts? I know how to do this in SQL, but I'm trying to leverage Trifacta's native functions and practices.

 

3 REPLIES 3
AMiller_Tri
Alteryx Alumni (Retired)

Hi @Adam Houhoulis? , thank you for your question!

You can achieve the desired result in a few recipe steps:

1.Create a ROWNUMBER() column

First, we'd like to identify which record has the most up-to-date vendor name for each ID. To do this, we'll use the window function ROWNUMBER(). We'll sort by date (descending, hence a minus ('-') sign before the column) and group by company ID:

 

2.erase values that aren't the current company name

Now we ensure that all the values in the company name column that aren't in the row with the rownum==1 will be replaced with NULL values. We'll do this by using the IF() function:

Note that instead of creating the 'rownum' column and using it, we could use the MAXDATE() function to indicate which record has the most up-to-date company name:

Although it saves us a step, I think the ROWNUMBER() approach is more readable and would recommend it.

 

3.fill in the blanks

Now, for each company ID, we have one company name; the other values of this column are empty for each group.

We can use the FILL() function to fill in the gap:

  • grouping by company_id column
  • reverse-sorting by date

 

Now for each company_id, the company name will always be the same; the most up-to-date company name.

Note that this is dynamic; the newest company name will be chosen each time you run the job.

 

also, these steps can be done in 1 recipe step via nesting all the functions:

I would recommend splitting the process into at least 2 steps to make the process more readable.

 

I hope this makes sense; please feel free to ask any additional questions.

 

Thanks,

Amit.

 

ahouhoulis
5 - Atom

Excellent! Thank you so much @Amit Miller? !

AMiller_Tri
Alteryx Alumni (Retired)

You're very much welcome @Adam Houhoulis? ! =]