Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Cloud Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Cloud.
SOLVED

Replace Transform with Pattern Match Before 4 Digit Number

I'm a new user and am mostly through a Udemy course on Trifacta. Wanting to jump in and get going, I have an immediate need to clean up some of my real data. I have a series of concatenated SIC codes in a single cell. I would like to separate each SIC code with its short description into separate columns. My strategy is to replace every comma before a 4 digit SIC with a semicolon. Note, some descriptions include commas.

 

I have successfully inserted a semicolon but it did not "replace the comma." By replacing the comma, I do not need to remove the commas at the end of each SIC description after the columns are split, saving me steps. 

 

A sample of the data is attached as well as my recipe wrangle.

 

I'm loving Trifacta Wrangler and am eager to put it to real work. I greatly appreciate any community support.

 

Best,

Bryon 

2 REPLIES 2
Trifacta_Alumni
Alteryx Alumni (Retired)

Hi Bryon, thanks for reaching out!

 

Looking at the recipe you attached, it looks like your replace step reads

replace col: SIC with: ';' global: true after: `,` before: `{digit}{4}`

Which will replace the nonexistent 'space' between the comma and 4 digit code.

 

What you want instead is something like this:

replace col: SIC with: ';' on: ',' global: true before: `{digit}{4}`

 

The difference is that in the second case, you are replacing on the comma, rather than replacing after the comma. Specifying `{digit}{4}` as you did will prevent any of the other commas in the description from getting picked up.

 

I've included a recipe that you can use to follow. Note, I added a couple of steps after the split, in case you want to use those as guidance. The latter steps will unpivot all of the columns (so that you have one column with the column headers, and one column with all of the row values i.e. the 4 digit SIC code and corresponding description).

 

Hope this is helpful :)

David

Awesome, David! That did the trick. Thanks for the quick response!