Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Remove all after "DBA" (Doing Business As)

BonusCup
10 - Fireball

Hi,

 

I have some data coming in like the Source data below and having a hard time trying to clean it up.  I can do a bunch of contains filters for '(dba', '/dba', ' dba ' but I would think there is an easier way.  Can someone help with this?  TIA

 

SourceDesired
Name1 (DBA xxxxxxxx)Name1
Name, 2 (dba xxxxxxxx)Name, 2
Name, 3 LLC(dba)xxxxxxxxName, 3 LLC
Name 4/dba xxxxxxxxName 4
Name 5,Llc/DBA xxxxxxxxName 5,Llc
Name 6 dba xxxxxxxxName 6
Name 7 DBA xxxxxxxxName 7
BroadBandBroadBand
GoldbaumGoldbaum
diamonbackdiamonback
5 REPLIES 5
DataNath
17 - Castor

Hey @BonusCup, obviously worth testing on a larger sample size to see if there's any edge cases etc but the following does the trick for your examples:

 

3232.png

 

REGEX_Replace([Source], '(.*[a-z0-9]).*\b(?i)dba\b.*', '$1')

 

BonusCup
10 - Fireball

@DataNath I tested this on a larger set and it worked perfectly.  I did also find a couple others I need to remove also.  

 

c/o

C/O

C/o

 

I tried figuring out your regex using regex101 and adding those in but I didn't have any luck.  Can you help with those also and could you explain your regex so I can understand it in detail?  Thank you again.

DataNath
17 - Castor

Again I'd run this against a larger sample but should be a fairly simple adjustment:

 

3434.png

 

REGEX_Replace([Source], '(.*[a-z0-9]).*\b(?i)(dba|c/o)\b.*', '$1')

 

In terms of explaining the RegEx: We're looking for zero or more of any characters (to account for the spaces and commas etc in genuine names like Name, 3 LLC), up to a character that is a-z or 0-9 which should (unless any edge cases) signify the end of the legitimate name. We then check that this comes zero or more characters (to account for the possibility of a space or slash and so on as per your examples) before an instance of 'dba' or 'c/o' - with (?i) making it case insensitive - as a standalone word - ensured by surrounding this search in word boundaries (\b). We then just use a final zero or more char (.*) to signify the rest of the string.

binuacs
20 - Arcturus

@BonusCup another option

image.png

BonusCup
10 - Fireball

@DataNath I ran this against the full data set and every thing looks accurate.  Also, thank you for the detailed explanation of your regex.  I'm going to have to fully read up on it to get a better understanding.  Thanks again!

Labels