I am trying to group a list of URL based on their string values.
I want to group them into 5 clusters plus an additional cluster, where all the URLs that doesn’t match the criteria for any of the 5 buckets should be added.
I have been trying different functions including MATCH (which seems to work, but only for a single case) and the learn by example feature.
CASE on a single column seems to be the right function for the job, but I am having trouble making it work.
I have the following URLS
/XXX
/XXX/y
/XXX/x
/XXX/z/1
/XXX/z/2
I want them all grouped in a cluster named XXX, but I am having trouble getting the syntax right.
If I write “/XXX” n the “Value to compare” field dataprep finds the first of the above URL. I have tried to ad different wildcard characters at the end, but it seems like I only can find exact matches.
Its probably me having trouble getting the syntax right. I would greatly appreciate any help
Solved! Go to Solution.
Hi @S R? you can use MATCHES function to create a flagged column with 2nd parameter as a Trifacta pattern `/XXX`.
In case of case-conditions on single column you can put only exact matches eg: string literals.
But if you use case on multiple conditions, you can put condition as MATCHES(column1, `/XXX`) and value as 'XXX'
Best,
Vardan
Hi @Vardan Negi?
Thanks a lot. The multiple case conditions does the job., but raises an additional question :)
For one of the other URL groupings I need an exact match. Where I need match URLs that is exactly "/". Can I use "matches" for that? I tried using the exact function, but I can't make it work
@S R? this should work MATCHES(column1, `{start}/{end}`)
It does., Thank you very much!
:)