

How do I tell if I have duplicate data in a column?

 
Trifacta_Alumni
Alteryx Alumni (Retired)

For a simple check of duplicate values, highlight a value in the column with your cursor. All matching values are then highlighted, and you can review them by scrolling up and down the dataset.

 

To find out how many duplicate values there are in each row:

  1. Follow the instruction above
  2. Select the "Countpattern" suggestion card that matches your requirement then click "add"
  3. The new column will show you how many matching values there are in each row

 

For the total number of duplicates:

  1. Select the newly generated column from the "countpattern" transform
  2. Select the "Aggregate" suggestion card that calculates the total number of duplicates
  3. The result is a single number, which is the total number of duplicates.
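
For anyone who wants to cross-check the numbers outside the tool, here is a minimal Python sketch of the same two steps (a per-row count of matching values, then a total). It is not Wrangle syntax and not how the transforms are implemented. The function name `duplicate_report` and the sample IDs are made up for illustration, and "total duplicates" is assumed to mean every occurrence beyond the first of each value.

```python
from collections import Counter


def duplicate_report(values):
    """Return (per_row_counts, total_duplicates) for one column of values."""
    counts = Counter(values)
    # For each row: how many rows in the column hold the same value (itself included).
    per_row = [counts[v] for v in values]
    # Assumption: a "duplicate" is any occurrence after the first of a value.
    total_duplicates = sum(c - 1 for c in counts.values())
    return per_row, total_duplicates


# Hypothetical sample column of customer IDs.
customer_ids = ["ABC", "DEF", "ABC", "GHI", "DEF", "ABC"]
per_row, total = duplicate_report(customer_ids)
print(per_row)  # [3, 2, 3, 1, 2, 3]
print(total)    # 3
```

A per-row count of 1 means the value is unique, so a column of customer IDs is clean when the total is 0. This works the same way for string values as for numbers.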

 

For more information, see:

https://docs.trifacta.com/display/PE/Deduplicate+Data

I'm not sure if this will work for string data. So I basically have a column of customer IDs and I want to make sure that every customer ID only occurs once. I'm not sure how the countpattern transform will accomplish this for me. How do I just see if customer ID "ABC" occurs once, customer ID "DEF" occurs once, etc?

TrifactaUsers
10 - Fireball

I have the same question. Did you get a response?

Trifacta_Alumni
Alteryx Alumni (Retired)

Gina answered the original question, which was how to "tell" or "see" if there is duplicate data in a column -- always remembering that the data visible in Wrangler is a sample and may not represent your entire data set, depending on how large that is.

 

Actually enforcing uniqueness is also possible: see the Deduplicate Data page in the docs (https://docs.trifacta.com/display/SS/Deduplicate+Data). In the simplest case, whole rows may be duplicated -- see the Deduplicate Transform section. More likely, the data will contain multiple, differing rows with the same primary key (e.g., customer ID). See the Deduplicate Rows Based on a Primary Key section. Note that you will probably have to do some normalization and/or sorting of the relevant column(s) first. And of course, under this approach the row with the first instance of a given primary key value wins.
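
To make the "first instance wins" behaviour concrete, here is a rough Python sketch of deduplicating on a primary key after normalizing it. This only illustrates the idea and is not the Deduplicate transform itself. The helper `dedupe_by_key`, the column names, and the normalization rule (trim whitespace, ignore case) are all assumptions for the example.

```python
def dedupe_by_key(rows, key):
    """Keep the first row seen for each normalized value of `key`."""
    seen = set()
    kept = []
    for row in rows:
        # Assumed normalization: trim whitespace and ignore case.
        normalized = str(row[key]).strip().upper()
        if normalized not in seen:
            seen.add(normalized)
            kept.append(row)  # the first instance of a key wins
    return kept


# Hypothetical rows; " abc" matches "ABC" once normalized.
rows = [
    {"customer_id": "ABC", "city": "Oslo"},
    {"customer_id": " abc", "city": "Bergen"},
    {"customer_id": "DEF", "city": "Lima"},
]
print(dedupe_by_key(rows, "customer_id"))
```

Because the first row per key is the one kept, sort the rows beforehand if a specific row should survive, for example the most recent one.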

TrifactaUsers
10 - Fireball

Thanks, we did it like this and it works.

 

 

 
