Alteryx Machine Learning Discussions
SOLVED

Data treatment

Daniel_cof
6 - Meteoroid

Hello, I have a dataset with a number of strings that each represent a given code, for example FFF14 and FGT25, and I wanted to transform each unique string into a unique value, for example:

FFF13  1

FFF14 2

FFF13 1

FGT25 3.

 

How can I do this?

DataNath
17 - Castor

Hey @Daniel_cof, here's one way you could go about this, whereby you just use a Summarize tool to Group By the codes, which gets you a distinct list. After that, you can just use the RecordID tool to assign an ID to each and then join back using the codes as the key:

 

DataNath_0-1681399263461.png

Daniel_cof
6 - Meteoroid

Thank you, it works.

But is there no automatic way to do this for all string variables?

DataNath
17 - Castor

@Daniel_cof I'm not sure what you mean by automatically doing this for all string variables. If you want a slightly simpler option than the one I provided above, you can also use the Tile tool like so, then use a Select to remove the sequence number field and rename [Tile_Num] as you wish:

 

DataNath_0-1681401419945.png

Daniel_cof
6 - Meteoroid

Thank you again for your answer.

What I mean by automatic is that I have a large number of columns like the one I described, and I was wondering whether there is an automatic way, for each column I select, to replace each unique string value with a unique number. For example:

 

Col1   ->  Col1 (replaced)      Col2   ->  Col2 (replaced)
FFF14  ->  1                    CC14   ->  1
FFF13  ->  2                    CC14   ->  1
FFF14  ->  1                    CC14   ->  1
FFF15  ->  3                    CC17   ->  2
FFF14  ->  1                    CC18   ->  3
FFF15  ->  3                    CC14   ->  1
FFF13  ->  2                    CC14   ->  1

martinding
13 - Pulsar

Hi @Daniel_cof

 

I suppose you are trying to do some label encoding for categorical data, and so you should have some sort of ID field.

 

And if you do have an existing ID field, then you don't need to use the Record ID tool, but I put it here to keep the rows in order.

 

You can build a batch macro for automating this:

martinding_0-1681421788764.png
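For readers working outside Alteryx, the same per-column idea can be sketched in plain Python. This is not a description of the batch macro in the screenshot; it is a minimal illustration that assumes IDs start at 1 and follow order of first appearance, as in the example table in the question. The names `encode_by_first_appearance`, `table`, and `columns_to_encode` are hypothetical.

```python
def encode_by_first_appearance(values):
    """Return (encoded list, mapping); IDs start at 1 in order of first appearance."""
    mapping = {}
    encoded = []
    for value in values:
        if value not in mapping:
            # First time this string is seen: give it the next free ID
            mapping[value] = len(mapping) + 1
        encoded.append(mapping[value])
    return encoded, mapping


# Sample data copied from the example table in the question
table = {
    "Col1": ["FFF14", "FFF13", "FFF14", "FFF15", "FFF14", "FFF15", "FFF13"],
    "Col2": ["CC14", "CC14", "CC14", "CC17", "CC18", "CC14", "CC14"],
}

# Hypothetical list of the columns selected for encoding
columns_to_encode = ["Col1", "Col2"]

# Each column gets its own independent mapping
encoded_table = {
    name: encode_by_first_appearance(table[name])[0] for name in columns_to_encode
}
print(encoded_table)
```

Each column is encoded independently, so the same number can appear in several columns with different meanings.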

 

Daniel_cof
6 - Meteoroid

Thank you, this works for my problem.

Eden60
7 - Meteor

You can use label encoding to transform each unique string into a unique numeric value. Here's a short example in Python using scikit-learn:

 

 

from sklearn.preprocessing import LabelEncoder

data = ["FFF13", "FFF14", "FFF13", "FGT25"]
label_encoder = LabelEncoder()
transformed_data = label_encoder.fit_transform(data)

for string, encoded_value in zip(data, transformed_data):
    print(string, encoded_value)

 

 

 

Output:

FFF13 0
FFF14 1
FFF13 0
FGT25 2

 

 

Label encoding assigns a unique numeric value to each unique string. scikit-learn's LabelEncoder numbers the unique values from 0 upward in sorted order, not in their order of appearance in the dataset.
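To make the ordering concrete, here is a library-free sketch of sorted-order label encoding. It mirrors the zero-based, sorted numbering that scikit-learn's LabelEncoder produces for this input, but it is only an illustration; the function name `encode_sorted` is hypothetical.

```python
def encode_sorted(values):
    """Encode strings as zero-based integers following the sorted order of the unique values."""
    classes = sorted(set(values))
    index = {label: i for i, label in enumerate(classes)}
    return [index[v] for v in values], classes


codes, classes = encode_sorted(["FFF13", "FFF14", "FFF13", "FGT25"])
print(codes)    # [0, 1, 0, 2]
print(classes)  # ['FFF13', 'FFF14', 'FGT25']
```

If the numbering should start at 1, as in the original question, adding 1 to each code is enough.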

stevediaz
7 - Meteor

Hello

 

To transform each unique string in your dataset into a unique value, you can use a Python dictionary to create a mapping between the strings and the unique values. Below is a simple Python code snippet to achieve this:

 

# Sample data from the question
dataset = ["FFF13", "FFF14", "FFF13", "FGT25"]
unique_values = {}          # maps each code to its assigned number
unique_value_counter = 1    # next number to hand out

for code in dataset:
    # Assign a new number only the first time a code is seen
    if code not in unique_values:
        unique_values[code] = unique_value_counter
        unique_value_counter += 1

# unique_values now contains the mapping of strings to unique values
print(unique_values)

 

Output:

 

{'FFF13': 1, 'FFF14': 2, 'FGT25': 3}

 

The code begins by creating an empty dictionary called unique_values, which stores the mapping between strings and their corresponding unique values. A counter variable, unique_value_counter, is set to 1.

The loop then checks each code in the dataset. If the code is not already present in the unique_values dictionary, it is added as a key with the current counter as its value, and unique_value_counter is then incremented by 1.

 

By the end of the loop, the unique_values dictionary contains unique strings as keys and their corresponding unique values as values.

Finally, you can utilize this unique_values dictionary to map each code in the dataset to its respective unique value. For instance, you can retrieve the value 2 by accessing unique_values["FFF14"]. 
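As a minimal sketch of that last step, the mapping can be applied back to the whole dataset with a list comprehension. The mapping is rebuilt here so the snippet runs on its own; `encoded_dataset` is a name introduced for illustration.

```python
dataset = ["FFF13", "FFF14", "FFF13", "FGT25"]

# Rebuild the mapping: each new code gets the next number, starting at 1
unique_values = {}
for code in dataset:
    unique_values.setdefault(code, len(unique_values) + 1)

# Replace every code in the dataset with its assigned number
encoded_dataset = [unique_values[code] for code in dataset]
print(encoded_dataset)  # [1, 2, 1, 3]
```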


 

Hope it will help you. 

 

Eden60
7 - Meteor

To transform each unique string in your dataset into a unique numerical value, you can use a technique called label encoding. In Python, you can achieve this using libraries like scikit-learn. Import LabelEncoder, fit it to your dataset, and transform the strings into numerical values, assigning each unique string a unique label.