Hi all,
I'm quite new to Alteryx and would like to know if someone can help guide me or provide examples to clean this dataset. This data is too hard for me to clean as I'm just a beginner.
I know that I need to combine all of the data from 2014 to 2021 into one. Should I pivot the data and do a VLOOKUP? What should I do? Thank you for your time!
https://tinyurl.com/eeocdataset
There are also some labels on pages 6-12.
When you say add years in there, I am assuming you mean the years in the data set itself, right? Or do you mean the year in the name of the file?
What do you mean by eliminating the nulls? Do you want to remove them entirely or change them to something else, and do you mean rows or columns specifically? You can use a Select tool to deselect the columns you do not want, or a Data Cleanse tool to remove null columns. If you use the latter, make sure to untick the options that replace nulls with 0 (for numeric fields) or blanks (for string fields).
I am currently away from my computer. I can help later if you are stuck.
Re your point on getting the YEAR, I see in your data that there is no mention of Year as a data field, only in the title of the file. Since I renamed it via exporting the .xlsx as a .yxdb with year as its name, this is what I did:
However, if you decide to use the .xlsx as the input, then you can do the following (This is optional):
The REGEX script is:
(\d{4}(?=\sPUF\.xlsx))
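If you want to sanity-check the pattern outside Alteryx, here is a small Python sketch. The file names below are made up for illustration (I am assuming names that end in "<year> PUF.xlsx"), and `extract_year` is just a hypothetical helper name, not anything from Alteryx.

```python
import re

# Same pattern as above: four digits that are followed by " PUF.xlsx".
# The lookahead checks the suffix without including it in the match.
YEAR_PATTERN = re.compile(r"(\d{4}(?=\sPUF\.xlsx))")

def extract_year(file_name):
    """Return the 4-digit year as a string, or None if the name does not match."""
    match = YEAR_PATTERN.search(file_name)
    return match.group(1) if match else None

# Hypothetical file names, only for illustration.
file_names = ["EEO1 2014 PUF.xlsx", "EEO1 2021 PUF.xlsx", "notes.txt"]
years = [extract_year(name) for name in file_names]
print(years)
```

Names that do not end in the expected suffix come back as None, so it is easy to spot files that would end up without a year.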
Re your point on removing the nulls, I do not think that is a good idea after looking at your data. If you attach a Summarize tool to your stream and group by the fields that contain nulls, you will see that those fields still hold some non-null values across your large dataset.
Because of this, you have to decide if you wish to keep them or drop them via a Select tool.
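To illustrate the idea of checking before dropping, here is a rough Python sketch that counts the non-null values in each column. The rows and column names are invented for the example (they are not from your files), and `non_null_counts` is a hypothetical helper. It is only meant to show the kind of check the Summarize tool gives you, not to replace it.

```python
def non_null_counts(rows):
    """Count how many non-null (not None) values each column holds."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts.setdefault(column, 0)
            if value is not None:
                counts[column] += 1
    return counts

# Made-up sample rows, only for illustration.
rows = [
    {"Year": "2014", "Region": "South", "Notes": None},
    {"Year": "2014", "Region": None, "Notes": None},
    {"Year": "2015", "Region": "West", "Notes": None},
]

counts = non_null_counts(rows)

# Only columns with zero non-null values are safe to drop without losing data.
all_null_columns = [column for column, n in counts.items() if n == 0]
print(counts)
print(all_null_columns)
```

In this toy data "Region" has a null but also real values, so dropping it (or dropping its null rows) would lose information, while "Notes" is empty everywhere.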
@caltang thank you for your help. I did remove the nulls, which skewed the data (Gender). What should I do?
https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/How-do-you-handle-the-nulls-An...