Alteryx Designer Cloud Discussions

4290b9a70c03bc7ed4ca · 06-02-2021

I am working with data that is input into free text fields, and can pretty much take any value. Since the data is coming from a third party application, I don't have any control over validating the inputs.

I am seeing fields in the raw data that are coming in as just "\", or "legitimate information\". This causes issues when Dataprep/Trifacta initially reads the file, and I don't know how to fix this. The data is coming in CSV format.

Does anyone have any insights on how to work with un-escaped backslash characters?

Trifacta_Alumni · 06-02-2021

Hi @Floyd Winkelaar,

By default, when you import files, Trifacta creates a structured dataset. This means that Trifacta automatically attempts to infer the structure of your file to create a header, rows, and columns. However, Trifacta's automatic structure inference does not always produce the intended results when you import a file with unescaped backslash characters.
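For a rough feel of why inference can go wrong, here is a small sketch using Python's standard csv module. It is not Trifacta's parser, and the sample row is made up for illustration. It only shows the general effect: once a parser treats the backslash as an escape character, the `\"` at the end of a quoted value no longer closes the field.

```python
import csv

# Hypothetical raw line: a quoted free-text value that ends in a backslash.
line = '1,"legitimate information\\",open'

# Plain CSV rules: the backslash is an ordinary character,
# so the closing quote ends the field as the author intended.
plain = next(csv.reader([line]))

# Backslash treated as an escape character: \" is read as a literal quote,
# the quoted field stays open, and the following comma is swallowed into it.
escaped = next(csv.reader([line], escapechar="\\"))

print(len(plain), plain)
print(len(escaped), escaped)
```

The first parse keeps three columns, while the second ends up with fewer because the comma after the free-text value is absorbed into that value.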

The best way to work with files that may include un-escaped backslash characters is to create unstructured Trifacta datasets. You can remove the default structuring when you first import your file into Trifacta. Locate the card that represents your dataset on the right side of the "Import Data" screen and click on the text that reads, "Edit Settings":

This will open a pop-up that shows a preview of your dataset. At the bottom of the preview, you should see a checkbox that reads, "Detect structure (recommended)". Uncheck this box to create an unstructured dataset.

After unchecking the "Detect structure" option, your preview will update. Click "Save" to create an unstructured dataset.

Go ahead and add your unstructured dataset into a flow. It should display like the image below:

At this point, you need to add a recipe to your unstructured dataset. When you add a recipe to an unstructured dataset, Trifacta will create a set of initial steps to create rows, columns, and a header. Since these steps are generated inside a recipe, you will be able to adjust the logic to account for any formatting quirks like unescaped backslash characters. You can see the automatically generated recipe steps in the screenshot below:

Open the recipe and edit the step that splits your data into columns. In this example, the split step is the second step in the recipe.

When the "Split column" transformation opens, you should see a field at the bottom of the builder interface that reads "Ignore matches between". By default, Trifacta populates this field with an escaped quote. Since your data includes backslashes, this field causes your columns to be created incorrectly. Go ahead and delete the text from this field. Once you have done this, your transformation and preview will look like the image below:
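To illustrate what clearing that field changes, here is a minimal Python sketch of a delimiter split with an optional "ignore matches between" quote. The function `split_column`, its parameters, and the sample row are hypothetical, and the backslash-escape behavior inside quotes is an assumption made for the example, not a description of Trifacta's internals.

```python
def split_column(line, on=",", ignore_between=None):
    """Split `line` on `on`, skipping delimiters inside `ignore_between` quotes.

    Assumption for illustration: inside a quoted region a backslash escapes
    the next character, so \\" does not close the region.
    """
    if ignore_between is None:
        # Field cleared: every delimiter splits, quotes are ordinary text.
        return line.split(on)

    fields, current, in_quotes = [], [], False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes and ch == "\\" and i + 1 < len(line):
            # Keep the escaped pair as-is and skip past both characters.
            current.append(line[i:i + 2])
            i += 2
            continue
        if ch == ignore_between:
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == on and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


row = '2,"\\",closed'                        # raw text: 2,"\",closed
with_quote = split_column(row, ignore_between='"')
cleared = split_column(row)                  # "Ignore matches between" left empty

print(with_quote)
print(cleared)
```

In this sketch the quoted split merges the last two values into one column, while the cleared split returns three. The trade-off is that with the field cleared, a genuine comma inside a quoted value would also split, and the quote characters stay in the values, so a later cleanup step may be needed.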

That's it! Let me know if this helped; if it did, please mark the answer as "Best" so that other users know your question has been resolved.

4290b9a70c03bc7ed4ca · 06-03-2021

Thank you for your response, this worked.

How do you deal with un-escaped backslash characters "\"?