
Alteryx Designer Cloud Discussions

SOLVED

How to make changes in early "wrangles" carry down through each wrangle in a flow?

Why are changes in the early stages/steps of a flow not quickly evaluated for later use in the flow when I go to wrangle a later stage? What are the best ways to design flows so users can successfully "refactor" a wrangle at an early stage and immediately see the evaluated results at later steps in the flow? Are there certain scales of data and data types that make evaluation across the steps of a flow more efficient?

4 REPLIES
Trifacta_Alumni
Alteryx Alumni (Retired)

Hi Tom,

 

Changes to a recipe do immediately propagate to downstream recipe steps, so they should be reflected when you go to edit downstream steps/recipes.

 

There are, however, a few transformations affecting your dataset's schema that depend on the data present in the sample when you add them. Pivot, header, and values-to-columns are all 'data dependent' in this way. Changes to recipe steps preceding these steps won't change the columns they produce; the data-dependent step needs to be edited and re-saved, or deleted and re-added, for the changes to take effect. This prevents breaking schema changes based on changing data/samples, though in some cases it can require undesired manual intervention to resolve. We are thinking about ways of solving this in upcoming releases.
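As a rough SQL analogy (hypothetical table and column names; this is not how the product implements it internally), a data-dependent pivot behaves as if its output columns were hard-coded from whatever values the sample contained when the step was authored:

    -- Illustration only: output columns fixed from values seen in the sample.
    SELECT
      order_id,
      MAX(CASE WHEN region = 'EAST' THEN amount END) AS east,  -- was in sample
      MAX(CASE WHEN region = 'WEST' THEN amount END) AS west   -- was in sample
    FROM orders
    GROUP BY order_id;
    -- If new data later introduces region = 'NORTH', no 'north' column will
    -- appear until the pivot step is edited and re-saved (or re-added).

This is why editing an upstream step doesn't automatically update the columns a downstream data-dependent step produces.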

 

If you're seeing other examples of changes not being reflected, that is unexpected; please send a note with details to support@trifacta.com so we can investigate further.

Trifacta_Alumni
Alteryx Alumni (Retired)

Hi Tom, changes to earlier stages of a flow will require you to collect new samples in the later stages. The changes will be picked up in the new samples. You can also generate an output to validate at any recipe in the flow, and then use that output statically as a dataset going forward.

 

Athena

Thanks all, this is good background. Can you point me to a specific example (e.g., a video) of how to handle schema changes and resampling while refactoring a flow?

TrifactaUsers
10 - Fireball

Tom,

For schema changes, I have raised a similar issue; please go through the link below for details.

 

https://community.trifacta.com/s/question/0D51L000058pyfiSAA/trifacta-source-dataset-do-not-get-update-after-datatype-change-or-column-addition-to-the-source-table

 

Suggestions:

  1. Use SQL as the dataset source and use an alias name for the table (SELECT a1.* FROM table a1).
  2. If you notice a schema change, modify the dataset by simply changing the table alias to a2 (SELECT a2.* FROM table a2); the dataset will refresh and apply the schema change to every flow where it is used (see the sketch after this list).
  3. Once the schema has refreshed and changed, you will get a new sample, and there is a chance the result set will error or produce wrong data types. So the last few steps in the flow should always set the data type of every field to match the target table.
  4. Preferably, if the target is Hive, cast date, state, SSN, and ZIP fields to string before the load.
  5. If you append to the target, make sure the column order matches the target table.
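A minimal sketch of suggestions 1-4, assuming a hypothetical source table orders, a staging step cleaned_orders, and a Hive target (all names are illustrative):

    -- Suggestion 1: define the dataset source as SQL with a table alias.
    SELECT a1.* FROM orders a1;

    -- Suggestion 2: after an upstream schema change, bump the alias so the
    -- dataset refreshes and the new schema flows to every dependent flow.
    SELECT a2.* FROM orders a2;

    -- Suggestions 3-4: as the final steps, pin each field to the type the
    -- target table expects; cast fragile fields to string for a Hive target.
    SELECT
      CAST(order_id   AS BIGINT) AS order_id,
      CAST(order_date AS STRING) AS order_date,  -- dates as strings for Hive
      CAST(state      AS STRING) AS state,
      CAST(ssn        AS STRING) AS ssn,
      CAST(zip        AS STRING) AS zip          -- keeps leading zeros intact
    FROM cleaned_orders;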

I could add more points, but let me know if this is what you are looking for. Thank you.