Alteryx Designer Cloud Discussions

SOLVED

Is it possible to schedule one flow after another? I have several flows, and I would like to start the second one once the first finishes, since it needs the data from the first flow. Thank you very much!

 
7 REPLIES
Trifacta_Alumni
Alteryx Alumni (Retired)

Hi, Victor--

 

Right now, the GDP scheduler does not support chaining schedules together. Your best bet for now is to schedule the jobs independently and to ensure that the source file for Job 2 is cleared out after the job is run. That way, when Job 2 runs again, it won't pull in the old data.
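
For example, clearing out that source file with the Cloud Storage Python client might look like the following sketch (the bucket name and object path are placeholders):

    from google.cloud import storage

    # A sketch: after Job 2 runs, delete its GCS source file so the next
    # scheduled run doesn't pick up stale data. Bucket and path are placeholders.
    client = storage.Client()
    blob = client.bucket("my-dataprep-bucket").blob("flow1/output/part-00000.csv")
    if blob.exists():
        blob.delete()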

 

We are continuing to look for ways to improve our scheduling capabilities. You might keep an eye on Cloud Scheduler, which is a cron scheduling service:

 

https://cloud.google.com/scheduler/

 

If it can run Dataflow template jobs, you could use it to schedule GDP jobs.
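
If it does, creating such a cron job with the Cloud Scheduler Python client might look roughly like this sketch (the project, region, template path, and service account are all placeholders):

    import json
    from google.cloud import scheduler_v1

    # Placeholders throughout: project, region, bucket, template path, service account.
    parent = "projects/my-project/locations/us-central1"
    launch_url = (
        "https://dataflow.googleapis.com/v1b3/projects/my-project/"
        "locations/us-central1/templates:launch"
        "?gcsPath=gs://my-bucket/templates/flow1-template"
    )

    client = scheduler_v1.CloudSchedulerClient()
    job = scheduler_v1.Job(
        name=f"{parent}/jobs/run-flow1-daily",
        schedule="0 6 * * *",          # standard cron: every day at 06:00
        time_zone="Etc/UTC",
        http_target=scheduler_v1.HttpTarget(
            uri=launch_url,
            http_method=scheduler_v1.HttpMethod.POST,
            headers={"Content-Type": "application/json"},
            body=json.dumps({"jobName": "flow1-scheduled-run"}).encode(),
            # Authenticate as a service account allowed to launch Dataflow jobs.
            oauth_token=scheduler_v1.OAuthToken(
                service_account_email="scheduler-sa@my-project.iam.gserviceaccount.com"
            ),
        ),
    )
    client.create_job(parent=parent, job=job)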

 

Here's some doc on how to run Trifacta jobs using their Dataflow templates: https://cloud.google.com/dataprep/docs/html/Run-Job-on-Cloud-Dataflow_99745844
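
As a rough sketch, launching one of those exported templates programmatically via the Dataflow templates.launch API might look like this (the project, region, and template path are placeholders):

    from googleapiclient.discovery import build

    # Launch a Dataprep-exported Dataflow template; uses Application Default
    # Credentials. The gcsPath must point at the template the job exported.
    dataflow = build("dataflow", "v1b3")
    response = dataflow.projects().locations().templates().launch(
        projectId="my-project",
        location="us-central1",
        gcsPath="gs://my-bucket/templates/flow1-template",
        body={"jobName": "flow1-adhoc-run"},
    ).execute()
    print(response["job"]["id"])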

 

Hope that is somewhat helpful.

 

Cheers,

-SteveO

As a follow-up to @Steve Olson's answer: depending on what the scheduled outputs from the first flow are (e.g., if they are files stored on GCS), you can have the first job scheduled within GDP and the second job triggered by the output of the first, using a Cloud Function 'listening' to the GCS location the first job writes to.
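
A minimal sketch of that pattern, assuming a Python background Cloud Function subscribed to finalize events on the first flow's output bucket and using the Dataprep v4 jobGroups endpoint to kick off the second flow (the token, output ID, and path prefix are placeholders):

    import requests  # available in Cloud Functions when listed in requirements.txt

    DATAPREP_TOKEN = "YOUR_ACCESS_TOKEN"   # placeholder: a Dataprep API access token
    FLOW2_OUTPUT_ID = 12345                # placeholder: the output/recipe ID for flow 2

    def trigger_flow2(event, context):
        """Background Cloud Function fired on google.storage.object.finalize."""
        # React only to the first flow's output file, not to every object write.
        if not event["name"].startswith("flow1/output/"):
            return
        resp = requests.post(
            "https://api.clouddataprep.com/v4/jobGroups",
            headers={"Authorization": f"Bearer {DATAPREP_TOKEN}"},
            json={"wrangledDataset": {"id": FLOW2_OUTPUT_ID}},
        )
        resp.raise_for_status()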

Thank you both for your help.

As my flow is very complex, I am going to divide it into different modules.

What I am going to try is to use reference datasets (the output of module 1 will be the input of module 2). Do you know if this approach will work? When running module 2 (with a reference dataset from module 1), will module 1 run again?

Thank you for your help; I will keep you updated if this works.

Trifacta_Alumni
Alteryx Alumni (Retired)

Hi, Victor--

 

Yes, that is correct. If you create a reference object that is the output of module (flow) 1, whenever you run a job that includes a reference dataset for that source object, all upstream dependencies of that reference dataset are executed.

 

Please keep in mind:

  1. Permission issues may apply. For example, if that reference depends on a connection or dataset to which you personally do not have access, your job may fail.
  2. If upstream objects are owned by other users, you may not be able to see those objects. Things may change underneath you, and the module (flow) 2 recipe may break without explanation.

 

If you are concerned with the above issues, you can do the following:

  1. Module 1 generates an output.
  2. When this job is run, you export results to create a new dataset.
  3. This new dataset becomes your input for module (flow) 2.

 

In this case, however, you do have to re-run module (flow) 1 in order to get fresh data into flow 2.

 

Does that help?

 

Cheers,

-SteveO

 

 

Great! That is really helpful. I still have to decide which solution to use, but this will solve my issues.

 

Regarding what you say about permission issues, can Dataprep get data from a different project? I am having trouble importing datasets into my library that come from other projects. I would like to understand whether these issues are related to permissions or whether it is simply not possible to get data from a different project.

 

Thank you very much!

Trifacta_Alumni
Alteryx Alumni (Retired)

Thanks, Victor. Short answer: yes.

 

Here is some doc to assist you in getting started with cross-project access:

 

https://cloud.google.com/dataprep/docs/concepts/cross-bq-datasets

 

https://cloud.google.com/dataprep/docs/concepts/gcs-buckets

 

Cheers,

-SteveO

 

Hello again,

 

Thank you for your answer. I am having trouble connecting to other projects: I can work with BigQuery in other projects, but not in Dataprep. Here is an example, to see if you can help me:

 

I am working on project "100" with Dataprep under my account "juan@gmail.com", but I want to access project "200" (I am not the owner of this project). From the given information, I understand that the owner of project 200 has to grant Viewer access to the accounts:

200-compute@developer.gserviceaccount.com

service-200@trifacta-gcloud-prod.iam.gserviceaccount.com

Is that correct? Is the Viewer role enough? Does the other owner need to grant any further access to my account juan@gmail.com?
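
For reference, the grant the project 200 owner would make might be sketched like this with the Resource Manager API (role and service accounts as in the example above; whether Viewer suffices is exactly the open question):

    from googleapiclient.discovery import build

    # A sketch of the grant the owner of project "200" would make, using the
    # Cloud Resource Manager API with Application Default Credentials.
    crm = build("cloudresourcemanager", "v1")
    policy = crm.projects().getIamPolicy(resource="200", body={}).execute()
    policy.setdefault("bindings", []).append({
        "role": "roles/viewer",
        "members": [
            "serviceAccount:200-compute@developer.gserviceaccount.com",
            "serviceAccount:service-200@trifacta-gcloud-prod.iam.gserviceaccount.com",
        ],
    })
    crm.projects().setIamPolicy(resource="200", body={"policy": policy}).execute()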

 

Thank you for your help