Alteryx Designer Cloud Discussions

SOLVED

I need to pick up only the latest file (a CSV file) from a Cloud Storage bucket and import it into Dataprep for data cleansing. The file name has the date appended at the end. I could not find an option for this in the scheduling settings.

Trifacta_Alumni
Alteryx Alumni (Retired)
 
8 REPLIES
Trifacta_Alumni
Alteryx Alumni (Retired)

Hi,

 

You need to use a Datetime parameter in your input dataset name, so that the Dataprep job automatically runs against the file whose name contains today's date (for example).

You can also change the date format of the parameter (to match the one found in your file name) or the date range that it covers.
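
As a rough illustration outside Dataprep (this is not Dataprep's own API), the matching that a date parameter with a one-day range performs can be sketched in Python. The `sales_` prefix, the `dd.MM.yyyy` format, the `.csv` extension and the function names are all assumptions made for the example:

```python
from datetime import date
from typing import Iterable, List


def expected_name(prefix: str, day: date, fmt: str = "%d.%m.%Y", ext: str = ".csv") -> str:
    """Build the file name expected for a given day (assumed naming scheme)."""
    return f"{prefix}{day.strftime(fmt)}{ext}"


def files_for_day(names: Iterable[str], prefix: str, day: date,
                  fmt: str = "%d.%m.%Y", ext: str = ".csv") -> List[str]:
    """Keep only the files whose name carries exactly the given day."""
    target = expected_name(prefix, day, fmt, ext)
    return [name for name in names if name == target]


# Hypothetical bucket listing
bucket = ["sales_05.06.2020.csv", "sales_06.06.2020.csv", "sales_07.06.2020.csv"]
todays_files = files_for_day(bucket, "sales_", date(2020, 6, 6))
```

If the format string does not match the date as written in the file name, nothing is selected, which is why the format setting matters as much as the range.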

 

Victor

Trifacta_Alumni
Alteryx Alumni (Retired)

Hi Victor,

 

I tried scheduling as per your instructions, and reading the CSV file seems to work fine. However, I still need to check the output of subsequent runs with the append option to confirm that only the latest file is picked up.

 

Could you help me with another issue below:

 

Dataprep picked up the first line (the header) as a record while reading the CSV. Is there an option to treat the first line of every new CSV file as the header?

 

Thanks!

Trifacta_Alumni
Alteryx Alumni (Retired)

At the beginning of your recipe, you can add a step that forces a given row (the first one, for example) to be the header.

You can add this step from the menu or by clicking on the left side of the grid on a specific row.
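
For readers who want to see what promoting a row to header amounts to, here is a minimal Python sketch using the standard `csv` module. It is not the Dataprep recipe step itself; the function name and the sample data are made up for illustration:

```python
import csv
import io
from typing import Dict, List, Tuple


def promote_first_row(text: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Treat the first CSV row as column names and the rest as records."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], []
    header, data = rows[0], rows[1:]
    # Pair each value with its column name instead of keeping the header as data
    return header, [dict(zip(header, row)) for row in data]


# Hypothetical file content
sample = "id,name\n1,Ann\n2,Bob\n"
header, records = promote_first_row(sample)
```

Applied to every incoming file, this keeps the header line out of the record count.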

 

 

Trifacta_Alumni
Alteryx Alumni (Retired)

Hi @Victor Coustenoble,

 

I tried scheduling as per your instructions. However, when the job runs in append mode, all files in the cloud bucket are appended. I tested this by deliberately placing a file with tomorrow's date (filename_07.06.2020) in my cloud bucket.

 

As per the rule, the file with tomorrow's date should not be processed today, but that is not what is happening. Do you have any suggestions?
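
To make the expected behaviour concrete, here is a small Python sketch of the rule (an illustration only, not how Dataprep implements it): parse the date at the end of each name, ignore anything dated after today, and keep the most recent file. It assumes the date in names such as filename_07.06.2020 is written day.month.year; the regular expression and function names are hypothetical:

```python
import re
from datetime import date, datetime
from typing import Iterable, Optional

# Assumed suffix: _dd.MM.yyyy, optionally followed by an extension such as .csv
DATE_SUFFIX = re.compile(r"_(\d{2}\.\d{2}\.\d{4})(?:\.\w+)?$")


def file_date(name: str) -> Optional[date]:
    """Return the date embedded at the end of the file name, if any."""
    match = DATE_SUFFIX.search(name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%d.%m.%Y").date()
    except ValueError:
        return None  # digits present but not a valid calendar date


def latest_file(names: Iterable[str], today: date) -> Optional[str]:
    """Pick the most recent file whose date is not in the future."""
    dated = [(file_date(name), name) for name in names]
    eligible = [(d, name) for d, name in dated if d is not None and d <= today]
    return max(eligible)[1] if eligible else None


bucket = ["filename_05.06.2020", "filename_06.06.2020", "filename_07.06.2020"]
picked = latest_file(bucket, today=date(2020, 6, 6))
```

With these names and 6 June 2020 as today, the file dated the 7th is skipped.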

 

Trifacta_Alumni
Alteryx Alumni (Retired)

Hi,

 

Are you sure about the range (and format) of the date parameter in your job's input dataset?

 

What happens when you run the job from the UI (not from the scheduler)? Does the job also take all files found in the bucket, or just the file with today's date?

 

Victor

Trifacta_Alumni
Alteryx Alumni (Retired)

Earlier, I had scheduled the job to run every hour, so that may be one reason my record count was increasing every hour: the job was re-processing the same file each time. I have fixed that by changing the schedule to once a day. I will keep you updated on my progress.
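
The effect is easy to reproduce outside Dataprep. In the Python sketch below (an illustration with hypothetical names, not a Dataprep feature), an append that remembers which files it has already loaded stays the same size no matter how often the schedule fires, whereas a plain append grows on every run:

```python
from typing import Dict, List, Set


def append_once(output: List[Dict[str, str]], processed: Set[str],
                name: str, rows: List[Dict[str, str]]) -> bool:
    """Append rows from a file unless that file was already loaded."""
    if name in processed:
        return False  # same file seen on an earlier run: skip it
    output.extend(rows)
    processed.add(name)
    return True


output: List[Dict[str, str]] = []
processed: Set[str] = set()
todays_rows = [{"id": "1"}, {"id": "2"}]

# Simulate three hourly runs against the same daily file
results = [append_once(output, processed, "filename_06.06.2020", todays_rows)
           for _ in range(3)]
```

Running once a day, as I do now, sidesteps the problem without any tracking, since each day's file is read only once.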

 

Trifacta_Alumni
Alteryx Alumni (Retired)

Thanks, @Victor Coustenoble. It's working fine now.

Trifacta_Alumni
Alteryx Alumni (Retired)

Great, and thanks for the update!