Engine Works

BrandonB · ‎09-07-2021

Are you tired of downloading exports from websites and changing your Input Data tool to bring in the latest file? Do you wish that your Alteryx workflows could automatically pull the latest data into your workflow so that you can schedule them on Alteryx Server without manual intervention? You may be in luck! Alteryx has the capability of connecting to something called an API which can do exactly that!

What exactly is a Web API?

API stands for Application Programming Interface and is a web development concept that allows you to access content from an external vendor in a nice format. This means that instead of downloading files to use in your analysis, you may be able to directly pull this data into the workflow directly from the source without dealing with files at all. This is different than web scraping, in which an entire webpage of data at formatting is pulled, and then complex parsing techniques are used to filter through extra formatting. An API is desirable because the data will be coming in as a format that does not require advanced regular expressions to get it into a workable state.

How do I figure out if the website that I download files from has an API?

The easiest way that I have found is to web search for “CompanyNameHere API Documentation.” This will help you to validate whether the source you would like to pull from has an API available, and the documentation is always the starting point when you are going to set things up within Alteryx Designer. Sometimes these documents can be found in “Developer” or “Resources” sections of company websites. You only need to worry about finding this documentation at this time. Below is an example of Yelp’s documentation that you would be looking for:

I found the API documentation, now how does it work?

The Download tool is going to be the primary focus of this article, although you may decide that you like to use the Python tool or R tool if you enjoy using code more. The Download tool uses what is called an Endpoint to pull data from an API. An Endpoint looks like a regular website URL at first glance, but it is a special kind of link that will return data directly into the workflow in a nice format to be used for analytics.

Where should I start?

The first step is looking at the API documentation to see what type of API it is (REST vs SOAP) and what kind of authentication it uses. Full transparency, this isn’t always the easiest to figure out but most of the time they will be REST APIs. If they are SOAP APIs, additional steps need to be taken (make note of this - you need to create the xml payload with a Formula tool).

For APIs that require authentication, this is like your security badge to get access to the data. You may see that it uses no authentication, Basic authentication, API Key Authentication, or Oauth. Try taking a look at the “Getting Started,” “Introduction,” or “Authentication” sections of your API documentation to check what authentication is needed, if any. Here is Yelp’s example which states that an API key is used. Make note of this method as it will influence the way that your workflow is built.

After a brief look at how the Download tool works, we are going to look at four examples of API calls. One does not leverage authentication, one uses Basic Authentication, one uses an API Key, and one uses Oauth. If you would like to skip ahead, examples of authentication for API calls can be found here.

How does the Download tool work for pulling data from APIs?

The Download tool documentation can be found here but we will cover the basics.

In the configuration window, the Basic tab has a variety of options that are used to retrieve data from a URL. The URL field is expecting a website link or API endpoint to be passed in so that it can connect and pull data. The Encode URL Text checkbox can generally remain checked because it converts unsafe ASCII characters into a format that can be transmitted over the internet. The other settings can all remain the same because for most API calls you will want your data downloaded as a string data type. Some scenarios may permit downloading files to temporary locations, but we won’t worry about that for now.

The second tab in the configuration window, Headers, is generally used to send authentication information to the API. You will commonly see API Keys sent as part of the headers, but ultimately the documentation will dictate where this information needs to be included. The Header values can either be hard coded as a Name/Value pair by clicking the Add button at the top, or they can be dynamically read in from upstream in the workflow by checking the boxes for the corresponding column names. This can be helpful when an authentication component is dynamically generated like in the case of an Oauth flow.

The third tab in the configuration window, Payload, is used to pass parameters that specify what data is to be retrieved. These payload parameters can be thought of similar to filters in that they are narrowing down the specific dataset to be returned by the API. Similar to the Headers, they can either be hard coded as a Name/Value pair, or the box can be checked to read the value from a column that is coming in from upstream in the workflow. Alternatively, you can also create a Query string upstream in the workflow like in a formula tool and then reference that by selecting the “Take Query String/Body from Field” option. You can also “Use Following for Query String/Body” if you would like to hard code the query into the bottom of the payload tab. Up at the very top of the Payload tab, there is a dropdown for whatever HTTP action the API has specified is needed for the endpoint. The GET and POST options will be used most commonly, but you have access to all of the standard HTTP Actions.

Finally the fourth tab is “Connection” which won’t be used for most API connections so we can disregard that for now.

What do examples of each authentication type look like?

No Authentication:

API Key Authentication:

Basic Authentication:

Oauth 2.0:

As you begin to build out your workflow, pay close attention to the API documentation. There will be some headers and payload values that are optional, but this will all be outlined in the documentation. For those of you that may already be using something like Postman and have a working API call, I would highly suggest clicking the code button on the right hand side so that you can see the cURL equivalent. Your headers and payload values will be easily identified from the snippet that is shown.

Once your API call has been successfully configured, you will likely see the results as either a JSON or comma separated string under the “DownloadData” column (JSON is generally more common). If you connect a JSON Parse tool after your Download tool, it will parse the JSON into a JSON Name and JSON ValueString column as shown below:

JSON takes the format of a key value hierarchy so to get our data into a nice tabular format we will want to use a Text to Columns tool to split the JSON Name column on the period delimiter and then follow up with a crosstab. We will use the new data as column headers and group by the numeric value which will serve as our record ID. Example shown below:

And with that, hopefully you have made your first successful API call! Leave your questions down below but hopefully this helps you on your journey to integrate all of your disparate data systems.