Alteryx Designer Cloud Discussions


What is the easiest approach to wrangle HTML content?

44619e163d12cf5f0d39
8 - Asteroid

I have several HTML pages I want to wrangle. I don't care much about the formatting, since the data is already there, but I'm having trouble extracting the raw text: removing the tags through wrangling is a pain. Any recommended approach?

44619e163d12cf5f0d39
8 - Asteroid

By the way, I am on a Mac, so if there is a utility that can do the conversion, I could write a script around it if needed. TIA

Trifacta_Alumni
Alteryx Alumni (Retired)

There are several ways to convert HTML pages to plain text, depending on your platform.

macOS

You can use the built-in textutil command to convert all HTML pages in the current folder to .txt files:

textutil -convert txt ./*.html
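If you want to wrap this in a script, as mentioned in the question, a minimal sketch might look like the following. It assumes macOS (where textutil ships with the system); the function name html_dir_to_txt and its two folder arguments are illustrative choices, not part of textutil. It uses textutil's -output option so the .txt files land in a separate folder instead of next to the source pages.

```shell
#!/bin/sh
# Hypothetical helper: convert every .html file in SRC_DIR to plain text
# in OUT_DIR using textutil. Assumes macOS, where textutil is available.
html_dir_to_txt() {
  src_dir=$1
  out_dir=$2
  if ! command -v textutil >/dev/null 2>&1; then
    echo "textutil not found (it ships with macOS)" >&2
    return 1
  fi
  mkdir -p "$out_dir" || return 1
  for f in "$src_dir"/*.html; do
    # Skip the unexpanded pattern when the folder has no .html files
    [ -e "$f" ] || continue
    name=$(basename "$f" .html)
    # -output sets the destination file for this conversion
    textutil -convert txt -output "$out_dir/$name.txt" "$f" || return 1
  done
}

# Example usage (hypothetical paths):
# html_dir_to_txt ./pages ./pages_txt
```

The resulting .txt files can then be imported for wrangling like any other plain-text source.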


Linux

You can use unoconv to convert between any of the formats LibreOffice supports, including HTML to plain text. More details and examples: https://linux.die.net/man/1/unoconv
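A comparable batch loop on Linux might look like this. It is a sketch that assumes unoconv and LibreOffice (which unoconv drives) are installed; the function name html_dir_to_txt_unoconv is hypothetical. The -f txt option selects plain-text output, and by default unoconv writes NAME.txt next to each NAME.html.

```shell
#!/bin/sh
# Hypothetical helper: convert all .html files in a folder to .txt with unoconv.
# Assumes unoconv and LibreOffice are installed.
html_dir_to_txt_unoconv() {
  src_dir=$1
  if ! command -v unoconv >/dev/null 2>&1; then
    echo "unoconv not found" >&2
    return 1
  fi
  for f in "$src_dir"/*.html; do
    # Skip the unexpanded pattern when the folder has no .html files
    [ -e "$f" ] || continue
    # -f txt: plain-text output, written alongside the input file
    unoconv -f txt "$f" || return 1
  done
}

# Example usage (hypothetical path):
# html_dir_to_txt_unoconv ./pages
```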