HTML Parsing
HTML is often not thought of as a data format, but so much good data is only available on web pages. In this lesson, I use Jsoup and Clojure to parse data out of a big HTML table.
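As a rough illustration of the general approach, here is a minimal Clojure sketch of pulling rows out of an HTML table with Jsoup through Java interop. It is not the lesson's code. The inline HTML, the `table-rows` function name, and the assumption that data cells are `td` elements while header cells are `th` are all hypothetical, and the sketch assumes jsoup is already a project dependency.

```clojure
(ns emoji.sketch
  ;; Assumes org.jsoup/jsoup is on the classpath.
  (:import (org.jsoup Jsoup)))

(def sample-html
  ;; Hypothetical stand-in for the real emoji page.
  "<table>
     <tr><th>Code</th><th>Name</th></tr>
     <tr><td>U+1F600</td><td>grinning face</td></tr>
     <tr><td>U+1F602</td><td>face with tears of joy</td></tr>
   </table>")

(defn table-rows
  "Hypothetical helper: returns the text of each row's td cells as a
  vector of strings, skipping rows with no td cells (such as header rows)."
  [html]
  (let [doc (Jsoup/parse html)]                      ; parse the string into a Document
    (->> (.select doc "tr")                          ; CSS selector: every table row
         (map (fn [row]
                (mapv #(.text %) (.select row "td")))) ; trimmed text of each cell
         (remove empty?))))                          ; drop header rows (th only)

(comment
  (table-rows sample-html)
  ;; => (["U+1F600" "grinning face"] ["U+1F602" "face with tears of joy"])
  )
```

Because Jsoup's `Elements` is a `java.util.List`, Clojure's sequence functions work on it directly, so no explicit conversion is needed.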
Specifically, I parse out all of the Unicode emoji from this page. The documentation is pretty good. Here are some documents I referenced when planning this lesson:
Code
The code is available on GitHub: lispcast/emoji
You can check out the code into a local repository with these commands:
git clone https://github.com/lispcast/emoji.git
cd emoji