HTML isn't usually thought of as a data format, but a lot of good data is only available on web pages. In this lesson, I use Jsoup and Clojure to parse data out of a big HTML table.
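
As a rough sketch of the idea (not the code from the lesson), here is how pulling rows out of an HTML table with Jsoup might look from Clojure. The sample HTML, its column layout, and the `table-rows` function are made up for illustration, and Jsoup is assumed to already be on the classpath.

```clojure
(ns emoji-sketch.core
  (:import (org.jsoup Jsoup)
           (org.jsoup.nodes Element)))

;; Hypothetical stand-in for a real page: a tiny table with an
;; invented column layout (code point, character, name).
(def sample-html
  "<table>
     <tr><th>Code</th><th>Char</th><th>Name</th></tr>
     <tr><td>U+1F600</td><td>😀</td><td>grinning face</td></tr>
     <tr><td>U+1F603</td><td>😃</td><td>grinning face with big eyes</td></tr>
   </table>")

(defn table-rows
  "Returns one vector of cell texts for each <tr> that has <td> cells.
  Rows made up only of <th> header cells are skipped."
  [^String html]
  (let [doc (Jsoup/parse html)]
    (for [^Element row (.select doc "tr")
          :let [cells (.select row "td")]
          :when (seq cells)]
      (mapv (fn [^Element cell] (.text cell)) cells))))
```

Calling `(table-rows sample-html)` gives back a sequence with one vector of strings per data row, which is an easy shape to turn into maps or write out as CSV. For a real page, the CSS selectors would need to match that page's actual markup.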

Specifically, I parse out all of the Unicode emoji from this page. The documentation is pretty good. Here are some documents I referenced when planning this lesson:

Code

Code is available: lispcast/emoji

You can clone the code to your machine with these commands:

$CMD git clone https://github.com/lispcast/emoji.git 
$CMD cd emoji