I recently wrapped up a project with Liferay in Chicago and wanted a fun side project for my downtime. I decided to try web scraping with Python and Scrapy to collect some data. It turns out that Scrapy, which is great for basic web scraping, doesn’t work for some sites that rely on JavaScript actions. So I looked into Selenium WebDriver and built a Java implementation, driving Chrome and Firefox, to handle the second stage: downloading and organizing the data.

It really makes you think about everything that can go wrong when accessing a web page, and about which DOM elements may or may not be there when you work against a site with years of data. All in all, though, it was a success and I learned a lot.
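To make the "may or may not be there" problem concrete, here is a minimal sketch of one way to guard against missing elements. It is not the code from my project: `SafeLookup`, `tryGet`, and `firstPresent` are hypothetical names I made up for illustration, and the suppliers in `main` are stand-ins for real WebDriver calls such as `driver.findElement(By.cssSelector(...)).getText()`. The idea is simply to treat a failed lookup as "absent" and fall back to the next candidate, for example a selector that only matches an older page layout.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper, not part of Selenium: treats a failed lookup as "absent". */
final class SafeLookup {

    /** Runs one lookup; a RuntimeException or a null result becomes Optional.empty(). */
    static <T> Optional<T> tryGet(Supplier<T> lookup) {
        try {
            return Optional.ofNullable(lookup.get());
        } catch (RuntimeException e) {
            // Selenium's NoSuchElementException is unchecked, so it would land here.
            // Catching RuntimeException is deliberately broad for this sketch.
            return Optional.empty();
        }
    }

    /** Tries each lookup in order and returns the first one that yields a value. */
    static <T> Optional<T> firstPresent(List<Supplier<T>> lookups) {
        for (Supplier<T> lookup : lookups) {
            Optional<T> result = tryGet(lookup);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-ins for real element lookups against a newer and an older page layout.
        Supplier<String> newLayout = () -> { throw new IllegalStateException("selector not found"); };
        Supplier<String> oldLayout = () -> "Title from the older layout";

        String title = firstPresent(List.of(newLayout, oldLayout)).orElse("(no title)");
        System.out.println(title);
    }
}
```

In real scraping code you would probably catch only the specific exceptions you expect and log the misses, so that a genuinely broken page doesn't silently turn into an empty value.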

My teenage daughter came into my home office while I was working on it and saw the browser running on its own, so I explained what I was doing. She said, “Dad, you’re such a nerd.” I guess she’s right 🙂

There is also a practical application for all of this beyond what I was doing: testing a web site in the cases that unit tests don’t really cover. So not only was it fun and educational, but I can now apply it in future projects.