A guide to scraping historical snapshots of webpages from the Archive.org Wayback Machine.
The full code for the completed scraper can be found in the companion repository on GitHub.
Introduction
I wouldn’t really consider web scraping one of my hobbies or anything, but I guess I sort of do a lot of it. It just seems like many of the things that I work on require me to get my hands on data that isn’t available any other way. I need to do static analysis of games for Intoli, and so I scrape the Google Play Store to find new ones and download the APKs.
An analysis of which stories are removed from the front page of Hacker News due to moderator intervention.
A data analysis of how many deaths the DST transition causes due to tired driving.
A data-driven exploration of how the Hacker News ranking algorithm works.
A brief response to Nayuki’s post about the use of simulated annealing (SA) to solve an image unshredding problem. An interactive demo is used to show that a simple greedy algorithm outperforms SA in terms of both results and computation time.
An overview of my work on optimizing phone keyboard layouts for Swype and T9. There’s some interesting history here as well as a novel simulation-based approach to keyboard optimization.