Internet Archaeology: Scraping time series data from Archive.org
A guide to scraping historical snapshots of webpages from the Archive.org Wayback Machine.
Apr 5, 2017
16 min read
Advanced Web Scraping: Bypassing "403 Forbidden," captchas, and more
The full code for the completed scraper can be found in the companion repository on GitHub. I wouldn’t really consider web scraping one of my hobbies or anything, but I guess I sort of do a lot of it.
Mar 16, 2017
20 min read
The stories that Hacker News removes from the front page
An analysis of which stories are removed from the front page of Hacker News due to moderator intervention.
Mar 13, 2017
9 min read
How many people will actually die this week because of Daylight Savings Time?
A data analysis of how many deaths the DST transition causes due to tired driving.
Mar 12, 2017
6 min read
Reverse Engineering the Hacker News Ranking Algorithm
A data-driven exploration of how the Hacker News ranking algorithm works.
Mar 10, 2017
28 min read
A Greedy Image Unshredder
A brief response to Nayuki’s post about the use of simulated annealing to solve an image unshredding problem. An interactive demo is used to show that a simple greedy algorithm outperforms the SA, both in terms of results and computation time.
Oct 9, 2016
1 min read
Finding an Optimal Keyboard Layout for Swype
An overview of my work on optimizing phone keyboard layouts for Swype and T9. There’s some interesting history here as well as a novel simulation-based approach to keyboard optimization.
Apr 9, 2015
27 min read