Posts

Internet Archaeology: Scraping time series data from Archive.org

A guide to scraping historical snapshots of webpages from the Archive.org Wayback Machine.

Evan Sangaline

Apr 5, 2017 16 min read

Advanced Web Scraping: Bypassing "403 Forbidden," captchas, and more

The full code for the completed scraper can be found in the companion repository on GitHub. I wouldn’t really consider web scraping one of my hobbies or anything, but I guess I sort of do a lot of it.

Evan Sangaline

Mar 16, 2017 20 min read

The stories that Hacker News removes from the front page

An analysis of which stories are removed from the front page of Hacker News due to moderator intervention.

Evan Sangaline

Mar 13, 2017 9 min read

How many people will actually die this week because of Daylight Savings Time?

A data analysis of how many deaths the DST transition causes due to tired driving.

Evan Sangaline

Mar 12, 2017 6 min read

Reverse Engineering the Hacker News Ranking Algorithm

A data-driven exploration of how the Hacker News ranking algorithm works.

Evan Sangaline

Mar 10, 2017 28 min read

A Greedy Image Unshredder

A brief response to Nayuki’s post about the use of simulated annealing to solve an image unshredding problem. An interactive demo is used to show that a simple greedy algorithm outperforms simulated annealing in terms of both results and computation time.

Evan Sangaline

Oct 9, 2016 1 min read

Finding an Optimal Keyboard Layout for Swype

An overview of my work on optimizing phone keyboard layouts for Swype and T9. There’s some interesting history here as well as a novel simulation-based approach to keyboard optimization.

Evan Sangaline

Apr 9, 2015 27 min read