sangaline.com
Posts
Internet Archaeology: Scraping time series data from Archive.org
A guide to scraping historical snapshots of webpages from the Archive.org Wayback Machine.
Evan Sangaline
Apr 5, 2017
16 min read
Advanced Web Scraping: Bypassing "403 Forbidden," captchas, and more
The full code for the completed scraper can be found in the companion repository on GitHub. Introduction: I wouldn’t really consider web scraping one of my hobbies or anything, but I guess I sort of do a lot of it.
Evan Sangaline
Mar 16, 2017
20 min read
The stories that Hacker News removes from the front page
An analysis of which stories are removed from the front page of Hacker News due to moderator intervention.
Evan Sangaline
Mar 13, 2017
9 min read
How many people will actually die this week because of Daylight Savings Time?
A data analysis of how many deaths the DST transition causes due to tired driving.
Evan Sangaline
Mar 12, 2017
6 min read
Reverse Engineering the Hacker News Ranking Algorithm
A data-driven exploration of how the Hacker News ranking algorithm works.
Evan Sangaline
Mar 10, 2017
28 min read
A Greedy Image Unshredder
A brief response to Nayuki’s post about the use of simulated annealing to solve an image unshredding problem. An interactive demo is used to show that a simple greedy algorithm outperforms the SA, both in terms of results and computation time.
Evan Sangaline
Oct 9, 2016
1 min read
Finding an Optimal Keyboard Layout for Swype
An overview of my work on optimizing phone keyboard layouts for Swype and T9. There’s some interesting history here as well as a novel simulation-based approach to keyboard optimization.
Evan Sangaline
Apr 9, 2015
27 min read