The stories that Hacker News removes from the front page
UPDATE: I’ve spoken to @dang over at Hacker News and he’s been extremely understanding and helpful in both explaining and handling the situation. A new post has been created and it can be found at https://news.ycombinator.com/item?id=13867739.
A moderator accidentally added “(2010)” to the title of my previous post, users then flagged the story because of it, and it was automatically penalized after hitting a hidden flag threshold. It sounds like the other submissions were also penalized due to excessive flagging, and there has been some interesting discussion about whether some users abuse flagging and what the possible solutions are.
I published an article on Friday titled Reverse Engineering the Hacker News Ranking Algorithm and posted it on Hacker News. Now, I know that it’s always a crapshoot whether or not something makes it to the front page, but I had a hunch that this article had what it takes. It was directly relevant to the Hacker News audience in multiple ways, told a fairly interesting analysis story, and provided source code and data to make it easy for readers to try things out themselves. At the very least, I figured that if I could get enough attention to initially get on to the front page then the article would be fairly well-received.
You can imagine then that I was pretty happy when the story popped onto the front page after 45 minutes or so. Traffic started flowing in and I was looking forward to having some substantial discussion in the comments section. Then all of a sudden it was just gone. It didn’t just fall off the front page, it was like it completely disappeared from the rankings. A few minutes later @awsoutage left a comment noticing the same thing that I did and, later on, so did @mos_basik.
I have to admit that I found it a bit comforting that I wasn’t the only one who thought this all seemed a bit fishy.
I was basically rolling in Hacker News data after working on my last article so it seemed pretty natural to take a quick look and see how the ranking of my post changed over time. The data backed up what @awsoutage, @mos_basik, and I had all noticed.
The story made it to the front page, started getting upvoted at a fairly high rate, and then disappeared within minutes. For comparison, take a look at the trajectories of some more typical stories that made it to the front page.
These seem to make a lot more sense; stories rise up to some peak position and then slowly drift down through the back pages over the course of days until they’re gone. Flagging and other factors affect these trajectories but they still fall off somewhat continuously. There’s also clearly some sort of auto-penalty that kicks in once stories are 15 hours old and other minor things but nothing nearly as extreme as disappearing from the rankings entirely.
My guess here is that moderator intervention was responsible for my post’s disappearance. It was on the front page for a grand total of 9 minutes, received 9 upvotes in that time, and then received an additional 11 upvotes after disappearing as people finished reading the article. Maybe “Reverse Engineering” in the title made a moderator worry that it was going to discuss how the spam prevention mechanisms worked? That’s the only thing I can think of even though that would be an unfounded fear. I had purposely avoided analyzing the spam prevention mechanisms and limited the analysis to the components of the algorithm that were already publicly available from posts by @pg and the Arc source code releases.
In any case, this occurrence seemed like a natural opportunity to extend my previous analysis.
I now strongly suspect that there’s a mechanism for moderators to remove stories from the top stories section or, at the very least, attach a factor similar to the lightweight-factor discussed in my last post that is so extreme that it effectively does the same thing.
The data signature of moderator intervention is relatively simple. Basically, if a story has a normal trajectory on the front page for a while and then instantaneously makes a huge jump downwards or disappears completely then it suggests that a moderator may have manually adjusted the story. To make this a little more concrete let’s limit it to stories that got at least 10 votes, dropped from the front page by over 100 spots in less than 30 seconds, and weren’t marked “dead”, “dupe”, or “flagged” (because those might organically result in the same behavior). Applying this filter to the data lets us pick out the trajectories of stories that were doing well before being severely penalized.
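A filter along these lines can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the analysis: the `Snapshot` layout, its field names, the 30-slot front page, the use of peak vote count for the 10-vote cutoff, and the choice to count a story that vanishes from the rankings as a qualifying drop are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

FRONT_PAGE_SIZE = 30  # assumption: the front page is ranks 1-30
MIN_VOTES = 10        # "at least 10 votes"
MIN_DROP = 100        # "over 100 spots"
MAX_SECONDS = 30      # "less than 30 seconds"
EXCLUDED = {"dead", "dupe", "flagged"}


@dataclass
class Snapshot:
    """One observation of a story's position (hypothetical layout)."""
    story_id: int
    timestamp: float      # seconds
    rank: Optional[int]   # 1-based position; None if absent from the rankings
    votes: int
    status: str = ""      # "", "dead", "dupe", or "flagged"


def has_sudden_drop(snapshots: List[Snapshot]) -> bool:
    """Return True if one story's snapshots match the drop signature."""
    snaps = sorted(snapshots, key=lambda s: s.timestamp)
    if not snaps:
        return False
    # Skip stories whose drop could be explained by a visible marker.
    if any(s.status in EXCLUDED for s in snaps):
        return False
    # Assumption: the vote cutoff applies to the story's peak vote count.
    if max(s.votes for s in snaps) < MIN_VOTES:
        return False
    # Compare each pair of consecutive observations.
    for before, after in zip(snaps, snaps[1:]):
        on_front_page = before.rank is not None and before.rank <= FRONT_PAGE_SIZE
        if not on_front_page:
            continue
        if after.timestamp - before.timestamp >= MAX_SECONDS:
            continue
        # Assumption: disappearing entirely counts as a qualifying drop.
        if after.rank is None or after.rank - before.rank > MIN_DROP:
            return True
    return False
```

Because only consecutive observations are compared, a drop that is spread across several polls inside the 30 second window would be missed. That is a simplification of the sketch rather than a property of the filter described above.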
As you can see, my post has some company here. This has happened to 26 out of 1213 front page stories over the last couple of weeks or so. That’s about 2.1% so it’s not a particularly common occurrence but it is happening on a daily basis.
Now let’s take a closer look at the stories that have moderator fingerprints on them. Are they spam, are they critical of Y Combinator companies, or what?
A few other posts seem a bit off-topic, some may be a bit light on content, but for the most part I wouldn’t think twice if I saw any of these stories on the front page. I mean… they were all actually upvoted to the front page after all. Some of them are the types of stories that tend to invite flagging and might have been removed because of that (either manually or automatically).
This is most likely the case for the political posts. They’re usually fairly contentious and the moderators have expressed distaste for an overabundance of them (e.g. the Political Detox Week attempted a few months ago). I wouldn’t be surprised to see those removed, especially when the comment sections turn toxic (as they often do).
There are also a few articles that are thinly veiled affiliate spam. For example, the ShelfJoy links about books that Aaron Swartz and David Bowie loved fall into this category. They get called out in the comments for being very low-effort lists of Amazon affiliate links.
I have a hard time imagining the others triggering any sort of hidden flag threshold. I obviously think that Reverse Engineering the Hacker News Ranking Algorithm falls into that category, but what about “Python Fire”? A new, useful open source project that was met with glowing praise in the comments? Take a look at the first sentences of some of the comments: “Convenient.”, “Some libraries just seem obvious in retrospect.”, “Looks great!”, “Looks pretty good.”, and “Looks fantastic.” That really doesn’t seem like a recipe for heavy flagging.
The “Investor to Airbnb CEO” post is the only one on the list that pertains to a Y Combinator company. I read through the comments to get a feel for the tone and how much controversy there was. As a whole, they’re fairly critical of the CEO. Take a look at the top comment.
I find this to be a very respectful comment that highlights the positive role that Y Combinator plays in the startup community while also explaining why that translates into it having some responsibility for taking a stance against unethical behavior. That said, it’s clearly critical of how Y Combinator has responded to controversies regarding the ethics of their companies.
There was another part of the comment that I found a bit chilling though: “I am looking forward to see[ing] constructive debate around this topic in the comments.” Did a moderator really read that and then disappear the story? This might just be an unfortunate coincidence caused by some flagging threshold but the pattern looks strikingly similar to that of my own post. It was on the front page for about 20 minutes, was getting rapidly upvoted, and then suddenly dropped hundreds of positions in the rankings.
The penalty was reversed shortly afterwards, but the story was already off the front page at that point. The submission never got a fraction of the attention that it would have received otherwise (though it did briefly rise back onto the front page after this).
I’m very curious to hear some other people’s opinions on this. Were these stories removed automatically or was there moderator intervention? If you think it was the moderators then do their intentions make sense to you? I don’t want to jinx things but… I am looking forward to seeing constructive debate around this topic in the comments.