
> The end result was a massive world-wide outage.

The world-wide outage was actually caused by deploying several incorrect programs into an incorrect system.

The root one was a bad query, as outlined in the article.

Let’s get philosophical for a second. Programs WILL be written incorrectly - you will deploy to production something that can’t possibly work. What should you do with a program that can’t work? Pretend this can’t happen? Or let you know so you can fix it?



> Programs WILL be written incorrectly - you will deploy to production something that can’t possibly work. What should you do with a program that can’t work? Pretend this can’t happen? Or let you know so you can fix it?

Type systems provide compile-time guarantees of correctness, so that programs cannot fail in the ways the type system covers.

In this case, they used an unsound hole in the type system to do something that unnecessarily abandoned those compile-time invariants and in the process caused a world-wide outage.

The answer is to not poke unsound holes in your type system in the first place.
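To make that concrete, here is a minimal Rust sketch, assuming the hole is something like an `unwrap()` on a fallible result. `FeatureSet`, `TooManyFeatures`, `MAX_FEATURES` and `load_features` are hypothetical names made up for illustration, not anything from the article. The point is only that the size check lives in the constructor and the error travels through the return type, so the caller cannot ignore it without writing an explicit escape hatch.

```rust
/// Hypothetical cap on the number of entries (an assumption for this sketch).
const MAX_FEATURES: usize = 200;

/// Error returned when the input exceeds the cap.
#[derive(Debug, PartialEq)]
struct TooManyFeatures {
    got: usize,
}

/// A feature list whose length is checked once, at construction.
/// Holding a `FeatureSet` means the cap was respected.
#[derive(Debug, PartialEq)]
struct FeatureSet(Vec<String>);

impl FeatureSet {
    fn new(names: Vec<String>) -> Result<Self, TooManyFeatures> {
        if names.len() > MAX_FEATURES {
            Err(TooManyFeatures { got: names.len() })
        } else {
            Ok(FeatureSet(names))
        }
    }

    fn len(&self) -> usize {
        self.0.len()
    }
}

/// Propagates the error with `?` instead of unwrapping it,
/// so the caller has to decide what happens on failure.
fn load_features(names: Vec<String>) -> Result<usize, TooManyFeatures> {
    let set = FeatureSet::new(names)?;
    Ok(set.len())
}
```

With 200 names `load_features` returns `Ok(200)`; with 201 it returns `Err(TooManyFeatures { got: 201 })` and nothing panics.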


It's not philosophical: half of the internet broke. There is a difference between "I really have no other choice but to crash" and "I might wanna crash at this moment because something is wrong but I won't and I'll try recovering".

In this particular case, the trigger was the "limit 200", imposed for "performance reasons", so I think there was more room to implement the latter than the former.
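Roughly what I mean by recovering, as a hedged Rust sketch. `Config`, `Outcome`, `LIMIT` and `apply_update` are hypothetical names, and this is not how the system in the article is actually structured; it only shows the "keep the last known-good state and report the problem" option next to the "crash" option.

```rust
/// The "limit 200" from above; the constant name is made up.
const LIMIT: usize = 200;

/// Hypothetical configuration holding a list of feature names.
#[derive(Debug, PartialEq)]
struct Config {
    features: Vec<String>,
}

/// What happened to an update attempt.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The candidate fit the limit and replaced the current config.
    Applied,
    /// The candidate was too large; the previous config is still in use.
    KeptPrevious { rejected_len: usize },
}

/// Swap in `candidate` only if it fits the limit.
/// Otherwise leave `current` untouched and tell the caller,
/// who can log or alert instead of taking the process down.
fn apply_update(current: &mut Config, candidate: Config) -> Outcome {
    if candidate.features.len() > LIMIT {
        return Outcome::KeptPrevious {
            rejected_len: candidate.features.len(),
        };
    }
    *current = candidate;
    Outcome::Applied
}
```

An oversized update leaves the running config as it was and returns `KeptPrevious`, which is the signal to page someone. Whether serving stale config is acceptable depends on the system, so this is a design choice, not a free win.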



