Hacker News

Sorry, but 'x crashed y% of the time' is absolutely meaningless.

Crash rates for software are measured as a frequency per total number of hours deployed. Crashing 0.9% to 2.5% of the time would translate into 'unusable'.
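To make the distinction concrete, here is a small sketch converting a per-session crash percentage into crashes per deployed hour, the metric described above. The session count and average session length are made-up numbers for illustration, not real browser data:

```python
# Convert a per-session crash percentage into crashes per 1000
# deployed hours. All inputs are illustrative assumptions.

sessions = 1_000_000          # assumed number of browsing sessions
crash_fraction = 0.025        # "2.5% of the time" read as 2.5% of sessions
avg_session_hours = 0.5       # assumed average session length

crashes = sessions * crash_fraction
deployed_hours = sessions * avg_session_hours
rate_per_1000h = crashes / deployed_hours * 1000

print(f"{rate_per_1000h:.0f} crashes per 1000 deployed hours")
# prints "50 crashes per 1000 deployed hours"
```

Note the session count cancels out: the rate depends only on the crash fraction and the assumed session length, which is exactly why a bare percentage is ambiguous without a time base.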

And the Android crash rates there are also unsupported.

Imagine if 2.5% of all bridges would collapse due to engineering errors and then they'd improve it by a factor of two and hail that as immense progress...



Bridges have a very different quality standard than web browsers do. Chrome doesn't kill you and everyone around you when it crashes.

A better analogy would be bridge designers coming up with a design that required half as much maintenance or half as much material cost to achieve the same factor of safety, or which doubled the expected lifetime of the bridge for the same lifetime cost. Those would be significant improvements in the design, even though eventually the bridge will still have to be replaced (either all at once or piecemeal over its life).


>Imagine if 2.5% of all bridges would collapse due to engineering errors and then they'd improve it by a factor of two and hail that as immense progress...

Only, software is not like bridges, and crashes happen and do not bring the end of the world.

Imagine having a newfangled machine called a telephone in 1930, where only 1 in 10 (10%) calls were dropped mid-call. And then they managed to improve that to 2%.

That's not totally unlike how it was back then (heck, the analog telephone network was like that even in the eighties in some countries I know). And yet nobody thought of the phone network as "unusable" (compared to what? some non-existent, non-crashing one?), and nobody blamed engineering in general.

Same for the early decades of the fax, same for the early dial-up internet, etc etc.


> Only, software is not like bridges, and crashes happen and do not bring the end of the world.

There's a joke about woodpeckers and software engineering that's a long-time favorite of mine. I think that the attention paid to product quality is still vastly behind what we expect as normal from other branches of industry.

If a CPU contains a bug, all the software guys scream blue murder: how was it possible that this several-billion-part, highly timing-sensitive design contained a bug that escaped detection during QA? And yet, as software guys, we routinely wave off bugs as though they are simply a fact of life and you'd better get used to them (and to the subsequent crashes).

All I'm saying is that there is something wrong about that picture, not that I have the solution, merely that it feels as though we should do better and should strive to do better. Much better.

The phone is a good example, if only because it's one of the few areas where reliability is top of the requirements list rather than for instance execution speed. That's why it should be no surprise that Erlang has a telecommunications background. It even factors in broken hardware to some extent.
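Erlang's approach is the supervision tree: workers are allowed to crash, and a supervisor restarts them. A minimal Python sketch of that "let it crash" idea follows; the worker and its failure mode are invented for illustration, and real Erlang supervisors do considerably more (restart strategies, escalation, surviving node failures):

```python
import random

# Toy illustration of Erlang-style "let it crash" supervision:
# rather than defending the worker against every possible error,
# let it die and have a supervisor restart it a bounded number
# of times. The flaky worker is a stand-in for a real task.

def flaky_worker(attempt):
    if random.random() < 0.5:           # simulated crash
        raise RuntimeError(f"worker died on attempt {attempt}")
    return "ok"

def supervise(task, max_restarts=10):
    for attempt in range(1, max_restarts + 1):
        try:
            return task(attempt)
        except RuntimeError:
            continue                    # crash observed: restart
    raise RuntimeError("restart limit exceeded")

random.seed(0)                          # deterministic demo
print(supervise(flaky_worker))
```

The design choice worth noting is that the recovery logic lives outside the worker, so a worker corrupted by an unexpected fault is thrown away and restarted from a known-good state instead of being patched up in place.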

Not exactly 100% relevant or on topic, but interesting reading:

http://www.hpl.hp.com/techreports/tandem/TR-85.7.pdf


>The phone is a good example, if only because it's one of the few areas where reliability is top of the requirements list rather than for instance execution speed. That's why it should be no surprise that Erlang has a telecommunications background. It even factors in broken hardware to some extent.

Yes, but my point was that it took half a century or more for the phone network to reach the reliability we enjoy today.

(Not to mention how the mobile phone network STILL sucks donkey balls in large parts of the States, including in highly populated urban areas).


When half of North America went down due to a power failure, the only thing that still worked was the phone network, and that included mobile. I don't know your specific situation, but to me, things like LTE and thousands of simultaneous users of RF-based infrastructure (think a stadium with a 50K crowd) are (even though I can picture a lot of what's happening behind the scenes) a testament to the effort telcos put into delivering the goods to their end users.

Even if it took half a century for the reliability to be 'right up there', what excuse do we have for software then? We're getting quite close to that half century.


>Even if it took half a century for the reliability to be 'right up there', what excuse do we have for software then? We're getting quite close to that half century.

For one, software is an ever-changing thing (new requirements, updates, changes to the OS and libraries, etc.). Something like a telephone network can basically be deployed once and then just maintained.

Second, the complexity of our software stack in a modern OS is many times greater than the phone network's. And it all has to play together with any random program the user might want to install.

Third, what software can do now is amazingly different from (and more powerful than) what it did in 1950: e.g. real-time manipulation of multiple video/audio streams with filters, face recognition, and what have you (and that's just one app -- we're also running 20 others at the same time) -- compared to doing some simple business/math calculations.

Whereas the phone network still does basically the same thing: it transfers data from one point to another and routes calls. It's a far narrower field.


The State of Washington has 3 floating bridges, and 1.5 formerly floating bridges that are on the bottom.

(not counting the new 520 bridge, which is not assembled into a bridge shape yet)


> Imagine if 2.5% of all bridges would collapse due to engineering errors and then they'd improve it by a factor of two and hail that as immense progress...

As Justin was saying, most Chrome crashes are due to malware (and Firefox sees similar numbers I believe). Broadly speaking, bridges do not get malware.



