What I'm really curious about is how bugs/errors in the iOS typesetting algorith...

evmar · on Feb 15, 2018

I wrote some similar bugs to this in the complex text handling in Chrome.

In text layout you do a lot of indexing into various arrays -- like the array of code units of the input string, or an array of metadata collected per-code point, or an array of data collected per-grapheme. Oftentimes those arrays are all the same length (like in simple text like Chinese) and mixing up which index to use where is no problem. And then when it goes wrong you're violating array bounds which is a crashable offense.
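To make the index mix-up above concrete, here is a minimal Rust sketch. It is not Chrome's code, and the helper names `unit_counts` and `per_code_point_flags` are made up for illustration. It measures one string in UTF-8 code units, UTF-16 code units, and code points, and builds a per-code-point metadata array. Grapheme counts are left out because the Rust standard library does not provide them.

```rust
/// Length of the same string in three different units:
/// (UTF-8 code units, UTF-16 code units, code points).
/// Hypothetical helper, for illustration only.
fn unit_counts(s: &str) -> (usize, usize, usize) {
    let utf8_units = s.len(); // bytes in the UTF-8 encoding
    let utf16_units = s.encode_utf16().count();
    let code_points = s.chars().count(); // Unicode scalar values
    (utf8_units, utf16_units, code_points)
}

/// One metadata entry per code point, so this array must be
/// indexed by code point position, not by code unit position.
fn per_code_point_flags(s: &str) -> Vec<bool> {
    s.chars().map(|c| c.is_alphabetic()).collect()
}
```

For `"abc"` all three counts are 3, so any index works in any array. For `"a\u{1F600}"` the counts are 5, 3, and 2, so the last UTF-16 index (2) is already past the end of the two-element per-code-point array. `per_code_point_flags("a\u{1F600}").get(2)` returns `None`, where unchecked indexing in C or C++ would read out of bounds.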

I agree that it's stupid to have code that causes crashes here. My only defense is that I had a lot of other work to do and complex text pathological cases only affect a fraction of users, none of whom were the ones yelling at me about other bugs. I am rooting for Servo, where they are using a language that defends against bad programmers like me.

PS: If a web page wants to crash, it can easily do so by allocating memory in a loop, so making web pages that crash isn't as exciting as it is in Core Text in general. Of course crashes can often be escalated into RCE, but the Chrome sandbox was there to mitigate that.

cakoose · on Feb 16, 2018

> PS: If a web page wants to crash, it can easily do so by allocating memory in a loop, so making web pages that crash isn't as exciting as it is in Core Text in general.

To allocate memory in a loop, you need some control over the JS. Websites try hard not to serve untrusted JS.

But websites serve untrusted text without a second thought. For example, I could post a comment on a news article and cause the article to be unviewable by anyone with a vulnerable browser.

wybiral · on Feb 16, 2018

You can serve infinite pages like this without JS or weird text issues: fan-pages[dot]herokuapp[dot]com

gok · on Feb 15, 2018

How would using Rust help this case? An out-of-bounds array access would still lead to a crash, and thus a DoS in the crashing application. You could sandbox the text rendering into its own process to solve that, but then you could do that using unsafe languages anyway.

tomjakubowski · on Feb 15, 2018

Rust would significantly help to prevent the escalation of this crash into an RCE.

mrob · on Feb 15, 2018

Fully sandboxing an unsafe renderer might have unacceptable performance. E.g. you'd have to reset the internal state after every call, otherwise invalid text on a phishing website might be able to subvert the renderer to make it render text in the URL bar to read something different.

thezoq2 · on Feb 16, 2018

An out of bounds access doesn't have to lead to a crash. For most types in the standard library, the [] operator panics but the "get" function returns an Option<&T> which you can deal with.

beojan · on Feb 16, 2018

So it's really no better than C++, where containers have a `.at` function that does bounds-checking and throws an exception if out of bounds.

cesarb · on Feb 16, 2018

The difference is that in Rust the [] operator does bounds checking: it'll reliably panic before accessing memory if the index is out of bounds. In C++, by contrast, the [] operator will happily let the program read or write outside the array bounds.

Depending on compile-time options, a Rust panic can either cause an immediate crash, or do something similar to throwing a C++ exception, complete with stack unwinding.
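A minimal sketch of the two access styles discussed here, using only the Rust standard library. The function names `checked` and `index_or_recover` are hypothetical. The second function assumes the default `panic = "unwind"` setting; with `panic = "abort"` the process would terminate instead of returning an error.

```rust
use std::panic;

/// Checked lookup: `get` returns None instead of panicking.
fn checked(data: &[u32], i: usize) -> Option<u32> {
    data.get(i).copied()
}

/// Indexing with [] panics on an out-of-bounds index before any
/// memory is touched. With unwinding enabled, the panic can be
/// caught at a boundary, much like a C++ exception.
fn index_or_recover(data: Vec<u32>, i: usize) -> Result<u32, String> {
    panic::catch_unwind(move || data[i])
        .map_err(|_| String::from("index out of bounds, caught at boundary"))
}
```

With `vec![1, 2, 3]`, index 0 yields `Ok(1)` and index 7 yields an `Err`, while `checked(&[1, 2, 3], 7)` is simply `None`. In neither case is memory outside the vector read or written.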

Manishearth · on Feb 16, 2018

Yeah, and because it's reliable, you don't need to worry about security issues -- the worst this can do is abort the application.

Whereas for this bug it's quite possible that it may be exploitable. Especially given that the crash backtrace doesn't always appear in the same place -- something is corrupting memory that gets discovered later. (This explains why I can sometimes get the string to render for a split second before crashing)

dom96 · on Feb 16, 2018

Wouldn't any language with exceptions work here? You just define an indexing operator that throws an exception instead of crashing and handle that exception outside the unicode handling function.

thezoq2 · on Feb 16, 2018

The advantage of Rust's error handling is that it is explicit. The compiler knows that a function might result in an error and forces the programmer to deal with it, or pass it along.

In an exception based language you might forget to deal with the error and have it crash "higher up" in the code.

I also suspect that there might be a performance benefit but I could be completely wrong about that
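As a rough illustration of "deal with it, or pass it along", here is a small Rust sketch. `LayoutError`, `glyph_at`, `first_and_last`, and `render_or_skip` are invented names, not part of any real text stack. The `?` operator passes the error up, and the top-level caller decides not to render.

```rust
#[derive(Debug, PartialEq)]
enum LayoutError {
    IndexOutOfRange { index: usize, len: usize },
}

/// The return type makes the possible failure visible to every caller.
fn glyph_at(glyphs: &[u16], index: usize) -> Result<u16, LayoutError> {
    glyphs
        .get(index)
        .copied()
        .ok_or(LayoutError::IndexOutOfRange { index, len: glyphs.len() })
}

/// Passing the error along: `?` returns early with the Err value.
fn first_and_last(glyphs: &[u16], last: usize) -> Result<(u16, u16), LayoutError> {
    let first = glyph_at(glyphs, 0)?;
    let last = glyph_at(glyphs, last)?;
    Ok((first, last))
}

/// Dealing with it at a high level: skip rendering on any error.
fn render_or_skip(glyphs: &[u16], last: usize) -> String {
    match first_and_last(glyphs, last) {
        Ok((a, b)) => format!("{}..{}", a, b),
        Err(_) => String::from("<not rendered>"),
    }
}
```

Ignoring the `Result` from `glyph_at` produces a compiler warning, and using the value without handling the error case does not compile, which is the explicitness being described. This only applies to code that chooses `get` over `[]`.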

peoplewindow · on Feb 16, 2018

It's actually slower to not use exceptions and what you describe is not an advantage - exceptions also force you to deal with it or pass it along, if the exception is checked. Bounds check failures aren't of course because that would be incredibly inconvenient and unwieldy, and anyway, you'd just pass it all the way up the stack to some much higher level point which is the only place you can sanely do something (like not render the string at all).

m0th87 · on Feb 16, 2018

> The advantage of Rust's error handling is that it is explicit. The compiler knows that a function might result in an error and forces the programmer to deal with it, or pass it along.

Checked exceptions are explicit as well, though I'm not aware of a language that _only_ has checked exceptions.

> I also suspect that there might be a performance benefit but I could be completely wrong about that

Yes, this is a big advantage. Stack unwinding is very expensive.

jopsen · on Feb 15, 2018

> PS: If a web page wants to crash, it can easily do so by allocating memory in a loop,

isn't this also a bug :)

And probably one we should fix.. I think all browsers are susceptible, though last I tried in Firefox one had to not just allocate but also fill the memory with garbage.

cortesoft · on Feb 16, 2018

He means crash the single web page doing the loop... how would you fix this? Isn't the correct behavior when an application goes past resource limits to crash the app? It doesn't crash the browser or any other open tabs.

jopsen · on Feb 17, 2018

> Isn't the correct behavior when an application goes past resource limits to crash the app?

Maybe, or maybe it would be better to exponentially decrease CPU time awarded to a tab over time. And ask the user to confirm that this tab should be allowed to do CPU / memory intensive computations, and otherwise reduce the frame rate, number of ticks, or crash early.

Maybe, big memory allocations and CPU intensive stuff should only be allowed on background workers. And even then, we should still reduce resource allocations to save battery, etc. and allow users to award more resources.

> He means crash the single web page doing the loop... how would you fix this?

It used to be that you could bring my entire system to a halt by allocating too much memory in the browser (mostly Chrome). Mostly due to swapping and just plain bad system configuration I suspect :)

wybiral · on Feb 16, 2018

Better process management and limitations between tabs on browsers can help.

For instance, this site will crash most Firefox browsers but not Chrome because they limit the modal dialog rates: fan-pages[dot]herokuapp[dot]com (be warned that it may crash your browser even with JS disabled).

LeifCarrotson · on Feb 16, 2018

> And then when it goes wrong you're violating array bounds which is a crashable offense.

Why is this a crashable offense? Probably a bit naive as a C# application developer, not a system developer, but shouldn't all this code be in the equivalent of a try/catch? If it throws an out-of-bounds error, just log the error and return, rendering what you've got.

Manishearth · on Feb 16, 2018

Systems languages like C++ and C don't usually do error handling that way. You do have try/catch in C++ but folks often disable it.

And even if you don't disable it, out of bounds array access will cause segfaults, not exceptions (unless you're using some abstraction over arrays).

crumbshot · on Feb 16, 2018

Someone posted a crash report: https://gist.github.com/steipete/a759b56dc1b5d94ba5b3c03ddd0...

Looks like a heap corruption, leading to a failure when allocating memory later on:

    Application Specific Information:
    abort() called
    *** error for object 0x604000673e00: Invalid pointer dequeued from free list
    
    Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
    0   libsystem_kernel.dylib        	0x00007fff73a30e3e __pthread_kill + 10
    1   libsystem_pthread.dylib       	0x00007fff73b6f150 pthread_kill + 333
    2   libsystem_c.dylib             	0x00007fff7398d312 abort + 127
    3   libsystem_malloc.dylib        	0x00007fff73a9bdbf nanozone_error + 502
    4   libsystem_malloc.dylib        	0x00007fff73a8fdac _nano_malloc_check_clear + 410
    5   libsystem_malloc.dylib        	0x00007fff73a901d7 nano_calloc + 72
    6   libsystem_malloc.dylib        	0x00007fff73a8acc0 malloc_zone_calloc + 87
    7   libsystem_malloc.dylib        	0x00007fff73a8b5d6 calloc + 30

Probably caused by an out of bounds write on some heap buffer. If this bug is also present on OS X, it would be interesting to see where it crashes with some of the malloc debugging flags enabled (https://developer.apple.com/library/content/documentation/Pe...), hopefully to get a crash a bit closer to the root cause.

vxNsr · on Feb 16, 2018

This char string will crash safari on my up to date Mac.

crumbshot · on Feb 16, 2018

Could you run Safari under lldb, setting environment variable DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib?

Should look something like this:

    $ lldb /Applications/Safari.app/Contents/MacOS/Safari
    (lldb) env DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib
    (lldb) process launch

Assuming this is a heap buffer overflow, this should cause it to crash at the point of memory corruption, as it hits an adjacent guard page.

pkaler · on Feb 15, 2018

The crash seems to be in CoreText. CoreText is embedded/linked in Messages, Spotlight, Springboard, etc. CoreText is written in C.

The fix would be to rewrite CoreText in a memory safe language like Swift. This would be “hard”. Or put CoreText in an XPC container. This would both be “hard” and result in terrible performance.

For more details on how hard C, memory management, systems programming, and operating system development are, please refer to your local copy of Modern Operating Systems by Andrew Tanenbaum.

Someone1234 · on Feb 15, 2018

Or just move CoreText into its own process, and restart it when it crashes.

The big issue is that when CoreText crashes right now the kernel panics and the device restarts. If CoreText itself could crash safely, get restarted, and the OS continue running then these bugs would go from "significant" to "annoying." Even if CoreText crashing caused individual apps to also crash, that would be a big improvement over the current situation.

Obviously we'd all like bug free fonts and text rendering, but if we call that goal aspirational (read: impossible), the best we can hope for today is handling the fault cases better than they're handled today. Bootloops are a pretty lame user experience.

lloeki · on Feb 16, 2018

From what I gathered so far this doesn't hit the kernel but the process. It turns out that on iOS one such process happens to be Springboard, hence the UI (but not the kernel) gets a kick and restarts.

Maybe I missed something though.

floatboth · on Feb 16, 2018

Yeah, Springboard is responsible for not just the home screen but also notifications. So if that sequence of characters arrives in a notification, Springboard will restart… and try to show the notification again… and restart…

semi-extrinsic · on Feb 16, 2018

That's right, there's no kernel crash AFAICT.

Pissed my niece the hell off though, that I could remotely disable her Messenger.

logicallee · on Feb 16, 2018

Well she's right to be pissed if you did that.

On the other hand I had the same thought (but better self-restraint than you) - however I was thinking "nah, scrubbing bad messages server side is just an s/badstring// and I am sure the major non-encrypted messenger apps (where the server knows the strings) added that server-side, so people couldn't crash their contacts' apps, which the app company might get blamed for." This kind of hotfix shouldn't have negative effects; I'm sure there are already a few server-side manipulations of text (stuff like adding a space to very long lines, maybe a blacklist of certain malicious URLs, that sort of thing).

So I'm surprised your message was delivered as sent (if it's not encrypted end to end), unless you did this right when the news broke.

Manishearth · on Feb 16, 2018

Though it seems like providers have not yet figured out the full set of crashy things (an overly conservative thing to do would be to filter out zwnjs in <consonant, virama, consonant, zwnj, vowel> for the three languages listed). Twitter blocks the original one but not any Bengali variants; গ্য + zwnj + a bengali vowel will still crash it.
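A rough sketch of the overly conservative filter described above, for Telugu only. Everything here is an assumption for illustration: the function names are invented, "vowel" is read as a dependent vowel sign, and the code point ranges are taken from the Unicode Telugu block (consonants U+0C15..U+0C39, vowel signs U+0C3E..U+0C4C, virama U+0C4D). The ranges include a few unassigned code points, which only makes the filter more conservative.

```rust
const VIRAMA: char = '\u{0C4D}'; // TELUGU SIGN VIRAMA
const ZWNJ: char = '\u{200C}'; // ZERO WIDTH NON-JOINER

fn is_telugu_consonant(c: char) -> bool {
    // Assumed range: TELUGU LETTER KA through HA.
    ('\u{0C15}'..='\u{0C39}').contains(&c)
}

fn is_telugu_vowel_sign(c: char) -> bool {
    // Assumed range: TELUGU VOWEL SIGN AA through AU.
    ('\u{0C3E}'..='\u{0C4C}').contains(&c)
}

/// Drops a ZWNJ only when it sits inside
/// <consonant, virama, consonant, ZWNJ, vowel sign>.
fn strip_risky_zwnj(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::with_capacity(input.len());
    for (i, &c) in chars.iter().enumerate() {
        let risky = c == ZWNJ
            && i >= 3
            && is_telugu_consonant(chars[i - 3])
            && chars[i - 2] == VIRAMA
            && is_telugu_consonant(chars[i - 1])
            && chars.get(i + 1).map_or(false, |&n| is_telugu_vowel_sign(n));
        if !risky {
            out.push(c);
        }
    }
    out
}
```

A ZWNJ anywhere else, such as in `"a\u{200C}b"`, passes through untouched. Bengali and Devanagari would need their own ranges, and a real filter would also have to decide whether silently rewriting a message is acceptable at all.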

Piskvorrr · on Feb 17, 2018

Unfortunately, enumerating badness just a stopgap measure - as this seems, so far, to triggered by a specific combination of character classes, it at least possible that there a non-malicious yet crashy string: what now, if the Knights of Ni cannot stand to hear it, but if it a part of the message? The recipient might feel that something not right with the message, and the sender might not even know that the message has censored because a part of it seems to harmful tó intermediary code. (See what I have doing here?)

logicallee · on Feb 17, 2018

So, you're right - and the point you raise at the end (with your illustrative example) is a good one. It would be wrong for HN software to silently not deliver your message to me without telling you - just because tó was on some blacklist for some reason.

If it's possible to write "Your message could not be delivered" when messages match the blacklist (even leaving the sender to guess at what they did wrong) it would be better.

As a practical matter if you haven't built the infrastructure into your clients to tell the sender that their message won't be delivered, none of the choices the platform operator has seem great:

- Silently drop a few kinds of messages without informing sender. Seems bad for the reason you outlined.

- Silently modify messages before delivery, modifying them so they won't crash clients. This seems potentially very wrong.

- Deliver messages even if you know for sure they will crash the client upon view

Doesn't seem great to me either.

I guess the real solution is to have robust forced-upgrade on the client (after all, it's your software, you're responsible for it and if you build it to include updates then it is on you) but some users object to that and I suppose they could be justified - it is also a massive responsibility.

I guess there really aren't any perfect answers here.

vlovich123 · on Feb 16, 2018

You'd probably need 1 CoreText process per application which seems suboptimal. If you didn't you'd end up having 1 crash impact all processes (+ opens you up to things like trying to steal data between processes). There's another problem which is that CoreText is intended to be an extremely efficient API for processing lots of text. It would seem to me to be hard to do that while maintaining performance requirements.

nitwit005 · on Feb 15, 2018

There are usually less dramatic fixes like changing all the array accesses to be checked, or putting pages that will trigger a fault around the buffers that the library uses, and handling the fault hitting those buffers generates.

cek · on Feb 15, 2018

Putting a buggy system in a memory safe environment is certainly not 'the fix'. The fix is to find the precise bug or architectural deficiency and fix it.

raverbashing · on Feb 15, 2018

It's easier to failsafe something than make things perfect

Even better, when you failsafe you plan for the (unknown) future.

That's why we have circuit breakers, hydraulic and electric fuses, pressure relief valves, etc. Because no one thinks they can know all things that can go wrong in the future (with catastrophic consequences) and plan for that

defined · on Feb 16, 2018

That’s the reasoning behind the Erlang “let it crash” philosophy. It’s not advocating poor programming; it’s asking processes to handle whatever issues they can within reason, but otherwise to crash and be restarted by their supervisor process, rather than try to carry on in a probably erroneous state.

It’s also a recognition that in complex systems, something unanticipated is going to go wrong sometimes, and that it’s better to have a plan for handling the failure than to pretend that the system will never hit a really bizarre failure mode.

Your circuit breaker analogy made me think of this.

mapmap · on Feb 15, 2018

I'm guessing C is how they get the performance they need. Re-writing in Obj-C or Swift would likely have speed tradeoffs.

ams6110 · on Feb 16, 2018

Obj-C is C, or more accurately a superset of it.

pjmlp · on Feb 16, 2018

C code only got fast thanks to 40 years of optimizer improvements, taking advantage of UB.

zxxon · on Feb 16, 2018

Huh? C is fast (compared to Swift) because using it doesn't imply sprinkling lots of sugar (like ARC) into the resulting machine code.

Simpler languages like Fortran can turn into even faster code than a C implementation. UB optimizations aren't that relevant for real-world performance.

pjmlp · on Feb 16, 2018

Code generated by C compilers is fast in 2018.

Code generated by C compilers for C64, Spectrum, Atari, Atari ST, Amiga, Mac, CP/M, MS-DOS, Windows 3.x, Nintendo, MegaDrive,... systems meant many times the code would be 80% like this:

    void some_func(/* params */) {
      asm {
         /* actual "C" code as inline Assembly */
      }
    }

Lots of Swift sugar also gets optimized away, and there is plenty of room for improvement.

The code that current C compilers don't generate, many times is related to taking advantage of UB.

They also generate extra code for handling stuff like floating point emulation though.

Just as an example, IBM did their whole RISC research using PL/8, including an OS and optimizing compiler using an architecture similar to what LLVM uses.

They only bothered with C, after making the business case that RISC would be a good platform for UNIX workstations.

zxxon · on Feb 16, 2018

Why bring these ancient home computer platforms into play? Those were totally different to program for. Why not compare a C compiler from 1998 to one from 2018, on x86 (no SSE of course)? C compilers have gotten better, but not spectacularly.

>> The code that current C compilers don't generate, many times is related to taking advantage of UB

Compilers are really smart in optimizing things that aren't relevant to the real world.

For example, this code would reduce to "return 32" in most modern compilers:

  int return32(){
    int x=1;
    for (int i=0; i<5; i++){
      x*=2;
    }
    return x;
  }

Does that make an impact in real-world code? Almost certainly not, it's a contrived case. Most UB cases fall into the same category.

>> They also generate extra code for handling stuff like floating point emulation though.

Not necessarily.

pjmlp · on Feb 16, 2018

> Why bring these ancient home computer platforms into play? Those were totally different to program for. Why not compare a C compiler from 1998 to one from 2018, on x86 (no SSE of course)? C compilers have gotten better, but not spectacularly.

To clear up the myth among young generations that C compilers always generated fast code, regardless of the platform.

As for something more modern, in 1998, C code quality was still at a similar level to other systems languages, before they started to fade away thanks to the increase in UNIX, Linux and BSD adoption.

For example, given that Delphi and C++ Builder share the same backend, their generated code was quite similar, even if it would require disabling some of Delphi's security checks.

> Not necessarily.

Sure, it all depends on the CPU being targeted.

gurkendoktor · on Feb 15, 2018

It's harder than that. Even if you sandbox the code, then it could still happen that some particularly wacky layout code never terminates.

stcredzero · on Feb 15, 2018

you'd think you'd want to make sure they didn't cause hard crashes.

Any Apple people want to chime in? Why wasn't the general bug -- where there could be even the possibility of a "text crash" -- fixed?

ghusbands · on Feb 15, 2018

You realize that there's no such thing as "don't ever crash" fix, right? Maybe they added some defensive code and maybe they didn't, but if they're using an unsafe language, there's always the possibility of more such issues.

stcredzero · on Feb 15, 2018

> You realize that there's no such thing as "don't ever crash" fix, right?

You answered universally for all contexts, ever, everywhere. That's usually quite a foolish thing to do. In ObjectStudio Smalltalk, there was actually a place where you could define an empty lambda as the "top-level" exception handler. Once you did that, all Smalltalk exceptions did nothing. There you go: a "don't ever crash" fix, in a language many would call "unsafe." You are now technically wrong, which is the best kind!

Exceptions can be caught in C++, Objective-C, and in Swift. You can even do this for C. There is apparently a history of similar bugs where certain data crash processes in iOS. Depending on how you count, this is either #3 (strings) or #5.

https://www.theverge.com/2018/2/15/17015654/apple-iphone-cra...

Given that, why wouldn't Apple take steps to make sure that certain critical processes are architecturally immune to this sort of thing? Springboard going away is pretty horrendous. Messages app, given that it's a core functionality for a phone, is almost as bad.

> if they're using an unsafe language, there's always the possibility of more such issues.

There are software projects where certain things simply can't be allowed to happen, ever. Apparently, Apple isn't operating at that level.

annabellish · on Feb 16, 2018

Yikes. Instead of having heap corruption which immediately causes a hard failure, you want heap corruption which is silently ignored and (?????) happens on a device in which people access banking and all their personal identity stuff?

stcredzero · on Feb 16, 2018

No. But the heap corruption shouldn't make certain facilities go away, or hang around looking broken. For example, Springboard could be separated into separate processes (display and monitor) and built on some kind of event queue. Then, if an event containing some kind of poison pill brought down the display, the monitor could note the crash and after a few retries, evict the poison pill event and bring the display back up without it.
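The monitor-and-evict idea above could be sketched roughly as follows. This is a single-process simulation, not how Springboard works: `render` and `run_monitor` are invented names, a panic stands in for the display process crashing, and `catch_unwind` stands in for the monitor restarting it. A real design would need separate processes, because catching a panic does nothing against heap corruption in C code.

```rust
use std::collections::VecDeque;
use std::panic;

/// Hypothetical display step: "crashes" on a poison-pill event.
fn render(event: &str) -> String {
    if event.contains("POISON") {
        panic!("display crashed");
    }
    format!("rendered: {}", event)
}

/// Tries each event up to `max_attempts` times, then evicts it so
/// the rest of the queue can still be shown.
/// Returns (rendered outputs, evicted events).
fn run_monitor(mut queue: VecDeque<String>, max_attempts: u32) -> (Vec<String>, Vec<String>) {
    let mut shown = Vec::new();
    let mut evicted = Vec::new();
    while let Some(event) = queue.pop_front() {
        let mut delivered = false;
        for _ in 0..max_attempts {
            // A caught panic models "display died and was restarted".
            if let Ok(output) = panic::catch_unwind(|| render(&event)) {
                shown.push(output);
                delivered = true;
                break;
            }
        }
        if !delivered {
            evicted.push(event);
        }
    }
    (shown, evicted)
}
```

Given the queue `["hi", "POISON pill", "bye"]` and three attempts, the first and last events are rendered and the middle one ends up in the evicted list, so the notification loop described elsewhere in the thread would stop after a bounded number of restarts.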

It's the not even 2nd rater who gives up on something which causes such a huge hole in the user experience, saying to themselves, "Uhhhh, there's no such thing as a never crash fix." You're not a 1st rate programmer if you only think one step ahead and say, "Uhhh, you can't guarantee no crashes," then leave such a huge hole in your system. The 1st rater engages in a bit of lateral thinking. The real problem isn't to eliminate crashes. The real problem is to eliminate the hole in the UX! Anything you can detect, you can "fix," and sometimes a guaranteed "fix" is every bit as good or better.

(It's exactly this kind of mediocre thinking that led to the hardware quality doldrums in the 90's. The OS crashed so often, hardware manufacturers started to make cheap machines that could only stay up for a few days anyways.)

glibgil · on Feb 15, 2018

If Apple people chime in I think they get beheaded. It never happens

acheron · on Feb 15, 2018

“The bomb says no, Brian.”

https://www.penny-arcade.com/comic/2007/05/23/

AceJohnny2 · on Feb 16, 2018

Absolutely. It's drilled and repeated on a regular schedule: Shut the fuck up.

saagarjha · on Feb 15, 2018

They do sometimes, but generally not for topics such as these.

userbinator · on Feb 16, 2018

> this stuff is extraordinarily complicated to do for every language/all of unicode.

Perhaps this sounds a bit Anglocentric, but isn't it unfortunate that this string crashes the devices even of those people who have never heard of and likely won't ever need to use the language it's written in? The majority of people use a tiny fraction of Unicode --- the parts that cover the languages they use; everything else is useless to them --- or in cases like this, even a liability. It would greatly reduce the number of affected devices, especially with bugs like this having possible security implications, if the text rendering system were more modular and perhaps divided into separate optional components: Latin (maybe not optional), CJK, and other complex scripts. I know Windows has/had a similar feature:

https://msdnshared.blob.core.windows.net/media/TNBlogsFS/Blo...

This way, those who have no need for anything other than Latin scripts get the simple and hopefully much less buggy rendering algorithm, while those who do need the others can do so without unnecessarily burdening everyone else.

Manishearth · on Feb 16, 2018

Uh, no, Windows doesn't have a similar feature.

That Windows option is specifically for:

- linebreaking support for east and southeast asian languages: Many of them don't use spaces, so if you want to know where you can break lines it is best to know what the words are. For this you need to have a pretty large dictionary file stored.

- fonts

- probably various assets for RTL languages (changed backgrounds, changed layout files, etc)

This is Windows providing this option for a very specific thing (and also downloading some files), for which there is okayish fallback functionality. Not cfg'ing the entire text stack.

I don't think you can easily slice and dice a text stack so that you can only pull in the components needed for Latin scripts without making it even more prone to bugs. You could write entirely separate stacks specialized for each group of scripts. But you'd probably end up with one for Latin/Cyrillic/Greek, one for Chinese/Japanese (not Korean) and one for "all the rest". There's enough feature overlap between most of the complex languages that there's questionable benefit to separating that out.

For example, most of the underlying text functionality in Telugu or other Indic scripts is not overall different from things in Hangul or Arabic (Arabic is more complex, actually), it's just that Telugu has certain features that press the buttons in just the right way to cause this crash. Which means that if you want to prevent this crash for other language users, what you need to do is not attempt to render Telugu, not swap out the font stack.

Like, looking at the last iOS crash that happened -- with the Arabic text -- that was because chopping off the end of a string of Arabic text doesn't guarantee that the string will be shorter. Really, you can replicate this for most scripts, it's just subtler (Even English has support for this, if your kerning is extreme enough. As long as your stack supports kerning, it supports everything necessary to trigger this bug). So, the root text stack functionality that lets this happen is necessary for all scripts, Arabic just ends up pushing the right buttons in the right order to cause a crash.

signal11 · on Feb 16, 2018

ಠ_ಠ

Many users in latin-alphabet locales use other character-sets all the time ¯\_(ツ)_/¯

[This post brought to you by KANNADA LETTER TTHA and KATAKANA LETTER TU.]

rlanday · on Feb 15, 2018

Why does any bug result in a crash rather than just random unexpected behavior? There's a limit to how much it's practical to isolate things. For example, Chrome uses different processes to render different web pages, so a crash in one renderer shouldn't affect the others, but it doesn't parse HTML in one process, compute layout in another, execute JavaScript in another, and so on.

blackflame7000 · on Feb 15, 2018

I bet you a dollar it's related to Auto-Correct trying to interpret some malformed word.

arbie · on Feb 15, 2018

Why would Auto-Correct kick in on incoming texts or on browser pages?

blackflame7000 · on Feb 15, 2018

I read that it was related to text-boxes so I figured that could be a potentially hidden culprit. I haven't heard of web pages crashing due to embedded text but if so then I retract my hypothesis.

2nd Sentence of the article says text box but not sure what "other places" are:

"Basically, if you put this string in any system text box (and other places), it crashes that process."

ghusbands · on Feb 15, 2018

From the article's source article: "iMessages, [...] Facebook Messenger, WhatsApp, Gmail, and Outlook for iOS [...] can become disabled once a message is received". "It might be difficult to fix and delete the problem message".

From the article: "I’ve been testing it by copy-pasting characters into Spotlight so I don’t end up crashing my browser". "I can cause this crash to happen more reliably in browsers by clicking on the string".

So, not text-box specific and happening on display for a wide variety of applications, including web pages.