
Unicode allows for 17 planes, each of 65,536 possible characters (or 'code points'). This gives a total of 1,114,112. This crashing sequence is five characters long.

Your unit tests would have to go through 1.71650179e30 sequences to be guaranteed to catch this one. At a test rate of 1 millisecond per sequence, that's just 4×10^9 × the age of the universe, according to wolfram alpha.
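For anyone who wants to check the arithmetic, here is a rough sketch. The age of the universe (13.8 billion years) and the 365.25-day year are my assumptions, not figures from the comment above; only the 17 planes, the five-character length, and the 1 ms test rate come from it.

```python
# Back-of-the-envelope check of the brute-force estimate.
CODE_POINTS = 17 * 65_536          # 17 planes of 65,536 code points each
SEQ_LEN = 5                        # length of the crashing sequence
MS_PER_TEST = 1                    # assumed test rate: 1 ms per sequence

sequences = CODE_POINTS ** SEQ_LEN
seconds = sequences * MS_PER_TEST / 1000

# Assumptions (not from the thread): age of the universe and year length.
AGE_OF_UNIVERSE_YEARS = 13.8e9
SECONDS_PER_YEAR = 365.25 * 24 * 3600

universe_ages = seconds / (AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR)

print(f"code points:   {CODE_POINTS:,}")
print(f"sequences:     {sequences:.4e}")
print(f"universe ages: {universe_ages:.2e}")
```

With those constants the last figure lands close to 4×10^9, in line with the number quoted above.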



Rendering Indic scripts with AAT fonts involves a series of finite state machines that are stored in the individual font. So don't forget to multiply by the number of different fonts that each need to be tested.


> Unicode allows for 17 planes, each of 65,536 possible characters (or 'code points'). This gives a total of 1,114,112.

It allows for 17 planes, but only a small portion of those are actually used. According to Wikipedia[1], Unicode currently has 148944 codepoints + 128k private use ones (which might or might not make sense to include in unit tests). So your time estimate is off by a mere 4 to 5 orders of magnitude.


(148944 ^ 5) milliseconds to years = 2.324 quadrillion years.

Doesn't really change the result, imho.
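A quick sketch of the same calculation restricted to assigned code points, plus how far apart the two estimates are. The 148944 figure is the one quoted above; the 365.25-day year is my assumption, so the last digits may differ slightly from what Wolfram Alpha prints.

```python
import math

ALL_CODE_POINTS = 17 * 65_536      # full Unicode code space
ASSIGNED = 148_944                 # figure quoted in the comment above
SEQ_LEN = 5

# Assumption: 365.25-day year, 1 ms per tested sequence.
MS_PER_YEAR = 1000 * 365.25 * 24 * 3600

years = ASSIGNED ** SEQ_LEN / MS_PER_YEAR

# How many orders of magnitude separate the two sequence counts.
orders_of_magnitude = SEQ_LEN * math.log10(ALL_CODE_POINTS / ASSIGNED)

print(f"years to test every assigned sequence: {years:.3e}")
print(f"orders of magnitude saved: {orders_of_magnitude:.2f}")
```

Either way the result stays in the quadrillions of years, so the conclusion holds.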


Do you know how this bug was found?

Are there just enough people using iOS that these sorts of bugs can be found by mistake, or is someone fuzzing CoreText? Perhaps that can be applied to provide some kind of test coverage? Even if it’s not complete?


This sequence begins the Telugu word for "knowledge" so maybe someone texted that to someone and it went viral from there. This is, of course, only speculation.


Does the word include the zwnj character? How do you input it?


It does not include the zwnj, that somehow snuck in. Most keyboards don't support directly inputting a zwnj, but may support it for specific combinations. For example my Marathi keyboard supports typing eyelash rephs (e.g. in र्‍क) which includes zwj.

However I'm not aware of any such things in Telugu aside from explicit virama-showing which rarely exists in input methods (and doesn't end up with zwnj in the position shown here, but that could have happened after editing).


Huge oversight on my part: the string starts with 0xC1C, 0xC4D, 0xC1E, 0xC3E, i.e. without the ZWNJ. I'm stumped.

https://en.wiktionary.org/wiki/%E0%B0%9C%E0%B1%8D%E0%B0%9E%E...
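To see what those four code points are, a small sketch using Python's standard `unicodedata` module. It only inspects the code points listed above (the start of the dictionary word) and checks that U+200C (ZWNJ) is not among them; it does not build the crashing sequence.

```python
import unicodedata

# Code points quoted in the comment above.
word_start = [0x0C1C, 0x0C4D, 0x0C1E, 0x0C3E]
ZWNJ = 0x200C  # ZERO WIDTH NON-JOINER

# Look up the official Unicode name of each code point.
names = [unicodedata.name(chr(cp)) for cp in word_start]
for cp, name in zip(word_start, names):
    print(f"U+{cp:04X}  {name}")

has_zwnj = ZWNJ in word_start
print("contains ZWNJ:", has_zwnj)
```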


True, but characters could be mapped to equivalence classes according to the logic in the code.
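A minimal sketch of that idea. I'm using the Unicode general category as a stand-in for the equivalence classes, which is purely an assumption for illustration: the real classes would have to come from the rendering code and from each font's own tables, and testing one representative per class is only sound if the code really treats every member of a class identically.

```python
import unicodedata
from collections import Counter

SEQ_LEN = 5

def char_class(cp: int) -> str:
    """Hypothetical equivalence class: the Unicode general category."""
    return unicodedata.category(chr(cp))

# Partition the whole code space into classes and count class sizes.
class_sizes = Counter(char_class(cp) for cp in range(0x110000))
num_classes = len(class_sizes)

# Sequences to test if one representative per class were enough.
class_sequences = num_classes ** SEQ_LEN
raw_sequences = 0x110000 ** SEQ_LEN

print(f"classes:         {num_classes}")
print(f"class sequences: {class_sequences:,}")
print(f"reduction:       {raw_sequences / class_sequences:.2e}x")
```

At 1 ms per sequence that is hours instead of geological time, but the hard part is knowing the right classes, and they would differ per font.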



