This is surprising considering that SQLite is very heavily tested. It shows that ridiculous amounts of testing, with 100% coverage of every code path and "millions and millions" of test cases, still don't guarantee that the program always works as intended.
I think that this is an important lesson about testing. We should have fewer tests, but we should try to get the most value possible out of each one. For developers, that means actively seeking out unusual edge cases that are likely to break things.
(1) The coverage testing used by SQLite is very good at finding problems that occur when the system is used as it was intended. Fuzz testing is better for finding vulnerabilities that can be exploited by a hacker. The 100% MC/DC testing in SQLite is very useful in ensuring that the code does what is intended for sane inputs. And 100% MC/DC helps prevent us from breaking things as we evolve and enhance the code. But the MC/DC testing is less useful at fending off attackers.
(2) The Magellan vulnerability exploits a bug in an SQLite extension, FTS3, which, while very well tested, is not tested to 100% MC/DC. (See the second sentence at https://www.sqlite.org/testing.html#test_coverage)
Hence my takeaways from this episode include that I need to extend 100% MC/DC testing to all commonly used extensions in SQLite, including FTS3, FTS5, and RTREE, and I need to improve fuzz testing throughout SQLite but especially in extensions.
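To make the distinction in (1) concrete, here is a minimal sketch in C. It is not the Magellan bug and not SQLite code: `payload_len`, `BUF_CAP`, and the record layout are hypothetical names invented for illustration. Three hand-written cases with sane inputs exercise both outcomes of both decisions (each decision has a single condition, so that is also MC/DC for this function), yet a crude input sweep, standing in for a real fuzzer, still finds a length larger than the buffer the caller is assumed to copy into.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_CAP 16  /* hypothetical fixed-size destination buffer */

/* Hypothetical parser step: returns how many payload bytes the caller
** would copy into a BUF_CAP-byte buffer. The length prefix is clamped to
** the bytes present in the record, but never to BUF_CAP (the bug). */
size_t payload_len(const uint8_t *rec, size_t n){
  size_t len;
  if( n==0 ) return 0;            /* decision 1: empty record */
  len = rec[0];                   /* length prefix, trusted as-is */
  if( len>n-1 ) len = n-1;        /* decision 2: clamp to record size */
  return len;
}

/* Sane-input tests: every decision above is taken both ways. */
int sane_tests_pass(void){
  static const uint8_t exact[] = {3, 'a', 'b', 'c'};
  static const uint8_t truncated[] = {5, 'a'};
  return payload_len(exact, 0)==0        /* decision 1 true */
      && payload_len(exact, 4)==3        /* decisions 1 and 2 false */
      && payload_len(truncated, 2)==1;   /* decision 2 true */
}

/* Stand-in for a fuzzer: sweep every length prefix and record size up to
** 64 bytes, and report whether any input yields a length that would not
** fit in the destination buffer. Nothing is actually copied. */
int sweep_finds_overflow(void){
  uint8_t rec[64] = {0};
  unsigned first;
  size_t n;
  for(first=0; first<256; first++){
    rec[0] = (uint8_t)first;
    for(n=0; n<=sizeof(rec); n++){
      if( payload_len(rec, n)>BUF_CAP ) return 1;
    }
  }
  return 0;
}
```

Under these assumptions both functions return 1: the coverage-complete tests all pass, and the sweep still turns up an oversized length (for example, a 17-byte payload in an 18-byte record). A real fuzzer would generate inputs randomly or guided by coverage rather than by exhaustive sweep, but the point is the same: the hostile input lies outside what the sane-input tests ever try.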
Advocates of "safe" languages correctly observe that this particular problem would not have happened if SQLite were written in (say) Rust. Rewriting SQLite in Rust is not (yet) a viable solution. (See https://www.sqlite.org/whyc.html) But I can start moving SQLite in that direction, and perhaps make use of techniques taken from safe languages to improve its resistance to attack.
Hopefully soon, “moving in that direction” can be done by slowly porting to Checked C, while always retaining an executable artifact. https://github.com/Microsoft/checkedc
Source: https://www.sqlite.org/testing.html