I failed to verify implementation of zlib and found a buffer overflow in Lean Runtime.
AI agents are being found Very good Detecting vulnerabilities in large-scale software systems.
Anthropic was apparently so intimidated by the vulnerability-discovery capabilities of Mythos, they decided not to release it because it was “too dangerous” (laughter). Whether you believe the hype about these latest models or not, it seems undeniable that the writing is on the wall:
The cost of discovering security bugs is falling, and most software running today was never built to withstand that kind of scrutiny. We are facing a looming software crisis.
In the face of this coming tsunami, there has recently been growing interest in formal verification As a solution. If we state and prove properties about our code using a mechanical device, can we create robust, secure, and stable software that can overcome this barrier of incoming attacks?
A recent development in the Lean ecosystem has addressed this question. 10 agents autonomously created and proved an implementation of Zlib, Lean-Zip, which was an impressive landmark result. Quote from Leo de Moura, chief architect of Lean FRO (here):
With apologies for the AI-slop (Leo seems to have a penchant for it), the main takeaway is that lean-zip It’s not just another implementation of zlib. This is an implementation that has been verified as correct from end to end, guaranteed by Lean to be completely free of implementation bugs.
What does “verified as correct” actually look like? Here is one of the main theorems (Github):
theorem zlib_decompressSingle_compress (data : ByteArray) (level : UInt8)
(hsize : data.size < 1024 * 1024 * 1024) :
ZlibDecode.decompressSingle
(ZlibEncode.compress data level) = .ok data
For Any Byte array less than 1 gigabyte, calling
ZlibDecode.decompressSingle at the output of ZlibEncode.compress
Prepares basic data. The decompress function is exactly the opposite of compression. This pair of functions is Completely Correct.
or is it?
I pointed to a cloud agent lean-zip Over a weekend, armed with AFL++, Address Sanitizer, Valgrind and UBSan. Above 105 million fuzzing executionsit was found:
- zero memory vulnerabilities In verified lean application code.
- a stack buffer overflow In Lean 4 runtime (
lean_alloc_sarray), influencing every version of Lean to date. (bug report, pending resolution) - denial of service In
lean-zipA collection of parsers, which were never verified.
<a href