No one can write correct C, or C++. And I say this as someone who has written C and C++ on an almost daily basis for nearly 30 years. I listen to C++ podcasts. I watch C++ conference talks. I enjoy reading and writing C++.
C++ has served us well, but this is 2026, and the environment of 1985 (C++) or 1972 (C) is not the environment of today.
I’m certainly not the first person to say this. I remember reading a post from some prominent person about a decade ago that said a good case could be made that using C++ is a SOX violation. And while I didn’t buy into the rest of his rant (nor his confusion about “its” vs. “it’s”), I never disagreed on that point.
Over time I found this to be more and more true. A lot more things are undefined behavior (UB) than you might expect.
Everyone knows that double-free, use-after-free, accessing outside the bounds of an object (like an array), and accessing uninitialized memory are UB. After all, C and C++ are not memory safe languages. And yet as an industry we seem unable to stop making those mistakes again and again.
But there is more. More subtle. More illogical.
It’s not about optimization
Some people think that as long as they compile without optimizations, undefined behavior can’t harm them. They believe that the compiler is somehow being intentionally hostile, saying “Aha! UB! I can do whatever I want here!”, and that this won’t happen unless optimizations are turned on.
This is wrong.
UB does not mean that the compiler gets to take advantage of your carelessness. UB means that the compiler is allowed to assume that your code is valid. This means that even if the intent of your code is very clear when read by a human, there may be no way to express that intent between compiler stages or modules.
UB means that the compiler does not even need to enforce certain special cases in its code generation, because they “can’t happen”.
The compiler, and indeed the underlying hardware as well, is playing a game of telephone with your UB intentions. It may be exactly what you wanted, but there are no guarantees now or for the future.
UB is everywhere
The following is not an attempt to enumerate all the UB in the world. It is merely making the case that UB is everywhere, and if nobody can get it right, how is it fair to blame the programmer? My point is that all non-trivial C/C++ code has UB.
Accessing an object that is not correctly aligned
Take this code as an example:
int foo(const int* p) {
    return *p;
}
If this function is called with a pointer that is not correctly aligned (where correct alignment probably means an address that is a multiple of sizeof(int), but who knows), it’s UB. C23 6.3.2.3.
On Linux Alpha, in some(?) cases it will merely trap to the kernel, which will emulate in software what you intended. In other cases it will (probably) crash your program with SIGBUS.
On SPARC this will cause a SIGBUS.
Sure, on x86/amd64 (just “x86” from now on) it’s probably fine. Heck, it’s probably even an atomic read. x86 is famously forgiving about cache coherency subtleties.
So here we have three cases:
- The kernel lends a helping hand (Alpha, for some(?) loads)
- Crash (other Alpha loads, and SPARC)
- No problem (x86)
What about ARM, RISC-V, and others? What about future architectures? A future architecture may even have specialized int-pointer registers that do not store the lowest bits, because such pointers cannot exist.
Even if it works today, the compiler may one day switch from one load instruction to another, and suddenly the access is no longer fixed up by the kernel.
Because the compiler is not obliged to generate assembly instructions that work with unaligned pointers. Because it’s UB.
Or how about this:
void set_it(std::atomic<int>* p) {
    p->store(123);
}
int get_it(std::atomic<int>* p) {
    return p->load();
}
Is this operation atomic when the object is not correctly aligned? That’s the wrong question to ask. Mu, unask the question. It’s UB. (But yes, in practice this could easily be an atomicity problem.)
If you want to be even more convinced, try to imagine what happens if an object you thought you were reading atomically straddles two pages. But don’t think about it too much, or you may conclude that “it’s fine”. It’s not. It’s UB.
Actually, it was UB even before that
Don’t blame the foo() function above. Dereferencing the pointer was not the problem. Merely creating the pointer was enough to be a problem.
Example:
bool parse_packet(const uint8_t* bytes) {
    const int* magic_intp = (const int*)bytes; // UB!
    int magic_raw = foo(magic_intp); // Probably crashes on SPARC.
    int magic = ntohl(magic_raw); // this is fine, at least.
    […]
}
The cast is the problem, not foo().
It is perfectly valid for the compiler to assign special meaning to the lower bits of an int*, such as garbage collection or security tagging bits.
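One way to sidestep the problem is to never create the int* in the first place. Here is a minimal sketch, assuming the magic number is a 32-bit big-endian field; the helper names read_be32 and read_native32 are mine and do not come from any library.

```c
#include <stdint.h>
#include <string.h>

// Assemble a 32-bit big-endian value one byte at a time.
// No int* is ever created, so alignment never comes into play,
// and no ntohl() is needed afterwards.
uint32_t read_be32(const uint8_t *bytes) {
    return ((uint32_t)bytes[0] << 24) |
           ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8) |
           (uint32_t)bytes[3];
}

// Alternative: copy the bytes into a properly aligned object.
// The result is in host byte order, so this variant would still
// need ntohl() for network data.
uint32_t read_native32(const uint8_t *bytes) {
    uint32_t value;
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

Both helpers only ever access the buffer as bytes, which is allowed at any address.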
isxdigit() on char input
bool bar(char ch) {
    return isxdigit(ch);
}
isxdigit() is a simple function that takes a character and returns nonzero if it is a hex digit: 0-9, a-f, or A-F. It can also take the value EOF. Uh, OK. What is the value of EOF? Per C23 7.4p1 we know it’s an int, and we can infer that it is not representable as unsigned char.
So isxdigit() takes an int, not a char. All values of char fit inside an int, so we should be fine. The conversion from char to int preserves the value, so according to section 6.3.1.3 we’re fine, right?
No, because if bar() is called with a value outside 0-127, and char is signed on your architecture (implementation-defined, per 6.2.5 paragraph 20 in C23), then the int value will be negative.
And the following is a valid implementation of isxdigit(). Who knows what memory it will read with a negative index. It could even be I/O mapped memory, causing things to happen that are worse than getting random values or crashing. It could start a motor. This is less likely for applications running on desktop operating systems than on embedded systems, of course. But user space network drivers exist (for performance), so even being in user space won’t protect you.
int isxdigit(int c) {
    if (c == EOF) {
        return false;
    }
    return some_array[c];
}
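The usual way out is to convert the char to unsigned char before the call, so the int that isxdigit() receives is always in the range it is specified for. A minimal sketch; the wrapper name is_hex_char is hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

// Converting to unsigned char first guarantees the argument is in
// 0..UCHAR_MAX, which the <ctype.h> functions are required to accept,
// regardless of whether plain char is signed on this platform.
bool is_hex_char(char ch) {
    return isxdigit((unsigned char)ch) != 0;
}
```

The comparison against zero matters too: isxdigit() only promises a nonzero result for hex digits, not specifically 1.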
Casting from float to int
int milliseconds(float seconds) {
    int tmp = (int)(seconds * 1000.0); /* WRONG */
    return tmp + 1; /* WRONG separately (signed overflow is UB) */
}
When a finite value of real floating type is converted to an integer type […] If
the value of the integral part cannot be represented by the integer type, the
behavior is undefined.
— 6.3.1.4
And, by omission, if the float is a non-finite value then it is also UB.
So how do you compare the float against INT_MAX? Do you cast the float to int? No, that is the UB you are trying to avoid. So you cast INT_MAX to float? How do you know it can be represented exactly? Maybe casting INT_MAX to float rounds it to a value that is not representable as int, and your comparison becomes meaningless?
Maybe the following works? You lose the ability to represent some really high values, but maybe that’s OK?
#include <limits.h>
#include <math.h>

int milliseconds(float seconds) {
    const float ftmp = seconds * 1000.0f;
    if (!isfinite(ftmp)) {
        // or other error reporting.
        return 0;
    }
    if ((float)(INT_MIN + 1000) > ftmp) {
        // or other error reporting.
        return 0;
    }
    if ((float)(INT_MAX - 1000) < ftmp) {
        // or other error reporting.
        return 0;
    }
    // Now safe to convert.
    const int tmp = (int)ftmp;
    if (INT_MAX == tmp) {
        // or other error reporting.
        return 0;
    }
    // Now safe to add.
    return tmp + 1;
}
I just wanted to convert a float to int. 🙁
I bet there’s a lot of code out there that takes a value in seconds, and converts it to integer milliseconds just by multiplying and casting.
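Another possible approach is to do the range check in double instead of float. This is only a sketch under stated assumptions: int is at most as wide as the double mantissa (true for 32-bit int and IEEE 754 double), so INT_MIN and INT_MAX convert to double exactly. The function name seconds_to_ms and the out-parameter style are my own choices, not something from the original code.

```c
#include <float.h>
#include <limits.h>
#include <math.h>
#include <stdbool.h>

// Assumption: every int value is exactly representable as double.
_Static_assert(sizeof(int) * CHAR_BIT <= DBL_MANT_DIG,
               "int must fit in the double mantissa");

// Converts seconds to whole milliseconds. Returns false, leaving
// *out_ms untouched, if the result is not representable as int.
bool seconds_to_ms(float seconds, int *out_ms) {
    const double ms = (double)seconds * 1000.0;
    if (!isfinite(ms)) {
        return false; // NaN or infinity.
    }
    // The cast truncates toward zero, so the conversion is defined
    // exactly when INT_MIN - 1 < ms < INT_MAX + 1.
    if (ms <= (double)INT_MIN - 1.0 || ms >= (double)INT_MAX + 1.0) {
        return false;
    }
    *out_ms = (int)ms;
    return true;
}
```

Returning a success flag instead of 0 keeps "zero milliseconds" distinguishable from "could not convert".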
Objects at address zero
Most programmers won’t have to deal with this, but I don’t think there is any C standards-compliant way in practice to put an object at address zero. This can come up in OS kernel and embedded code.
Per 6.3.2.3, an integer constant zero (optionally cast to void*) and nullptr are “null pointer constants” (which I will just call NULL from now on). C does not specify that the actual pointer value of NULL is machine address zero, because the C standard only talks about the C abstract machine, not the hardware.
The only guarantee is that if you compare NULL to zero, they compare equal. But for all you know that’s because the zero was converted to the platform’s native NULL, which happens to be 0xffff.
The standard also explicitly says that dereferencing a null pointer, no matter what its value, is undefined behavior. It’s the example of UB given under 3.4.3.
This also means that you cannot assume that memset(&ptr, 0, sizeof(ptr)); will produce a NULL pointer! You cannot initialize your structs this way and assume that the pointer members are NULL! And this does apply to most programmers.
And yes, some historical machines used non-zero NULL pointers.
But let’s say you have a modern machine, where NULL is a pointer to address zero, and you actually have an object there.
Again, C 6.3.2.3 says that NULL compares unequal to a pointer to “any object or function”. So this is UB:
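The portable alternative to memset() is to let the language do the initialization, since initializers work on values rather than on bit patterns. A minimal sketch; struct conn and conn_empty are made-up names for illustration.

```c
#include <stddef.h>

// Hypothetical struct with pointer members.
struct conn {
    const char *name;
    void (*on_close)(void);
    int fd;
};

struct conn conn_empty(void) {
    // The 0 initializes the first member as a null pointer constant.
    // Members without an explicit initializer are initialized as if
    // they had static storage duration, so the remaining pointer
    // becomes a null pointer whatever its bit pattern is.
    struct conn c = {0};
    c.fd = -1; // Explicit "no descriptor" marker.
    return c;
}
```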
void (*func_ptr)() = NULL;
func_ptr();
C says “there is no function there”. For all you know the compiler has no internal way to express your intent here. You might argue “But surely this would just emit a call instruction to the all-zeros bit pattern? Nothing else seems reasonable.”
But what is “all zeros”? On 16-bit x86, is it 0000:0000? Or is it CS:0000?
Variable arguments and types (for example with printf %ld instead of %lld)
This is UB:
execl("/bin/sh", "sh", "-c", "date", NULL); /* WRONG */
execl("/bin/sh", "sh", "-c", "date", 0); /* WRONG */
This is not:
execl("/bin/sh", "sh", "-c", "date", (char*)NULL);
Because the argument must be a pointer, and the NULL macro may be defined as the integer zero.
Similarly, this is UB:
uint64_t blah = 123;
printf("%ld\n", blah); /* WRONG */
It needs to be:
uint64_t blah = 123;
printf("%"PRIu64"\n", blah);
So how do you print a uid_t? OK, you can cast it to uintmax_t and print it using PRIuMAX. But is uid_t even unsigned? Oh well, worst case you get a nonsense value printed instead of -1, I think.
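Spelled out, the two cases look roughly like this. It is a sketch assuming a POSIX system (for uid_t from <sys/types.h>); the helper names format_u64 and format_uid are hypothetical, and snprintf() is used instead of printf() only so the result can be inspected.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

// The PRIu64 macro expands to the right conversion for uint64_t,
// whatever underlying type that is on this platform.
int format_u64(char *buf, size_t len, uint64_t value) {
    return snprintf(buf, len, "%" PRIu64, value);
}

// uid_t has no format macro of its own, so widen it explicitly.
// If uid_t is signed and negative, this prints a large positive number.
int format_uid(char *buf, size_t len, uid_t uid) {
    return snprintf(buf, len, "%" PRIuMAX, (uintmax_t)uid);
}
```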
Dividing by zero is UB
Sure, you probably know this. But have you considered its security aspects? It is not rare for the denominator to come from untrusted input.
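When the denominator comes from outside, the check has to happen before the division. A minimal sketch, with a made-up name checked_div; it also rejects INT_MIN / -1, whose result is not representable as int and is therefore UB as well.

```c
#include <limits.h>
#include <stdbool.h>

// Divides num by den. Returns false, leaving *out untouched,
// when the division would be undefined behavior.
bool checked_div(int num, int den, int *out) {
    if (den == 0) {
        return false; // Division by zero.
    }
    if (num == INT_MIN && den == -1) {
        return false; // Quotient would overflow int.
    }
    *out = num / den;
    return true;
}
```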
There is so much more. The C23 standard contains 283 uses of the word “undefined”. And this doesn’t even include things that are undefined by omission.
Bonus Non-UB
Nobody can apply the integer promotion rules at code skimming speed. Nobody.
This post is already quite long, but as a start:
unsigned char a = 0xff;
unsigned char b = 1;
unsigned char zero = 0;
bool overflowed = (a + b) == zero;
// overflowed is false, not true: a + b is computed as int, giving 256.
And separately:
unsigned char a = 0x80;
uint64_t b = a << 24; // Bonus UB(?)
// b is now 18446744071562067968 (0xffffffff80000000), not 2147483648 (0x80000000).
// even with all our variables unsigned.
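For comparison, here is a sketch of how the two snippets above could be written to get the result a skimming reader would expect. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

// Convert the int-typed sum back to unsigned char before comparing,
// so the wrap-around the reader expects actually takes place.
bool sum_wraps_to_zero(unsigned char a, unsigned char b) {
    return (unsigned char)(a + b) == 0;
}

// Widen to the destination type before shifting, so the shift is
// done on an unsigned 64-bit value and never in signed int.
uint64_t shift_left_24(unsigned char a) {
    return (uint64_t)a << 24;
}
```

In both cases the fix is an explicit conversion at the point where the promoted type would otherwise be int.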
LLMs are better at this than we are
Point an LLM at any C code, ask it to find UB, and it will. And these days it will be right almost every time.
I felt a little bad after it correctly found UB in my code, so I thought I’d point it at the mature and pedantically written OpenBSD. I just picked the first tool I could think of, find, and it spat out a lot.
I sent the project a patch for out-of-bounds writes (and also for a non-UB logic bug). I didn’t send them patches for UB left and right, partly because the OpenBSD project hasn’t been very receptive to bug reports in the past (my understanding is that the attitude is “it’s probably fine in practice”), and partly because if OpenBSD wants to remove UB from its code base, that’s a major project that deserves better than me acting as an intermediary between an LLM and them, a patch here and there.
So what do we do now?
We can’t just throw away our C/C++ code bases. But leaving them inherently broken is also not an option.
We need some way to fix UB at scale, without AI sloppiness and without overburdening human reviewers.
This is also not a new opinion, nor is it a big revelation.
But yes, writing C/C++ without LLM supervision for UB in 2026 should probably be seen as a SOX violation, and downright irresponsible. If the OpenBSD people couldn’t figure out these problems in 30+ years, what chance do the rest of us have?
It may not scale to larger code bases, but for my own projects I’ve asked LLMs to find the UB, explain it if necessary, and fix it. And I keep reviewing the output until I have confirmed both the problem and the fix.
One problem with this is that confirming the findings takes an expert human. But experts are generally busy with other work. This is janitorial work, but it’s subtle enough that it can’t be left to the junior programmers who have traditionally been assigned janitorial work.