Both true and false: a Zen moment with C
Update: Discussion on Hacker News
I ran into a really fun bug at work yesterday, where I discovered that my C program was branching down logically inconsistent code paths. After drinking another cup of coffee and firing up GDB I realized that somehow, a boolean variable in my code was simultaneously testing as both true and not true.
While I cannot reproduce the actual source code here, the effect was that code like
bool p;
/* ... */
if ( p )
puts("p is true");
if ( ! p )
puts("p is false");
would produce the output:
p is true p is false
So what’s going on here?
Well it turns out that the authors of the C language specification (and the people who went on to implement compilers for it) were serious about the concept of undefined behavior. In particular, the result of attempting to use an uninitialized variable is undefined.
And in this case that’s exactly what happened: I had failed to properly initialize some memory. Easy, bug fixed. But what I think is interesting are the reasons this code failed in precisely the way that it did. In order to investigate that, we need to get specific.
On 64-bit Linux (Ubuntu 12.04), compiling the following program:
#include <stdio.h>
#include <stdbool.h>
int main(int argc, char *argv[])
{
volatile bool p;
if ( p )
puts("p is true");
else
puts("p is not true");
if ( ! p )
puts("p is false");
else
puts("p is not false");
return 0;
}
with GCC 4.6.3, using the command line:
$ gcc bool1.c -g0 -O0 -fno-dwarf2-cfi-asm -masm=intel -S -o bool1.asm
produces this (truncated) assembly language:
.file "bool1.c"
.intel_syntax noprefix
.section .rodata
.LC0:
.string "p is true"
.LC1:
.string "p is not true"
.LC2:
.string "p is false"
.LC3:
.string "p is not false"
.text
.globl main
.type main, @function
main:
.LFB0:
push rbp
.LCFI0:
mov rbp, rsp
.LCFI1:
sub rsp, 32
.LCFI2:
mov DWORD PTR [rbp-20], edi
mov QWORD PTR [rbp-32], rsi
movzx eax, BYTE PTR [rbp-1]
test al, al
je .L2
mov edi, OFFSET FLAT:.LC0
call puts
jmp .L3
.L2:
mov edi, OFFSET FLAT:.LC1
call puts
.L3:
movzx eax, BYTE PTR [rbp-1]
xor eax, 1
test al, al
je .L4
mov edi, OFFSET FLAT:.LC2
call puts
jmp .L5
.L4:
mov edi, OFFSET FLAT:.LC3
call puts
.L5:
mov eax, 0
leave
.LCFI3:
ret
To perform the test if ( p ) here, first the stack variable is loaded into a 32-bit register with movzx eax, BYTE PTR [rbp-1], and then we use the instruction test al, al which sets the zero flag (ZF) if the lower eight bits of this value are zero. Next we execute the conditional jump je .L2, which jumps to print “p is not true” if ZF was set; otherwise we don’t jump, and “p is true” gets printed instead.
Next let’s examine the second test, if ( ! p ), at label .L3. This starts out the same by loading the boolean variable into register eax, but notice how the negation is handled. Rather than reorder the jumps or use jne instead of je, the compiler explicitly negates the boolean by performing a bitwise exclusive-or: xor eax, 1.
Normally this would be fine — a bool variable is only supposed to contain a value of zero or one, in which case its value can be negated by XOR with 1. When you cast to a bool at runtime, the compiler generates code to ensure only one or zero gets stored. For instance, the cast in this program:
#include <stdbool.h>
volatile char c = 0xff;
volatile bool p;
int main(int argc, char* argv[])
{
p = (bool)c;
return 0;
}
is implemented as the following four instructions:
movzx eax, BYTE PTR c[rip]
test al, al
setne al
mov BYTE PTR p[rip], al
wherein setne sets the register al to exactly 1 if the char contained any nonzero value, before saving the register’s 8-bit value to the boolean variable.
But the compiler affords us no such protection if we accidentally use an uninitialized value as a boolean. It doesn’t have to, it’s not the compiler’s responsibility; the result of using an uninitialized stack variable is undefined. And so if we somehow wind up with a value of e.g. 0x60 stored at the address of a bool variable (as I saw during my troubleshooting yesterday), both the variable and its negation (via exclusive or with 1) will be nonzero, and therefore test as true.
Interestingly, enabling optimization (-O2) in GCC causes the compiler to factor out the XOR and instead reorder the jumps, meaning this program actually behaves more robustly under compiler optimization (for certain definitions of “robust” anyway):
.file "bool1.c"
.intel_syntax noprefix
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "p is true"
.LC1:
.string "p is not true"
.LC2:
.string "p is false"
.LC3:
.string "p is not false"
.section .text.startup,"ax",@progbits
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB22:
sub rsp, 24
.LCFI0:
movzx eax, BYTE PTR [rsp+15]
test al, al
je .L2
mov edi, OFFSET FLAT:.LC0
call puts
.L3:
movzx eax, BYTE PTR [rsp+15]
test al, al
je .L7
mov edi, OFFSET FLAT:.LC3
call puts
.L5:
xor eax, eax
add rsp, 24
.LCFI1:
ret
And of course when we compare char, int, or other multiple-bit values for truthiness, the compiler makes no such assumption that the value can be logically negated by bitwise XOR; instead it uses jne in place of the je instruction. (Maybe someone with more knowledge of the compiler can say why GCC with -O0 uses an xor at all, when testing the negation of a bool.)
59 Responses to Both true and false: a Zen moment with C
“Doctor, it hurts when I do this!”
“Don’t do that.”
Newer version of gcc should catch unitialized variables via -Wall
$ cat uninitialized.c
int main(void)
{
int foo;
return foo;
}
$ gcc -Wall -o uninitialized uninitialized.c
uninitialized.c: In function ‘main’:
uninitialized.c:4:5: warning: ‘foo’ is used uninitialized in this function [-Wuninitialized]
Personally I use -W -Wall -Wextra -pedantic or similar
good advice, though I was compiling with -Wall already — in the real application, the boolean in question was an uninitialized struct element, which GCC did not warn me about.
It’s too pedantic for some people (I don’t use it), but just as an FYI: -Weffc++
Thanks for a really interesting post, Mark.
I was curious about how obscure/redirected the struct needed to be to defeat the compiler warning so I had a quick play. Answer: minimally obscure.
http://pastebin.com/raw.php?i=TSS7VA1m
Neither “gcc -Wall -Wextra” nor “g++ -Weffc++ -Wall Wextra” (thanks for the tip Brent) warn on that one, although it feels to me like they should be able to.
Valgrind finds it every time, though, as you’d expect it to. *Hugs valgrind patiently*.
There seems to be a problem with the -O2 gist, both are je into the puts.
Sorry for being unclear, the -O2 gist is correct though. Where I saw jne was compiling with -O0 with the same compiler, but testing a char rather than a bool variable.
This is yet another thing that golang improves over C. You can’t define “undefined” variables. Once you define them, they have a default value (i.e. 0 for integers, etc).
So is Go perfectly specified, i.e., are there no places in the spec where any behavior at all is left up to the implementation?
That seems like a good idea, except that it seems like it would make the specification about 5 times bigger.
Not necessarily. A fully and rigorously specified programming language does not entail a huge, complex language. Standard ML is the main example of a fully and rigorously specified language (specified in terms of exact mathematical constructs/operational semantics rather than imprecise prose). The _Definition of Standard ML_ (i.e., the print version of the specification) fully specifies the general purpose language in fewer than 128 pages. Considering that Standard ML has a fairly sophisticated module and type system, this is significant. Researchers have made great strides in applying this technique to model other existing languages (e.g., Featherweight Java) and discovering a good number of bugs in the process.
C assumes that you know what you are doing. The weakness that you say is also the C’s strength. C is a system programming language and the reason why it is so common among operating systems it is because it is very close to machine language. When writing an OS you are often doing many operations that often don’t even make sense to an application programmer. It would be much harder to code that when a programming language is doing additional things behind your back.
“Golang is better because it does not do this and that.” and so does JavaScript, Python, and ton of other languages, but you won’t write an Operating Systems using them.
Also it’s not hard to modify compiler to always initialize defined variables. In fact C actually does that to all variables defined as static (because of their purpose). The reason C does some things that intuitively might not seem right for an application programmer is one of the reason why C is still relevant 40 years after it was made, and nothing replaced it so far. If you use C and it constantly does not do what you want then perhaps you are using the wrong tool for your job.
But that’s not even true of C. The Linux kernel had a ton of problems recently when they were trying to do null pointer checks (for security) and the C language defined that these checks don’t make sense and therefore empowered the compiler to optimize them out. That would have made perfect sense for an application program, but doesn’t make sense in the kernel.
It would be nice to have a language where what you say is in fact what gets executed, and that is close to the machine language but offers structure, but C isn’t it. C’s the closest thing we have to it, though.
Is there a good writeup of what happened and how the kernel team forced GCC to do what they wanted instead?
Derek, that’s bullshit.
You don’t write OSes in Python, JS etc, for architectural reasons as well as the fact that you’re most probably not going to have to either.
How many CompSci courses ask students to write an OS in, say, C# or Pascal?
C got famous for this task, true. But that’s the end of it. There’s no ulterior reasons why it is better at it than anyone else. In fact, given it’s time, it might even be a bad choice for this task nowadays.
This is not necessarily an improvement.
Instead of getting only one of two possible eigenstates here, you got both of them. I’d call that a successful quantum physics experiment.
Have you considered changing careers? (;
Good one!
LOL LET’S DECIDE THE PROGRAM FLOW DEPENDING ON UNINITIALISED VARIABLES
And then, when the program fails…
LOL C SUCKS
Nope, I’m not claiming C sucks by any means — there are very good reasons we’re using it for this project, aside from the fact that it’s the default language in this space. I just thought it was fun to see what the compiler was doing under the hood here.
I think there’s something wrong with you.
No, I suspect there is something lacking – if you have read all the way down to here – yet don’t “get” why someone would be interested in this.
Well, C sucks, uninitialized variables or not.
First, you demonstrate ignorance talking about C language with the “bool” data type. “bool” isn’t a C keyword, it’s a C++ one. The most remotely similar keyword in C is “_Bool”, a C99 standard addition. The original C didn’t had any special Boolean data type. The decisions were taken whether the expression was or wasn’t 0, that was it. That leads to the second point (which you may consider it as just personal sentiment) – it isn’t really needed a Boolean type in C. That kind of addition is purely a change for the sake of change. An “int” or merely a “char” did the job just fine.
boolis defined to_Boolin stdbool.h, as used above. Yes, this C, not C++, for what that’s worth.http://pubs.opengroup.org/onlinepubs/007904875/basedefs/stdbool.h.html
Just because something is written in C, dose not mean it is a part of C. Looking at the link you provided is for a POSIX header, this is not a C problem.
It is needed because bool has some semantic value. By your logic, why have char if int can take care of it?
int and char are conventionally different widths.
Int can’t take care of char. They are two different sizes.
And int and char can’t take care of bool – they have different ranges. By having this distinction at the language level, the compiler knows that the value is either 0 or 1, and it can produce far more sensible code. To negate a GCC int on x86, you need:
test eax, eax
jnz false
true:
mov eax, 1
jmp end
false:
mov eax, 0
end:
To negate a GCC bool, you need:
xor eax, 1
why not simply like:
test eax, eax
jz false
mov eax, 0
false:
xor eax, 1
daef: At the end of your code, eax is always 1. Your intention may have been to mov eax, 1 – in which case it would work, and be slightly faster and branch slightly less – either way, we still need a branch and we can’t compare with the simplicity of the xor. Most of these speedups go away if we’re moving to another register at the same time. (I’m assuming this is why I haven’t come across your implementation in compiled C code.)
It should be noted that this could also use the bool casting code from original post, followed by the xor. But all of these have slight overhead and add to the complexity of this extremely simple task. (Actually, xor is so fast that most of these probably take more than three times longer because of our looser types.)
The real lesson here is that you should always compile with -Wall (at a minimum) and do not under any circumstances ignore compiler warnings.
Absolutely, I agree! The real application in question is built with
-Wall -Werror, in fact, but failed to catch this uninitialized boolean because it was an element of a struct (unlike in my simplified example).Fascinating seeing how this happens through assembly. Suddenly the “bizarre, makes no sense undefined behavior” thing makes sense– in fact it’s perfectly logical and almost even inevitable. Ahh the beauty of C, it will never grow old or stop amazing me.
I’m saddened by the low-quality responses on here. Everyone seems either eager to a) try and bash the bloke, or b) argue about languages. All you idiots speaking in platitudes about this or that
THE REAL QUESTION I WANTED IN THE COMMENTS WAS WHY IS THE COMPILER DOING THAT
you’d think the internet’s brightest and best would be able to answer PLEASE POST TO STACKOVERFLOW where someone will know (or I will)
This way a compiler doesn’t have to worry about the context of the operation !p. If you were to store the result of that into another boolean the operation would be faster. There would be no need for branching statements. Suppose not. If you did something like this, q = !p, the resulting pseudo-assembly would be like this:
load p
test p, 0×00
jne .false
store q, 0
j .done
.false:
store q, 1
.done:
In the fashion it is done, the code is:
load p
xor p, 1
store q, p
*Note, my storage pseudo-assembly commands store the second operand into the first
Completely agree, every second idiot has something to pedantic to argue about but I applaud Mark for having the curiosity to investigate and then share his findings. Interesting read, pity about the immature responses.
Great post, and some rather amusing undefined behavior, heh. I think this would have made me tear my hair out! Thanks for going into detail with the assembly, it’s something I’ve been interested in but never grasped the basic concepts of it so its mainly unintelligible nonsense to me. Helped me understand the overlaying behavior with the C though. Cheers!
-Wall -Werr or stfu
Let’s all agree to learn C before we write code.
Nice finding and interesting analysis. Thanks.
A smarter optimizer should have translated this code as a pure no-op (printing neither true nor false), as any operation using an uninitialized value can be optimized away !
More seriously, I am always amazed to see how numerous well-behaved C programs there are around, despite then number of undefined behaviors present in them and that the compilers do not tell us about.
Why all the fuzz here? No matter what programming language one uses, one should always define and initialize all variables, structures etc.
right?
While I haven’t read any GCC source code or talked to a GCC developer, I have spent the past couple of months working a decompiler for x86 GCC code, which involves a lot of staring at compiler output and reading a handful of compiler optimisation papers.
The short answer is that the binary negation on a “bool” type value is defined by gcc as “value ^ 1″. Probably because it’s significantly faster, at least when not conditional branching. (! (unary not) on numeric values relies on branching (test, cmp, jnz…) to get a 1 or 0 value, and “xor” is really fast)
The longer answer:
Compiler optimisations are procedures which analyse the flow of control and data throughout a program and modify it into a more optimal form. A good example of this is the removal of a variables from the stack into a register. While this sounds simple, every use of each variable must be analysed to ensure that a pointer to it is never required, and all of those should also be analysed to choose the most sensible ones to keep in registers. A very similar process is recognising constant local variables (whose value is never changed and whose address is never acquired) and substituting the value in every use throughout the functions. A more extreme example is finding a branch (if/else) after which the same operations are performed and moving those operations to before the branch.
Code generation has it’s own “optimising” behaviours, but this does not look at the bigger picture or analyse surrounding code, and is not considered an optimisation in GCC. The best example of this is the % (modulo) operator, which is implemented in x86 as part of the large, inefficient ‘idiv’ instruction. If given a constant operand, however, GCC at -O0 will produce some of the most cryptic x86 I have seen a compiler produce. Note the complete absence of the number three in the assembly in the following example. If you’d like to explain how this works, be my guest.
main(){
volatile int i = 13;
printf("%d", i % 3);
}
mov [ebp-0xc], 13 ; [ebp-0xc] is i
mov eax, [ebp-0xc]
mov ecx, esp ; in no way related to the modulo
mov edx, 0xaaaaaaab
mov [ebp-0x10], eax ; save eax
imul edx
mov eax, [ebp-0x10] ; restore eax
lea edx, [edx+eax]
mov esi, edx
shr esi, 0x1f
sar edx, 1
lea edx, [edx+esi]
mov esi, edx
lea edx, [edx+esi*2]
sub eax, edx
mov [ecx+0x4], eax ; at this point, somehow, eax = i % 3
mov [ecx], 0x1fa8 ; "%d"
call 0x1f82
Where the separation between optimisation and code-generation lies is unclear to me. It may be somewhat arbitrary or based solely on where things are easiest to implement (without slowing down compilation of course). Certain multiplication/addition patterns will sometimes get converted to ‘lea’ and ‘shl’ instructions (some x86 Arithmetic Idioms) at higher optimisation levels, but are often left untouched without optimisation. Perhaps this is because they involve a larger scope (not just handling one operator), because they are special cases applicable only to certain values, or because there are so many different methods that must be weighed up for each case.
And you are very lucky indeed since “a Ninja could have entered your place and kill you” since this would have fit the definition of undefined behaviour as well.
Anyway, I agree with everyone else in the thread on the fact that you should have used -Wall and -pedantic.
Curiously, gcc 4.6.3 on Ubuntu 32-bit in a VMware VM produces identical assembler but a different result – the stack seems to end up zero-initialised and so the variable is correctly false.
Sorry that I wasn’t as clear as I should have been about this… yep, the simplified example I presented above will not exhibit this behavior, since it performs its tests before the process has done anything to muck up its stack address space (and as you observed, the process’s initial stack address space is zeroed out by the kernel). This works the same on x86-64, too.
That example was only meant to show the kind of machine code produced, which would cause that behavior to be exhibited if the stack contained arbitrary non-zero data.
I’ve added a better (but longer) example here, which intentionally causes a stack write before performing this test, in order to demonstrate the actual behavior. This works on x86-64 Linux when compiled without optimization, anyway:
https://gist.github.com/3001231#file_bool2.c
VC++ will give warning for uninitialized variables :P
Hmmm. Either you didn’t bother to read any of the other comments, or you did and just can’t resist a snide remark anyway. Try compiling this in VC++2010:
#include
#include
typedef unsigned char bool;
struct foo {
int x;
float y;
bool z;
};
int main() {
struct foo *a = malloc(sizeof(struct foo));
if(a->z)
puts("z is true\n");
else
puts("z is not true\n");
if(!a->z)
puts("z is false\n");
else
puts("z is not false\n");
}
Do you see a compiler warning? Nope, me neither.
Schroedinger’s Boolean!
Ahh forum trolls, gotta love… well… actually hate them.
This is a fantastic article, not from any attempt at conveying the lesson that you shouldn’t use uninitialized variables which the trolls seem to have pounced upon, but rather from sharing some very interesting information about what actually happens when you do.
It’s the fundamental difference between knowing not to do something (shallow knowledge) vs. understanding why you shouldn’t do something and what will happen if you choose to ignore, or forget to follow, what you know (deep knowledge). I found it a very interesting read.
Keep up the good work!
Real programmers program in assembly language! LOL
They should never have created a “bool” type in the first place.
Well boolean type is new in c99 and its size is not defined usually
Its assigned the smallest size which is byte
But since its included as fundamental type compiler can assume
it to be having either 0 or 1
Best would be if you got a warning here but compiler may not catch
All cases
A similar thing can happen in the JVM. It’s an unlikely scenario, but it may occur when code from two different compilers are running together, if the two compilers disagree on whether >1 values are valid integers. All Java compilers I know of will only generate 0-or-1 booleans, but the JVM allows any byte to be a boolean.
Although the Java specification says there are only two boolean values, it is possible for the Java expression “x == true || x == false” to evaluate to false when compiled with a mainstream Java compiler.
Other than defining boolean tests as returning false for 0 else true, the problem is not so much with the language itself as with the conventional implementations. Even without the uninitialized variable issues, there is still a problem: (x & TRUE) needs to be FALSE only if x is 0, but the only way to guarantee that is if TRUE is defined as int -1. But the convention is using something like typedef enum { FALSE, TRUE ) BOOL; (take your pick on spelling or even if enums are used), which by the language definition uses the smallest type that can express all the values in the enum. As someone who started off as a hardware logic engineer then moved on to becoming a programmer, I can tell you it was a painful surprise discovering this, when I was expecting bit-wise Boolean algebra to work with the compiler-defined Boolean values. It plain does not work.
And, s I further was a datacomms specialist, it was equally painful discovering the problems with enum sizing – you cannot use enums to define fields in a data packet – or equivalently define a struct over a data packet – and expect the fields to line up properly while at the same time expecting to be able to use enums for defining the states of fields like commands and status. Which means you cannot user the innate knowledge in the debuggers to display such fields using the symbols from an enum set. Bear in mind that in datacomms one has to handle line ordered data packets passing between or through nodes of wildly differing architectures programmed in wildly different languages. Including down to hardware bit level languages – some protocols were defined for hardware handling in the bad old days, but hung around to the days when hardware became generic and code took over the parsing job. Think SDLC/HDLC families (from which IP and X.25 descend, amongst others), with bit fields and multi-byte encoded values. Initially coded in Z80 or 8048 assembler.
It frankly sucks. But kept me gainfully employed for many years.
Interesting read, didn’t know that GCC assembled operations on bool types to be like that under x86. And, of course, it’s a good testament to always make sure you initialize everything.
I don’t program in C on a regular basis, but I find that C always has the most interesting and peculiar bugs. Fun to read about.
Doesn’t it cost in run-time to default initialize all stack variables? Is there some way to prevent that?
I did some programming in C but didn’t know anything about this. So much to learn with C even today.