JVM exceptions are weird: a decompiler perspective
Some time ago, I tried decompiling Java class files more efficiently than traditional solutions like Vineflower allow. Eventually, I wrote an article on my approach to decompiling control flow, which ended up giving my prototype a great performance boost.
At the time, I believed that this method could be directly extended to handle exceptional control flow, i.e. decompiling try…catch blocks. Looking back, I should have known it wouldn’t be that easy. It turns out there are many edge cases, ranging from strange javac behavior to consequences of the JVM design and the class file format, that make this quite complex. In this post, I’ll cover the details of why simple solutions don’t work, and which approach I ultimately decided on.
Exceptional control flow, however, is implicit, so it cannot be handled in the same way as ordinary control flow. Even a single try block must catch all exceptions raised within its scope, whether they are exceptions forwarded from a method call (invokevirtual), a division by zero in idiv, or a null pointer dereference in getfield. Such relations cannot be efficiently encoded in the bytecode itself, so they are stored separately, in the exception table.
Each entry in the exception table specifies which range of instructions is associated with which exception handler. If an exception is raised within such a range, the stack is cleared, the exception object is pushed onto the stack, and control is transferred to the first instruction of the handler.
For example, here’s what the bytecode and exception table look like for a simple method with a try…catch block (produced with javap -c):
static void method() {
    try {
        System.out.println("Hello, world!");
    } catch (Exception ex) {
        System.out.println("Oops, an error happened");
    }
}
Code:
   0: getstatic     #7
   3: ldc           #13
   5: invokevirtual #15
   8: goto          20
  11: astore_0
  12: getstatic     #7
  15: ldc           #23
  17: invokevirtual #15
  20: return
Exception table:
   from    to  target type
       0     8    11   Class java/lang/Exception
If there are multiple rows in the exception table, the first matching row is used. For example, with nested try blocks, the inner try block is listed first, followed by the outer block.
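The lookup rule can be sketched in a few lines of Java. This is a toy model, not JVM code: the names HandlerLookup, Row and findHandler are made up for illustration, and a null catch type stands in for javap's "any".

```java
import java.util.List;
import java.util.OptionalInt;

final class HandlerLookup {
    // One row of the exception table. catchType == null stands for "any".
    record Row(int from, int to, int target, Class<? extends Throwable> catchType) {
        boolean covers(int pc, Throwable thrown) {
            // The range is half-open: `from` is inclusive, `to` is exclusive.
            return from <= pc && pc < to
                && (catchType == null || catchType.isInstance(thrown));
        }
    }

    // Returns the handler address for an exception thrown at `pc`,
    // or empty if the exception propagates to the caller.
    static OptionalInt findHandler(List<Row> table, int pc, Throwable thrown) {
        for (Row row : table) {
            if (row.covers(pc, thrown)) {
                return OptionalInt.of(row.target()); // first matching row wins
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // The single row from the example above.
        List<Row> table = List.of(new Row(0, 8, 11, Exception.class));
        System.out.println(findHandler(table, 5, new RuntimeException())); // handled at 11
        System.out.println(findHandler(table, 5, new Error()));            // type mismatch
        System.out.println(findHandler(table, 8, new RuntimeException())); // pc 8 is outside
    }
}
```

Because the scan stops at the first match, listing the inner try before the outer one is what makes nesting work.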
As we will see shortly, this is not hypothetical: real-world class files often violate these “obvious” assumptions. This makes it important to handle the problem well, not only if you are building an unconditionally correct decompiler, as I am trying to do, but if you are building any decompiler at all.
It is not clear how a finally block is supposed to know where to transfer control next. One option is to store the exception to rethrow, if any, in a hidden variable and treat null as a sign that the try block completed without exceptions, but this is not enough. A try block may have exit points other than fallthrough: continue, break, and even return can leave it, each requiring a separate post-finally target. Handling this properly would require a jump table, which is likely to be slow, not to mention that it would confuse JIT compilers and static analyzers, including the JVM verifier, which checks that uninitialized variables are not accessed.
Instead, javac does something both cursed and genius: it duplicates the finally body on each exit path. Consider the following snippet:
static void method() {
    try {
        try_body();
    } catch (Exception ex) {
        throw ex;
    } finally {
        finally_body();
    }
}
Code:
   0: invokestatic  #7
   3: invokestatic  #12
   6: goto          18
   9: astore_0
  10: aload_0
  11: athrow
  12: astore_1
  13: invokestatic  #12
  16: aload_1
  17: athrow
  18: return
Exception table:
   from    to  target type
       0     3     9   Class java/lang/Exception
       0     3    12   any
       9    13    12   any
First, javac believes that the try body may fall through, so it adds a call to finally_body right after the try body, followed by a jump to return. The catch body cannot fall through, so finally_body is not inserted after 11: athrow.
Second, javac believes that the try body may throw, so it wraps it in a catch-all handler (0 3 12 any in the table) that saves the thrown exception, calls finally_body, and then rethrows the saved exception. Similarly, the catch body can throw, so it is also wrapped in a catch-all handler (9 13 12 any in the table).
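In source terms, the generated code behaves roughly like a hand-written version with the finally body copied onto each exit path. The sketch below is my own illustration, not javac output: the class FinallyDuplication, the log list and the run helper are invented so the two versions can be compared.

```java
import java.util.ArrayList;
import java.util.List;

final class FinallyDuplication {
    static final List<String> log = new ArrayList<>();

    static void tryBody(boolean fail) throws Exception {
        log.add("try");
        if (fail) throw new Exception("boom");
    }

    static void finallyBody() {
        log.add("finally");
    }

    // What the programmer writes.
    static void original(boolean fail) throws Exception {
        try {
            tryBody(fail);
        } catch (Exception ex) {
            throw ex;
        } finally {
            finallyBody();
        }
    }

    // Roughly what the bytecode does: one copy of the finally body per exit path.
    static void duplicated(boolean fail) throws Exception {
        try {
            try {
                tryBody(fail);
            } catch (Exception ex) {
                throw ex;
            }
        } catch (Throwable saved) { // plays the role of the two "any" rows
            finallyBody();          // copy for the exceptional exits
            throw saved;            // rethrow the saved exception
        }
        finallyBody();              // copy for the fallthrough exit
    }

    // Runs one of the two versions and returns the recorded events.
    static List<String> run(boolean useOriginal, boolean fail) {
        log.clear();
        try {
            if (useOriginal) original(fail); else duplicated(fail);
            log.add("returned");
        } catch (Exception ex) {
            log.add("threw");
        }
        return List.copyOf(log);
    }

    public static void main(String[] args) {
        System.out.println(run(true, false) + " vs " + run(false, false));
        System.out.println(run(true, true) + " vs " + run(false, true));
    }
}
```

Both versions record the same events on the normal and the exceptional path. The fallthrough copy of finallyBody sits outside the catch-all, just as 3: invokestatic #12 sits outside the range 0 3 in the table.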
For whatever reason, the range of this last catch-all handler also covers the first instruction of the handler itself. I’ve narrowed it down to one suspicious line in the javac code, but it has been there for so long that I doubt anyone will want to touch it. Even if this is fixed at some point, older class files will still have this problem, so we can’t expect to forget about it.
For one thing, any JVM instruction can throw. The JVM specification is very clear about this: VirtualMachineError […] “may be thrown at any time during the operation of the Java Virtual Machine”. VirtualMachineError is the superclass of bangers like OutOfMemoryError and StackOverflowError, and I don’t think it’s hard to imagine a JVM interpreter that throws StackOverflowError when a JVM-internal function runs out of stack, or OutOfMemoryError when some ad-hoc allocation fails. Even astore_1 could genuinely throw if the array of locals were allocated on demand. At least, since Java 20, we no longer have to deal with Thread.stop throwing arbitrary exceptions at arbitrary points.
But a false positive (i.e. catching an exception that shouldn’t be caught) is only part of the problem. A false negative can also occur under certain conditions. Consider the following:
static int method(boolean condition) {
    try {
        if (condition) {
            return 1;
        }
    } finally {
        finally_body();
    }
    return 2;
}
The main point here is the return statement inside a try…finally block. While the if (condition) check and the initialization of 1 are covered by the try range, the return itself must be preceded by a call to finally_body, which has to be located outside the try range, so where does the return instruction itself go? It turns out that javac emits it outside the try range.
From the source code, we’d expect exceptions arising during return to be caught by the try block, yet they are not. But surely return can only throw VirtualMachineError, so can we turn a blind eye to this? Not quite: according to the JVM specification, return can also throw IllegalMonitorStateException if, for example, some monitors acquired during the execution of the method have not been released by the time it returns. javac produces code that never exhibits this behavior, and since monitors are incompatible with coroutines, other frontends are unlikely to rely on this much either. But hand-written bytecode is not guaranteed to be well-behaved in this respect, so a decompiler still needs to take this design oddity into account.
My solution is to give up on this. If monitor usage can be statically verified to be correct, return can't throw, and the worst thing that can happen is an OOM or stack overflow during astore being caught or not caught by mistake, which cannot happen on HotSpot or any other reasonably efficient JVM implementation. This means that we can assume that, for all intents and purposes, most instructions cannot throw. On the other hand, if the correctness of monitor usage cannot be verified, the decompiler cannot produce Java code anyway, so how exactly javac would interpret the resulting pseudocode does not matter.
The JVM is weird in that it has two type checkers. If the bytecode compiler provides a table (called StackMapTable) that records the type of each stack element at each point, the JVM only needs to verify that all operations are correctly typed. If no such table is provided, the JVM has to infer the types instead. Since type inference takes a while, StackMapTable has been required in all class files since Java 6. However, modern JVMs can still load older class files, so we'll be stuck with two type checkers for some time.
There is one big difference between the two type checkers: while verification by type checking (i.e. using StackMapTable) validates every instruction in the bytecode, verification by type inference necessarily validates only reachable instructions, because it cannot know the stack layout at unreachable instructions. This means that invalid combinations of bytecode instructions, e.g. iconst_1; ladd, may be present in older class files, but not in newer ones.
How is this relevant to exception handling? Rows of the exception table have two parameters, to and target, which usually coincide in Java code (try ends at }, immediately followed by catch (...) {) but often differ in bytecode (for example, with a goto in between). So you might foolishly try to extend the right boundary of the try range up to target if no instruction in the range [to, target) can throw. This extension has a strange side effect: if no instruction in the range [from, to) was reachable, but [to, target) is reachable, then you have made the exception handler reachable when it was unreachable in the bytecode. And in older class files, this can cause legitimate code to appear mistyped. This is bad!
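To make the side effect concrete, here is a toy reachability computation. It is a sketch under simplifying assumptions, not the verifier's algorithm: the class Reachability, the successor map and the four-instruction program are invented for illustration, and every instruction inside a try range is treated as potentially jumping to the handler.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class Reachability {
    // An exception table row with a half-open range [from, to).
    record Row(int from, int to, int target) {}

    // Returns every pc reachable from pc 0 through normal successors
    // or through an exception edge from an instruction inside a try range.
    static Set<Integer> reachable(Map<Integer, List<Integer>> successors, List<Row> table) {
        Set<Integer> seen = new TreeSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(0);
        while (!work.isEmpty()) {
            int pc = work.pop();
            if (!seen.add(pc)) continue; // already visited
            for (int next : successors.getOrDefault(pc, List.of())) work.push(next);
            for (Row row : table) {
                if (row.from() <= pc && pc < row.to()) work.push(row.target());
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical program: 0: goto 2; 1: nop (dead); 2: return; 3: handler (dead).
        Map<Integer, List<Integer>> successors = Map.of(
            0, List.of(2),
            1, List.of(2),
            2, List.<Integer>of(),
            3, List.<Integer>of());
        // Original row covers only the dead instruction at pc 1.
        System.out.println(reachable(successors, List.of(new Row(1, 2, 3))));
        // "Extended" row covers [1, 3), which now includes the live return at pc 2.
        System.out.println(reachable(successors, List.of(new Row(1, 3, 3))));
    }
}
```

With the original row, the handler at pc 3 stays unreachable, so an inference-based verifier never looks at it. After the extension, pc 3 becomes reachable, and whatever garbage lived there now has to type-check.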
Of course, you may not care about handling old class files, but it's time to discuss why this band-aid has no chance of working regardless.
try {
    if (condition) {
        return 1;
    } else {
        return 2;
    }
} finally {
    finally_body();
}
Even if the finally block is absent, javac acts as if it were merely empty, and still places the return and goto instructions outside the try ranges:
try {
    if (condition) {
        return 1;
    } else {
        return 2;
    }
} catch (Exception ex) {
    return 3;
}
Code:
   0: iload_0
   1: ifeq          6
   4: iconst_1
   5: ireturn
   6: iconst_2
   7: ireturn
   8: astore_1
   9: iconst_3
  10: ireturn
Exception table:
   from    to  target type
       0     5     8   Class java/lang/Exception
       6     7     8   Class java/lang/Exception
(This also means that the code between to and target is not always just a single goto or return; it may also include the contents of the finally block, which are not guaranteed not to throw.)
Perhaps the most confusing implication is that exception handling ranges can cross control flow structures (for example, it is possible for from to be outside an if and for to to be inside that if), whereas ranges exempt from EH correspond to a single statement in the source code, and thus cannot cross control flow. So in the eyes of a decompiler, the above code should be parsed like this:
try #1 {
    if (condition) {
        int tmp = 1;
        exempt #1 {
            return tmp;
        }
    } else {
        int tmp = 2;
        exempt #1 {
            return tmp;
        }
    }
} catch (Exception ex) {
}
return 3;
...and not by creating a separate try block for each row. The decompiler can then verify that exempt blocks are present on each exit path of the try block and have matching contents, and simplify the code to try…finally. The details are vague and I haven't figured everything out myself yet, but I believe it can be implemented in one go.
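As a rough illustration of the "one try with exempt ranges" view, the sketch below merges exception table rows that share a handler and reports the gaps between them as exempt ranges. This is not the decompiler's actual algorithm, which the post leaves open: the names ExemptRanges, Row, Range, Try and merge are hypothetical, and grouping rows purely by target and type is a simplifying assumption that ignores nesting and row priority.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ExemptRanges {
    record Row(int from, int to, int target, String type) {}
    record Range(int from, int to) {}
    record Try(int from, int to, int target, String type, List<Range> exempt) {}

    // Merges rows with the same handler and catch type into a single try
    // whose uncovered gaps are listed as exempt ranges.
    static List<Try> merge(List<Row> table) {
        // Group rows by handler, keeping first-seen order.
        Map<String, List<Row>> groups = new LinkedHashMap<>();
        for (Row row : table) {
            groups.computeIfAbsent(row.target() + ":" + row.type(), k -> new ArrayList<>()).add(row);
        }
        List<Try> result = new ArrayList<>();
        for (List<Row> rows : groups.values()) {
            rows.sort(Comparator.comparingInt(Row::from));
            Row first = rows.get(0);
            List<Range> exempt = new ArrayList<>();
            int end = first.to();
            for (Row row : rows.subList(1, rows.size())) {
                // Anything between the covered prefix and the next row is a gap.
                if (row.from() > end) exempt.add(new Range(end, row.from()));
                end = Math.max(end, row.to());
            }
            result.add(new Try(first.from(), end, first.target(), first.type(), exempt));
        }
        return result;
    }

    public static void main(String[] args) {
        // The two rows from the if/else example above.
        List<Row> table = List.of(
            new Row(0, 5, 8, "java/lang/Exception"),
            new Row(6, 7, 8, "java/lang/Exception"));
        System.out.println(merge(table));
    }
}
```

For the table above, this yields a single try over [0, 7) with the exempt range [5, 6), i.e. the first ireturn. The second ireturn at 7 falls just past the merged range; deciding that it also belongs inside the try as an exempt block needs the structural analysis described above, which this sketch does not attempt.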
I wanted this post to be specifically about Java quirks rather than my decompiler, so that's all for now. If I missed anything important or you want to share any thoughts, feel free to message me.