Mercurial > hg > truffle
diff graal/GraalCompiler/src/com/sun/c1x/doc/differences.txt @ 2509:16b9a8b5ad39
Renamings Runtime=>GraalRuntime and Compiler=>GraalCompiler
author | Thomas Wuerthinger <thomas@wuerthinger.net> |
---|---|
date | Wed, 27 Apr 2011 11:50:44 +0200 |
parents | graal/Compiler/src/com/sun/c1x/doc/differences.txt@9ec15d6914ca |
children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/graal/GraalCompiler/src/com/sun/c1x/doc/differences.txt Wed Apr 27 11:50:44 2011 +0200 @@ -0,0 +1,154 @@ +Differences between C1 and C1X, including upgrades and limitations +(and some general information about C1) +====================================================================== + +StrictFP: + - C1X has removed the backend code to deal with the FPU stack, and therefore + requires SSE2 currently. StrictFP is still tracked in the front end. + - C1 will not inline methods with different strictfp-ness. C1X does not have this + limitation because it only targets SSE2 x86 processors. + +JSR/RET + - C1 will bail out if it encounters strange JSR/RET patterns + - recursive JSRs + - JSR regions that are shared with non-JSR code + - RET encountered out of JSR (would not verify) + +Exceptions + - C1 will bailout if the code of an exception handler can be reached via normal + control flow. + => C1X might be extended to introduce a phi for the exception + object in this case. + - C1 will bailout if an exception handler covers itself + +Verification + - C1 does not rely on bytecode verification having been run. However, if it detects + type errors in its building the IR graph it will usually bail out. + - C1 requires a bitmap of the bytecode, where a bit for + each byte of the bytecode indicates if the bytecode at that location starts a + basic block. It uses this to construct the basic block list in a single pass. + => Assertion failures and/or bugs in C1X that cause exceptions to be thrown bail out + the compilation instead of crashing the VM. + => C1X's BlockMap does not computes the basic block starts in one pass over the bytecode + and one pass over the successor lists. + => C1X computes the "stores in loops" only when loops are encountered in the CFG. + An option can select conservative mode (all locals stored in all loops) trades + faster parse speed for fewer optimization opportunities + => C1X includes an IRChecker that typechecks the entire IR and checks for CFG + consistency that can be run after each pass. + +Constants + => C1X allows unrestricted use of object constants throughout the code, including + folding reads of static final fields that reference objects. + +Pinning + => C1X pins fewer instructions than C1 + ** C1X will eventually allow certain kinds of instructions to float outside the CFG + and be scheduled with a C2-lite scheduling pass. + +Synchronization + - C1 will refuse to compile methods with unbalanced synchronization. This property is + computed by the bytecode verifier and supplied to C1. + ** C1X will not rely on the bytecode verifier to compute this but should do so itself. + => C1 relied on the backend to generate synchronization code for the root method's + synchronization operations. C1X inserts code into the start block and generates + and exception handler to do this explicitly. + +Optimizations + => C1X has many more options to turn on individual passes, parts of passes, approximations, + etc. It is designed to have three optimization levels: + 0 = super-fast: essentially no optimization + 1 = fast: inlining, constant folding, and local optimizations + 2 = optimized: inlining, constant folding, local and global optimizations, including + iterative versions of all algorithms + ** Planned optimizations for C1X that C1 does not have: + TypeCheckElimination: remove redundant casts and devirtualize more call sites + ArrayBoundsCheckElimination: remove redundant array bounds checks and/or restructure + code to deoptimize when bounds checks within loops will fail + LoopPeeling: replicate the first iteration of a loop + LoopUnrolling: replicate the body of certain shapes of loops + LoopInvariantCodeMotion: move invariant code out of a loop + ProfileGuidedInlining: use receiver method profiles to emit guarded inlines + ProfileGuidedBlockLayout: use profiling information for code placement + Peephole: peephole optimize backend output + +Block Merging + ** C1X will replace branches to blocks with a single Goto with a branch to the + block's successor, if the blocks cannot be merged otherwise. + +Constant Folding / Strength reduction + - C1 had some of its strength reduction logic built into the GraphBuilder because + the Canonicalizer could not return multiple instructions. + => C1X added this ability, moved the logic to Canonicalizer, and added a few new + strength reductions. + => C1X should have an interface for doing folding of @FOLD method calls + => C1X folds many intrinsic operations that don't have side effects + => C1X folds all the basic floating point operations + => C1X strength reduces (e >> C >> K) to (e >> (C + K)) when C and K are constant + => Multiplies of power-of-2 constants are reduced to shifts in the canonicalizer + (instead of the backend) + ** C1X will be able to run a global sparse conditional constant propagation phase + to catch any missed canonicalization opportunities after graph building. + +Switches + - C1 did not detect back edges in tableswitch/lookupswitch default branches + => C1X does detect these back edges + => C1X moved the canonicalization code of 1 and 2 branch switches to canonicalizer, + where it belongs + +Inlining + - C1 cannot inline: + - native methods (or their stubs), except some intrinsics + - methods whose class has not been initialized + - methods with unbalanced monitors + - methods with JSRs (this is probably technically possible now) + + - C1 will not inline: + - methods with exception handlers (optional) + - synchronized methods (optional) + - if the maximum inline depth is reached (default = 9) + - if the maximum recursive inline depth is reached (default = 1) + - if the callee is larger than the maximum inline size (reduced to 90% at each level, starting at 35) + - constructors for subclasses of Throwable + - if the strictfp-ness of the callee is different than the caller (on x87) + - abstract methods + - synchronized intrinsics + +Load/store elimination + => C1X may eliminate loads of static fields, which C1 did not + => C1X distinguishes loads/stores to different fields in MemoryBuffer + => C1X assumes that RiField instances are unique when .isLoaded() is true + +Local/Global Value Numbering + => C1X improved local load elimination and no longer value numbers fields, reducing the + logic necessary in ValueMap, simplifying it and improving its performance. + => C1X reuses the same simplified ValueMap for GVN. Since heap accesses are no longer + value numbered, the logic to kill values is unnecessary, greatly simplifying + GVN. + ** A global version of load elimination will compensate for this loss in the future. + => C1X value numbers are always or'd with a high order bit when value numbering is possible + to prevent value numbering failing if the value number is accidentally 0. + +Nullcheck elimination + => A new flag, NonNull, indicates instructions that produce values that are guaranteed + to be non-null (e.g. NewXXX and Local 0, NullCheck). Instructions that require null + checks check this flag for their inputs in their constructors, eliminating most + redundant null checks immediately, without requiring the NullCheckEliminator to run. + => C1X uses a more efficient block ordering for null check elimination. The first pass is + optimistic and attempts to visit the blocks in reverse post-order. For acyclic graphs, + this almost always succeeds, requiring no iteration. Full iterative data flow analysis + can be enabled separately. Bitmaps used during the fixpoint calculation are much + smaller due to local numbering of instructions (as opposed to global IDs). + ** C1X will recognize If's that check against null and propagate the non-nullness across + the appropriate branches. + +BlockListBuilder + - C1 had a vestigial loop map in BlockListBuilder which was not really used. + => C1X does not need to compute a complete loop map in order to do selective phi creation, + it builds the "storesInLoops" BitMap in BlockMap. + +Types + => C1X adds the declared type of method parameters to Local instructions, which + may help with devirtualization + => C1X makes local 0 of instance methods non-null at the start +