Mercurial > hg > truffle
view graal/GraalCompiler/bin/com/sun/c1x/doc/differences.txt @ 2619:1586b1b56f0c
Fixed typo.
author | Thomas Wuerthinger <thomas@wuerthinger.net> |
---|---|
date | Mon, 09 May 2011 19:12:45 +0200 |
parents | 16b9a8b5ad39 |
children |
line wrap: on
line source
Differences between C1 and C1X, including upgrades and limitations (and some general information about C1) ====================================================================== StrictFP: - C1X has removed the backend code to deal with the FPU stack, and therefore requires SSE2 currently. StrictFP is still tracked in the front end. - C1 will not inline methods with different strictfp-ness. C1X does not have this limitation because it only targets SSE2 x86 processors. JSR/RET - C1 will bail out if it encounters strange JSR/RET patterns - recursive JSRs - JSR regions that are shared with non-JSR code - RET encountered out of JSR (would not verify) Exceptions - C1 will bailout if the code of an exception handler can be reached via normal control flow. => C1X might be extended to introduce a phi for the exception object in this case. - C1 will bailout if an exception handler covers itself Verification - C1 does not rely on bytecode verification having been run. However, if it detects type errors in its building the IR graph it will usually bail out. - C1 requires a bitmap of the bytecode, where a bit for each byte of the bytecode indicates if the bytecode at that location starts a basic block. It uses this to construct the basic block list in a single pass. => Assertion failures and/or bugs in C1X that cause exceptions to be thrown bail out the compilation instead of crashing the VM. => C1X's BlockMap does not computes the basic block starts in one pass over the bytecode and one pass over the successor lists. => C1X computes the "stores in loops" only when loops are encountered in the CFG. An option can select conservative mode (all locals stored in all loops) trades faster parse speed for fewer optimization opportunities => C1X includes an IRChecker that typechecks the entire IR and checks for CFG consistency that can be run after each pass. Constants => C1X allows unrestricted use of object constants throughout the code, including folding reads of static final fields that reference objects. Pinning => C1X pins fewer instructions than C1 ** C1X will eventually allow certain kinds of instructions to float outside the CFG and be scheduled with a C2-lite scheduling pass. Synchronization - C1 will refuse to compile methods with unbalanced synchronization. This property is computed by the bytecode verifier and supplied to C1. ** C1X will not rely on the bytecode verifier to compute this but should do so itself. => C1 relied on the backend to generate synchronization code for the root method's synchronization operations. C1X inserts code into the start block and generates and exception handler to do this explicitly. Optimizations => C1X has many more options to turn on individual passes, parts of passes, approximations, etc. It is designed to have three optimization levels: 0 = super-fast: essentially no optimization 1 = fast: inlining, constant folding, and local optimizations 2 = optimized: inlining, constant folding, local and global optimizations, including iterative versions of all algorithms ** Planned optimizations for C1X that C1 does not have: TypeCheckElimination: remove redundant casts and devirtualize more call sites ArrayBoundsCheckElimination: remove redundant array bounds checks and/or restructure code to deoptimize when bounds checks within loops will fail LoopPeeling: replicate the first iteration of a loop LoopUnrolling: replicate the body of certain shapes of loops LoopInvariantCodeMotion: move invariant code out of a loop ProfileGuidedInlining: use receiver method profiles to emit guarded inlines ProfileGuidedBlockLayout: use profiling information for code placement Peephole: peephole optimize backend output Block Merging ** C1X will replace branches to blocks with a single Goto with a branch to the block's successor, if the blocks cannot be merged otherwise. Constant Folding / Strength reduction - C1 had some of its strength reduction logic built into the GraphBuilder because the Canonicalizer could not return multiple instructions. => C1X added this ability, moved the logic to Canonicalizer, and added a few new strength reductions. => C1X should have an interface for doing folding of @FOLD method calls => C1X folds many intrinsic operations that don't have side effects => C1X folds all the basic floating point operations => C1X strength reduces (e >> C >> K) to (e >> (C + K)) when C and K are constant => Multiplies of power-of-2 constants are reduced to shifts in the canonicalizer (instead of the backend) ** C1X will be able to run a global sparse conditional constant propagation phase to catch any missed canonicalization opportunities after graph building. Switches - C1 did not detect back edges in tableswitch/lookupswitch default branches => C1X does detect these back edges => C1X moved the canonicalization code of 1 and 2 branch switches to canonicalizer, where it belongs Inlining - C1 cannot inline: - native methods (or their stubs), except some intrinsics - methods whose class has not been initialized - methods with unbalanced monitors - methods with JSRs (this is probably technically possible now) - C1 will not inline: - methods with exception handlers (optional) - synchronized methods (optional) - if the maximum inline depth is reached (default = 9) - if the maximum recursive inline depth is reached (default = 1) - if the callee is larger than the maximum inline size (reduced to 90% at each level, starting at 35) - constructors for subclasses of Throwable - if the strictfp-ness of the callee is different than the caller (on x87) - abstract methods - synchronized intrinsics Load/store elimination => C1X may eliminate loads of static fields, which C1 did not => C1X distinguishes loads/stores to different fields in MemoryBuffer => C1X assumes that RiField instances are unique when .isLoaded() is true Local/Global Value Numbering => C1X improved local load elimination and no longer value numbers fields, reducing the logic necessary in ValueMap, simplifying it and improving its performance. => C1X reuses the same simplified ValueMap for GVN. Since heap accesses are no longer value numbered, the logic to kill values is unnecessary, greatly simplifying GVN. ** A global version of load elimination will compensate for this loss in the future. => C1X value numbers are always or'd with a high order bit when value numbering is possible to prevent value numbering failing if the value number is accidentally 0. Nullcheck elimination => A new flag, NonNull, indicates instructions that produce values that are guaranteed to be non-null (e.g. NewXXX and Local 0, NullCheck). Instructions that require null checks check this flag for their inputs in their constructors, eliminating most redundant null checks immediately, without requiring the NullCheckEliminator to run. => C1X uses a more efficient block ordering for null check elimination. The first pass is optimistic and attempts to visit the blocks in reverse post-order. For acyclic graphs, this almost always succeeds, requiring no iteration. Full iterative data flow analysis can be enabled separately. Bitmaps used during the fixpoint calculation are much smaller due to local numbering of instructions (as opposed to global IDs). ** C1X will recognize If's that check against null and propagate the non-nullness across the appropriate branches. BlockListBuilder - C1 had a vestigial loop map in BlockListBuilder which was not really used. => C1X does not need to compute a complete loop map in order to do selective phi creation, it builds the "storesInLoops" BitMap in BlockMap. Types => C1X adds the declared type of method parameters to Local instructions, which may help with devirtualization => C1X makes local 0 of instance methods non-null at the start