view graal/GraalCompiler/src/com/sun/c1x/doc/differences.txt @ 2509:16b9a8b5ad39

Renamings Runtime=>GraalRuntime and Compiler=>GraalCompiler
author Thomas Wuerthinger <thomas@wuerthinger.net>
date Wed, 27 Apr 2011 11:50:44 +0200
parents graal/Compiler/src/com/sun/c1x/doc/differences.txt@9ec15d6914ca
children
line wrap: on
line source

Differences between C1 and C1X, including upgrades and limitations
(and some general information about C1)
======================================================================

StrictFP:
   - C1X has removed the backend code to deal with the FPU stack, and therefore
     requires SSE2 currently. StrictFP is still tracked in the front end.
   - C1 will not inline methods with different strictfp-ness. C1X does not have this
     limitation because it only targets SSE2 x86 processors.

JSR/RET
   - C1 will bail out if it encounters strange JSR/RET patterns
       - recursive JSRs
       - JSR regions that are shared with non-JSR code
       - RET encountered out of JSR (would not verify)

Exceptions
   -  C1 will bailout if the code of an exception handler can be reached via normal
      control flow.
   => C1X might be extended to introduce a phi for the exception
      object in this case.
   -  C1 will bailout if an exception handler covers itself

Verification
   -  C1 does not rely on bytecode verification having been run. However, if it detects
      type errors in its building the IR graph it will usually bail out.
   -  C1 requires a bitmap of the bytecode, where a bit for
      each byte of the bytecode indicates if the bytecode at that location starts a
      basic block. It uses this to construct the basic block list in a single pass.
   => Assertion failures and/or bugs in C1X that cause exceptions to be thrown bail out
      the compilation instead of crashing the VM.
   => C1X's BlockMap does not computes the basic block starts in one pass over the bytecode
      and one pass over the successor lists.
   => C1X computes the "stores in loops" only when loops are encountered in the CFG.
      An option can select conservative mode (all locals stored in all loops) trades
      faster parse speed for fewer optimization opportunities
   => C1X includes an IRChecker that typechecks the entire IR and checks for CFG
      consistency that can be run after each pass.

Constants
   => C1X allows unrestricted use of object constants throughout the code, including
      folding reads of static final fields that reference objects.

Pinning
   => C1X pins fewer instructions than C1
   ** C1X will eventually allow certain kinds of instructions to float outside the CFG
      and be scheduled with a C2-lite scheduling pass.

Synchronization
   -  C1 will refuse to compile methods with unbalanced synchronization. This property is
      computed by the bytecode verifier and supplied to C1.
   ** C1X will not rely on the bytecode verifier to compute this but should do so itself.
   => C1 relied on the backend to generate synchronization code for the root method's
      synchronization operations. C1X inserts code into the start block and generates
      and exception handler to do this explicitly.

Optimizations
   => C1X has many more options to turn on individual passes, parts of passes, approximations,
      etc. It is designed to have three optimization levels:
      0 = super-fast: essentially no optimization
      1 = fast:       inlining, constant folding, and local optimizations
      2 = optimized:  inlining, constant folding, local and global optimizations, including
                      iterative versions of all algorithms
   ** Planned optimizations for C1X that C1 does not have:
      TypeCheckElimination:        remove redundant casts and devirtualize more call sites
      ArrayBoundsCheckElimination: remove redundant array bounds checks and/or restructure
                                   code to deoptimize when bounds checks within loops will fail
      LoopPeeling:                 replicate the first iteration of a loop
      LoopUnrolling:               replicate the body of certain shapes of loops
      LoopInvariantCodeMotion:     move invariant code out of a loop
      ProfileGuidedInlining:       use receiver method profiles to emit guarded inlines
      ProfileGuidedBlockLayout:    use profiling information for code placement
      Peephole:                    peephole optimize backend output

Block Merging
   ** C1X will replace branches to blocks with a single Goto with a branch to the
      block's successor, if the blocks cannot be merged otherwise.

Constant Folding / Strength reduction
   -  C1 had some of its strength reduction logic built into the GraphBuilder because
      the Canonicalizer could not return multiple instructions.
   => C1X added this ability, moved the logic to Canonicalizer, and added a few new
      strength reductions.
   => C1X should have an interface for doing folding of @FOLD method calls
   => C1X folds many intrinsic operations that don't have side effects
   => C1X folds all the basic floating point operations
   => C1X strength reduces (e >> C >> K) to (e >> (C + K)) when C and K are constant
   => Multiplies of power-of-2 constants are reduced to shifts in the canonicalizer
      (instead of the backend)
   ** C1X will be able to run a global sparse conditional constant propagation phase
      to catch any missed canonicalization opportunities after graph building.

Switches
   -  C1 did not detect back edges in tableswitch/lookupswitch default branches
   => C1X does detect these back edges
   => C1X moved the canonicalization code of 1 and 2 branch switches to canonicalizer,
      where it belongs

Inlining
   -  C1 cannot inline:
      -  native methods (or their stubs), except some intrinsics
      -  methods whose class has not been initialized
      -  methods with unbalanced monitors
      -  methods with JSRs (this is probably technically possible now)

   -  C1 will not inline:
      -  methods with exception handlers (optional)
      -  synchronized methods (optional)
      -  if the maximum inline depth is reached (default = 9)
      -  if the maximum recursive inline depth is reached (default = 1)
      -  if the callee is larger than the maximum inline size (reduced to 90% at each level, starting at 35)
      -  constructors for subclasses of Throwable
      -  if the strictfp-ness of the callee is different than the caller (on x87)
      -  abstract methods
      -  synchronized intrinsics

Load/store elimination
   => C1X may eliminate loads of static fields, which C1 did not
   => C1X distinguishes loads/stores to different fields in MemoryBuffer
   => C1X assumes that RiField instances are unique when .isLoaded() is true

Local/Global Value Numbering
   => C1X improved local load elimination and no longer value numbers fields, reducing the
      logic necessary in ValueMap, simplifying it and improving its performance.
   => C1X reuses the same simplified ValueMap for GVN. Since heap accesses are no longer
      value numbered, the logic to kill values is unnecessary, greatly simplifying
      GVN.
   ** A global version of load elimination will compensate for this loss in the future.
   => C1X value numbers are always or'd with a high order bit when value numbering is possible
      to prevent value numbering failing if the value number is accidentally 0.

Nullcheck elimination
   => A new flag, NonNull, indicates instructions that produce values that are guaranteed
      to be non-null (e.g. NewXXX and Local 0, NullCheck). Instructions that require null
      checks check this flag for their inputs in their constructors, eliminating most
      redundant null checks immediately, without requiring the NullCheckEliminator to run.
   => C1X uses a more efficient block ordering for null check elimination. The first pass is
      optimistic and attempts to visit the blocks in reverse post-order. For acyclic graphs,
      this almost always succeeds, requiring no iteration. Full iterative data flow analysis
      can be enabled separately. Bitmaps used during the fixpoint calculation are much
      smaller due to local numbering of instructions (as opposed to global IDs).
   ** C1X will recognize If's that check against null and propagate the non-nullness across
      the appropriate branches.

BlockListBuilder
   -  C1 had a vestigial loop map in BlockListBuilder which was not really used.
   => C1X does not need to compute a complete loop map in order to do selective phi creation,
      it builds the "storesInLoops" BitMap in BlockMap.

Types
   => C1X adds the declared type of method parameters to Local instructions, which
      may help with devirtualization
   => C1X makes local 0 of instance methods non-null at the start