diff graal/GraalCompiler/src/com/sun/c1x/doc/differences.txt @ 2509:16b9a8b5ad39
Renamings Runtime=>GraalRuntime and Compiler=>GraalCompiler
author: Thomas Wuerthinger <thomas@wuerthinger.net>
date: Wed, 27 Apr 2011 11:50:44 +0200
parents: graal/Compiler/src/com/sun/c1x/doc/differences.txt@9ec15d6914ca
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/graal/GraalCompiler/src/com/sun/c1x/doc/differences.txt	Wed Apr 27 11:50:44 2011 +0200
@@ -0,0 +1,154 @@
+Differences between C1 and C1X, including upgrades and limitations
+(and some general information about C1)
+======================================================================
+
+StrictFP
+   - C1X has removed the backend code to deal with the FPU stack, and therefore
+     requires SSE2 currently. StrictFP is still tracked in the front end.
+   - C1 will not inline methods with different strictfp-ness. C1X does not have this
+     limitation because it only targets SSE2 x86 processors.
+
+JSR/RET
+   - C1 will bail out if it encounters strange JSR/RET patterns
+       - recursive JSRs
+       - JSR regions that are shared with non-JSR code
+       - RET encountered out of JSR (would not verify)
+
+Exceptions
+   -  C1 will bail out if the code of an exception handler can be reached via normal
+      control flow.
+   => C1X might be extended to introduce a phi for the exception
+      object in this case.
+   -  C1 will bail out if an exception handler covers itself
+
+Verification
+   -  C1 does not rely on bytecode verification having been run. However, if it detects
+      type errors while building the IR graph, it will usually bail out.
+   -  C1 requires a bitmap of the bytecode, where a bit for
+      each byte of the bytecode indicates if the bytecode at that location starts a
+      basic block. It uses this to construct the basic block list in a single pass.
+   => Assertion failures and/or bugs in C1X that cause exceptions to be thrown make the
+      compilation bail out instead of crashing the VM.
+   => C1X's BlockMap computes the basic block starts in one pass over the bytecode
+      and one pass over the successor lists.
+   => C1X computes the "stores in loops" only when loops are encountered in the CFG.
+      An option can select a conservative mode (all locals stored in all loops), which
+      trades optimization opportunities for faster parse speed.
+   => C1X includes an IRChecker that typechecks the entire IR and checks CFG
+      consistency; it can be run after each pass.
+
+Constants
+   => C1X allows unrestricted use of object constants throughout the code, including
+      folding reads of static final fields that reference objects.
+
+Pinning
+   => C1X pins fewer instructions than C1
+   ** C1X will eventually allow certain kinds of instructions to float outside the CFG
+      and be scheduled with a C2-lite scheduling pass.
+
+Synchronization
+   -  C1 will refuse to compile methods with unbalanced synchronization. This property is
+      computed by the bytecode verifier and supplied to C1.
+   ** C1X will not rely on the bytecode verifier for this and should compute it itself.
+   => C1 relied on the backend to generate synchronization code for the root method's
+      synchronization operations. C1X inserts code into the start block and generates
+      an exception handler to do this explicitly.
+
+Optimizations
+   => C1X has many more options to turn on individual passes, parts of passes, approximations,
+      etc. It is designed to have three optimization levels:
+      0 = super-fast: essentially no optimization
+      1 = fast:       inlining, constant folding, and local optimizations
+      2 = optimized:  inlining, constant folding, local and global optimizations, including
+                      iterative versions of all algorithms
+   ** Planned optimizations for C1X that C1 does not have:
+      TypeCheckElimination:        remove redundant casts and devirtualize more call sites
+      ArrayBoundsCheckElimination: remove redundant array bounds checks and/or restructure
+                                   code to deoptimize when bounds checks within loops will fail
+      LoopPeeling:                 replicate the first iteration of a loop
+      LoopUnrolling:               replicate the body of certain shapes of loops
+      LoopInvariantCodeMotion:     move invariant code out of a loop
+      ProfileGuidedInlining:       use receiver method profiles to emit guarded inlines
+      ProfileGuidedBlockLayout:    use profiling information for code placement
+      Peephole:                    peephole optimize backend output
+
+Block Merging
+   ** C1X will replace a branch to a block that contains only a single Goto with a branch
+      to that block's successor, if the blocks cannot be merged otherwise.
+
+Constant Folding / Strength reduction
+   -  C1 had some of its strength reduction logic built into the GraphBuilder because
+      the Canonicalizer could not return multiple instructions.
+   => C1X added this ability, moved the logic to Canonicalizer, and added a few new
+      strength reductions.
+   => C1X should have an interface for doing folding of @FOLD method calls
+   => C1X folds many intrinsic operations that don't have side effects
+   => C1X folds all the basic floating point operations
+   => C1X strength reduces (e >> C >> K) to (e >> (C + K)) when C and K are constant
+   => Multiplies of power-of-2 constants are reduced to shifts in the canonicalizer
+      (instead of the backend)
+   ** C1X will be able to run a global sparse conditional constant propagation phase
+      to catch any missed canonicalization opportunities after graph building.
+
+Switches
+   -  C1 did not detect back edges in tableswitch/lookupswitch default branches
+   => C1X does detect these back edges
+   => C1X moved the canonicalization code for 1- and 2-branch switches to the Canonicalizer,
+      where it belongs
+
+Inlining
+   -  C1 cannot inline:
+      -  native methods (or their stubs), except some intrinsics
+      -  methods whose class has not been initialized
+      -  methods with unbalanced monitors
+      -  methods with JSRs (this is probably technically possible now)
+
+   -  C1 will not inline:
+      -  methods with exception handlers (optional)
+      -  synchronized methods (optional)
+      -  if the maximum inline depth is reached (default = 9)
+      -  if the maximum recursive inline depth is reached (default = 1)
+      -  if the callee is larger than the maximum inline size (reduced to 90% at each level, starting at 35)
+      -  constructors for subclasses of Throwable
+      -  if the strictfp-ness of the callee differs from that of the caller (on x87)
+      -  abstract methods
+      -  synchronized intrinsics
+
+Load/store elimination
+   => C1X may eliminate loads of static fields, which C1 did not
+   => C1X distinguishes loads/stores to different fields in MemoryBuffer
+   => C1X assumes that RiField instances are unique when .isLoaded() is true
+
+Local/Global Value Numbering
+   => C1X improved local load elimination and no longer value numbers fields, reducing the
+      logic necessary in ValueMap, simplifying it and improving its performance.
+   => C1X reuses the same simplified ValueMap for GVN. Since heap accesses are no longer
+      value numbered, the logic to kill values is unnecessary, greatly simplifying
+      GVN.
+   ** A global version of load elimination will compensate for this loss in the future.
+   => C1X always or's a high-order bit into value numbers when value numbering is possible,
+      so that a valid value number is never accidentally 0, which would make numbering fail.
+
+Nullcheck elimination
+   => A new flag, NonNull, indicates instructions that produce values that are guaranteed
+      to be non-null (e.g. NewXXX, Local 0, and NullCheck). Instructions that require null
+      checks check this flag for their inputs in their constructors, eliminating most
+      redundant null checks immediately, without requiring the NullCheckEliminator to run.
+   => C1X uses a more efficient block ordering for null check elimination. The first pass is
+      optimistic and attempts to visit the blocks in reverse post-order. For acyclic graphs,
+      this almost always succeeds, requiring no iteration. Full iterative data flow analysis
+      can be enabled separately. Bitmaps used during the fixpoint calculation are much
+      smaller due to local numbering of instructions (as opposed to global IDs).
+   ** C1X will recognize If's that check against null and propagate the non-nullness across
+      the appropriate branches.
+
+BlockListBuilder
+   -  C1 had a vestigial loop map in BlockListBuilder which was not really used.
+   => C1X does not need to compute a complete loop map in order to do selective phi creation;
+      instead, it builds the "storesInLoops" BitMap in BlockMap.
+
+Types
+   => C1X adds the declared type of method parameters to Local instructions, which
+      may help with devirtualization
+   => C1X makes local 0 of instance methods non-null at the start
+