comparison graal/GraalCompiler/src/com/sun/c1x/doc/differences.txt @ 2509:16b9a8b5ad39

Renamings Runtime=>GraalRuntime and Compiler=>GraalCompiler
author Thomas Wuerthinger <thomas@wuerthinger.net>
date Wed, 27 Apr 2011 11:50:44 +0200
parents graal/Compiler/src/com/sun/c1x/doc/differences.txt@9ec15d6914ca
2508:fea94949e0a2 2509:16b9a8b5ad39
1 Differences between C1 and C1X, including upgrades and limitations
2 (and some general information about C1)
3 ======================================================================
4
5 StrictFP:
6 - C1X has removed the backend code to deal with the FPU stack, and therefore currently
7 requires SSE2. StrictFP is still tracked in the front end.
8 - C1 will not inline methods with different strictfp-ness. C1X does not have this
9 limitation because it only targets SSE2 x86 processors.
10
11 JSR/RET
12 - C1 will bail out if it encounters strange JSR/RET patterns
13 - recursive JSRs
14 - JSR regions that are shared with non-JSR code
15 - RET encountered out of JSR (would not verify)
16
17 Exceptions
18 - C1 will bail out if the code of an exception handler can be reached via normal
19 control flow.
20 => C1X might be extended to introduce a phi for the exception
21 object in this case.
22 - C1 will bail out if an exception handler covers itself
23
24 Verification
25 - C1 does not rely on bytecode verification having been run. However, if it detects
26 type errors while building the IR graph, it will usually bail out.
27 - C1 requires a bitmap of the bytecode, where a bit for
28 each byte of the bytecode indicates if the bytecode at that location starts a
29 basic block. It uses this to construct the basic block list in a single pass.
30 => Assertion failures and/or bugs in C1X that cause exceptions to be thrown make the
31 compilation bail out instead of crashing the VM.
32 => C1X's BlockMap does not compute the basic block starts in one pass over the bytecode
33 and one pass over the successor lists.
34 => C1X computes the "stores in loops" only when loops are encountered in the CFG.
35 An option can select a conservative mode (all locals stored in all loops), which trades
36 fewer optimization opportunities for faster parse speed.
37 => C1X includes an IRChecker, which can be run after each pass, that typechecks the
38 entire IR and checks for CFG consistency.
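
A minimal sketch of marking basic block starts in a single pass, as described for the
bitmap above. The toy Insn class and its fields are hypothetical and stand in for real
JVM bytecode; this is not the code of C1X's BlockMap.

```java
import java.util.BitSet;

public class BlockStarts {
    // Hypothetical toy instruction, not a real JVM bytecode.
    static final class Insn {
        final int target;        // index of the branch target, or -1 if not a branch
        final boolean endsBlock; // true for instructions such as return or throw

        Insn(int target, boolean endsBlock) {
            this.target = target;
            this.endsBlock = endsBlock;
        }
    }

    // One pass over the code: bit i is set if instruction i starts a basic block.
    static BitSet blockStarts(Insn[] code) {
        BitSet starts = new BitSet(code.length);
        if (code.length > 0) {
            starts.set(0); // the entry instruction always starts a block
        }
        for (int i = 0; i < code.length; i++) {
            Insn insn = code[i];
            if (insn.target >= 0) {
                starts.set(insn.target); // a branch target starts a block
            }
            boolean leavesBlock = insn.target >= 0 || insn.endsBlock;
            if (leavesBlock && i + 1 < code.length) {
                starts.set(i + 1); // so does the instruction after a branch or return
            }
        }
        return starts;
    }
}
```

With such a bitmap available, the block list can be built without a second scan to find
block boundaries.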
39
40 Constants
41 => C1X allows unrestricted use of object constants throughout the code, including
42 folding reads of static final fields that reference objects.
43
44 Pinning
45 => C1X pins fewer instructions than C1
46 ** C1X will eventually allow certain kinds of instructions to float outside the CFG
47 and be scheduled with a C2-lite scheduling pass.
48
49 Synchronization
50 - C1 will refuse to compile methods with unbalanced synchronization. This property is
51 computed by the bytecode verifier and supplied to C1.
52 ** C1X will not rely on the bytecode verifier to compute this but should do so itself.
53 => C1 relied on the backend to generate synchronization code for the root method's
54 synchronization operations. C1X inserts code into the start block and generates
55 an exception handler to do this explicitly.
56
57 Optimizations
58 => C1X has many more options to turn on individual passes, parts of passes, approximations,
59 etc. It is designed to have three optimization levels:
60 0 = super-fast: essentially no optimization
61 1 = fast: inlining, constant folding, and local optimizations
62 2 = optimized: inlining, constant folding, local and global optimizations, including
63 iterative versions of all algorithms
64 ** Planned optimizations for C1X that C1 does not have:
65 TypeCheckElimination: remove redundant casts and devirtualize more call sites
66 ArrayBoundsCheckElimination: remove redundant array bounds checks and/or restructure
67 code to deoptimize when bounds checks within loops will fail
68 LoopPeeling: replicate the first iteration of a loop
69 LoopUnrolling: replicate the body of certain shapes of loops
70 LoopInvariantCodeMotion: move invariant code out of a loop
71 ProfileGuidedInlining: use receiver method profiles to emit guarded inlines
72 ProfileGuidedBlockLayout: use profiling information for code placement
73 Peephole: peephole optimize backend output
74
75 Block Merging
76 ** C1X will replace branches to blocks with a single Goto with a branch to the
77 block's successor, if the blocks cannot be merged otherwise.
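
The planned branch retargeting can be sketched as follows. The Block class, its
onlyGoto flag, and the method name are hypothetical; this only illustrates the idea of
skipping a block that consists of a single Goto, not C1X's implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class GotoThreading {
    // Hypothetical CFG node.
    static final class Block {
        final String name;
        final boolean onlyGoto; // true if the block body is a single Goto
        final List<Block> successors = new ArrayList<>();

        Block(String name, boolean onlyGoto) {
            this.name = name;
            this.onlyGoto = onlyGoto;
        }
    }

    // Retarget each branch of b that leads to a goto-only block so that it
    // leads to that block's successor instead.
    static void retargetBranches(Block b) {
        for (int i = 0; i < b.successors.size(); i++) {
            Block target = b.successors.get(i);
            if (target.onlyGoto && target.successors.size() == 1) {
                Block next = target.successors.get(0);
                if (next != target) { // leave a self-looping goto block alone
                    b.successors.set(i, next);
                }
            }
        }
    }
}
```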
78
79 Constant Folding / Strength reduction
80 - C1 had some of its strength reduction logic built into the GraphBuilder because
81 the Canonicalizer could not return multiple instructions.
82 => C1X added this ability, moved the logic to the Canonicalizer, and added a few new
83 strength reductions.
84 => C1X should have an interface for doing folding of @FOLD method calls
85 => C1X folds many intrinsic operations that don't have side effects
86 => C1X folds all the basic floating point operations
87 => C1X strength reduces (e >> C >> K) to (e >> (C + K)) when C and K are constant
88 => Multiplies of power-of-2 constants are reduced to shifts in the canonicalizer
89 (instead of the backend)
90 ** C1X will be able to run a global sparse conditional constant propagation phase
91 to catch any missed canonicalization opportunities after graph building.
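
A sketch of the two strength reductions named above, for 32-bit int operands. The class
and method names are hypothetical. One caveat that the text does not discuss: Java masks
int shift amounts to 5 bits, so the combined amount must be capped at 31 rather than
computed as a plain C + K; how C1X handles this case is not stated here, so the capping
is an assumption of this sketch.

```java
public class ShiftFolding {
    // Shift amount s such that (e >> c >> k) == (e >> s) for every int e.
    static int combinedShift(int c, int k) {
        int cc = c & 31; // the JVM only uses the low 5 bits of an int shift amount
        int kk = k & 31;
        // An arithmetic shift by 31 or more leaves only copies of the sign bit,
        // so the combined amount saturates at 31.
        return Math.min(cc + kk, 31);
    }

    static boolean isPowerOfTwo(int c) {
        return c > 0 && (c & (c - 1)) == 0;
    }

    // Replaces e * c by a left shift; only valid when isPowerOfTwo(c) holds.
    static int mulAsShift(int e, int c) {
        return e << Integer.numberOfTrailingZeros(c);
    }
}
```

For example, a naive fold of (e >> 20 >> 20) into (e >> 40) would execute as (e >> 8) on
the JVM, which is why the sketch caps the amount.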
92
93 Switches
94 - C1 did not detect back edges in tableswitch/lookupswitch default branches
95 => C1X does detect these back edges
96 => C1X moved the canonicalization code of 1 and 2 branch switches to the Canonicalizer,
97 where it belongs
98
99 Inlining
100 - C1 cannot inline:
101 - native methods (or their stubs), except some intrinsics
102 - methods whose class has not been initialized
103 - methods with unbalanced monitors
104 - methods with JSRs (this is probably technically possible now)
105
106 - C1 will not inline:
107 - methods with exception handlers (optional)
108 - synchronized methods (optional)
109 - if the maximum inline depth is reached (default = 9)
110 - if the maximum recursive inline depth is reached (default = 1)
111 - if the callee is larger than the maximum inline size (reduced to 90% at each level, starting at 35)
112 - constructors for subclasses of Throwable
113 - if the strictfp-ness of the callee is different than the caller (on x87)
114 - abstract methods
115 - synchronized intrinsics
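
The size limit in the list above (starting at 35, reduced to 90% at each level) can be
worked out per inlining depth. This is only an arithmetic illustration: the class and
method names are hypothetical, and truncating to an integer at every level is an
assumption, not something the text specifies.

```java
public class InlineSizeLimit {
    static final int BASE_SIZE = 35;   // starting limit from the list above
    static final int RATIO_PERCENT = 90; // reduction applied at each level

    // Maximum callee size allowed at the given inlining depth (0 = top level).
    static int maxInlineSize(int depth) {
        int size = BASE_SIZE;
        for (int i = 0; i < depth; i++) {
            size = size * RATIO_PERCENT / 100; // integer truncation (assumed)
        }
        return size;
    }
}
```

Under these assumptions the limit shrinks from 35 to 31, 27, and 24 over the first three
nested levels.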
116
117 Load/store elimination
118 => C1X may eliminate loads of static fields, which C1 did not
119 => C1X distinguishes loads/stores to different fields in MemoryBuffer
120 => C1X assumes that RiField instances are unique when .isLoaded() is true
121
122 Local/Global Value Numbering
123 => C1X improved local load elimination and no longer value numbers fields, reducing the
124 logic necessary in ValueMap, simplifying it and improving its performance.
125 => C1X reuses the same simplified ValueMap for GVN. Since heap accesses are no longer
126 value numbered, the logic to kill values is unnecessary, greatly simplifying
127 GVN.
128 ** A global version of load elimination will compensate for this loss in the future.
129 => C1X value numbers are always or'd with a high order bit when value numbering is possible
130 to prevent value numbering failing if the value number is accidentally 0.
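
A small sketch of the high-order-bit trick described above, assuming 32-bit value
numbers where 0 means "cannot be value numbered". The constant and method names are
hypothetical.

```java
public class ValueNumbers {
    // Reserved result for instructions that cannot be value numbered.
    static final int NOT_NUMBERABLE = 0;

    // High-order bit that is OR'd into every real value number.
    static final int HIGH_BIT = 0x80000000;

    // Turns an arbitrary hash into a value number that can never be 0,
    // even if the hash itself happens to be 0.
    static int valueNumber(int hash) {
        return hash | HIGH_BIT;
    }
}
```

The cost is one bit of hash space; the benefit is that a hash of 0 is never mistaken for
"not numberable".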
131
132 Nullcheck elimination
133 => A new flag, NonNull, indicates instructions that produce values that are guaranteed
134 to be non-null (e.g. NewXXX and Local 0, NullCheck). Instructions that require null
135 checks check this flag for their inputs in their constructors, eliminating most
136 redundant null checks immediately, without requiring the NullCheckEliminator to run.
137 => C1X uses a more efficient block ordering for null check elimination. The first pass is
138 optimistic and attempts to visit the blocks in reverse post-order. For acyclic graphs,
139 this almost always succeeds, requiring no iteration. Full iterative data flow analysis
140 can be enabled separately. Bitmaps used during the fixpoint calculation are much
141 smaller due to local numbering of instructions (as opposed to global IDs).
142 ** C1X will recognize If's that check against null and propagate the non-nullness across
143 the appropriate branches.
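
The NonNull flag check in instruction constructors can be sketched like this. The class
names below are simplified, hypothetical stand-ins for IR instructions, not the actual
C1X classes.

```java
public class NonNullFlag {
    // Hypothetical base class for IR values.
    static class Value {
        final boolean nonNull; // true if the produced value can never be null

        Value(boolean nonNull) {
            this.nonNull = nonNull;
        }
    }

    // An allocation always produces a non-null reference.
    static final class NewInstance extends Value {
        NewInstance() {
            super(true);
        }
    }

    // A field load needs a null check on its object input unless that input is
    // already known to be non-null; this is decided once, in the constructor.
    static final class LoadField extends Value {
        final Value object;
        final boolean needsNullCheck;

        LoadField(Value object) {
            super(false); // the loaded value itself may be null
            this.object = object;
            this.needsNullCheck = !object.nonNull;
        }
    }
}
```

Because the decision is made when the instruction is created, no separate pass is needed
for these simple cases.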
144
145 BlockListBuilder
146 - C1 had a vestigial loop map in BlockListBuilder which was not really used.
147 => C1X does not need to compute a complete loop map in order to do selective phi creation;
148 it builds the "storesInLoops" BitMap in BlockMap.
149
150 Types
151 => C1X adds the declared type of method parameters to Local instructions, which
152 may help with devirtualization
153 => C1X makes local 0 of instance methods non-null at the start
154