comparison graal/Compiler/src/com/sun/c1x/doc/differences.txt @ 2507:9ec15d6914ca
Pull over of compiler from maxine repository.
author | Thomas Wuerthinger <thomas@wuerthinger.net> |
---|---|
date | Wed, 27 Apr 2011 11:43:22 +0200 |
2506:4a3bf8a5bf41 | 2507:9ec15d6914ca |
---|---|
1 Differences between C1 and C1X, including upgrades and limitations | |
2 (and some general information about C1) | |
3 ====================================================================== | |
4 | |
5 StrictFP: | |
6 - C1X has removed the backend code to deal with the FPU stack, and therefore | |
7 requires SSE2 currently. StrictFP is still tracked in the front end. | |
8 - C1 will not inline methods with different strictfp-ness. C1X does not have this | |
9 limitation because it only targets SSE2 x86 processors. | |
10 | |
11 JSR/RET | |
12 - C1 will bail out if it encounters strange JSR/RET patterns | |
13 - recursive JSRs | |
14 - JSR regions that are shared with non-JSR code | |
15 - RET encountered out of JSR (would not verify) | |
16 | |
17 Exceptions | |
18 - C1 will bail out if the code of an exception handler can be reached via normal | |
19 control flow. | |
20 => C1X might be extended to introduce a phi for the exception | |
21 object in this case. | |
22 - C1 will bail out if an exception handler covers itself | |
23 | |
24 Verification | |
25 - C1 does not rely on bytecode verification having been run. However, if it detects | |
26 type errors while building the IR graph, it will usually bail out. | |
27 - C1 requires a bitmap of the bytecode, where a bit for | |
28 each byte of the bytecode indicates if the bytecode at that location starts a | |
29 basic block. It uses this to construct the basic block list in a single pass. | |
30 => Assertion failures and/or bugs in C1X that cause exceptions to be thrown bail out | |
31 the compilation instead of crashing the VM. | |
32 => C1X's BlockMap computes the basic block starts itself, in one pass over the bytecode | |
33 and one pass over the successor lists. | |
34 => C1X computes the "stores in loops" only when loops are encountered in the CFG. | |
35 An option selects a conservative mode (all locals stored in all loops), which trades | |
36 optimization opportunities for faster parsing. | |
37 => C1X includes an IRChecker that typechecks the entire IR and checks for CFG | |
38 consistency that can be run after each pass. | |
39 | |
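The block-start bitmap described in this section can be sketched as follows. This is only an illustration under stated assumptions: the instruction encoding (Insn, Kind) is a toy one invented here, not real JVM bytecode, and the code is neither C1's implementation nor C1X's BlockMap. It marks offset 0, every branch target, and every instruction that follows a branch or return.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

final class BlockStartSketch {
    enum Kind { PLAIN, GOTO, IF, RETURN }

    /** Toy instruction (hypothetical encoding, not real JVM bytecode). */
    static final class Insn {
        final int bci;     // byte offset of the instruction
        final int length;  // size of the instruction in bytes
        final Kind kind;
        final int target;  // branch target offset, or -1 if there is none

        Insn(int bci, int length, Kind kind, int target) {
            this.bci = bci;
            this.length = length;
            this.kind = kind;
            this.target = target;
        }
    }

    /** Returns a bitmap with one bit per byte offset, set where a basic block starts. */
    static BitSet blockStarts(List<Insn> code) {
        BitSet starts = new BitSet();
        if (code.isEmpty()) {
            return starts;
        }
        Insn last = code.get(code.size() - 1);
        int end = last.bci + last.length;
        starts.set(code.get(0).bci);           // the entry is always a block start
        for (Insn insn : code) {
            if (insn.kind == Kind.PLAIN) {
                continue;                      // plain instructions never end a block
            }
            if (insn.kind != Kind.RETURN) {
                starts.set(insn.target);       // a branch target starts a block
            }
            int next = insn.bci + insn.length;
            if (next < end) {
                starts.set(next);              // so does whatever follows a branch or return
            }
        }
        return starts;
    }

    /** A small loop: if at offset 1 jumps to 8, goto at offset 5 jumps back to 0. */
    static List<Insn> example() {
        return Arrays.asList(
            new Insn(0, 1, Kind.PLAIN, -1),
            new Insn(1, 3, Kind.IF, 8),
            new Insn(4, 1, Kind.PLAIN, -1),
            new Insn(5, 3, Kind.GOTO, 0),
            new Insn(8, 1, Kind.RETURN, -1));
    }

    public static void main(String[] args) {
        System.out.println(blockStarts(example())); // prints {0, 4, 8}
    }
}
```

With such a bitmap available, a builder can split the bytecode into basic blocks in a single forward pass by starting a new block at every set bit.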
40 Constants | |
41 => C1X allows unrestricted use of object constants throughout the code, including | |
42 folding reads of static final fields that reference objects. | |
43 | |
44 Pinning | |
45 => C1X pins fewer instructions than C1 | |
46 ** C1X will eventually allow certain kinds of instructions to float outside the CFG | |
47 and be scheduled with a C2-lite scheduling pass. | |
48 | |
49 Synchronization | |
50 - C1 will refuse to compile methods with unbalanced synchronization. This property is | |
51 computed by the bytecode verifier and supplied to C1. | |
52 ** C1X will not rely on the bytecode verifier to compute this but should do so itself. | |
53 => C1 relied on the backend to generate synchronization code for the root method's | |
54 synchronization operations. C1X inserts code into the start block and generates | |
55 an exception handler to do this explicitly. | |
56 | |
57 Optimizations | |
58 => C1X has many more options to turn on individual passes, parts of passes, approximations, | |
59 etc. It is designed to have three optimization levels: | |
60 0 = super-fast: essentially no optimization | |
61 1 = fast: inlining, constant folding, and local optimizations | |
62 2 = optimized: inlining, constant folding, local and global optimizations, including | |
63 iterative versions of all algorithms | |
64 ** Planned optimizations for C1X that C1 does not have: | |
65 TypeCheckElimination: remove redundant casts and devirtualize more call sites | |
66 ArrayBoundsCheckElimination: remove redundant array bounds checks and/or restructure | |
67 code to deoptimize when bounds checks within loops will fail | |
68 LoopPeeling: replicate the first iteration of a loop | |
69 LoopUnrolling: replicate the body of certain shapes of loops | |
70 LoopInvariantCodeMotion: move invariant code out of a loop | |
71 ProfileGuidedInlining: use receiver method profiles to emit guarded inlines | |
72 ProfileGuidedBlockLayout: use profiling information for code placement | |
73 Peephole: peephole optimize backend output | |
74 | |
75 Block Merging | |
76 ** C1X will replace branches to blocks with a single Goto with a branch to the | |
77 block's successor, if the blocks cannot be merged otherwise. | |
78 | |
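The planned branch retargeting can be sketched as below. The Block class and its gotoOnlySuccessor field are hypothetical names chosen for this sketch, not C1X's IR classes. The idea is that a branch aimed at a block containing nothing but a Goto can be pointed at that Goto's destination instead.

```java
import java.util.HashSet;
import java.util.Set;

final class GotoThreadingSketch {
    /** Hypothetical block: gotoOnlySuccessor is non-null iff the block is a single Goto. */
    static final class Block {
        final String name;
        Block gotoOnlySuccessor;

        Block(String name) {
            this.name = name;
        }
    }

    /** Follows chains of Goto-only blocks and returns the block a branch should target. */
    static Block finalTarget(Block target) {
        Set<Block> seen = new HashSet<>();
        // The 'seen' set stops the walk on a cycle of empty blocks (an empty infinite loop).
        while (target.gotoOnlySuccessor != null && seen.add(target)) {
            target = target.gotoOnlySuccessor;
        }
        return target;
    }

    public static void main(String[] args) {
        Block body = new Block("body");
        Block hop2 = new Block("hop2");
        Block hop1 = new Block("hop1");
        hop1.gotoOnlySuccessor = hop2;
        hop2.gotoOnlySuccessor = body;
        System.out.println(finalTarget(hop1).name); // prints body
    }
}
```

The cycle guard is a choice made for this sketch; the text does not say how C1X would treat a loop made only of Goto blocks.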
79 Constant Folding / Strength reduction | |
80 - C1 had some of its strength reduction logic built into the GraphBuilder because | |
81 the Canonicalizer could not return multiple instructions. | |
82 => C1X added this ability, moved the logic to Canonicalizer, and added a few new | |
83 strength reductions. | |
84 => C1X should have an interface for doing folding of @FOLD method calls | |
85 => C1X folds many intrinsic operations that don't have side effects | |
86 => C1X folds all the basic floating point operations | |
87 => C1X strength reduces (e >> C >> K) to (e >> (C + K)) when C and K are constant | |
88 => Multiplies of power-of-2 constants are reduced to shifts in the canonicalizer | |
89 (instead of the backend) | |
90 ** C1X will be able to run a global sparse conditional constant propagation phase | |
91 to catch any missed canonicalization opportunities after graph building. | |
92 | |
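Two of the strength reductions listed in this section can be checked against plain Java arithmetic. The sketch below is not the Canonicalizer's code and all method names are hypothetical. It also makes one assumption the text does not state: for 32-bit ints, Java masks shift counts to 5 bits, so the folded count C + K is clamped to 31 here. Whether C1X clamps or simply skips the fold when C + K exceeds 31 is not specified above.

```java
final class ShiftFoldSketch {
    /** Reference semantics: two arithmetic right shifts in sequence. */
    static int shiftTwice(int e, int c, int k) {
        return e >> c >> k;
    }

    /**
     * Folded form of (e >> c >> k). Assumes 0 <= c <= 31 and 0 <= k <= 31.
     * The clamp is an assumption of this sketch: once c + k reaches 32, the
     * double shift leaves only the sign, which is what e >> 31 computes.
     */
    static int shiftFolded(int e, int c, int k) {
        return e >> Math.min(c + k, 31);
    }

    /** Multiply by a positive power-of-2 constant, rewritten as a left shift. */
    static int mulPow2(int e, int powerOfTwo) {
        if (powerOfTwo <= 0 || Integer.bitCount(powerOfTwo) != 1) {
            throw new IllegalArgumentException("not a positive power of 2: " + powerOfTwo);
        }
        return e << Integer.numberOfTrailingZeros(powerOfTwo);
    }

    public static void main(String[] args) {
        System.out.println(shiftFolded(-1024, 20, 20)); // prints -1
        System.out.println(mulPow2(7, 8));              // prints 56
    }
}
```

Without the clamp, e >> (20 + 20) would be evaluated by Java as e >> 8, which differs from shifting twice.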
93 Switches | |
94 - C1 did not detect back edges in tableswitch/lookupswitch default branches | |
95 => C1X does detect these back edges | |
96 => C1X moved the canonicalization code of 1- and 2-branch switches to the Canonicalizer, | |
97 where it belongs | |
98 | |
99 Inlining | |
100 - C1 cannot inline: | |
101 - native methods (or their stubs), except some intrinsics | |
102 - methods whose class has not been initialized | |
103 - methods with unbalanced monitors | |
104 - methods with JSRs (this is probably technically possible now) | |
105 | |
106 - C1 will not inline: | |
107 - methods with exception handlers (optional) | |
108 - synchronized methods (optional) | |
109 - if the maximum inline depth is reached (default = 9) | |
110 - if the maximum recursive inline depth is reached (default = 1) | |
111 - if the callee is larger than the maximum inline size (reduced to 90% at each level, starting at 35) | |
112 - constructors for subclasses of Throwable | |
113 - if the strictfp-ness of the callee differs from that of the caller (on x87) | |
114 - abstract methods | |
115 - synchronized intrinsics | |
116 | |
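The shrinking size limit in the list above (35 at the top level, reduced to 90% at each nesting level) works out as in the sketch below. The constant and method names are hypothetical, and truncating to an integer at each level is an assumption of this sketch; the text does not say how C1 rounds.

```java
final class InlineSizeSketch {
    static final int INITIAL_MAX_INLINE_SIZE = 35; // starting limit, from the text
    static final int NESTED_SIZE_PERCENT = 90;     // reduction per level, from the text

    /** Size limit for a callee at the given inline depth (0 = inlined into the root method). */
    static int maxInlineSize(int depth) {
        int limit = INITIAL_MAX_INLINE_SIZE;
        for (int i = 0; i < depth; i++) {
            // Assumption: integer truncation at every level.
            limit = limit * NESTED_SIZE_PERCENT / 100;
        }
        return limit;
    }

    /** True if a callee of the given size passes the size check alone at this depth. */
    static boolean sizeAllowsInlining(int calleeSize, int depth) {
        return calleeSize <= maxInlineSize(depth);
    }

    public static void main(String[] args) {
        for (int depth = 0; depth <= 9; depth++) {
            System.out.println("depth " + depth + ": limit " + maxInlineSize(depth));
        }
    }
}
```

Under these assumptions the limit falls from 35 to 31, 27, 24 and so on, reaching 10 at depth 9, so deeply nested callees must be progressively smaller to qualify.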
117 Load/store elimination | |
118 => C1X may eliminate loads of static fields, which C1 did not | |
119 => C1X distinguishes loads/stores to different fields in MemoryBuffer | |
120 => C1X assumes that RiField instances are unique when .isLoaded() is true | |
121 | |
122 Local/Global Value Numbering | |
123 => C1X improved local load elimination and no longer value numbers fields, reducing the | |
124 logic necessary in ValueMap, simplifying it and improving its performance. | |
125 => C1X reuses the same simplified ValueMap for GVN. Since heap accesses are no longer | |
126 value numbered, the logic to kill values is unnecessary, greatly simplifying | |
127 GVN. | |
128 ** A global version of load elimination will compensate for this loss in the future. | |
129 => C1X value numbers are always or'd with a high order bit when value numbering is possible, | |
130 to prevent value numbering from failing when the computed value number happens to be 0. | |
131 | |
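The high-order-bit trick can be sketched as follows, assuming (as the text implies) that a value number of 0 means "cannot be value numbered". The hash function and all names here are hypothetical and are not C1X's ValueMap code.

```java
final class ValueNumberSketch {
    static final int NOT_NUMBERABLE = 0;  // sentinel: the instruction cannot be value numbered
    static final int HIGH_BIT = 0x80000000;

    /**
     * Hypothetical value number for a binary operation. Without the OR, a hash
     * that happens to be 0 would be mistaken for the sentinel.
     */
    static int valueNumber(int opcode, int leftId, int rightId) {
        int hash = (opcode * 31 + leftId) * 31 + rightId;
        return hash | HIGH_BIT;
    }

    static boolean canBeNumbered(int valueNumber) {
        return valueNumber != NOT_NUMBERABLE;
    }

    public static void main(String[] args) {
        int vn = valueNumber(0, 0, 0);          // the raw hash is 0 here
        System.out.println(canBeNumbered(vn));  // prints true
    }
}
```

The cost is one bit of hash range, which only affects how values spread across buckets; equal instructions still receive equal numbers.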
132 Nullcheck elimination | |
133 => A new flag, NonNull, indicates instructions that produce values that are guaranteed | |
134 to be non-null (e.g. NewXXX, Local 0, and NullCheck). Instructions that require null | |
135 checks check this flag for their inputs in their constructors, eliminating most | |
136 redundant null checks immediately, without requiring the NullCheckEliminator to run. | |
137 => C1X uses a more efficient block ordering for null check elimination. The first pass is | |
138 optimistic and attempts to visit the blocks in reverse post-order. For acyclic graphs, | |
139 this almost always succeeds, requiring no iteration. Full iterative data flow analysis | |
140 can be enabled separately. Bitmaps used during the fixpoint calculation are much | |
141 smaller due to local numbering of instructions (as opposed to global IDs). | |
142 ** C1X will recognize If's that check against null and propagate the non-nullness across | |
143 the appropriate branches. | |
144 | |
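The NonNull flag mechanism can be sketched with a toy IR. The classes below are hypothetical stand-ins named after the instruction kinds mentioned in the text; they are not C1X's IR classes. The point is that the consumer inspects its input's flag in its own constructor, so no separate elimination pass is needed for the easy cases.

```java
final class NonNullFlagSketch {
    /** Toy IR value carrying the NonNull flag. */
    static class Value {
        final boolean nonNull;

        Value(boolean nonNull) {
            this.nonNull = nonNull;
        }
    }

    /** Allocation: the result is never null. */
    static final class NewInstance extends Value {
        NewInstance() {
            super(true);
        }
    }

    /** Local variable: only the receiver (local 0 of an instance method) is known non-null. */
    static final class Local extends Value {
        final int index;

        Local(int index, boolean isReceiver) {
            super(isReceiver);
            this.index = index;
        }
    }

    /** Field load: decides at construction time whether it needs a null check. */
    static final class LoadField extends Value {
        final Value object;
        final boolean needsNullCheck;

        LoadField(Value object) {
            super(false);                         // a loaded reference may be null
            this.object = object;
            this.needsNullCheck = !object.nonNull;
        }
    }

    public static void main(String[] args) {
        System.out.println(new LoadField(new NewInstance()).needsNullCheck);    // prints false
        System.out.println(new LoadField(new Local(1, false)).needsNullCheck);  // prints true
    }
}
```

Cases that depend on control flow, such as a value checked against null in an If, still need the data flow analysis described above.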
145 BlockListBuilder | |
146 - C1 had a vestigial loop map in BlockListBuilder which was not really used. | |
147 => C1X does not need to compute a complete loop map in order to do selective phi creation; | |
148 instead, it builds the "storesInLoops" BitMap in BlockMap. | |
149 | |
150 Types | |
151 => C1X adds the declared type of method parameters to Local instructions, which | |
152 may help with devirtualization | |
153 => C1X makes local 0 of instance methods non-null at the start | |
154 |