comparison graal/com.oracle.max.graal.doc.initial/graal_compiler.tex @ 2892:5005a5607506
Moved design doc to graal project directory.
author | Thomas Wuerthinger <thomas@wuerthinger.net> |
---|---|
date | Wed, 08 Jun 2011 14:06:17 +0200 |
parents | doc/design/graal_compiler.tex@3396862d4cee |
children | 5d4aa5672d3d |
2891:75a99b4f1c98 | 2892:5005a5607506 |
---|---|
1 \documentclass[twocolumn]{svjour3} | |
2 \usepackage{listings} | |
3 \usepackage[pdftex]{graphicx} | |
4 \usepackage{environ} | |
5 \usepackage{amsmath} | |
6 \usepackage{amsfonts} | |
7 \usepackage[english]{babel} | |
8 \usepackage[utf8]{inputenc} | |
9 \usepackage{lmodern} | |
10 \usepackage[T1]{fontenc} | |
11 \usepackage{color} | |
12 | |
13 \input{graphdrawing} | |
14 | |
15 \renewcommand*\descriptionlabel[1]{\hspace\labelsep\normalfont\bf #1} | |
16 | |
17 \newcommand{\Sa}{{\Large$^*$}} | |
18 \newcommand{\Sb}{{\Large$^\dag$}} | |
19 \newcommand{\Sc}{{\Large$^\S$}} | |
20 | |
21 | |
22 \newcommand{\mynote}[2]{ | |
23 \textcolor{red}{\fbox{\bfseries\sffamily\scriptsize#1} | |
24 {\small\textsf{\emph{#2}}} | |
25 \fbox{\bfseries\sffamily\scriptsize }}} | |
26 | |
27 \newcommand\TODO[1]{\mynote{TODO}{#1}} | |
28 \newcommand\cw[1]{\mynote{CW}{#1}} | |
29 \newcommand\ls[1]{\mynote{LS}{#1}} | |
30 \newcommand\nodename[1]{\texttt{#1}} | |
31 | |
32 | |
33 | |
34 \smartqed % flush right qed marks, e.g. at end of proof | |
35 | |
36 \journalname{Graal Compiler Design} | |
37 \def\makeheadbox{{% | |
38 \hbox to0pt{\vbox{\baselineskip=10dd\hrule\hbox | |
39 to\hsize{\vrule\kern3pt\vbox{\kern3pt | |
40 \hbox{\bfseries The Graal Compiler - Design and Strategy} | |
41 \kern3pt}\hfil\kern3pt\vrule}\hrule}% | |
42 \hss}}} | |
43 | |
44 \begin{document} | |
45 | |
46 \author{Thomas W\"{u}rthinger \Sa, Lukas Stadler \Sc, Gilles Duboscq \Sa} | |
47 \institute{\Sa Oracle, \Sc Johannes Kepler University, Linz} | |
48 | |
49 \date{Created: \today} | |
50 | |
51 \title{The Graal Compiler} | |
52 \subtitle{Design and Strategy \\ \textcolor{red}{work in progress (Oracle internal)}} | |
53 | |
54 \maketitle | |
55 | |
56 \abstract{ | |
57 The Graal compiler (simply referred to as \emph{the compiler} in the rest of this document) aims at improving upon C1X, the Java port of the HotSpot client compiler, both in terms of modularity and peak performance. | |
58 The compiler should work with the Maxine VM and the HotSpot VM. | |
59 This document contains information about the proposed design and strategy for developing the compiler.} | |
60 | |
61 \section{Context} | |
62 | |
63 In 2009, the Maxine team started with creating C1X, a Java port of the HotSpot client compiler, and integrated it into the Maxine VM. | |
64 Part of this effort was the development of a clear and clean compiler-runtime interface that allows the separation of the compiler and the VM. | |
65 This compiler-runtime interface enables the use of one compiler for multiple VMs. | |
66 In June 2010, we started integrating C1X into the HotSpot VM and we called the resulting system Graal~VM. | |
67 Currently, the Graal~VM is fully functional and runs benchmarks (SciMark, DaCapo) at a similar speed as the HotSpot client compiler. | |
68 | |
69 \section{Goals} | |
70 The compiler effort aims at rewriting the high-level intermediate representation of C1X with two main goals: | |
71 \begin{description} | |
72 \item[Modularity:] A modular design of the compiler should simplify the implementation of new languages, new back-ends, and new optimizations. | |
73 \item[Peak Performance:] A more powerful intermediate representation should enable the implementation of aggressive optimizations that impact the peak performance of the resulting machine code. | |
74 \end{description} | |
75 | |
76 \section{Design} | |
77 For the implementation of the compiler, we rely on the following design decisions: | |
78 \begin{description} | |
79 \item[Graph Representation:] | |
80 The compiler's intermediate representation is modeled as a graph with nodes that are connected with directed edges. | |
81 There is only a single node base class and every node has an associated graph object that does not change during the node's lifetime. | |
82 Every node is serializable and has an id that is unique within its graph. | |
83 Every edge is classified as either a control flow edge (anti-dependency) or a data flow edge (dependency) and represented as a simple pointer from the source node to the target node. | |
84 It is possible to replace a node with another node without traversing the full graph. | |
85 The graph does not allow data flow edge cycles or control flow edge cycles. | |
86 We achieve this by explicitly modeling loops (see Section~\ref{sec:loops}). | |
87 \item[Extensibility:] | |
88 The compiler is extensible by allowing developers to add new compiler phases and new node subclasses without modifying the compiler's sources. | |
89 A node has an abstract way of expressing its semantics and new compiler phases can ask compiler nodes for their properties and capabilities. | |
90 We use the ``everything is an extension'' concept. | |
91 Even standard compiler optimizations are internally modeled as extensions, to show that the extension mechanism exposes all necessary functionality. | |
92 \item[Detailing:] | |
93 The compilation starts with a graph that contains nodes that represent the operations of the source language (e.g., one node for an array store to an object array). | |
94 During the compilation, the nodes are replaced with more detailed nodes (e.g., the array store node is split into a null check, a bounds check, a store check, and a memory access). | |
95 Compiler phases can choose whether they want to work on the earlier versions of the graph (e.g., escape analysis) or on later versions (e.g., null check elimination). | |
96 \item[Generality:] | |
97 The compiler does not require Java as its input. | |
98 This is achieved by having a graph as the starting point of the compilation and not a Java bytecode array. | |
99 Building the graph from the Java bytecodes must happen before giving a method to the compiler. | |
100 This enables front-ends for different languages (e.g., Ruby or JavaScript) to provide their own graph. | |
101 Also, there is no dependency on a specific back-end, but the output of the compiler is a graph that can then be converted to a different representation in a final compiler phase. | |
102 \end{description} | |
103 | |
104 \section{Milestones} | |
105 \label{sec:mile} | |
106 The compiler is being developed starting from the current C1X source code base. | |
107 This helps us test the compiler at every intermediate development step on a variety of Java benchmarks. | |
108 We define the following development milestones (see Section~\ref{sec:conclusions} for planned dates): | |
109 \begin{description} | |
110 \item[M1:] We have a fully working Graal~VM version with a stripped down C1X compiler that does not perform any optimization. | |
111 \item[M2:] We modified the high-level intermediate representation to be based on the compiler graph data structure. | |
112 \item[M3:] We have reimplemented and reenabled compiler optimizations in the compiler that previously existed in C1X. | |
113 \item[M4:] We have reintegrated the new compiler into the Maxine VM and can use it as a Maxine VM bootstrapping compiler. | |
114 \end{description} | |
115 | |
116 After those four milestones, we see three different possible further development directions that can be followed in parallel: | |
117 \begin{itemize} | |
118 \item Removal of the XIR template mechanism and replacement with a snippet mechanism that works with the compiler graph. | |
119 \item Improvements for peak performance (loop optimizations, escape analysis, bounds check elimination, processing additional interpreter runtime feedback). | |
120 \item Implementation of a prototype front-end for a different language, e.g., JavaScript. | |
121 \end{itemize} | |
122 | |
123 \section{Project Source Structure} | |
124 In order to support the goal of a modular compiler, the code will be divided into the following source code projects (as subprojects of \textbf{com.oracle.max.graal}). | |
125 | |
126 \begin{description} | |
127 \item[graph] contains the abstract node implementation, the graph implementation and all the associated tools and auxiliary classes. | |
128 \item[nodes] contains the implementation of known basic nodes (e.g., phi nodes, control flow nodes, \ldots). | |
129 Additional node classes should go into separate projects and be specializations of the known basic nodes. | |
130 \item[java] contains code for building graphs from Java bytecodes and Java-specific nodes. | |
131 \item[opt] contains optimizations such as global value numbering or conditional constant propagation. | |
132 \item[compiler] contains the compiler, including: | |
133 \begin{itemize} | |
134 \item Scheduling of the compilation phases. | |
135 \item Implementation of the \emph{compiler interface} (CI). | |
136 \item Implementation of the final compilation phase that produces the low-level representation. | |
137 \item Machine code creation, including debug info. | |
138 \end{itemize} | |
139 \end{description} | |
140 | |
141 | |
142 \section{Graph} | |
143 | |
144 The \emph{intermediate representation}~(IR) of the compiler is designed as a directed graph. | |
145 The graph allocates unique ids for new nodes and can be queried for the node corresponding to a given id as well as for an unordered list of nodes of the graph. | |
146 Graphs can manage side data structures (e.g., dominator trees and temporary schedules), which will be automatically invalidated and lazily recomputed whenever the graph changes. These side data structures will usually be used by more than one optimization. | |
147 | |
148 The nodes of the graph have the following properties: | |
149 \begin{itemize} | |
150 \item Each node is always associated with a single graph and this association is immutable. | |
151 \item Each node has an immutable id that is unique within its associated graph. | |
152 \item Nodes can have a data dependency, which means that one node requires the result of another node as its input. The fact that the result of the first node needs to be computed before the second node can be executed introduces a partial order to the set of nodes. | |
153 \item Nodes can have a control flow dependency, which means that the execution of one node will be followed by the execution of another node. This includes conditional execution, memory access serialization and other reasons, and again introduces a partial order to the set of nodes. | |
154 \item Nodes can only have data and control dependencies to nodes which belong to the same graph. | |
155 \item Control dependencies and data dependencies each represent a \emph{directed acyclic graph} (DAG) on the same set of nodes. This means that data dependencies always point upwards, and control dependencies always point downwards in a drawing of the graph. Situations that normally incur cycles (like loops) are represented by special nodes (see Section~\ref{sec:loops}). | |
156 \item Ordering between nodes is specified only to the extent which is required to correctly express the semantics of a given program. This gives the compiler flexibility for the possible scheduling of a node and therefore wiggle room for optimizations. For algorithms that require a fixed ordering of nodes, a temporary schedule can always be generated. | |
157 \item Both data and control dependencies can be traversed in both directions, so that each node can be traversed in four directions (see Figure~\ref{fig:directions}): | |
158 \begin{itemize} | |
159 \item \emph{inputs} are all nodes that this node has data dependencies on. | |
160 \item \emph{usages} are all nodes whose inputs contain this node. | |
161 \item \emph{successors} are all nodes that have to be after this node in control flow. | |
162 \item \emph{predecessors} are all nodes whose successors contain this node. | |
163 \end{itemize} | |
164 \item Only inputs and successors can be changed, and changes to them will update the usages and predecessors. | |
165 \item Every node must be able to support cloning and serialization. | |
166 \item The edges of a node also define \textit{happens-before} and \textit{happens-after} relationships as shown in Figure~\ref{fig:directions}. | |
167 \end{itemize} | |
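
The rule that only inputs and successors can be changed, while usages and predecessors follow automatically, can be illustrated with a minimal sketch. The class \texttt{SketchNode} and all of its method names are hypothetical and chosen for illustration only; they are not the compiler's actual node API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; SketchNode is not the compiler's node class.
final class SketchNode {
    private final int id; // immutable, unique within the owning graph
    private final List<SketchNode> inputs = new ArrayList<>();
    private final List<SketchNode> usages = new ArrayList<>();
    private final List<SketchNode> successors = new ArrayList<>();
    private final List<SketchNode> predecessors = new ArrayList<>();

    SketchNode(int id) {
        this.id = id;
    }

    int id() {
        return id;
    }

    // Adding a data dependency also records this node as a usage of the input.
    void addInput(SketchNode input) {
        inputs.add(input);
        input.usages.add(this);
    }

    // Removing a data dependency also removes the mirrored usage entry.
    void removeInput(SketchNode input) {
        if (inputs.remove(input)) {
            input.usages.remove(this);
        }
    }

    // Adding a control flow edge also records this node as a predecessor.
    void addSuccessor(SketchNode successor) {
        successors.add(successor);
        successor.predecessors.add(this);
    }

    // The derived directions are exposed read-only.
    List<SketchNode> inputs() {
        return Collections.unmodifiableList(inputs);
    }

    List<SketchNode> usages() {
        return Collections.unmodifiableList(usages);
    }

    List<SketchNode> successors() {
        return Collections.unmodifiableList(successors);
    }

    List<SketchNode> predecessors() {
        return Collections.unmodifiableList(predecessors);
    }
}
```

Because the reverse edges are only ever written by the methods that change the forward edges, the four directions cannot get out of sync in this sketch.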
168 | |
169 \begin{figure}[h] | |
170 \centering | |
171 \begin{digraphenv}{scale=0.5}{graphdirections} | |
172 \node{node1}{Node} | |
173 \textnode{inputs}{inputs} | |
174 \textnode{usages}{usages} | |
175 \textnode{successors}{successors} | |
176 \textnode{predecessors}{predecessors} | |
177 \data{node1}{inputs} | |
178 \control{node1}{successors} | |
179 \data{usages}{node1} | |
180 \control{predecessors}{node1} | |
181 \node{node2}{Node} | |
182 \textnode{before}{happens-before} | |
183 \textnode{after}{happens-after} | |
184 \data{node2}{before} | |
185 \control{node2}{after} | |
186 \data{after}{node2} | |
187 \control{before}{node2} | |
188 \end{digraphenv} | |
189 \caption{A node and its edges.} | |
190 \label{fig:directions} | |
191 \end{figure} | |
192 | |
193 \subsection{Inlining} | |
194 Inlining is always performed by embedding one graph into another graph. | |
195 Nodes cannot be reassigned to another graph; they are cloned instead. | |
196 Therefore, inlining is performed by copying and rewiring the nodes of the inlined method into the graph of the outer method. | |
197 While the copying will decrease compilation performance, it enables us to cache the graph for the inlined method, optimize it independently from the outer method, and use the optimized graph for subsequent inlinings. | |
198 We do not expect a significant negative impact on overall compilation performance. | |
199 | |
200 We are able to perform the inlining at any point during the compilation of a method and can therefore selectively expand the inlining if a certain optimization turns out to depend on the inlining of a method. | |
201 An example of this is when the escape analysis finds out that a certain object only escapes because of one method call, and this method call is not inlined because the penalty was too high. | |
202 In this case, we can choose to inline the method nevertheless in order to increase the chances of finding out that the object does not escape. | |
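
The copy-and-rewire step of inlining can be sketched as follows. The classes \texttt{InlineGraph} and \texttt{InlineNode} are hypothetical, only data flow edges are modeled, and the connection of the copied nodes to the call site is left out; this is an illustration under those assumptions, not the compiler's inlining code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; neither class is part of the compiler.
final class InlineNode {
    final InlineGraph graph; // immutable association with one graph
    final int id;            // unique within that graph
    final String name;
    final List<InlineNode> inputs = new ArrayList<>();

    InlineNode(InlineGraph graph, int id, String name) {
        this.graph = graph;
        this.id = id;
        this.name = name;
    }
}

final class InlineGraph {
    final List<InlineNode> nodes = new ArrayList<>();

    // Creates a node that belongs to this graph and gets the next free id.
    InlineNode add(String name) {
        InlineNode node = new InlineNode(this, nodes.size(), name);
        nodes.add(node);
        return node;
    }

    // Copies every node of the callee into this graph and rewires the copies.
    // The callee graph stays untouched, so it can be cached and reused.
    Map<InlineNode, InlineNode> inline(InlineGraph callee) {
        List<InlineNode> originals = new ArrayList<>(callee.nodes);
        Map<InlineNode, InlineNode> copies = new HashMap<>();
        for (InlineNode original : originals) {
            copies.put(original, add(original.name));
        }
        for (InlineNode original : originals) {
            for (InlineNode input : original.inputs) {
                copies.get(original).inputs.add(copies.get(input));
            }
        }
        return copies;
    }
}
```

The returned map from original to copy is what a caller would use to connect the inlined nodes to the nodes surrounding the call site.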
203 | |
204 \section{Control Flow} | |
205 | |
206 Control flow is managed in a way where the predecessor node contains direct pointers to its successor nodes. | |
207 We reserve the term \textit{instruction} for nodes that are embedded in the control flow. | |
208 This is the opposite of the approach taken in the server compiler, where control flow and data flow edges point in the same direction. | |
209 The advantage that we see in our approach is that there is no need for projection nodes in case of control flow splits. | |
210 An \texttt{If} instruction can directly point to its true and false successors without any intermediate nodes. | |
211 This makes the graph more compact and simplifies graph traversal. | |
212 | |
213 Listing~\ref{lst:cfg2} shows an example Java program with an if statement where neither path contains any instruction with side effects. | |
214 The \texttt{If} instruction can directly point its true and false successors to a \texttt{Merge} instruction. | |
215 A \texttt{Phi} node that selects the appropriate value is appended to the \texttt{Merge} instruction. | |
216 The \texttt{Return} instruction then has a data dependency on the \texttt{Phi} node. | |
217 | |
218 \begin{lstlisting}[label=lst:cfg2, caption=Control flow in the graph., captionpos=b] | |
219 if (condition) { return 0; } | |
220 else { return 1; } | |
221 \end{lstlisting} | |
222 | |
223 \begin{figure}[h] | |
224 \centering | |
225 \begin{digraphenv}{scale=0.5}{cfg2} | |
226 \textnode{entry}{Entry} | |
227 \textnode{condition}{condition} | |
228 \textnode{const0}{0} | |
229 \textnode{const1}{1} | |
230 \nodesplit{if}{If} | |
231 \control{entry}{if} | |
232 \controllabel{if:succ1}{merge} | |
233 \controllabel{if:succ2}{merge} | |
234 \data{if}{condition} | |
235 \node{merge}{Merge} | |
236 \node{return}{Return} | |
237 \nodetri{phi}{Phi} | |
238 \datalabel{phi:in1}{merge} | |
239 \datalabel{phi:in2}{const0} | |
240 \datalabel{phi:in3}{const1} | |
241 \data{return}{phi} | |
242 \control{merge}{return} | |
243 \end{digraphenv} | |
244 \caption{Control flow graph for the if statement in Listing~\ref{lst:cfg2}.} | |
245 \label{fig:cfg2} | |
246 \end{figure} | |
247 | |
248 \section{Exceptions} | |
249 \label{sec:Exceptions} | |
250 | |
251 We do not throw runtime exceptions (e.g., \texttt{IndexOutOf\-BoundsException}, \texttt{NullPointerException}, or \texttt{Out\-Of\-Memory\-Error}), but deoptimize instead. | |
252 This reduces the places in the compiled code where an exact bytecode location and debug information must be known. | |
253 Additionally, this greatly reduces the number of exception handler edges in the compiled code. | |
254 The main advantage of this technique, however, is that we are free to move bounds checks, memory allocations, memory accesses with implicit null checks, etc. | |
255 | |
256 There are only two kinds of instruction that need explicit exception edges, because they are the only instructions that can throw exceptions in compiled code: \texttt{Throw} instructions and \texttt{Invoke} instructions. | |
257 They are modeled as instructions with an additional control flow continuation that points to an \texttt{ExceptionDispatch} instruction. | |
258 The exception dispatch instruction decides based on the type of the exception object whether the control should flow to the catch handler or to another exception dispatch. | |
259 If there is no catch handler in the currently compiled method, then the control flows into the \texttt{Unwind} instruction that handles the exception by forwarding it to the caller. | |
260 Listing~\ref{lst:exc1} shows an example Java program with nested try blocks and Figure~\ref{fig:exc1} shows the corresponding compiler graph. | |
261 | |
262 \begin{lstlisting}[label=lst:exc1, caption=Exception dispatch in the compiler graph., captionpos=b] | |
263 try { m1(); | |
264 try { m2(); | |
265 } catch(ExtendedException e) { ... } | |
266 m3(); | |
267 throw exception; | |
268 } catch(Exception e) { ... } | |
269 \end{lstlisting} | |
270 | |
271 \begin{figure}[h] | |
272 \centering | |
273 \begin{digraphenv}{scale=0.5}{exc1} | |
274 \textnode{entry}{Entry} | |
275 \textnode{catch1}{catch1} | |
276 \textnode{catch2}{catch2} | |
277 \nodesplit{m1}{Invoke m1} | |
278 \nodesplit{m2}{Invoke m2} | |
279 \nodesplit{m3}{Invoke m3} | |
280 \nodesplit{dispatch1}{ExceptionDispatch} | |
281 \nodesplit{dispatch2}{ExceptionDispatch} | |
282 \node{throw}{Throw} | |
283 \node{unwind}{Unwind} | |
284 \control{entry}{m1} | |
285 \controllabel{m1:succ1}{m2} | |
286 \controllabel{m1:succ2}{dispatch2} | |
287 \controllabel{m2:succ1}{m3} | |
288 \controllabel{m2:succ2}{dispatch1} | |
289 \controllabel{m3:succ1}{throw} | |
290 \controllabel{m3:succ2}{dispatch2} | |
291 \control{throw}{dispatch2} | |
292 \controllabel{dispatch1:succ2}{catch1} | |
293 \controllabel{dispatch1:succ1}{dispatch2} | |
294 \controllabel{dispatch2:succ2}{catch2} | |
295 \controllabel{dispatch2:succ1}{unwind} | |
296 \end{digraphenv} | |
297 \caption{Exception dispatch for the nested try blocks in Listing~\ref{lst:exc1}.} | |
298 \label{fig:exc1} | |
299 \end{figure} | |
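
The chain of exception dispatches described above can be sketched as a linked list that is walked until a matching catch handler is found. The class \texttt{ExceptionDispatchSketch}, its fields, and the handler names given as strings are hypothetical; a missing next dispatch stands for the \texttt{Unwind} instruction.

```java
// Hypothetical sketch; not the compiler's ExceptionDispatch instruction.
final class ExceptionDispatchSketch {
    private final Class<? extends Throwable> caughtType;
    private final String handler;
    private final ExceptionDispatchSketch next; // null stands for Unwind

    ExceptionDispatchSketch(Class<? extends Throwable> caughtType,
                            String handler,
                            ExceptionDispatchSketch next) {
        this.caughtType = caughtType;
        this.handler = handler;
        this.next = next;
    }

    // Decides, based on the type of the exception object, where control flows.
    String route(Throwable exception) {
        for (ExceptionDispatchSketch d = this; d != null; d = d.next) {
            if (d.caughtType.isInstance(exception)) {
                return d.handler;
            }
        }
        return "Unwind"; // no handler in this method: forward to the caller
    }
}
```

With an inner dispatch for a subclass and an outer dispatch for \texttt{Exception}, an exception raised inside the inner try block is routed to the inner handler only if its type matches, and otherwise falls through to the outer dispatch.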
300 | |
301 \section{Loops} | |
302 \label{sec:loops} | |
303 Loops form a first-class construct in the IR that is expressed by specialized IR nodes during all optimization phases. | |
304 We only compile methods with a control flow where every loop has a single entry point. | |
305 This entry point is a \nodename{LoopBegin} instruction. | |
306 This instruction is connected to a \nodename{LoopEnd} instruction that merges all control flow paths that do not exit the loop. | |
307 The edge between the \nodename{LoopBegin} and the \nodename{LoopEnd} is the backedge of the loop. | |
308 It goes from the beginning to the end in order to make the graph acyclic. | |
309 An algorithm that traverses the control flow has to explicitly decide whether it wants to incorporate backedges (i.e., treat \nodename{LoopEnd} as a special case) or ignore them. | |
310 Figure~\ref{fig:loop1} shows a simple example with a loop with a single entry and two exits. | |
311 | |
312 \begin{figure}[h] | |
313 \centering | |
314 \begin{digraphenv}{scale=0.5}{layout1} | |
315 \textnode{BeforeLoop}{Loop entry} | |
316 \textnode{Exit1}{First loop exit} | |
317 \textnode{Exit2}{Second loop exit} | |
318 \nodesplit{LoopBegin}{LoopBegin} | |
319 \node{LoopEnd}{LoopEnd} | |
320 \nodesplit{If1}{If} | |
321 \nodesplit{If2}{If} | |
322 \controllabel{LoopBegin:succ1}{LoopEnd} | |
323 \controllabel{LoopBegin:succ2}{If1} | |
324 \controllabel{If1:succ1}{If2} | |
325 \controllabel{If2:succ1}{LoopEnd} | |
326 \controllabel{BeforeLoop}{LoopBegin} | |
327 \controllabel{If1:succ2}{Exit1} | |
328 \controllabel{If2:succ2}{Exit2} | |
329 \end{digraphenv} | |
330 \caption{A simple loop with two exits.} | |
331 \label{fig:loop1} | |
332 \end{figure} | |
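
The decision a traversal has to make about backedges can be sketched as follows. The class \texttt{CfNode}, its \texttt{loopPartner} field, and the method \texttt{flowTargets} are hypothetical names; in particular, giving the \nodename{LoopEnd} a direct reference to its \nodename{LoopBegin} is a simplification of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; CfNode is not one of the compiler's instruction classes.
final class CfNode {
    enum Kind { PLAIN, LOOP_BEGIN, LOOP_END }

    final String name;
    final Kind kind;
    final List<CfNode> successors = new ArrayList<>();
    // For a LOOP_BEGIN its LoopEnd, for a LOOP_END its LoopBegin, otherwise null.
    CfNode loopPartner;

    CfNode(String name, Kind kind) {
        this.name = name;
        this.kind = kind;
    }

    // Stores the backedge from begin to end, i.e., against the direction of
    // the actual jump, so that the stored graph stays acyclic.
    static void linkLoop(CfNode begin, CfNode end) {
        begin.successors.add(end);
        begin.loopPartner = end;
        end.loopPartner = begin;
    }

    // Returns the nodes that execution can reach next from this node.
    List<CfNode> flowTargets(boolean includeBackedges) {
        switch (kind) {
            case LOOP_BEGIN: {
                List<CfNode> targets = new ArrayList<>(successors);
                targets.remove(loopPartner); // the stored backedge is not a forward step
                return targets;
            }
            case LOOP_END:
                return includeBackedges
                        ? Collections.singletonList(loopPartner)
                        : Collections.<CfNode>emptyList();
            default:
                return successors;
        }
    }
}
```

A traversal that passes \texttt{false} sees an acyclic graph and terminates without a visited set for loops; one that passes \texttt{true} follows the real execution order and has to guard against revisiting the loop header.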
333 | |
334 \subsection{Loop Phis} | |
335 Data flow in loops is modeled with special phi nodes at the beginning and the end of the loop. | |
336 The \nodename{LoopEnd} instruction merges every value that flows into the next loop iteration in associated \nodename{LoopEndPhi} nodes. | |
337 A corresponding \nodename{LoopBeginPhi} node that is associated with the loop header has a control flow dependency on the \nodename{LoopEndPhi} node. | |
338 Listing~\ref{lst:loop} shows a simple counting loop that is used as an example in the rest of this section. | |
339 Figure~\ref{fig:loop2} shows how the loop is modeled immediately after building the graph. | |
340 | |
341 \begin{lstlisting}[label=lst:loop, caption=Loop example that counts from 0 to n-1., captionpos=b] | |
342 for(int i=0; i<n; ++i) { } | |
343 \end{lstlisting} | |
344 | |
345 \begin{figure}[h] | |
346 \centering | |
347 \begin{digraphenv}{scale=0.5}{layout2} | |
348 \textnode{BeforeLoop}{Loop entry} | |
349 \textnode{Exit}{Loop exit} | |
350 \textnode{n}{n} | |
351 \textnode{Constant0}{0} | |
352 \textnode{Constant1}{1} | |
353 \nodesplit{LoopBegin}{LoopBegin} | |
354 \node{LoopEnd}{LoopEnd} | |
355 \nodesplit{If1}{If} | |
356 \controllabel{LoopBegin:succ1}{LoopEnd} | |
357 \controllabel{LoopBegin:succ2}{If1} | |
358 \nodebi{Compare}{<} | |
359 \nodebi{LoopBeginPhi}{LoopBeginPhi} | |
360 \nodebi{Add}{+} | |
361 \datalabel{Add:in1}{LoopBeginPhi} | |
362 \datalabel{Add:in2}{Constant1} | |
363 \nodebi{LoopEndPhi}{LoopEndPhi} | |
364 \control{LoopBeginPhi}{LoopEndPhi} | |
365 \data{LoopEndPhi:in1}{LoopEnd} | |
366 \data{LoopEndPhi:in2}{Add} | |
367 \datalabel{LoopBeginPhi:in1}{LoopBegin} | |
368 \datalabel{LoopBeginPhi:in2}{Constant0} | |
369 \datalabel{Compare:in1}{LoopBeginPhi} | |
370 \datalabel{Compare:in2}{n} | |
371 \data{If1}{Compare} | |
372 \controllabel{If1:succ1}{LoopEnd} | |
373 \controllabel{BeforeLoop}{LoopBegin} | |
374 \controllabel{If1:succ2}{Exit} | |
375 \end{digraphenv} | |
376 \caption{Graph for a loop counting from 0 to n-1.} | |
377 \label{fig:loop2} | |
378 \end{figure} | |
379 | |
380 \subsection{Loop Counters} | |
381 The compiler is capable of recognizing variables that are only increased within a loop. | |
382 A potential overflow of such a variable is prevented by a guard before the loop (this is not necessary in this example, because the loop variable cannot overflow). | |
383 Figure~\ref{fig:loop3} shows the compiler graph of the example loop after the loop counter transformation. | |
384 | |
385 | |
386 \begin{figure}[h] | |
387 \centering | |
388 \begin{digraphenv}{scale=0.5}{layout3} | |
389 \textnode{BeforeLoop}{Loop entry} | |
390 \textnode{Exit}{Loop exit} | |
391 \textnode{n}{n} | |
392 \textnode{Constant0}{0} | |
393 \textnode{Constant1}{1} | |
394 \nodesplit{LoopBegin}{LoopBegin} | |
395 \node{LoopEnd}{LoopEnd} | |
396 \nodesplit{If1}{If} | |
397 \controllabel{LoopBegin:succ1}{LoopEnd} | |
398 \controllabel{LoopBegin:succ2}{If1} | |
399 \nodebi{Compare}{<} | |
400 \nodetri{LoopCounter}{LoopCounter} | |
401 \datalabel{LoopCounter:in1}{LoopBegin} | |
402 \datalabeltext{LoopCounter:in2}{Constant0}{init} | |
403 \datalabeltext{LoopCounter:in3}{Constant1}{stride} | |
404 \datalabel{Compare:in1}{LoopCounter} | |
405 \datalabel{Compare:in2}{n} | |
406 \data{If1}{Compare} | |
407 \controllabel{If1:succ1}{LoopEnd} | |
408 \controllabel{BeforeLoop}{LoopBegin} | |
409 \controllabel{If1:succ2}{Exit} | |
410 \end{digraphenv} | |
411 \caption{Graph after loop counter transformation.} | |
412 \label{fig:loop3} | |
413 \end{figure} | |
414 | |
415 \subsection{Bounded Loops} | |
416 | |
417 If the total maximum number of iterations of a loop is fixed, then the loop is converted into a bounded loop. | |
418 The total number of iterations always denotes the number of full iterations of the loop with the control flowing from the loop begin to the loop end. | |
419 If the total number of iterations is reached, the loop is exited directly from the loop header. | |
420 In the example, we can infer from the loop exit with the comparison on the loop counter that the total number of iterations of the loop is limited to n. | |
421 Figure~\ref{fig:loop4} shows the compiler graph of the example loop after the bounded loop transformation. | |
422 | |
423 \begin{figure}[h] | |
424 \centering | |
425 \begin{digraphenv}{scale=0.5}{layout4} | |
426 \textnode{BeforeLoop}{Loop entry} | |
427 \textnode{Exit}{Loop exit} | |
428 \textnode{n}{n} | |
429 \textnode{Constant0}{0} | |
430 \textnode{Constant1}{1} | |
431 \nodesplittri{LoopBegin}{BoundedLoopBegin} | |
432 \node{LoopEnd}{LoopEnd} | |
433 \controllabel{LoopBegin:succ1}{LoopEnd} | |
434 \controllabel{LoopBegin:succ2}{LoopEnd} | |
435 \controllabel{LoopBegin:succ3}{Exit} | |
436 \nodetri{LoopCounter}{LoopCounter} | |
437 \datalabel{LoopCounter:in1}{LoopBegin} | |
438 \datalabeltext{LoopCounter:in2}{Constant0}{init} | |
439 \datalabeltext{LoopCounter:in3}{Constant1}{stride} | |
440 \data{LoopBegin}{n} | |
441 \controllabel{BeforeLoop}{LoopBegin} | |
442 \end{digraphenv} | |
443 \caption{Graph after bounded loop transformation.} | |
444 \label{fig:loop4} | |
445 \end{figure} | |
446 | |
447 \subsection{Vectorization} | |
448 | |
449 If we now have a bounded loop with no additional loop exit and no associated phi nodes (only associated loop counters), we can vectorize the loop. | |
450 We replace the loop header with a normal instruction that produces a vector of values from 0 to the number of loop iterations minus 1. | |
451 The loop counters are replaced with \texttt{VectorAdd} and \texttt{VectorMul} nodes. | |
452 The vectorization is only possible if every node of the loop can be replaced with a corresponding vector node. | |
453 Figure~\ref{fig:loop5} shows the compiler graph of the example loop after vectorization. | |
454 The vector nodes all work on an ordered list of integer values and are subject to canonicalization and global value numbering like any other node. | |
455 | |
456 | |
457 \begin{figure}[h] | |
458 \centering | |
459 \begin{digraphenv}{scale=0.5}{layout5} | |
460 \textnode{Entry}{Entry} | |
461 \textnode{Exit}{Exit} | |
462 \textnode{n}{n} | |
463 \textnode{Constant0}{0} | |
464 \textnode{Constant1}{1} | |
465 \node{Vector}{Vector} | |
466 \nodebi{VectorAdd}{VectorAdd} | |
467 \nodebi{VectorMul}{VectorMul} | |
468 \control{Entry}{Vector} | |
469 \control{Vector}{Exit} | |
470 \datalabel{VectorAdd:in1}{Vector} | |
471 \datalabel{VectorAdd:in2}{Constant0} | |
472 \datalabel{VectorMul:in1}{VectorAdd} | |
473 \datalabel{VectorMul:in2}{Constant1} | |
474 \data{Vector}{n} | |
475 \end{digraphenv} | |
476 \caption{Graph after vectorization.} | |
477 \label{fig:loop5} | |
478 \end{figure} | |
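
The effect of replacing a loop counter with vector nodes can be sketched on plain integer arrays. This sketch assumes that a loop counter with initial value \texttt{init} and stride \texttt{stride} takes the value \texttt{init + k * stride} in iteration \texttt{k}; the class \texttt{VectorSketch}, its method names, and the order in which the multiplication and the addition are applied are assumptions made for illustration.

```java
// Hypothetical sketch; the arrays stand in for the values of vector nodes.
final class VectorSketch {
    // The vector produced by the former loop header: 0, 1, ..., n - 1.
    static int[] iterationVector(int n) {
        int[] v = new int[Math.max(0, n)];
        for (int k = 0; k < v.length; k++) {
            v[k] = k;
        }
        return v;
    }

    // Element-wise multiplication with a scalar (stands in for VectorMul).
    static int[] vectorMul(int[] v, int factor) {
        int[] result = new int[v.length];
        for (int k = 0; k < v.length; k++) {
            result[k] = v[k] * factor;
        }
        return result;
    }

    // Element-wise addition of a scalar (stands in for VectorAdd).
    static int[] vectorAdd(int[] v, int addend) {
        int[] result = new int[v.length];
        for (int k = 0; k < v.length; k++) {
            result[k] = v[k] + addend;
        }
        return result;
    }

    // All values a loop counter takes over n iterations, computed at once.
    static int[] counterValues(int init, int stride, int n) {
        return vectorAdd(vectorMul(iterationVector(n), stride), init);
    }
}
```

For the example loop with an initial value of 0 and a stride of 1, the resulting vector is identical to the iteration vector itself, which is the kind of simplification canonicalization could pick up.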
479 | |
480 | |
481 \section{Frame States} | |
482 A frame state captures the state of the program as it is seen by an interpreter of the program. | |
483 The frame state contains the information that is local to the current activation and will therefore disappear during SSA-form constructions or other compiler optimizations. | |
484 For Java, the frame state is defined in terms of the Java bytecode specification (i.e., the values of the local variables, the operand stack, and the locked monitors). | |
485 However, a frame state is not a concept specific to Java (e.g., the Crankshaft JavaScript engine uses frame states in its optimizing compiler to model the values of the AST interpreter). | |
486 | |
487 Frame states are necessary to support the deoptimization of the program, which is the precondition for performing aggressive optimizations that use optimistic assumptions. | |
488 Therefore every point in the optimizing compiler that may revert execution back to the interpreter needs a valid frame state. | |
489 However, the point where the interpreter continues execution need not correspond exactly to the execution position of the compiled code, because many Java bytecode instructions can be safely reexecuted. | |
490 Thus, frame states need only be generated for the states after instructions that cannot be reexecuted, because they modify the state of the program. | |
491 Examples for such instructions are: | |
492 | |
493 \begin{itemize} | |
494 \item Array stores (in Java bytecodes {\tt IASTORE, LASTORE, FASTORE, \\DASTORE, AASTORE, BASTORE, CASTORE, SASTORE}) | |
495 \item Field stores (in Java bytecodes {\tt PUTSTATIC, PUTFIELD}) | |
496 \item Method calls (in Java bytecodes {\tt INVOKEVIRTUAL, INVOKESPECIAL, \\INVOKESTATIC, INVOKEINTERFACE}) | |
497 \item Synchronization (in Java bytecodes {\tt MONITORENTER, MONITOREXIT}) | |
498 \end{itemize} | |
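
The classification above can be sketched as a simple lookup on bytecode mnemonics. The class \texttt{FrameStateSketch} and its method are hypothetical, and the set contains only the example bytecodes listed above, so it should not be read as a complete list of state-modifying instructions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; covers only the example bytecodes named in the text.
final class FrameStateSketch {
    private static final Set<String> STATE_MODIFYING = new HashSet<>(Arrays.asList(
            // array stores
            "IASTORE", "LASTORE", "FASTORE", "DASTORE",
            "AASTORE", "BASTORE", "CASTORE", "SASTORE",
            // field stores
            "PUTSTATIC", "PUTFIELD",
            // method calls
            "INVOKEVIRTUAL", "INVOKESPECIAL", "INVOKESTATIC", "INVOKEINTERFACE",
            // synchronization
            "MONITORENTER", "MONITOREXIT"));

    // True if a frame state has to be generated for the state after the bytecode,
    // because reexecuting the bytecode in the interpreter would not be safe.
    static boolean needsFrameStateAfter(String mnemonic) {
        return STATE_MODIFYING.contains(mnemonic);
    }
}
```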
499 | |
500 Within the graph a frame state is represented as a node that is attached to the instruction that caused it to be generated using a control dependency (see Figure~\ref{fig:fs1}). | |
501 Frame states also have data dependencies on the contents of the state: the local variables and the expression stack. | |
502 | |
503 The frame state at the method beginning does not have to be explicitly present in the graph, because it can always be reconstructed at a later stage. | |
504 We save the frame state at control flow merges if there is at least one frame state on any control flow path between a node and its immediate dominator. | |
505 | |
506 | |
507 \begin{figure}[h] | |
508 \centering | |
509 \begin{digraphenv}{scale=0.5}{fs1} | |
510 \nodetrisplit{store1}{ArrayStore} | |
511 \nodebi{load1}{ArrayLoad} | |
512 \controllabel{store1:succ1}{load1} | |
513 \nodetrisplit{store2}{FieldStore} | |
514 \control{load1}{store2} | |
515 end [shape=plaintext, label="...", width="2.0"] | |
516 store2:succ1:s -> end:n [color=red]; | |
517 % | |
518 \nodeframestate{fs1}{FrameState} | |
519 \controllabel{store1:succ2}{fs1} | |
520 \nodeframestate{fs2}{FrameState} | |
521 \controllabel{store2:succ2}{fs2} | |
522 \end{digraphenv} | |
523 \caption{Simple example using two frame states.} | |
524 \label{fig:fs1} | |
525 \end{figure} | |
526 | |
527 | |
528 A deoptimization node needs a valid frame state that specifies the location and state where the interpreter should continue. | |
529 The algorithm for constructing frame states makes sure that every possible location in the graph has a well-defined frame state that can be used by a deoptimization instruction. | |
530 Therefore, there are no direct links between the deoptimization instruction and its frame state, which allows deoptimization instructions to move around freely. | |
531 | |
532 \subsection{Partial Escape Analysis} | |
533 | |
534 A partial escape analysis can help to further reduce the number of frame states. | |
535 A field or array store does not create a new frame state when the object that is modified did not have a chance to escape between its creation and the store. | |
536 | |
537 Listing~\ref{lst:escape1} shows an example of a method that creates two \texttt{Point} objects, connects them, and returns them. | |
538 The object allocation of the first \texttt{Point} object does not need a frame state. | |
539 We can always reexecute the \texttt{NEW} bytecode again in the interpreter. | |
540 The \texttt{Point} object allocated by the compiler will then simply disappear after the next garbage collection. | |
541 The following field store is a thread-local memory store, because the \texttt{Point} object did not have any chance to escape. | |
542 Same applies to the assignment of the \texttt{next} field and the third field assignment. | |
543 Therefore, the whole method \texttt{getPoint} does not need an explicit frame state, because at any time during execution of this method, we can deoptimize and continue execution in the interpreter at the first bytecode of the method. | |
544 | |
\begin{lstlisting}[label=lst:escape1, caption=Example method that needs no frame state., captionpos=b]
Point getPoint() {
    Point p = new Point();
    p.x = 1;
    p.next = new Point();
    p.next.x = 2;
    return p;
}
\end{lstlisting}

The reduction of frame states makes it easier for the compiler to perform memory optimizations like memory access coalescing.
We believe that this reduction of frame states is the key to effective vectorization and other compiler optimizations where compilers of unmanaged languages have advantages.

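The escape reasoning of this subsection can be sketched in Java for straight-line code.
The sketch is a deliberate simplification and not the analysis implemented in Graal: the class \texttt{EscapeSketch} and its methods are hypothetical, objects are identified by plain names, and transitive escape through fields is ignored.
It assumes that a store needs a frame state only if its target object may already be visible outside the compiled method.

\begin{lstlisting}[label=lst:escapesketch, caption={Hypothetical sketch of counting the frame states needed by stores.}, captionpos=b]
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; not a class of the Graal code base.
final class EscapeSketch {
    // Names of objects that have not escaped so far.
    private final Set<String> local = new HashSet<>();
    private int frameStates = 0;

    // A fresh allocation is thread-local; NEW can be re-executed.
    void allocate(String obj) {
        local.add(obj);
    }

    // Stores a primitive value into a field of target.
    void storePrimitive(String target) {
        if (!local.contains(target)) {
            frameStates++; // visible side effect, state is needed
        }
    }

    // Stores a reference to value into a field of target.
    void storeReference(String target, String value) {
        if (!local.contains(target)) {
            frameStates++;
            local.remove(value); // reachable via an escaped object
        }
    }

    // Returning an object or passing it to a call lets it escape.
    void escape(String obj) {
        local.remove(obj);
    }

    int frameStates() {
        return frameStates;
    }
}
\end{lstlisting}

Replaying the method of Listing~\ref{lst:escape1} as two allocations, one primitive store, one reference store, another primitive store, and a final escape of the returned object leaves the counter at zero, which matches the observation that the method needs no explicit frame state.
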
\subsection{Guards}
A guard is a node that deoptimizes based on a conditional expression.
Guards are not attached to a particular frame state; they can move around freely and will always use the correct frame state when the nodes are scheduled (i.e., the last emitted frame state).
The node that is guarded by the deoptimization has a data dependency on the guard, and the guard in turn has a data dependency on the condition.
A guard must not be moved above any \texttt{If} nodes.
Therefore, we insert \texttt{Anchor} instructions after a control flow split and add a data dependency from the guard to this anchor.
The anchor is the most distant instruction that is postdominated by the guarded instruction, and the guard can be scheduled anywhere between those two nodes.
This ensures maximum flexibility for the guard instruction and guarantees that we only deoptimize if the control flow would have reached the guarded instruction (without taking exceptions into account).

To illustrate the strengths of this approach, we show the graph for the Java code snippet in Listing~\ref{lst:guard1}.
The example looks artificial, but with method inlining this pattern is not unlikely to appear in a normal Java program.
Figure~\ref{fig:guard0} shows the compiler graph for the example method after graph building.
Each field store is represented by a single instruction that implicitly incorporates the null check.

\begin{lstlisting}[label=lst:guard1, caption=Example method that demonstrates the strengths of modelling the guards explicitly., captionpos=b]
void init(Point p) {
    if (p != null) {
        p.x = 0;
    }
    p.y = 0;
}
\end{lstlisting}

\begin{figure}[h]
\centering
\begin{digraphenv}{scale=0.5}{guard0}
\textnode{entry}{Entry}
\nodesplit{if}{If}
\node{merge}{Merge}
\node{return}{Return}
\node{cmpnull}{NonNull}
\textnode{p}{p}
\textnode{const0}{0}
\nodebisplit{store1}{FieldStore x}
\nodebisplit{store2}{FieldStore y}
\nodeframestate{fs1}{FrameState}
\nodeframestate{fs2}{FrameState}
\datalabel{store1:in1}{p}
\datalabel{store2:in1}{p}
\datalabel{store1:in2}{const0}
\datalabel{store2:in2}{const0}
\control{entry}{if}
\data{if}{cmpnull}
\controllabel{if:succ1}{merge}
\controllabel{if:succ2}{store1}
\controllabel{store1:succ1}{merge}
\controllabel{store1:succ2}{fs1}
\control{merge}{store2}
\controllabel{store2:succ1}{return}
\controllabel{store2:succ2}{fs2}
\data{cmpnull}{p}
\end{digraphenv}
\caption{Initial graph with the two field stores.}
\label{fig:guard0}
\end{figure}

Figure~\ref{fig:guard1} shows the example graph at a later compilation phase, when the field store instructions have been lowered to memory store instructions and explicitly modelled null check guards.
The guards are attached to anchor instructions that delimit their possible schedule.
The first guard must not be moved outside the \texttt{if} block; the second guard may be moved before the \texttt{If} instruction, because at this point it is already guaranteed that the second store is executed.

\begin{figure}[h]
\centering
\begin{digraphenv}{scale=0.5}{guard1}
\textnode{entry}{Entry}
\node{anchor1}{Anchor}
\node{anchor2}{Anchor}
\nodesplit{if}{If}
\node{merge}{Merge}
\node{return}{Return}
\node{cmpnull}{NonNull}
\textnode{p}{p}
\textnode{const0}{0}
\nodeguard{guard1}{Guard}
\nodeguard{guard2}{Guard}
\nodetrisplit{store1}{MemStore 16 (int)}
\nodetrisplit{store2}{MemStore 20 (int)}
\nodeframestate{fs1}{FrameState}
\nodeframestate{fs2}{FrameState}
\data{store1:in1}{p}
\data{store2:in1}{p}
\data{store1:in2}{const0}
\data{store2:in2}{const0}
\data{store1:in3}{guard1}
\data{store2:in3}{guard2}
\data{guard1:in1}{anchor2}
\data{guard2:in1}{anchor1}
\data{guard1:in2}{cmpnull}
\data{guard2:in2}{cmpnull}
\control{entry}{anchor1}
\control{anchor1}{if}
\data{if}{cmpnull}
\controllabel{if:succ1}{merge}
\controllabel{if:succ2}{anchor2}
\control{anchor2}{store1}
\controllabel{store1:succ1}{merge}
\controllabel{store1:succ2}{fs1}
\control{merge}{store2}
\controllabel{store2:succ1}{return}
\controllabel{store2:succ2}{fs2}
\data{cmpnull}{p}
\end{digraphenv}
\caption{Memory stores guarded by null check guards.}
\label{fig:guard1}
\end{figure}

The first guard can easily be removed, because it is guarded by an \texttt{If} instruction that checks the same condition.
Therefore, we can remove the guard and the anchor from the graph, which gives us the graph shown in Figure~\ref{fig:guard2}.

There is another optimization for guard instructions: if two guards that are anchored to the true and false branch of the same \texttt{If} instruction have the same condition, they can be merged, so that the resulting guard is anchored at the most distant node of which the \texttt{If} instruction is a postdominator.

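The removal of a guard that is covered by an enclosing \texttt{If} can be sketched in Java as follows.
The classes \texttt{IfSketch}, \texttt{AnchorSketch}, and \texttt{GuardSketch} are hypothetical simplifications and not Graal node classes.
The sketch further assumes that equal conditions are represented by the same node object, so that an identity comparison is sufficient.

\begin{lstlisting}[label=lst:guardredundant, caption={Hypothetical sketch of detecting a redundant guard.}, captionpos=b]
// Hypothetical, simplified nodes; not taken from the Graal code base.
final class IfSketch {
    final Object condition; // node computing the branch condition

    IfSketch(Object condition) {
        this.condition = condition;
    }
}

final class AnchorSketch {
    final IfSketch split;       // If whose successor starts here
    final boolean onTrueBranch; // true if it starts the true branch

    AnchorSketch(IfSketch split, boolean onTrueBranch) {
        this.split = split;
        this.onTrueBranch = onTrueBranch;
    }
}

final class GuardSketch {
    final Object condition; // the guard deoptimizes if this is false
    final AnchorSketch anchor;

    GuardSketch(Object condition, AnchorSketch anchor) {
        this.condition = condition;
        this.anchor = anchor;
    }

    // Redundant if the anchor starts the true branch of an If that
    // tests the very same condition node.
    boolean isRedundant() {
        return anchor.split != null
            && anchor.onTrueBranch
            && anchor.split.condition == condition;
    }
}
\end{lstlisting}

In terms of Figure~\ref{fig:guard1}, the guard of the first store corresponds to a \texttt{GuardSketch} whose anchor follows the \texttt{If} on the branch where the condition holds, so \texttt{isRedundant} returns \texttt{true} for it, whereas the guard of the second store is anchored before the \texttt{If} and is kept.
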
\begin{figure}[h]
\centering
\begin{digraphenv}{scale=0.5}{guard2}
\textnode{entry}{Entry}
\node{anchor1}{Anchor}
\nodesplit{if}{If}
\node{merge}{Merge}
\node{return}{Return}
\node{cmpnull}{NonNull}
\textnode{p}{p}
\textnode{const0}{0}
\nodeguard{guard2}{Guard}
\nodetrisplit{store1}{MemStore 16 (int)}
\nodetrisplit{store2}{MemStore 20 (int)}
\nodeframestate{fs1}{FrameState}
\nodeframestate{fs2}{FrameState}
\data{store1:in1}{p}
\data{store2:in1}{p}
\data{store1:in2}{const0}
\data{store2:in2}{const0}
\data{store2:in3}{guard2}
\data{guard2:in1}{anchor1}
\data{guard2:in2}{cmpnull}
\control{entry}{anchor1}
\control{anchor1}{if}
\data{if}{cmpnull}
\controllabel{if:succ1}{merge}
\controllabel{if:succ2}{store1}
\controllabel{store1:succ1}{merge}
\controllabel{store1:succ2}{fs1}
\control{merge}{store2}
\controllabel{store2:succ1}{return}
\controllabel{store2:succ2}{fs2}
\data{cmpnull}{p}
\end{digraphenv}
\caption{After removing redundant guards.}
\label{fig:guard2}
\end{figure}

The remaining guard can now be moved above the \texttt{If} condition and be used to eliminate the need for the \texttt{If} node (see Figure~\ref{fig:guard3}).
From this point on, however, the guard can no longer be moved below the first memory store.
We use a control dependency from the guard to the first memory store to express this condition.
The link between the second store and the guard and the control flow merge instruction is no longer necessary.

\begin{figure}[h]
\centering
\begin{digraphenv}{scale=0.5}{guard3}
\textnode{entry}{Entry}
\node{anchor1}{Anchor}
\node{return}{Return}
\node{cmpnull}{NonNull}
\textnode{p}{p}
\textnode{const0}{0}
\nodeguard{guard2}{Guard}
\nodetrisplit{store1}{MemStore 16 (int)}
\nodetrisplit{store2}{MemStore 20 (int)}
\nodeframestate{fs1}{FrameState}
\nodeframestate{fs2}{FrameState}
\data{store1:in1}{p}
\data{store2:in1}{p}
\data{store1:in2}{const0}
\data{store2:in2}{const0}
\data{store2:in3}{guard2}
\data{guard2:in1}{anchor1}
\data{guard2:in2}{cmpnull}
\control{guard2}{store1}
\control{entry}{anchor1}
\control{anchor1}{store1}
\controllabel{store1:succ2}{fs1}
\control{store1}{store2}
\controllabel{store2:succ1}{return}
\controllabel{store2:succ2}{fs2}
\data{cmpnull}{p}
\end{digraphenv}
\caption{After eliminating an if with a guard.}
\label{fig:guard3}
\end{figure}

At some point during the compilation, guards need to be fixed, which means that appropriate data and control dependencies are inserted so that they cannot move outside the scope of the associated frame state.
This generates deoptimization-free zones that can be targeted by the most aggressive optimizations.
A simple algorithm for this removal of frame states would be to move all guards as far upwards as possible and then fix them using anchor nodes.
In our example, the guard is already fixed, so there is no deoptimization point that uses any of the memory store frame states.
Therefore, we can delete the frame states from the graph (see Figure~\ref{fig:guard4}).

\begin{figure}[h]
\centering
\begin{digraphenv}{scale=0.5}{guard4}
\textnode{entry}{Entry}
\node{anchor1}{Anchor}
\node{return}{Return}
\node{cmpnull}{NonNull}
\textnode{p}{p}
\textnode{const0}{0}
\nodeguard{guard2}{Guard}
\nodetrisplit{store1}{MemStore 16 (int)}
\nodetrisplit{store2}{MemStore 20 (int)}
\data{store1:in1}{p}
\data{store2:in1}{p}
\data{store1:in2}{const0}
\data{store2:in2}{const0}
\data{store2:in3}{guard2}
\data{guard2:in1}{anchor1}
\data{guard2:in2}{cmpnull}
\control{guard2}{store1}
\control{entry}{anchor1}
\control{anchor1}{store1}
\control{store1}{store2}
\controllabel{store2:succ1}{return}
\data{cmpnull}{p}
\end{digraphenv}
\caption{After removing the frame states.}
\label{fig:guard4}
\end{figure}

Now we can use memory coalescing to combine the two stores, which no longer have frame states and write to adjacent locations in the same object.
This is only possible if the first store does not have a frame state.
Figure~\ref{fig:guard5} shows the resulting graph.

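The coalescing step can be sketched in Java as follows.
The classes \texttt{StoreSketch} and \texttt{Coalescer} are hypothetical and not part of the Graal code base; both stores are assumed to target the same object and to store constants.
The alignment check and the little-endian combination of the two values are additional assumptions of this sketch that the text above does not state.

\begin{lstlisting}[label=lst:coalesce, caption={Hypothetical sketch of coalescing two adjacent 4-byte stores.}, captionpos=b]
// Hypothetical memory store; not a class of the Graal code base.
final class StoreSketch {
    final int offset;            // byte offset within the object
    final int size;              // width of the store in bytes
    final long value;            // constant value being stored
    final boolean hasFrameState; // true if a frame state is attached

    StoreSketch(int offset, int size, long value,
                boolean hasFrameState) {
        this.offset = offset;
        this.size = size;
        this.value = value;
        this.hasFrameState = hasFrameState;
    }
}

final class Coalescer {
    // Returns one 8-byte store replacing two adjacent 4-byte stores
    // to the same object, or null if they cannot be combined.
    static StoreSketch coalesce(StoreSketch first,
                                StoreSketch second) {
        boolean adjacent =
            first.offset + first.size == second.offset;
        boolean bothInts = first.size == 4 && second.size == 4;
        boolean aligned = first.offset % 8 == 0; // assumed rule
        if (first.hasFrameState || !adjacent
                || !bothInts || !aligned) {
            return null;
        }
        // Little-endian layout assumed: first store is the low half.
        long low = first.value & 0xFFFFFFFFL;
        long high = second.value << 32;
        return new StoreSketch(first.offset, 8, high | low,
                               second.hasFrameState);
    }
}
\end{lstlisting}

For the two stores of Figure~\ref{fig:guard4}, at offsets 16 and 20 and both storing the constant 0, the sketch yields a single 8-byte store of 0 at offset 16.
If the first store still carried a frame state, \texttt{coalesce} would return \texttt{null} and both stores would be kept.
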
\begin{figure}[h]
\centering
\begin{digraphenv}{scale=0.5}{guard5}
\textnode{entry}{Entry}
\node{anchor1}{Anchor}
\node{return}{Return}
\node{cmpnull}{NonNull}
\textnode{p}{p}
\textnode{const0}{0}
\nodeguard{guard2}{Guard}
\nodetrisplit{store1}{MemStore 16 (long)}
\data{store1:in1}{p}
\data{store1:in2}{const0}
\data{guard2:in1}{anchor1}
\data{guard2:in2}{cmpnull}
\control{guard2}{store1}
\control{entry}{anchor1}
\control{anchor1}{store1}
\controllabel{store1:succ1}{return}
\data{cmpnull}{p}
\end{digraphenv}
\caption{After coalescing the two memory stores.}
\label{fig:guard5}
\end{figure}

A memory store that immediately follows a null check guard instruction on the same object can be combined into a store with an implicit null check (that deoptimizes instead of throwing the exception).
Therefore, we can remove the guard again, and the anchor is no longer necessary either.
Figure~\ref{fig:guard6} shows the fully optimized graph that is generated for Listing~\ref{lst:guard1}.

\begin{figure}[h]
\centering
\begin{digraphenv}{scale=0.5}{guard6}
\textnode{entry}{Entry}
\node{return}{Return}
\textnode{p}{p}
\textnode{const0}{0}
\nodetrisplit{store1}{DeoptimizingMemStore 16 (long)}
\data{store1:in1}{p}
\data{store1:in2}{const0}
\control{entry}{store1}
\controllabel{store1:succ1}{return}
\end{digraphenv}
\caption{Fully optimized method.}
\label{fig:guard6}
\end{figure}

\section{Conclusions}
\label{sec:conclusions}
This document sketched the strategy for the Graal compiler.
We have already reached M1 (as defined in Section~\ref{sec:mile}) and have the following plans for M2 to M4:
\begin{description}
\item[M2:] June 30th, 2011
\item[M3:] August 15th, 2011
\item[M4:] September 30th, 2011
\end{description}
After we reach M4, we want to create a new project road map that further improves the Graal compiler with respect to its two main goals: modularity and peak performance.

\end{document}