graal/com.oracle.max.graal.doc.initial/graal_compiler.tex @ 2949:4db4e8cb6bd6

Updated design document (incorporated comments from Peter Kessler).
author Thomas Wuerthinger <thomas@wuerthinger.net>
date Sat, 11 Jun 2011 18:41:40 +0200
parents 5d4aa5672d3d
children
Currently, the Graal~VM is fully functional and runs benchmarks (SciMark, DaCapo) at a speed similar to that of the HotSpot client compiler.

\section{Goals}
The compiler effort aims at rewriting the high-level intermediate representation of C1X with two main goals:
\begin{description}
\item[Modularity:]
A modular design of the compiler should simplify the implementation of new languages, new back-ends, and new optimizations.
\item[Peak Performance:]
A more powerful intermediate representation should enable the implementation of aggressive optimizations that impact the peak performance of the resulting machine code.
In terms of startup performance, we aim to be comparable to the HotSpot server compiler configuration.
\end{description}

\section{Design}
For the implementation of the compiler, we rely on the following design decisions:
\begin{description}
\item[Generality:]
The compiler does not require Java as its input.
This is achieved by having a graph as the starting point of the compilation and not a Java bytecode array.
Building the graph from the Java bytecodes must happen before giving a method to the compiler.
This enables front-ends for different languages (e.g., Ruby or JavaScript) to provide their own graph.
The support for different languages is constrained by two conditions: we only support structured control flow, and the dynamic type system of the language must be expressible using the RiType class.
Also, there is no dependency on a specific back-end: the output of the compiler is a graph that can then be converted to a different representation in a final compiler phase.
\end{description}
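
To make this contract more concrete, the following is a minimal sketch of what a language front-end could provide. Apart from RiType, every name in it (\texttt{LanguageFrontEnd}, \texttt{Graph}, \texttt{buildGraph}) is an illustrative assumption, not part of the actual API.

\begin{lstlisting}[caption=Hypothetical sketch of a language front-end contract., captionpos=b]
// Sketch only: a front-end translates its source language into a
// compiler graph before handing a method to the compiler. "Graph"
// and "LanguageFrontEnd" are assumed names.
interface LanguageFrontEnd {
    // The returned graph may use structured control flow only, and
    // every dynamic type in it must be expressible as an RiType.
    Graph buildGraph(Object methodOrFunction);
}
\end{lstlisting}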

\section{Milestones}
\label{sec:mile}
\item \emph{successors} are all nodes that have to be after this node in control flow.
\item \emph{predecessors} are all nodes whose successors contain this node.
\end{itemize}
\item Only inputs and successors can be changed, and changes to them will update the usages and predecessors (see the sketch after this list).
\item Every node must be able to support cloning and serialization.
\item The edges of a node also define \textit{emitted-before} and \textit{emitted-after} relationships as shown in Figure~\ref{fig:directions}.
They mean that the machine code for one node is emitted before or after the machine code of another node.
\end{itemize}
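
The following is a minimal sketch, assuming a list-based edge representation, of how writes to inputs and successors can keep the usages and predecessors up to date; it illustrates the invariant only and is not the actual \texttt{Node} implementation (cloning and serialization are omitted).

\begin{lstlisting}[caption=Sketch of the node edge invariant., captionpos=b]
import java.util.ArrayList;
import java.util.List;

abstract class Node implements Cloneable {
    // Writable edges; concrete node classes pre-size these lists
    // to their arity.
    final List<Node> inputs = new ArrayList<Node>();
    final List<Node> successors = new ArrayList<Node>();
    // Reverse edges; never written directly.
    final List<Node> usages = new ArrayList<Node>();
    final List<Node> predecessors = new ArrayList<Node>();

    void setInput(int index, Node newInput) {
        Node old = inputs.set(index, newInput);
        if (old != null) { old.usages.remove(this); }
        if (newInput != null) { newInput.usages.add(this); }
    }

    void setSuccessor(int index, Node newSuccessor) {
        Node old = successors.set(index, newSuccessor);
        if (old != null) { old.predecessors.remove(this); }
        if (newSuccessor != null) { newSuccessor.predecessors.add(this); }
    }
}
\end{lstlisting}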

\begin{figure}[ht]
\centering
\begin{digraphenv}{scale=0.5}{graphdirections}
\data{node1}{inputs}
\control{node1}{successors}
\data{usages}{node1}
\control{predecessors}{node1}
\node{node2}{Node}
\textnode{before}{emitted-before}
\textnode{after}{emitted-after}
\data{node2}{before}
\control{node2}{after}
\data{after}{node2}
\control{before}{node2}
\end{digraphenv}
An example of this would be when escape analysis finds that a certain object escapes only because of one method call, and this call is not inlined because the inlining penalty was judged too high.
In this case, we can nevertheless choose to inline the method in order to increase the chances of proving that the object does not escape.
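As an illustration (the \texttt{Point} class and the \texttt{register} method are made up for this example), consider:

\begin{lstlisting}[caption=An object that escapes only through a single call., captionpos=b]
Point p = new Point(); // does not escape so far
p.x = 1;               // purely local accesses
register(p);           // the only place where p escapes
\end{lstlisting}

If \texttt{register} is inlined and the analysis then sees that the parameter is not stored anywhere, \texttt{p} no longer escapes and its allocation can, for example, be eliminated entirely.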

\section{Control Flow}

Control flow is managed such that each predecessor node contains direct pointers to its successor nodes.
This is the opposite of the approach taken in the server compiler, where control flow and data flow edges point in the same direction.
The advantage we see in our approach is that there is no need for projection nodes at control flow splits.
An \texttt{If} node can directly point to its true and false successors without any intermediate nodes.
This makes the graph more compact and simplifies graph traversal.
We distinguish between \textit{fixed nodes} that are directly embedded in the control flow and \textit{floating nodes} whose position in the control flow may vary.

Listing~\ref{lst:cfg2} shows an example Java program with an if statement where neither path contains a node with side effects.
Figure~\ref{fig:loopexits} shows the corresponding compiler graph.
The \texttt{If} node can directly point its true and false successors to a \texttt{Merge} node.
A \texttt{Phi} node that selects the appropriate value is appended to the \texttt{Merge} node.
The \texttt{Return} node then has a data dependency on the \texttt{Phi} node.

\begin{lstlisting}[label=lst:cfg2, caption=Control flow in the graph., captionpos=b]
if (condition) { return 0; }
else { return 1; }
\end{lstlisting}
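
For illustration, constructing this graph could be written as follows. This is a sketch against assumed node classes and setters that mirror the figure; the variables \texttt{condition}, \texttt{constant0}, and \texttt{constant1} are assumed to be previously created nodes, and none of this is the actual graph-building API.

\begin{lstlisting}[caption=Sketch of building the graph for the if statement., captionpos=b]
// Sketch only: assumed classes If, Merge, Phi, and Return.
If branch = new If(condition);     // fixed node: control flow split
Merge merge = new Merge();         // fixed node: control flow join
branch.setTrueSuccessor(merge);    // no projection nodes in between
branch.setFalseSuccessor(merge);
Phi result = new Phi(merge, constant0, constant1); // floating node
merge.setNext(new Return(result)); // Return depends on the Phi
\end{lstlisting}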
We do not throw runtime exceptions (e.g., \texttt{IndexOutOf\-BoundsException}, \texttt{Null\-Pointer\-Exception}, or \texttt{Out\-Of\-Memory\-Error}), but deoptimize instead.
This reduces the places in the compiled code where an exact bytecode location and debug information must be known.
Additionally, this greatly reduces the number of exception handler edges in the compiled code.
The main advantage of this technique, however, is that we are free to move around bounds checks, memory allocations, memory accesses with implicit null checks, etc.

There are only two kinds of nodes that need explicit exception edges, because they are the only nodes that can throw exceptions in compiled code: \texttt{Throw} nodes and \texttt{Invoke} nodes.
They are modelled as nodes with an additional control flow continuation that points to an \texttt{ExceptionDispatch} node.
The exception dispatch node decides, based on the type of the exception object, whether control should flow to the catch handler or to another exception dispatch.
If there is no catch handler in the currently compiled method, then control flows into the \texttt{Unwind} node, which handles the exception by forwarding it to the caller.
Listing~\ref{lst:exc1} shows an example Java program with nested try blocks, and Figure~\ref{fig:exc1} shows the corresponding compiler graph.

\begin{lstlisting}[label=lst:exc1, caption=Exception dispatch in the compiler graph., captionpos=b]
try { m1();
  try { m2();

\section{Loops}
\label{sec:loops}
Loops form a first-class construct in the IR and are expressed by specialized IR nodes during all optimization phases.
We only compile methods whose control flow is such that every loop has a single entry point.
This entry point is a \nodename{LoopBegin} node.
This node is connected to a \nodename{LoopEnd} node that merges all control flow paths that do not exit the loop.
The edge between the \nodename{LoopBegin} and the \nodename{LoopEnd} is the backedge of the loop.
It goes from the beginning to the end in order to make the graph acyclic.
An algorithm that traverses the control flow has to decide explicitly whether it wants to incorporate backedges (i.e., special-case the treatment of \nodename{LoopEnd}) or ignore them, as in the sketch below.
Figure~\ref{fig:loop1} shows a simple example of a loop with a single entry and two exits.
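A traversal that ignores backedges can, as a sketch, special-case \nodename{LoopEnd} as follows (assuming the \texttt{Node} edge lists from the sketch earlier and an assumed \texttt{LoopEnd} class):

\begin{lstlisting}[caption=Sketch of a control flow traversal that ignores backedges., captionpos=b]
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Visits all nodes reachable over successor edges; the backedge is
// ignored by not following the successors of LoopEnd nodes.
static void visitForward(Node start) {
    Set<Node> visited = new HashSet<Node>();
    Deque<Node> worklist = new ArrayDeque<Node>();
    worklist.push(start);
    while (!worklist.isEmpty()) {
        Node node = worklist.pop();
        if (!visited.add(node)) {
            continue;
        }
        // ... process node here ...
        if (node instanceof LoopEnd) {
            continue; // ignore the backedge
        }
        for (Node successor : node.successors) {
            if (successor != null) {
                worklist.push(successor);
            }
        }
    }
}
\end{lstlisting}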

\centering
\begin{digraphenv}{scale=0.5}{layout1}
\textnode{BeforeLoop}{Loop entry}
\textnode{Exit1}{First loop exit}
\textnode{Exit2}{Second loop exit}
\node{LoopBegin}{LoopBegin}
\node{LoopEnd}{LoopEnd}
\nodesplit{If1}{If}
\nodesplit{If2}{If}
\data{LoopEnd}{LoopBegin}
\control{LoopBegin}{If1}
\controllabel{If1:succ1}{If2}
\controllabel{If2:succ1}{LoopBody}
\textnode{LoopBody}{Loop body}
\control{LoopBody}{LoopEnd}
\controllabel{BeforeLoop}{LoopBegin}
\controllabel{If1:succ2}{Exit1}
\controllabel{If2:succ2}{Exit2}
\end{digraphenv}
\caption{A simple loop with two exits.}
\label{fig:loop1}
\end{figure}

\subsection{Loop Phis}
Data flow in loops is modelled with special phi nodes at the beginning and the end of the loop.
The \nodename{LoopEnd} node merges every value that flows into the next loop iteration in associated \nodename{LoopEndPhi} nodes.
A corresponding \nodename{LoopBeginPhi} node that is associated with the loop header has a control flow dependency on the \nodename{LoopEndPhi} node.
Listing~\ref{lst:loop} shows a simple counting loop that is used as an example in the rest of this section.
Figure~\ref{fig:loop2} shows how the loop is modelled immediately after building the graph.

\begin{lstlisting}[label=lst:loop, caption=Loop example that counts from 0 to n-1., captionpos=b]
\textnode{BeforeLoop}{Loop entry}
\textnode{Exit}{Loop exit}
\textnode{n}{n}
\textnode{Constant0}{0}
\textnode{Constant1}{1}
\node{LoopBegin}{LoopBegin}
\node{LoopEnd}{LoopEnd}
\nodesplit{If1}{If}
\data{LoopEnd}{LoopBegin}
\control{LoopBegin}{If1}
\nodebi{Compare}{<}
\nodebi{LoopBeginPhi}{LoopBeginPhi}
\nodebi{Add}{+}
\datalabel{Add:in1}{LoopBeginPhi}
\datalabel{Add:in2}{Constant1}
\datalabel{LoopBeginPhi:in1}{LoopBegin}
\datalabel{LoopBeginPhi:in2}{Constant0}
\datalabel{Compare:in1}{LoopBeginPhi}
\datalabel{Compare:in2}{n}
\data{If1}{Compare}
\controllabel{If1:succ1}{LoopBody}
\textnode{LoopBody}{Loop body}
\control{LoopBody}{LoopEnd}
\controllabel{BeforeLoop}{LoopBegin}
\controllabel{If1:succ2}{Exit}
\end{digraphenv}
\caption{Graph for a loop counting from 0 to n-1.}
\label{fig:loop2}
\textnode{BeforeLoop}{Loop entry}
\textnode{Exit}{Loop exit}
\textnode{n}{n}
\textnode{Constant0}{0}
\textnode{Constant1}{1}
\node{LoopBegin}{LoopBegin}
\node{LoopEnd}{LoopEnd}
\nodesplit{If1}{If}
\data{LoopEnd}{LoopBegin}
\control{LoopBegin}{If1}
\nodebi{Compare}{<}
\nodetri{LoopCounter}{LoopCounter}
\datalabel{LoopCounter:in1}{LoopBegin}
\datalabeltext{LoopCounter:in2}{Constant0}{init}
\datalabeltext{LoopCounter:in3}{Constant1}{stride}
\datalabel{Compare:in1}{LoopCounter}
\datalabel{Compare:in2}{n}
\data{If1}{Compare}
\controllabel{If1:succ1}{LoopBody}
\textnode{LoopBody}{Loop body}
\control{LoopBody}{LoopEnd}
\controllabel{BeforeLoop}{LoopBegin}
\controllabel{If1:succ2}{Exit}
\end{digraphenv}
\caption{Graph after loop counter transformation.}
\label{fig:loop3}
\subsection{Bounded Loops}

If the total maximum number of iterations of a loop is fixed, then the loop is converted into a bounded loop.
The total number of iterations always denotes the number of full iterations of the loop, with control flowing from the loop begin to the loop end.
If the total number of iterations is reached, the loop is exited directly from the loop header.
The representation of the bounded loop in the graph should support reasoning about the loop, but it does not specify how the loop will later be converted to machine code.
In the example, we can infer from the loop exit with the comparison on the loop counter that the total number of iterations of the loop is limited to n.
Figure~\ref{fig:loop4} shows the compiler graph of the example loop after the bounded loop transformation.
If there are no other exits out of the loop, then the number of iterations specified as the input to the bounded loop node is also the exact number of loop iterations.
Loops with the same number of iterations can be merged into a single loop, as illustrated below.
We also want to support loop splitting in order to find simple loops that are candidates for vectorization.
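
At the source level, merging two loops with the same iteration count corresponds, for example, to the following transformation (the arrays \texttt{a} and \texttt{b} are made up for the illustration):

\begin{lstlisting}[caption=Source-level view of merging two loops with the same iteration count., captionpos=b]
// before: two loops with the same trip count n
for (int i = 0; i < n; i++) { a[i] = i; }
for (int j = 0; j < n; j++) { b[j] = a[j] * 2; }

// after: a single merged loop
for (int i = 0; i < n; i++) {
    a[i] = i;
    b[i] = a[i] * 2;
}
\end{lstlisting}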

\begin{figure}[ht]
\centering
\begin{digraphenv}{scale=0.5}{layout4}
\textnode{BeforeLoop}{Loop entry}
\textnode{Exit}{Loop exit}
\textnode{n}{n}
\textnode{Constant0}{0}
\textnode{Constant1}{1}
\nodesplit{LoopBegin}{BoundedLoopBegin}
\node{LoopEnd}{LoopEnd}
\data{LoopEnd}{LoopBegin}
\controllabel{LoopBegin:succ1}{LoopBody}
\textnode{LoopBody}{Loop body}
\control{LoopBody}{LoopEnd}
\controllabel{LoopBegin:succ2}{Exit}
\nodetri{LoopCounter}{LoopCounter}
\datalabel{LoopCounter:in1}{LoopBegin}
\datalabeltext{LoopCounter:in2}{Constant0}{init}
\datalabeltext{LoopCounter:in3}{Constant1}{stride}
\data{LoopBegin}{n}
\end{figure}

\subsection{Vectorization}

If we now have a bounded loop with no additional loop exits and no associated phi nodes (only associated loop counters), we can vectorize the loop.
We replace the loop header with a normal node that produces a vector of values from 0 to the number of loop iterations minus 1.
The loop counters are replaced with \texttt{VectorAdd} and \texttt{VectorMul} nodes.
Vectorization is only possible if every node of the loop can be replaced with a corresponding vector node.
Figure~\ref{fig:loop5} shows the compiler graph of the example loop after vectorization.
The vector nodes all work on an ordered list of integer values and are subject to canonicalization and global value numbering like any other node.
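The following sketch gives a conceptual Java view of these semantics. The vector nodes are IR constructs, so the code below only illustrates the values they produce; the composition of \texttt{VectorAdd} and \texttt{VectorMul} shown for a loop counter is our reading of the transformation, not a specification.

\begin{lstlisting}[caption=Conceptual view of the values produced by the vector nodes., captionpos=b]
// The new loop header node produces the index vector [0 .. n-1].
static int[] indexVector(int iterations) {
    int[] v = new int[iterations];
    for (int i = 0; i < iterations; i++) {
        v[i] = i;
    }
    return v;
}

// A loop counter with init and stride then corresponds to
// VectorAdd(VectorMul(indexVector, stride), init), element-wise.
static int[] counterVector(int[] index, int init, int stride) {
    int[] v = new int[index.length];
    for (int i = 0; i < index.length; i++) {
        v[i] = index[i] * stride + init;
    }
    return v;
}
\end{lstlisting}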

For Java, the frame state is defined in terms of the Java bytecode specification (i.e., the values of the local variables, the operand stack, and the locked monitors).
However, a frame state is not a concept specific to Java (e.g., the Crankshaft JavaScript engine uses frame states in its optimizing compiler to model the values of the AST interpreter).

Frame states are necessary to support the deoptimization of the program, which is the precondition for performing aggressive optimizations that use optimistic assumptions.
Therefore, every point at which execution may revert back to the interpreter needs a valid frame state.
However, the point where the interpreter continues execution need not correspond exactly to the execution position of the compiled code, because many Java bytecode instructions can be safely reexecuted.
Thus, frame states need only be generated for the states after instructions that cannot be reexecuted, because they modify the state of the program.
Examples of such instructions are listed below; a short illustration follows the list.

\begin{itemize}
\item Array stores (in Java bytecodes {\tt IASTORE, LASTORE, FASTORE, \\DASTORE, AASTORE, BASTORE, CASTORE, SASTORE})
\item Field stores (in Java bytecodes {\tt PUTSTATIC, PUTFIELD})
\item Method calls (in Java bytecodes {\tt INVOKEVIRTUAL, INVOKESPECIAL, \\INVOKESTATIC, INVOKEINTERFACE})
\item Synchronization (in Java bytecodes {\tt MONITORENTER, MONITOREXIT})
\end{itemize}
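
For illustration, in the following made-up method only the field store and the method call need an associated frame state; the pure arithmetic before and after them can safely be reexecuted in the interpreter (\texttt{Point} and \texttt{update} are hypothetical):

\begin{lstlisting}[caption=Illustration of where frame states are needed., captionpos=b]
int example(Point p, int a, int b) {
    int sum = a + b;  // reexecutable: no frame state needed
    p.x = sum;        // PUTFIELD: frame state after the store
    update(p);        // INVOKEVIRTUAL: frame state after the call
    return sum * 2;   // reexecutable again
}
\end{lstlisting}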

Within the graph, a frame state is represented as a node that is attached, by a control dependency, to the node that caused it to be generated (see Figure~\ref{fig:fs1}).
Frame states also have data dependencies on the contents of the state: the local variables and the expression stack.

The frame state at the method beginning does not have to be explicitly represented in the graph, because it can always be reconstructed at a later stage.
We save the frame state at control flow merges if there is at least one frame state on any control flow path between a node and its immediate dominator.

\label{fig:fs1}
\end{figure}


A deoptimization node needs a valid frame state that specifies the location and state where the interpreter should continue.
The algorithm for constructing frame states makes sure that every possible location in the graph has a well-defined frame state that can be used by a deoptimization node.
Therefore, there are no direct links between a deoptimization node and its frame state, which allows deoptimization nodes to move around freely.

\subsection{Partial Escape Analysis}

A partial escape analysis can help to further reduce the number of frame states.
A field or array store does not create a new frame state when the object that is modified has not had a chance to escape between its creation and the store.
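As an illustration (with a made-up \texttt{Point} class and \texttt{publish} method), the stores below need no new frame states because \texttt{p} cannot yet have escaped when they execute:

\begin{lstlisting}[caption=Stores to an object before it escapes., captionpos=b]
Point p = new Point(); // allocation
p.x = 1;               // p has not escaped: no new frame state
p.y = 2;               // still not escaped: no new frame state
publish(p);            // here p escapes; a later store to p
                       // would again require a frame state
\end{lstlisting}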

\subsection{Guards}
A guard is a node that deoptimizes based on a conditional expression.
Guards are not attached to a certain frame state; they can move around freely and will always use the correct frame state when the nodes are scheduled (i.e., the last emitted frame state).
The node that is guarded by the deoptimization has a data dependency on the guard, and the guard in turn has a data dependency on the condition.
A guard may only be executed if it is guaranteed that the guarded node is executed too (leaving exceptions aside).
Therefore, we use \texttt{Anchor} nodes after a control flow split and a data dependency from the guard to this anchor.
The anchor is the most distant node that is postdominated by the guarded node, and the guard can be scheduled anywhere between those two nodes.
This ensures maximum flexibility for the guard node and guarantees that we only deoptimize if the control flow would have reached the guarded node (without taking exceptions into account).

To illustrate the strengths of this approach, we show the graph for the Java code snippet in Listing~\ref{lst:guard1}.
The example looks artificial, but after method inlining such a pattern can easily occur in a normal Java program.
Figure~\ref{fig:guard0} shows the compiler graph for the example method after graph building.
Each field store is represented by a single node that implicitly incorporates the null check.

\begin{lstlisting}[label=lst:guard1, caption=Example method that demonstrates the strengths of modelling the guards explicitly., captionpos=b]
void init(Point p) {
  if (p != null) {
    p.x = 0;
\end{digraphenv}
\caption{Initial graph with the two field stores.}
\label{fig:guard0}
\end{figure}

Figure~\ref{fig:guard1} shows the example graph at a later compilation phase, when the field store nodes have been lowered to memory store nodes and explicitly modelled null check guards.
The guards are attached to anchor nodes that delimit their possible schedule.
The first guard must not be moved outside the \texttt{if} block; the second guard may be moved before the \texttt{If} node, because at this point it is already guaranteed that the second store is executed.

\begin{figure}[ht]
\centering
\begin{digraphenv}{scale=0.5}{guard1}
\textnode{entry}{Entry}
\node{cmpnull}{NonNull}
\textnode{p}{p}
\textnode{const0}{0}
\nodeguard{guard1}{Guard}
\nodeguard{guard2}{Guard}
\nodetrisplit{store1}{MemStore 16 (4 bytes)}
\nodetrisplit{store2}{MemStore 20 (4 bytes)}
\nodeframestate{fs1}{FrameState}
\nodeframestate{fs2}{FrameState}
\data{store1:in1}{p}
\data{store2:in1}{p}
\data{store1:in2}{const0}
\end{digraphenv}
\caption{A store guarded by a null check guard.}
\label{fig:guard1}
\end{figure}

The first guard can easily be removed, because it is guarded by an \texttt{If} node that checks the same condition.
Therefore, we can remove the guard and the anchor from the graph, which gives us the graph shown in Figure~\ref{fig:guard2}.

There is another optimization for guard nodes: if two guards that are anchored to the true and false branches of the same \texttt{If} node have the same condition, they can be merged, so that the resulting guard is anchored at the most distant node of which the \texttt{If} node is a postdominator.


\begin{figure}[ht]
\centering
\begin{digraphenv}{scale=0.5}{guard2}
\node{return}{Return}
\node{cmpnull}{NonNull}
\textnode{p}{p}
\textnode{const0}{0}
\nodeguard{guard2}{Guard}
\nodetrisplit{store1}{MemStore 16 (4 bytes)}
\nodetrisplit{store2}{MemStore 20 (4 bytes)}
\nodeframestate{fs1}{FrameState}
\nodeframestate{fs2}{FrameState}
\data{store1:in1}{p}
\data{store2:in1}{p}
\data{store1:in2}{const0}
\end{figure}

The remaining guard can now be moved above the \texttt{If} condition and be used to eliminate the need for the \texttt{If} node.
From this point on, however, the guard can no longer be moved below the first memory store.
We use a control dependency from the guard to the field store to express this constraint.
The link between the second store and the guard, as well as the control flow merge node, is no longer necessary.

\begin{figure}[ht]
\centering
\begin{digraphenv}{scale=0.5}{guard3}
\textnode{entry}{Entry}
\node{return}{Return}
\node{cmpnull}{NonNull}
\textnode{p}{p}
\textnode{const0}{0}
\nodeguard{guard2}{Guard}
\nodetrisplit{store1}{MemStore 16 (4 bytes)}
\nodetrisplit{store2}{MemStore 20 (4 bytes)}
\nodeframestate{fs1}{FrameState}
\nodeframestate{fs2}{FrameState}
\data{store1:in1}{p}
\data{store2:in1}{p}
\data{store1:in2}{const0}
\node{return}{Return}
\node{cmpnull}{NonNull}
\textnode{p}{p}
\textnode{const0}{0}
\nodeguard{guard2}{Guard}
\nodetrisplit{store1}{MemStore 16 (4 bytes)}
\nodetrisplit{store2}{MemStore 20 (4 bytes)}
\data{store1:in1}{p}
\data{store2:in1}{p}
\data{store1:in2}{const0}
\data{store2:in2}{const0}
\data{store2:in3}{guard2}
\node{return}{Return}
\node{cmpnull}{NonNull}
\textnode{p}{p}
\textnode{const0}{0}
\nodeguard{guard2}{Guard}
\nodetrisplit{store1}{MemStore 16 (8 bytes)}
\data{store1:in1}{p}
\data{store1:in2}{const0}
\data{guard2:in1}{anchor1}
\data{guard2:in2}{cmpnull}
\control{guard2}{store1}
\end{digraphenv}
\caption{After coalescing the two memory stores.}
\label{fig:guard5}
\end{figure}

A memory store that immediately follows a null check guard node on the same object can be combined into a store with an implicit null check (one that deoptimizes instead of throwing the exception).
Therefore, we can remove the guard again, and the anchor is no longer necessary either.
Figure~\ref{fig:guard6} shows the fully optimized graph that is generated for Listing~\ref{lst:guard1}.

\begin{figure}[ht]
\centering
\begin{digraphenv}{scale=0.5}{guard6}
\textnode{entry}{Entry}
\node{return}{Return}
\textnode{p}{p}
\textnode{const0}{0}
\nodetrisplit{store1}{DeoptimizingMemStore 16 (8 bytes)}
\data{store1:in1}{p}
\data{store1:in2}{const0}
\control{entry}{store1}
\controllabel{store1:succ1}{return}
\end{digraphenv}


\section{Conclusions}
\label{sec:conclusions}
This document sketched the strategy for the Graal compiler.
We have the following plans for M1 to M4:
\begin{description}
\item[M1:] May 15th, 2011
\item[M2:] June 30th, 2011
\item[M3:] August 15th, 2011
\item[M4:] September 30th, 2011
\end{description}
After we reach M4, we want to create a new project road map that further improves the Graal compiler with respect to its two main goals: modularity and peak performance.