% doc/design/graal_compiler.tex @ 2597:08cda6637103
% More doc + conclusion.
% author: Thomas Wuerthinger <thomas@wuerthinger.net>
% date: Thu, 05 May 2011 15:55:11 +0200
This enables front-ends for different languages (e.g., Ruby) to provide their own graph.
Also, there is no dependency on a specific back-end, but the output of the compiler is a graph that can then be converted to a different representation in a final compiler phase.
\end{description}

\section{Milestones}
\label{sec:mile}
The compiler is developed starting from the current C1X source code base.
This allows us to test the compiler at every intermediate development step on a variety of Java benchmarks.
We define the following development milestones and when they are considered achieved:
\begin{description}
\item[M1:] We have a fully working Graal~VM version with a stripped-down C1X compiler that does not perform any optimizations.
\item Removal of the XIR template mechanism and its replacement with a snippet mechanism that works with the compiler graph.
\item Improvements for peak performance (loop optimizations, escape analysis, bounds check elimination, processing additional interpreter runtime feedback).
\item Implementation of a prototype front-end for different languages, e.g., JavaScript.
\end{itemize}

\section{Graph}

The \emph{intermediate representation}~(IR) of the compiler is designed as a directed graph.
A graph deals out ids for new nodes and can be queried for the node corresponding to a given id, as well as for an unordered list of all nodes of the graph.
Graphs can manage side data structures, which are automatically invalidated and lazily recomputed whenever the graph changes. Examples of side data structures are dominator trees and temporary schedules. These side data structures are usually understood by more than one optimization.
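
As a rough sketch of this behavior (the class and method names here are illustrative assumptions, not the actual Graal API), a graph that deals out ids and lazily recomputes invalidated side data could look like this:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Minimal sketch of the graph described above; names are assumptions.
class Graph {
    private final List<Object> nodes = new ArrayList<>();
    private List<Object> cachedSchedule; // example of a side data structure

    /** Deals out a fresh id for a new node and invalidates side data. */
    int register(Object node) {
        nodes.add(node);
        cachedSchedule = null; // any change to the graph invalidates side data
        return nodes.size() - 1;
    }

    /** Returns the node corresponding to a given id. */
    Object nodeById(int id) {
        return nodes.get(id);
    }

    /** Returns an unordered view of all nodes of the graph. */
    Collection<Object> allNodes() {
        return Collections.unmodifiableList(nodes);
    }

    /** Side data is recomputed lazily on first use after an invalidation. */
    List<Object> schedule() {
        if (cachedSchedule == null) {
            cachedSchedule = new ArrayList<>(nodes); // placeholder computation
        }
        return cachedSchedule;
    }
}
```

A dominator tree would be cached and invalidated the same way as the placeholder schedule above.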

The nodes of the graph have the following properties:
\begin{itemize}
\item Each node is always associated with a single graph, and this association is immutable.
\item Each node has an immutable id that is unique within its associated graph.
\item Nodes can have a data dependency, which means that one node requires the result of another node as its input. The fact that the result of the first node needs to be computed before the second node can be executed introduces a partial order on the set of nodes.
\item Nodes can have a control flow dependency, which means that the execution of one node will be followed by the execution of another node. This includes conditional execution, memory access serialization, and other reasons, and again introduces a partial order on the set of nodes.
\item Nodes can only have data and control dependencies to nodes that belong to the same graph.
\item Control dependencies and data dependencies each represent a \emph{directed acyclic graph} (DAG) on the same set of nodes. This means that data dependencies always point upwards and control dependencies always point downwards. Situations that would normally incur cycles (like loops) are represented by special nodes (see Section~\ref{sec:loops}).
\item Ordering between nodes is specified only to the extent required to correctly express the semantics of a given program. This gives the compiler flexibility in the possible scheduling of a node and therefore wiggle room for optimizations. For algorithms that require a fixed ordering of nodes, a temporary schedule can always be generated.
\item Both data and control dependencies can be traversed in both directions, so that each node can be traversed in four directions (see Figure~\ref{fig:directions}):
\begin{itemize}
\item \emph{inputs} are all nodes on which this node has data dependencies.
\item \emph{usages} are all nodes that have data dependencies on this node; this is the inverse of inputs.
\item \emph{successors} are all nodes that have a control dependency on this node.
\item \emph{predecessors} are all nodes on which this node has control dependencies; this is the inverse of successors.
\end{itemize}
\item Only inputs and successors can be changed; changes to them automatically update the usages and predecessors.
\item Every node must be able to support cloning and serialization.
\item Inlining should always be performed by embedding one graph into another graph.
\item Nodes cannot be reassigned to another graph; they are cloned instead.
\item The edges of a node also define \textit{happens-before} and \textit{happens-after} relationships, as shown in Figure~\ref{fig:directions}.
\end{itemize}
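
The automatic maintenance of the inverse edge lists can be sketched as follows; the names and structure are illustrative assumptions, the key point is that only inputs and successors are mutated directly:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the edge bookkeeping described above: usages and predecessors
// are maintained as the inverses of inputs and successors. Names are
// assumptions for illustration, not the actual Graal node class.
class IRNode {
    final List<IRNode> inputs = new ArrayList<>();       // data dependencies (point upwards)
    final List<IRNode> usages = new ArrayList<>();       // inverse of inputs
    final List<IRNode> successors = new ArrayList<>();   // control dependencies (point downwards)
    final List<IRNode> predecessors = new ArrayList<>(); // inverse of successors

    /** Adding an input automatically records this node as a usage. */
    void addInput(IRNode input) {
        inputs.add(input);
        input.usages.add(this);
    }

    /** Adding a successor automatically records this node as a predecessor. */
    void addSuccessor(IRNode successor) {
        successors.add(successor);
        successor.predecessors.add(this);
    }
}
```

Keeping the inverse lists derived rather than independently mutable is what guarantees that the four traversal directions always stay consistent.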

\begin{figure}[h]
\centering
\begin{digraphenv}{scale=0.5}{graphdirections}
\node{node1}{Node}
\textnode{inputs}{inputs}
\textnode{usages}{usages}
\textnode{successors}{successors}
\textnode{predecessors}{predecessors}
\data{node1}{inputs}
\control{node1}{successors}
\data{usages}{node1}
\control{predecessors}{node1}
\node{node2}{Node}
\textnode{before}{happens before}
\textnode{after}{happens after}
\data{node2}{before}
\control{node2}{after}
\data{after}{node2}
\control{before}{node2}
\end{digraphenv}
\caption{A node and its edges.}
\label{fig:directions}
\end{figure}

\subsection{Project Source Structure}
In order to support the goal of a modular compiler, the code will be divided into the following source code projects (as subprojects of \textbf{com.oracle.graal}).

\begin{description}
\item[graph] contains the abstract node implementation, the graph implementation, and all the associated tools and auxiliary classes.
\item[nodes] contains the implementation of known basic nodes (e.g., phi nodes, control flow nodes, \ldots).
Additional node classes should go into separate projects and be specializations of the known basic nodes.
\item[java] contains code for building graphs from Java bytecodes and Java-specific nodes.
\item[opt] contains optimizations such as global value numbering or conditional constant propagation.
\item[compiler] contains the compiler, including:
\begin{itemize}
\item Scheduling of the compilation phases.
\item Implementation of the \emph{compiler interface} (CI).
\item Implementation of the final compilation phase that produces the low-level representation.
\item Machine code creation, including debug info.
\end{itemize}
\end{description}

\section{Loops}
\label{sec:loops}
Loops form a first-class construct in the IR and are expressed by specialized IR nodes during all optimization phases.
We only compile methods in which every loop has a single entry point.
This entry point is a \nodename{LoopBegin} node.
This node is connected to a \nodename{LoopEnd} node that merges all control flow paths that do not exit the loop.
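
A toy wiring of these loop nodes might look as follows; only the \nodename{LoopBegin} and \nodename{LoopEnd} names come from this document, the classes and fields are simplified stand-ins for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration; only the LoopBegin/LoopEnd names
// come from the text, the fields and wiring are assumptions.
class CfgNode {
    final List<CfgNode> successors = new ArrayList<>();
    void addSuccessor(CfgNode s) { successors.add(s); }
}

class LoopBegin extends CfgNode { } // the single entry point of the loop

class LoopEnd extends CfgNode {     // merges all paths that do not exit the loop
    final LoopBegin loopBegin;      // association that closes the loop
    LoopEnd(LoopBegin loopBegin) { this.loopBegin = loopBegin; }
}
```

A loop body node would be a successor of the \nodename{LoopBegin}, and every non-exiting path ends at the \nodename{LoopEnd}; its association with the \nodename{LoopBegin} closes the cycle without introducing a control flow back edge, keeping the control dependencies acyclic as required above.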
\data{Vector}{n}
\end{digraphenv}
\caption{Graph after bounded loop transformation.}
\end{figure}

\section{Frame States}
A frame state captures the state of the program in terms of the Java bytecode specification (i.e., the values of the local variables, the operand stack, and the locked monitors).
Every deoptimization point needs a valid frame state.
A frame state is valid as long as the program performs only actions that can safely be reexecuted (e.g., operations on local variables, loads, etc.).
Thus, frame states need only be generated for bytecodes that cannot be reexecuted:

\controllabel{store2:succ2}{fs2}
\end{digraphenv}
\caption{Simple example using two frame states.}
\end{figure}

Frame states also have data dependencies on the contents of the state: the local variables and the expression stack.
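
A minimal sketch of such a snapshot follows; the class shape and field names are assumptions for illustration, not the actual Graal implementation:

```java
import java.util.List;

// Sketch of a frame state as described above: an immutable snapshot of the
// local variables, the operand stack, and the locked monitors at a given
// bytecode index. The shape is an assumption for illustration.
final class FrameState {
    final int bci;             // bytecode index this state belongs to
    final List<Object> locals; // values of the local variables
    final List<Object> stack;  // values on the operand stack
    final List<Object> locks;  // currently locked monitors

    FrameState(int bci, List<Object> locals, List<Object> stack, List<Object> locks) {
        this.bci = bci;
        this.locals = List.copyOf(locals); // defensive copies keep the snapshot
        this.stack = List.copyOf(stack);   // valid even as compilation proceeds
        this.locks = List.copyOf(locks);
    }
}
```

The data dependencies mentioned above would simply be graph edges from the frame state node to the nodes producing the local variable and stack values.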

\subsection{Traps}
A trap node deoptimizes based on a conditional expression.
Trap nodes are not fixed to a certain frame state node; they can move around freely and will always use the correct frame state.
The node that is guarded by the deoptimization has a data dependency on the trap, and the trap in turn has a data dependency on the condition and on the most distant node that is postdominated by the guarded node.
The trap is now anchored to the most distant node of which the \texttt{If} node was a postdominator.}
\end{figure}

At some point during the compilation, trap nodes need to be fixed, which means that appropriate data and control dependencies will be inserted so that they cannot move outside the scope of the associated frame state.
This will generate deoptimization-free zones that can be targeted by the most aggressive optimizations.
A simple algorithm for this removal of frame states would be to move all traps as far upwards as possible.

Multiple traps with the same condition and anchor can be merged:

\begin{figure}[h]
\centering
\begin{digraphenv}{scale=0.5}{trap3}
\datalabel{trap:in2}{cmpnull}
\datalabel{load2:in1}{trap}
\datalabel{load1:in1}{trap}
\datalabel{trap:in1}{split2}
\end{digraphenv}
\caption{Two loads guarded by the same trap.}
\label{fig:trap3}
\end{figure}

Also, if two traps that are anchored to the true and false branches of the same \texttt{If} node have the same condition, they can be merged, so that the resulting trap is anchored at the most distant node of which the \texttt{If} node is a postdominator.
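
The merging rule can be sketched as keeping one representative trap per (condition, anchor) pair; the \texttt{Trap} class and the keying below are illustrative assumptions, not the actual Graal code:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of trap merging as described above. The Trap class and the
// (condition, anchor) keying are assumptions for illustration.
class Trap {
    final Object condition; // the conditional expression guarding deoptimization
    final Object anchor;    // the node the trap is anchored to
    Trap(Object condition, Object anchor) {
        this.condition = condition;
        this.anchor = anchor;
    }
}

class TrapMerging {
    /** Keeps the first trap seen for each (condition, anchor) pair. */
    static List<Trap> merge(List<Trap> traps) {
        Map<SimpleEntry<Object, Object>, Trap> byKey = new LinkedHashMap<>();
        for (Trap t : traps) {
            byKey.putIfAbsent(new SimpleEntry<>(t.condition, t.anchor), t);
            // in the compiler, a dropped trap would redirect its usages
            // to the surviving representative
        }
        return new ArrayList<>(byKey.values());
    }
}
```

For the true/false-branch case above, the two traps would first be re-anchored to the common node of which the \texttt{If} node is a postdominator; they then share both condition and anchor and collapse under the same rule.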

\section{Conclusions}
This document sketched the strategy for the Graal compiler.
We have already reached M1 (as defined in Section~\ref{sec:mile}) and have the following plans for M2 to M4:
\begin{description}
\item[M2:] June 30th, 2011
\item[M3:] August 15th, 2011
\item[M4:] September 30th, 2011
\end{description}
After reaching M4, we want to create a new project road map that further improves the Graal compiler with respect to its two main goals: modularity and peak performance.

\end{document}