comparison doc/design/graal_compiler.tex @ 2590:d559fac49699

More work on doc.
author Thomas Wuerthinger <thomas@wuerthinger.net>
date Thu, 05 May 2011 15:05:40 +0200
parents 795df30f979d
children dac9bcc6bd4a
\end{itemize}

\subsubsection{The Node Data Structure}
\begin{itemize}
\item Each node is always associated with a graph.
\item Each node has an immutable id which is unique within its associated graph.
\item Nodes represent either operations on values or control flow operations.
\item Nodes can have a data dependency, which means that one node requires the result of some other node as its input. The fact that the result of the first node needs to be computed before the second node can be executed introduces a partial order to the set of nodes.
\item Nodes can have a control flow dependency, which means that the execution of one node depends on some other node. This includes conditional execution, memory access serialization and other reasons, and again introduces a partial order to the set of nodes.
\item Nodes can only have data and control dependencies to nodes which belong to the same graph.
\item Control dependencies and data dependencies each represent a \emph{directed acyclic graph} (DAG) on the same set of nodes. This means that data dependencies always point upwards, and control dependencies always point downwards. Situations that would normally incur cycles (like loops) are represented by special nodes (like LoopEnd).
\item Ordering between nodes is specified only to the extent required to correctly express the semantics of a given program. Some compilers always maintain a complete order for all nodes (called \emph{scheduling}), which impedes advanced optimizations. For algorithms that require a fixed ordering of nodes, a temporary schedule can always be generated.
\item Both data and control dependencies can be traversed in both directions, so that each node can be traversed in four directions:
\begin{itemize}
\item \emph{inputs} are all nodes that this node has data dependencies on.
\item \emph{usages} are all nodes that have data dependencies on this node; this is the inverse of inputs.
\item \emph{successors} are all nodes that have a control dependency on this node.
\item \emph{predecessors} are all nodes that this node has control dependencies on; this is the inverse of successors.
\end{itemize}
\item Only inputs and successors can be changed, and changes to them will update the usages and predecessors.
\item The Node class needs to provide facilities for subclasses to perform actions upon cloning, dependency changes, etc.
\item Inlining should always be performed as a combination of two graphs.
\item Nodes cannot be reassigned to another graph; they are cloned instead.
\end{itemize}
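The invariants listed above can be illustrated with a short sketch (Python used here as executable pseudocode; the class and method names are illustrative, not the actual Graal API): inputs and successors are the only edge lists that are mutated directly, and mutating them keeps usages and predecessors in sync.

```python
class Graph:
    """Owns its nodes and hands out ids that are unique within the graph."""
    def __init__(self):
        self._next_id = 0
        self.nodes = []

    def register(self, node):
        node.id = self._next_id   # immutable id, unique within this graph
        self._next_id += 1
        self.nodes.append(node)

class Node:
    def __init__(self, graph):
        self.graph = graph        # each node is always associated with a graph
        self.inputs = []          # data dependencies of this node
        self.usages = []          # inverse of inputs
        self.successors = []      # nodes with a control dependency on this node
        self.predecessors = []    # inverse of successors
        graph.register(self)

    def add_input(self, other):
        # dependencies may only point to nodes of the same graph
        assert other.graph is self.graph
        self.inputs.append(other)
        other.usages.append(self)         # usages updated automatically

    def add_successor(self, other):
        assert other.graph is self.graph
        self.successors.append(other)
        other.predecessors.append(self)   # predecessors updated automatically
```

Only the forward edges are changed by clients; the reverse edges exist so that each node can be traversed cheaply in all four directions.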

\subsection{Loops}
\label{sec:loops}
Loops form a first-class construct in the IR that is expressed in specialized IR nodes during all optimization phases.
\caption{A simple loop with two exits.}
\end{figure}

\subsection{Loop Phis}
Data flow in loops is modelled with special phi nodes at the beginning and the end of the loop.
The \nodename{LoopEnd} node merges every value that flows into the next loop iteration in associated \nodename{LoopEndPhi} nodes.
A corresponding \nodename{LoopBeginPhi} node that is associated with the loop header has a control flow dependency on the \nodename{LoopEndPhi} node.
Figure \ref{fig:loop2} shows how a simple counting loop is modelled in the graph.
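The data flow of such a counting loop can be mimicked in a few lines (an illustrative sketch, not compiler code): the variable i_phi plays the role of the \nodename{LoopBeginPhi}, and the value assigned on the back edge corresponds to the \nodename{LoopEndPhi}.

```python
def run_counting_loop(n):
    # The LoopBeginPhi for i merges the entry value (0) with the
    # value flowing in over the back edge.
    i_phi = 0                # entry value of the LoopBeginPhi
    while i_phi < n:         # loop exit condition
        i_next = i_phi + 1   # value captured by the LoopEndPhi
        i_phi = i_next       # back edge: LoopEndPhi feeds the LoopBeginPhi
    return i_phi
```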

\begin{figure}[h]
\label{fig:loop2}
\item Schedules the compilation phases.
\item Implements the \emph{compiler interface} (CI).
\item Implements the final compilation phase that produces the low-level representation.
\item Creates the machine code, including debug info.
\end{itemize}
\end{description}


\subsection{Frame States}
A frame state captures the state of the program in terms of the Java bytecode specification (i.e., the values of the local variables, the operand stack, and the locked monitors).
Every deoptimization point needs a valid frame state.
A frame state is valid as long as the program performs only actions that can safely be reexecuted (e.g., operations on local variables, loads, etc.).
Thus, frame states need only be generated for bytecodes that cannot be reexecuted:

\begin{itemize}
\item Array stores: {\tt IASTORE, LASTORE, FASTORE, \\DASTORE, AASTORE, BASTORE, CASTORE, SASTORE}
\caption{Simple example using two frame states.}
\end{figure}

FrameStates also have data dependencies on the contents of the state: the local variables and the expression stack.
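As a rough model (the field names are assumptions for illustration, not the actual FrameState class), a frame state is a snapshot of exactly the components listed above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FrameState:
    bci: int           # bytecode index the state belongs to
    locals: tuple      # values of the local variable slots
    stack: tuple       # values on the operand stack
    locks: tuple = ()  # currently locked monitors

# A state captured before a non-reexecutable bytecode (e.g. an array
# store); it stays valid while only reexecutable actions follow it.
state = FrameState(bci=7, locals=(None, 42), stack=(3, 4))
```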

\subsection{Traps}
A trap node is a node that deoptimizes based on a conditional expression.
Trap nodes are not fixed to a certain frame state node; they can move around freely and will always use the correct frame state.
The node that is guarded by the deoptimization has a data dependency on the trap, and the trap in turn has a data dependency on the condition and on the most distant node that is postdominated by the guarded node.
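The dependency shape just described can be sketched as plain object references (illustrative Python, not the real node classes): the guarded node points to the trap, and the trap points to its anchor and its condition.

```python
class Node:
    def __init__(self, *inputs):
        self.inputs = list(inputs)   # data dependencies

class Trap(Node):
    """Deoptimizes if its condition fails; moves freely until fixed."""

class Load(Node):
    """A memory load; a guarding trap is one of its data inputs."""

# Shape of the null-check example below: the trap depends on the anchor
# (the control split) and on the null-check condition, and the guarded
# load depends on the trap.
split = Node()                # most distant node postdominated by the load
cmp_null = Node()             # condition: the receiver is non-null
trap = Trap(split, cmp_null)
load = Load(trap)
```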

\begin{figure}[h]
\label{fig:trap1}
\centering
\data{cmpnull}{o2}
\datalabel{trap:in2}{cmpnull}
\datalabel{load2:in1}{trap}
\datalabel{trap:in1}{split2}
\end{digraphenv}
\caption{In this example, the second load is guarded by a trap, because its receiver might be null (the receiver of the first load is assumed to be non-null).
The trap is anchored to the control split, because as soon as this node is executed the second load must be executed as well.
In the final scheduling the trap can be placed before or after the first load.}
\end{figure}

Another type of trap is a guard, which is used to remove branches that have a very low execution frequency from the compiled code.

\begin{figure}[h]
\label{fig:trap2}
\centering
\begin{digraphenv}{scale=0.5}{trap2}
\nodetrap{trap}{Trap}
\datalabel{trap:in1}{start2}
\datalabel{trap:in2}{start3}
\data{guard}{trap}
\end{digraphenv}
\caption{In this example an If was replaced by a guard and a trap.
The guard takes the place of the If in the control flow, and is connected to the trap node.
The trap is now anchored to the most distant node of which the If was a postdominator.}
\end{figure}

At some point during the compilation, trap nodes need to be fixed, which means that appropriate data and control dependencies will be inserted so that they cannot move outside the scope of the associated frame state.
This will generate deoptimization-free zones that can be targeted by the most aggressive optimizations.
A simple algorithm for this removal of FrameStates would be to move all traps as far upwards as possible.
\caption{Two loads guarded by the same Trap.}
\end{figure}

Also, if two Traps that are anchored to the true and false branch of the same If have the same condition, they can be merged, so that the resulting Trap is anchored at the most distant node of which the If is a postdominator.
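That merging rule can be sketched as follows (a hypothetical helper; the real implementation operates on graph nodes rather than the strings used here):

```python
from dataclasses import dataclass

@dataclass
class Trap:
    condition: str  # the condition the trap deoptimizes on
    anchor: str     # the node the trap is anchored to

def merge_traps(true_trap, false_trap, pre_if_anchor):
    """Merge two traps sitting on the two branches of the same If.

    If both share a condition, a single trap anchored above the If (at
    the most distant node of which the If is a postdominator) replaces
    them; otherwise nothing can be merged."""
    if true_trap.condition != false_trap.condition:
        return None
    return Trap(true_trap.condition, pre_if_anchor)
```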
\subsection{Graphical Representation}
The graphs in this document use the following node layout:

\begin{digraphenv}{scale=0.5}{graphrep1}
\node{node1}{nop}
\nodebi{node2}{+}
\nodetri{node3}{phi}
\nodesplit{node4}{if}
\end{digraphenv}

Red arrows always represent control dependencies, while black arrows represent data dependencies:

\begin{digraphenv}{scale=0.5}{graphrep2}
\node{a}{a}
\controllabel{if:succ2}{add}
\datalabel{add:in1}{a}
\datalabel{add:in2}{b}
\end{digraphenv}

\end{document}