comparison doc/design/graal_compiler.tex @ 2589:795df30f979d

Refer to "Graal compiler" as "the compiler" in the design document.
author Thomas Wuerthinger <thomas@wuerthinger.net>
date Thu, 05 May 2011 14:03:49 +0200
parents 26ecba0ead6d
children d559fac49699
\subtitle{Design and Strategy \\ \textcolor{red}{work in progress}}

\maketitle

\abstract{
The Graal compiler (simply referred to as \emph{the compiler} in the rest of this document) aims at improving C1X, the Java port of the HotSpot client compiler, both in terms of modularity and peak performance.
The compiler should work with the Maxine VM and the HotSpot VM.
This document contains information about the proposed design and strategy for developing the compiler.}

\section{Context}

In 2009, the Maxine team started creating C1X, a Java port of the HotSpot client compiler, and integrating it into the Maxine VM.
Part of this effort was the development of a clear and clean compiler-runtime interface that separates the compiler from the VM and enables the use of one compiler for multiple VMs.
In June 2010, we started integrating C1X into the HotSpot VM and called the resulting system Graal~VM.
Currently, the Graal~VM is fully functional and runs benchmarks (SciMark, DaCapo) at a similar speed to the HotSpot client compiler.

\section{Goals}
The compiler effort aims at rewriting the high-level intermediate representation of C1X with two main goals:
\begin{description}
\item[Modularity:] A modular design of the compiler should simplify the implementation of new languages, new back-ends, and new optimizations.
\item[Peak Performance:] A more powerful intermediate representation should enable the implementation of heavy-weight optimizations that impact the peak performance of the resulting machine code.
\end{description}

\section{Design}
For the implementation of the compiler, we rely on the following design decisions:
\begin{description}
\item[Graph Representation:]
The compiler's intermediate representation is modeled as a graph with nodes that are connected with directed edges.
There is only a single node base class and every node has an associated graph object that does not change during the node's lifetime.
Every node is serializable and has an id that is unique within its graph.
This enables front-ends for different languages (e.g., Ruby) to provide their own graph.
Also, there is no dependency on a specific back-end; the output of the compiler is a graph that can then be converted to a different representation in a final compiler phase.
\end{description}

\section{Milestones}
The compiler is developed starting from the current C1X source code base.
This helps us test the compiler at every intermediate development step on a variety of Java benchmarks.
We define the following development milestones and when they are considered achieved:
\begin{description}
\item[M1:] We have a fully working Graal~VM version with a stripped-down C1X compiler that does not perform any optimizations.
\item[M2:] We modified the high-level intermediate representation to be based on the compiler graph data structure.
\item[M3:] We have reimplemented and reenabled the compiler optimizations that previously existed in C1X.
\item[M4:] We have reintegrated the new compiler into the Maxine VM and can use it as a Maxine VM bootstrapping compiler.
\end{description}

After those four milestones, we see three different possible further development directions that can be followed in parallel:
\begin{itemize}
\item Removal of the XIR template mechanism and its replacement with a snippet mechanism that works with the compiler graph.
\item Improvements for peak performance (loop optimizations, escape analysis, bounds check elimination, processing additional interpreter runtime feedback).
\item Implementation of a prototype front-end for different languages, e.g., JavaScript.
\end{itemize}

\section{Implementation}

\subsection{Nodes and Graphs}
The most important aspect of a compiler is the data structure that holds information about an executable piece of code, called the \emph{intermediate representation}~(IR).
The IR used in the compiler was designed in such a way as to allow for extensive optimizations, easy traversal, compact storage, and efficient processing.

\subsubsection{The Graph Data Structure}
\begin{itemize}
\item A graph deals out ids for new nodes and can be queried for the node corresponding to a given id.
\item Graphs can manage side data structures, which are automatically invalidated and lazily recomputed whenever the graph changes. Examples of side data structures are dominator trees and temporary schedules. These side data structures will usually be understood by more than one optimization.
\end{itemize}

\subsubsection{The Node Data Structure}
\begin{itemize}
\item Each node is always associated with a graph.
\item Each node has an immutable id that is unique within its associated graph. \cw{The server compiler supports ``renumbering'' of nodes to make the ids dense again after large graph manipulations that deleted many nodes.}
\item Nodes represent either operations on values or control flow operations.
\item Nodes can have a data dependency, which means that one node requires the result of some other node as its input. The fact that the result of the first node needs to be computed before the second node can be executed introduces a partial order to the set of nodes.
\item Nodes can have a control flow dependency, which means that the execution of one node depends on some other node. This includes conditional execution, memory access serialization, and other reasons, and again introduces a partial order to the set of nodes.
\item Nodes can only have data and control dependencies to nodes that belong to the same graph.
\item Control dependencies and data dependencies each represent a \emph{directed acyclic graph} (DAG) on the same set of nodes. This means that data dependencies always point upwards and control dependencies always point downwards. Situations that would normally incur cycles (like loops) are represented by special nodes (like LoopEnd).
\cw{I don't like that item. Cycles are a normal thing for control flow and for phi functions. I would phrase it as something like that: Nodes can only have data and control dependencies to nodes that are dominators. The only exception of that are control loop headers and phi functions}
\item Ordering between nodes is specified only to the extent required to correctly express the semantics of a given program. Some compilers (notably the HotSpot client compiler \cw{Wrong: the client compiler only has a fixed order of pinned instructions, most instructions are not pinned and can be moved around freely}) always maintain a complete order for all nodes (called \emph{scheduling}), which impedes advanced optimizations. For algorithms that require a fixed ordering of nodes, a temporary schedule can always be generated.
\item Both data and control dependencies can be traversed in both directions, so that each node can be traversed in four directions:
\begin{itemize}
\item \emph{inputs} are all nodes that this node has data dependencies on.
\item \emph{usages} are all nodes that have data dependencies on this node; this is regarded as the inverse of inputs.
\item \emph{successors} are all nodes that have a control dependency on this node.
\item \emph{predecessors} are all nodes that this node has control dependencies on; this is regarded as the inverse of successors.
\end{itemize}
\item Only inputs and successors can be changed, and changes to them will update the usages and predecessors.
\item The Node class needs to provide facilities for subclasses to perform actions upon cloning, dependency changes, etc.
\item Nodes cannot be reassigned to another graph; they are cloned instead. \cw{Why should there be the need for more than one graph when compiling a method?}
\end{itemize}
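To make these invariants concrete, here is a minimal sketch in Java (all class and method names are hypothetical illustrations, not the compiler's actual API): only inputs are edited directly, and the inverse usages edges are maintained automatically.

```java
// Minimal sketch of the graph/node invariants described above.
// All names are hypothetical; the actual compiler classes differ.
import java.util.ArrayList;
import java.util.List;

class Graph {
    private final List<Node> nodes = new ArrayList<>();

    // The graph deals out ids for new nodes ...
    int register(Node n) {
        nodes.add(n);
        return nodes.size() - 1;
    }

    // ... and can be queried for the node corresponding to a given id.
    Node node(int id) {
        return nodes.get(id);
    }
}

class Node {
    final Graph graph; // fixed for the node's entire lifetime
    final int id;      // immutable and unique within the associated graph
    private final List<Node> inputs = new ArrayList<>(); // data dependencies (editable)
    private final List<Node> usages = new ArrayList<>(); // inverse of inputs (derived)

    Node(Graph graph) {
        this.graph = graph;
        this.id = graph.register(this);
    }

    // Only inputs (and, analogously, successors) may be changed directly;
    // the usages (and predecessors) on the other end are updated automatically.
    void addInput(Node input) {
        if (input.graph != this.graph) {
            throw new IllegalArgumentException("dependencies may not cross graphs");
        }
        inputs.add(input);
        input.usages.add(this);
    }

    List<Node> inputs() { return inputs; }
    List<Node> usages() { return usages; }
}
```

Control successors and predecessors would follow the same pattern, with a second pair of edge lists.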

\subsection{Loops}
\label{sec:loops}
Loops form a first-class construct in the IR that is expressed in specialized IR nodes during all optimization phases.
We only compile methods in which every loop has a single entry point.
\data{If1}{Compare}
\controllabel{If1:succ1}{LoopEnd}
\controllabel{BeforeLoop}{LoopBegin}
\controllabel{If1:succ2}{Exit}
\end{digraphenv}
\caption{Graph for a loop counting from 0 to n-1.}
\end{figure}

\subsection{Loop Counters}
The compiler is capable of recognizing variables that are only increased within a loop.
A potential overflow of such a variable is guarded with a trap before the loop.
\data{If1}{Compare}
\controllabel{If1:succ1}{LoopEnd}
\controllabel{BeforeLoop}{LoopBegin}
\controllabel{If1:succ2}{Exit}
\end{digraphenv}
\caption{Graph after loop counter transformation.}
\end{figure}
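The guard can be thought of as a single check, placed before the loop, that the counter's final value still fits the counter's type. The following helper is a hypothetical illustration of that condition, not the compiler's actual code:

```java
// Hypothetical illustration of the overflow guard for a recognized loop
// counter: a variable with initial value `init` that is only ever increased
// by a constant `stride` can overflow; the compiler traps once before the
// loop instead of checking on every iteration.
class LoopCounterGuard {
    // Final value the counter would reach, computed in wider (long) arithmetic.
    static long lastCounterValue(long init, long stride, long iterations) {
        return init + stride * (iterations - 1);
    }

    // The guard holds if the final value still fits into an int; otherwise
    // the trap fires and execution falls back to the interpreter.
    static boolean guardHolds(long init, long stride, long iterations) {
        long last = lastCounterValue(init, stride, iterations);
        return last >= Integer.MIN_VALUE && last <= Integer.MAX_VALUE;
    }
}
```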

\subsection{Bounded Loops}

If the total maximum number of iterations of a loop is fixed, then the loop is converted into a bounded loop.
\datalabeltext{LoopCounter:in2}{Constant0}{init}
\datalabeltext{LoopCounter:in3}{Constant1}{stride}
\data{LoopBegin}{n}
\controllabel{BeforeLoop}{LoopBegin}
\end{digraphenv}
\caption{Graph after bounded loop transformation.}
\end{figure}
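At the source level, a loop of the following (hypothetical) shape matches the figure: the counter starts at init 0, advances with stride 1, and the trip count n is fixed before the loop is entered:

```java
// Hypothetical source-level shape of a loop the compiler can turn into a
// bounded loop: the maximum number of iterations (n) is fixed on entry and
// the counter test is the only exit.
class BoundedLoopExample {
    static int[] fill(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) { // i: loop counter with init 0, stride 1
            a[i] = i;
        }
        return a;
    }
}
```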

\subsection{Vectorization}

If we now have a bounded loop with no additional loop exit and no associated phi nodes (only associated loop counters), we can vectorize the loop.
\datalabel{VectorAdd:in2}{Constant0}
\datalabel{VectorMul:in1}{VectorAdd}
\datalabel{VectorMul:in2}{Constant1}
\data{Vector}{n}
\end{digraphenv}
\caption{Graph after vectorization.}
\end{figure}
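A loop of the following (hypothetical) shape has no loop-carried phi values because each iteration depends only on the counter, so it corresponds to the VectorAdd and VectorMul nodes in the figure:

```java
// Hypothetical example of a vectorizable bounded loop: no additional loop
// exits and no phi nodes, only the loop counter i. Every iteration is
// independent, so the whole index vector can be processed at once.
class VectorizableLoopExample {
    static int[] compute(int n, int c0, int c1) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = (i + c0) * c1; // corresponds to VectorAdd then VectorMul
        }
        return a;
    }
}
```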

\subsection{Project Source Structure}
In order to support the goal of a modular compiler, the code will be divided into the following source code projects (as subprojects of \textbf{com.oracle.graal}).

\item Machine code creation, including debug info.
\end{itemize}
\cw{So you want to keep the backend as part of the ``main compiler'' at first. Seems OK for me.}
\end{description}


\subsection{Frame States}
Frame states capture the state of the program in terms of the source representation (e.g., the Java state: local variables, expressions, ...).
Whenever a safepoint is reached or \cw{why is that an ``or'', both is basically the same} \ls{uncommon traps can be introduced at other points, e.g., at an if branch that isn't compiled} a deoptimization is needed, a valid frame state needs to be available.
A frame state is valid as long as the program performs only actions that can safely be reexecuted (e.g., operations on local variables, loads, etc.).
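As a hypothetical Java illustration of this rule: in the method below, the load and the local arithmetic can safely be reexecuted, so the frame state captured on entry stays valid up to the field store, which is visible to other code and therefore requires a new frame state:

```java
// Hypothetical example of where a frame state stays valid. The first two
// statements can safely be reexecuted after deoptimization; the field store
// is an externally visible side effect and invalidates the earlier state.
class FrameStateExample {
    static class Holder { int field; }

    static int example(int[] data, Holder h) {
        int x = data[0]; // load: safely reexecutable, entry frame state remains valid
        int y = x + 1;   // operation on locals: safely reexecutable
        h.field = y;     // visible side effect: a new frame state must be
                         // captured after this point
        return y;
    }
}
```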