comparison doc/design/graal_compiler.tex @ 2562:0023bd42eefe

comments
author christian.wimmer@oracle.com
date Fri, 29 Apr 2011 11:14:05 -0700
parents 765dd54244a6
children 7aa67f5f3884 28a8b3c8b9b4
comparison
equal deleted inserted replaced
2561:765dd54244a6 2562:0023bd42eefe
15 15
16 \newcommand{\Sa}{{\Large$^*$}} 16 \newcommand{\Sa}{{\Large$^*$}}
17 \newcommand{\Sb}{{\Large$^\dag$}} 17 \newcommand{\Sb}{{\Large$^\dag$}}
18 \newcommand{\Sc}{{\Large$^\S$}} 18 \newcommand{\Sc}{{\Large$^\S$}}
19 19
20
21 \newcommand{\mynote}[2]{
22 \textcolor{red}{\fbox{\bfseries\sffamily\scriptsize#1}
23 {\small\textsf{\emph{#2}}}
24 \fbox{\bfseries\sffamily\scriptsize }}}
25
26 \newcommand\TODO[1]{\mynote{TODO}{#1}}
27 \newcommand\cw[1]{\mynote{CW}{#1}}
28
29
30
20 \smartqed % flush right qed marks, e.g. at end of proof 31 \smartqed % flush right qed marks, e.g. at end of proof
21 32
22 \journalname{Graal Compiler Design} 33 \journalname{Graal Compiler Design}
23 \def\makeheadbox{{% 34 \def\makeheadbox{{%
24 \hbox to0pt{\vbox{\baselineskip=10dd\hrule\hbox 35 \hbox to0pt{\vbox{\baselineskip=10dd\hrule\hbox
60 71
61 \section{Design} 72 \section{Design}
62 For the implementation of the Graal compiler, we rely on the following design decisions: 73 For the implementation of the Graal compiler, we rely on the following design decisions:
63 \begin{description} 74 \begin{description}
64 \item[Graph Representation:] 75 \item[Graph Representation:]
65 The compiler's intermediate representation is modelled as a graph with nodes that are connected with directed edges. 76 The compiler's intermediate representation is modeled as a graph with nodes that are connected with directed edges.
66 There is only a single node base class and every node has an associated graph object that does not change during the node's lifetime. 77 There is only a single node base class and every node has an associated graph object that does not change during the node's lifetime.
67 Every node is serializable and has an id that is unique within its graph. 78 Every node is serializable and has an id that is unique within its graph.
68 Every edge is classified as either a control flow edge (anti-dependency) or a data flow edge (dependency) and represented as a simple pointer from the source node to the target node. 79 Every edge is classified as either a control flow edge (anti-dependency) or a data flow edge (dependency) and represented as a simple pointer from the source node to the target node.
69 There is no cycle in the graph that contains only control flow edges or only data flow edges. 80 There is no cycle in the graph that contains only control flow edges or only data flow edges. \cw{What does that sentence mean? I can certainly think of a loop that has a control-flow cycle, but no data-flow cycle.}
70 It is possible to replace a node with another node without traversing the full graph. 81 It is possible to replace a node with another node without traversing the full graph.
71 \item[Extensibility:] 82 \item[Extensibility:]
72 The compiler is extensible by adding new compiler phases and new node subclasses without modifying the compiler's sources. 83 The compiler is extensible by adding new compiler phases and new node subclasses without modifying the compiler's sources.
73 A node has an abstract way of expressing its effect and new compiler phases can ask compiler nodes for their properties and capabilities. 84 A node has an abstract way of expressing its effect and new compiler phases can ask compiler nodes for their properties and capabilities.
85 \cw{Add: We use the ``everything is an extension'' concept. Even standard compiler optimizations are internally modeled as extensions, to show that the extension mechanism exposes all necessary functionality.}
74 \item[Detailing:] 86 \item[Detailing:]
75 The compilation starts with a graph that contains nodes that represent the operations of the source language (e.g., one node for an array store to an object array). 87 The compilation starts with a graph that contains nodes that represent the operations of the source language (e.g., one node for an array store to an object array).
76 During the compilation, the nodes are replaced with more detailed nodes (e.g., the array store node is split into a null check, a bounds check, a store check, and a memory access). 88 During the compilation, the nodes are replaced with more detailed nodes (e.g., the array store node is split into a null check, a bounds check, a store check, and a memory access).
77 Compiler phases can choose whether they want to work on the earlier versions of the graph (e.g., escape analysis) or on later versions (e.g., null check elimination). 89 Compiler phases can choose whether they want to work on the earlier versions of the graph (e.g., escape analysis) or on later versions (e.g., null check elimination).
90 \cw{In general, I agree that the lowering should happen without changing the style of IR. However, I don't agree that optimizations such as null check elimination should work on a lower level graph. Isn't it bette to model ``needs null check'' as a capability of high-level instructions? Then the eliminator just sets a property that no null check is necessary. But that is a good discussion point: How much optimization do we want to do by augmenting a high-level IR, and how much do we want to do by rewriting a low-level IR.}
78 \item[Generality:] 91 \item[Generality:]
79 The compiler does not require Java as its input. 92 The compiler does not require Java as its input.
80 This is achieved by having a graph as the starting point of the compilation and not a Java bytecodes array. 93 This is achieved by having a graph as the starting point of the compilation and not a Java bytecodes array.
81 Building the graph from the Java bytecodes must happen before giving a method to the compiler. 94 Building the graph from the Java bytecodes must happen before giving a method to the compiler.
82 This enables front-ends for different languages (e.g., Ruby) to provide their own graph. 95 This enables front-ends for different languages (e.g., Ruby) to provide their own graph.
83 Also, there is no dependency on a specific back-end, but the output of the compiler is a graph that can then be converted to a different representation in a final compiler phase. 96 Also, there is no dependency on a specific back-end, but the output of the compiler is a graph that can then be converted to a different representation in a final compiler phase.
97 \cw{Here we are getting into the nits of terminology. I think the term ``compiler'' should always refer to the whole system that goes from bytecodes to machine code. Yes, there will be additional parsers for different bytecode formats. But still, the compiler doesn't have graphs as input and outputs, but still bytecodes and machine code, respectively.}
84 \end{description} 98 \end{description}
85 99
86 \section{Milestones} 100 \section{Milestones}
87 The Graal compiler is developed starting from the current C1X source code base. 101 The Graal compiler is developed starting from the current C1X source code base.
88 This helps us testing the compiler at every intermediate development step on a variety of Java benchmarks. 102 This helps us testing the compiler at every intermediate development step on a variety of Java benchmarks.
92 \item[M2:] We modified the high-level intermediate representation to be based on the Graal compiler graph data structure. 106 \item[M2:] We modified the high-level intermediate representation to be based on the Graal compiler graph data structure.
93 \item[M3:] We have reimplemented and reenabled compiler optimizations in the Graal compiler that previously existed in C1X. 107 \item[M3:] We have reimplemented and reenabled compiler optimizations in the Graal compiler that previously existed in C1X.
94 \item[M4:] We have reintegrated the new Graal compiler into the Maxine VM and can use it as a Maxine VM bootstrapping compiler. 108 \item[M4:] We have reintegrated the new Graal compiler into the Maxine VM and can use it as a Maxine VM bootstrapping compiler.
95 \end{description} 109 \end{description}
96 110
111 \cw{That's a very coarse (not to say useless) listing, sound a bit like the generic ``define problem - think hard about it - publish it''...}
112
113 \cw{Mario wants a timeline. You better think about that carefully, so that you can keep the timeline. Mario doesn't want to repeat the C1X experience where at first it should take only 2 months, but it finally takes 2 years. Take that as a confidential hint from me...}
114
97 After those four milestones, we see three different possible further development directions that can be followed in parallel: 115 After those four milestones, we see three different possible further development directions that can be followed in parallel:
98 \begin{itemize} 116 \begin{itemize}
99 \item Removal of the XIR template mechanism and replacement with a snippet mechanism that works with the Graal compiler graph. 117 \item Removal of the XIR template mechanism and replacement with a snippet mechanism that works with the Graal compiler graph.
100 \item Improvements for Graal peak performance (loop optimizations, escape analysis, bounds check elimination, processing additional interpreter runtime feedback). 118 \item Improvements for Graal peak performance (loop optimizations, escape analysis, bounds check elimination, processing additional interpreter runtime feedback).
101 \item Implementation of a prototype front-end for a scripting language (e.g., Ruby). 119 \item Implementation of a prototype front-end for different languages, e.g., JavaScript.
102 \end{itemize} 120 \end{itemize}
103 121
104 \section{Implementation} 122 \section{Implementation}
105 123
106 124
107 \subsection{Project Source Structure} 125 \subsection{Project Source Structure}
108 In order to have clear interfaces between the different parts of the compiler, the code will be divided into the following source code projects: 126 In order to have clear interfaces between the different parts of the compiler, the code will be divided into the following source code projects:
127 \cw{Use new naming scheme com.oracle.graal...}
109 \begin{description} 128 \begin{description}
110 \item[Graph] contains the abstract node implementation, the graph implementation and all the associated tools and auxiliary classes. 129 \item[Graph] contains the abstract node implementation, the graph implementation and all the associated tools and auxiliary classes.
111 \item[Nodes] contains the node implementations, ranging from high-level to machine-level nodes. 130 \item[Nodes] contains the node implementations, ranging from high-level to machine-level nodes. \cw{Can't we just stay with the name ``instruction'', which everyone understands, instead of ``Node''? I strongly vote for that.}
112 \item[GraphBuilder] contains helpers for building graphs from java bytecodes and other source representations. 131 \item[GraphBuilder] contains helpers for building graphs from Java bytecodes and other source representations.
113 \item[Assembler] contains the assembler classes that are used to generate the compiled code of methods and stubs. 132 \item[Assembler] contains the assembler classes that are used to generate the compiled code of methods and stubs.
114 \item[Optimizations] contains all the optimizations, along with different optimization plans. 133 \item[Optimizations] contains all the optimizations, along with different optimization plans.
115 \item[GraalCompiler] contains the compiler, including: 134 \item[GraalCompiler] contains the compiler, including:
116 \begin{itemize} 135 \begin{itemize}
117 \item Handling of compilation phases. 136 \item Handling of compilation phases.
120 \item Register allocation. 139 \item Register allocation.
121 \item Machine code creation, including debug info. 140 \item Machine code creation, including debug info.
122 \item Debug output and compilation observation. 141 \item Debug output and compilation observation.
123 \item Compiler options management. 142 \item Compiler options management.
124 \end{itemize} 143 \end{itemize}
144 \cw{So you want to keep the backend as part of the ``main compiler'' at first. Seems OK for me.}
125 \end{description} 145 \end{description}
126 146
127 \subsection{Initial Steps} 147 \subsection{Initial Steps}
128 \begin{itemize} 148 \begin{itemize}
129 \item Restructuring of the project to include the compiler and the modified HotSpot code within one repository. The CRI project will remain in the Maxine repository, because it will remain mostly unchanged. 149 \item Restructuring of the project to include the compiler and the modified HotSpot code within one repository. The CRI project will remain in the Maxine repository, because it will remain mostly unchanged.
131 \item Creating Node and Graph classes, along with the necessary auxiliary classes. 151 \item Creating Node and Graph classes, along with the necessary auxiliary classes.
132 \item Writing documentation on the design of the compiler. 152 \item Writing documentation on the design of the compiler.
133 \item Use the Node class as the superclass of the existing Value class. 153 \item Use the Node class as the superclass of the existing Value class.
134 \item Identify (and later: remove) extended bytecodes. 154 \item Identify (and later: remove) extended bytecodes.
135 \item Implement the new frame state concept. 155 \item Implement the new frame state concept.
136 \item Remove LIR - in the long run there should only be one IR, which will be continually lowered until only nodes that can be translated into machine code remain. 156 \item Remove LIR - in the long run there should only be one IR, which will be continually lowered until only nodes that can be translated into machine code remain. \cw{That cannot be an initial step, because you have nothing yet that could replace the LIR.}
137 \end{itemize} 157 \end{itemize}
138 158
139 \subsection{Nodes and Graphs} 159 \subsection{Nodes and Graphs}
140 The most important aspect of a compiler is the data structure that holds information about an executable piece of code, called \emph{intermediate representation}~(IR). 160 The most important aspect of a compiler is the data structure that holds information about an executable piece of code, called \emph{intermediate representation}~(IR).
141 The IR used in the Graal Compiler (simply refered to as \emph{the compiler} in the rest of this document) was designed in such a way as to allow for extensive optimizations, easy traversal, compact storage and efficient processing. 161 The IR used in the Graal Compiler (simply referred to as \emph{the compiler} in the rest of this document) was designed in such a way as to allow for extensive optimizations, easy traversal, compact storage and efficient processing.
142 162
143 \subsubsection{The Node Data Structure} 163 \subsubsection{The Node Data Structure}
144 \begin{itemize} 164 \begin{itemize}
145 \item Each node is always associated with a graph. 165 \item Each node is always associated with a graph.
146 \item Each node has an immutable id which is unique within its associated graph. 166 \item Each node has an immutable id which is unique within its associated graph. \cw{The server compiler supports ``renumbering'' of nodes to make the ids dense again after large graph manipulations that deleted many nodes.}
147 \item Nodes represent either operations on values or control flow operations. 167 \item Nodes represent either operations on values or control flow operations.
148 \item Nodes can have a data dependency, which means that one node requires the result of some other node as its input. The fact that the result of the first node needs to be computed before the second node can be executed introduces a partial order to the set of nodes. 168 \item Nodes can have a data dependency, which means that one node requires the result of some other node as its input. The fact that the result of the first node needs to be computed before the second node can be executed introduces a partial order to the set of nodes.
149 \item Nodes can have a control flow dependency, which means that the execution of one node depends on some other node. This includes conditional execution, memory access serialization and other reasons, and again introduces a partial order to the set of nodes. 169 \item Nodes can have a control flow dependency, which means that the execution of one node depends on some other node. This includes conditional execution, memory access serialization and other reasons, and again introduces a partial order to the set of nodes.
150 \item Nodes can only have data and control dependencies to nodes which belong to the same graph. 170 \item Nodes can only have data and control dependencies to nodes which belong to the same graph.
151 \item Control dependencies and data dependencies each represent a \emph{directed acyclic graph} (DAG) on the same set of nodes. This means that data dependencies always point upwards, and control dependencies always point downwards. Situations that are normally incurr cycles (like loops) are represented by special nodes (like LoopEnd). 171 \item Control dependencies and data dependencies each represent a \emph{directed acyclic graph} (DAG) on the same set of nodes. This means that data dependencies always point upwards, and control dependencies always point downwards. Situations that are normally incur cycles (like loops) are represented by special nodes (like LoopEnd).
152 \item Ordering between nodes is specified only to the extent which is required to correctly express the semantics of a given program. Some compilers (notably the HotSpot client compiler) always maintain a complete order for all nodes (called \emph{scheduling}), which impedes advanced optimizations. For algorithms that require a fixed ordering of nodes, a temporary schedule can always be generated. 172 \cw{I don't like that item. Cycles are a normal thing for control flow and for phi functions. I would phrase it as something like that: Nodes can only have data and control dependencies to nodes that are dominators. The only exception of that are control loop headers and phi functions}
173 \item Ordering between nodes is specified only to the extent which is required to correctly express the semantics of a given program. Some compilers (notably the HotSpot client compiler \cw{Wrong: the client compiler only has a fixed order of pinned instructions, most instructions are not pinned and can be moved around freely}) always maintain a complete order for all nodes (called \emph{scheduling}), which impedes advanced optimizations. For algorithms that require a fixed ordering of nodes, a temporary schedule can always be generated.
153 \item Both data and control dependencies can be traversed in both directions, so that each node can be traversed in four directions: 174 \item Both data and control dependencies can be traversed in both directions, so that each node can be traversed in four directions:
154 \begin{itemize} 175 \begin{itemize}
155 \item \emph{inputs} are all nodes that this node has data dependencies on. 176 \item \emph{inputs} are all nodes that this node has data dependencies on.
156 \item \emph{usages} are all nodes that have data dependencies on this node, this is regarded as the inverse of inputs. 177 \item \emph{usages} are all nodes that have data dependencies on this node, this is regarded as the inverse of inputs.
157 \item \emph{successors} are all nodes that have a control dependency on this node. 178 \item \emph{successors} are all nodes that have a control dependency on this node.
158 \item \emph{predecessors} are all nodes that this node has control dependencies on, this is regarded as the inverse of successors. 179 \item \emph{predecessors} are all nodes that this node has control dependencies on, this is regarded as the inverse of successors.
159 \end{itemize} 180 \end{itemize}
160 \item Only inputs and successors can be changed, and changes to them will update the usages and predecessors. 181 \item Only inputs and successors can be changed, and changes to them will update the usages and predecessors.
161 \item The Node class needs to provide facilities for subclasses to perform actions upon cloning, dependency changes, etc. 182 \item The Node class needs to provide facilities for subclasses to perform actions upon cloning, dependency changes, etc.
162 \item Nodes cannot be reassigned to another graph, they are cloned instead 183 \item Nodes cannot be reassigned to another graph, they are cloned instead \cw{Why should there be the need for more than one graph when compiling a method?}
163 \end{itemize} 184 \end{itemize}
164 185
165 \subsubsection{The Graph Data Structure} 186 \subsubsection{The Graph Data Structure}
166 \begin{itemize} 187 \begin{itemize}
167 \item A graph deals out ids for new nodes and can be queried for the node corresponding to a given id. 188 \item A graph deals out ids for new nodes and can be queried for the node corresponding to a given id.
169 \item Graphs are 190 \item Graphs are
170 \end{itemize} 191 \end{itemize}
171 192
172 \subsection{Frame States} 193 \subsection{Frame States}
173 Frame states capture the state of the program, in terms of the source representation (e.g., Java state: local variables, expressions, ...). 194 Frame states capture the state of the program, in terms of the source representation (e.g., Java state: local variables, expressions, ...).
174 Whenever a safepoint is reached or the a deoptimization is needed a valid frame state needs to be available. 195 Whenever a safepoint is reached or \cwi{why is that an ``or'', both is basically the same} a deoptimization is needed a valid frame state needs to be available.
175 A frame state is valid as long as the program performs only actions that can safely be reexecuted (e.g., operations on local variables, etc.). 196 A frame state is valid as long as the program performs only actions that can safely be reexecuted (e.g., operations on local variables, etc.).
176 Thus, frame states need only be generated for bytecodes that can not be reexecuted: putfield, astore, invoke, monitorenter/exit, ... 197 Thus, frame states need only be generated for bytecodes that cannot be reexecuted: putfield, astore, invoke, monitorenter/exit, ...
177 198
178 Within the node graph a frame state is represented as a node that is fixed between the node that caused it to be generated (data dependency) and the node that will invalidate it (control dependency). 199 Within the node graph a frame state is represented as a node that is fixed between the node that caused it to be generated (data dependency) and the node that invalidates it (control dependency).
179 200
180 Deopmization nodes are not fixed to a certain frame state node, they can move around freely and will always use the correct frame state. 201 Deopmization nodes are not fixed to a certain frame state node, they can move around freely and will always use the correct frame state.
181 At some point during the compilation deoptimization nodes need to be fixed, which means that appropriate data and control dependencies will be inserted so that it can not move outside the scope of the associated frame state. 202 At some point during the compilation, deoptimization nodes need to be fixed, which means that appropriate data and control dependencies will be inserted so that it can not move outside the scope of the associated frame state.
182 This will generate deoptimization-free zones that can be targeted by the most aggressive optimizations. 203 This will generate deoptimization-free zones that can be targeted by the most aggressive optimizations.
183 204
184 Frame states should be represented using a delta-encoding. 205 Frame states should be represented using a delta-encoding.
185 This will make them significantly smaller and will make inlining, etc. much easier. 206 This will make them significantly smaller and will make inlining, etc. much easier.
186 In later compilation phases unnecessary frame states might be removed, using a mark-and-merge algorithm. 207 In later compilation phases unnecessary frame states might be removed, using a mark-and-merge algorithm.
188 209
189 210
190 \subsection{Graph Building} 211 \subsection{Graph Building}
191 \begin{itemize} 212 \begin{itemize}
192 \item The graph built by the initial parser (called \emph{GraphBuilder}) should be as close to the source representation (bytecode, ...) as possible. 213 \item The graph built by the initial parser (called \emph{GraphBuilder}) should be as close to the source representation (bytecode, ...) as possible.
193 \item All nodes should be able to immediately lower themselves to a machine-level representation. This allows for easier compiler development, and also leads to a compiler that is very flexible in the amount of optimizations it performs (e.g. recompilation of methods with more aggressive optimizations). 214 \item All nodes should be able to immediately lower themselves to a machine-level representation. \cw{What is that? You mean every node has x86 specific code that spits out machine code? Hope you are joking...} This allows for easier compiler development, and also leads to a compiler that is very flexible in the amount of optimizations it performs (e.g. recompilation of methods with more aggressive optimizations).
194 \item 215 \item
195 \end{itemize} 216 \end{itemize}
196 217
197 \subsection{Graphical Representation} 218 \subsection{Graphical Representation}
198 The graphs in this document use the following node layout: 219 The graphs in this document use the following node layout:
201 \node{node1}{nop} 222 \node{node1}{nop}
202 \nodebi{node2}{+} 223 \nodebi{node2}{+}
203 \nodetri{node3}{phi} 224 \nodetri{node3}{phi}
204 \nodesplit{node4}{if} 225 \nodesplit{node4}{if}
205 \end{digraphenv} 226 \end{digraphenv}
227
228 \cw{That doesn't compile with my latex. What do I have to do to get it working?}
206 229
207 Red arrows always represents control dependencies, while black arrows represent data dependencies: 230 Red arrows always represents control dependencies, while black arrows represent data dependencies:
208 231
209 \begin{digraphenv}{scale=0.5}{layout1} 232 \begin{digraphenv}{scale=0.5}{layout1}
210 \node{a}{a} 233 \node{a}{a}