changeset 2588:26ecba0ead6d

Update on doc.
author Thomas Wuerthinger <thomas@wuerthinger.net>
date Thu, 05 May 2011 13:59:43 +0200
parents 70d8d239eb89
children 795df30f979d
files doc/design/graal_compiler.pdf doc/design/graal_compiler.tex
diffstat 2 files changed, 15 insertions(+), 30 deletions(-)
Binary file doc/design/graal_compiler.pdf has changed
--- a/doc/design/graal_compiler.tex	Thu May 05 13:24:34 2011 +0200
+++ b/doc/design/graal_compiler.tex	Thu May 05 13:59:43 2011 +0200
@@ -79,24 +79,24 @@
 There is only a single node base class and every node has an associated graph object that does not change during the node's lifetime.
 Every node is serializable and has an id that is unique within its graph.
 Every edge is classified as either a control flow edge (anti-dependency) or a data flow edge (dependency) and represented as a simple pointer from the source node to the target node.
-There is no cycle in the graph that contains only control flow edges or only data flow edges. \cw{What does that sentence mean?  I can certainly think of a loop that has a control-flow cycle, but no data-flow cycle.}
 It is possible to replace a node with another node without traversing the full graph.
+The graph does not allow data flow edge cycles or control flow edge cycles.
+We achieve this by explicitly modeling loops (see Section~\ref{sec:loops}). A minimal sketch of the node and edge structure described here is given after this list.
 \item[Extensibility:]
 The compiler is extensible by adding new compiler phases and new node subclasses without modifying the compiler's sources.
 A node has an abstract way of expressing its effect and new compiler phases can ask compiler nodes for their properties and capabilities.
-\cw{Add: We use the ``everything is an extension'' concept. Even standard compiler optimizations are internally modeled as extensions, to show that the extension mechanism exposes all necessary functionality.}
+We use the ``everything is an extension'' concept.
+Even standard compiler optimizations are internally modeled as extensions, to show that the extension mechanism exposes all necessary functionality.
 \item[Detailing:]
 The compilation starts with a graph that contains nodes that represent the operations of the source language (e.g., one node for an array store to an object array).
 During the compilation, the nodes are replaced with more detailed nodes (e.g., the array store node is split into a null check, a bounds check, a store check, and a memory access).
 Compiler phases can choose whether they want to work on the earlier versions of the graph (e.g., escape analysis) or on later versions (e.g., null check elimination).
-\cw{In general, I agree that the lowering should happen without changing the style of IR.  However, I don't agree that optimizations such as null check elimination should work on a lower level graph.  Isn't it bette to model ``needs null check'' as a capability of high-level instructions?  Then the eliminator just sets a property that no null check is necessary.  But that is a good discussion point: How much optimization do we want to do by augmenting a high-level IR, and how much do we want to do by rewriting a low-level IR.}
 \item[Generality:]
 The compiler does not require Java as its input.
 This is achieved by having a graph as the starting point of the compilation and not an array of Java bytecodes.
 Building the graph from the Java bytecodes must happen before giving a method to the compiler.
 This enables front-ends for different languages (e.g., Ruby) to provide their own graph.
 Also, there is no dependency on a specific back-end, but the output of the compiler is a graph that can then be converted to a different representation in a final compiler phase.
-\cw{Here we are getting into the nits of terminology.  I think the term ``compiler'' should always refer to the whole system that goes from bytecodes to machine code.  Yes, there will be additional parsers for different bytecode formats.  But still, the compiler doesn't have graphs as input and outputs, but still bytecodes and machine code, respectively.}
 \end{description}
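To make the data structure properties above concrete, the following is a minimal sketch, assuming a plain Java implementation; the class and member names (\texttt{Graph}, \texttt{Node}, \texttt{inputs}, \texttt{successors}, \texttt{usages}, \texttt{replaceAtUsages}) are illustrative only and are not taken from the actual sources.

\begin{verbatim}
// Illustrative sketch only, not the actual Graal classes.
import java.util.ArrayList;
import java.util.List;

class Graph {
    private int nextId = 0;
    final List<Node> nodes = new ArrayList<Node>();

    int register(Node n) {       // assigns an id that is unique within this graph
        nodes.add(n);
        return nextId++;
    }
}

abstract class Node {
    final Graph graph;           // does not change during the node's lifetime
    final int id;                // unique within the graph
    final List<Node> inputs = new ArrayList<Node>();     // data flow edges (dependencies)
    final List<Node> successors = new ArrayList<Node>(); // control flow edges (anti-dependencies)
    final List<Node> usages = new ArrayList<Node>();     // reverse data flow edges, tracked so a
                                                         // node can be replaced without a traversal
    Node(Graph graph) {
        this.graph = graph;
        this.id = graph.register(this);
    }

    void addInput(Node input) {
        inputs.add(input);
        input.usages.add(this);
    }

    // Replace this node by 'other' at all nodes that use it as an input.
    void replaceAtUsages(Node other) {
        for (Node usage : usages) {
            for (int i = 0; i < usage.inputs.size(); i++) {
                if (usage.inputs.get(i) == this) {
                    usage.inputs.set(i, other);
                }
            }
            other.usages.add(usage);
        }
        usages.clear();
    }
}
\end{verbatim}

Tracking usages (reverse data flow edges) is one possible way to obtain the stated property that a node can be replaced without traversing the full graph.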
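The ``Extensibility'' and ``Detailing'' items can likewise be illustrated with a hedged sketch, building on the classes above: a generic phase asks nodes for a capability and uses it to lower them. \texttt{Phase}, \texttt{Lowerable}, and \texttt{LoweringPhase} are invented names for illustration and are not part of the described design.

\begin{verbatim}
// Illustrative sketch only: a compiler phase modeled as an extension.
// 'Lowerable' is a hypothetical capability that a high-level node (e.g. an
// array store) could implement to expand itself into more detailed nodes
// (null check, bounds check, store check, memory access).
import java.util.ArrayList;

interface Phase {
    void run(Graph graph);
}

interface Lowerable {
    // Builds the detailed replacement nodes and returns the node
    // that takes this node's place.
    Node lower();
}

class LoweringPhase implements Phase {
    @Override
    public void run(Graph graph) {
        // Iterate over a snapshot, because lowering adds new nodes to the graph.
        for (Node node : new ArrayList<Node>(graph.nodes)) {
            if (node instanceof Lowerable) {
                Node replacement = ((Lowerable) node).lower();
                if (replacement != node) {
                    node.replaceAtUsages(replacement);
                }
            }
        }
    }
}
\end{verbatim}

In this style, a standard optimization is just another \texttt{Phase} implementation, which is one way to realize the ``everything is an extension'' concept.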
 
 \section{Milestones}
@@ -120,6 +120,7 @@
 \section{Implementation}
 
 \subsection{Loops}
+\label{sec:loops}
 Loops form a first-class construct in the IR that is expressed in specialized IR nodes during all optimization phases.
 We only compile methods with a control flow in which every loop has a single entry point.
 This entry point is a \nodename{LoopBegin} node.
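As a purely illustrative sketch of what explicit loop modeling could look like on top of the node classes sketched earlier: only \nodename{LoopBegin} is named in the text; the \texttt{LoopEndNode} class and the idea of expressing the back edge as a plain reference are assumptions made here to show how the control flow edges can stay acyclic.

\begin{verbatim}
// Illustrative sketch only, not the actual design.
class LoopBeginNode extends Node {       // the single entry point of a loop
    LoopBeginNode(Graph graph) {
        super(graph);
    }
}

class LoopEndNode extends Node {
    // Identifies the loop that this node closes; this is an ordinary reference,
    // not a control flow successor edge, so the control flow remains cycle-free.
    final LoopBeginNode loopBegin;

    LoopEndNode(Graph graph, LoopBeginNode loopBegin) {
        super(graph);
        this.loopBegin = loopBegin;
    }
}
\end{verbatim}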
@@ -293,39 +294,24 @@
 \end{figure}
 
 \subsection{Project Source Structure}
-In order to have clear interfaces between the different parts of the compiler, the code will be divided into the following source code projects:
-\cw{Use new naming scheme com.oracle.graal...}
+In order to support the goal of a modular compiler, the code will be divided into the following source code projects (as subprojects of \textbf{com.oracle.graal}).
+
 \begin{description}
-    \item[Graph] contains the abstract node implementation, the graph implementation and all the associated tools and auxiliary classes.
-    \item[Nodes] contains the node implementations, ranging from high-level to machine-level nodes. \cw{Can't we just stay with the name ``instruction'', which everyone understands, instead of ``Node''?  I strongly vote for that.}
-    \item[GraphBuilder] contains helpers for building graphs from Java bytecodes and other source representations.
-    \item[Assembler] contains the assembler classes that are used to generate the compiled code of methods and stubs.
-    \item[Optimizations] contains all the optimizations, along with different optimization plans.
-    \item[GraalCompiler] contains the compiler, including:
+    \item[graph] contains the abstract node implementation, the graph implementation and all the associated tools and auxiliary classes.
+    \item[nodes] contains the implementation of known basic nodes (e.g., phi nodes, control flow nodes, \ldots).
+                 Additional node classes should go into separate projects and be specializations of the known basic nodes.
+    \item[java] contains code for building graphs from Java bytecodes and Java-specific nodes.
+    \item[opt] contains optimizations such as global value numbering or conditional constant propagation.
+    \item[compiler] contains the compiler, including:
         \begin{itemize}
-            \item Handling of compilation phases.
-            \item Compilation-related data structures.
+            \item Scheduling of the compilation phases.
             \item Implementation of the \emph{compiler interface} (CI).
-            \item Register allocation.
+            \item Implementation of the final compilation phase that produces the low-level representation.
             \item Machine code creation, including debug info.
-            \item Debug output and compilation observation.
-            \item Compiler options management.
         \end{itemize}
 		\cw{So you want to keep the backend as part of the ``main compiler'' at first.  Seems OK for me.}
 \end{description}
 
-\subsection{Initial Steps}
-\begin{itemize}
-    \item Restructuring of the project to include the compiler and the modified HotSpot code within one repository. The CRI project will remain in the Maxine repository, because it will remain mostly unchanged.
-    \item Stripping optimizations from the existing compiler, they will be reimplemented later on using the new infrastructure.
-    \item Creating Node and Graph classes, along with the necessary auxiliary classes.
-    \item Writing documentation on the design of the compiler.
-    \item Use the Node class as the superclass of the existing Value class.
-    \item Identify (and later: remove) extended bytecodes.
-    \item Implement the new frame state concept.
-    \item Remove LIR - in the long run there should only be one IR, which will be continually lowered until only nodes that can be translated into machine code remain. \cw{That cannot be an initial step, because you have nothing yet that could replace the LIR.}
-\end{itemize}
-
 \subsection{Nodes and Graphs}
 The most important aspect of a compiler is the data structure that holds information about an executable piece of code, called \emph{intermediate representation}~(IR).
 The IR used in the Graal Compiler (simply referred to as \emph{the compiler} in the rest of this document) was designed in such a way as to allow for extensive optimizations, easy traversal, compact storage and efficient processing.
@@ -369,7 +355,6 @@
     \item Array stores: {\tt IASTORE, LASTORE, FASTORE, \\DASTORE, AASTORE, BASTORE, CASTORE, SASTORE}
     \item Field stores: {\tt PUTSTATIC, PUTFIELD}
     \item Method calls: {\tt INVOKEVIRTUAL, INVOKESPECIAL, \\INVOKESTATIC, INVOKEINTERFACE}
-    \item Memory allocations: {\tt NEW, NEWARRAY, ANEWARRAY, \\MULTIANEWARRAY}
     \item Synchronization: {\tt MONITORENTER, MONITOREXIT}
 \end{itemize}