view ProblemsIdeas.txt @ 3233:e3d3fd5b638a

Canonicalize Negate(Negate(x)) for int/long; remove incorrect canonicalization of FloatSub(0.0, x) to Negate(x)
author Gilles Duboscq <gilles.duboscq@oracle.com>
date Wed, 27 Jul 2011 11:53:37 +0200

Problems
========

* -Xcomp + synchronized method + unresolved type => deopt to bci -1
  Example : try "runfop.sh -G:-Inline -Xcomp"; it should fail on the compilation of org.apache.xmlgraphics.util.Service.providers(java.lang.Class, boolean)

  The first bytecode of the method is NEW (java.lang.StringBuffer). It will deopt because java.lang.StringBuffer is not resolved; the logic in append(FixedNode) will go back until the start node, detaching the MonitorEnter from the start node and attaching the Deopt instead.
  The Deopt will then get the FrameState generated by LIRGenerator in setOperandsForLocals, which has bci -1 for a synchronized method.

  Can the interpreter handle bci -1 for synchronized methods? If so, the corresponding assert in LIRGenerator.visitDeoptimize should be removed, as bci -1 would then only be used for synchronized entry points and not for exception handling or anything else.


* Deopt caused by profiling can interfere with later optimisations
  Example : have a look at some of the visit methods in avrora.arch.legacy.LegacyInterpreter. Sometimes the if constructs which are used to materialize some booleans have a Deopt branch which prevents us from detecting an opportunity for a MaterializeNode canonicalization. As long as the MaterializeNode itself doesn't get canonicalized away and in the end translates to jumps in the final assembly this doesn't really matter, but it may if we optimise the emitted assembly.
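The pattern can be sketched like this (the method names and bodies are invented for illustration, not taken from LegacyInterpreter): the first form materializes a boolean through a branch, which is where a profiling-inserted Deopt on the rarely-taken arm can block the canonicalization into the second form.

```java
// Illustration only: 'bitSet' mirrors the branchy boolean materialization
// seen in interpreter-style code; 'bitSetBranchless' is the canonical form
// the MaterializeNode canonicalization would like to reach. A Deopt branch
// inserted by profiling on one arm of the ternary breaks the pattern match.
class BooleanMaterialize {
    static boolean bitSet(int word, int bit) {
        // Branchy materialization: compare + jump + move of true/false.
        return ((word >>> bit) & 1) != 0 ? true : false;
    }

    static boolean bitSetBranchless(int word, int bit) {
        // Canonicalized form: the comparison result is used directly.
        return ((word >>> bit) & 1) != 0;
    }
}
```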

Ideas
=====

* Always inline 'specialization' methods
  Example :
  public void foo(int a) {
      foo(a, 1); // invoke 1
  }

  public void foo(int a, int b) {
      foo(a, b, false); // invoke 2
  }

  public void foo(int a, int b, boolean c) {
      // ...
  }
 
  Here invoke 1 and 2 should always be inlined regardless of the size of the inlined graph/method and without increasing the inlining depth.
  Specializations should always be inlined if we are in a trivial method, not only methods with a single invoke; we should also do it, for example, for methods that invoke and only do simple operations on parameters or result.
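A minimal sketch of that wider notion of "trivial" (the three-argument body and the method bar are invented): bar is not a pure forwarder, but it still only wraps one invoke in simple arithmetic on a parameter and the result, so it should qualify too.

```java
class Specialization {
    static int foo(int a) { return foo(a, 1); }               // invoke 1
    static int foo(int a, int b) { return foo(a, b, false); } // invoke 2
    static int foo(int a, int b, boolean c) {                 // the real work
        return c ? a - b : a + b;
    }

    // Trivial in the wider sense: one invoke plus simple arithmetic on
    // the parameter and the result; should still be inlined like 1 and 2.
    static int bar(int a) { return foo(a + 1) * 2; }
}
```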


* 'Computable' slots in FrameStates/debug info

  FrameStates/debug info will keep some values live for longer than they should, because the values are needed, for example, for a very unlikely Deopt; this increases the pressure on the register allocator.
  In the current system Deopt is the unlikely/slow path, so maybe we can make it a bit slower to avoid this problem and increase fast-path performance.
  An idea would be to insert an 'expression' in some slots instead of value references; the expression would use values that have to be live at this point for program-semantics reasons anyway.
  Expressions would then be evaluated by the runtime when the FrameState values are needed. Expression operators probably have to be kept simple (non-trapping arithmetic).
  This could be done by deopting to a deopt handler that fixes up the computable slots, which would probably require less runtime hacking. For example, an object containing the information necessary to fix up the FrameState could be inserted in one of the FrameState slots and then used by a deopt handler called in the same way as in the deopt handler example.
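A minimal sketch of the idea (all names here are invented; nothing is existing compiler API): the computable slot stores a small non-trapping expression over values that are live anyway, and the deopt handler evaluates it when the interpreter actually needs the slot, so the extra value never has to stay live on the fast path.

```java
// Sketch only: a computable FrameState slot holds an expression over
// already-live values instead of keeping an extra value alive.
interface SlotExpr {
    long eval(long[] liveValues); // evaluated lazily by the deopt handler
}

class ComputableSlots {
    // Example expression: liveValues[operandIndex] + constant (non-trapping).
    static SlotExpr addConst(int operandIndex, long constant) {
        return live -> live[operandIndex] + constant;
    }

    // Stand-in for the deopt handler fixing up a computable slot.
    static long fixupSlot(SlotExpr expr, long[] liveValues) {
        return expr.eval(liveValues);
    }
}
```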


* Profilable expressions

  Some optimizations may benefit from knowing if some assumption can be made. For example, some loop optimisations may want to know the likelihood of 2 values being aliased, so that they can tell whether inserting a deopt-if-aliased guard is really beneficial.
  This kind of information cannot always be derived just from branch probabilities, and it would be interesting to be able to ask the runtime to profile an expression; the simplest version would be to generate a boolean expression and get the probability of it being true.
  This requires going through the compiler and asking for further profiling. There are 2 main options here :
   - Reprofile the method in the interpreter with the new profiled expression (we may modify the method bytecode to insert a conditional branch on the expression we want to profile, thus using the classical branch-probability infrastructure)
   - Insert a profiling snippet in the compiled code and ask the runtime to recompile the method after N executions
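The second option could look roughly like this (the counters, the threshold and the recompile hook are all invented): a snippet compiled into the method counts how often the profiled boolean expression is true and would ask the runtime to recompile once enough samples are collected.

```java
class AliasProfile {
    static final long RECOMPILE_THRESHOLD = 10_000; // invented constant
    long executions, trueCount;

    // The profiled expression here is 'a == b' (are the two values aliased?).
    boolean profile(Object a, Object b) {
        boolean aliased = (a == b);
        executions++;
        if (aliased) trueCount++;
        if (executions == RECOMPILE_THRESHOLD) {
            // runtime.requestRecompile(method); // hypothetical runtime call
        }
        return aliased;
    }

    // The probability the compiler would query on recompilation.
    double probability() {
        return executions == 0 ? 0.0 : (double) trueCount / executions;
    }
}
```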


* Transform some contiguous array accesses into phis when possible
  Example : (most probably found in scientific code, for example relaxation code)
  
  for (int i = 1; i < A.length - 1; i++) {
     vm1 = A[i-1];
     v0  = A[i];
     vp1 = A[i+1];
     // ...
  }

  should be transformed into
  vm1 = A[0];
  v0 = A[1];
  for (int i = 1; i < A.length - 1; i++) {
     vp1 = A[i+1];
     // ...
     vm1 = v0;
     v0 = vp1;
  }

  This could be done in the context of a more advanced induction variable analysis, to be able to detect such access patterns. In this example we removed 2 array accesses (2 loads + 2 address computations + 2 bounds checks, if not hoisted) while only adding 2 moves (phis).
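A runnable check of the transformation (the loop body and array contents are stand-ins): both versions see the same (vm1, v0, vp1) triples, assuming A.length >= 2 so the rotated version's initial loads are safe, a guard the transformation would have to insert.

```java
class Rotation {
    // Original: three array accesses per iteration.
    static long original(int[] A) {
        long s = 0;
        for (int i = 1; i < A.length - 1; i++) {
            int vm1 = A[i - 1], v0 = A[i], vp1 = A[i + 1];
            s += vm1 + 2L * v0 + 3L * vp1; // stand-in for the loop body
        }
        return s;
    }

    // Rotated: one access per iteration; the two moves become phis in SSA.
    static long rotated(int[] A) {
        long s = 0;
        int vm1 = A[0], v0 = A[1]; // requires A.length >= 2
        for (int i = 1; i < A.length - 1; i++) {
            int vp1 = A[i + 1];
            s += vm1 + 2L * v0 + 3L * vp1;
            vm1 = v0;
            v0 = vp1;
        }
        return s;
    }
}
```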


* Implement array bounds check elimination

* Rematerialize only the nodes that were affected by GVN
  This will probably require something that tracks changes to the graph; the cost of such tracking should be evaluated.

* Hints on register pressure

  Sometimes we can make better decisions if we know the register pressure; it would be nice to have a way to know about it. Maybe once we have register allocation on SSA we can somehow interact with it and try to lower the pressure in some areas on request?