
Originally posted on SO.

I have a very simple language that gets compiled to an Expression tree and then evaluated. Users can define mathematical operations, use variables, and use control flow. Scripts can also read an external input, which is fixed for a given evaluation (i.e., readInput() returns the same value every time it is called). For example, this is a valid script:

if readInput() < 10 then
    x = 2 * readInput()
else
    x = 0
end

My problem: I need to compute the maximum and minimum values of x for a given input domain, e.g., [0, 1000].

Solutions I have thought of:

  • evaluate the script with readInput() set to each boundary of the domain. This works in many cases but fails in many others; in the example above it gives x = 0 at both 0 and 1000, even though x is positive for inputs between 0 and 10

  • compute the max/min numerically, which would work in many cases but is computationally expensive and is not guaranteed to find the maximum, especially when there are discontinuities

  • somehow convert the tree and use a symbolic math library to compute the boundaries
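To make the failure of the first idea concrete, here is a small Python transcription of the example script. It is only a sketch: `run_script` is a hypothetical name, and it assumes readInput() returns an integer in [0, 1000]. Evaluating at the two boundaries gives 0 both times, while an exhaustive scan of the (small) integer domain finds the real range.

```python
def run_script(inp: int) -> int:
    """Hypothetical transcription of the example script; readInput() always returns inp."""
    if inp < 10:
        x = 2 * inp
    else:
        x = 0
    return x


# Idea 1: evaluate only at the boundaries of the domain [0, 1000].
boundary_values = [run_script(0), run_script(1000)]
print(boundary_values)  # [0, 0]

# Brute force over every integer input (only feasible because the domain is tiny).
all_values = [run_script(i) for i in range(0, 1001)]
print(min(all_values), max(all_values))  # 0 18
```

With real-valued inputs the supremum would be 20, approached as the input tends to 10 from below but never attained, which is exactly the kind of discontinuity that also trips up numerical search.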

Are there any other alternatives that I'm missing?

apocalypsis


The general solution is to use symbolic execution.

Let me start by explaining how to solve this if you have straight-line code, with no conditional statements (no ifs). First, for each program variable x (or intermediate value/expression), you introduce a mathematical variable $x$; if a program variable is assigned more than once, each assignment gets its own fresh mathematical variable. Second, for each line of code, you form a constraint that expresses how the value assigned by that line relates to the other variables in that line. For each readInput(), the corresponding mathematical variable is either left unconstrained or given lower and upper bounds based on the range of values it could receive. Finally, you conjoin all the constraints and ask a solver to find the maximum value of the final result subject to them.

For instance, consider the following straight-line code:

a = readInput()
b = 2*a + 3
c = readInput()
d = b/c + c/b

You introduce mathematical variables $a,b,c,d$, and then convert the above to the constraints

$$\begin{align*} 0 &\le a \le 1000\\ b &= 2a + 3\\ 0 &\le c \le 1000\\ d &= b/c + c/b \end{align*}$$
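The translation step is mechanical. A minimal Python sketch, assuming the straight-line program is given as a list of `(variable, right-hand side)` pairs and that each variable is assigned only once (`to_constraints`, `program`, and the string encoding are all hypothetical choices), might look like this. It emits the constraints as Python-syntax strings so that their conjunction can be checked against a candidate assignment with `eval`.

```python
from typing import List, Tuple


def to_constraints(program: List[Tuple[str, str]], lo: int, hi: int) -> List[str]:
    """Turn straight-line code into one constraint per line.

    Assumes each variable is assigned exactly once, so a program variable can be
    reused as its own mathematical variable.
    """
    constraints = []
    for target, rhs in program:
        if rhs == "readInput()":
            # An input is constrained only by the bounds of its domain.
            constraints.append(f"{lo} <= {target} <= {hi}")
        else:
            # Any other assignment becomes an equation between variables.
            constraints.append(f"{target} == {rhs}")
    return constraints


program = [("a", "readInput()"), ("b", "2*a + 3"), ("c", "readInput()"), ("d", "b/c + c/b")]
constraints = to_constraints(program, 0, 1000)
formula = " and ".join(f"({con})" for con in constraints)
print(formula)
# (0 <= a <= 1000) and (b == 2*a + 3) and (0 <= c <= 1000) and (d == b/c + c/b)

# The conjunction is a boolean predicate: a candidate assignment satisfies it or not.
print(eval(formula, {}, {"a": 1, "b": 5, "c": 5, "d": 2.0}))  # True
```

A solver's job is then to search over all assignments that make this predicate true, rather than checking one candidate at a time.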

Finally, you construct a boolean formula via the conjunction of these constraints,

$$(0 \le a \le 1000) \land (b = 2a+3) \land (0 \le c \le 1000) \land (d = b/c + c/b),$$

and you ask a solver to compute the maximum possible value of $d$, subject to this boolean formula. For instance, you could use the Z3 solver to compute this maximum.
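One caveat about this particular example: because $c$ may be $0$, there is no finite maximum of $d$ for a solver to return. From the constraints above,

$$\begin{align*} a \ge 0 &\implies b = 2a + 3 \ge 3,\\ 0 < c \le 1000 &\implies d = \frac{b}{c} + \frac{c}{b} > \frac{b}{c} \ge \frac{3}{c}, \end{align*}$$

and $3/c \to \infty$ as $c \to 0^+$, while at $c = 0$ the term $b/c$ is undefined. A finite maximum needs a strictly positive lower bound on $c$ (or a guard against $c = 0$ in the script).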

That only works for straight-line code. What about an expression tree with conditionals, such as the one in your question?

You can do something similar when there are conditional statements, but you need to introduce the notion of path constraints.

Pick any one path through the tree. It corresponds to a piece of straight-line code: the statements that are executed along this path. We can create a boolean formula $\varphi$ from this straight-line code, as above. We can also create a boolean formula $\psi$ from the conjunction of the conditions that are evaluated along this path through the tree. For instance, if at the top level we branch on a < 17 and this branch is taken, then we add $a<17$ to $\psi$. If at the next level we branch on a*b > 5 and this branch is not taken, then we add $ab \le 5$ to $\psi$. The conjunction of all of these, e.g., $(a<17) \land (ab \le 5)$, is $\psi$. Here $\psi$ is known as the path constraint: it encodes the conditions that must hold for execution to follow that path.

We then maximize the final value along this execution path, subject to the boolean constraint $\varphi \land \psi$. Again, this is a job for a solver; if $\varphi \land \psi$ is unsatisfiable, no input follows this path and it can be skipped.

Finally, we repeat this once per path (i.e., once per leaf of the tree) and take the largest value that is possible along any of the paths. The minimum is found the same way, by minimizing along each path and taking the smallest result.
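For the question's script the whole procedure fits in a few lines, provided the solver call is replaced by something much simpler. The Python sketch below assumes integer inputs in [0, 1000], writes each path's $\psi$ and $\varphi$ by hand, and swaps Z3 for a hypothetical helper `path_range` that is exact only because every path here is linear in a single input. It also computes the minimum, as the question asks, by minimizing along each path in the same way.

```python
from typing import Dict, List, Optional, Tuple

DOMAIN = (0, 1000)  # assumption: readInput() returns an integer in [0, 1000]

# The question's two paths: psi is written by hand as the integer interval of
# inputs r that follow the path, and phi as x = slope*r + intercept.
PATHS = [
    {"psi": (0, 9), "phi": (2, 0)},      # r < 10 taken:     x = 2*r
    {"psi": (10, 1000), "phi": (0, 0)},  # r < 10 not taken: x = 0
]


def path_range(psi: Tuple[int, int], phi: Tuple[int, int]) -> Optional[Tuple[int, int]]:
    """Exact (min, max) of x on one path, or None if no input in DOMAIN follows it.

    Stand-in for the solver call: a linear function of one variable attains its
    extremes at the interval endpoints, so this only works in that special case.
    """
    lo, hi = max(psi[0], DOMAIN[0]), min(psi[1], DOMAIN[1])
    if lo > hi:
        return None  # path constraint plus domain bounds is unsatisfiable
    slope, intercept = phi
    ends = (slope * lo + intercept, slope * hi + intercept)
    return min(ends), max(ends)


def overall_range(paths: List[Dict[str, Tuple[int, int]]]) -> Tuple[int, int]:
    """Combine per-path results: smallest minimum and largest maximum over feasible paths."""
    ranges = [path_range(p["psi"], p["phi"]) for p in paths]
    feasible = [r for r in ranges if r is not None]  # assumes at least one feasible path
    return min(r[0] for r in feasible), max(r[1] for r in feasible)


print(overall_range(PATHS))  # (0, 18)
```

With nonlinear expressions or several independent inputs, `path_range` would have to become an actual solver query over $\varphi \land \psi$.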

D.W.