Math Expression Parser (2/2)

The parser developed in the first part Math Expression Parser (1/2) can be readily used as a simple calculator to evaluate math expressions. But lets consider the parser being used in a numerical algorithm where an expression containing one or more variables needs to be evaluated many times. The benchmarks presented in the following chapter will show, that there is a huge performance loss compared to a compiled math expression. So in this second post of the series, we will see how to optimize expressions and convert the postfix token array to IL, giving a performance comparable to compiled code.

Gitlab repositories:

wo80/math-parser (C# version)
wo80/math-parser-ts (Typescript version)

Benchmark

To test the performance of the parser, we will use the following code, which is a simple quadrature formula for a function in one argument.

public abstract class Benchmark
{
    /// <summary>
    /// Perform any one-time initialization needed to run the benchmark.
    /// </summary>
    public virtual void Initialize(string expression) { }

    /// <summary>
    /// Evaluate the expression. Must be overridden by the actual benchmark.
    /// </summary>
    public abstract double Evaluate(double x);

    public double Run(double a, double b, int n)
    {
        double step = (b - a) / n;

        double sum = 0.0;

        try
        {
            for (int i = 0; i <= n; i++)
            {
                sum += Evaluate(a + i * step);
            }
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
        }

        return step * sum;
    }
}

The arguments a and b of the Run function are the left and right endpoints of the integration interval and n is the number of integration steps. The function used for testing will be

\[-2 \exp(0.5) \ (x \cos(e^x) + \cos(\sqrt 2 ) \ \sin(2 \pi x^2)^2) \ / \ e^2\]

The Evaluate function for the parser benchmark looks like

public override double Evaluate(double x)
{
    expression.Variables["x"] = x;

    return expression.Evaluate();
}

The Evaluate function for the benchmark of the native code looks like

public override double Evaluate(double x)
{
    return -2.0 * Exp(0.5) * (x * Cos(Pow(E, x)) + Cos(Sqrt(2.0)) * Pow(Sin(2.0 * PI * x * x), 2)) / Pow(E, 2.0);
}

Now lets run the benchmark (results for a Ryzen 5 3600):

[...]
MathParser Evaluate: 733.0ms (sum = 0.16583)
Native: 125.1ms (sum = 0.16583)
5.9 MathParser Evaluate
1.0 Native

The parser is about 6x slower compared to the native code.

Optimization

One idea to improve performance is to scan the expression for sub-expressions which value is independent of any variable, for example products like 2*pi. Those can be precomputed instead of being evaluated each time the expression is evaluated for a new set of variable values. Lets take a look at the example expression:

-2 * exp(0.5) * (x*cos(e^x) + cos(sqrt(2)) * sin(2*pi*x^2)^2) / e^2
^^^^^^^^^^^^^                 ^^^^^^^^^^^^       ^^^^           ^^^
-2 * exp(0.5)  = 4 tokens
 cos(sqrt(2))  = 3 tokens
         2*pi  = 3 tokens
          e^2  = 3 tokens

So overall, with the given idea we can reduce the evaluation of 13 tokens (6 operators/functions, 7 operands) to 4 tokens (operands only). Note that we are counting the tokens of the postfix expression, so parentheses are already eleminated.

The implementation is straight forward and similar to the Evaluate() method: each time an operator or function is encountered, check if its operands are constant. If so, replace the sub-expression with the result of the operation/function call.

public Expression Simplify()
{
    var functions = Parser.Functions;

    var stack = new Stack<Token>();

    foreach (var token in expression)
    {
        if (token.Type == TokenType.Operand)
        {
            stack.Push(token);
        }
        else if (token.Type == TokenType.Operator)
        {
            if (Operator.IsBinaryOperator(token.Text))
            {
                var op2 = stack.Pop();
                var op1 = stack.Pop();

                if (op1.Type == TokenType.Operand && op2.Type == TokenType.Operand)
                {
                    var text = string.Format("[{0} {1} {2}]", op1.Text, op2.Text, token.Text);
                    var value = Operator.Evaluate(token.Text, op1.Value, op2.Value);

                    stack.Push(new Token(TokenType.Operand, text, op1.Position, value));
                }
                else
                {
                    stack.Push(op1);
                    stack.Push(op2);
                    stack.Push(token);
                }
            }
            else
            {
                var op = stack.Pop();

                if (op.Type == TokenType.Operand)
                {
                    var text = string.Format("[{0} {1}]", op.Text, token.Text);
                    var value = Operator.Evaluate(token.Text, op.Value);

                    stack.Push(new Token(TokenType.Operand, text, token.Position, value));
                }
                else
                {
                    stack.Push(op);
                    stack.Push(token);
                }
            }
        }
        else if (token.Type == TokenType.Function)
        {
            int order = functions.GetOrder(token.Text);

            if (order == 0)
            {
                var text = string.Format("[{0}]", token.Text);

                functions.TryGet(token.Text, out Func0 func);

                stack.Push(new Token(TokenType.Operand, text, token.Position, func()));
            }
            else if (order == 1)
            {
                var op = stack.Pop();

                if (op.Type == TokenType.Operand)
                {
                    var text = string.Format("[{0} {1}]", op.Text, token.Text);

                    functions.TryGet(token.Text, out Func1 func);

                    stack.Push(new Token(TokenType.Operand, text, token.Position, func(op.Value)));
                }
                else
                {
                    stack.Push(op);
                    stack.Push(token);
                }
            }
            else if (order == 2)
            {
                var op1 = stack.Pop();
                var op2 = stack.Pop();

                if (op1.Type == TokenType.Operand && op2.Type == TokenType.Operand)
                {
                    var text = string.Format("[{0} {1} {2}]", op1.Text, op2.Text, token.Text);

                    functions.TryGet(token.Text, out Func2 func);

                    stack.Push(new Token(TokenType.Operand, text, token.Position, func(op1.Value, op2.Value)));
                }
                else
                {
                    stack.Push(op1);
                    stack.Push(op2);
                    stack.Push(token);
                }
            }
        }
        else
        {
            stack.Push(token);
        }
    }

    expression = stack.Reverse().ToArray();

    return this;
}

Lets run the benchmark again:

[...]
MathParser Evaluate: 733.0ms (sum = 0.16583)
MathParser Evaluate (optimized): 484.2ms (sum = 0.16583)
Native: 125.1ms (sum = 0.16583)
5.9 MathParser Evaluate
3.9 MathParser Evaluate (optimized)
1.0 Native

We reduced the execution time by nearly 34%. That is a great improvement, but we are still about 4x slower than the native code. Lets see what we can do about this.

Compiling to IL

In general, we cannot expect an interpreted language to be as fast as a compiled one. This is why languages like Python or PHP offer Just-In-Time Compilation (JIT). Modern JavaScript engines will also try to use some kind of JIT to optimize performance. Luckily, in .NET there’s a simple way to compile expressions to IL using the System.Linq.Expressions namespace. Though we could also do it the hard way by using System.Reflection.Emit to get the best performance (see for example the ILCalc project), using LINQ expressions is much easier.

Note that since the class name Expression is already used by our parser, we use an alias LExpression for the LINQ expression class.

using LExpression = System.Linq.Expressions.Expression;

class ExpressionCompiler
{
    public static T Create<T>(Token[] expression, string[] variables)
        where T : Delegate
    {
        return Create<T>(expression, GetParams(variables)).Compile();
    }

    private static Expression<T> Create<T>(Token[] expression, Dictionary<string, ParameterExpression> parameters)
        where T : Delegate
    {
        var functions = Parser.Functions;

        var lambda = new Stack<LExpression>();

        foreach (var token in expression)
        {
            if (token.Type == TokenType.Operand)
            {
                lambda.Push(LExpression.Constant(token.Value, typeof(double)));
            }
            else if (token.Type == TokenType.Operator)
            {
                if (Operator.IsBinaryOperator(token.Text))
                {
                    var op2 = lambda.Pop();
                    var op1 = lambda.Pop();

                    lambda.Push(GetBinaryOperator(token.Text, op1, op2));
                }
                else if (Operator.IsUnaryMinus(token.Text))
                {
                    var op = lambda.Pop();
                    lambda.Push(LExpression.Negate(op));
                }
                else
                {
                    throw new Exception("Unexpected operator token.");
                }
            }
            else if (token.Type == TokenType.Function)
            {
                if (functions.TryGet(token.Text, out Func0 func0))
                {
                    var op = lambda.Pop();
                    lambda.Push(LExpression.Call(func0.GetMethodInfo()));
                }
                else if (functions.TryGet(token.Text, out Func1 func1))
                {
                    var op = lambda.Pop();
                    lambda.Push(LExpression.Call(func1.GetMethodInfo(), new LExpression[] { op }));
                }
                else if (functions.TryGet(token.Text, out Func2 func2))
                {
                    var op2 = lambda.Pop();
                    var op1 = lambda.Pop();
                    lambda.Push(LExpression.Call(func2.GetMethodInfo(), new LExpression[] { op1, op2 }));
                }
            }
            else if (token.Type == TokenType.Symbol)
            {
                if (parameters.TryGetValue(token.Text, out var variable))
                {
                    lambda.Push(variable);
                }
                else
                {
                    throw new Exception("Unknown symbol token.");
                }
            }
        }

        return LExpression.Lambda<T>(lambda.Pop(), parameters.Values);
    }

    private static Dictionary<string, ParameterExpression> GetParams(string[] variables)
    {
        Dictionary<string, ParameterExpression> parameters = new();

        foreach (var variable in variables)
        {
            parameters.Add(variable, LExpression.Parameter(typeof(double), variable));
        }

        return parameters;
    }

    private static LExpression GetBinaryOperator(string op, LExpression t1, LExpression t2)
    {
        switch (op)
        {
            case "+": return LExpression.Add(t1, t2);
            case "-": return LExpression.Subtract(t1, t2);
            case "*": return LExpression.Multiply(t1, t2);
            case "/": return LExpression.Divide(t1, t2);
            case "^": return LExpression.Power(t1, t2);
        }

        throw new Exception("Unknown operator.");
    }
}

Conclusion

Lets run the benchmark one last time:

[...]
MathParser Evaluate: 733.0ms (sum = 0.16583)
MathParser Evaluate (optimized): 484.2ms (sum = 0.16583)
MathParser Compile: 182.0ms (sum = 0.16583)
Native: 125.1ms (sum = 0.16583)
5.9 MathParser Evaluate
3.9 MathParser Evaluate (optimized)
1.4 MathParser Compile
1.0 Native

By compiling the expression to IL, we reduced the execution time by 76% compared to the original, non-optimized version and performance is now nearly as good as the native expression. Take a look at the unit tests to see how this feature may be used.

Please use the contact form or the GitLab issues section for comments.