Constructing a data structure for a computer algebra system

Question

In thinking about how to approach this problem, I think several things will be required, some trivial:

An expression tree where every non-leaf node is an operation (not sure if that part is redundant), but not every node has just two children.
All nodes for operations have a defined number of children that they must have: some operators are unary (like $!$), others are binary ($*, +, -$, etc.), and still others are n-ary ($f(a,b,d)$ and versions with different numbers of variables).
All leaf nodes are some type of number
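A minimal Python sketch of such a tree, assuming a hypothetical arity table `ARITY` (the operator names and arities are illustrative only): each node stores an operator and a tuple of children, and construction fails if the number of children does not match the operator's arity. Leaves are plain numbers.

```python
from dataclasses import dataclass

# Hypothetical arity table: operator name -> required number of children.
ARITY = {"!": 1, "neg": 1, "+": 2, "-": 2, "*": 2, "f": 3}


@dataclass(frozen=True)
class Node:
    op: str
    children: tuple  # each child is a Node or a number (leaf)

    def __post_init__(self):
        expected = ARITY.get(self.op)
        if expected is None:
            raise ValueError(f"unknown operator {self.op!r}")
        if len(self.children) != expected:
            raise ValueError(
                f"{self.op!r} needs {expected} children, got {len(self.children)}"
            )


def leaves(tree):
    """Yield the numeric leaves of a tree from left to right."""
    if isinstance(tree, Node):
        for child in tree.children:
            yield from leaves(child)
    else:
        yield tree


# f(a, b, d) with a = 3!, b = 1 + 2, d = 4
expr = Node("f", (Node("!", (3,)), Node("+", (1, 2)), 4))
print(list(leaves(expr)))  # [3, 1, 2, 4]
```

Enforcing the arity in the constructor means an incomplete tree can never exist, which is the property the next paragraphs rely on.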

I am under the impression that the tree should not explicitly retain information regarding the order of operations, but rather that information should be used in the parsing stage to insert things into the tree correctly.

This leads to the question: how should inserting at a specific position in the tree be done? Simply passing a list of directions (from the root, take child 0, then child 1, etc., then insert) will work, but it seems overly clunky.

Or should I avoid that situation entirely (I'm not talking about editing an equation here, just building a representation of one) by using the fact that, in some sense, the tree must be complete? All binary operations MUST have two children, etc., and even operators that seem ambiguous (the $-$ sign, for example, which can be unary or binary) have those ambiguities resolved before this point. That would allow me to insert "in order".
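Once every operator has a fixed arity, the "insert in order" idea works without any position bookkeeping: a flat token sequence in prefix (Polish) order determines the tree uniquely, because each operator just consumes the next `arity` complete subtrees. A minimal sketch, assuming tokens are already disambiguated (unary minus written as a hypothetical `neg` token) and using nested tuples `(op, *children)` as nodes:

```python
# Hypothetical arity table; "neg" is unary minus, "-" is binary subtraction.
ARITY = {"neg": 1, "!": 1, "+": 2, "-": 2, "*": 2}


def build_prefix(tokens):
    """Build a tree from prefix-order tokens, filling children left to right."""
    it = iter(tokens)

    def take():
        tok = next(it, None)
        if tok is None:
            raise ValueError("incomplete expression")
        if tok in ARITY:
            # An operator consumes exactly ARITY[tok] complete subtrees.
            return (tok, *[take() for _ in range(ARITY[tok])])
        return int(tok)  # leaves are numbers

    tree = take()
    if next(it, None) is not None:
        raise ValueError("extra tokens after a complete tree")
    return tree


# (2 - 3) * -(4) in prefix order
print(build_prefix(["*", "-", "2", "3", "neg", "4"]))
# ('*', ('-', 2, 3), ('neg', 4))
```

A parser that emits nodes in postfix order can do the same with a stack: push leaves, and have each operator pop its `arity` children.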

Am I taking a reasonable approach? Does it make no sense whatsoever?

Additionally, are there papers or articles that I should read about CAS systems?

Clarification: The tree will need to support three different compound operations.

Creation: (from a string, but how to actually do that is beyond the scope of this question)
Reduction: (to some type of canonical form) so that if $a+b$ and $b+a$ are both entered and reduced, they will form identical trees.
Evaluation: Be able to traverse the tree

These are all the operations that need to be supported. There are probably many more basic operations underneath them, but here it only matters that the three operations above are supported. My understanding is that search, for example, will not be required, but deletion (of a whole subtree) will be.
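Deleting a whole subtree can reuse the "list of directions" addressing from the question. A minimal sketch with immutable nested tuples `(op, *children)`, where a path lists 0-based child indices from the root; the helper names are hypothetical. Because a fixed-arity operator cannot simply lose a child, "deleting" here means replacing the subtree with a placeholder (`None`) or with another subtree.

```python
def get_at(tree, path):
    """Return the subtree reached by following child indices in `path`."""
    for i in path:
        tree = tree[i + 1]  # +1 skips the operator name at index 0
    return tree


def replace_at(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    op, *children = tree
    i, rest = path[0], path[1:]
    children[i] = replace_at(children[i], rest, new)
    return (op, *children)


tree = ("*", ("+", 1, 2), ("neg", 4))
print(get_at(tree, [0]))            # ('+', 1, 2)
print(replace_at(tree, [0], None))  # ('*', None, ('neg', 4))
print(replace_at(tree, [1, 0], 5))  # ('*', ('+', 1, 2), ('neg', 5))
```

Returning a new tree instead of mutating in place keeps shared subtrees safe, which matters once reduction starts reusing nodes.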

Gaslight Deceive Subvert · Answer 1 · 2022-07-24T15:01:53.793

The most straightforward approach for representing an algebraic expression is a parse tree. However, its big drawback is that it is difficult to reason about. That $xyz = zyx$ and $(xy)^2 = x^2y^2$ is not easy to infer using a normal parse tree. Instead, I suggest using this data structure:

$$ node \leftarrow sum | name | const\\ sum \leftarrow term_1, \ldots, term_n\\ term \leftarrow pow_1, \ldots, pow_n\\ pow \leftarrow (fun, fun)\\ fun \leftarrow (name, node_1, \ldots, node_n) $$

It is a tree that constrains the types of the nodes on each level. The sum's children are terms, the terms' children are factors (pow nodes), and each factor has a pair of functions as its children, serving as the base and the exponent.

To represent an expression, say $x + 2x + \sqrt 2\,x$, create the tree:

sum(term(pow(fun(id, x), fun(id, 1))),
    term(pow(fun(id, 2), fun(id, 1)),
         pow(fun(id, x), fun(id, 1))),
    term(pow(fun(id, 2), fun(id, 1/2)),
         pow(fun(id, x), fun(id, 1))))
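The same tree can be sketched in Python with one frozen dataclass per level, assuming leaves are either variable names (`str`) or rational constants (`fractions.Fraction`). The class and helper names (`Sum`, `Term`, `Pow`, `Fun`, `pw`, `show`) are hypothetical; `show` only exists to make the structure easy to check.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Tuple


@dataclass(frozen=True)
class Fun:
    name: str   # "id" is the identity function, as in the answer
    args: tuple  # children: a name (str), a constant (Fraction), or a Sum


@dataclass(frozen=True)
class Pow:
    base: Fun
    exp: Fun


@dataclass(frozen=True)
class Term:
    factors: Tuple[Pow, ...]


@dataclass(frozen=True)
class Sum:
    terms: Tuple[Term, ...]


def pw(base, exp=Fraction(1)):
    """Shorthand for pow(fun(id, base), fun(id, exp))."""
    return Pow(Fun("id", (base,)), Fun("id", (exp,)))


def show(node):
    """Render a node as plain text (no parenthesisation of nested sums)."""
    if isinstance(node, Sum):
        return " + ".join(show(t) for t in node.terms)
    if isinstance(node, Term):
        return "*".join(show(p) for p in node.factors)
    if isinstance(node, Pow):
        exp = show(node.exp)
        return show(node.base) if exp == "1" else f"{show(node.base)}^({exp})"
    if isinstance(node, Fun):
        inner = ", ".join(show(a) for a in node.args)
        return inner if node.name == "id" else f"{node.name}({inner})"
    return str(node)  # leaf: variable name or Fraction


tree = Sum((
    Term((pw("x"),)),
    Term((pw(Fraction(2)), pw("x"))),
    Term((pw(Fraction(2), Fraction(1, 2)), pw("x"))),
))
print(show(tree))  # x + 2*x + 2^(1/2)*x
```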

The tree is evaluated recursively. To evaluate the sum, create a mapping from the non-constant factors of each term to the sum of the constant parts (coefficients) of all terms with those factors. If a term has no constant factors, use term(pow(fun(id, 1), fun(id, 1))) as its coefficient:

{pow(fun(id, x), fun(id, 1)) : sum(
    term(pow(fun(id, 1), fun(id, 1))),
    term(pow(fun(id, 2), fun(id, 1))),
    term(pow(fun(id, 2), fun(id, 1/2)))
)}
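A minimal sketch of this grouping step, using a simplified stand-in for the typed tree (an assumption, not the answer's exact structure): a term is a tuple of `(base, exponent)` factors, where a `str` base is a variable and an `int` base is a constant.

```python
from collections import defaultdict
from fractions import Fraction

# Stand-in for term(pow(fun(id, 1), fun(id, 1))): the coefficient 1.
ONE = ((1, Fraction(1)),)


def is_constant(factor):
    base, _exp = factor
    return not isinstance(base, str)


def group_like_terms(terms):
    """Map each term's non-constant factors to the list of its coefficients."""
    groups = defaultdict(list)
    for term in terms:
        key = tuple(sorted(f for f in term if not is_constant(f)))
        coeff = tuple(f for f in term if is_constant(f)) or ONE
        groups[key].append(coeff)
    return dict(groups)


terms = [
    (("x", Fraction(1)),),                            # x
    ((2, Fraction(1)), ("x", Fraction(1))),           # 2x
    ((2, Fraction(1, 2)), ("x", Fraction(1))),        # 2^(1/2) x
]
groups = group_like_terms(terms)
```

Here `groups` has the single key `x^1`, mapped to the coefficients `1`, `2` and `2^(1/2)`, matching the mapping above. Sorting the key factors makes `x*y` and `y*x` land in the same group.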

Evaluate all the values. This is easy since they contain no variables:

{pow(fun(id, x), fun(id, 1)) : sum(
    term(pow(fun(id, 3), fun(id, 1))),
    term(pow(fun(id, 2), fun(id, 1/2)))
)}

Note that $1 + 2 + \sqrt 2 = 3 + \sqrt 2$, which is not a rational number, so it cannot be collapsed into a single constant. The next step is to produce a new sum by joining each key with its value:

sum(term(pow(fun(id, sum(term(pow(fun(id, 3), fun(id, 1))),
                         term(pow(fun(id, 2), fun(id, 1/2))))),
             fun(id, 1)),
         pow(fun(id, x), fun(id, 1))))

So the result is $(3 + \sqrt 2)x$. Optionally, you can distribute multiplication over addition to get $3x + \sqrt 2 x$.
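The evaluate-and-join steps can be sketched as follows, again with the simplified encoding (an assumption): each coefficient is a tuple of `(int base, Fraction exponent)` factors. Coefficients with only integer exponents are folded into one `Fraction`; anything else, such as `2^(1/2)`, is kept symbolic. The function names are hypothetical.

```python
from fractions import Fraction


def fold_coefficients(coeffs):
    """Add up coefficients: rational ones are folded, the rest are kept."""
    rational = Fraction(0)
    symbolic = []
    for coeff in coeffs:
        if all(exp.denominator == 1 for _base, exp in coeff):
            value = Fraction(1)
            for base, exp in coeff:
                value *= Fraction(base) ** int(exp)
            rational += value
        else:
            symbolic.append(coeff)
    return rational, symbolic


def show_factor(base, exp):
    return str(base) if exp == 1 else f"{base}^({exp})"


def join_entry(key, coeffs):
    """Rebuild (c1 + c2 + ...)*key from one entry of the grouping map."""
    rational, symbolic = fold_coefficients(coeffs)
    parts = [str(rational)] if rational != 0 else []
    parts += ["*".join(show_factor(b, e) for b, e in c) for c in symbolic]
    if not parts:
        return "0"
    var = "*".join(show_factor(b, e) for b, e in key)
    return f"({' + '.join(parts)})*{var}"


F = Fraction
key = (("x", F(1)),)
coeffs = [((1, F(1)),), ((2, F(1)),), ((2, F(1, 2)),)]
print(join_entry(key, coeffs))  # (3 + 2^(1/2))*x
```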

Evaluation of terms works similarly: you create a mapping from bases to the sums of their exponents. E.g. $x^2x^3$ would produce the mapping:

{fun(id, x) : sum(term(pow(fun(id, 2), fun(id, 1))),
                  term(pow(fun(id, 3), fun(id, 1))))}

Then you'd just evaluate the sum and create the new term $x^5$.
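A minimal sketch of the term step, assuming factors are `(base, exponent)` pairs with `Fraction` exponents (a simplification of the `pow` nodes; the function name is hypothetical):

```python
from collections import defaultdict
from fractions import Fraction


def collect_powers(factors):
    """Combine x^a * x^b into x^(a+b): map each base to the sum of its exponents."""
    exponents = defaultdict(Fraction)  # Fraction() == 0
    for base, exp in factors:
        exponents[base] += exp
    return dict(exponents)


print(collect_powers([("x", Fraction(2)), ("x", Fraction(3))]))
# {'x': Fraction(5, 1)}
```

In the full structure the exponents are themselves sums, so this step would hand them to the sum evaluation above.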

Nobody · Accepted Answer · 2012-06-01T11:59:11.663

From the comments the OP wrote, I understand he wants to start small. However, we need to think more generally so that the CAS to be developed can be of real use; otherwise it is nothing but another calculator. Please see Definition of Computer Algebra System.

My suggestion:

Numbers, elements, sets, operations, etc. are fundamental in algebra, and the data structures will have to be about them. In particular, you can start with elementary set theory: determine what will be an element, a set, etc. The elements should be generic (they can be anything); abstraction is the key. Then, what are the operations associated with them? Constructing a set, membership test, union, intersection, etc. Once these are in place, plugging number-manipulation packages into the system is a matter of instantiation.

score 2 · Answer 3 · answered May 30 '12 at 21:26

This looks like a classical application for Object-Oriented Programming (OOP), or more explicitly, Polymorphism.

You could create a basic object, e.g. treeObj with a method evaluate, and then generate sub-types for every object in your language, e.g. a plusOp object for the $+$ expression, the constructor of which takes two other treeObj instances as its left and right operands.

Every subclass of treeObj implements its own evaluate method which is called recursively on its operands. This is essentially Depth-First tree traversal.
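A minimal Python sketch of this design, with `TreeObj` and `PlusOp` mirroring the answer's treeObj and plusOp; the `Const`, `Var` and `TimesOp` classes and the `env` argument (variable bindings) are assumptions added for illustration.

```python
from abc import ABC, abstractmethod


class TreeObj(ABC):
    @abstractmethod
    def evaluate(self, env):
        """Return the value of this subtree; env maps variable names to values."""


class Const(TreeObj):
    def __init__(self, value):
        self.value = value

    def evaluate(self, env):
        return self.value


class Var(TreeObj):
    def __init__(self, name):
        self.name = name

    def evaluate(self, env):
        return env[self.name]


class PlusOp(TreeObj):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, env):
        # Depth-first: both operands are evaluated before they are combined.
        return self.left.evaluate(env) + self.right.evaluate(env)


class TimesOp(TreeObj):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, env):
        return self.left.evaluate(env) * self.right.evaluate(env)


# 2 * x + 1 with x = 5
expr = PlusOp(TimesOp(Const(2), Var("x")), Const(1))
print(expr.evaluate({"x": 5}))  # 11
```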

Construction of such a tree from a string is usually done by converting the string into tokens, and then assembling the tree from the bottom up. For this you may want to have a look at LALR parsers, or, for simpler languages, a Recursive Descent parser.
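A recursive descent parser is usually the easiest place to start. A minimal sketch, assuming a small illustrative grammar (integers, identifiers, `+ - * / ^`, unary minus and parentheses) and nested tuples as nodes instead of treeObj subclasses. Operator precedence lives entirely in the grammar, so the tree itself never has to store it.

```python
import re


def tokenize(text):
    # Integers, identifiers, or any single non-space character.
    return re.findall(r"\d+|[A-Za-z_]\w*|\S", text)


class Parser:
    """Recursive-descent parser producing nested tuples (op, *children).

    Grammar, lowest to highest precedence:
        expr   := term (("+" | "-") term)*
        term   := factor (("*" | "/") factor)*
        factor := "-" factor | power
        power  := atom ("^" factor)?        # right-associative
        atom   := NUMBER | NAME | "(" expr ")"
    """

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.pos += 1
        return tok

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        tree = self.term()
        while self.peek() in ("+", "-"):
            tree = (self.take(), tree, self.term())  # left-associative
        return tree

    def term(self):
        tree = self.factor()
        while self.peek() in ("*", "/"):
            tree = (self.take(), tree, self.factor())
        return tree

    def factor(self):
        if self.peek() == "-":
            self.take()
            return ("neg", self.factor())  # unary minus gets its own node
        return self.power()

    def power(self):
        base = self.atom()
        if self.peek() == "^":
            self.take()
            return ("^", base, self.factor())
        return base

    def atom(self):
        tok = self.take()
        if tok == "(":
            tree = self.expr()
            self.take(")")
            return tree
        if tok.isdigit():
            return int(tok)
        if tok[0].isalpha() or tok[0] == "_":
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")


print(Parser("2*x + -(3 - y)^2").parse())
# ('+', ('*', 2, 'x'), ('neg', ('^', ('-', 3, 'y'), 2)))
```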

Reduction is a bit trickier, as each node defines equivalence differently. Personally, I would handle this by making each treeObj provide a function isEqual, which compares it to another treeObj. Each different sub-type of treeObj could then implement its own commutativity, depending on the underlying mathematical properties of the operator.
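One way to sketch this is to give commutative operators a canonical child order and then compare structurally, so that `isEqual` reduces to `==` on canonical forms. A minimal sketch on nested tuples, assuming `+` and `*` are both commutative and associative (so nested applications can be flattened); ordering children by `repr` is an arbitrary but deterministic choice, not a mathematical one.

```python
COMMUTATIVE = {"+", "*"}  # assumed commutative and associative


def canonical(tree):
    """Canonicalise children recursively; flatten and sort + and * nodes."""
    if not isinstance(tree, tuple):
        return tree  # leaf: number or variable name
    op, *children = tree
    children = [canonical(c) for c in children]
    if op in COMMUTATIVE:
        flat = []
        for c in children:
            # Children are already canonical, so one level of flattening
            # turns a * (b * c) into (* a b c).
            if isinstance(c, tuple) and c[0] == op:
                flat.extend(c[1:])
            else:
                flat.append(c)
        # repr gives a deterministic order across ints, names and tuples.
        children = sorted(flat, key=repr)
    return (op, *children)


def is_equal(a, b):
    """Structural equality modulo commutativity/associativity of + and *."""
    return canonical(a) == canonical(b)


print(is_equal(("+", "a", "b"), ("+", "b", "a")))                          # True
print(is_equal(("*", "x", ("*", "y", "z")), ("*", ("*", "z", "y"), "x")))  # True
print(is_equal(("-", "a", "b"), ("-", "b", "a")))                          # False
```

Non-commutative operators such as `-` keep their child order, which is the per-operator behaviour the answer describes.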

In summary, a lot of Polymorphism and Recursion.

score 2 · Answer 4 · answered May 31 '12 at 11:48

Take a look at Chapter 8 of Paradigms of Artificial Intelligence Programming, in which the author, Peter Norvig, solves this problem in Common Lisp.

hawkeye
