I have a (binary) decision tree consisting of nodes $N=\{N_i\}$ that take on boolean propositions/features $F=\{F_k\}$. Different decision paths can split on the same feature, so

$ |N| \gg |F| $

Edit: In addition, the number of times each feature appears is fixed for a graph. So if F1 appears twice, it has to appear twice in every permuted graph. This is because I have a "starter" graph and I want to see how much it would change if I rearrange the nodes.

I want to compute the probability

$ P(N_i = F_k | G(N,E)) $

where $G(N,E)$ is a DAG with nodes $\{N_i\}$ and edges $\{E_{jk}\}$ (in particular it is a decision/binary tree).

Assume that nodes are assigned features at random, subject to the constraint that no decision path contains the same feature twice (so the decision tree never repeats a decision boundary).

This means the following permutation is not allowed (F1 is repeated in the F1->F3->F1 path):

  F1
 /  \
F2  F3
   /  \
  F1  F4

but this permutation is allowed (no repeats):

  F2
 /  \
F1  F3
   /  \
  F1   F4
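
To make the constraint concrete, here is a minimal sketch of a check that no node shares a feature with any of its ancestors, applied to the two trees above. The node ids (0 for the root, 1 and 2 for its children, 3 and 4 for the children of node 2) and the helper name `repeated_on_path` are assumptions made for illustration.

```python
# Hypothetical node ids: parent[i] is the parent of node i (None for the root).
parent = {0: None, 1: 0, 2: 0, 3: 2, 4: 2}

def repeated_on_path(labels, parent):
    """Return (ancestor, node) pairs that carry the same feature."""
    clashes = []
    for node in parent:
        anc = parent[node]
        # Walk up to the root, comparing this node with each ancestor.
        while anc is not None:
            if labels[anc] == labels[node]:
                clashes.append((anc, node))
            anc = parent[anc]
    return clashes

# First tree above: F1 at the root and again below F3.
not_allowed = {0: "F1", 1: "F2", 2: "F3", 3: "F1", 4: "F4"}
# Second tree above: the two F1 nodes sit on different paths.
allowed = {0: "F2", 1: "F1", 2: "F3", 3: "F1", 4: "F4"}

print(repeated_on_path(not_allowed, parent))  # [(0, 3)]
print(repeated_on_path(allowed, parent))      # []
```

An assignment is allowed exactly when the returned list is empty.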

The graphs are relatively large (~1000-10000 nodes) and I need to compute this for a few hundred different versions, so I wonder whether there is something better than brute force that I can use.

Edit:

So for the tree structure above let's define nodes as

  N0
 /  \
N1  N2
   /  \
  N3   N4

and have a feature set {F1,F1,F2,F3,F4}

Then brute forcing with our constraint we have

        F1          F2          F3          F4
 N0:   [0.          0.33333333  0.33333333  0.33333333]
 N1:   [0.75        0.08333333  0.08333333  0.08333333]
 N2:   [0.25        0.25        0.25        0.25      ]
 N3:   [0.5         0.16666667  0.16666667  0.16666667]
 N4:   [0.5         0.16666667  0.16666667  0.16666667]
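
For reference, the brute-force computation for this small example can be sketched as follows. It enumerates every distinct arrangement of the feature multiset, keeps the ones that satisfy the no-repeat constraint, and counts how often each feature lands on each node. The function names and the `parent` encoding of the tree are assumptions for illustration, and the approach only scales to tiny trees.

```python
from fractions import Fraction
from itertools import permutations

# Tree from the example: parent[i] is the parent of node Ni (None for the root N0).
parent = {0: None, 1: 0, 2: 0, 3: 2, 4: 2}
features = ["F1", "F1", "F2", "F3", "F4"]

def is_valid(assignment, parent):
    """True if no node shares a feature with any of its ancestors."""
    for node in parent:
        anc = parent[node]
        while anc is not None:
            if assignment[anc] == assignment[node]:
                return False
            anc = parent[anc]
    return True

def brute_force(parent, features):
    nodes = sorted(parent)
    labels = sorted(set(features))
    # set() collapses arrangements that differ only by swapping identical copies.
    arrangements = set(permutations(features))
    valid = [p for p in arrangements if is_valid(dict(zip(nodes, p)), parent)]
    counts = {n: {f: 0 for f in labels} for n in nodes}
    for p in valid:
        for n, f in zip(nodes, p):
            counts[n][f] += 1
    total = len(valid)
    probs = {n: {f: Fraction(c, total) for f, c in counts[n].items()} for n in nodes}
    return probs, total

probs, total = brute_force(parent, features)
print(total)  # 24 valid assignments
for n in sorted(probs):
    print(f"N{n}:", [float(probs[n][f]) for f in sorted(probs[n])])
```

Using `Fraction` keeps the results exact: for instance, P(N1 = F1) comes out as 3/4 and P(N0 = F2) as 1/3, matching the table.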
Ilya
  • this is not the case. For the tree I've shown above P(N0=F1)=0 which is to say the root cannot be F1 (because F1 has to appear twice and everything starts at the root) – Ilya Jul 11 '20 at 01:31
  • I added a brute forced example of how the probability distribution is not trivial – Ilya Jul 11 '20 at 01:38
  • OK, I think I understand the problem with my specification: the particular list of features has to be used, so if a feature appears twice it has to appear twice in every permutation. I will edit the main post. – Ilya Jul 11 '20 at 01:41
  • Oh, I didn't realize you also specified how often each feature must appear. – Misha Lavrov Jul 11 '20 at 01:41
  • Yeah, my bad, I didn't specify it! I have a starting graph, and I want to see what happens if I randomly rearrange it (with the non-repeating constraint). – Ilya Jul 11 '20 at 01:43
