I am a pure math person doing some ML self-study and I am pretty lost.
I am trying to solve the following exercises on decision trees:
Exercise 1. Consider the following training set where $X_1,X_2,X_3,X_4$ are the attributes and $Y$ is the class variable. $$ \begin{matrix} Y & X_1 & X_2 & X_3 & X_4 \\ 1 & 0 & 1 & 0 & 1 \\ 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 & 0 \\ -1& 0 & 0 & 1 & 1 \\ -1& 0 & 0 & 0 & 0 \\ -1& 0 & 0 & 1 & 0 \\ -1& 1 & 0 & 0 & 0 \\ -1& 0 & 0 & 1 & 1 \\ \end{matrix} $$
- Learn a decision tree using the ID3 algorithm.
- Draw a decision tree having only 4 leaf nodes, 3 internal nodes and depth bounded by 2, that has 100% accuracy on the given dataset.
Exercise 2. Let $x$ be a vector of $n$ Boolean variables $\{X_1,\dots,X_n\}$ and let $k$ be an integer less than $n$. Let $f_k$ be a target concept which is a disjunction consisting of $k$ literals. State the size of the smallest possible consistent decision tree (namely a decision tree that correctly classifies all possible examples) for $f_k$ in terms of $n$ and $k$ and describe its shape.
Now, I've done the first part of Exercise 1 as follows: first I noticed that the branch $X_2 = 1$ is pure (every training example with $X_2 = 1$ has $Y = 1$), so I chose $X_2$ as the root; then I used information gain to choose the remaining internal nodes. The second question confuses me: my understanding was that ID3 already gives the smallest tree, so how am I supposed to make it smaller? Or, if I am wrong about that, can anybody help me clarify?
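To double-check the root choice, I wrote a small Python sketch that computes the information gain of each attribute on the training set (the indexing convention, where column 0 is $Y$ and columns 1–4 are $X_1,\dots,X_4$, is just my own):

```python
from collections import Counter
from math import log2

# Training set from Exercise 1, rows as (Y, X1, X2, X3, X4)
data = [
    ( 1, 0, 1, 0, 1), ( 1, 1, 0, 1, 0), ( 1, 1, 1, 1, 0),
    ( 1, 0, 0, 0, 1), ( 1, 1, 1, 1, 0), (-1, 0, 0, 1, 1),
    (-1, 0, 0, 0, 0), (-1, 0, 0, 1, 0), (-1, 1, 0, 0, 0),
    (-1, 0, 0, 1, 1),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(data, attr):
    """Information gain of splitting on column `attr` (1..4 = X1..X4)."""
    labels = [row[0] for row in data]
    gain = entropy(labels)
    for v in (0, 1):
        subset = [row[0] for row in data if row[attr] == v]
        if subset:
            gain -= len(subset) / len(data) * entropy(subset)
    return gain

for a in range(1, 5):
    print(f"Gain(X{a}) = {info_gain(data, a):.4f}")
```

This prints $\mathrm{Gain}(X_2) \approx 0.3958$ as the maximum (with $X_3$ and $X_4$ at gain $0$), which agrees with picking $X_2$ as the root.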
For the second problem, I really don't know where to start, so any hint would be appreciated.