
I have a graph that looks something like the one below:

[Figure: the search graph]

I am searching for a given string s by walking down the graph; at every node the search decides whether to visit the left or the right child. The code goes something like this:

// Returns 1 if s is found in the graph, 0 otherwise.
static int search(Node start, String s) {
    Node currentNode = start;
    while (true) {
        if (currentNode.getValue().equals(s)) { // compare contents, not references
            return 1; // the string was found in the graph
        }
        // A node never has exactly one child: either both children exist or neither does.
        if (currentNode.right == null) {
            return 0; // reached a leaf without finding the string
        }
        int rightSubstringIndex = findSubStringIndex(currentNode.right.getValue(), s);
        int leftSubstringIndex = findSubStringIndex(currentNode.left.getValue(), s);

        if (rightSubstringIndex < leftSubstringIndex) {
            currentNode = currentNode.right; // go towards the right node
        } else if (leftSubstringIndex < rightSubstringIndex) {
            currentNode = currentNode.left; // go towards the left node
        } else { // i.e. rightSubstringIndex == 1000 && leftSubstringIndex == 1000
            return 0; // the string was not found in the graph
        }
    }
}

findSubStringIndex(a,b) returns the index in the string b where the substring a starts. It returns 1000 if the substring a doesn't occur in b (assuming that no string is longer than 1000 characters). The time complexity of findSubStringIndex(a,b) is assumed to be $O(1)$.
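For concreteness, here is a minimal sketch of what such a helper might look like. It assumes the helper wraps Java's `String.indexOf` and reports the first occurrence; the class name `SubstringIndex` and the constant `NOT_FOUND` are made up for illustration. Note that a helper written this way has to scan b, so the $O(1)$ cost is purely an assumption of the question.

```java
// Hypothetical helper class; only findSubStringIndex is named in the question.
final class SubstringIndex {
    // Sentinel from the question: no string is assumed to exceed 1000 characters.
    static final int NOT_FOUND = 1000;

    // Returns the index in b where a first occurs, or NOT_FOUND if a does not occur in b.
    static int findSubStringIndex(String a, String b) {
        int index = b.indexOf(a); // scans b, so this is not constant time in general
        return index >= 0 ? index : NOT_FOUND;
    }

    public static void main(String[] args) {
        System.out.println(findSubStringIndex("cd", "abcdef")); // prints 2
        System.out.println(findSubStringIndex("xy", "abcdef")); // prints 1000
    }
}
```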

I am interested in the time complexity of the algorithm described above. I know that if this were a binary tree and I looked up the string using a similar strategy, the complexity would be $O(\log N)$. I don't know how to express the complexity in terms of the number of nodes in this scenario.

I am not very good at theory, so kindly don't go hard on me.

2 Answers


This answer assumes that findSubStringIndex() is $O(1)$, which it realistically is not. Notice that at each iteration, when you step down to the next level in the graph, the problem size is reduced by a factor of 2. That is, the number of possible solutions is halved. Specifically, this is because the strings double in length at each layer down the graph. So at any rate, if you have a string of length $n$, it will take $\log n$ steps to reach a solution using this algorithm. Furthermore, the string length must be a power of 2. This is a bit easier to visualize if you split your graph into its equivalent binary tree:

[Figure: binary tree representation of the graph]

This is a valid representation because each time you go left or right, you are cutting yourself off from impossible solutions, and that is exactly what happens in binary trees.

However, $\log n$ is just the time bound with respect to the length of the string. You must also consider the height of the graph (the number of nodes from the start to some end node). Suppose we have a full graph of this format (every layer is filled) of height $h$. You can picture this graph as the top-left half of an $h \times h$ grid cut along the diagonal. This means a graph of height $h$ has roughly $h \cdot h / 2$ nodes, so $h$ is proportional to $\sqrt{|V|}$.
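To make the count explicit, here is a short derivation, assuming the graph is full and level $l$ holds exactly $l$ nodes (the triangular layout described above):

$$|V| = \sum_{l=1}^{h} l = \frac{h(h+1)}{2} \approx \frac{h^2}{2}, \qquad \text{so} \qquad h = \frac{-1 + \sqrt{1 + 8|V|}}{2} = \Theta\left(\sqrt{|V|}\right).$$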

So now we have two scenarios: (1) the upper bound from the string length ends the algorithm, or (2) the upper bound from the graph height ends the algorithm. This gives the result:

$O(\max(\log n, \sqrt{|V|})) = O(\log n + \sqrt{|V|})$, where $n$ is the length of the input string and $|V|$ is the number of nodes in the graph.

Another similar approach to this problem that you might be able to utilize:

If you know the length of the string (assuming it's given), you know the precise layer of the graph it will be in. If you can get to that layer in constant time (not sure if this will be possible with your setup), you might be able to use a different greedy approach to binary search for the solution inside that layer.

A string of length $n$ will be at level $\log n + 2$ in the graph. At level $l$ of the graph there will be $l$ nodes (possible solutions). So if the string is at level $\log n + 2$, there are $\log n + 2$ nodes at that level. If you can somehow binary search through that layer (an exercise for the reader), you might be able to get an $O(\log \log n)$ solution.

ryan

No. The running time is not $O(\log n)$. The running time is $\Omega(n)$, where $n$ is the length of the string s. In each iteration of the loop, you call findSubStringIndex() to check whether the string in a node is a substring of s. That check can take $\Theta(n)$ time.

This appears to be an inefficient data structure, and it appears there are better choices -- e.g., a hashtable, a trie.
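As a rough illustration of the hashtable option, here is a minimal sketch that stores the strings in a `java.util.HashSet` and answers membership queries directly. The class name `StringLookup` and the sample strings are hypothetical, and the sketch only covers exact lookup of s, not the left/right walk from the question. Hashing s still has to read all of its characters, so a lookup is expected to cost time roughly proportional to the length of s rather than constant time.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical example class; the stored strings are made up for illustration.
final class StringLookup {
    private final Set<String> stored = new HashSet<>();

    // Adds one string (for example, the value of one graph node) to the set.
    void add(String value) {
        stored.add(value);
    }

    // Returns 1 if s is stored and 0 otherwise, mirroring the question's return convention.
    int search(String s) {
        return stored.contains(s) ? 1 : 0;
    }

    public static void main(String[] args) {
        StringLookup lookup = new StringLookup();
        lookup.add("ab");
        lookup.add("abcd");
        System.out.println(lookup.search("abcd")); // prints 1
        System.out.println(lookup.search("xyz"));  // prints 0
    }
}
```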

D.W.