dynamic programming exercise on cutting strings

Question

I have been working on the following problem from this book.

A certain string-processing language offers a primitive operation which splits a string into two pieces. Since this operation involves copying the original string, it takes $n$ units of time for a string of length $n$, regardless of the location of the cut. Suppose, now, that you want to break a string into many pieces. The order in which the breaks are made can affect the total running time. For example, if you want to cut a 20-character string at positions $3$ and $10$, then making the first cut at position $3$ incurs a total cost of $20 + 17 = 37$, while doing position $10$ first has a better cost of $20 + 10 = 30$.
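To make the cost model concrete, here is a small sketch that totals the cost of applying the cuts in a given order. The function name `cost_of_order` and its signature are illustrative assumptions, not part of the book's statement.

```python
from bisect import bisect_left, insort

def cost_of_order(n, order):
    """Total cost of cutting a length-n string at the given positions, in the given order."""
    boundaries = [0, n]  # sorted endpoints of the pieces produced so far
    total = 0
    for c in order:
        i = bisect_left(boundaries, c)
        # c lies strictly inside the piece (boundaries[i - 1], boundaries[i]);
        # cutting that piece costs its full length.
        total += boundaries[i] - boundaries[i - 1]
        insort(boundaries, c)
    return total

print(cost_of_order(20, [3, 10]))  # 37
print(cost_of_order(20, [10, 3]))  # 30
```

The two calls reproduce the figures from the example: $37$ when position $3$ is cut first and $30$ when position $10$ is cut first.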

I need a dynamic programming algorithm that, given $m$ cut positions, finds the minimum cost of cutting a string into $m + 1$ pieces.

score 11 · Answer 1 · answered May 10 '12 at 00:38

The basic idea is: Try out all cut positions as first choice, solve the respective parts recursively, add the cost and choose the minimum.

In formula:

$\qquad \displaystyle \operatorname{mino}(s, C) = \begin{cases} 0 &, C = \emptyset \\ |s| + \min_{c \in C} \left[ \begin{aligned}&\operatorname{mino}(s_{1,c}, \{c' \in C \mid c' < c\}) \\ +\ &\operatorname{mino}(s_{c+1,|s|}, \{c' - c \mid c' \in C,\ c' > c\}) \end{aligned}\right] &, \text{ else} \end{cases}$

Note that applying memoisation to this recursion actually saves work as switching the order of any successively applied pair of cuts results in the same three subproblems being solved.
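A minimal memoised sketch of this recursion, assuming a piece is represented by its length only and cuts are given as a sorted tuple of positions relative to the start of the piece. It uses the empty cut set (cost $0$) as the base case; the name `mino` follows the formula, everything else is an illustrative choice.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def mino(length, cuts):
    """Minimum cost to cut a piece of `length` at the relative positions in `cuts` (a sorted tuple)."""
    if not cuts:
        return 0  # nothing left to cut
    best = min(
        # cuts left of c keep their positions; cuts right of c are shifted by c
        mino(c, tuple(x for x in cuts if x < c))
        + mino(length - c, tuple(x - c for x in cuts if x > c))
        for c in cuts
    )
    return length + best  # the first cut costs the full length of the piece

print(mino(20, (3, 10)))  # 30
```

Tuples are used instead of sets so that the arguments are hashable and `lru_cache` can store them.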

score 2 · Answer 2 · edited Nov 13 '15 at 08:00

It is always a good idea to find a recursive algorithm first and then turn it into a table.

$f(C,n)$
$~~$if(C = $\emptyset$) return 0;
$~~$else
$~~~~$opt = infinity;
$~~~~$for each $c\in C$ do
$~~~~~~D=\{d\in C:d<c\}$
$~~~~~~E=\{e-c:e\in C,e>c\}$
$~~~~~~opt = min\{opt,f(D,c)+f(E,n-c)\}$
$~~~~$return $opt+n$;

So you may ask: aren't there too many subsets of $C$ to put in a table? Observe that only 'consecutive' subsets (runs of adjacent cuts in sorted order) are needed, and there are only $\binom{m+1}{2}$ non-empty ones for $m = |C|$ cuts. (Why?) Another problem is that the positions in $E$ are shifted by $c$, so the same cuts show up under different values. We can work around this by passing the start and end of the piece to each call of $f$ rather than just its length.
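One way to turn the recursion into a table along these lines is sketched below. It assumes the cuts are absolute positions in a string of length $n$ and indexes each subproblem by a start and an end boundary; the names `min_cut_cost`, `pts` and `cost` are illustrative.

```python
def min_cut_cost(n, cuts):
    """Minimum total cost to cut a length-n string at every position in `cuts`."""
    pts = [0] + sorted(cuts) + [n]  # piece boundaries, including both ends
    k = len(pts)
    # cost[i][j]: minimum cost to make all cuts strictly between pts[i] and pts[j]
    cost = [[0] * k for _ in range(k)]
    for span in range(2, k):  # span 1 means no cut in between, so cost stays 0
        for i in range(k - span):
            j = i + span
            cost[i][j] = (pts[j] - pts[i]) + min(
                cost[i][t] + cost[t][j] for t in range(i + 1, j)
            )
    return cost[0][k - 1]

print(min_cut_cost(20, [3, 10]))  # 30
```

The table has $O(m^2)$ entries and each one takes $O(m)$ time to fill, so this sketch runs in $O(m^3)$ time for $m$ cuts.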

score 0 · Answer 3 · answered Nov 13 '15 at 17:08

This is very similar to Quicksort on a multiset; it's optimal when the cut point is closest to the middle, and then we recurse.

If I gave you a shuffled version of the multiset $M = \{1, \dots, 1, 2, \dots, 2, \dots, m, \dots, m\}$, where the runs end at each cut point, you would optimally quicksort it by picking a cut $s_k$ nearest the middle as the pivot. The operation of splitting the elements into left and right partitions takes $n$ operations the same way that string splitting does, so you can use the same arguments as Quicksort to show that the median is optimal.
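The rule described here can be sketched as follows. The code only implements "cut nearest the middle first, then recurse" and does not by itself establish the optimality claim; the name `middle_first_cost` and the `lo` parameter (absolute start of the current piece) are illustrative assumptions.

```python
def middle_first_cost(n, cuts, lo=0):
    """Cost of always cutting the current piece at the cut position nearest its middle.

    n is the length of the current piece, lo its absolute start,
    and cuts the absolute cut positions that fall inside it.
    """
    if not cuts:
        return 0
    mid = lo + n / 2
    c = min(cuts, key=lambda x: abs(x - mid))  # pivot: cut closest to the middle
    left = [x for x in cuts if x < c]
    right = [x for x in cuts if x > c]
    return (
        n  # cutting the current piece costs its full length
        + middle_first_cost(c - lo, left, lo)
        + middle_first_cost(lo + n - c, right, c)
    )

print(middle_first_cost(20, [3, 10]))  # 30
```

On the example from the question this picks position $10$ first and arrives at the cost of $30$.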
