
I have been working on the following problem from this book.

A certain string-processing language offers a primitive operation which splits a string into two pieces. Since this operation involves copying the original string, it takes n units of time for a string of length n, regardless of the location of the cut. Suppose, now, that you want to break a string into many pieces. The order in which the breaks are made can affect the total running time. For example, if you want to cut a 20-character string at positions $3$ and $10$, then making the first cut at position $3$ incurs a total cost of $20 + 17 = 37$, while doing position 10 first has a better cost of $20 + 10 = 30$.
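To make the cost model concrete, here is a minimal sketch that replays a given order of cuts and adds up the cost of each split. The function name `cost_of_order` and the interval representation of the pieces are assumptions made for illustration; they are not part of the book's problem statement.

```python
def cost_of_order(n, order):
    """Total cost of cutting a string of length n at the given
    positions, applied in the given order."""
    # Each current piece is a half-open interval (lo, hi) of original positions.
    pieces = [(0, n)]
    total = 0
    for cut in order:
        for i, (lo, hi) in enumerate(pieces):
            if lo < cut < hi:
                # Splitting a piece costs its full length.
                total += hi - lo
                pieces[i:i + 1] = [(lo, cut), (cut, hi)]
                break
        else:
            raise ValueError(f"cut {cut} does not fall inside any piece")
    return total


print(cost_of_order(20, [3, 10]))  # 37
print(cost_of_order(20, [10, 3]))  # 30
```

The two calls reproduce the costs $37$ and $30$ from the example above.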

I need a dynamic programming algorithm that given $m$ cuts, finds the minimum cost of cutting a string into $m +1$ pieces.

Gilles 'SO- stop being evil'
Mark

3 Answers


The basic idea is: Try out all cut positions as first choice, solve the respective parts recursively, add the cost and choose the minimum.

In formula:

$\qquad \displaystyle \operatorname{mino}(s, C) = \begin{cases} 0 &, C = \emptyset \\ |s| + \min_{c \in C} \left[ \begin{aligned}&\operatorname{mino}(s_{1,c}, \{c' \in C \mid c' < c\})\ \\ +\ &\operatorname{mino}(s_{c+1,|s|}, \{c' - c \mid c' \in C,\ c' > c\}) \end{aligned}\right] &, \text{ else} \end{cases}$

Note that applying memoisation to this recursion actually saves work as switching the order of any successively applied pair of cuts results in the same three subproblems being solved.
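As a rough illustration, the recursion with memoisation might look like the sketch below. It assumes the string is represented only by its length and the cut set by a sorted tuple of positions relative to the current piece; the signature `mino(length, cuts)` is an assumption for this example. The base case used here is the empty cut set with cost $0$, which yields $|s|$ for a single cut.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def mino(length, cuts):
    """Minimum cost of cutting a piece of the given length at the
    positions in `cuts`, a sorted tuple relative to the piece start."""
    if not cuts:
        return 0  # nothing left to cut
    best = min(
        # Left part: cuts before c keep their positions.
        mino(c, tuple(x for x in cuts if x < c))
        # Right part: cuts after c are shifted by c.
        + mino(length - c, tuple(x - c for x in cuts if x > c))
        for c in cuts
    )
    return length + best


print(mino(20, (3, 10)))  # 30
```

Keying the cache on the whole tuple keeps the code close to the formula; it is not the most compact table layout.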

Raphael

It is always a good idea to find a recursive algorithm first and then turn it into a table.

  1. $f(C,n)$
  2. $~~$if(C = $\emptyset$) return 0;
  3. $~~$else
  4. $~~~~$opt = infinity;
  5. $~~~~$for each $c\in C$ do
  6. $~~~~~~D=\{d\in C:d<c\}$
  7. $~~~~~~E=\{e-c:e\in C,e>c\}$
  8. $~~~~~~opt = \min\{opt,f(D,c)+f(E,n-c)\}$
  9. $~~~~$return $opt+n$;

So you may ask: aren't there too many subsets of $C$ to put in a table? Observe that only 'consecutive' subsets (runs of adjacent cuts in sorted order) are needed, and there are only $O(m^2)$ of them, where $m = |C|$ (why?). Another problem is that the cut positions in $E$ are shifted by $c$, so they no longer match the original entries. We can work around this by passing the start and end of the piece to $f$ rather than just its length.
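A possible table version, indexed by start and end as suggested, is sketched below. The names `min_cut_cost`, `points` and `cost` are assumptions for illustration. With $m$ cuts the table has $O(m^2)$ entries and each entry takes $O(m)$ time to fill, so the sketch runs in $O(m^3)$ time.

```python
def min_cut_cost(n, cuts):
    """Minimum total cost of cutting a string of length n at all
    positions in `cuts`, via a table over pairs of boundary points."""
    # Boundary points: both ends of the string plus the sorted cuts.
    points = [0] + sorted(cuts) + [n]
    k = len(points)
    # cost[i][j]: minimum cost of making every cut that lies strictly
    # between points[i] and points[j]; adjacent points need no cut.
    cost = [[0] * k for _ in range(k)]
    for gap in range(2, k):
        for i in range(k - gap):
            j = i + gap
            # Try each inner point t as the first cut of this piece.
            cost[i][j] = (points[j] - points[i]) + min(
                cost[i][t] + cost[t][j] for t in range(i + 1, j)
            )
    return cost[0][k - 1]


print(min_cut_cost(20, [3, 10]))  # 30
```

Because every subproblem is identified by two indices into `points`, no cut position ever needs to be shifted.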


This is very similar to Quicksort on a multiset; it's optimal when the cut point is closest to the middle, and then we recurse.

If I gave you a shuffled version of the multiset $M = \{1, 1, \dots, 1, 2, 2, \dots, 2, \dots, m, m, \dots, m\}$, where the runs end at each cut point, you would optimally quicksort it by picking a cut $s_k$ nearest the middle as the pivot. The operation of splitting the elements into left and right partitions takes $n$ operations the same way that string splitting does, so you can use the same arguments as Quicksort to show that the median is optimal.

KWillets