
I'm trying to write a function that takes a string and compresses its repeating blocks. My current implementation also wraps non-repeating text, so a single character like 'a' becomes '1(a)', which is longer than the input.

The code is something like this:

import re


def _format_so(bchars, brep):
    return '%i(%s)' % (brep, bchars) if bchars else ''


def char_rep(txt, _format=_format_so):
    output, lastend = [], 0

    for match in re.finditer(r"""(?ms)(?P<repeat>(?P<chars>.+?)(?:(?P=chars))+)""", txt):
        beginpos, endpos = match.span()
        repeat, chars = match.group('repeat'), match.group('chars')

        if lastend < beginpos:
            output.append(_format(txt[lastend:beginpos], 1))
        output.append(_format(chars, repeat.count(chars)))
        lastend = endpos
    output = ''.join(output) + _format(txt[lastend:], 1)
    return output


givenList = ['dwdawdawd', 'aaaaaaaaa', 'abcabcabca']
newList = []

for txt in givenList:
    output_so = char_rep(txt, _format=_format_so)

    newList.append(output_so)

print(newList)


Output = ['1(d)2(wda)1(wd)', '9(a)', '3(abc)1(a)']

I want the output to be as short as possible. For the example above, the output should be ['d2(wda)wd', '9(a)', '3(abc)a'].

What do you suggest as the best approach for solving this problem?
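For reference, one possible way to guarantee minimal length is an exhaustive search with memoization instead of a single regex pass. The sketch below is an assumption-laden illustration, not a tuned solution: `shortest_encoding` is a hypothetical name, nested blocks such as `2(2(ab)c)` are assumed to be acceptable, and the input is assumed not to contain digits or parentheses that would make the result ambiguous.

```python
from functools import lru_cache


def shortest_encoding(txt):
    """Return one shortest 'k(block)' encoding of txt (ties are possible)."""

    @lru_cache(maxsize=None)
    def best(s):
        n = len(s)
        if n < 2:
            return s
        # Option 1: leave the text as it is.
        candidates = [s]
        # Option 2: split in two and encode each half independently.
        for k in range(1, n):
            candidates.append(best(s[:k]) + best(s[k:]))
        # Option 3: the whole text is one unit repeated two or more times.
        for unit_len in range(1, n // 2 + 1):
            if n % unit_len == 0 and s[:unit_len] * (n // unit_len) == s:
                unit = best(s[:unit_len])
                candidates.append('%i(%s)' % (n // unit_len, unit))
        # min() keeps the first candidate of minimal length.
        return min(candidates, key=len)

    return best(txt)


for txt in ['dwdawdawd', 'aaaaaaaaa', 'abcabcabca']:
    print(txt, '->', shortest_encoding(txt))
```

Several encodings can share the minimal length, so the exact string returned may differ from a hand-written one. For 'dwdawdawd', for example, 'd2(wda)wd' is 9 characters, the same as the uncompressed input. The search examines every substring, so it is only practical for fairly short inputs.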

John L.
