Reproduced with Python 2.7. This answer shows why re.sub fails in Python 2.7 on backreferences to groups that were not found, and some patterns to fix it.
Both patterns compile
import re
# both seem identical
regex1 = r'^(.+?)(\d+-\d+)?$'
regex2 = r'^(.+?)(\d+-\d+)?$'
# the compiled patterns are identical too: both lines show the same object address,
# because re caches compiled patterns
re.compile(regex1) # <_sre.SRE_Pattern object at 0x7f575ef8fd40>
re.compile(regex2) # <_sre.SRE_Pattern object at 0x7f575ef8fd40>
Note: Compiling the pattern once with re.compile() saves time when it is reused many times, such as inside a loop.
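A minimal sketch of that compile-once idea, assuming a few made-up sample keys (the `keys` list and the `prefixes` name are illustrative only and not part of the original code):

```python
import re

# compile once, outside the loop
pattern = re.compile(r'^(.+?)(\d+-\d+)?$')

# hypothetical sample keys, chosen only for illustration
keys = ['abcd1-2', 'foo', 'mc-mqisd0-2']

# reuse the same compiled pattern for every key
prefixes = [pattern.match(key).group(1) for key in keys]
print(prefixes)
```

Each key here is non-empty, so pattern.match() always returns a Match object; with arbitrary input the result should be checked for None first.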
Fix: test for groups found
The error message indicates that there are groups that did not match.
Put another way: the replacement passed to re.sub (see the 2.7 docs) refers to a group, such as the second capturing group (\2), that was not found or captured in the given input string:
sre_constants.error: unmatched group
To fix this, we should test which groups were actually found in the match.
To do so, we use re.match(regex, str) or the compiled variant pattern.match(str) to create a Match object, then Match.groups() to return all groups as a tuple, with None for each group that did not match.
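As a small illustration of that check, here is a sketch that only inspects the groups and does nothing else. The helper name `found_groups` is hypothetical and introduced just for this example:

```python
import re

pattern = re.compile(r'^(.+?)(\d+-\d+)?$')

def found_groups(text):
    """Return the tuple of groups for text, or None if the pattern does not match."""
    match = pattern.match(text)
    if match is None:
        return None
    # groups that did not take part in the match are reported as None
    return match.groups()

print(found_groups('abcd1-2'))  # ('abcd', '1-2')
print(found_groups('foo'))      # ('foo', None)
```

A None in the second position is the case where a \2 backreference would raise the unmatched group error shown above.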
import re

regex = r'^(.+?)(\d+-\d+)?$'  # a key followed by an optional digits-range
pattern = re.compile(regex)  # <_sre.SRE_Pattern object at 0x7f575ef8fd40>

def dict_with_expanded_digits(fields_list):
    entry_list = []
    for fields in fields_list:
        (key_digits_range, value) = fields.split()  # a pair of ('key0-1', 'value')
        # test for match and groups found
        match = pattern.match(key_digits_range)
        # skip to the next iteration here if the pattern does not match;
        # this must come before any use of match, which would be None
        if not match:
            print('ERROR: no valid key! Will not add to dict.', fields)
            continue
        print("DEBUG: groups:", match.groups())  # tuple containing all the subgroups of the match,
        # watch: the 3rd iteration has only group(1), while group(2) is None
        # if no 2nd group, only a single key,value
        if not match.group(2):
            print('WARN: key without range! Will add as single entry:', fields)
            entry_list.append((key_digits_range, value))
            continue  # stop iteration here and continue with next
        # both groups matched here, so the backreferences are safe
        key = pattern.sub(r'\1', key_digits_range)
        index_range = pattern.sub(r'\2', key_digits_range)
        # no strip needed here
        (start, end) = index_range.split('-')
        for index in range(int(start), int(end) + 1):
            expanded_key = "{}{}".format(key, index)
            entry = (expanded_key, value)  # use tuple for each field entry (key, value)
            entry_list.append(entry)
    return dict(entry_list)

list_a = [
    'abcd1-2 4d4e',  # 2 entries
    'xyz0-1 551',  # 2 entries
    'foo 3ea',  # 1 entry
    'bar1 2bd',  # 1 entry
    'mc-mqisd0-2 77a'  # 3 entries
]
dict_a = dict_with_expanded_digits(list_a)
print("INFO: resulting dict with length: ", len(dict_a), dict_a)
assert len(dict_a) == 9
Prints:
('DEBUG: groups:', ('abcd', '1-2'))
('DEBUG: groups:', ('xyz', '0-1'))
('DEBUG: groups:', ('foo', None))
('WARN: key without range! Will add as single entry:', 'foo 3ea')
('DEBUG: groups:', ('bar1', None))
('WARN: key without range! Will add as single entry:', 'bar1 2bd')
('DEBUG: groups:', ('mc-mqisd', '0-2'))
('INFO: resulting dict with length: ', 9, {'bar1': '2bd', 'foo': '3ea', 'mc-mqisd2': '77a', 'mc-mqisd0': '77a', 'mc-mqisd1': '77a', 'xyz1': '551', 'xyz0': '551', 'abcd1': '4d4e', 'abcd2': '4d4e'})
Note on added improvements
- renamed function and variables to express intent
- used tuples where possible, e.g. the assignment (start, end)
- used the methods of the compiled pattern instead of the equivalent re. module functions
- the guard statement if not match.group(2): avoids expanding the field and just adds the key-value pair as is
- added an assert to verify that the given list of 5 entries is expanded to a dict of 9 entries, as expected
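As a complement to the guard on match.group(2), one possible alternative is to pass a replacement function to sub instead of a template such as r'\2'. The function receives the Match object and can fall back to an empty string itself. This is only a sketch, not the approach used in the code above, and the helper name `range_or_empty` is made up for illustration:

```python
import re

pattern = re.compile(r'^(.+?)(\d+-\d+)?$')

def range_or_empty(match):
    # group(2) is None when the optional digits-range was not found,
    # so substitute an empty string in that case
    return match.group(2) or ''

print(pattern.sub(range_or_empty, 'abcd1-2'))        # 1-2
print(repr(pattern.sub(range_or_empty, 'foo')))      # ''
```

The explicit guard remains the clearer choice when a key without a range needs different handling, as in the function above; the replacement function only helps when an empty string is an acceptable substitute.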