I would like to use re module with streams, but not necessarily file streams, at minimal development cost.
For file streams, there's mmap module that is able to impersonate a string and as such can be used freely with re.
Now I wonder how mmap manages to craft an object that re can further reuse. If I just pass whatever, re protect itself against usage of too incompatible objects with TypeError: expected string or bytes-like object. So I thought I'd create a class that derives from string or bytes and override a few methods such as __getitem__ etc. (this intuitively fits the duck typing philosophy of Python), and make them interact with my original stream. However, this doesn't seem to work at all - my overrides are completely ignored.
Is it possible to create such a "lazy" string in pure Python, without C extensions? If so, how?
A bit of background to disregard alternative solutions:
- Can't use
mmap(the stream contents are not a file) - Can't dump the whole thing to the HDD (too slow)
- Can't load the whole thing to the memory (too large)
- Can seek, know the size and compute the content at runtime
Example code that demonstrates bytes resistance to modification:
class FancyWrapper(bytes):
def __init__(self, base_str):
pass #super() isn't called and yet the code below finds abc, aaa and bbb
print(re.findall(b'[abc]{3}', FancyWrapper(b'abc aaa bbb def')))