I have not worked with threading in Python at all, so I am asking this question as a complete newcomer.
I am wondering whether defaultdict is thread-safe. Let me explain:
I have
d = defaultdict(list)
which creates a list for missing keys by default. Let's say I have multiple threads (say, two) doing this at the same time:
d['key'].append('value')
At the end, I'm supposed to end up with ['value', 'value']. However, if defaultdict is not thread-safe, thread 1 could yield to thread 2 after checking whether 'key' is in the dict but before executing d['key'] = default_factory(). The two threads would then interleave: thread 2 would create the list in d['key'] and possibly append 'value' to it.
Then, when thread 1 runs again, it will continue from d['key'] = default_factory(), which will destroy the existing list and its value, and we will end up with ['value'] instead of ['value', 'value'].
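To make the interleaving I am worried about concrete, here is a minimal single-threaded sketch that replays the steps by hand. It assumes a check-then-set lookup written out in plain Python; that is only my mental model, not how CPython actually implements defaultdict, and the name t1_saw_missing is made up for illustration.

```python
# Hypothetical model of the lookup: "check whether the key exists, then
# create the default". This is NOT CPython's real defaultdict code.
d = {}

# Thread 1: checks for the key, finds it missing, then gets suspended.
t1_saw_missing = 'key' not in d

# Thread 2: runs the whole operation while thread 1 is suspended.
if 'key' not in d:
    d['key'] = list()
d['key'].append('value')

# Thread 1: resumes after its (now stale) check and overwrites the list.
if t1_saw_missing:
    d['key'] = list()
d['key'].append('value')

print(d['key'])  # ['value'] : thread 2's append has been lost
```

If the real lookup can be interrupted between the check and the assignment, this is the lost update I would expect to see.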
I looked at the CPython source code for defaultdict, but I could not find any locks or mutexes. I guess I should assume it is not thread-safe unless it is documented to be.
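Besides reading the source, I could try the scenario empirically. The following is only a rough sketch of such an experiment: run_experiment and worker are names I made up, the thread count of 8 is arbitrary, and a run that loses nothing would not prove thread-safety, since a race may simply not show up on a given run.

```python
import threading
from collections import defaultdict


def run_experiment(n_threads=8):
    """Have n_threads threads each append 'value' to the same missing key."""
    d = defaultdict(list)
    # The barrier releases all threads at roughly the same moment, to give
    # a possible race a better chance of showing up.
    barrier = threading.Barrier(n_threads)

    def worker():
        barrier.wait()
        d['key'].append('value')

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return d['key']


result = run_experiment()
# If any append was lost, fewer than 8 values survive.
print(len(result), "of 8 appends survived")
```

Even if all eight appends survive every time, I would still like to know whether that is guaranteed or just an accident of the implementation.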
Some people on IRC last night said that Python has a GIL, so it is conceptually thread-safe. Others said threading should not be done in Python. I'm pretty confused. Ideas?