To answer your question about dict vs list you'd have to give exact information about the number of elements, the number of missing elements, etc., so that we could estimate the memory usage of the two data structures precisely and try to predict and/or check their performance.
In general a dict of N key-value pairs requires quite a bit more memory than a list with N values:
- The dict must keep track of the keys
- The dict is never more than 2/3 full; when it would exceed that threshold, the allocated storage is doubled (this is required to keep dict operations O(1) amortized). A quick way to check this overhead yourself is sketched below.
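For example, here's a rough way to see that overhead with sys.getsizeof (this measures only the container itself, not the objects it references; exact numbers vary with the Python version and platform):
import sys
n = 1000
lst = list(range(n))            # dense list with n values
dct = {i: i for i in range(n)}  # dict with n key-value pairs
print("list:", sys.getsizeof(lst), "bytes")
print("dict:", sys.getsizeof(dct), "bytes")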
However, there is an alternative to these data structures which should provide very good performance: blist. The blist package provides an interface that matches that of list, except that it is implemented using B-trees. It can efficiently handle sparse lists, and most operations take either O(1) or O(log n) time, so they are quite efficient.
For example, you could first create a sparse blist like this:
from blist import blist
seq = blist([None])
seq *= 2**30 # create a 2**30 element blist. Instantaneous!
And then you can set only the indices that have a value:
for i, value in zip(indices, values):
    seq[i] = value
The full documentation is here.
Note that blist provides other efficient operations (sketched briefly after this list) such as:
- Concatenating two blists takes O(log n) time
- Taking an [i:j] slice takes O(log n) time
- Inserting an element at a given index takes O(log n) operations
- Popping an element (from any position) takes O(log n) operations
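Here is a small sketch of those operations (it assumes the third-party blist package is installed, e.g. with pip install blist):
from blist import blist
a = blist(range(1000))
b = blist(range(1000))
c = a + b            # concatenation: O(log n)
piece = c[100:200]   # slicing: O(log n)
c.insert(500, -1)    # insertion at an arbitrary index: O(log n)
x = c.pop(0)         # popping from any position: O(log n)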
Since you gave some numbers, here's how they compare to dicts:
>>> from blist import blist
>>> b = blist([None])
>>> b *= 5000
>>> for i in range(100):b[i] = i
...
>>> b.__sizeof__()
2660
>>> d = dict()
>>> for i in range(100):d[i] = i
...
>>> d.__sizeof__()
6216
>>> b = blist([None])
>>> b *= 750
>>> for i in range(30):b[i] = i
...
>>> b.__sizeof__()
1580
>>> d = dict()
>>> for i in range(30):d[i] = i
...
>>> d.__sizeof__()
1608
In both cases a blist takes less memory (in the first case it takes less than half the memory of the equivalent dict). Note that the memory taken by a blist also depends on whether or not the indices are contiguous (contiguous is better). However, even using random indices:
>>> b = blist([None])
>>> b *= 5000
>>> import random
>>> for i in range(100):b[random.randint(0, 4999)] = i
...
>>> b.__sizeof__()
2916
It still uses much less memory than the dict; a small sketch comparing contiguous and random indices follows.
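If you want to check the effect of contiguity yourself, a sketch along these lines works (numbers will vary between runs and between blist/Python versions):
import random
from blist import blist

def sparse_blist(length, indices):
    # Build a blist of the given length and fill only the given indices.
    b = blist([None])
    b *= length
    for i in indices:
        b[i] = i
    return b

contiguous = sparse_blist(5000, range(100))
scattered = sparse_blist(5000, random.sample(range(5000), 100))
print("contiguous:", contiguous.__sizeof__())
print("scattered: ", scattered.__sizeof__())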
Even lookup times are better:
In [1]: from blist import blist
...: import random
...:
In [2]: b = blist([None])
In [3]: b *= 5000
In [4]: for i in range(100):b[random.randint(0, 4999)] = i
In [5]: %timeit b[0]
10000000 loops, best of 3: 50.7 ns per loop
In [6]: d = dict()
In [7]: for i in range(100):d[random.randint(0, 4999)] = i
In [10]: %timeit d[1024] # 1024 is an existing key in this dictionary
10000000 loops, best of 3: 70.7 ns per loop
In [11]: %timeit b[1024]
10000000 loops, best of 3: 50.7 ns per loop
Note that a list takes about 47 ns to look up an index on my machine, so on small lists like yours blist lookup performance is really close to that of a plain list.
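If you want to reproduce these timings without IPython, a minimal sketch using the stdlib timeit module (again assuming blist is installed) could look like this; absolute numbers will of course depend on your machine:
import random
import timeit
from blist import blist

size = 5000
lst = [None] * size
blst = blist([None]) * size
dct = {}
for i in range(100):
    idx = random.randint(0, size - 1)
    lst[idx] = blst[idx] = dct[idx] = i  # same sparse data in all three

key = next(iter(dct))  # a key guaranteed to exist in the dict
print("list :", timeit.timeit('lst[1024]', globals=globals()))
print("blist:", timeit.timeit('blst[1024]', globals=globals()))
print("dict :", timeit.timeit('dct[key]', globals=globals()))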