Summary.
The data structure described below maintains, in constant time per update, the sorted list of counter values of a dynamic set of counters. I am looking for references to this structure and for possible improvements.
Problem and motivation.
Consider a set of counters that may be increased (by 1), decreased (by 1), deleted (when at 0), or created (with value 0). We want to efficiently update the sorted list of counter values when these operations are performed.
This is helpful for many counting tasks in dynamic contexts where we want the maximal value or the median value to be available at any time. Typical examples include counting item occurrences within a bounded time window in a stream (one counter per item), tracking the degrees in dynamic graphs (one counter per vertex/node), etc.
Data structure.
values is the array of counter values sorted in decreasing order: it has one cell per counter, values[0] is the largest counter value, values[1] the second largest (that may be equal to the first one), and so on.
c2pos is the dictionary giving the index of counters in values: c2pos[c] gives the index of the value of c, therefore the value of c is values[c2pos[c]].
pos2c is the array giving the counter at each index of values: pos2c[i] is the counter whose value is values[i].
val2pos is the dictionary giving the smallest index val2pos[v] of a counter with value v in values.
distrib is the dictionary giving the number distrib[v] of counters with value v, for any value v held by at least one counter.
All structures above are initialized to empty.
All counters are created with initial value 0, and deleted only when their value is 0.
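To make the definitions above concrete, here is a minimal Python sketch of one consistent state, together with a checker for the invariants the definitions imply. The counter names "a" to "d", the function name check, and the choice of plain Python lists and dicts are assumptions made for illustration only.

```python
# Hypothetical example state: counters a=3, c=3, b=1, d=0.
values = [3, 3, 1, 0]                       # sorted in decreasing order
pos2c = ["a", "c", "b", "d"]                # counter stored at each index
c2pos = {"a": 0, "c": 1, "b": 2, "d": 3}    # index of each counter
val2pos = {3: 0, 1: 2, 0: 3}                # smallest index of each value
distrib = {3: 2, 1: 1, 0: 1}                # number of counters per value


def check(values, c2pos, pos2c, val2pos, distrib):
    """Assert the invariants implied by the definitions (O(n), for testing only)."""
    n = len(values)
    assert len(pos2c) == n and len(c2pos) == n
    # values is sorted in decreasing order
    assert all(values[i] >= values[i + 1] for i in range(n - 1))
    # c2pos and pos2c are inverse of each other
    assert all(c2pos[pos2c[i]] == i for i in range(n))
    # distrib and val2pos have one entry per value that is present
    assert sum(distrib.values()) == n
    assert set(val2pos) == set(distrib)
    for v, start in val2pos.items():
        assert distrib[v] > 0
        # the counters with value v occupy a contiguous block starting at val2pos[v]
        assert values[start:start + distrib[v]] == [v] * distrib[v]
    return True


check(values, c2pos, pos2c, val2pos, distrib)
largest = values[0]                 # maximal counter value, read in O(1)
median = values[len(values) // 2]   # a middle value, read in O(1)
```

Reading the maximum or a median is a single array access; all the work goes into keeping values sorted under updates.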
Algorithm.
Addition of counter c:
if distrib[0] does not exist then set it to 0
if val2pos[0] does not exist then set it to the length of values
increase distrib[0]
set c2pos[c] to the length of values
append c to pos2c
append 0 to values
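The addition steps can be sketched in Python as follows. Bundling the five structures into a single object s (a types.SimpleNamespace) and the function name add_counter are assumptions of this sketch, chosen only to keep the signature short.

```python
from types import SimpleNamespace


def add_counter(s, c):
    """Create counter c with value 0; s holds the five structures."""
    if 0 not in s.distrib:
        s.distrib[0] = 0
    if 0 not in s.val2pos:
        s.val2pos[0] = len(s.values)  # c will be the first counter with value 0
    s.distrib[0] += 1
    s.c2pos[c] = len(s.values)
    s.pos2c.append(c)
    s.values.append(0)                # 0 is the smallest value, so it goes last


# All structures start empty.
s = SimpleNamespace(values=[], c2pos={}, pos2c=[], val2pos={}, distrib={})
add_counter(s, "x")
add_counter(s, "y")
```

Since a new counter has the smallest possible value, appending it at the end keeps values sorted without any swap.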
Removal of counter c:
decrease distrib[0]
if it becomes 0 then remove entry 0 from distrib and val2pos
swap c and the last counter in values
remove entry c from c2pos
remove the last cell of pos2c
remove the last cell of values
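A possible Python sketch of the removal steps is given below. The helper swap, which exchanges two cells while keeping c2pos and pos2c consistent, is an assumption of this sketch: the steps only say "swap" and leave that bookkeeping implicit. The state object s and the example counters are likewise hypothetical.

```python
from types import SimpleNamespace


def swap(s, i, j):
    """Exchange the counters at indices i and j, keeping c2pos and pos2c in sync."""
    ci, cj = s.pos2c[i], s.pos2c[j]
    s.pos2c[i], s.pos2c[j] = cj, ci
    s.values[i], s.values[j] = s.values[j], s.values[i]
    s.c2pos[ci], s.c2pos[cj] = j, i


def remove_counter(s, c):
    """Delete counter c, which must have value 0."""
    assert s.values[s.c2pos[c]] == 0, "only counters at 0 may be deleted"
    s.distrib[0] -= 1
    if s.distrib[0] == 0:
        del s.distrib[0]
        del s.val2pos[0]
    swap(s, s.c2pos[c], len(s.values) - 1)  # the last counter also has value 0
    del s.c2pos[c]
    s.pos2c.pop()
    s.values.pop()


# Hypothetical state: a=2, b=0, d=0.
s = SimpleNamespace(values=[2, 0, 0], pos2c=["a", "b", "d"],
                    c2pos={"a": 0, "b": 1, "d": 2},
                    val2pos={2: 0, 0: 1}, distrib={2: 1, 0: 2})
remove_counter(s, "b")
```

Because c has value 0, it sits in the last block of values, so swapping it with the last cell never breaks the order.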
Increase counter c:
let v be the value of c
swap c and the first counter in values with value v
if distrib[v+1] does not exist, then set it to 0
if val2pos[v+1] does not exist, then set it to val2pos[v]
increase distrib[v+1]
decrease distrib[v]
increase val2pos[v]
if distrib[v] equals 0 remove entry v from distrib and val2pos
increase values[c2pos[c]]
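The increase steps might look like this in Python. As before, the state object s, the helper swap, and the example counters are assumptions of the sketch rather than part of the description.

```python
from types import SimpleNamespace


def swap(s, i, j):
    """Exchange the counters at indices i and j, keeping c2pos and pos2c in sync."""
    ci, cj = s.pos2c[i], s.pos2c[j]
    s.pos2c[i], s.pos2c[j] = cj, ci
    s.values[i], s.values[j] = s.values[j], s.values[i]
    s.c2pos[ci], s.c2pos[cj] = j, i


def increase(s, c):
    """Add 1 to counter c while keeping values sorted in decreasing order."""
    v = s.values[s.c2pos[c]]
    swap(s, s.c2pos[c], s.val2pos[v])    # move c to the first cell of the block of v
    if v + 1 not in s.distrib:
        s.distrib[v + 1] = 0
    if v + 1 not in s.val2pos:
        s.val2pos[v + 1] = s.val2pos[v]  # c's cell starts a new block of v + 1
    s.distrib[v + 1] += 1
    s.distrib[v] -= 1
    s.val2pos[v] += 1                    # the block of v now starts one cell later
    if s.distrib[v] == 0:
        del s.distrib[v]
        del s.val2pos[v]
    s.values[s.c2pos[c]] += 1


# Hypothetical state: a=3, b=1, c=1, d=0.
s = SimpleNamespace(values=[3, 1, 1, 0], pos2c=["a", "b", "c", "d"],
                    c2pos={"a": 0, "b": 1, "c": 2, "d": 3},
                    val2pos={3: 0, 1: 1, 0: 3}, distrib={3: 1, 1: 2, 0: 1})
increase(s, "c")
```

The swap places c at the boundary between the block of v and the block of v+1 (if any), so incrementing its cell simply moves that boundary by one position.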
Decrease counter c:
let v be the value of c
swap c and the last counter in values with value v
if distrib[v-1] does not exist, then set it to 0
if val2pos[v-1] does not exist, then set it to val2pos[v]+distrib[v]
increase distrib[v-1]
decrease distrib[v]
decrease val2pos[v-1] (val2pos[v] is unchanged, as c was the last counter with value v)
if distrib[v] equals 0 remove entry v from distrib and val2pos
decrease values[c2pos[c]]
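Here is a hedged Python sketch of the decrease operation, with the same assumed state object s and helper swap as in the other sketches. In this sketch val2pos[v] is left untouched: c leaves the block of v from its last cell, so the first index of that block does not move, and only val2pos[v-1] shifts by one.

```python
from types import SimpleNamespace


def swap(s, i, j):
    """Exchange the counters at indices i and j, keeping c2pos and pos2c in sync."""
    ci, cj = s.pos2c[i], s.pos2c[j]
    s.pos2c[i], s.pos2c[j] = cj, ci
    s.values[i], s.values[j] = s.values[j], s.values[i]
    s.c2pos[ci], s.c2pos[cj] = j, i


def decrease(s, c):
    """Subtract 1 from counter c while keeping values sorted in decreasing order."""
    v = s.values[s.c2pos[c]]
    last = s.val2pos[v] + s.distrib[v] - 1   # last cell of the block of v
    swap(s, s.c2pos[c], last)
    if v - 1 not in s.distrib:
        s.distrib[v - 1] = 0
    if v - 1 not in s.val2pos:
        s.val2pos[v - 1] = s.val2pos[v] + s.distrib[v]  # cell just after the block of v
    s.distrib[v - 1] += 1
    s.distrib[v] -= 1
    s.val2pos[v - 1] -= 1                    # the block of v - 1 now starts at c's cell
    if s.distrib[v] == 0:
        del s.distrib[v]
        del s.val2pos[v]
    s.values[s.c2pos[c]] -= 1


# Hypothetical state: a=3, c=3, b=1, d=0.
s = SimpleNamespace(values=[3, 3, 1, 0], pos2c=["a", "c", "b", "d"],
                    c2pos={"a": 0, "c": 1, "b": 2, "d": 3},
                    val2pos={3: 0, 1: 2, 0: 3}, distrib={3: 2, 1: 1, 0: 1})
decrease(s, "a")
```

The index of the last counter with value v is not stored; it is recovered in constant time as val2pos[v]+distrib[v]-1, which is one reason distrib is convenient to keep.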
Complexity.
If dictionaries are implemented as hash tables, then the expected time of dictionary operations is $O(1)$. If counters are never deleted and if their values are bounded, then one may use arrays instead of hash tables, leading to an $O(1)$ worst-case cost for dictionary operations.
If we use dynamic arrays, then array operations take $O(1)$ amortized time. If we know the total number of counters in advance (or a bound on it), then the arrays can be allocated once and the worst-case cost of array operations is $O(1)$.
Space complexity is linear in the number of counters, as each array and dictionary contains either one entry per counter or at most one entry per distinct counter value.
Questions.
Is this a well-known method (maybe folklore)? Where does it appear in the literature?
Is it possible to significantly improve it? For instance, are all the arrays and dictionaries mentioned above mandatory? I wanted to avoid distrib, but something of this kind seems necessary to keep the memory cost from growing with the maximal counter value.