12

Boyer-Moore's majority vote algorithms can be used to determine the majority element in a linear time and constant space.

The intuition behind finding the majority element is understandable as it has to be greater than the count of other elements in the input sequence if a majority element exists.

However, there is a variation of this algorithm to find the elements occurring more than [n / 3] where n is the length of the sequence.

The algorithm goes like this.

  • Start with two empty candidate slots and two counters set to 0.
  • for each item:
    • if it is equal to either candidate, increment the corresponding count
    • else if there is an empty slot (i.e. a slot with count 0), put it in that slot and set the count to 1
    • else reduce both counters by 1

I can understand there will be at most (n /3) - 1 entries so we keep two containers and their counts.

But I'm not sure why the last reduce both by 1 is pivotal to this algorithm. I would be very helpful if you explain the intuition behind this.

Code of the above algorithm

vector<int> majorityElement(vector<int>& nums) {
    int cnt1=0, cnt2=0;
    int a,b;
    for(int n: nums){
        if (cnt1 == 0 || n == a){
            cnt1++;
            a = n;
        }
        else if (cnt2 == 0 || n==b){
            cnt2++;
            b = n;
        }
        else{ // This part
            cnt1--;
            cnt2--;
        }
    }
    cnt1=cnt2=0;
    for(int n: nums){
        if (n==a) cnt1++;
        else if (n==b) cnt2++;
    }
    vector<int> result;
    if (cnt1 > nums.size()/3) result.push_back(a);
    if (cnt2 > nums.size()/3) result.push_back(b);
    return result;
}

Algorithm and code source

thebenman
  • 223
  • 2
  • 7

1 Answers1

22

Here is a different way to consider the same algorithm, or rather its general form with $k$ counters (in your case, $k=2$). There are $k$ "containers" and a trash bag. Each container will contain one type of element at a time. Initially everything is empty. When we read a new element $x$, we consider the following cases:

  • If there is a container containing $x$'s, add $x$ to that container.
  • If there is no container containing $x$ and there is an empty container, put $x$ in the empty container.
  • If there is no container containing $x$ and there are no empty containers, remove one element from each container, and put $x$ in the trash.

When the algorithm terminates, each of the $n$ elements is either found in one of the $k$ containers, or in the trash bag.

Suppose that element $x$, not found in any of the containers, is found $m$ times in the input. Thus all $m$ copies of $x$ are in the trash bag. Each time an element is put in the trash bag, $k$ other elements are also put in the trash bag together with it. This means that the trash bag contains at least $(k+1)m$ elements, and so $(k+1)m \leq n$ or $m \leq n/(k+1)$.

Consequently, if an element appears more than $n/(k+1)$ times in the input then it will necessarily be found in one the $k$ containers. In other words, the algorithm correctly identifies all elements appearing more than $n/(k+1)$ times.

Note how this analysis crucially depends on the fact that when trashing $x$ in the third case above, we also trash $k$ additional elements together with it.

Yuval Filmus
  • 280,205
  • 27
  • 317
  • 514