
What are the types of things that need to be considered if I need to sort a large random array of 0s and 1s?

You can assume the array is on the order of millions or billions of elements.

I understand there are tons of sorting algorithms out there (quick, merge, radix, etc.) and so many different data structures (trees, skip lists, linked lists, etc.).

If somebody asks me to sort this large array, do I simply jump to Quick Sort and say that's the best solution? If not, what am I supposed to be thinking about?

I'm not even sure I know the right answer to this question, but I would really appreciate it if somebody in the community could give some advice.

Thanks.

user1068636

4 Answers


Use counting sort: run through the array once and count the number of 0's. Then run through the array once more, writing that many 0's followed by 1's. In any case, this is a purely academic exercise, because nobody would ever need to do such a thing in real life.
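A minimal sketch of this two-pass idea in Python (the function name and the sample input are my own illustration):

    def sort_binary(arr):
        # Pass 1: count the 0s.
        zeros = sum(1 for x in arr if x == 0)
        # Pass 2: overwrite the array with the counted 0s, then 1s.
        for i in range(len(arr)):
            arr[i] = 0 if i < zeros else 1
        return arr

    print(sort_binary([1, 0, 1, 1, 0, 0, 1, 0]))  # [0, 0, 0, 0, 1, 1, 1, 1]

This touches every element exactly twice, so it runs in O(n) time with O(1) extra space, regardless of how the 0s and 1s are distributed.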

Andrej Bauer

While Andrej Bauer points out that your problem can be solved very efficiently, 0-1 sorting has some interesting and nontrivial aspects. For example, a comparator network is a valid sorting network if and only if it correctly sorts every sequence of 0s and 1s (the 0-1 principle).

Intuitively, a sorting network is a sorting algorithm that does not change what it does based on previous results. This is not true of, say, quicksort, which recurses differently based on the rank of the chosen pivot (in its standard form, quicksort is clearly not a sorting network for several reasons). This is why, for sorting networks, 0-1 sorting is exactly as difficult as unrestricted sorting: the algorithm can't examine the input to see how to handle it most efficiently. In this case, the most efficient way to handle the 1s and 0s is not really to sort at all but to count instead. That option is not available to a sorting network, so it performs all of its operations as usual, costing as much as any other kind of sort.
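To make the 0-1 principle concrete, here is a hedged Python sketch (the network constant and helper name are my own) that verifies a classic 5-comparator network on 4 wires by checking only the 2^4 binary sequences:

    from itertools import product

    # A well-known 5-comparator sorting network on 4 wires.
    NETWORK = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

    def apply_network(values, network=NETWORK):
        # The comparator sequence is fixed: the data never changes the plan.
        v = list(values)
        for i, j in network:
            if v[i] > v[j]:
                v[i], v[j] = v[j], v[i]
        return v

    # By the 0-1 principle, sorting all 16 binary sequences proves the
    # network sorts every 4-element input.
    assert all(apply_network(bits) == sorted(bits)
               for bits in product([0, 1], repeat=4))

Because the comparator sequence is data-independent, checking the 2^n binary inputs suffices instead of all n! orderings.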

SamM

My approach would be this:

ptr1 := start_of_array
ptr2 := end_of_array
while ptr2 > ptr1 :

    while ptr1 < ptr2 and arr[ptr1] == 0 :  //pass1: skip 0s already in place
        ptr1++

    while ptr2 > ptr1 and arr[ptr2] == 1 :  //pass2: skip 1s already in place
        ptr2--

    if ptr2 > ptr1 :
        swap arr[ptr1], arr[ptr2]           //swap the misplaced pair

This will work like this (indices are 0-based):

Input Array : 0 0 0 0 1 1 1 0 1 0 1 1

Pass1 : ptr1 stops at index 4, the leftmost 1
Pass2 : ptr2 stops at index 9, the rightmost 0
Swap  : 0 0 0 0 0 1 1 0 1 1 1 1

Pass1 : ptr1 stops at index 5
Pass2 : ptr2 stops at index 7
Swap  : 0 0 0 0 0 0 1 1 1 1 1 1

Pass1 : ptr1 stops at index 6
Pass2 : ptr2 stops at index 6, meeting ptr1

Exit from loop
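For reference, a runnable Python version of the same two-pointer partition (the function name is my own):

    def partition_binary(arr):
        # Sort a 0/1 list in place by swapping misplaced pairs inward.
        ptr1, ptr2 = 0, len(arr) - 1
        while ptr1 < ptr2:
            while ptr1 < ptr2 and arr[ptr1] == 0:   # pass1
                ptr1 += 1
            while ptr1 < ptr2 and arr[ptr2] == 1:   # pass2
                ptr2 -= 1
            if ptr1 < ptr2:
                arr[ptr1], arr[ptr2] = arr[ptr2], arr[ptr1]  # swap
        return arr

    print(partition_binary([0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1]))

Each pointer only ever moves toward the other, so the whole thing is a single O(n) pass with O(1) extra space, performing at most one swap per misplaced pair.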


This really depends on the number of zeroes and ones you're dealing with on a per-element basis.

If you're dealing with 64 bits or fewer per element, just convert each entry to a number, pop it into an array, and sort the array.

With 32 bits or fewer, make a blank array, convert each entry to a number, and keep track of the number of times each number appears.
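A hedged Python sketch of that pack-and-count idea (the helper name and sample data are my own; a Counter stands in for the blank array):

    from collections import Counter

    def pack_bits(bits):
        # Pack a short, fixed-length 0/1 sequence into one integer key.
        key = 0
        for b in bits:
            key = (key << 1) | b
        return key

    entries = [[0, 1, 1], [1, 0, 0], [0, 1, 1]]
    counts = Counter(pack_bits(e) for e in entries)
    for key in sorted(counts):                 # emit in sorted order
        print(f"{key:03b} x {counts[key]}")    # 011 x 2, then 100 x 1

For fixed-length entries, the numeric order of the packed keys matches the lexicographic order of the bit sequences, so sorting the keys sorts the entries.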

OK, if this is a pathological case: let's say a billion entries, each 16 kilobits long. Now you're into disk-land. Split your file into ~100 MB pieces, then use the Unix split, sort, and >> operators to implement a merge sort. There is a temptation to just let sort do it all in one go; ignore that temptation, as it leads to the land of frustration (it might have been a memory limitation). (I did an 800 GB sort this way via script a few years back.)
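The same chunk-sort-merge strategy, sketched in Python rather than shell (the function and its parameters are my own illustration), with heapq.merge doing the streaming k-way merge:

    import heapq
    import os
    import tempfile
    from itertools import islice

    def external_sort(in_path, out_path, chunk_lines=1_000_000):
        # Phase 1: sort fixed-size chunks in memory, write each as a run.
        runs = []
        with open(in_path) as src:
            while True:
                chunk = list(islice(src, chunk_lines))
                if not chunk:
                    break
                chunk.sort()
                with tempfile.NamedTemporaryFile(
                        "w", delete=False, suffix=".run") as run:
                    run.writelines(chunk)
                    runs.append(run.name)
        # Phase 2: stream-merge the sorted runs with bounded memory.
        files = [open(p) for p in runs]
        with open(out_path, "w") as dst:
            dst.writelines(heapq.merge(*files))
        for f in files:
            f.close()
        for p in runs:
            os.remove(p)

This assumes newline-terminated records; if you stay in the shell, GNU sort's --merge flag handles the second phase for you.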

Best of luck --Storm