2

This question stems from this question and this answer. I also want to preface it by stating that it is asked from the perspective of a RAM (or PRAM, if that's the more accurate term) model.

From the comments in the answer, it seems like when doing algorithm analysis for the solution:

  • $O(n)$ solution (guaranteed): sum up the elements of the first array. Then, sum up the elements of the second array. Finally, perform the subtraction.

for the problem of:

finding the number that is different between two arrays (I'm assuming a fixed-size structure, if it matters) of unsorted numbers (paraphrased by me)

that it isn't as black and white as just concluding $2n$ (one pass over each array), because you have to take into account the size of the numbers too. That is the part I am confused about. I asked one of the commenters to elaborate a bit more for me:

My idea is that while the time to add two numbers is proportional to their length, there are $O(\log n)$ extra bits that the partial sums have over the longest input. Now, there are two things that complicate things. First, the inputs need not have the same length, but we'd like complexity in terms of their total length. If your addition is proportional to the longer number, you might rack up $O(n^2)$ quite easily. The solution is to add in place, which means the addition is proportional to the addend length - if not for carry. Now you just need to find how many carries you can have.

Despite this pretty detailed comment, I'm still having difficulty understanding why it's different. (I suspect the reason is my ignorance of the lower-level workings of a computer and of numerical representations.) Not wanting to test their patience, I googled it and asked here, but, unable to properly articulate the question, the most I found was this answer, which seems to echo the quote above (I think) and thus didn't help me further.

Is it possible to simplify the explanation by perhaps illustrating it or elaborating further? Or is this as simple as it can get (and I just need to get the prerequisite knowledge)?

Honinbo Shusaku

4 Answers

3

Actually, you can avoid any higher precision calculations.

Assume you are adding $n$ $k$-bit numbers in one array, and $n+1$ $k$-bit numbers in the other array. The difference is a $k$-bit number (the one extra number that was added to the other array).

So you can perform the addition, and just ignore anything beyond k bits. Instead of getting the sum, you get the sum modulo $2^k$, but that is enough to get the correct result.
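To make that concrete, here is a minimal sketch of my own in C (the names find_extra, a, b, n are hypothetical, and I assume the elements fit in 64-bit machine words, i.e. $k \le 64$): unsigned arithmetic in C wraps around modulo $2^{64}$, so nothing beyond 64 bits is ever kept, and since the answer itself fits in $k \le 64$ bits that reduction loses nothing.

#include <stdint.h>
#include <stddef.h>

/* Returns the element present in b (length n + 1) but not in a (length n).
   All arithmetic silently wraps modulo 2^64, so no extra precision is
   ever needed -- the final value is the missing k-bit number itself. */
uint64_t find_extra(const uint64_t *a, size_t n, const uint64_t *b)
{
    uint64_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff -= a[i];      /* wraps modulo 2^64 */
    for (size_t i = 0; i <= n; i++)
        diff += b[i];      /* wraps modulo 2^64 */
    return diff;           /* (sum(b) - sum(a)) mod 2^64 = the extra number */
}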

gnasher729
3

Runtime analysis as it is done in practice is not terribly rigorous. This is why you can have different answers that are equally correct: they just differ in their assumptions.

To do a formally correct runtime analysis you first have to define a machine model. Almost nobody does this explicitly; usually some variant of the RAM is used. Then you write your algorithm using only operations your machine supports. This is typically not done either; usually some form of pseudo-code is used and the mapping to the machine instructions is assumed to be obvious. Only after doing these steps can you start counting how many instructions you use to solve an instance.

In the question you linked, the proposed algorithm summed a list of $n$ numbers. In a casual runtime analysis one typically assumes that adding two numbers takes constant time. This is true for the RAM, but it's not true in the more realistic Word-RAM model. In the Word-RAM (and in real computers), you can only operate on $k$ bits at once. If the numbers you want to add are too big to fit in $k$ bits, you need more than one operation. It's the same as when you add numbers by hand: for small numbers (say, with only one digit) you know the result by heart; for large numbers you have to add each digit manually, take care of the carry, and so on.
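To illustrate the word-size point (a sketch of my own, not part of this answer; the name add_in_place and the choice of 64-bit words are assumptions), this is roughly what adding one multi-word number into another looks like on a Word-RAM-like machine: one addition per word plus carry handling, so the cost grows with the length of the operands rather than being constant.

#include <stdint.h>
#include <stddef.h>

/* Adds the n-word number b into the n-word number a, in place
   (least significant word first), and returns the final carry.
   This takes O(n) word operations, not O(1): every word needs
   its own addition, and the carry has to be threaded through. */
uint64_t add_in_place(uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = a[i] + carry;
        carry = (t < carry);        /* overflow from adding the carry */
        a[i] = t + b[i];
        carry += (a[i] < b[i]);     /* overflow from adding b[i] */
    }
    return carry;
}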

So if you run with the assumption that additions take constant time, summing a list of numbers takes linear time. If you want to be more precise, the length of the list alone is not sufficient to determine the runtime. You need to know how many bits your numbers have, and a more reasonable $n$ for your runtime bound is the number of bits you need to encode your list. You should also think about the exact algorithm you use for adding two numbers and the order in which you add the numbers of your list, as this can have an influence on the runtime.

If you want to see examples of very rigorous runtime analysis, I recommend Knuth's TAOCP. He defines his own machine and writes his algorithms using only simple machine instructions.

adrianN
0

I am new to this site, so I don't really have a feel for what is in scope.

First, you need to assume the array elements are unique and not sorted.

Let's assume positive integers.

The size of the array and the size of the elements are not the same thing. The size of the elements must be >= the size of the array.

The maximum sum would be max element size * number of elements / 2.

A trick to keep the sum down is to iterate over both arrays at once, also using the $2^k$ idea from gnasher (+1), as in the sketch below:

long sum = 0;
int k = numberOfBits;               // bits per element
long modK = 1L << k;                // 2^k
int count = smallerArray.Count;
for (int i = 0; i < count; i++)
{
    sum += largerArray[i] - smallerArray[i];   // pairwise differences keep the sum small
    if (Math.Abs(sum) >= modK)
        sum %= modK;
}
sum += largerArray[count];                     // the one element without a partner
sum = ((sum % modK) + modK) % modK;            // normalise into [0, 2^k)
paparazzo
0

Remember "Rolling hash", the value of hash after applying hash function on key could be a very long integer but it is explicitly multiplied by mod P (as X mod P), where X is the final hash value and P is a prime number, say 31. With multiplication by (mod P), the final value is restrained to 1 word size only. The final value is a small integer.

That is, 38269540 mod 31 is just 9, and 9 is a small hash value compared to 38269540.
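As a small sketch of my own (not from the answer; the names small_hash, key, and len are hypothetical, and only P = 31 comes from the example above): reducing modulo P after every step keeps the running value within one word instead of letting the intermediate result grow into a very long integer.

#include <stdint.h>
#include <stddef.h>

#define P 31u   /* a small prime modulus, as in the example above */

/* Polynomial-style hash of a byte string, reduced modulo P at every
   step, so the intermediate value never exceeds a single word. */
uint32_t small_hash(const unsigned char *key, size_t len)
{
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = (h * 256u + key[i]) % P;   /* h < P at all times */
    return h;
}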

hi.nitish