I recently worked on an algorithm which, among other things, checks strings for equality using the classic builtin equality operator:
str1 == str2
(I think it should be irrelevant to the question, but I faced this issue in C++, and str1 and str2 are obviously std::strings.)
Later I decided to relax the equality condition between strings so that the check actually performed is
trimspace(str1) == trimspace(str2)
where trimspace removes leading and trailing whitespace.
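For concreteness, here is a minimal sketch of what such a relaxed comparison could look like. It is not the code I actually ran: the body of trimspace, the helper name relaxed_equal, the use of std::string_view (C++17), and the choice to treat only ' ' as whitespace are all assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical trimspace: returns a view of s without leading and trailing
// ' ' characters. Assumption: only the space character counts as whitespace.
std::string_view trimspace(std::string_view s) {
    const auto first = s.find_first_not_of(' ');
    if (first == std::string_view::npos) {
        return {};  // empty or all-space input trims to the empty view
    }
    const auto last = s.find_last_not_of(' ');
    // A non-space character exists, so the backward search must find one too.
    assert(last != std::string_view::npos && last >= first);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: the relaxed equality described above.
bool relaxed_equal(const std::string& a, const std::string& b) {
    return trimspace(a) == trimspace(b);
}
```

With this sketch, relaxed_equal("  hi", "hi   ") is true, while relaxed_equal("a b", "ab") is false because inner spaces are kept.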
I naively assumed that the complexity (at least in the worst case of equal strings) would stay the same: just as the original code has to check every pair of corresponding characters for equality, the new code would do the same after skipping any leading or trailing whitespace, and that skipping also visits each whitespace character only once.
In reality I observed an enormous slowdown.
Stepping through the code with a debugger, I eventually realized that the str1 == str2 call was translated to a single call to memcmp, which is a builtin that checks two char arrays for equality, whereas trimspace(str1) == trimspace(str2) resulted in as many calls to memchr as the total number of leading and trailing whitespace characters of the two strings plus 4, plus one call to memcmp for comparing the trimmed strings.¹
This explains the slowdown I mentioned, presumably because of the cost of the function calls.
I see that the fundamental problem is that, although both expressions require O(N) work,
- the first one, str1 == str2, requires all characters to be compared but doesn't require them to be compared in order: they can all be compared at the same time, in parallel, so the CPU can take advantage of that;
- the second one, trimspace(str1) == trimspace(str2), requires traversing the strings in order (from left to right and from right to left) to trim the leading and trailing spaces one by one.
How is this difference between the two computations above formalized in computational complexity theory?
¹ For example, to compare "   hello  " (3 spaces before and 2 spaces after the word hello) with itself, the 3 leading and 2 trailing characters are successfully compared for equality with the space character, and the h and o are checked unsuccessfully, thus terminating the trimming. This happens for both copies of the string, resulting in 14 characters being compared for equality with the space character, each comparison happening via a call to memchr; then hello is compared with its identical counterpart via one single call to memcmp.
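The count in the footnote can be reproduced with a small sketch. The function count_space_checks is a hypothetical helper, not part of my code or of any library; it assumes a naive two-ended trim that tests one character at a time against ' ' and simply counts those tests instead of calling memchr.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts how many characters a naive two-ended trim
// compares against ' ' before it stops at both ends of s.
std::size_t count_space_checks(std::string_view s) {
    std::size_t checks = 0;
    std::size_t begin = 0;
    std::size_t end = s.size();
    // Scan from the left until the first non-space character.
    while (begin < end) {
        ++checks;
        if (s[begin] != ' ') break;
        ++begin;
    }
    // Scan from the right until the last non-space character.
    while (end > begin) {
        ++checks;
        if (s[end - 1] != ' ') break;
        --end;
    }
    // Every character is checked at most once, except that a lone
    // non-space character can be reached from both ends.
    assert(checks <= s.size() + 1);
    return checks;
}
```

For "   hello  " this returns 7 (3 leading spaces, h, 2 trailing spaces, o), so trimming the two copies accounts for the 14 single-character checks mentioned above.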
I initially posted this question on Stack Overflow; you can see it (deleted) here.