
If the loop only concatenates strings, as follows, it finishes almost immediately.

import time

test_str = "abcdefghijklmn123456789"
str1 = ""
str2 = ""

start = time.time()
for i in range(1, 100001):
    # Extend both strings independently.
    str1 = str1 + test_str
    str2 = str2 + test_str

    # Report the elapsed time for each batch of 20,000 iterations.
    if i % 20000 == 0:
        print("time(sec) => {}".format(time.time() - start))
        start = time.time()

The processing time per batch stays roughly constant:

time(sec) => 0.013324975967407227
time(sec) => 0.020363807678222656
time(sec) => 0.009979963302612305
time(sec) => 0.01744699478149414
time(sec) => 0.0227658748626709

Inexplicably, assigning a concatenated string to another variable makes the process slower and slower.

import time

test_str = "abcdefghijklmn123456789"
str1 = ""
str2 = ""

start = time.time()
for i in range(1, 100001):
    str1 = str1 + test_str
    # Previously: str2 = str2 + test_str
    str2 = str1  # str2 now refers to the same object as str1

    # Report the elapsed time for each batch of 20,000 iterations.
    if i % 20000 == 0:
        print("time(sec) => {}".format(time.time() - start))
        start = time.time()

The processing time grows with each batch:

time(sec) => 0.36466407775878906
time(sec) => 1.105351209640503
time(sec) => 2.6467738151550293
time(sec) => 5.891657829284668
time(sec) => 9.266698360443115

Both Python 2 and Python 3 give the same result.

Harsh Patel
uma66
  • Because you are using a quadratic-time algorithm. The interpreter is able to optimize this, but only in some cases. You should not rely on that; instead, use a linear-time algorithm (generally, append to a list and then `''.join` it) – juanpa.arrivillaga May 11 '19 at 01:26
  • After verifying this as described in @ShadowRanger's answer, I found the cause. If only string concatenation is used, the ids of str1 and str2 remain the same, but if str1 is assigned to str2, the id of str1 changes each time. In other words, it is slow because str1 is reallocated every time. – uma66 May 11 '19 at 03:08
  • Note 6 in the docs for [common sequence operations](https://docs.python.org/3/library/stdtypes.html#common-sequence-operations) explicitly states that this behaviour is quadratic. – snakecharmerb May 12 '19 at 07:38

1 Answer


In general, the Python language standard makes no guarantees here; in fact, as defined, strings are immutable and what you're doing should bite you either way, as you've written a form of Schlemiel the Painter's algorithm.

But in the first case, as an implementation detail, CPython (the reference interpreter) will help you out, and concatenate a string in place (technically violating the immutability guarantee) under some fairly specific conditions that allow it to adhere to the spirit of the immutability rules. The most important condition is that the string being concatenated must be referenced in only one place (if it wasn't, the other reference would change in place, violating the appearance of str being immutable). By assigning str2 = str1 after each concatenation, you guarantee there are two references when you concatenate, so a new str must be made by every concatenation to preserve the apparent immutability of strings. That means more memory allocation and deallocation, more (and progressively increasing) memory copies, etc.
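The "referenced in only one place" condition can be observed directly. The following is a minimal sketch, assuming CPython: `sys.getrefcount` is CPython-specific, and the absolute numbers it reports vary between versions, so only the difference before and after the assignment is compared. The names `alias` and `extra_refs` are illustrative and not from the question.

```python
import sys

# Build the string at run time so it is an ordinary object rather than
# a compile-time constant shared with the code object.
s = "".join(["abcdefghijklmn", "123456789"])

before = sys.getrefcount(s)  # count while only the name `s` refers to the string
alias = s                    # same effect as `str2 = str1` in the question
after = sys.getrefcount(s)   # count now that a second name refers to it

extra_refs = after - before
print("extra references created by the alias:", extra_refs)
print("alias is the same object:", alias is s)
```

Because the assignment copies a reference rather than the string, the second name keeps the old object alive, and the next `str1 = str1 + test_str` cannot reuse its buffer.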

Note that relying on this optimization is explicitly discouraged in PEP 8, the Python style guide:

  • Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).

    For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.

The note about "only works for some types" is important. This optimization only applies to str; in Python 2 it doesn't work on unicode (even though Python 3's str is based on the implementation of Python 2's unicode), and in Python 3 it doesn't work on bytes (which are similar to Python 2's str under the hood).
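For comparison, the `''.join()` form that PEP 8 recommends could look like the sketch below. The helper name `build_with_join` is hypothetical and chosen only for illustration; the pieces are collected in a list and joined once at the end, so the total work grows linearly with the number of pieces regardless of how many names refer to the result.

```python
def build_with_join(piece, count):
    """Return `piece` repeated `count` times, built with a single join."""
    parts = []
    for _ in range(count):
        parts.append(piece)  # appending to a list is amortized O(1)
    return "".join(parts)    # one pass that allocates the final string once


test_str = "abcdefghijklmn123456789"
result = build_with_join(test_str, 100000)

# A second reference to the finished string costs nothing extra here,
# because no further concatenation happens after the join.
snapshot = result
print(len(result))
```

If an intermediate value is needed inside the loop, as `str2` was in the question, joining the list at that point is one option, though doing so on every iteration would bring back the quadratic cost.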

ShadowRanger
  • This was exactly what I suspected. – Tom Karzes May 11 '19 at 01:36
  • "so a new str must be made by every concatenation to preserve the apparent immutability of strings." => This was exactly the cause. Thank you!! – uma66 May 11 '19 at 02:23
  • In languages where strings are immutable, there's often an alternate way to build strings efficiently. See [Python string class like StringBuilder in C#?](https://stackoverflow.com/q/2414667/995714) – phuclv May 11 '19 at 04:49
  • @uma66 Please consider accepting the answer if it helped. – Barbaros Özhan Jun 19 '19 at 10:55