Performance difference of function and data recursion in Haskell

Question

While thinking about this other question, I realized that the following functions smoothSeq and smoothSeq'

smoothSeq :: (Integer, Integer, Integer) -> [Integer]
smoothSeq (a, b, c) = result
  where
    result = 1 : union timesA (union timesB timesC)
    timesA = map (* a) $ result
    timesB = map (* b) $ result
    timesC = map (* c) $ result

smoothSeq' :: (Integer, Integer, Integer) -> [Integer]
smoothSeq' (a, b, c) = 1 : union timesA (union timesB timesC)
  where
    timesA = map (* a) $ smoothSeq' (a, b, c)
    timesB = map (* b) $ smoothSeq' (a, b, c)
    timesC = map (* c) $ smoothSeq' (a, b, c)

-- | Merge two sorted sequences discarding duplicates.
union :: (Ord a) => [a] -> [a] -> [a]
union [] ys = ys
union xs [] = xs
union (x : xs) (y : ys)
  | x < y = x : union xs (y : ys)
  | x > y = y : union (x : xs) ys
  | otherwise = x : union xs ys

have drastically different performance characteristics:

ghci> smoothSeq (2,3,5) !! 500
944784
(0.01 secs, 311,048 bytes)
ghci> smoothSeq' (2,3,5) !! 500
944784
(11.53 secs, 3,745,885,224 bytes)

My impression is that smoothSeq is linear in memory and time (as was regularSeq) because result is shared in the recursive definition, whereas smoothSeq' is super-linear because the recursive function definition spawns a tree of computations that independently recompute multiple times the previous terms of the sequence (there is no sharing/memoization of the previous terms; similar to naive Fibonacci).

While looking around for a detailed explanation, I encountered these examples (and others)

fix f = x where x = f x
fix' f = f (fix f)

cycle xs = res where res = xs ++ res
cycle' xs = xs ++ cycle' xs

where again the non-primed version (without ' suffix) is apparently more efficient because it reuses the previous computation.

From what I can see, what differentiates the two versions is whether the recursion involves a function or data (more precisely, a function binding or a pattern binding). Is that enough to explain the difference in behavior? What is the principle behind, that dictates whether something is memoized or not? I couldn't find a definite and comprehensive answer in the Haskell 2010 Language Report, or elsewhere.

Edit: here is another simple example that I could think of:

arithSeq start step = result
  where
    result = start : map (+ step) result
arithSeq' start step = start : map (+ step) (arithSeq' start step)

ghci> arithSeq 10 100 !! 10000
1000010
(0.01 secs, 1,443,520 bytes)
ghci> arithSeq' 10 100 !! 10000
1000010
(1.30 secs, 5,900,741,048 bytes)

The naive recursive definition arithSeq' is way worse than arithSeq, where the recursion "happens on data".

The Report only mandates semantics, not performance. Any Haskell implementation is free to optimize as wanted, as long as the semantics is respected. Anyway, drastically simplifying: in a recursive definition `x = g x` where `x` is a non-function (and monomorphic) value, the value of `x` will be stored the first time is demanded, and reused later. For functions, `f x = g (f x)` won't store any result and recompute `f x` from scratch each time it's demanded. I'd say that your intuition above is basically correct. — chi, May 23 '23 at 12:17
Also: for serious performance measurements, don't use GHCi, since that disables many optimizations to ensure a rapid load time. Compile your code with `-O2`, and possibly use some benchmarking library like criterion. Here on SO we saw many cases where askers wonder why something is slower, when that is no longer the case after proper compilation. — chi, May 23 '23 at 12:20
@chi Thanks a lot, your "drastically simplifying" explanation makes a lot of sense to me now. Regarding performance measurements, I know I'm not being 100% accurate, but what I do for quick experimentation is the following: I write the code in a file, launch `ghci -fobject-code -O2` and `:load` it in compiled form (hopefully optimized?). I'll start learning about how to use Criterion soon, thanks for the suggestion! — Federico, May 23 '23 at 12:32

score 5 · Answer 1 · answered May 23 '23 at 12:21

5

Operationally with GHC and no optimizations, each call to smoothSeq will create one results list and use that same list in all the computation. In contrast, calling smoothSeq' will create one list and then recursively call itself three times which creates three more lists, each of those create three lists of their own, and so on. That explain the immense difference in memory usage and performance.

There is one implementation in between these two extremes, namely:

smoothSeq'' :: (Integer, Integer, Integer) -> [Integer]
smoothSeq'' (a, b, c) = 1 : union timesA (union timesB timesC)
  where
    result = smoothSeq'' (a, b, c)
    timesA = map (* a) result
    timesB = map (* b) result
    timesC = map (* c) result

Note that GHC will optimize smoothSeq' to this form if you compile with -O. Unfortunately, GHC is not smart enough to optimize it all the way to smoothSeq.

answered May 23 '23 at 12:21

Noughtmare

9,410
1
12
38

Thanks @Noughtmare! While it makes sense to me intuitively that that is what's happening, I find my intuition very fragile in discerning whether sharing happens or not. I feel that most of the time I "guess right", but still I feel unsure whether I'm correct or not. Is there a reference place where clear and rigorous rules that govern sharing of values are laid out? – Federico May 23 '23 at 12:40
4

The general rule of thumb is that sharing always happens except under lambdas. And function definitions are considered lambda, e.g. you could (almost) equivalently write `smoothSeq (a,b,c) = ...` as `smoothSeq = \(a,b,c) -> ...`. Values under lambdas will generally not be shared across invocations of that lambda (function calls) except if GHC applies optimizations like the [full laziness optimization](https://stackoverflow.com/q/35115172/15207568). – Noughtmare May 23 '23 at 13:03

Ben · Accepted Answer · 2023-05-24T13:13:53.743

As a rule of thumb, when you bind something to a name all references to that name in the same scope are referring to a single object in memory. Whereas when you have multiple identical expressions (that are more complicated than a reference to a single name) they are different objects in memory that happen to have the same value.¹

So let's see how that general principle applies to your code:

smoothSeq :: (Integer, Integer, Integer) -> [Integer]
smoothSeq (a, b, c) = result
  where
    result = 1 : union timesA (union timesB timesC)
    timesA = map (* a) $ result
    timesB = map (* b) $ result
    timesC = map (* c) $ result

Here there you have given the name result to the value 1 : union timesA (union timesB timesC). All the other places where result are used are in the same scope (the local scope within applications of smoothSeq), so there are referring to a single shared value. That means when timesA is evaluated some more and triggers some more evaluation of result, then timesB and timesC will subsequently see the results of that work already evaluated inside result.

And in this one:

smoothSeq' :: (Integer, Integer, Integer) -> [Integer]
smoothSeq' (a, b, c) = 1 : union timesA (union timesB timesC)
  where
    timesA = map (* a) $ smoothSeq' (a, b, c)
    timesB = map (* b) $ smoothSeq' (a, b, c)
    timesC = map (* c) $ smoothSeq' (a, b, c)

Here you don't have a single name for the expression smoothSeq' (a, b, c), you've just written that expression multiple times. Each of those expressions is an independent call, so they will be represented by a separate object in memory, and evaluating one will have no impact on the others.

Furthermore the scope in which timesA, timesB, and timesC are defined is the scope inside the application of smoothSeq', after it has received its argument. That means that every application of smoothSeq' has its own scope with timesA, timesB, and timesC values, including the calls defining timesA, timesB, and timesC. So every call forks into 3 more calls, unlike the version with result which only entered smoothSeq once. This will be super-linear indeed.

I think it's also worth comparing this to the similarly-structured code in your other question:

regularSeq :: [Integer]
regularSeq = 1 : union timesTwo (union timesThree timesFive)
  where
    timesTwo   = map (* 2) regularSeq
    timesThree = map (* 3) regularSeq
    timesFive  = map (* 5) regularSeq

This is quite like smoothSeq, in that the 3 inner definitions are all working with a single name regularSeq and thus are referring to a single object. But here the scope of that name/object is quite different. In smoothSeq the shared variable result is defined inside the application of smoothSeq. Thus there is a separate result every time smoothSeq is called, but it is shared between all the internal definitions timesA, timesB, and timesC. In regularSeq, the shared name is a name in the global scope; it will thus be shared for the entire lifetime of the program.

Hopefully this illustrates that the key thing to think about when trying to predict what is shared are the variables and their scopes.

¹ Technically the compiler may inline the definition of a name to its usage sites, and thus create distinct objects even when you used the same name. It also might apply common-subexpression-elimination so that multiple identical expressions end up referring to the same value. It could even notice that an expression in a local scope doesn't depend on the arguments of the function it occurs inside, and lift the expression outside to become a single object in memory shared across calls of the function. So if we're being precise you cannot be sure just by looking at the source code, and that's why there isn't any absolutely rigorous reference for what will be shared. The language semantics promise you a result, they don't promise you any particular evaluation strategy.

However these transformations are applied as optimisations. The compiler will try to make sure it only makes these changes when it improves the performance of your code; the GHC developers would likely regard it as a compiler bug if either were applied and made a large negative difference in performance. So it's pretty safe to just think about it as "if you named it then it's the same object, otherwise they are different objects". Any deviation is unlikely to matter; the compiler might surprise you by running faster than you expected from the simple rule of thumb, but it probably won't violate your expectations in a way that creates a large problem. You generally only need to worry about exactly what it does if you're trying to squeeze every last bit of performance out of your code.

This is such a great answer that I will print it and frame it on the wall! A huge thanks, @Ben! — Federico, May 24 '23 at 10:16

Performance difference of function and data recursion in Haskell

2 Answers2