Folds and Infinite Lists

Folds and Infinite Lists

You learned in this chapter that you can only use foldr on infinite lists, but not foldl. Try to work manually through calling foldl on an infinite list What happens? What does this tell you about why you can’t use foldl on an infinite list? Are there any other benefits to foldr when dealing with large but finite lists?

Hints

Click to reveal

Try stepping through this function manually:

findFirstEvenFoldl :: [Int]
findFirstEvenFoldl =
  foldl firstEven [] [1..]
  where
    firstEven result x =
      if even x
      then [x]
      else result
Click to reveal

We can use foldr with inifite lists because we are able to stop processing without trying to consume the entire list. Is this applicable with large but finite lists?

Solution

Click to reveal

Let’s imagine that we want to write a function that finds the first even element in a possibly infinite list using foldl. We’ve seen solutions to similar problems using foldr before, but let’s review before we move onto foldl. We’ll start by creating our own version of foldr for reference, and then writing findFirstEvenFoldr to look for the first even number in an infinite list:

foldr f acc [] = acc
foldr f acc (x:xs) = f x $ foldr f acc xs

findFirstEvenFoldr =
  foldr firstEven [] [1..]
  where
    firstEven x result
      | even x = [x]
      | otherwise = result

As expected, if we run this in ghci we can see that it gives us exactly what we’d expect:

λ findFirstEvenFoldr
[2]

First, let’s inline foldr and make the arguments to the function explicit so that we can step through it more easily. We’ll drop the part of the code that checks for an empty list to keep things more readable. Since we’re dealing with an infinite list, we don’t need to worry about running out of elements.

findFirstEvenFoldr acc (x:xs) =
  firstEven x $ findFirstEvenFoldr acc xs
  where
    firstEven x result
      | even x = [x]
      | otherwise = result

Next, let’s apply [1..] to findFirstEvenFoldr and step through it once:

findFirstEvenFoldr [] [1..] =
firstEven x $ findFirstEvenFoldr acc xs
  where
    firstEven x result
      | even x = [x]
      | otherwise = result

-- pattern match on [1..]
findFirstEvenFoldr [] (1 : [2..]) =
  firstEven x $ findFirstEvenFoldr acc xs
  where
    firstEven x result
      | even x = [x]
      | otherwise = result

-- replace x with 1 and xs with [2..]
findFirstEvenFoldr [] (1 : [2..]) =
  firstEven 1 $ findFirstEvenFoldr acc [2..]
  where
    firstEven 1 result
      | even 1 = [1]
      | otherwise = result

-- expand result
findFirstEvenFoldr [] (1 : [2..]) =
  firstEven 1 $ findFirstEvenFoldr acc [2..]
  where
    firstEven 1 result
      | even 1 = [1]
      | otherwise = findFirstEvenFoldr acc [2..]

-- pattern match [2..] and expand the recursive call
findFirstEvenFoldr [] (1 : [2..]) =
  firstEven 1 $ findFirstEvenFoldr acc [2..]
  where
    firstEven 1 result
      | even 1 = [1]
      | otherwise =
          findFirstEvenFoldr acc (2 : [3..]) =
            firstEven 2 $ findFirstEvenFoldr acc [3..]
            where
              firstEven x result
                | even x = [x]
                | otherwise = result

-- Replace x with 2 and xs with [3..]
findFirstEvenFoldr [] (1 : [2..]) =
  firstEven 1 $ findFirstEvenFoldr acc [2..]
  where
    firstEven 1 result
      | even 1 = [1]
      | otherwise =
          findFirstEvenFoldr acc (2 : [3..]) =
            firstEven 2 $ findFirstEvenFoldr acc [3..]
            where
              firstEven 2 result
                | even 2 = [2]
                | otherwise = findFirstEvenFoldr acc [3..]

-- Simplify by removing guards that are false
findFirstEvenFoldr [] (1 : [2..]) =
  firstEven 1 $ findFirstEvenFoldr acc [2..]
  where
    firstEven 1 result =
      findFirstEvenFoldr acc (2 : [3..]) =
        firstEven 2 $ findFirstEvenFoldr acc [3..]
        where
          firstEven 2 result = [2]

As you can see when working through this example, the value of our accumulator comes from the recursive call we make to foldr. Since we’re only looking at the accumulator if the current element fails to match our predicate, we naturally terminate the recursion as soon as we find a value.

Let’s see what happens now if we try this with foldl:

foldl f acc [] = acc
foldl f acc (x:xs) = foldl f (f acc x) xs

findFirstEvenFoldl =
  foldl firstEven [] [1..]
  where
    firstEven result x
      | even x = [x]
      | otherwise = result

Just like before, let’s drop the empty list case for foldl, and inline foldl so that we have a single function we can step through:

findFirstEvenFoldl [] (1 : [2..]) =
  findFirstEvenFoldl (firstEven [] 1) [2..]
  where
    firstEven [] 1
      | even 1 = [1]
      | otherwise = []

You’ll notice once of the key differences between left and right folds in this first example. When we were using foldr, we called firstEven and made a recursive call if (and only if) it needed to look at the accumulator. In this foldl example we’re directly returning the result of our recursive call. Let’s step through a couple more times:

findFirstEvenFoldl [] (1 : [2..]) =
  findFirstEvenFoldl (firstEven [] 1) [2..]
  where
    firstEven [] 1
      | even 1 = [1]
      | otherwise = []

findFirstEven [] (2 : [3..]) =
  findFirstEvenFoldl (firstEven [] 2) [3..]
  where
    firstEven [] 2
      | even 2 = [2]
      | otherwise = []

findFirstEven [2] (3 : [4..]) =
  findFirstEvenFoldl (firstEven [] 3) [4..]
  where
    firstEven [2] 3
      | even 3 = [3]
      | otherwise = [2]

In the last two steps you can see the problem. Even though firstEven has found an even number and returned it, findFirstEvenFoldl continues to make recursive calls.

What about large, but finite, lists?

When dealing with large but finite lists, the choice between foldr and foldl can be a bit more nuanced. The benefits to foldlr still hold: when folding over a large list using an operation that can short-circuit, foldr will tend to be a better choice because it allows you to stop processing the list as soon as you have an answer.

If you need to consume the entire list though, foldl may be the better choice. Why? Look back to the examples from the previous solution. Notice how the examples where we unrolled foldr got more and more nested over time. That’s because each time we use the accumulator value (result in the examples above) in foldr, we’re stuck waiting until we’ve hit the end of the list, then working “backwards” applying functions to get the value we need. This means we’re creating a very deep callstack. Haskell is built for functional programming, and it does better than most languages at dealing with this kind of recursive call, but it can still end up being expensive in the worst case scenario.

Our call to foldl on the other hand is tail recursive. The return value of foldl is either the accumulator value, a direct recursive call. That means we don’t need to increase the depth of our stack. Instead, we can update the accumulator “as we go” resulting is less memory usage and more effecient code when we need to use every element in the list.

Unless you have a specific reason to believe otherwise, it’s generally better to use foldr because you can benefit from laziness and the compiler can generally optimize away the performance penalties. If you do know that your code will benefit from a left fold, be sure to use foldl' instead of plain foldl. These functions work identically, but foldl' is optimized and should generally result in using less memory.