Working with Lists

Working with Lists

In this chapter you’ll learn about how to work with Lists. Along the way, you’ll improve you intution for how to write recursive functions, and get hands on experience with one of Haskell’s most powerful features: pattern matching.

Reversing a List with Folds

It’s possible to easily implement a reverse function using folds. Try to implement a function that will reverse a list using both foldl and foldr. Which one is simpler? why? Might one be more efficient than the other?

Hints

Click to reveal

Reversing a list is really just creating a new list one element at a time so that by the time you’re done adding elements, the final list is in the reverse order. For both of your folds, the initial accumulator value should be an empty list.

Click to reveal

Regardless of the fold you are using, the general shape of the solution will look something like this:

reverseFold = fold insertElem []

The definition of insertElem will different depending on the fold you are using.

Remember: For a left fold, you’ll add new elements to the list from first to last. For a right fold, you’ll be adding elements from last to first.

Click to reveal

Imagine that you want to reverse the list [1,2,3]. You could rewrite thestarting list more explicitly as:

1 : 2 : 3 : []

The reversed version of that list is [3,2,1] or, more explicitly:

3 : 2 : 1 : []

For a left fold the first call to insertElem will pass in the starting accumulator and the first value of the list:

insertElem 1 []

The second call will pass in the next value, and the previous accumulator, and so on. For our list [1,2,3] the calls will end up looking like this:

insertElem 3 $
  insertElem 2 $
    insertElem 1 []

Try to compare that to the shape of our reversed list and see if you can spot a function you already know that would do that. You might need to consider reversing the order of arguments of that function.

Solution

Click to reveal

Reversing a list using a foldl can be done by prepending each new element to the front of the new list. Since foldl is left-associative, we’ll start with the first element of our old list. Adding each new element to the beginning of the reversed list means we’ll finish by adding the final element of the original list to the beginning of the new list. Following the pattern from the earlier hint, we could a solution like this:

reverseLeft = foldl insertElem []
  where insertElem reversed a = a : reversed

Alternatively, we can use the flip function to make this a bit more compact. Remember that flip just flips the arguments of a function:

flip f b a = f a b

Our foldl version of insertElem is just a flipped version of (:), so we can rewrite our reverse function as:

reverseLeft = foldl (flip (:)) []

We can also reverse a list using foldr but this will be less efficient. Since foldr is right associative, we start adding elements from the end of the input list. That means each new element we process needs to be added to the end of our accumulated list:

reverseRight = foldr insertElem []
  where
    insertElem a reversed = reversed <> [a]

This is less efficient because we have to walk through the entire reversed list for every item we add, so that we can insert new items at the end.

Zipping Lists

The zip function is a special case of a more general function available in Prelude called zipWith. The zipWith function combines two lists according to a function. Consider this implementation of zip in terms of zipWith:

λ let zip' = zipWith (,)
λ zip' [1..5] [5,4..1]
[(1,5),(2,4),(3,3),(4,2),(5,1)]

Implement the zipWith function with and without using list comprehensions. Can you implement zipWith using foldl?

Hints

Click to reveal

You can use pattern matching to easily figure out if either of the lists that you are zipping is empty.

Click to reveal

The zip and zipWith functions in Prelude always return a list as long as the shortest input list. If either list is empty, they return an empty list. Let’s look at a couple of examples:

λ zip [] [1..100]
[]

λ zip [1..100] []
[]

λ zip [1] [2..100]
[(1,2)]
Click to reveal

Implementing zipWith using list comprehensions will be tricky. Remember that by default a list comprehension will generate every combination of elements:

λ [(a,b) | a <- [1,2,3], b <- [1,2,3]]
[(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)]

Can you think of any ways to work around this?

Solution

Click to reveal

As you might expect, it’s possible to implement zipWith with manual recursion, foldl, or using a list comprehension. In fact, there are several different options, all with their own tradeoffs.

Let’s start by looking at the manual recursive definition. We’ll use pattern matching to decide whether we have enough data to create a new value. Naively, we could check if either list is empty, and otherwise return a value:

exampleZipWith f [] bs = []
exampleZipWith f as [] = []
exampleZipWith f (a:as) (b:bs) = f a b : exampleZipWith f as bs

Although it works, this approach is a bit more verbose than necessary. If either list is empty, or if both of them are, then we’re returning an empty list. We’re only applying f when both lists are non-empty. If we put that pattern first, then we can use a wildcard pattern for all other cases:

exampleZipWith f (a:as) (b:bs) = f a b : exampleZipWith f as bs
exampleZipWith _f _as _bs = []

Alternatively, you could use a case expression to implement this function. The logic is the same, but we’ll use a single implementation of our function:

exampleZipWithCase f a b =
  case (a,b) of
    (a':as, b':bs) -> f a' b' : exampleZipWith f as bs
    _ -> []

Another option for implementing our function with manual recursion would be to use a guard. Naively you might want to use null and head to implement this function using a guard:

exampleGuard f as bs
  | null as || null bs = []
  | otherwise = f (head as) (head bs) : exampleGuard f (tail as) (tail bs)

Although technically safe and correct, since we’re testing for empty lists before using the partial head function, it’s common practice to avoid partial functions like head in general, even when we know them to be safe. In that case, we can use pattern guards to pattern match inside of a guard clause:

examplePatternGuard f as bs
  | (a:as') <- as, (b:bs') <- bs = f a b : examplePatternGuard f as' bs'
  | otherwise = []

You’ll notice that the syntax here is the same as the syntax when working with a list comprehension. We use the left arrow (<-) to pattern match on a value. If any of the patterns fail , then the guard clause fails and we move onto the next one.

Now that you’ve seen how to implement zipWith using manual recursion, can you do it using foldl or a list comprehension? Try it yourself, or click below to see the next part of the solution.

Click to reveal

Now that you’ve implemented a manually recursive version of zipWith, let’s move our attention to a version that uses foldl. If we’re willing to cheat a little bit, our implementation is pretty straightforward:

zipWithFoldl f as bs = reverse $ foldl applyFunction [] zipped
  where
    zipped = zip as bs
    applyFunction accumulator (a,b) = f a b : accumulator

In this solution we’re using zip to handle combining each pair of elements in the two lists. Afterwards, we take one trip through the list with foldl and apply f to each pair of arguments. You’ll notice that we’re prepending each new value to the beginning of our accumulator, and then reversing the whole list at the end. Doing a single call to reverse at the end of our fold lets us avoid having to update the entire list every time we add a new element. Alternatively, we could use foldr and save ourselves the call to reverse:

zipWithFoldr f as bs = foldr applyFunction [] zipped
  where
    zipped = zip as bs
    applyFunction (a,b) accumulator = f a b : accumulator

Since the exercise asked us to solve this with foldl let’s stick with that. If we don’t want to cheat by using zip, we can still solve the problem with foldl, but we need to do a bit more work to keep track of our two lists.

Instead of zipping both lists together, then applying our function, we can do both in a single step:

zipWithFoldl' f as bs =
  reverse $ fst results
  where
    results = foldl applyFunction ([], as) bs
    applyFunction (zipped, []) _ = (zipped, [])
    applyFunction (zipped, x:xs) val = (f x val : zipped, xs)

This function isn’t too different from our original version. We’re still starting with an empty list in our accumulator, and we’re still calling f for each item in our fold. What’s different is that our accumulator value is now keeping track of both the new list that we’re building up, and also the second list that we’re slowly breaking down. If as is shorter than bs we’ll start ignoring any new values in the fold and return the list that we were able to build up as long as we had values in each list.

Once we’re finished with the fold, we’re left with a tuple that contains both the new list, as well as any remainder of as that we weren’t able to process. We discard the leftover as and return the reversed list just like we did with our earlier foldl implementation.

Now that you’ve seen how to implement zipWith using both manual recursion and foldl, you can try to implement it with a list comprehension yourself, or expand the solution below.

Click to reveal

Now that you’ve written zipWith using both foldl and with manual recursion, the last task in this exercise is to build a version that uses list comprehensions. This is the most challenging of the three parts of this problem, because we’re working against the language. This part of the example shows that just because you can use a feature to do something doesn’t mean it’s the best way to do it.

The problem with using a list comprehension to implement zipWith is, as you may recall from the chapter, a list comprehension returns a value for each pairing of our two lists. That means the naive approach won’t work. Let’s try it and see why:

exampleZipWithComprehensionBad f as bs = [f a b | a <- as, b <- bs]

Let’s load this up in ghci and compare the behavior of this function with the real zipWith:

λ zipWith (,) [1,2] [3,4]
[(1,3),(2,4)]

λ exampleZipWithComprehensionBad (,) [1,2] [3,4]
[(1,3),(1,4),(2,3),(2,4)]

As you can see, the real definition of zipWith pairs the first element of the first list with the first element of the second list, and so on, until it reaches the end of one of the lists. Our list comprehension version pairs the first element of the first list with each element of the second list, and so on, until it’s gone through every pairing. That’s a significantly different behavior.

So, how can work work around this? Just like with our earlier foldl example, the easiest option is to cheat by using zip:

exampleZipWithComprehension f as bs = [f a b | (a,b) <- zip as bs]

Not only does using zip mean that we don’t need to worry about one list being longer than the other, it also combines our two lists so that we don’t need to worry about the fact that list comprehensions don’t combine elements the way we want for zipWith.

If we don’t want to cheat by using zip, then we need to be a bit creative in how we approach the problem. Using two lists won’t work, but how can we get a single list out of our two lists if we’re not combining them with zip? Let’s think again about the nature of our problem: We want to combine the first element of as with the first element of bs, then we want to combine the second element of as with the second element of bs and so on until we reach the end of one of our two lists. Although we have two lists, at each step we’re combining the values at the same index. All we need to do is to step through the list of indexes.

exampleZipWithComprehensionIndex f as bs =
  [f (as !! idx) (bs !! idx) | idx <- [0 .. len - 1]]
  where
    len = min (length as) (length bs)

As you can see, moving to an index based approach to using a list comprehension lets get an implementation that’s fairly easy to read, and doesn’t require that we use zip.

Thinking about how to implement something like zipWith using list comprehensions is a great way to stretch your mind and think about the different ways you can apply the features of Haskell creatively, but in practice this isn’t the way we’d normally implement something like this. Although the (!!) operator should be safe in this example since we’re checking the length of our inputs, it’s still an unsafe operation. Indexing into the lists repeatedly is also much less efficient than even using a foldl and reversing the output. Indexing into a list requires that we traverse the whole list up to the element we want, and so repeated indexing ends up being more work than walking through the list twice (once for the foldl and again for the reverse).

Implementing concatMap

The concat function joins a list of lists into a single list:

concat [[1,2,3],[4,5,6]]
[1,2,3,4,5,6]

Prelude provides a function named concatMap that can be naively implemented by composing concat and map:

concatMap f = concat . map f

Try implementing concatMap using foldl and foldr. Which one is better? Why?

Hints

Click to reveal

Remember that you can concatenate two lists using the (<>) operator:

λ [1,2,3] <> [4,5,6]
[1,2,3,4,5,6]
Click to reveal

You can concatenate two lists by folding over them using the (<>) operator.

λ concatFoldl = foldl (<>) []
λ concatFoldl [[1,2],[3,4],[5,6]]
[1,2,3,4,5,6]
Click to reveal

You can map over a list by applying your function and then adding the result to a new list using (:)

λ mapFoldr f = foldr (\x acc -> f x : acc) []
λ mapFoldr (+1) [1..10]
[2,3,4,5,6,7,8,9,10,11]

Solution

Click to reveal

We can write concatMap easily using either foldl or foldr. Let’s take a look at a foldr based version first:

concatMapFoldr f = foldr (\x acc -> f x <> acc) []

Like the other examples where we’ve used foldr, this version of concatMap is lazy and well-behaved when we need to deal with infinite lists. The most obvious example of this is directly passing an infinite list to concatMapFoldr:

λ take 10 $ concatMapFoldr (\x -> [x,x]) [1..]
[1,1,2,2,3,3,4,4,5,5]

We can also handle cases where the input is a finite list, but our function outputs infinite lists that we we want to concatenate together. You’ll notice in this example we’re using a new function: repeat. This function takes a value and generates a list of that value repeated indefinitely:

-- When we return an infinite list for each finite input element
λ take 10 $ concatMapFoldr (\x -> repeat x) [1,2,3]
[1,1,1,1,1,1,1,1,1,1]

As you might expect, we can also handle both cases at the same time. Even with an infinitely long input, and a function that generates infinitely long lists from each input, we’re still able to get a finite result:

-- When we return an infinite list for each of infinite inputs
λ take 10 $ concatMapFoldr (\x -> repeat x) [1..]
[1,1,1,1,1,1,1,1,1,1]

We can implement a foldl based version of concatMap just as easily as our foldr version, we just need to flip around the order of some of our operations:

concatMapFoldl f = foldl (\acc x -> acc <> f x) []

As you might expect, this version of concatMap doesn’t do well with infinite lists. Whether you pass in an finitely list, generate one with the function you pass in, or both, concatMapFoldl will hang and ever complete.

So, Which One Is Better?

In most cases, concatMapFoldr is the version of this function that we’d want. If we look at the behavior of the concatMap defined for us in Prelude you’ll notice it’s behavior with infinite lists matches our foldr based implementation:

λ take 10 $ concatMap repeat [1..]
[1,1,1,1,1,1,1,1,1,1]

Maps and Folds

Think about the following two lines of code that use map and foldr. When might they do the same thing? When might they differ? How might that change if you used foldl instead of foldr?

λ \f g -> foldr g 0 . map f
λ \f g -> foldr (g . f) 0

Hints

Click to reveal

Try writing out the definitions of foldr and map individually. Then, write out the definition of a function that behaves like foldr (g . f). Are they the same?

Click to reveal

If you’re having trouble with this exercise, consider returning to it after finishing chapters 3 and 4. Using the type system can help you get a better understanding of different functions.

Solution

Click to reveal

It turns out that these two examples should always behave exactly the same. In the first example, we first use map f to transform every element of our list, and then fold over the resulting values. In the second example, each time we’d apply our fold function we first transform the element we’re about to fold, again using f. Since Haskell is pure lazy language, it turns out that these should always return the same value.

To help make things a little bit more clear, let’s walk through a short example. Imagine that we wanted to double a list of numbers, and then add them up.

Let’s start by giving names to our two different possible functions:

composeMapFoldr f g = foldr g 0 . map f
composeFandGFoldr f g = foldr (g . f) 0

If we call them in ghci we can see that with some casual testing they do in fact seem to return the same thing:

λ double a = a * 2
λ add a b = a + b

λ composeMapFoldr double add [1..10]
110

λ composeFandGFoldr double add [1..10]
110

Let’s use a smaller list and step through each of these and see how they work. We’ll start with composeMapFoldr:

composeMapFoldr double add = foldr add 0 . map double

-- If we apply [1,2,3] we'll get
composeMapFoldr double add [1,2,3] = foldr add 0 (map double [1,2,3])

-- Next, let's apply `map double` manually
composeMapFoldr double add [1,2,3] = foldr add 0 [double 1, double 2, double 3]

-- Now we can apply foldr. Refer to the chapter for a definition of foldr
foldr add 0 [double 1, double 2, double 3] =
  if null [double 1, double 2, double 3] -- false
  then 0
  else add (double 1) $ foldr add 0 [double 2, double 3]

-- Expanding another step
foldr add 0 [double 1, double 2, double 3] =
  add (double 1) $
    if null [double 2, double 3] -- false
    then 0
    else add (double 2) $ foldr add 0 [double 3]

-- Expanding one more time
foldr add 0 [double 1, double 2, double 3] =
  add (double 1) $
    add (double 2) $
     add (double 3) $
       if null [] then 0 else ...

-- Simplifying
foldr add 0 [double 1, double 2, double 3] =
  add (double 1) (add (double 2) (add (double 3) 0))

= 12

Next, let’s do the same exercise with composeFandGFoldr:

composeFandGFoldr double add [1,2,3] = foldr (add . double) 0 [1,2,3]

-- No need to apply map first, let's go to the first step of foldr
foldr (add . double) 0 [1,2,3] =
  if null [1,2,3]
  then 0
  else add (double 1) $ foldr (add . double) 0 [2,3]

-- Expanding another step
foldr (add . double) 0 [1,2,3] =
  add (double 1) $
    add (double 2) $ foldr (add . double) 0 [3]

-- Expanding one more time
foldr (add . double) 0 [1,2,3] =
  add (double 1) $
    add (double 2) $
      add (double 3) $
        if null [] then 0 else ...

-- Simplifying
foldr add 0 [double 1, double 2, double 3] =
  add (double 1) (add (double 2) (add (double 3) 0))

= 12

As you can see, these two functions end up behaving exactly the same.

Unfortunately, the story with foldl turns out to be a little bit more complicated. In theory, thanks to laziness and purity, we ought to be able to get the same easy substitution with foldl, but the the implementation details mean we have to work a bit harder. Let’s start by recreating our two original functions, this time using foldl

double a = a * 2
add a b = a + b

composeMapFoldl f g = foldl g 0 . map f
composeFandGFoldl f g = foldl (g . f) 0

Like before, let’s load these up in ghci and see what happens:

λ composeMapFoldl double add [1..10]
110

λ composeFandGFoldl double add [1..10]
2036

That last result is certainly surprising! Let’s try to expand composeFandGFoldl manually and see if we can spot the problem.

composeFandGFoldl double add [1,2,3] = foldl (g . f) 0

-- This time we start by doubling our carry value, then adding
foldl (add . double) 0 [1,2,3] =
  if null [1,2,3]
  then 0
  else foldl (add (double 0) 1) [2,3]

-- Simplifying our recursive call to foldl
foldl (add . double) 0 [1,2,3] =
  if null [1,2,3]
  then 0
  else foldl 1 [2,3]

-- Expanding another step
foldl (add . double) 0 [1,2,3] =
  if null [2,3]
  then 1
  else foldl (add (double 1) 2) [3]

-- Expanding again
foldl (add . double) 0 [1,2,3] =
  if null [3]
  then 4
  else foldl (add (double 3) 4) []

-- Expanding one last time
foldl (add . double) 0 [1,2,3] =
  if null []
  then (add (double 3) 4)
  else ...

-- Simplifying
foldl (add . double) 0 [1,2,3] =
  if null []
  then 11
  else ...

= 11

As you walk through this code, you can see how the behavior of foldl and foldr differs. With foldl we’re starting with our accumulator value, and doubling it each time before we add the next element in the list.

What About Infinite Lists?

Hopefully by now you are convinced that our two foldr examples behave the same way for some finite length list, but what about cases where the list we’re mapping and folding over is infinite? You’ve already seen an example in the chapter of how foldl and foldr behave differently when working with infinite lists thanks to laziness; now let’s see if foldr will behave the same way with and without map when working with infinite lists.

Before we dive into the mechanics, let’s try testing the behavior experimentally. We’ll need to a function that will let us get a result when we’re folding over an infinite list. Luckily, we already defined one earlier in the chapter: findFirst. Let’s review the definition quickly:

findFirst predicate =
  foldr findHelper []
  where
    findHelper listElement maybeFound
      | predicate listElement = [listElement]
      | otherwise = maybeFound

You’ve already seen this work with infinite lists. For example, we can find the first even number in an infinite list:

λ findFirst even [1..]
[2]

Let’s try running a version of this that won’t return at all. First, we’ll make an infinite list of odd numbers. Remember that you can start a range with two numbers to specify the step size:

odds = [1,3..]

Next, let’s try finding the first even number in our list of odd numbers. Obviously this won’t even return a value, so you can press control+c to stop trying once you are satisfied:

λ findFirst even odds
Interrupted.

Next, let’s add two new versions of findFirst that each take another function. findFirstCompose will apply the function using composition, and findFirstMap will use map:

findFirstCompose predicate g =
  foldr (findHelper . g) []
  where
    findHelper listElement maybeFound
      | predicate listElement = [listElement]
      | otherwise = maybeFound

findFirstMap predicate g =
  foldr findHelper [] . map g
  where
    findHelper listElement maybeFound
      | predicate listElement = [listElement]
      | otherwise = maybeFound

Let’s load these up into ghci and test them by once again looking for even numbers- but this time we’ll turn any even number greater than 9 into an odd number:

λ incrementOverNine n = if n > 9 then n + 1 else n
λ odds = [1,3..]
λ> findFirstCompose even incrementOverNine odds
[12]
λ> findFirstMap even incrementOverNine odds
[12]

It appears that, even when working with infinite lists, both the composition and mapping versions of our function remain identical. In a way this might not be surprising- we’ve shown that they should behave the same way when we manually stepped through each version, but it might still be unintuitive that we can apply map to an infinite list to get back a new infinite list, and then use foldr on that infinite list and get back a regular value.

To understand why that works, let’s revisit on of our earlier examples and refine it a bit. Earlier, when we were looking at how we could manually expand \f g -> foldr g 0 . map f you’ll recall that we made a substitution like this:

map double [1,2,3] = [double 1, double 2, double 3]

This kind of substitution is fine as long as we’re working with some small finite list, but once we’re dealing with an infinite list you’ll be wondering: how can we safely replace a call to map with a list where every element has a function applied, when the list itself never ends. The answer is that we can’t, but we don’t have to!

Instead of imagining map as transforming the entire list at once, we can think of it in terms of a transformation to the head and tail of a list. In other words, if we can imagine a list as a head and tail:

list = head list : tail list

Then we can think of map f as applying f to the head and map f to the tail:

map f list = f (head list) : map f (tail list)

More concretely:

map double [1,2,3] = double 1 : map double [2,3]
                   = double 1 : (double 2 : map double [3])
                   = double 1 : (double 2 : (double 3 : map double []))
                   = double 1 : (double 2 : (double 3 : []))
                   = [double 1, double 2, double 3]

The thing that makes this all work is that we never need to evaluate any further than the head of the list. Let’s take a look at how this works with one last step through a problem. This time, we’ll look at using foldr and map over our infinite list of odds.

incrementOverTwo = if n > 2 then n + 1 else n

findFirstMap predicate g =
  foldr findHelper [] . map g
  where
    findHelper listElement maybeFound
      | predicate listElement = [listElement]
      | otherwise = maybeFound

findFirstMap even incrementOverTwo odds =

-- Replace incrementOverTwo and odds with their definitions
findFirstMap even (\n -> if n > 2 then n + 1 else n) [1,3,..] =

-- expand findFirstMap
foldr findHelper [] . map (\n -> if n > 2 then n + 1 else n) [1,3,..] =
  if null (map (\n -> if n > 2 then n + 1 else n) [1,3,..])
  then []
  else findHelper ((\n -> if n > 2 then n + 1 else n) 1) $ foldr findHelper [] $ map (\n -> if n > 2 then n + 1 else n) [3,5..]

-- simplify
foldr findHelper [] . map (\n -> if n > 2 then n + 1 else n) [1,3,..] =
  findHelper (if 1 > 2 then 2 else 1) $ foldr findHelper [] $ map (\n -> if n > 2 then n + 1 else n) [3,5..]

-- simplify
foldr findHelper [] . map (\n -> if n > 2 then n + 1 else n) [1,3,..] =
  findHelper 1 (foldr findHelper [] $ map (\n -> if n > 2 then n + 1 else n) [3,5..])

-- expand findHelper
findHelper 1 (foldr findHelper [] $ map (\n -> if n > 2 then n + 1 else n) [3,5..])
  | even 1 = [1]
  | otherwise = foldr findHelper [] $ map (\n -> if n > 2 then n + 1 else n) [3,5..]

-- 'even 1' is false, so simplify the 'otherwise' branch
foldr findHelper [] $ map (\n -> if n > 2 then n + 1 else n) [3,5..]

-- expand foldr again
if null (map (\n -> if n > 2 then n + 1 else n) [3,5..])
then []
else findHelper ((\n -> if n > 2 then n + 1 else n) 3) $ (map (\n -> if n > 2 then n + 1 else n) [5,7..])

-- simplify
if null (map (\n -> if n > 2 then n + 1 else n) [3,5..])
then []
else findHelper 4 $ map (\n -> if n > 2 then n + 1 else n) [5,7..]

-- expand findHelper again
findHelper 4 $ map (\n -> if n > 2 then n + 1 else n) [5,7..]
  | even 4 = 4
  | otherwise = ... -- we wont' get here

As you can see in this example, each time we step through another call to foldr we first apply the function that we’re mapping, then we continue with the fold. Since the recursive call is passed in as the second argument to findHelper, we stop recursively walking through our list as soon as we find a number that’s even after mapping. Thanks to laziness and the way map and foldr work, we’re able to map and fold over an infinite list and still get out a normal value.

Folds and Infinite Lists

You learned in this chapter that you can only use foldr on infinite lists, but not foldl. Try to work manually through calling foldl on an infinite list What happens? What does this tell you about why you can’t use foldl on an infinite list? Are there any other benefits to foldr when dealing with large but finite lists?

Hints

Click to reveal

Try stepping through this function manually:

findFirstEvenFoldl :: [Int]
findFirstEvenFoldl =
  foldl firstEven [] [1..]
  where
    firstEven result x =
      if even x
      then [x]
      else result
Click to reveal

We can use foldr with inifite lists because we are able to stop processing without trying to consume the entire list. Is this applicable with large but finite lists?

Solution

Click to reveal

Let’s imagine that we want to write a function that finds the first even element in a possibly infinite list using foldl. We’ve seen solutions to similar problems using foldr before, but let’s review before we move onto foldl. We’ll start by creating our own version of foldr for reference, and then writing findFirstEvenFoldr to look for the first even number in an infinite list:

foldr f acc [] = acc
foldr f acc (x:xs) = f x $ foldr f acc xs

findFirstEvenFoldr =
  foldr firstEven [] [1..]
  where
    firstEven x result
      | even x = [x]
      | otherwise = result

As expected, if we run this in ghci we can see that it gives us exactly what we’d expect:

λ findFirstEvenFoldr
[2]

First, let’s inline foldr and make the arguments to the function explicit so that we can step through it more easily. We’ll drop the part of the code that checks for an empty list to keep things more readable. Since we’re dealing with an infinite list, we don’t need to worry about running out of elements.

findFirstEvenFoldr acc (x:xs) =
  firstEven x $ findFirstEvenFoldr acc xs
  where
    firstEven x result
      | even x = [x]
      | otherwise = result

Next, let’s apply [1..] to findFirstEvenFoldr and step through it once:

findFirstEvenFoldr [] [1..] =
firstEven x $ findFirstEvenFoldr acc xs
  where
    firstEven x result
      | even x = [x]
      | otherwise = result

-- pattern match on [1..]
findFirstEvenFoldr [] (1 : [2..]) =
  firstEven x $ findFirstEvenFoldr acc xs
  where
    firstEven x result
      | even x = [x]
      | otherwise = result

-- replace x with 1 and xs with [2..]
findFirstEvenFoldr [] (1 : [2..]) =
  firstEven 1 $ findFirstEvenFoldr acc [2..]
  where
    firstEven 1 result
      | even 1 = [1]
      | otherwise = result

-- expand result
findFirstEvenFoldr [] (1 : [2..]) =
  firstEven 1 $ findFirstEvenFoldr acc [2..]
  where
    firstEven 1 result
      | even 1 = [1]
      | otherwise = findFirstEvenFoldr acc [2..]

-- pattern match [2..] and expand the recursive call
findFirstEvenFoldr [] (1 : [2..]) =
  firstEven 1 $ findFirstEvenFoldr acc [2..]
  where
    firstEven 1 result
      | even 1 = [1]
      | otherwise =
          findFirstEvenFoldr acc (2 : [3..]) =
            firstEven 2 $ findFirstEvenFoldr acc [3..]
            where
              firstEven x result
                | even x = [x]
                | otherwise = result

-- Replace x with 2 and xs with [3..]
findFirstEvenFoldr [] (1 : [2..]) =
  firstEven 1 $ findFirstEvenFoldr acc [2..]
  where
    firstEven 1 result
      | even 1 = [1]
      | otherwise =
          findFirstEvenFoldr acc (2 : [3..]) =
            firstEven 2 $ findFirstEvenFoldr acc [3..]
            where
              firstEven 2 result
                | even 2 = [2]
                | otherwise = findFirstEvenFoldr acc [3..]

-- Simplify by removing guards that are false
findFirstEvenFoldr [] (1 : [2..]) =
  firstEven 1 $ findFirstEvenFoldr acc [2..]
  where
    firstEven 1 result =
      findFirstEvenFoldr acc (2 : [3..]) =
        firstEven 2 $ findFirstEvenFoldr acc [3..]
        where
          firstEven 2 result = [2]

As you can see when working through this example, the value of our accumulator comes from the recursive call we make to foldr. Since we’re only looking at the accumulator if the current element fails to match our predicate, we naturally terminate the recursion as soon as we find a value.

Let’s see what happens now if we try this with foldl:

foldl f acc [] = acc
foldl f acc (x:xs) = foldl f (f acc x) xs

findFirstEvenFoldl =
  foldl firstEven [] [1..]
  where
    firstEven result x
      | even x = [x]
      | otherwise = result

Just like before, let’s drop the empty list case for foldl, and inline foldl so that we have a single function we can step through:

findFirstEvenFoldl [] (1 : [2..]) =
  findFirstEvenFoldl (firstEven [] 1) [2..]
  where
    firstEven [] 1
      | even 1 = [1]
      | otherwise = []

You’ll notice once of the key differences between left and right folds in this first example. When we were using foldr, we called firstEven and made a recursive call if (and only if) it needed to look at the accumulator. In this foldl example we’re directly returning the result of our recursive call. Let’s step through a couple more times:

findFirstEvenFoldl [] (1 : [2..]) =
  findFirstEvenFoldl (firstEven [] 1) [2..]
  where
    firstEven [] 1
      | even 1 = [1]
      | otherwise = []

findFirstEven [] (2 : [3..]) =
  findFirstEvenFoldl (firstEven [] 2) [3..]
  where
    firstEven [] 2
      | even 2 = [2]
      | otherwise = []

findFirstEven [2] (3 : [4..]) =
  findFirstEvenFoldl (firstEven [] 3) [4..]
  where
    firstEven [2] 3
      | even 3 = [3]
      | otherwise = [2]

In the last two steps you can see the problem. Even though firstEven has found an even number and returned it, findFirstEvenFoldl continues to make recursive calls.

What about large, but finite, lists?

When dealing with large but finite lists, the choice between foldr and foldl can be a bit more nuanced. The benefits to foldlr still hold: when folding over a large list using an operation that can short-circuit, foldr will tend to be a better choice because it allows you to stop processing the list as soon as you have an answer.

If you need to consume the entire list though, foldl may be the better choice. Why? Look back to the examples from the previous solution. Notice how the examples where we unrolled foldr got more and more nested over time. That’s because each time we use the accumulator value (result in the examples above) in foldr, we’re stuck waiting until we’ve hit the end of the list, then working “backwards” applying functions to get the value we need. This means we’re creating a very deep callstack. Haskell is built for functional programming, and it does better than most languages at dealing with this kind of recursive call, but it can still end up being expensive in the worst case scenario.

Our call to foldl on the other hand is tail recursive. The return value of foldl is either the accumulator value, a direct recursive call. That means we don’t need to increase the depth of our stack. Instead, we can update the accumulator “as we go” resulting is less memory usage and more effecient code when we need to use every element in the list.

Unless you have a specific reason to believe otherwise, it’s generally better to use foldr because you can benefit from laziness and the compiler can generally optimize away the performance penalties. If you do know that your code will benefit from a left fold, be sure to use foldl' instead of plain foldl. These functions work identically, but foldl' is optimized and should generally result in using less memory.