Introduction to Recursion

This article began as a series of posts on recursion that I had on my old blog. I have combined the series into a single article and posted it here. The article is still dated 12 April 2008, which is when I originally posted it. I have taken this opportunity to fix some typos, update a few sentences and rewrite the code snippets from C -- which I used at that time -- into Python, which is my preferred teaching language nowadays.

Recursion Part 1: Introduction to Recursion

In this, the first part, we cover some of the theory behind recursion. This theory will form the foundation for understanding the remaining parts of the series. Most of it will already be familiar; treat it as a quick refresher.

Let us start with the most common example of recursion, calculating the factorial of a number. Here is the code:

def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)

How does this work? Say we want to calculate factorial(4). Since 4 is not equal to 0, the expression evaluates to 4 * factorial(3). factorial(3) in turn is expanded to 3 * factorial(2). At each point, the previous state of execution is pushed on to the stack. Continuing this way, the expression finally evaluates to: 4 * 3 * 2 * 1 * factorial(0)

factorial(0) is the base case: it matches the n == 0 condition and returns 1.

At this point, the stack of operations gets popped and the value changes as follows:

4 * 3 * 2 * 1 * 1
4 * 3 * 2 * 1
4 * 3 * 2
4 * 6
24

The final answer is 24, which is indeed the factorial of 4. Let us summarize the state of execution at every step:

factorial(4)
4 * factorial(3)
4 * 3 * factorial(2)
4 * 3 * 2 * factorial(1)
4 * 3 * 2 * 1 * factorial(0)
4 * 3 * 2 * 1 * 1
4 * 3 * 2 * 1
4 * 3 * 2
4 * 6
24

Note an important point: No multiplications are performed until the stack starts to get popped. Thus, the execution of this function consists of a series of "deferred operations" which are stored in the stack state during the push operation, and which are executed after the pop operation.
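
Here is a sketch that makes the deferred operations visible, using print statements to show the pushes and pops (the tracing code and the depth parameter are my additions, not part of the original article):

def factorial_traced(n, depth=0):
    indent = "    " * depth
    print(f"{indent}push: factorial({n})")
    if n == 0:
        print(f"{indent}pop:  factorial(0) returns 1")
        return 1
    # The multiplication below is the deferred operation: it runs only
    # after the recursive call has returned, i.e. during the pop.
    result = n * factorial_traced(n - 1, depth + 1)
    print(f"{indent}pop:  factorial({n}) returns {result}")
    return result

factorial_traced(4)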

Understanding the above point is the key to understanding many concepts in recursion. We will be using it in the other parts of this series.

Recursion Part 2: Tail Recursion and Accumulators

This part introduces an important concept in recursion: tail recursion. We see what tail recursion is and where it is used.

Let us get back to the factorial program:

def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)

As we saw in Part 1, the execution of this program goes like this:

factorial(4)
4 * factorial(3)
4 * 3 * factorial(2)
4 * 3 * 2 * factorial(1)
4 * 3 * 2 * 1 * factorial(0)
4 * 3 * 2 * 1 * 1
4 * 3 * 2 * 1
4 * 3 * 2
4 * 6
24

In this program, the multiplication with 'n' is the deferred operation (see Part 1 of this series for an introduction to deferred operations). Apart from the deferred operation, there is the recursive call to calculate factorial(n - 1).

Since the recursive call is the last step of the function, this type of recursion is called tail recursion. The name derives from the fact that the recursive call occurs at the end of the function, and after the recursive call, there is nothing more to execute in the function.

But even then, the actual multiplication happens only during the stack pop step, i.e. after the inner recursive call is over. So, in order to evaluate factorial(4), it first has to execute factorial(3), and after that returns 6, it calculates 4 * 6 (this multiplication is the deferred operation) and returns 24. This kind of tail recursion is called ordinary tail recursion.

Ordinary tail recursion has a deferred operation and a recursive call. The above implementation of the factorial is ordinary tail recursion because it has a deferred operation.

Pure tail recursion has no deferred operation, it only has a recursive call. Take a look at this implementation of factorial:

def factorial(n):
    return factorial_acc(1, n)

def factorial_acc(acc, n):
    if n == 0:
        return acc
    return factorial_acc(n * acc, n - 1)

Let us trace the execution of this program to calculate factorial(4):

factorial(4)
factorial_acc(1, 4)
factorial_acc(4, 3)
factorial_acc(12, 2)
factorial_acc(24, 1)
factorial_acc(24, 0)
return 24

Notice the difference from the normal factorial implementation? In this version, there are no deferred operations. Instead, the multiplication is performed immediately and passed as the first parameter to the recursive call. This parameter is known as the accumulator parameter (in case you were wondering, that is why it is called acc and the function is called factorial_acc). The role of the accumulator is to accumulate the result of the operations at each step. Note that this version performs the multiplication during the stack push rather than the stack pop.

Nothing is done during stack pop. When the inner function completes, the result is directly returned by the outer function. This version is pure tail recursion.

Recursion Part 3: Tail Call Optimisation

In the previous part, we implemented a factorial program using pure tail recursion and accumulators. This was the code:

def factorial(n):
    return factorial_acc(1, n)

def factorial_acc(acc, n):
    if n == 0:
        return acc
    return factorial_acc(n * acc, n - 1)

When we run this code to calculate factorial(4), the stack trace looks like this:

factorial(4)
factorial_acc(1, 4)
factorial_acc(4, 3)
factorial_acc(12, 2)
factorial_acc(24, 1)
factorial_acc(24, 0)
return 24

The main difference in pure tail recursion is that there is no deferred operation: when the inner function returns a value, the outer function directly returns that same value without any further computation. Therefore, once we go into the inner function, we no longer require the stack frame of the outer function, as it will never be used again in the computation.

If we were to run the above code in Python with a large value of n, we would eventually get a RecursionError (Python's equivalent of a stack overflow). This is because, as we go deeper and deeper into the recursion, the language runtime keeps all the stack frames alive until it finally hits the limit.
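
For example, on CPython, where the default recursion limit is 1000 frames, something like this happens (a quick sketch):

import sys

print(sys.getrecursionlimit())  # 1000 by default on CPython

try:
    factorial(10 ** 5)  # needs far more stack frames than the limit allows
except RecursionError as e:
    print(e)  # maximum recursion depth exceeded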

However, in many functional programming languages this does not happen. The runtime detects a pure tail call, and on entering the inner function it discards the stack frame of the outer function. So at any point during the recursion, only the current stack frame is maintained; all the previous ones are gone. In these languages you can recurse with pure tail calls forever without hitting a stack overflow.
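
Python's runtime will not do this for us, but we can imitate the idea by hand with a small "trampoline". This is only a sketch to illustrate the mechanism (the names are mine); it is not how functional language runtimes actually implement it:

def factorial_step(acc, n):
    if n == 0:
        return acc
    # Instead of recursing, return a thunk describing the next call.
    return lambda: factorial_step(n * acc, n - 1)

def trampoline(result):
    # Keep invoking thunks in a loop; the Python call stack never grows.
    while callable(result):
        result = result()
    return result

print(trampoline(factorial_step(1, 5)))  # 120
trampoline(factorial_step(1, 10 ** 5))   # completes without a RecursionError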

This technique is called tail call optimisation. If the language you are working with supports tail call optimisation, you can happily use these recursive versions. What if the language you are using does not support it? Well, we shall see in the next part.

Recursion Part 4: Accumulators and Iteration

In the previous part we were looking at tail call optimisation of this factorial implementation:

def factorial(n):
    return factorial_acc(1, n)

def factorial_acc(acc, n):
    if n == 0:
        return acc
    return factorial_acc(n * acc, n - 1)

and we saw that the call stack was like this:

factorial(4)
factorial_acc(1, 4)
factorial_acc(4, 3)
factorial_acc(12, 2)
factorial_acc(24, 1)
factorial_acc(24, 0)
return 24

When there is tail call optimisation, this whole call stack collapses into a single stack frame, with each new call replacing the previous one.

Now let us look at an iterative version of the factorial:

def factorial_iter(n):
    fact = 1
    i = n

    while i > 0:
        fact = fact * i
        i = i - 1
    return fact

Notice any similarities with the accumulator version? Look at how the variables change with each iteration while calculating factorial(4):

factorial_iter(4)
fact = 1, i = 4
fact = 4, i = 3
fact = 12, i = 2
fact = 24, i = 1
fact = 24, i = 0
return 24

Compare the above to the call stack of the accumulator version and you will find that they are identical. Here the fact variable plays the role of the accumulator.

So here is the important result: Any ordinary tail recursive program can be converted to a pure tail recursive program with accumulator, which in turn can be converted to a while loop. Thus, ordinary tail recursion and iteration are equivalent, and the above steps give us a way to convert between the two forms!

In other words, iteration is just a special case of recursion. Some languages like Scheme do not have any looping statements in the core language at all. No for loop, no while loop. They do have tail call optimisation, so you are expected to do all the looping with tail recursion.

In other languages like C, Java and Python, tail call optimisation is not guaranteed. Instead these languages provide iterative looping constructs, which are preferred over tail recursion.

Certain algorithms are easier to express in a tail recursive form. Since tail recursion is equivalent to iteration, you can always convert the code back and forth between the two forms.
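
As a quick illustration (my example, not from the original article), Euclid's greatest common divisor algorithm is naturally pure tail recursive, and the steps above convert it mechanically into a loop:

def gcd(a, b):
    # Pure tail recursion: no deferred operation, all state is in the arguments.
    if b == 0:
        return a
    return gcd(b, a % b)

def gcd_iter(a, b):
    # The same computation as a loop: each iteration replaces (a, b),
    # just as each tail call replaced the previous stack frame.
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(252, 105), gcd_iter(252, 105))  # 21 21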

Recursion Part 5: Exercises in Tail Recursion

Try out the following problems:

Is this tail recursion?

def fibonacci(n):
    if n <= 2:
        return 1
    return fibonacci(n - 2) + fibonacci(n - 1)

What about this?

def find_root(node):
    if node.parent is None:
        return node
    return find_root(node.parent)

And this?

def find_node(node, data_to_find):
    if node is None:
        return None
    if node.data == data_to_find:
        return node
    return find_node(node.next, data_to_find)

Try converting the above three programs to iterative versions. Do you find some harder than the others?

Now let's do the reverse: convert this iterative program into a tail recursive version. First convert it to an accumulator-based function, and then convert that to an ordinary tail recursive version. Hint: first convert the for loop into a while loop.

def sum_of_first_n_numbers(n):
    total = 0
    for i in range(n + 1):
        total = total + i
    return total

How about this one? Assume all numbers are positive (> 0). Hint: The accumulator does not always have to perform some mathematical operation. It can also be used to keep track of the state so far.

def find_max(number_list):
    maximum = 0
    for num in number_list:
        if num > maximum:
            maximum = num
    return maximum

Recursion Part 6: Tree Recursion and Dynamic Programming

This part introduces tree recursion, a form of recursion that occurs when a function calls itself more than once in a path. We then introduce dynamic programming, a class of algorithms to tackle certain types of tree recursive problems.

Some recursive functions call themselves more than once. Consider the function to calculate the nth term of the Fibonacci series (also known as the Virahanka series):

def fibonacci(n):
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fibonacci(n - 2) + fibonacci(n - 1)

In order to calculate fibonacci(4), we need to calculate fibonacci(2) and fibonacci(3). For each call, there are two more recursive calls. This proceeds until n becomes 0 or 1, which are the terminating points of the recursion. We see that execution follows a tree form like this:

fibonacci(4)
   +------------------------------
   |                             |
fibonacci(2)                 fibonacci(3)
   +-----------------            +---------------
   |                |            |              |
fibonacci(0)   fibonacci(1)  fibonacci(1)  fibonacci(2)
                                                +-----------
                                                |          |
                                           fibonacci(0) fibonacci(1)

Each node has two children, one for each recursive call. In general, a function with 'n' recursive calls will have 'n' children at each node. Since program execution follows a tree-like structure, we call this form of recursion tree recursion.

Is there any way to convert tree recursion to iteration?

The short answer is no: there is no general algorithm to convert tree recursion to iteration. However, specialised algorithms do exist for certain problems, and these are usually specific to each problem.

And it just so happens that our fibonacci algorithm falls into a class of tree recursive problems that can be converted to iteration using a technique called dynamic programming!

Look at the execution pattern for fibonacci(4) again. Notice how fibonacci(2) is calculated twice: Once in the computation of fibonacci(4) and once again in the computation of fibonacci(3). fibonacci(1) is computed thrice!
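
We can check this by counting the calls (a quick sketch of mine: the same fibonacci function with a counter bolted on):

from collections import Counter

calls = Counter()

def fibonacci_counted(n):
    calls[n] += 1
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fibonacci_counted(n - 2) + fibonacci_counted(n - 1)

fibonacci_counted(4)
print(calls)  # fibonacci(1) called 3 times; fibonacci(0) and fibonacci(2) twice each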

What if we could save the values of previous fibonacci terms when they are first computed and reuse the values later on without having to compute them again? This optimisation is the cornerstone of dynamic programming.
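
One direct way to do this is to cache each term in a dictionary the first time it is computed, a technique usually called memoisation (this top-down sketch is mine; the article continues with the bottom-up approach below):

def fibonacci_memo(n, cache=None):
    if cache is None:
        cache = {0: 0, 1: 1}  # the base cases seed the cache
    if n not in cache:
        # Each term is computed once; later requests hit the cache.
        cache[n] = fibonacci_memo(n - 2, cache) + fibonacci_memo(n - 1, cache)
    return cache[n]

print(fibonacci_memo(40))  # 102334155, without the exponential blowup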

Here is how we can proceed:

  1. Store the initial values of fibonacci(0) and fibonacci(1).
  2. Compute fibonacci(2) from fibonacci(0) and fibonacci(1). The saved value of fibonacci(0) is now no longer required, so we can discard it.
  3. In general, compute fibonacci(n + 1) from the saved values of fibonacci(n - 1) and fibonacci(n). Discard the value of fibonacci(n - 1).
  4. Repeat step 3 until we have computed the desired term.

Look at the execution tree again. Notice how we start the computation from the leaf nodes of the tree and work our way up the tree computing each node, until we reach the root, which is the value we desire.

Here is the implementation using iteration:

def fibonacci(n):
    out = 0  # fibonacci(0)
    nxt = 1  # fibonacci(1)
    for _ in range(1, n + 1):
        out, nxt = nxt, out + nxt  # slide the window up by one term
    return out

We have just used dynamic programming to convert a tree recursive algorithm into an iterative algorithm. The basic idea of dynamic programming is to look at the execution tree, identify repeated computations and perform the calculations 'bottom-up' from leaf to root. At each step, the computed values of the subproblems are stored to be reused later.

Dynamic programming cannot be used on every problem. Some problems can be converted to iteration using other methods, and some problems cannot be converted to iteration at all. Nevertheless, dynamic programming works on a large class of problems, and it is a useful tool to have in the toolbox.

Recursion Part 7: Structural and Generative Recursion

This part deals with two areas in which recursion is commonly applied. The first, called structural recursion, is used to traverse the different parts of a data structure, processing each part in some way. The second, called generative recursion, is used when we want to divide a problem into smaller subproblems, which are then solved.

Let us start with structural recursion. Here is an example, an inorder traversal of a binary tree:

def inorder(element):
    if element is None:
        return
    inorder(element.left)
    process(element.data)  # process() stands for whatever we want to do with each value
    inorder(element.right)

This is a classic case of structural recursion. Each recursive call processes a part of the binary tree. Put together, we process all the elements in the tree. Structural recursion is often used when dealing with self-referential data structures like linked lists, trees and graphs. Functions to deal with these structures are hard to implement with iteration (except for linked lists), and even if we do manage to implement them with iteration, the resultant code is often very difficult to understand. Such code is best left as recursive.
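
To see why, here is what an inorder traversal looks like when we manage the stack ourselves (a sketch of mine, using the same node structure and process function as above):

def inorder_iter(root):
    stack = []
    node = root
    while stack or node is not None:
        # Walk as far left as possible, remembering the path on an explicit stack.
        while node is not None:
            stack.append(node)
            node = node.left
        node = stack.pop()
        process(node.data)  # visit the node between its two subtrees
        node = node.right

It works, but the explicit stack and the nested loops obscure the simple left-process-right structure that the recursive version states directly.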

In some cases, it is possible to modify the data structure itself so that common operations can be implemented iteratively. An example of this is the threaded tree data structure that modifies the classic binary tree to allow for iterative traversal.

In any case, barring such exceptions, it is best to use recursive algorithms to implement structural recursion.

Generative recursion is often used when we want to break up a problem into similar subproblems. The subproblems are solved and the results combined to get the final solution. Solving the subproblems in turn requires recursion to break them into still smaller subproblems, and so on until we reach a trivial case. The recursive factorial and fibonacci calculations and the well-known quicksort are all examples of generative recursion.
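
For instance, here is a compact (not in-place) quicksort sketch of mine. Note how the recursion is driven by the subproblems we generate -- the two partitions -- rather than by the structure of the input:

def quicksort(items):
    if len(items) <= 1:  # trivial case: nothing left to divide
        return items
    pivot = items[0]
    smaller = [x for x in items[1:] if x < pivot]
    larger = [x for x in items[1:] if x >= pivot]
    # Solve the two subproblems, then combine their results.
    return quicksort(smaller) + [pivot] + quicksort(larger)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]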

Generative recursion that operates on a known finite set (e.g. fibonacci(n) operates on the integers between 0 and n) is a good candidate for a dynamic programming approach. Generative recursion that operates on large or unknown sets (functions that work with real numbers often fall into this category) is usually not conducive to dynamic programming. Some cases of generative recursion may have good iterative solutions, but this has to be considered on a case by case basis. In most cases, it is best to just leave them recursive.

Of course, all of the above applies only to tree recursion. Tail recursion, whether structural or generative, can always be converted into iteration.

Recursion Part 8: References and Further Information

The best way to learn about recursion is to try a language that has no iterative looping constructs. Scheme is a good language in this context: it is simple and easy to understand. To experiment with some programs in Scheme, download DrRacket, which can be configured to run Scheme.

Two recommended books, both of which are available online:

  1. Structure and Interpretation of Computer Programs (also called The Wizard Book). There is an Indian edition, but it is not very easy to find.
  2. How to Design Programs. This book is also available in an Indian edition, but there is often not much stock. The new second edition teaches the basics of Scheme using DrRacket.

Both books use the Scheme language (2022 update: SICP also comes in a JavaScript variant). Both are available for free on the Internet. If you prefer, you can follow along with the books but write the code in Python or JavaScript; both languages work well with the contents of both books.

For more on dynamic programming, see Chapter 15 of the book Introduction to Algorithms by Cormen, Leiserson and Rivest. This is a college textbook, available on Amazon as well as in many bookstores. Most other books on algorithms also include a chapter on dynamic programming. Otherwise, a Google search for "dynamic programming" will turn up plenty of articles on the topic.