Things every Kotlin Developer should know about Coroutines. Part 4: Exception Handling.

In Part 3, we discussed structured concurrency, which was introduced to the coroutines library primarily as a solution to exception handling and cancellation.

Because of that, these two concepts go hand in hand, and even though we will not focus on cancellation just yet, we will inevitably mention it throughout this article since both of them share the same mechanism.

That mechanism is a Job. It enforces structured concurrency by creating parent-child relationships between coroutines and manages exception propagation and cancellation.

Therefore, before talking about exception handling, we should understand what a Job is and how it works.

Lifecycle of a Job

A Job is an entity with a lifecycle that acts as a handle to a coroutine. Every coroutine's Job goes through several states that culminate in its completion.

The lifecycle of a Job looks like this:

[Figure: Job lifecycle diagram (job-lifecycle-white-01.png)]

A Job has six states which fall under three main categories:

  1. Initial state - New, Active
  2. Transient state - Completing, Cancelling
  3. Final state - Completed, Cancelled

A launched coroutine immediately goes into the Active state, unless an optional parameter start = CoroutineStart.LAZY is passed to the coroutine builder, in which case it starts in the New state.

A Job in the New state can be started and moved to the Active state by calling either job.start() or job.join().

When a coroutine has finished its work, it goes to the Completing state, where it awaits the completion of all its children - which, as you now know, is an integral part of the structured concurrency principle.

If an exception or cancellation happens during either the Active or Completing state, the Job moves to the Cancelling state, where it awaits the cancellation of all its children.

After all the children have finished or are cancelled, the Job moves to either the Completed or Cancelled state.

Note that these states are not publicly exposed. To find out the state of a Job we can use public properties isActive, isCancelled, and isCompleted.
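To make the mapping between states and the public properties more tangible, here is a minimal sketch that walks one lazily started Job and one cancelled Job through their lifecycle. The helper names flags and observeStates are made up for this illustration and are not part of the library.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: renders the three public state flags of a Job.
fun Job.flags(): String =
    "isActive=$isActive, isCancelled=$isCancelled, isCompleted=$isCompleted"

// Records the flags of a lazily started Job and of a cancelled Job.
fun observeStates(): List<String> = runBlocking {
    val states = mutableListOf<String>()

    val job = launch(start = CoroutineStart.LAZY) { delay(50) }
    states.add("New:       ${job.flags()}")
    job.start()
    states.add("Active:    ${job.flags()}")
    job.join()
    states.add("Completed: ${job.flags()}")

    val cancelled = launch { delay(1_000) }
    cancelled.cancelAndJoin()
    states.add("Cancelled: ${cancelled.flags()}")

    states
}

fun main() {
    observeStates().forEach { println(it) }
}

// Output:
// New:       isActive=false, isCancelled=false, isCompleted=false
// Active:    isActive=true, isCancelled=false, isCompleted=false
// Completed: isActive=false, isCancelled=false, isCompleted=true
// Cancelled: isActive=false, isCancelled=true, isCompleted=true
```

Note that the transient Completing and Cancelling states cannot be told apart from the outside with certainty; the flags only give a coarse view of where a Job is.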

Exception propagation

Now that we understand the lifecycle of a Job, let’s talk about exception propagation. It works as follows:

  1. If a child coroutine fails, it propagates the exception to its parent.
  2. The parent cancels itself and consequently cancels all its children. The parent will wait in the Cancelling state until all the children are cancelled.
  3. The parent propagates the exception up to its parent or throws the exception if it is a root coroutine and the exception was not caught.

Both exception propagation and cancellation work the same way. Moreover, both are just cancellation with a cause of type Throwable, where an explicit cancellation by the user throws a CancellationException, which gets special treatment.

The actual implementation of this mechanism is quite complex, but here is an example from the source code for illustration:

/** 
 * ...
 * Returns `true` if exception is handled, `false` otherwise 
 * (then caller is responsible for handling an exception)
 * ...
 */
public open fun childCancelled(cause: Throwable): Boolean {
    if (cause is CancellationException) return true
    return cancelImpl(cause) && handlesException
}

Whenever a child coroutine is cancelled, the parent first checks whether it was cancelled with a CancellationException, in which case it returns true, meaning that the exception is handled and there is nothing more to do. Otherwise, it will process and propagate the exception.
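The special treatment of CancellationException can be observed with a small sketch: cancelling a child explicitly leaves its parent untouched. The function name below is invented for this example.

```kotlin
import kotlinx.coroutines.*

// Returns true if the parent stayed active after one of its children was cancelled.
fun parentSurvivesChildCancellation(): Boolean = runBlocking {
    var parentActiveAfterCancel = false

    val parent = launch {
        val child = launch { delay(Long.MAX_VALUE) }
        delay(50)
        child.cancelAndJoin()              // the child finishes with a CancellationException
        parentActiveAfterCancel = isActive // the parent itself was not cancelled
    }

    parent.join()
    parentActiveAfterCancel && !parent.isCancelled
}

fun main() {
    println("Parent unaffected: ${parentSurvivesChildCancellation()}")
}

// Output:
// Parent unaffected: true
```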

Let’s see how it works in practice:

// All examples in this article assume this import
import kotlinx.coroutines.*

val scope = CoroutineScope(Job() + Dispatchers.Default)

fun main(): Unit = runBlocking {
    // Main Job
    scope.launch {
        // Child 1
        launch {
            while (isActive) { 
                // run
            }
        }.printOnComplete("Child 1 is cancelled!")

        // Child 2
        launch {
            delay(500)
            println("Here goes boom...")
            throw IllegalArgumentException("Boom!")
        }.printOnComplete("Child 2 is cancelled!")
    }.printOnComplete("Main Job has completed!")

    // Random coroutine on the same scope
    scope.launch {
        while (isActive) { 
            // run
        }
    }.printOnComplete("Random coroutine is cancelled!")

    delay(1000)
}

Note: printOnComplete is a custom extension function for brevity.

fun Job.printOnComplete(message: String) {
    invokeOnCompletion {
        println(message)
    }
}

In this example, we launch a new coroutine inside a CoroutineScope. This main coroutine has two children, one of which runs while the coroutine is active, and the second throws an exception after a small delay. There is also a random independent coroutine running in the same scope.

If we run this code, it will print:

Output:
Here goes boom...
Child 1 is cancelled!
Child 2 is cancelled!
Random coroutine is cancelled!
Exception in thread "DefaultDispatcher-worker-2" 
java.lang.IllegalArgumentException: Boom!
    ...
Main Job has completed!

This is what standard exception propagation looks like.

Note: Uncaught exceptions get passed to the thread’s default exception handler. On the JVM the exception is simply logged to the console, as in this case, whereas on Android, for example, it will crash the application.

In this case, the exception in Child 2 was propagated all the way up to the scope, triggering cancellation of the whole coroutine tree.

However, keep in mind that coroutines have to be cancellable for error propagation to work correctly.

If we change the code of one of the children to something like this:

// Child 1
launch {
    // we have changed isActive to true
    while (true) { 
        // run
    }
}.printOnComplete("Child 1 is cancelled!")

And run the code again, we will get:

Here goes boom...
Random coroutine is cancelled!
Child 2 is cancelled!

And that’s it.

The exception will never get thrown by the parent coroutine. As we have discussed before, a parent coroutine always waits for its children to complete. However, because we have introduced an infinite loop inside Child 1, it will never complete, and the parent will keep waiting indefinitely - or, in this case, for as long as main is running.

That said, the problem here is not the infinite while (true) loop itself, but the fact that this child coroutine is not cooperating with cancellation - a topic we will discuss in Part 5 of this series.
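Without getting ahead of Part 5, here is one possible way to make such a loop cooperative: a minimal sketch that calls ensureActive() on every iteration. The function name stopBusyLoop is made up for this example.

```kotlin
import kotlinx.coroutines.*

// Returns true if the busy loop could be cancelled.
fun stopBusyLoop(): Boolean = runBlocking {
    val job = launch(Dispatchers.Default) {
        while (true) {
            // Throws a CancellationException once the Job is no longer active,
            // which lets the coroutine leave the otherwise infinite loop.
            ensureActive()
        }
    }

    delay(100)
    job.cancelAndJoin() // returns only because the loop cooperates
    job.isCancelled
}

fun main() {
    println("Loop stopped: ${stopBusyLoop()}")
}

// Output:
// Loop stopped: true
```

Without the ensureActive() call, cancelAndJoin() would suspend forever, just like the parent in the example above.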

Job vs SupervisorJob

In the example above, the whole scope was cancelled because one of the children threw an exception. In many cases, however, this behavior is undesirable. We would probably want all the other independent coroutines in the same scope to continue running.

This is where a SupervisorJob comes into play. You might have noticed that in the example above we used a Job() inside the scope’s CoroutineContext. A regular Job cancels itself and propagates exceptions all the way to the top level, while a SupervisorJob relies on coroutines to handle their own exceptions and doesn't cancel other coroutines.

From the documentation: A failure or cancellation of a child does not cause the supervisor job to fail and does not affect its other children.

That said, uncaught exceptions will always be thrown regardless of the Job implementation.

Incidentally, the implementation of the SupervisorJob is very simple. It overrides a single function from the regular Job implementation:

private class SupervisorJobImpl(parent: Job?) : JobImpl(parent) {
    override fun childCancelled(cause: Throwable): Boolean = false
}

This is the same function we looked at earlier. In a SupervisorJob it always returns false, meaning that the exception is not handled and coroutines should handle exceptions themselves.

For this reason, we mentioned in Part 2 that in most cases it makes sense to use a SupervisorJob() in a top-level scope. This way it won’t get cancelled as soon as one of its children throws an exception.
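As a quick illustration, here is a sketch of a top-level scope with a SupervisorJob in which one coroutine fails while an independent sibling and the scope itself stay alive. The function name siblingSurvives is hypothetical.

```kotlin
import kotlinx.coroutines.*

// Returns [failing.isCancelled, sibling.isActive, scope.isActive].
fun siblingSurvives(): List<Boolean> = runBlocking {
    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

    val failing = scope.launch {
        delay(50)
        // Still uncaught, so the stack trace is logged by the thread's handler
        throw IllegalArgumentException("Boom!")
    }
    val sibling = scope.launch { delay(Long.MAX_VALUE) }

    failing.join() // join() does not rethrow the failure of the joined Job

    val flags = listOf(failing.isCancelled, sibling.isActive, scope.isActive)
    scope.cancel() // clean up the sibling that is still running
    flags
}

fun main() {
    println(siblingSurvives())
}

// Output (besides the logged stack trace):
// [true, true, true]
```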

The same goes for suspending scope builders - a coroutineScope will go down with its children, while a supervisorScope will not.

Here is an example with a coroutineScope:

fun main() = runBlocking {
    val result = coroutineScope {
        launch {
            delay(100)
            throw IllegalArgumentException("A total fiasco!")
        }

        launch {
            delay(200)
            println("Hi there!")
        }

        "Result!"
    }

    println("Got result: $result")
}

Output:
Exception in thread "main" java.lang.IllegalArgumentException: A total fiasco!
 ...

And the same example with a supervisorScope:

fun main() = runBlocking {
    val result = supervisorScope {
        launch {
            delay(100)
            throw IllegalArgumentException("A total fiasco!")
        }

        launch {
            delay(200)
            println("Hi there!")
        }

        "Result!"
    }

    println("Got result: $result")
}

Output:
Exception in thread "main" java.lang.IllegalArgumentException: A total fiasco!
 ...
Hi there!
Got result: Result!

A special case of the async builder

Different coroutine builders treat exception propagation differently. While launch automatically propagates exceptions when they are thrown, the async coroutine builder is a little bit special in that regard.

When an exception is thrown inside an async builder that is used as a root coroutine, it is not propagated automatically; the builder relies on the user to consume the exception.

val scope = CoroutineScope(Job() + Dispatchers.Default)

fun main(): Unit = runBlocking {
    val deferred = scope.async {
        delay(50)
        throw IllegalStateException("Async Boom!")
    }

    delay(100)

    println("I'm done")
}

Output:
I'm done

The exception in this case will never be thrown. It will only be thrown when .await() is called on the Deferred result:

val scope = CoroutineScope(Job() + Dispatchers.Default)

fun main(): Unit = runBlocking {
    val deferred = scope.async {
        delay(50)
        throw IllegalStateException("Async Boom!")
    }

    delay(100)

    // the exception will be thrown here
    deferred.await()

    println("I'm done")
}

Output:
Exception in thread "main" java.lang.IllegalStateException: Async Boom!

Note: Deferred is just a Job that returns a result.

The async builder will also behave the same way when used inside a supervisorScope, since a supervisorScope will not notify its parent about exceptions and will rely on children to handle them. In other words, coroutines inside a supervisorScope can be treated as root coroutines.

val scope = CoroutineScope(Job() + Dispatchers.Default)

fun main(): Unit = runBlocking {
    scope.launch {
        supervisorScope {
            println("I am the supervisor scope!")

            val deferred = async {
                delay(50)
                throw IllegalArgumentException("Async Boom!")
            }

            println("Supervisor scope done!")
        }
    }

    delay(200)
    println("Main is done!")
}

Output:
I am the supervisor scope!
Supervisor scope done!
Main is done!

And just like in the example with the root async builder, the async exception inside the supervisorScope will be thrown only if .await() is called.

Note: When async is used as a child coroutine or inside a coroutineScope, the exception is propagated to the parent immediately, without .await() being called, and this happens even if the .await() call is wrapped in try-catch.
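Here is a small sketch of that last case: async is a child of a launch, and although the .await() call is wrapped in try-catch, the parent still gets cancelled. The function name asyncChildFailure is made up, and the exact exception type seen in the catch block should be treated as an implementation detail.

```kotlin
import kotlinx.coroutines.*

// Returns true if the parent was cancelled despite the try-catch around await().
fun asyncChildFailure(): Boolean = runBlocking {
    val scope = CoroutineScope(Job() + Dispatchers.Default)

    val parent = scope.launch {
        val deferred = async {
            delay(50)
            throw IllegalArgumentException("Async Boom!")
        }

        try {
            deferred.await()
        } catch (e: Exception) {
            // The catch block runs, but it does not stop the propagation
            println("await() threw ${e::class.simpleName}")
        }

        delay(100) // the parent is already cancelled, so this throws
        println("This line is not reached")
    }

    parent.join()
    parent.isCancelled
}

fun main() {
    println("Parent cancelled: ${asyncChildFailure()}")
}
```

Since the scope uses a regular Job() and has no handler, the exception is also reported as uncaught, the same way as in the launch examples above.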

Handling exceptions

Now that we understand how exception propagation works in the coroutines library, let’s talk about the most important part - handling thrown exceptions.

For that you have a couple of options.

try-catch

The most straightforward way to handle exceptions is with a try-catch block like you would anywhere else in your code. This way the exceptions get handled immediately without triggering exception propagation and cancellation:

val scope = CoroutineScope(Job() + Dispatchers.Default)

fun main(): Unit = runBlocking {
    scope.launch {
        launch {
            try {
                delay(10)
                throw IllegalArgumentException("A complete failure!")
            } catch (e: Exception) {
                println("Child 1 has recovered from: ${e.message}")
            }
        }

        launch {
            delay(50)
            println("Child 2 is OK!")
        }
    }

    delay(100)

    println("Main is done!")
}

Output:
Child 1 has recovered from: A complete failure!
Child 2 is OK!
Main is done!

That said, keep in mind that in the cases described in the previous section, the async builder throws its exception only when .await() is called, so it is the .await() call that should be wrapped in try-catch:

fun main(): Unit = runBlocking {
    supervisorScope {
        val deferred = async {
            delay(50)
            throw IllegalArgumentException("An utter collapse!")
        }

        try {
            deferred.await()
        } catch (e: Exception) {
            println("Supervisor has recovered from: ${e.message}")
        }

        println("Supervisor scope is done!")
    }

    delay(100)
    println("Main is done!")
}

Output:
Supervisor has recovered from: An utter collapse!
Supervisor scope is done!
Main is done!

runCatching

For more idiomatic Kotlin, you can use the runCatching function from Kotlin’s standard library. All it does under the hood is wrap a block of code in try-catch and return a Result<R> wrapper:

public inline fun <T, R> T.runCatching(block: T.() -> R): Result<R> {
    return try {
        Result.success(block())
    } catch (e: Throwable) {
        Result.failure(e)
    }
}

Here is an example:

fun main(): Unit = runBlocking {
    launch {
        delay(10)

        val result = runCatching {
            throw IllegalArgumentException("An absolute disaster!")
        }

        when {
            result.isSuccess -> println("I got: ${result.getOrNull()}")
            result.isFailure -> println("I have recovered from: ${result.exceptionOrNull()?.message}")
        }
    }

    delay(100)

    println("Main is done!")
}

Output:
I have recovered from: An absolute disaster!
Main is done!
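One caveat follows directly from the source above: runCatching catches Throwable, which includes the CancellationException that suspending functions throw when a coroutine is cancelled. If the wrapped block suspends, a possible workaround is a small wrapper that rethrows cancellation. The helper runCatchingCancellable below is hypothetical and is not part of the standard library or kotlinx.coroutines.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: like runCatching, but lets cancellation propagate.
inline fun <R> runCatchingCancellable(block: () -> R): Result<R> =
    try {
        Result.success(block())
    } catch (e: CancellationException) {
        throw e // do not swallow cancellation
    } catch (e: Throwable) {
        Result.failure(e)
    }

// Returns true if the code after the wrapper was skipped on cancellation.
fun cancellationPropagates(): Boolean = runBlocking {
    var reachedAfterCancel = false

    val job = launch {
        runCatchingCancellable { delay(1_000) }
        reachedAfterCancel = true // skipped: the CancellationException was rethrown
    }

    delay(50)
    job.cancelAndJoin()
    !reachedAfterCancel
}

fun main() {
    println("Failure captured: ${runCatchingCancellable { error("Oops") }.isFailure}")
    println("Cancellation propagated: ${cancellationPropagates()}")
}

// Output:
// Failure captured: true
// Cancellation propagated: true
```

With the plain runCatching, the line after the wrapper would still run in a coroutine that has already been cancelled.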

CoroutineExceptionHandler

You might remember that in Part 1 of this series we briefly mentioned the CoroutineExceptionHandler - a context Element that processes uncaught exceptions in coroutines. Now it’s time to discuss it in more detail.

The main difference between handling exceptions with a CoroutineExceptionHandler and a try-catch block is that by the time an exception reaches a CoroutineExceptionHandler, the coroutine has already completed and you can no longer recover from the exception. That is why a CoroutineExceptionHandler should be used as a last resort for exceptions that have not been handled in any other way.

The CoroutineExceptionHandler will only work if it is added to the context of either a CoroutineScope or a root coroutine. Adding it to child coroutines will have no effect, since they will automatically propagate exceptions to their parent. The exception here (no pun intended) is coroutines launched directly inside a supervisorScope, since they are responsible for handling their own exceptions.

Here are some examples:

val handler = CoroutineExceptionHandler { coroutineContext, throwable ->
    println("Handler has caught: ${throwable.message}")
}

val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

fun main(): Unit = runBlocking {
    // we are adding a handler to the
    // context of a root coroutine
    scope.launch(handler) {
        delay(10)
        throw IllegalArgumentException("An awful crash!")
    }

    delay(100)
    println("Main is done!")
}

Output:
Handler has caught: An awful crash!
Main is done!

As an alternative we could have added the handler to our scope directly:

val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default + handler)

In which case there would be no need to add it every time to the context of a root coroutine. That said, if you are using a predefined scope you might not have this option.

As we have mentioned earlier, we can also use a CoroutineExceptionHandler inside a supervisorScope:

val handler = CoroutineExceptionHandler { coroutineContext, throwable ->
    println("Handler has caught: ${throwable.message}")
}

fun main(): Unit = runBlocking {
    supervisorScope {
        launch(handler) {
            delay(10)
            throw IllegalArgumentException("A shocking mishap!")
        }
    }

    delay(100)
    println("Main is done!")
}

Output:
Handler has caught: A shocking mishap!
Main is done!

However, this use case is quite rare and in most cases other error handling solutions will make more sense.
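The rule that a handler on a child coroutine has no effect can be checked with a short sketch. It installs one handler on a root coroutine and another on its child, then records which one gets called. The names whoHandled, rootHandler and childHandler are made up for this example.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.CopyOnWriteArrayList

// Returns the messages recorded by the handlers that were actually invoked.
fun whoHandled(): List<String> = runBlocking {
    val log = CopyOnWriteArrayList<String>()
    val rootHandler = CoroutineExceptionHandler { _, t -> log.add("root: ${t.message}") }
    val childHandler = CoroutineExceptionHandler { _, t -> log.add("child: ${t.message}") }

    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

    scope.launch(rootHandler) {
        // The handler on the child is ignored: the child hands
        // its exception over to the parent instead.
        launch(childHandler) {
            throw IllegalArgumentException("Boom!")
        }
    }.join()

    log.toList()
}

fun main() {
    println(whoHandled())
}

// Output:
// [root: Boom!]
```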

Things to keep in mind

Suppressed exceptions

One thing to keep in mind is that if multiple children fail with an exception, the first thrown exception will get propagated, while the other exceptions will get attached to it as suppressed exceptions:

private val scope = CoroutineScope(Job() + Dispatchers.Default)

fun main(): Unit = runBlocking {
    scope.launch {
        launch {
            delay(100)
            throw IllegalStateException("First Boom!")
        }

        launch {
            delay(100)
            throw IllegalStateException("Second Boom!")
        }
    }
    delay(500)
}

Output:
Exception in thread "DefaultDispatcher-worker-2" 
java.lang.IllegalStateException: First Boom!
 ...
 Suppressed: java.lang.IllegalStateException: Second Boom!
 ...
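The suppressed exceptions are also available programmatically through Throwable.suppressed. The sketch below is modelled on the example from the official exception handling guide: the second exception is thrown from a finally block while the coroutine is being cancelled, which makes the order deterministic. The function name suppressedDemo is hypothetical.

```kotlin
import kotlinx.coroutines.*
import java.io.IOException
import java.util.concurrent.atomic.AtomicReference

// Returns a description of the exception that reached the handler.
fun suppressedDemo(): String = runBlocking {
    val report = AtomicReference("")
    val handler = CoroutineExceptionHandler { _, t ->
        val suppressed = t.suppressed.map { it::class.simpleName }
        report.set("${t::class.simpleName} with suppressed $suppressed")
    }

    val scope = CoroutineScope(Job() + Dispatchers.Default)

    scope.launch(handler) {
        launch {
            try {
                delay(Long.MAX_VALUE) // gets cancelled when the sibling fails
            } finally {
                throw ArithmeticException("Second Boom!") // thrown during cancellation
            }
        }

        launch {
            delay(100)
            throw IOException("First Boom!")
        }
    }.join()

    report.get()
}

fun main() {
    println(suppressedDemo())
}

// Output:
// IOException with suppressed [ArithmeticException]
```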

Job’s invokeOnCompletion

You can add an invokeOnCompletion callback to a Job to see if there were exceptions during its execution:

job.invokeOnCompletion { throwable ->
    when (throwable) {
        is CancellationException -> println("Job was cancelled!")
        is Throwable -> println("Job failed with exception!")
        null -> println("Job completed normally!")
    }
}

However, keep in mind that if you catch the exception with a try-catch block, it will not get passed to this callback.
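To see all three branches in action, here is a sketch with one Job per outcome. Since the callback is registered after each Job has finished, it is invoked right away. The names describe and outcomes are invented for this example, and the empty handler is only there to keep the console quiet.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: turns a completion cause into a readable outcome.
fun describe(cause: Throwable?): String = when (cause) {
    null -> "completed normally"
    is CancellationException -> "was cancelled"
    else -> "failed with ${cause::class.simpleName}"
}

fun outcomes(): List<String> = runBlocking {
    // Swallows the uncaught exception so that it is not logged
    val silent = CoroutineExceptionHandler { _, _ -> }
    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default + silent)

    val jobs = listOf(
        "ok" to scope.launch { delay(10) },
        "failed" to scope.launch { throw IllegalStateException("Boom!") },
        "cancelled" to scope.launch { delay(Long.MAX_VALUE) }.also { it.cancel() }
    )

    val result = mutableListOf<String>()
    for ((name, job) in jobs) {
        job.join()
        // The Job is already complete, so the callback runs immediately
        job.invokeOnCompletion { cause -> result.add("$name ${describe(cause)}") }
    }
    result
}

fun main() {
    outcomes().forEach { println(it) }
}

// Output:
// ok completed normally
// failed failed with IllegalStateException
// cancelled was cancelled
```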

CancellationException

CancellationExceptions are not passed to a CoroutineExceptionHandler; therefore, the handler should not be relied upon as a resource clean-up mechanism.

For example, a common source of bugs in Android is using a CoroutineExceptionHandler to revert the UI state in case of an exception inside a ViewModel. In most cases there is nothing wrong with that, but keep in mind that explicitly cancelling your coroutines bypasses the handler and might leave the UI in a broken state.
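A minimal sketch of the difference, assuming the clean-up is moved into a finally block: the handler stays silent on cancellation, while finally still runs. The function name cancelWithCleanup and the log messages are made up for this example.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.CopyOnWriteArrayList

// Returns what was recorded while a running coroutine got cancelled.
fun cancelWithCleanup(): List<String> = runBlocking {
    val log = CopyOnWriteArrayList<String>()
    val handler = CoroutineExceptionHandler { _, t -> log.add("handler: $t") }
    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default + handler)

    val job = scope.launch {
        try {
            delay(Long.MAX_VALUE)
        } finally {
            // Runs on failure, on cancellation and on normal completion
            log.add("finally: state reverted")
        }
    }

    delay(50)
    job.cancelAndJoin() // the handler is not invoked for this cancellation
    log.toList()
}

fun main() {
    println(cancelWithCleanup())
}

// Output:
// [finally: state reverted]
```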

Conclusion

Handling exceptions in coroutines is not a straightforward task and will require some practice to get a grasp of all the concepts it involves. Frankly, for me it was one of the most confusing parts of the coroutines library.

The most important part about all of this, once again, is to understand structured concurrency. Then, in my experience, it all magically clicks into place.

In the next part, we will tackle a sibling of exception handling - cancellation, which should be a breeze now that we understand the underlying mechanisms.

See you then.

Your friend,

Max
