Things every Kotlin Developer should know about Coroutines. Part 5: Cancellation.

Just as we kept returning to cancellation in the last part, in this article we will often touch upon exception handling, since, as discussed previously, these two concepts use the same mechanism.

That said, cancellation in the coroutines library is not as straightforward as it might seem. Misusing it can produce subtle bugs and puzzle developers unfamiliar with its inner workings.

So let’s get cracking.

Cancelling a coroutine

Like with exception propagation, cancellation in coroutines is managed by a Job. Moreover, cancellation is nothing more than throwing a CancellationException. The critical distinction here is that if a coroutine throws a CancellationException, it is considered to have been cancelled normally, while any other exception is considered a failure.

Another difference is that while a regular exception in a child coroutine will trigger the cancellation of its parent (unless it’s using a SupervisorJob), the CancellationException will not.
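The difference between the two outcomes can be observed with a small experiment. This is only a sketch: parentSurvives is a hypothetical helper invented for this example, the 200 ms wait is an arbitrary illustration value, and the empty CoroutineExceptionHandler is there solely to keep the failure case from being reported as an uncaught exception.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: reports whether a parent coroutine is still running
// after one of its children ends by throwing `childError`.
suspend fun parentSurvives(childError: Throwable): Boolean {
    // The handler only swallows the failure so it is not printed as uncaught.
    val handler = CoroutineExceptionHandler { _, _ -> }
    val scope = CoroutineScope(Job() + handler)

    val parent = scope.launch {
        launch { throw childError } // the child under test
        delay(Long.MAX_VALUE)       // the parent keeps "working"
    }

    // join() returns only if the parent completes within 200 ms,
    // so a null result means the parent is still alive.
    val stillRunning = withTimeoutOrNull(200) { parent.join() } == null
    scope.cancel()
    return stillRunning
}

fun main() = runBlocking {
    println("Child cancelled, parent survives: " + parentSurvives(CancellationException("child cancelled")))
    println("Child failed, parent survives: " + parentSurvives(IllegalStateException("child failed")))
}
```

With a regular Job in the scope, the first call should report true and the second false: only the non-cancellation exception travels up and takes the parent down with it.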

Now, let’s see how that works in practice.

First of all, a Job is our handle to a coroutine, and it has a lifecycle. If you are not familiar with the lifecycle of a Job, please refer to Part 4 of this series.

We can cancel a coroutine by calling the .cancel() function on its Job.

/**
 * Cancels this job with an optional cancellation [cause].
 * A cause can be used to specify an error message
 * or to provide other details on
 * the cancellation reason for debugging purposes.
 */
public fun cancel(cause: CancellationException? = null)

You can also specify a cause for cancellation, but it has to be a subtype of the CancellationException and, therefore, it will always lead to normal cancellation.

When you call .cancel() on a Job, it triggers the following events:

  • The Job goes into the Cancelling state and cannot be used as a parent to new coroutines or do suspending work anymore;
  • At the first suspension point that cooperates with cancellation (or after manually calling ensureActive(), more on that later), a CancellationException is thrown, prompting the cancellation of all children, but not its parent;
  • After all the children are cancelled, the Job moves to the Cancelled state.

Both CoroutineScope and CoroutineContext have an extension function .cancel(cause: CancellationException? = null). These functions just get a Job from the corresponding context and call .cancel(cause) on that.
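As a rough illustration of the scope-level extension, the sketch below cancels a whole scope with a custom cause and reads that cause back in a completion handler. The function name cancelScopeWithCause and the message text are made up for this example.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: cancels a scope with a cause and returns the message
// that the child's completion handler receives.
suspend fun cancelScopeWithCause(): String? {
    val scope = CoroutineScope(Job() + Dispatchers.Default)
    val job = scope.launch { delay(Long.MAX_VALUE) }

    // The handler is invoked once the job has fully completed.
    val reportedReason = CompletableDeferred<String?>()
    job.invokeOnCompletion { cause -> reportedReason.complete(cause?.message) }

    // Same as scope.coroutineContext[Job]?.cancel(...), just shorter.
    scope.cancel(CancellationException("User left the screen"))

    return reportedReason.await()
}

fun main() = runBlocking {
    println("Cancellation reason: " + cancelScopeWithCause())
}
```

Passing a cause does not change how cancellation behaves. It only makes the reason visible to whoever inspects the exception later, which can help when debugging.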

Here is a simple example of how cancellation works:

val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

fun main(): Unit = runBlocking {
    val parentJob = scope.launch {
        println("Starting the parent job!")

        launch {
            while (isActive) {
                delay(10)
                println("Doing some work...")
            }
        }.invokeOnCompletion {
            println("Cancelling all work! ")
        }
    }

    parentJob.invokeOnCompletion {
        println("The parent job is cancelled!")
    }

    delay(50)

    parentJob.cancel()

    // Take note of this delay!!!
    delay(100)

    println("Main is done!")
}

Output: 
Starting the parent job!
Doing some work...
Doing some work...
Doing some work...
Doing some work...
Cancelling all work! 
The parent job is cancelled!
Main is done!

We have a parent coroutine with a long-running child coroutine in this example. After some time, we cancel the parentJob, which cancels both parent and child coroutines as expected.

However, as we now know, calling .cancel() doesn't stop a Job dead in its tracks but only triggers the beginning of cancellation. Therefore, if we removed the delay(100) after cancelling our job, we would get the following output:

Starting the parent job!
Doing some work...
Doing some work...
Doing some work...
Doing some work...
Main is done! <- This is printed immediately!
Cancelling all work! 
The parent job is cancelled!

“Main is done!” is printed immediately after calling .cancel() because the parent runBlocking coroutine will continue its execution, while the cancellation of the parentJob will proceed asynchronously.

In this case, we are lucky that cancellation is almost immediate, and we get all the messages printed out. However, since our parent coroutine is not a child of runBlocking, the structured concurrency principle does not apply, and the runBlocking coroutine will not wait for the cancellation to complete.

If we are not careful with such cancellations, it could lead to subtle bugs and race conditions.

That said, we have used the delay() function for illustration purposes only. It is a horrible solution to this problem in actual code.

The proper way to wait for a Job’s completion is to call job.join().

The .join() function will suspend the calling coroutine until the joined Job is fully completed. Keep in mind that .join() will always continue normally, regardless of how the joined Job has been completed, as long as the Job of the calling coroutine is active.
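To make the "always continues normally" part concrete, here is a small sketch that joins one failed and one cancelled Job. The names are hypothetical, and the empty CoroutineExceptionHandler is an assumption added only to keep the failure from being reported as an uncaught exception.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: joins a failed job and a cancelled job
// and records that both calls returned normally.
suspend fun joinOutcomes(): List<String> {
    val handler = CoroutineExceptionHandler { _, _ -> } // swallows the failure
    val scope = CoroutineScope(SupervisorJob() + handler)

    val failing = scope.launch { throw IllegalStateException("boom") }
    val cancelled = scope.launch { delay(Long.MAX_VALUE) }
    cancelled.cancel()

    val log = mutableListOf<String>()
    failing.join()   // does not rethrow the IllegalStateException
    log += "joined the failed job without an exception"
    cancelled.join() // does not throw a CancellationException either
    log += "joined the cancelled job without an exception"
    return log
}

fun main() = runBlocking {
    joinOutcomes().forEach(::println)
}
```

If the outcome matters to the caller, it has to be inspected separately, for example through the Job's isCancelled property or a completion handler.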

Here is what the proper cancellation of the parentJob in our example would look like:

parentJob.cancel()
parentJob.join()

println("Main is done!")

Moreover, this is such a common practice in coroutines that the library offers an extension function Job.cancelAndJoin(), which does precisely that - calls .cancel() and .join() immediately afterward.

parentJob.cancelAndJoin()

println("Main is done!")

By calling cancelAndJoin(), we are cancelling our job and suspending the calling coroutine until the job has finished all the cancellation work.

Understanding suspension points

As stated earlier, the CancellationException is generally thrown at the first suspension point that cooperates with cancellation.

While a suspension point doesn’t necessarily mean your code will automatically cooperate with cancellation, all suspend functions (with one exception that I am aware of, which we will discuss in a bit) from the coroutines library are safe to cancel. Therefore, it is crucial to quickly identify them in your code to avoid redundant cooperation with cancellation.

The easiest way to identify suspension points is with help from the IDE. If you are writing Kotlin code, chances are you are using an IntelliJ IDEA based IDE, in which case suspension points are conveniently displayed in the gutter.

IntelliJ IDEA marks every call to a suspend function with a special arrow symbol.

While every suspend function in the coroutines library is a valid suspension point and is cancellable, marking your own functions with the suspend modifier does not mean they will necessarily suspend a coroutine or be cancellable.

Here is an example:

suspend fun doStuff() {
    println("I do stuff!")
}

This function does not have any suspension points, and the IDE will warn you about the redundant suspend modifier. That said, if you call this function, the IDE will display the suspension symbol in the gutter regardless.

For every rule, there is an exception

While the example above doesn’t support cancellation because we wrote a lousy suspend function, there is a case when cancellation is not supported by a suspending function from the coroutines library by design.

This exception is the suspendCoroutine function. Diving deep into the specifics of this function is out of the scope of this article. In short, it suspends the current execution and runs a non-suspending block, which receives a Continuation that lets you explicitly resume the coroutine later, typically from a callback.

This functionality is needed to bridge the suspending world with other asynchronous solutions. However, there is a better function for that, which we will discuss in a minute.

For now, let’s take a look at this example:

private val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

fun main() = runBlocking {
    val job: Job = scope.launch {
        while (true) {
            doStuff()
        }
    }

    delay(5)

    println("Cancelling the job!")
    job.cancelAndJoin()

    println("Main is done!")
}

suspend fun doStuff() {
    // this is a proper suspension point
    // but it will not throw a
    // CancellationException
    suspendCoroutine<Unit> { continuation ->
        continuation.resume(Unit)
    }
    println("I do stuff!")
}

If we run this code, we will get:

...
I do stuff!
I do stuff!
I do stuff!
...
// Forever and ever, since we have 
// joined the job after cancel

This job will never get cancelled, and the runBlocking coroutine will continue waiting indefinitely since we have joined the cancelled Job.

suspendCoroutine does not need to be cancellable, because the library offers a very similar function, suspendCancellableCoroutine, which, as its name suggests, supports cancellation while providing the same functionality.

Here is the adjusted doStuff() function:

suspend fun doStuff() {
    suspendCancellableCoroutine<Unit> { continuation ->
        continuation.resume(Unit)
    }
    println("I do stuff!")
}

Now, if we rerun the code, we will get the desired and predictable outcome:

...
I do stuff!
I do stuff!
Cancelling the job!
I do stuff!
I do stuff!
...
Main is done!

In this case, a suspendCancellableCoroutine will throw a CancellationException at its suspension point as soon as the Job gets cancelled, just as it should.

We can also achieve the same correct cancellation behavior by calling any other suspend function from the coroutines library (or that cooperates with cancellation), for example, delay(1).

Because of this behavior, there is little reason to use suspendCoroutine: suspendCancellableCoroutine is always the safer option. Apart from cooperating with cancellation, suspendCancellableCoroutine provides an invokeOnCancellation { } callback that can be used to clean up resources.
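A typical adapter built on suspendCancellableCoroutine might look roughly like the sketch below. FakeRequest is a made-up callback API that never delivers a result, and await() and abortedOnCancel() are hypothetical names. The only point is to show invokeOnCancellation releasing the underlying resource when the coroutine is cancelled.

```kotlin
import kotlinx.coroutines.*
import kotlin.coroutines.resume

// Hypothetical callback-based API, invented for this sketch.
class FakeRequest {
    @Volatile
    var aborted = false
        private set

    private var callback: ((String) -> Unit)? = null

    // Stores the callback; in this sketch the result never arrives.
    fun start(onResult: (String) -> Unit) { callback = onResult }
    fun abort() { aborted = true }
}

// Adapter: turns the callback API into a cancellable suspend function.
suspend fun FakeRequest.await(): String =
    suspendCancellableCoroutine { continuation ->
        start { result -> continuation.resume(result) }
        // Runs if the coroutine is cancelled while it is suspended here.
        continuation.invokeOnCancellation { abort() }
    }

suspend fun abortedOnCancel(): Boolean {
    val request = FakeRequest()
    val scope = CoroutineScope(Job())
    // UNDISPATCHED runs the body right away up to its first suspension,
    // so await() is already suspended by the time we cancel.
    val job = scope.launch(start = CoroutineStart.UNDISPATCHED) { request.await() }
    job.cancelAndJoin()
    return request.aborted
}

fun main() = runBlocking {
    println("Request aborted on cancellation: " + abortedOnCancel())
}
```

With plain suspendCoroutine there would be no such hook, and the hypothetical request would keep running after the coroutine that started it was gone.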

That said, given that these functions are primarily used to write adapters to other asynchronous libraries, you will probably never use them. Almost all adapters you would ever need are already provided by either the coroutines team or the libraries' authors.

However, it is still important to be aware of this behavior for learning purposes.

Cooperating with cancellation

While suspend functions coming from the coroutines library are safe to cancel, you should always think about cooperating with cancellation when writing your own code.

There are a couple of ways to do that.

Checking the state of a Job

One way to make your code cancellable is to explicitly check the current Job's state.

The most convenient way to do this is by using the CoroutineScope.isActive extension property.

We have done it numerous times in examples throughout this series with while (isActive) loops.

Example:

scope.launch {
    // a periodical work
    while (isActive) {
        // do work
        delay(1000)
    }
}

There is a similar extension, CoroutineContext.isActive, which works exactly the same way. Both of these properties check the isActive property of the underlying Job.

You can also check the isCancelled property if you have a reference to the Job. However, in most cases it is redundant, and there are no corresponding extension properties for either CoroutineScope or CoroutineContext.
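To see how these flags change over a Job's lifetime, the following sketch takes three snapshots: while the coroutine is running, inside its finally block during cancellation, and after it has completed. The flags() and observeStates() helpers are hypothetical and exist only for this illustration.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.CopyOnWriteArrayList

// Hypothetical helper that formats the three state flags of a Job.
fun Job.flags() = "isActive=$isActive, isCancelled=$isCancelled, isCompleted=$isCompleted"

suspend fun observeStates(): List<String> {
    val states = CopyOnWriteArrayList<String>()
    val scope = CoroutineScope(Job())

    // UNDISPATCHED makes sure the body has reached delay() before we continue.
    val job = scope.launch(start = CoroutineStart.UNDISPATCHED) {
        try {
            delay(Long.MAX_VALUE)
        } finally {
            // Cancellation has started, but the job is not completed yet.
            states += coroutineContext.job.flags()
        }
    }

    states += job.flags() // still running
    job.cancelAndJoin()   // triggers the finally block above
    states += job.flags() // fully cancelled
    return states
}

fun main() = runBlocking {
    observeStates().forEach(::println)
}
```

The middle snapshot is the interesting one: isActive is already false and isCancelled is already true, while isCompleted stays false until the clean-up code has finished.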

ensureActive()

Another common way to check for cancellation is to call ensureActive(), which is an extension function available for Job, CoroutineScope, and CoroutineContext.

This function is a great option for cases where you would otherwise write an if (isActive) statement. Moreover, under the hood ensureActive does just that, but it also throws a CancellationException for good measure:

public fun Job.ensureActive(): Unit {
    if (!isActive) throw getCancellationException()
}

Note: This function uses getCancellationException() to include the original cause of cancellation.

Just like the isActive check, ensureActive() is mostly used inside coroutines.

Example:

scope.launch { 
    // a long running for loop
    // without suspension points
    for (item in items) {
        ensureActive()
        println("Processing item...")
    }
}

Alternatively, you can use ensureActive() on a coroutineContext inside suspend functions, since, as opposed to checking isActive, it throws a CancellationException and will stop the execution:

suspend fun doSomeWork() {
    // do some work
    coroutineContext.ensureActive()
    // do some more work
}

yield()

From the documentation: Yields the thread (or thread pool) of the current coroutine dispatcher to other coroutines on the same dispatcher to run if possible.

The purpose of the yield() function is to free up the current thread to allow other coroutines to run on it. In practice, it suspends current execution and immediately resumes it.

In general, a suspended coroutine is not guaranteed to resume on the same thread: it may resume on a different one if the dispatcher allows it.

It can be beneficial to use yield() during CPU heavy work or during work that can exhaust the thread pool.

However, it is also common practice to use yield() to cooperate with cancellation.

Example:

suspend fun doHeavyWork() {
    withContext(Dispatchers.Default) {
        repeat(1000) {
            yield()
            // do heavy work
        }
    }
}

It can also be used in coroutines instead of ensureActive() if you need the added benefit of yielding the thread.
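The "yielding the thread" part is easiest to see on a single thread. The sketch below runs two coroutines inside runBlocking, once with yield() and once without. The interleave function and the A/B labels are hypothetical names used only for this illustration.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: runs two coroutines on the single runBlocking thread
// and records the order in which their steps execute.
fun interleave(useYield: Boolean): List<String> = runBlocking {
    val log = mutableListOf<String>()
    listOf("A", "B").map { name ->
        launch {
            repeat(2) { step ->
                log += "$name$step"
                // Gives the other coroutine a chance to run on the shared thread.
                if (useYield) yield()
            }
        }
    }.joinAll()
    log
}

fun main() {
    println("without yield: " + interleave(useYield = false))
    println("with yield:    " + interleave(useYield = true))
}
```

Without yield() the first coroutine finishes all of its steps before the second one starts ([A0, A1, B0, B1]), while with yield() the steps alternate ([A0, B0, A1, B1]). ensureActive() would give the same cancellation check but none of this interleaving.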

Cleaning up

Sometimes you have to clean up resources when a coroutine is cancelled. Luckily, it is quite easy: since cancellation throws a CancellationException, we can, like with any other exception, catch it in a try-catch block or do the necessary clean up in a finally block.

However, before we discuss cleaning up in more detail, I want to address a common mistake that can be overlooked when using a try-catch block in coroutines.

Interrupted cancellation

Take a look at the following example:

val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

fun main() = runBlocking {
    val job = scope.launch {
        try {
            println("Doing work...")
            delay(Long.MAX_VALUE)
        } catch (e: Exception) {
            println("The work was interrupted!")
        }

        println("How am I still running?")
    }

    delay(100)

    job.cancelAndJoin()

    println("Main is done!")
}

Output:
Doing work...
The work was interrupted!
How am I still running?
Main is done!

As you can see, even though we cancel our Job, it still runs all the way to the end.

Let’s break down why this happens.

When we call job.cancelAndJoin() the job goes into the Cancelling state, and since at that moment the coroutine is suspended by delay(Long.MAX_VALUE), which is a cancellable suspend function, this suspension point throws a CancellationException.

Since it is a sub-type of Exception, we catch it inside the try-catch block and handle it manually.

After that, the coroutine will continue its execution, albeit in the Cancelling state. It means that for it to throw another CancellationException, it has to reach another suspension point. And given that in our example we don’t have any after the try-catch block, and we don’t explicitly check for cancellation, the coroutine will run all the way until the end.

This is a consequence of managing explicit cancellation with exceptions, and we have to be aware of this.

If we know that our coroutine can be explicitly cancelled and we are using a try-catch block, the common practice is to always rethrow the CancellationException.

try {
    println("Doing work...")
    delay(Long.MAX_VALUE)
} catch (e: Exception) {
    println("The work was interrupted!")
    if (e is CancellationException) {
        throw e
    }
}

With this adjustment in place, our coroutine will get cancelled as expected:

Output:
Doing work...
The work was interrupted!
Main is done!

Tools for cleaning up

With that little detail out of the way, let’s take a look at how we can go about cleaning up resources after cancellation.

If we need to do clean up in case of a cancellation only, we can catch a CancellationException and do necessary work inside the catch block:

try {
    println("Doing work...")
    delay(Long.MAX_VALUE)
} catch (e: CancellationException) {
    // do clean up
    // and don't forget to rethrow the CancellationException
    throw e
}

Alternatively, we can do clean up regardless of the outcome, using a finally block:

try {
    println("Doing work...")
    delay(Long.MAX_VALUE)
} finally {
    // do clean up
}

That said, there is a catch (as there always seems to be with the coroutines library).

At this point, our Job is in the Cancelling state, which means it can no longer suspend execution. Therefore, any attempt to call a suspend function, for example, inside the finally block, will throw another CancellationException. Still, we might need to run some suspending functions as a part of a clean-up logic after cancellation.
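The problem can be reproduced with a few lines. In this sketch the clean-up code calls delay(10) as a stand-in for any cancellable suspending call, and the function name and log messages are hypothetical.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.CopyOnWriteArrayList

// Hypothetical helper: tries to suspend inside finally after cancellation
// and records how far the clean-up code gets.
suspend fun cleanUpWithoutNonCancellable(): List<String> {
    val log = CopyOnWriteArrayList<String>()
    val scope = CoroutineScope(Job())

    // UNDISPATCHED makes sure the body has reached delay() before we cancel.
    val job = scope.launch(start = CoroutineStart.UNDISPATCHED) {
        try {
            delay(Long.MAX_VALUE)
        } finally {
            log += "clean up started"
            try {
                delay(10) // a cancellable suspend call in a cancelled coroutine
                log += "clean up finished"
            } catch (e: CancellationException) {
                log += "clean up interrupted"
                throw e
            }
        }
    }

    job.cancelAndJoin()
    return log
}

fun main() = runBlocking {
    cleanUpWithoutNonCancellable().forEach(::println)
}
```

The log should end with "clean up interrupted" and never reach "clean up finished", because delay(10) throws instead of suspending.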

Luckily, the coroutines library has a tool that allows doing just that - a special Job called NonCancellable.

The NonCancellable job is always active and is designed for use with the withContext function to handle cases as described above. withContext(NonCancellable) will switch to this non-cancellable job before checking for cancellation. Therefore, it can be called even from a cancelled coroutine.

That said, let’s summarise everything we have learned about cleaning up with the following example:

val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

fun main() = runBlocking {
    val job = scope.launch {
        try {
            println("Doing work...")
            delay(Long.MAX_VALUE)
        } catch (e: CancellationException) {
            println("The work was cancelled!")
            throw e
        } finally {
            withContext(NonCancellable) {
                delay(1000)
                println("Did some suspending clean up")
            }

            println("Clean up is done!")
        }

        println("I will never be printed...")
    }

    delay(100)

    job.cancelAndJoin()

    println("Main is done!")
}

Output:
Doing work...
The work was cancelled!
// delaying 1 sec
Did some suspending clean up
Clean up is done!
Main is done!

A word of caution, however. Since NonCancellable is a Job, it can be used as a part of any CoroutineContext given its flexible API. But it was designed for use with the withContext function only. So using it with the coroutine builders launch or async (you should never use any Job there anyway, as described in Part 3 of this series) will break structured concurrency in every possible way and should never be done.

Cancellation after a timeout

In some cases, you might want to cancel some work after a timeout. For that, the coroutines library provides a very convenient function withTimeout. Here’s its declaration:

public suspend fun <T> withTimeout(timeMillis: Long, block: suspend CoroutineScope.() -> T): T

It runs a suspending block, and after a specified timeout, it throws a TimeoutCancellationException, a subtype of CancellationException.

There is also a less aggressive version of this function - withTimeoutOrNull, which after a specified timeout, cancels its block and returns null without throwing an exception.

These functions are suspending scope functions by nature. They provide a scope just like the coroutineScope, supervisorScope, and withContext functions. To be more specific, they behave exactly like the coroutineScope function, only with a timeout.

Example:

fun main(): Unit = runBlocking {
    launch {
        try {
            withTimeout(100) {
                delay(400)
            }
        } catch (e: TimeoutCancellationException) {
            println("The coroutine has timed out!")
        }
    }
}

Output:
The coroutine has timed out!

When the timeout expires, withTimeout cancels its block (and all coroutines launched inside it) and throws the TimeoutCancellationException into the calling coroutine. If that exception isn't explicitly handled, the calling coroutine is cancelled as well, along with all its children. If you want other behavior or a better clean-up logic without using try-catch, consider using withTimeoutOrNull instead.
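For comparison, a withTimeoutOrNull version could look like the sketch below. The fetchOrDefault function, its 100 ms limit, and the returned strings are all hypothetical, and delay stands in for real work.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: returns the block's result if it finishes within
// 100 ms, or a fallback value if the timeout fires first.
suspend fun fetchOrDefault(workMillis: Long): String =
    withTimeoutOrNull(100) {
        delay(workMillis) // stands in for real work
        "fresh result"
    } ?: "default value"

fun main() = runBlocking {
    println(fetchOrDefault(10))  // finishes well before the timeout
    println(fetchOrDefault(400)) // takes too long, so the fallback is used
}
```

The block is still cancelled when the timeout fires, but the caller sees a plain null instead of an exception, so the elvis operator is enough to supply a fallback.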

Conclusion

The concept of cooperative cancellation is not something we have to deal with often in other asynchronous solutions. Therefore, it might prove challenging to newcomers and seasoned developers alike.

Neglecting basic principles of cooperative cancellation can lead to memory leaks, wasted resources, and subtle bugs. Because of that, it might seem that it is too much hassle for a feature that shouldn’t be that complicated.

And it is a fair point. The coroutines library has quite a steep learning curve.

However, in my opinion, the resulting API for launching coroutines and the elegant solution to establishing parent-child relationships are something JetBrains deserves applause for, and it is a fair trade-off, all things considered.

That said, I have a couple of topic ideas for the next part of this series, but I will have to sleep on that, so no promises.

See you next time.

Your friend,

Max