Things every Kotlin Developer should know about Coroutines. Part 5: Cancellation.
Just as we talked a lot about cancellation in the last part, in this article we will often touch upon exception handling, since, as discussed previously, these two concepts use the same mechanism.
That said, cancellation in the coroutines library is not as straightforward as it might seem. Misusing it can produce subtle bugs and puzzle developers unfamiliar with its inner workings.
So let’s get cracking.
Cancelling a coroutine
Like with exception propagation, cancellation in coroutines is managed by a Job. Moreover, cancellation is nothing more than throwing a CancellationException. The critical distinction here is that if a coroutine throws a CancellationException, it is considered to have been cancelled normally, while any other exception is considered a failure.
Another difference is that while a regular exception in a child coroutine will trigger the cancellation of its parent (unless the parent uses a SupervisorJob), a CancellationException will not.
Now, let’s see how that works in practice.
First of all, a Job is our handle to a coroutine, and it has a lifecycle. If you are not familiar with the lifecycle of a Job, please refer to Part 4 of this series.
We can cancel a coroutine by calling the .cancel() function on its Job.
```kotlin
/**
 * Cancels this job with an optional cancellation [cause].
 * A cause can be used to specify an error message
 * or to provide other details on
 * the cancellation reason for debugging purposes.
 */
public fun cancel(cause: CancellationException? = null)
```
You can also specify a cause for cancellation, but it has to be a subtype of CancellationException and will, therefore, always lead to a normal cancellation.
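Here is a minimal sketch of passing a cause. It uses a throwaway job and an arbitrary message, and `cancellationCause` is a hypothetical helper name. The CancellationException given to .cancel() is what the job's completion handler receives.

```kotlin
import kotlinx.coroutines.*

// Cancels a job with a custom cause and returns the message seen by its completion handler
fun cancellationCause(): String? = runBlocking {
    var reason: String? = null
    val job = launch { delay(Long.MAX_VALUE) }
    job.invokeOnCompletion { cause -> reason = cause?.message }
    // The cause must be a CancellationException, so this is still a normal cancellation
    job.cancel(CancellationException("User left the screen"))
    job.join()
    reason
}

fun main() {
    println(cancellationCause()) // User left the screen
}
```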
When you call .cancel() on a Job, it triggers the following events:
- The Job goes into the Cancelling state, which is immediately propagated to all of its children (but not to its parent). From then on, it cannot be used as a parent to new coroutines or do suspending work anymore;
- At the first suspension point that cooperates with cancellation (or after manually calling ensureActive(), more on that later), a CancellationException is thrown inside the coroutine;
- After the coroutine and all of its children have finished, the Job moves to the Cancelled state.
Both CoroutineScope and CoroutineContext have an extension function .cancel(cause: CancellationException? = null). These functions simply get the Job from the corresponding context and call .cancel(cause) on it.
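As a small illustration, here is a sketch in which cancelling a scope cancels every coroutine launched in it (`cancelWholeScope` is just an illustrative name). Because the extension only looks up the scope's Job, calling .cancel() on a CoroutineScope whose context has no Job fails with an IllegalStateException.

```kotlin
import kotlinx.coroutines.*

// Launches two long-running coroutines in a scope, cancels the scope, and reports their states
fun cancelWholeScope(): List<Boolean> = runBlocking {
    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)
    val first = scope.launch { delay(Long.MAX_VALUE) }
    val second = scope.launch { delay(Long.MAX_VALUE) }
    // Looks up the scope's Job and cancels it, which also cancels its children
    scope.cancel()
    joinAll(first, second)
    listOf(first.isCancelled, second.isCancelled, scope.isActive)
}

fun main() {
    println(cancelWholeScope()) // [true, true, false]
}
```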
Here is a simple example of how cancellation works:
```kotlin
import kotlinx.coroutines.*

val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

fun main(): Unit = runBlocking {
    val parentJob = scope.launch {
        println("Starting the parent job!")
        launch {
            while (isActive) {
                delay(10)
                println("Doing some work...")
            }
        }.invokeOnCompletion {
            println("Cancelling all work!")
        }
    }
    parentJob.invokeOnCompletion {
        println("The parent job is cancelled!")
    }
    delay(50)
    parentJob.cancel()
    // Take note of this delay!!!
    delay(100)
    println("Main is done!")
}
```
Output:
```
Starting the parent job!
Doing some work...
Doing some work...
Doing some work...
Doing some work...
Cancelling all work!
The parent job is cancelled!
Main is done!
```
We have a parent coroutine with a long-running child coroutine in this example. After some time, we cancel the parentJob, which cancels both the parent and the child coroutines as expected.
However, as we now know, calling .cancel() doesn't stop a Job dead in its tracks; it only triggers the beginning of cancellation. Therefore, if we removed the delay(100) after cancelling our job, we would get the following output:
```
Starting the parent job!
Doing some work...
Doing some work...
Doing some work...
Doing some work...
Main is done! <- This is printed immediately!
Cancelling all work!
The parent job is cancelled!
```
“Main is done!” is printed immediately after calling .cancel() because the runBlocking coroutine continues its execution, while the cancellation of the parentJob proceeds asynchronously.
In this case, we are lucky that cancellation is almost immediate, and we get all the messages printed out. Had main() returned first, the program could have exited before those messages were printed. Since our parent coroutine is not a child of runBlocking, the structured concurrency principle does not apply, and the runBlocking coroutine will not wait for the cancellation to complete.
If we are not careful with such cancellations, it could lead to subtle bugs and race conditions.
That said, we have used the delay() function for illustration purposes only. It is a horrible solution to this problem in actual code.
The proper way to wait for a Job’s completion is to call job.join().
The .join() function suspends the calling coroutine until the joined Job is fully completed. Keep in mind that .join() always resumes normally, regardless of how the joined Job has been completed, as long as the Job of the calling coroutine is active.
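To show that .join() does not care how the joined Job finished, here is a sketch that joins a job which fails with an exception. The SupervisorJob and the no-op CoroutineExceptionHandler are only there to keep the failure from spreading or being logged, and `joinFailedJob` is an illustrative name.

```kotlin
import kotlinx.coroutines.*

// Joins a job that fails and shows that join() itself does not rethrow the failure
fun joinFailedJob(): String = runBlocking {
    // The no-op handler keeps the demo quiet; SupervisorJob stops the failure from spreading
    val handler = CoroutineExceptionHandler { _, _ -> }
    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default + handler)
    val failing = scope.launch { throw IllegalStateException("Boom") }
    failing.join() // resumes normally even though the joined job failed
    // A failed job also reports isCancelled == true
    "joined, cancelled=${failing.isCancelled}"
}

fun main() {
    println(joinFailedJob()) // joined, cancelled=true
}
```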
Here is how the proper cancellation of the parentJob in our example would look:
```kotlin
parentJob.cancel()
parentJob.join()
println("Main is done!")
```
Moreover, this is such a common practice in coroutines that the library offers an extension function, Job.cancelAndJoin(), which does precisely that: it calls .cancel() and then .join() immediately afterward.
```kotlin
parentJob.cancelAndJoin()
println("Main is done!")
```
By calling cancelAndJoin(), we cancel our job and suspend the calling coroutine until the job has finished all the cancellation work.
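Putting it together, here is a runnable sketch of the earlier example that waits with cancelAndJoin() instead of delay(100). The printing is trimmed, and `cancelParentAndWait` is an illustrative name. Once cancelAndJoin() returns, the parent Job reports itself as both completed and cancelled.

```kotlin
import kotlinx.coroutines.*

val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

// The parent/child example, waiting for cancellation with cancelAndJoin()
fun cancelParentAndWait(): Job = runBlocking {
    val parentJob = scope.launch {
        launch {
            while (isActive) {
                delay(10)
            }
        }
    }
    delay(50)
    parentJob.cancelAndJoin()
    // By now the parent and its child have finished all their cancellation work
    parentJob
}

fun main() {
    val job = cancelParentAndWait()
    println("completed=${job.isCompleted}, cancelled=${job.isCancelled}")
    println("Main is done!")
}
```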
Understanding suspension points
As stated earlier, the CancellationException is generally thrown at the first suspension point that cooperates with cancellation.
While a suspension point doesn’t necessarily mean your code will automatically cooperate with cancellation, all suspend functions from the coroutines library (with one exception that I am aware of, which we will discuss in a bit) are safe to cancel. Therefore, it is crucial to be able to quickly identify them in your code to avoid redundant cooperation with cancellation.
The easiest way to identify suspension points is with help from the IDE. If you are writing Kotlin code, chances are you are using an IntelliJ IDEA-based IDE, in which case suspension points are conveniently displayed in the gutter: IntelliJ IDEA marks every call to a suspend function with a special arrow symbol.
While every suspend function in the coroutines library is a valid suspension point and is cancellable, marking your own functions with the suspend modifier does not mean they will necessarily suspend a coroutine or make it cancellable.
Here is an example:
```kotlin
suspend fun doStuff() {
    println("I do stuff!")
}
```
This function does not have any suspension points, and the IDE will warn you about the redundant suspend modifier. That said, if you call this function, the IDE will display the suspension symbol in the gutter regardless.
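The consequence is easy to observe with a variation of doStuff() that is marked suspend but only does blocking work. Thread.sleep stands in for real work, and `runUncooperative` is a hypothetical name. Cancelling the coroutine that calls it in a loop does not stop the loop early, because nothing in it ever checks for cancellation.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicInteger

// Marked `suspend`, but it has no suspension points, so it never checks for cancellation
// (the IDE flags the modifier as redundant; it is kept here on purpose)
suspend fun doStuff(counter: AtomicInteger) {
    Thread.sleep(10) // simulated blocking work
    counter.incrementAndGet()
}

fun runUncooperative(): Int = runBlocking {
    val counter = AtomicInteger(0)
    val started = CompletableDeferred<Unit>()
    val job = launch(Dispatchers.Default) {
        started.complete(Unit)
        repeat(10) { doStuff(counter) }
    }
    started.await()     // make sure the loop is running before cancelling
    job.cancelAndJoin() // requests cancellation, but the loop never notices it
    counter.get()
}

fun main() {
    println(runUncooperative()) // 10: every iteration ran despite cancel()
}
```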
For every rule, there is an exception
While the example above doesn’t support cancellation because we wrote a lousy suspend function, there is a case when cancellation is not supported by a suspending function from the coroutines library by design.
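This exception is the suspendCoroutine function (strictly speaking, it comes from the Kotlin standard library’s kotlin.coroutines package rather than from kotlinx.coroutines). Diving deep into the specifics of this function is out of the scope of this article. In short, it suspends the current execution and runs a non-suspending block that receives a Continuation, allowing you to explicitly resume the coroutine later, for example, from a callback.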
This functionality is needed to bridge the suspending world with other asynchronous solutions. However, there is a better function for that, which we will discuss in a minute.
For now, let’s take a look at this example:
```kotlin
import kotlinx.coroutines.*
import kotlin.coroutines.resume
import kotlin.coroutines.suspendCoroutine

private val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

fun main() = runBlocking {
    val job: Job = scope.launch {
        while (true) {
            doStuff()
        }
    }
    delay(5)
    println("Cancelling the job!")
    job.cancelAndJoin()
    println("Main is done!")
}

suspend fun doStuff() {
    // this is a proper suspension point,
    // but it will not throw a
    // CancellationException
    suspendCoroutine<Unit> { continuation ->
        continuation.resume(Unit)
    }
    println("I do stuff!")
}
```
If we run this code, we will get:
```
...
I do stuff!
I do stuff!
I do stuff!
...
// Forever and ever, since we have
// joined the job after cancel
```
This job will never get cancelled, and the runBlocking coroutine will keep waiting indefinitely since we have joined the cancelled Job.
suspendCoroutine is not cancellable, but there is a very similar function, suspendCancellableCoroutine, which, as its name suggests, supports cancellation while providing the same functionality.
Here is the adjusted doStuff() function:
```kotlin
suspend fun doStuff() {
    suspendCancellableCoroutine<Unit> { continuation ->
        continuation.resume(Unit)
    }
    println("I do stuff!")
}
```
Now, if we rerun the code, we will get the desired and predictable outcome:
```
...
I do stuff!
I do stuff!
Cancelling the job!
I do stuff!
I do stuff!
...
Main is done!
```
In this case, suspendCancellableCoroutine throws a CancellationException at its suspension point as soon as the Job gets cancelled, just as it should.
We can also achieve the same correct cancellation behavior by calling any other suspend function from the coroutines library (or any function that cooperates with cancellation), for example, delay(1).
Because of this behavior, there is little reason to use suspendCoroutine: suspendCancellableCoroutine is always a safer option. Apart from cooperating with cancellation, suspendCancellableCoroutine provides an invokeOnCancellation { } callback that can be used to clean up resources.
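For illustration, here is a sketch of such an adapter. It assumes a java.util.concurrent.ScheduledExecutorService as the callback-based API being wrapped. `awaitTimer` and `runAdapterDemo` are hypothetical names, and the counter exists only to make the clean-up observable. When the waiting coroutine is cancelled, invokeOnCancellation cancels the scheduled task so it never fires.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.Executors
import java.util.concurrent.ScheduledExecutorService
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger
import kotlin.coroutines.resume

// Hypothetical adapter: suspends until a task scheduled on the executor fires
suspend fun ScheduledExecutorService.awaitTimer(millis: Long, cancelledTasks: AtomicInteger) {
    suspendCancellableCoroutine<Unit> { continuation ->
        val future = schedule(Runnable { continuation.resume(Unit) }, millis, TimeUnit.MILLISECONDS)
        // Runs if the coroutine is cancelled while waiting: release the scheduled task
        continuation.invokeOnCancellation {
            if (future.cancel(false)) cancelledTasks.incrementAndGet()
        }
    }
}

fun runAdapterDemo(): Int = runBlocking {
    val executor = Executors.newSingleThreadScheduledExecutor()
    val cancelledTasks = AtomicInteger(0)
    try {
        val job = launch { executor.awaitTimer(10_000, cancelledTasks) }
        delay(50)
        job.cancelAndJoin() // cancels the continuation, which triggers invokeOnCancellation
        cancelledTasks.get()
    } finally {
        executor.shutdownNow()
    }
}

fun main() {
    println(runAdapterDemo()) // 1
}
```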
That said, given that these functions are primarily used to write adapters to other asynchronous libraries, you will probably never use them. Almost all adapters you would ever need are already provided by either the coroutines team or the libraries' authors.
However, it is still important to be aware of this behavior for learning purposes.
Cooperating with cancellation
While suspend functions coming from the coroutines library are safe to cancel, you should always think about cooperating with cancellation when writing your own code.
There are a couple of ways to do that.
Checking the state of a Job
One way to make your code cancellable is to explicitly check the current Job's state.
The most convenient way to do this is by using the CoroutineScope.isActive extension property.
We have done it numerous times in examples throughout this series with while (isActive) loops.
Example:
```kotlin
scope.launch {
    // periodic work
    while (isActive) {
        // do work
        delay(1000)
    }
}
```
There is a similar extension, CoroutineContext.isActive, which works exactly the same way. Both of these extension properties check the isActive property of the underlying Job.
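Here is a minimal sketch of the context-based check in a CPU-bound loop with no suspension points (`countWhileActive` and `runCounter` are illustrative names). A plain suspend function has no CoroutineScope receiver, so it reads the context with currentCoroutineContext() and uses the CoroutineContext.isActive extension. Without that check, cancelAndJoin() would wait forever.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicLong

// A CPU-bound loop with no suspension points; the isActive check is its only way out
suspend fun countWhileActive(counter: AtomicLong) {
    while (currentCoroutineContext().isActive) {
        counter.incrementAndGet()
    }
}

fun runCounter(): Boolean = runBlocking {
    val counter = AtomicLong(0)
    val job = launch(Dispatchers.Default) { countWhileActive(counter) }
    delay(50)
    job.cancelAndJoin() // the loop sees isActive == false and exits
    val stoppedAt = counter.get()
    delay(20)
    stoppedAt == counter.get() // nothing keeps counting after the job completed
}

fun main() {
    println("Counter stopped after cancellation: ${runCounter()}")
}
```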
Also, keep in mind that you can check the isCancelled property if you have a reference to the Job. However, in most cases it is redundant, and there are no corresponding extensions for either CoroutineScope or CoroutineContext.
ensureActive()
Another common way to check for cancellation is to call ensureActive(), which is an extension function available for Job, CoroutineScope, and CoroutineContext.
This function is a great option for cases where you would otherwise write an if (isActive) statement. Moreover, under the hood, ensureActive does just that, but it also throws a CancellationException for good measure:
```kotlin
public fun Job.ensureActive(): Unit {
    if (!isActive) throw getCancellationException()
}
```
Note: This function uses getCancellationException() to include the original cause of cancellation.
Just like the isActive check, ensureActive() is mostly used inside coroutines.
Example:
```kotlin
scope.launch {
    // a long-running for loop
    // without suspension points
    for (item in items) {
        ensureActive()
        println("Processing item...")
    }
}
```
Alternatively, you can call ensureActive() on the coroutineContext inside suspend functions since, as opposed to checking isActive, it throws a CancellationException and stops the execution:
```kotlin
import kotlinx.coroutines.ensureActive
import kotlin.coroutines.coroutineContext

suspend fun doSomeWork() {
    // do some work
    coroutineContext.ensureActive()
    // do some more work
}
```
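Here is a rough sketch of the same idea. It assumes each item involves blocking work, simulated with Thread.sleep, and checks for cancellation before each item. `processAll` and `runProcessing` are hypothetical names, and currentCoroutineContext() is used in place of coroutineContext to avoid the extra import. After cancellation, only part of the list has been processed, and the job reports itself as cancelled.

```kotlin
import kotlinx.coroutines.*
import java.util.Collections

// Processes items with blocking work, checking for cancellation before each item
suspend fun processAll(items: List<Int>, processed: MutableList<Int>) {
    for (item in items) {
        // Throws CancellationException as soon as the Job is no longer active
        currentCoroutineContext().ensureActive()
        Thread.sleep(5) // simulated blocking work on one item
        processed += item
    }
}

fun runProcessing(): Pair<Int, Boolean> = runBlocking {
    val processed = Collections.synchronizedList(mutableListOf<Int>())
    val job = launch(Dispatchers.Default) { processAll((1..1_000).toList(), processed) }
    delay(50)
    job.cancelAndJoin()
    processed.size to job.isCancelled
}

fun main() {
    val (count, cancelled) = runProcessing()
    println("Processed $count of 1000 items, cancelled=$cancelled")
}
```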
yield()
From the documentation: Yields the thread (or thread pool) of the current coroutine dispatcher to other coroutines on the same dispatcher to run if possible.
The purpose of the yield() function is to free up the current thread to allow other coroutines to run on it. In practice, it suspends the current coroutine and immediately re-dispatches it, so it resumes as soon as the dispatcher gets to it.
Keep in mind that, in general, a coroutine is not guaranteed to resume on the same thread after suspending if its dispatcher has more than one thread.
It can be beneficial to use yield() during CPU-heavy work or during work that can exhaust the thread pool.
It is also common practice to use yield() to cooperate with cancellation.
Example:
```kotlin
suspend fun doHeavyWork() {
    withContext(Dispatchers.Default) {
        repeat(1000) {
            yield()
            // do heavy work
        }
    }
}
```
It can also be used in coroutines instead of ensureActive() if you need the added benefit of yielding the thread.
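The cancellation side of yield() is easiest to see on a single thread. In this sketch (`runWithYield` is an illustrative name), the worker and the code cancelling it share runBlocking's thread. Because the worker calls yield() on every iteration, the main coroutine gets a chance to run cancelAndJoin(), and the worker's pending yield() then resumes with a CancellationException. Without the yield(), the loop would hold the thread forever and cancelAndJoin() would never be reached.

```kotlin
import kotlinx.coroutines.*

// Everything here runs on runBlocking's single thread
fun runWithYield(): Int = runBlocking {
    var iterations = 0
    val job = launch {
        while (true) {
            iterations++ // stand-in for a slice of CPU-heavy work
            // Gives the thread back to other coroutines and throws
            // CancellationException once this Job is cancelled
            yield()
        }
    }
    repeat(5) { yield() } // let the worker take a few turns
    job.cancelAndJoin()   // only reachable because the worker keeps yielding
    iterations
}

fun main() {
    println("The worker ran ${runWithYield()} iterations before being cancelled")
}
```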
Cleaning up
Sometimes you have to clean up resources when a coroutine is cancelled. Luckily, it is quite easy: since cancellation throws a CancellationException, we can, like with any other exception, catch it in a catch block or do the necessary clean-up in a finally block.
However, before we discuss cleaning up in more detail, I want to address a common mistake that is easily overlooked when using a try-catch block in coroutines.
Interrupted cancellation
Take a look at the following example:
```kotlin
import kotlinx.coroutines.*

val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

fun main() = runBlocking {
    val job = scope.launch {
        try {
            println("Doing work...")
            delay(Long.MAX_VALUE)
        } catch (e: Exception) {
            println("The work was interrupted!")
        }
        println("How am I still running?")
    }
    delay(100)
    job.cancelAndJoin()
    println("Main is done!")
}
```
Output:
```
Doing work...
The work was interrupted!
How am I still running?
Main is done!
```
As you can see, even though we cancel our Job, it still runs until the end.
Let’s break down why this happens.
When we call job.cancelAndJoin(), the job goes into the Cancelling state, and since at that moment the coroutine is suspended by delay(Long.MAX_VALUE), which is a cancellable suspend function, this suspension point throws a CancellationException.
Since it is a subtype of Exception, we catch it inside the try-catch block and handle it manually.
After that, the coroutine continues its execution, albeit in the Cancelling state. This means that for it to throw another CancellationException, it has to reach another suspension point. And given that in our example we don’t have any after the try-catch block, and we don’t explicitly check for cancellation, the coroutine runs all the way until the end.
This is a consequence of managing explicit cancellation with exceptions, and we have to be aware of it.
If we know that our coroutine can be explicitly cancelled and we are using a try-catch block, the common practice is to always rethrow a CancellationException.
```kotlin
try {
    println("Doing work...")
    delay(Long.MAX_VALUE)
} catch (e: Exception) {
    println("The work was interrupted!")
    if (e is CancellationException) {
        throw e
    }
}
```
With this adjustment in place, our coroutine will get cancelled as expected:
Output:
```
Doing work...
The work was interrupted!
Main is done!
```
Tools for cleaning up
With that little detail out of the way, let’s take a look at how we can go about cleaning up resources after cancellation.
If we need to clean up only in case of a cancellation, we can catch a CancellationException and do the necessary work inside the catch block:
```kotlin
try {
    println("Doing work...")
    delay(Long.MAX_VALUE)
} catch (e: CancellationException) {
    // do clean up
    // and don't forget to rethrow the CancellationException
    throw e
}
```
Alternatively, we can clean up regardless of the outcome using a finally block:
```kotlin
try {
    println("Doing work...")
    delay(Long.MAX_VALUE)
} finally {
    // do clean up
}
```
That said, there is a catch (as there always seems to be with the coroutines library).
At this point, our Job is in the Cancelling state, which means it can no longer suspend execution. Therefore, any attempt to call a cancellable suspend function, for example, inside the finally block, will throw another CancellationException. Still, we might need to run some suspending functions as part of the clean-up logic after cancellation.
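You can see this for yourself in the sketch below (`suspendInFinally` is a hypothetical name). A cancelled coroutine calls delay() inside its finally block, and the call throws immediately. The inner CancellationException is caught only to record what happened; the original one still propagates once the finally block ends.

```kotlin
import kotlinx.coroutines.*

// Shows what happens when a cancelled coroutine calls a cancellable suspend function in finally
fun suspendInFinally(): String = runBlocking {
    var outcome = "finally not reached"
    val job = launch {
        try {
            delay(Long.MAX_VALUE)
        } finally {
            outcome = try {
                delay(10) // the Job is already cancelling, so this throws right away
                "delay completed"
            } catch (e: CancellationException) {
                "delay threw CancellationException"
            }
        }
    }
    delay(10) // let the job start and suspend in delay(Long.MAX_VALUE)
    job.cancelAndJoin()
    outcome
}

fun main() {
    println(suspendInFinally()) // delay threw CancellationException
}
```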
Luckily, the coroutines library has a tool that allows doing just that: a special Job called NonCancellable.
The NonCancellable job is always active and is designed for use with the withContext function to handle cases like the one described above. withContext(NonCancellable) switches to this non-cancellable job before checking for cancellation; therefore, it can be called even from a cancelled coroutine.
That said, let’s summarise everything we have learned about cleaning up with the following example:
```kotlin
import kotlinx.coroutines.*

val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

fun main() = runBlocking {
    val job = scope.launch {
        try {
            println("Doing work...")
            delay(Long.MAX_VALUE)
        } catch (e: CancellationException) {
            println("The work was cancelled!")
            throw e
        } finally {
            withContext(NonCancellable) {
                delay(1000)
                println("Did some suspending clean up")
            }
            println("Clean up is done!")
        }
        println("I will never be printed...")
    }
    delay(100)
    job.cancelAndJoin()
    println("Main is done!")
}
```
Output:
```
Doing work...
The work was cancelled!
// delaying 1 sec
Did some suspending clean up
Clean up is done!
Main is done!
```
A word of caution, however. Since NonCancellable is a Job, its flexible API allows it to be used as part of any CoroutineContext. But it was designed for use with the withContext function only. Using it with the coroutine builders launch or async (you should never pass any Job there anyway, as described in Part 3 of this series) will break structured concurrency in every possible way and should never be done.
Cancellation after a timeout
In some cases, you might want to cancel some work after a timeout. For that, the coroutines library provides a very convenient function, withTimeout. Here’s its declaration:
```kotlin
public suspend fun <T> withTimeout(timeMillis: Long, block: suspend CoroutineScope.() -> T): T
```
It runs a suspending block, and if the block doesn’t complete within the specified timeout, it throws a TimeoutCancellationException, a subtype of CancellationException.
There is also a less aggressive version of this function, withTimeoutOrNull, which, after the specified timeout, cancels its block and returns null without throwing an exception.
These functions are suspending scope functions by nature. They provide a scope just like the coroutineScope, supervisorScope, and withContext functions. To be more specific, they behave exactly like coroutineScope, only with a timeout.
Example:
```kotlin
import kotlinx.coroutines.*

fun main(): Unit = runBlocking {
    launch {
        try {
            withTimeout(100) {
                delay(400)
            }
        } catch (e: TimeoutCancellationException) {
            println("The coroutine has timed out!")
        }
    }
}
```
Output:
```
The coroutine has timed out!
```
When using the withTimeout function, if the TimeoutCancellationException isn’t explicitly handled, it propagates out of withTimeout like any other CancellationException once the timeout expires, cancelling the coroutine that called it (along with all its children). If you want different behavior or simpler handling without using try-catch, consider using withTimeoutOrNull instead.
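For comparison, here is a minimal sketch of withTimeoutOrNull with two blocks: one that finishes in time and one that does not. `timeouts` is an illustrative name, and the durations are arbitrary.

```kotlin
import kotlinx.coroutines.*

// Two blocks: one finishing within its timeout and one exceeding it
fun timeouts(): Pair<String?, String?> = runBlocking {
    val fast = withTimeoutOrNull(500) {
        delay(10)
        "done"
    }
    val slow = withTimeoutOrNull(50) {
        delay(1_000) // cancelled after 50 ms, so the block's result is replaced by null
        "done"
    }
    fast to slow
}

fun main() {
    println(timeouts()) // (done, null)
}
```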
Conclusion
The concept of cooperative cancellation is not something we have to deal with often in other asynchronous solutions. Therefore, it might prove challenging to newcomers and seasoned developers alike.
Neglecting basic principles of cooperative cancellation can lead to memory leaks, wasted resources, and subtle bugs. Because of that, it might seem that it is too much hassle for a feature that shouldn’t be that complicated.
And it is a fair point. The coroutines library has quite a steep learning curve.
However, in my opinion, the resulting API for launching coroutines and the elegant solution for establishing parent-child relationships are something JetBrains deserves applause for, and it is a fair trade-off, all things considered.
That said, I have a couple of topic ideas for the next part of this series, but I will have to sleep on that, so no promises.
See you next time.
Your friend,
Max