Things every Kotlin Developer should know about Coroutines. Part 3: Structured Concurrency.

If you watch a coroutines video dated before their stable launch in Kotlin 1.3, you might notice that the way coroutines were used in their "experimental" phase is a bit different from now.

For example, at KotlinConf 2017, in his talk Introduction to Coroutines, Roman Elizarov presented a simple code snippet showing how to launch a coroutine:

// fire and forget a coroutine! 🚀
fun postItem(item: Item) {
    launch {
        val token = requestToken()
        val post = createPost(token, item)
        processPost(post)
    }
}

Nowadays, this code won't even compile. Shortly before its stable release, the coroutines library underwent a major design shift called structured concurrency, and launch became an extension function on CoroutineScope, so it can no longer be called without a scope.
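For comparison, here is a minimal sketch of how the same fire-and-forget function might look today. Item, Post, requestToken, createPost and processPost are stand-ins with made-up bodies, and PostService is a hypothetical class that owns the scope. The only point is that launch now has to be called on a CoroutineScope.

```kotlin
import kotlinx.coroutines.*

// Hypothetical stand-ins for the types and functions used in the snippet above.
data class Item(val title: String)
data class Post(val token: String, val item: Item)

suspend fun requestToken(): String { delay(10); return "token" }
suspend fun createPost(token: String, item: Item): Post { delay(10); return Post(token, item) }

// Hypothetical owner of the scope: whoever provides the scope decides
// how long the launched coroutines are allowed to live.
class PostService(private val scope: CoroutineScope) {
    val processed = mutableListOf<Post>()

    // launch is an extension on CoroutineScope, so a scope is required here.
    fun postItem(item: Item): Job = scope.launch {
        val token = requestToken()
        val post = createPost(token, item)
        processPost(post)
    }

    private fun processPost(post: Post) { processed += post }
}

fun main() = runBlocking {
    // runBlocking's own scope is used as the owner in this sketch.
    val service = PostService(this)
    service.postItem(Item("Hello")).join()
    println(service.processed)
}
```

Returning the Job is a design choice of this sketch: it lets the caller wait for or cancel a single request, while the scope still controls all of them at once.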

Understanding Structured Concurrency

If you are coming from Part 2 of this series, you should already understand structured concurrency and why we need it. However, let’s recap and elaborate on this concept, so everyone is on the same page.

The idea behind structured concurrency is quite simple, regardless of how intimidating it might sound. It requires every coroutine to run in a defined scope that manages a parent-child relationship between coroutines.

This achieves a couple of things:

  • Coroutines are always accounted for and never leak resources;
  • Parent coroutines always wait for children to complete, which makes concurrency predictable;
  • Exceptions are always propagated and never lost;
  • Cancellation is easy and doesn't require passing cancellation tokens to children like many other asynchronous solutions.
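Two of these guarantees can be observed in a small sketch that uses the coroutineScope builder. The function names and the ChildFailure exception are made up for illustration, and the code assumes it is called from runBlocking, so everything runs on a single thread.

```kotlin
import kotlinx.coroutines.*

// Hypothetical exception type, used only in this sketch.
class ChildFailure(message: String) : Exception(message)

// The parent scope does not complete until both children are done.
suspend fun parentWaitsForChildren(): List<String> {
    val events = mutableListOf<String>()
    coroutineScope {
        launch { delay(50); events += "child 1 done" }
        launch { delay(100); events += "child 2 done" }
        events += "parent body done"
    }
    // Reached only after both children have finished.
    events += "scope completed"
    return events
}

// A failing child cancels its sibling, and the failure reaches the caller.
suspend fun failurePropagates(): String {
    var siblingCancelled = false
    return try {
        coroutineScope {
            launch {
                try {
                    delay(1_000)
                } catch (e: CancellationException) {
                    siblingCancelled = true
                    throw e // always rethrow cancellation
                }
            }
            launch {
                delay(10)
                throw ChildFailure("child failed")
            }
        }
        "no failure"
    } catch (e: ChildFailure) {
        "caught: ${e.message}, sibling cancelled: $siblingCancelled"
    }
}

fun main() = runBlocking {
    println(parentWaitsForChildren())
    println(failurePropagates())
}
```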

Here is an example of structured concurrency:

private val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

fun main() = runBlocking {
    val mainJob = scope.launch {
        println("Starting the main job!")

        launch {
            while (isActive) {
                delay(10)
                println("I am child 1!")
            }
        }

        launch {
            while (isActive) {
                delay(20)
                println("I am child 2!")
            }
        }
    }
    mainJob.invokeOnCompletion {
        println("The main job is completed/cancelled!")
    }

    delay(50)

    // this will cancel the main coroutine
    // and all its children
    scope.cancel()

    delay(500)
    println("Finishing main()...")
}

Output:
Starting the main job!
I am child 1!
I am child 2!
I am child 1!
I am child 1!
I am child 2!
I am child 1!
The main job is completed/cancelled!
Finishing main()...

Among other things, structured concurrency allows cancelling all running coroutines and their children easily. This is just an illustration of this concept, and we will dive deep into cancellation in Part 5 of this series.

The legacy woes

However, because structured concurrency was a fairly late addition to the coroutines library, some legacy woes still haunt the library's developers and confuse less experienced adopters.

One such woe is the GlobalScope. As we have discussed in Part 2, it doesn't have a Job attached to it, and coroutines launched inside the GlobalScope will not adhere to the structured concurrency principle.

It was introduced as an “easy way” to migrate from the legacy, unstructured way of launching coroutines (shown in the introduction of this article) to the version of the coroutines library where launching a coroutine requires a scope.

In version 1.5.0 of the coroutines library, GlobalScope was marked with the @DelicateCoroutinesApi annotation to discourage developers from using it. Moreover, JetBrains are planning to remove this API altogether in later releases as part of the gradual removal of APIs that allow for unstructured concurrency.

Here is an example, where we attempt to use the GlobalScope as we would a normal scope:

@OptIn(DelicateCoroutinesApi::class)
fun main() = runBlocking {
    GlobalScope.launch {
        while (isActive) {
            delay(10)
            println("I'm running!")
        }
    }

    delay(100)

    println("Cancelling GlobalScope!")
    GlobalScope.cancel()

    delay(500)
}

Even though this code looks like it makes sense, running it will throw an exception:

IllegalStateException: Scope cannot be cancelled 
because it does not have a job: kotlinx.coroutines.GlobalScope@1a93a7ca

The GlobalScope doesn't allow for a structured way to manage its coroutines. It has to be done manually, which is tedious and error-prone.
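To make "manually" concrete, here is a small sketch. With GlobalScope, the only handle on a coroutine is the Job returned by launch, so every Job has to be stored and cancelled by hand. The function name runAndCancelManually is made up for this example.

```kotlin
import kotlinx.coroutines.*

@OptIn(DelicateCoroutinesApi::class)
suspend fun runAndCancelManually(): Boolean {
    // GlobalScope has no Job of its own, so we must keep every returned Job ourselves.
    val jobs = List(3) {
        GlobalScope.launch {
            while (isActive) {
                delay(10)
            }
        }
    }

    delay(100)

    // Forgetting any one of these Jobs would leave its coroutine running.
    jobs.forEach { it.cancel() }
    jobs.joinAll()
    return jobs.all { it.isCancelled }
}

fun main() = runBlocking {
    println("All cancelled: ${runAndCancelManually()}")
}
```

With a regular scope, the bookkeeping in the jobs list disappears, because a single scope.cancel() reaches every coroutine launched in it.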

Another legacy woe is the flexibility of the CoroutineContext, which, as you now know, behaves much like a map of elements and doesn't impose any restrictions on what you put into it.
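The map-like behavior is easy to see in a short sketch. The describe helper is made up for this example; the operators and keys it uses are the regular CoroutineContext API.

```kotlin
import kotlinx.coroutines.*
import kotlin.coroutines.CoroutineContext

// Hypothetical helper: looks up two elements by their keys, like map[key].
fun describe(context: CoroutineContext): String =
    "name=${context[CoroutineName]?.name}, hasJob=${context[Job] != null}"

fun main() {
    val base: CoroutineContext = Dispatchers.Default + CoroutineName("base")
    println(describe(base))

    // An element with the same key replaces the previous one, like map.put.
    val renamed = base + CoroutineName("renamed")
    println(describe(renamed))

    // Nothing prevents us from putting a Job into any context.
    val withJob = renamed + Job()
    println(describe(withJob))

    // minusKey removes an element, like map.remove.
    println(describe(withJob.minusKey(Job)))
}
```

The third step, adding a Job to an arbitrary context, is where the trouble starts.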

Here is a very informative GitHub issues thread that illustrates one of the problems this can lead to.

Let’s take a look at the example from that thread (with minor tweaks):

val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

fun main() = runBlocking {
    val childJob = Job()

    val mainJob = scope.launch {
        println("Starting the main job!")
        launch(childJob) {
            while (isActive) {
                delay(100)
                println("Scope is active: ${scope.isActive}")
            }
        }
    }
    mainJob.invokeOnCompletion {
        println("The main job is completed/cancelled!")
    }

    scope.cancel()

    delay(500)
    println("Finishing main()...")
}

Taking into account the philosophy behind structured concurrency, one might assume that cancelling the scope in this example will cancel both the mainJob and the childJob.

However, if we run the code, the output shows that the inner coroutine keeps running after the scope is cancelled:

Starting the main job!
The main job is completed/cancelled!
Scope is active: false
Scope is active: false
Scope is active: false
Scope is active: false
Finishing main()...

As you can see, structured concurrency is not a very rigid concept. We can easily break it (in most cases unintentionally), given the very flexible API of the coroutines library.

However, the behavior demonstrated in this example is intentional. By passing a new Job in the coroutine context, we explicitly replace the parent: the new coroutine becomes a child of that Job instead of the coroutine that launched it.
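One way to check who the parent really is, sketched below, is to inspect Job.children. The countChildren function and its flag are hypothetical; the sketch only counts how many children the launching coroutine and the explicit Job end up with.

```kotlin
import kotlinx.coroutines.*

// Returns (children of the launching coroutine, children of the explicit Job).
suspend fun countChildren(passExplicitJob: Boolean): Pair<Int, Int> = coroutineScope {
    val explicitJob = Job()
    val outer = launch {
        if (passExplicitJob) {
            // The explicit Job becomes the parent of this coroutine.
            launch(explicitJob) { delay(1_000) }
        } else {
            // The surrounding coroutine is the parent.
            launch { delay(1_000) }
        }
        delay(1_000) // keep the outer coroutine alive while we inspect it
    }

    delay(50) // give the coroutines time to start
    val result = outer.children.count() to explicitJob.children.count()

    // Clean up both hierarchies; the second call matters only when passExplicitJob is true.
    outer.cancel()
    explicitJob.cancel()
    result
}

fun main() = runBlocking {
    println(countChildren(passExplicitJob = false))
    println(countChildren(passExplicitJob = true))
}
```

Note that the cleanup needs two cancel calls, which is exactly the kind of manual bookkeeping structured concurrency is meant to remove.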

That said, in the above-mentioned GitHub thread, Roman Elizarov has provided two very good rules of thumb to make using coroutines predictable and avoid unintentional leaks and behaviors:

  1. When using a CoroutineContext you should not have a Job there.
  2. When using a CoroutineScope you should always have a Job there.

The above rules are slightly edited to improve readability.

He admitted that if they had designed the coroutines library today, they would have restricted this API. And at the moment he doesn't have a good solution for this problem, except for following the rules as mentioned above.
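Here is a minimal sketch of what following both rules might look like. The function names are made up. The context passed to launch contains only a dispatcher and a name, and the scope is created with a Job, so cancelling the scope still reaches the inner coroutine.

```kotlin
import kotlinx.coroutines.*

// Rule 2: the CoroutineScope(...) factory adds a Job when the context has none.
fun scopeAlwaysHasJob(): Boolean =
    CoroutineScope(Dispatchers.Default).coroutineContext[Job] != null

// Rule 1: pass only non-Job elements to a coroutine builder.
suspend fun childFollowsScope(): Boolean {
    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)
    val childStarted = CompletableDeferred<Job>()

    scope.launch {
        // Changing the dispatcher and the name does not affect the parent-child link.
        val child = launch(Dispatchers.IO + CoroutineName("worker")) {
            while (isActive) delay(10)
        }
        childStarted.complete(child)
    }

    val child = childStarted.await()
    scope.cancel() // cancels the outer coroutine and its child
    child.join()
    return child.isCancelled
}

fun main() = runBlocking {
    println("Scope has a Job: ${scopeAlwaysHasJob()}")
    println("Child cancelled with the scope: ${childFollowsScope()}")
}
```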

Structured concurrency in practice

By following the recommendations outlined in this series, beginners won't get in trouble when using coroutines and won't break structured concurrency. However, when writing their own coroutine APIs, more advanced developers must know how to follow the principles of structured concurrency in practice.

In the following section, I would like to discuss an example from Android’s lifecycle library and illustrate this point. You don't have to be familiar with Android to follow this example since my goal is to demonstrate a case of broken structured concurrency, not focus on Android’s APIs.

This example might seem quite advanced for some readers, but if this is the case, at least be aware that these considerations exist, and even engineers at Google sometimes stumble when designing coroutines APIs.

The story of a broken API

The full story is outlined in this article by Manuel Vivo.

In Android, we have to be constantly aware of the lifecycle of our UI components since they can be destroyed and recreated quite often. Because of that, we need a robust solution that doesn't leak resources when using coroutines from such components.

One of these solutions provided by the lifecycle library is repeatOnLifecycle, which can be used as follows:

class MyActivity : AppCompatActivity() {
    private val viewModel by viewModels<MyViewModel>()

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)

        lifecycleScope.launch {
            repeatOnLifecycle(Lifecycle.State.STARTED) {
                viewModel.uiStateFlow.collect {
                    // Collect ui state 
                }
            }
        }

        lifecycleScope.launch {
            repeatOnLifecycle(Lifecycle.State.STARTED) {
                viewModel.someOtherFlow.collect {
                    // Collect some other data 
                }
            }
        }
    }
}

In this example, lifecycleScope is a scope provided by the lifecycle library. It follows the lifecycle of the given LifecycleOwner. If you are an Android developer and want to know why we cannot rely on that alone or on lifecycleScope.launchWhenStarted { ... } when collecting flows, you can refer to this article.

As you can see, this API is quite cumbersome. If we need to collect from multiple flows, we have to launch a new coroutine for each one since collect will suspend the coroutine it runs in. At first glance, it looks like boilerplate and can be improved with Kotlin’s extension functions.
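The reason for one coroutine per flow can be reproduced without Android. In the sketch below, tickerFlow is a hypothetical never-ending flow standing in for a UI state stream, and both functions are assumed to be called from runBlocking, so everything runs on a single thread.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Hypothetical flow that never completes, like a stream of UI state.
fun tickerFlow(name: String): Flow<String> = flow {
    var i = 0
    while (true) {
        emit("$name-${i++}")
        delay(10)
    }
}

// Two collect calls in the same coroutine: the second one is never reached.
suspend fun collectSequentially(): Set<String> {
    val seen = mutableSetOf<String>()
    withTimeoutOrNull(100) {
        tickerFlow("a").collect { seen += it.substringBefore('-') }
        // collect above suspends for as long as the flow is active.
        tickerFlow("b").collect { seen += it.substringBefore('-') }
    }
    return seen
}

// One coroutine per flow: both flows are collected.
suspend fun collectConcurrently(): Set<String> {
    val seen = mutableSetOf<String>()
    withTimeoutOrNull(100) {
        launch { tickerFlow("a").collect { seen += it.substringBefore('-') } }
        launch { tickerFlow("b").collect { seen += it.substringBefore('-') } }
    }
    return seen
}

fun main() = runBlocking {
    println("Sequential: ${collectSequentially()}")
    println("Concurrent: ${collectConcurrently()}")
}
```

The second function has the same shape as the Activity example: one launch per collected flow.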

Following the same logic, in one of the alpha releases of the lifecycle library, Google introduced the LifecycleOwner.addRepeatingJob API, which looked like this:

public fun LifecycleOwner.addRepeatingJob(
    state: Lifecycle.State,
    coroutineContext: CoroutineContext = EmptyCoroutineContext,
    block: suspend CoroutineScope.() -> Unit
): Job = lifecycleScope.launch(coroutineContext) {
    repeatOnLifecycle(state, block)
}

It hid the boilerplate and allowed for seemingly cleaner code:

class MyActivity : AppCompatActivity() {
    private val viewModel by viewModels<MyViewModel>()

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)

        // AppCompatActivity is a LifecycleOwner
        addRepeatingJob(Lifecycle.State.STARTED) {
            viewModel.uiStateFlow.collect {
                // Collect ui state 
            }
        }

        addRepeatingJob(Lifecycle.State.STARTED) {
            viewModel.someOtherFlow.collect {
                // Collect some other data 
            }
        }
    }
}

However, if we look deeper, this API is quite problematic. It can break structured concurrency and introduce subtle bugs.

While repeatOnLifecycle is a regular suspending function, addRepeatingJob is a non-suspending function that launches a new, independent coroutine in the lifecycleScope. Still, developers might be tempted to write code like this:

val job = lifecycleScope.launch {
    addRepeatingJob(Lifecycle.State.STARTED) {
        viewModel.someDataFlow.collect {
            // Collect some data
        }
    }
}

// ...if something went wrong
job.cancel()

If you understand how structured concurrency works, you know what will happen.

The someDataFlow.collect { } will not get cancelled after job.cancel() is called, although that would be a developer’s intention.

To make this easier to follow, here is the same situation in simplified code:

val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

fun main() {
    runBlocking {
        val mainJob = scope.launch {
            println("Starting the main job!")
            // this is an independent coroutine,
            // not a child coroutine
            scope.launch {
                while (isActive) {
                    delay(100)
                    println("I'm alive!!!")
                }
            }
        }
        mainJob.invokeOnCompletion {
            println("The main job is completed/cancelled!")
        }

        delay(100)

        mainJob.cancel()

        delay(500)
        println("Finishing main()...")
    }
}

Output:
Starting the main job!
The main job is completed/cancelled!
I'm alive!!!
I'm alive!!!
I'm alive!!!
I'm alive!!!
I'm alive!!!
Finishing main()...

Similar to the example where we introduced a new Job into the context, there is no parent-child relationship between these two coroutines, which might not be immediately apparent to less experienced developers.

Because of this issue, Google removed the addRepeatingJob API from the lifecycle library.
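The difference between the two API shapes can be sketched without Android. The names appScope, addJobDetached and runAttached below are hypothetical and are not the lifecycle library's implementation. The first function is shaped like addRepeatingJob (non-suspending, launches in its own scope), the second like repeatOnLifecycle (suspending, runs inside the caller's coroutine).

```kotlin
import kotlinx.coroutines.*

// Hypothetical application-wide scope, standing in for lifecycleScope.
val appScope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

// Non-suspending: the work is launched in a scope the caller does not control.
fun addJobDetached(block: suspend CoroutineScope.() -> Unit): Job =
    appScope.launch(block = block)

// Suspending: the work becomes a child of whichever coroutine calls this function.
suspend fun runAttached(block: suspend CoroutineScope.() -> Unit) =
    coroutineScope(block)

// Returns true if the work is still active after the calling coroutine was cancelled.
suspend fun survivesCallerCancellation(detached: Boolean): Boolean {
    val work = CompletableDeferred<Job>()
    val caller = appScope.launch {
        if (detached) {
            addJobDetached { work.complete(coroutineContext.job); awaitCancellation() }
        } else {
            runAttached { work.complete(coroutineContext.job); awaitCancellation() }
        }
    }

    val workJob = work.await()
    caller.cancelAndJoin()
    val survived = workJob.isActive
    workJob.cancel() // clean up the detached coroutine, if there is one
    return survived
}

fun main() = runBlocking {
    println("Detached work survives: ${survivesCallerCancellation(detached = true)}")
    println("Attached work survives: ${survivesCallerCancellation(detached = false)}")
}
```

A suspending function cannot be called outside a coroutine, which is what forces the caller's coroutine to become the parent of the work.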

This example illustrates why structured concurrency is crucial when using coroutines and should always be kept in mind.

Conclusion

The coroutines library went through some growing pains and is still evolving. Even though structured concurrency is the underlying principle of the modern coroutines API, there are still many ways to break it.

Therefore, it is important to be aware of the unstructured concurrency pitfalls so you can avoid them and write better code. Hopefully, this article managed to shine some light on this less-discussed side of the coroutines library.

If you are interested in learning more about how Kotlin coroutines were designed and how structured concurrency came about, I highly recommend watching this presentation from Roman Elizarov.

That said, in Part 4, we will dive deep into error handling in coroutines.

See you then.

Your friend,

Max
