Concurrency is a much broader, more general problem than parallelism. If you have tasks with inputs and outputs, and you want to schedule them so that they produce correct results, you are solving a concurrency problem.

Take a look at this diagram:

It shows a data flow with input and output dependencies. Here tasks 2, 3 and 4 can run concurrently after 1. There is no specific order between them, so we have multiple alternatives for running them sequentially. Here are just two of them:

Alternatively, these tasks can run in parallel, e.g. on another processor core, another processor, or an entirely separate computer.

On these diagrams, thread means a computation carried out on a dedicated processor core, not an OS thread, as they are not necessarily parallel.

If we only have one processor, why do we even bother with writing concurrent applications? The processing time will not get shorter, and we add the overhead of scheduling. As a matter of fact, any modern operating system will also slice up the concurrent tasks and interleave them, so each of the slices will run for a short time.

There are various reasons for this:
  • We humans like to interact with the computer in real time, e.g. as I type this text, I want to see it appearing on the screen immediately, at the same time listening to my favorite tracklist and getting notifications about my incoming emails. Just imagine that you could not drag a window while a movie keeps playing in it.

  • Not all operations are carried out on the computer's CPU. If you want to write to an HDD for example, a lot of time is spent seeking to the position, writing the sectors, etc., and the intervening time can be spent doing something else.

These require the operating system kernel to run tasks in an interleaved manner, referred to as time-sharing. This is a very important property of modern operating systems.

Processes and Threads

A process is a running instance of a computer program. It is what you see in the task manager of your operating system, or in top.

A process consists of allocated memory which holds the program code, its data, a heap for dynamic memory allocation, and a lot more. However, it is not the unit for multi-tasking in desktop operating systems.

A thread is the default unit - the task - of CPU usage. Code executed in a single thread is what we usually refer to as sequential or synchronous execution.

Threads are supported by nearly all operating systems and can be created with system calls. They have their own call stack, virtual CPU and local storage, but share the application's heap, data, codebase and resources.
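As a minimal sketch of that (assuming a POSIX system; the worker function and its argument are made up for illustration), creating and joining a kernel thread with pthread_create looks like this:

#include <pthread.h>
#include <stdio.h>

// hypothetical worker; it runs on its own call stack,
// but shares the process' heap and globals
void *worker(void *arg) {
    printf("hello from thread %s\n", (char *)arg);
    return NULL;
}

int main(void) {
    pthread_t tid;
    // the kernel schedules this thread independently of main
    if (pthread_create(&tid, NULL, worker, "A") != 0) {
        return 1; // creation failed
    }
    pthread_join(tid, NULL); // wait for the thread to finish
    return 0;
}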

They also serve as the unit of scheduling in the kernel. For this reason, we call them kernel threads, clarifying that they are native to the operating system and scheduled by the kernel, which distinguishes them from user-space threads, also called green threads, which are scheduled by some user-space scheduler such as a library or a VM.

Most desktop and server operating system kernels use preemptive schedulers, as do the Linux, macOS and Windows kernels. We can assume that threads are preemptively scheduled, distinguishing them from their non-preemptive (cooperative) counterparts, called fibers. This preemptive scheduling is the reason that a hanging process doesn't stall the whole computer.

The hanging process' time slices are interleaved with other processes' and the OS' code, so the system as a whole remains responsive.

Context switching (switching between threads) is done at frequent intervals by the kernel, creating the illusion that our programs are running in parallel, whereas in reality, they are running concurrently but sequentially in short slices.

CPU vs. I/O

Programs usually don't consist only of numeric, arithmetic and logic computations; in fact, a lot of the time they merely write something to the file system, make network requests, or access peripherals such as the console or an external device.

While the first kind of workload is CPU intensive, the latter requires performing I/O for the majority of the time.

Doing I/O is a kernel space operation, initiated with a system call, so it results in a privilege context switch.

When an I/O operation is requested with a blocking system call, we are talking about blocking I/O.
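For instance (a sketch, assuming a POSIX system and that fd is an already opened socket or pipe descriptor), a plain read() is such a blocking system call: the calling thread is suspended until data arrives:

#include <unistd.h>

char buf[1024];
// blocking I/O: the thread sleeps inside the kernel until
// some data is available on fd (or the peer closes it)
ssize_t n = read(fd, buf, sizeof(buf));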

This can deteriorate concurrency under implementations that use many-to-one mapping, meaning that all user-space threads in a process share a common kernel thread, which implies that every thread is blocked when one does blocking I/O.

No wonder that modern OSes don’t do this. Instead, they use one-to-one mapping, i.e. map a kernel thread to each user-space thread, allowing another thread to run when one makes a blocking system call, which means that they are unaffected by the above adverse effect.

I/O flavors: Blocking vs. Non-Blocking, Sync vs. Async

Doing I/O usually consists of two distinct steps:

  • checking the device

    • blocking: waiting for the device to be ready

    • non-blocking: e.g. polling periodically until ready

  • transmitting:

    • synchronous: executing the operation (e.g. read or write) initiated by the program.

    • asynchronous: executing the operation as a response to an event from the kernel (asynchronous / event-driven)

You can mix the two steps in every fashion.
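As a sketch of the non-blocking flavor (assuming a POSIX system and an already opened descriptor fd), the descriptor can be switched to non-blocking mode with fcntl; a read then returns immediately with EAGAIN / EWOULDBLOCK instead of waiting:

#include <fcntl.h>
#include <errno.h>
#include <unistd.h>

// switch fd to non-blocking mode
int flags = fcntl(fd, F_GETFL, 0);
fcntl(fd, F_SETFL, flags | O_NONBLOCK);

char buf[1024];
ssize_t n = read(fd, buf, sizeof(buf));
if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
    // nothing to read yet; do something else and check again later
}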

Busy-Waiting, Polling, and the Event Loop

Busy-waiting is the act of repeatedly checking a resource, such as I/O, for availability in a tight loop. The absence of the tight loop is what distinguishes polling from busy-waiting.

#include <pthread.h>
#include <errno.h>
#include <unistd.h>

// tight-loop example: burns CPU while checking
while (pthread_mutex_trylock(&my_mutex) == EBUSY) { }
// mutex was free; we now hold the lock
do_stuff();

// polling example: sleeps between checks
while (pthread_mutex_trylock(&my_mutex) == EBUSY) {
    sleep(POLL_INTERVAL);
}
// mutex was free; we now hold the lock
do_stuff();

The difference between the two snippets is apparent. The sleep function puts the current thread of execution to sleep, yielding control to the kernel to schedule something else to run.

It is also obvious that both of them offer a technique for turning non-blocking code into blocking code, because control won't get past the loop until the mutex becomes free. This means that do_stuff is blocked.

Let's say we have more of these mutexes or any arbitrary I/O device that can be polled. We can invert the control flow by assigning handlers to be called when the resource is ready. If we periodically check the resources in the loop and execute the associated handlers on completion, we have created what is called an event loop.
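A minimal sketch of such an event loop (the resource type, its handler field and the is_ready check are made up for illustration; readiness is tested here with poll() and a zero timeout):

#include <poll.h>
#include <unistd.h>

#define POLL_INTERVAL 1 // seconds between polling rounds (made up)

// hypothetical resource: a descriptor plus the handler to run
// when it becomes readable
typedef struct {
    int fd;
    int done;
    void (*handler)(int fd);
} resource_t;

// non-blocking readiness check built on poll() with zero timeout
static int is_ready(int fd) {
    struct pollfd p = { .fd = fd, .events = POLLIN };
    return poll(&p, 1, 0) > 0;
}

// the event loop: check every resource, run handlers on readiness
void event_loop(resource_t *resources, int count) {
    for (;;) {
        int pending = 0;
        for (int i = 0; i < count; i++) {
            if (resources[i].done) continue;
            pending = 1;
            if (is_ready(resources[i].fd)) {
                resources[i].handler(resources[i].fd);
                resources[i].done = 1;
            }
        }
        if (!pending) break;  // every handler has run
        sleep(POLL_INTERVAL); // yield instead of busy-waiting
    }
}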

TCP server example

The following example will illustrate the differences between working with synchronous blocking and non-blocking network I/O. It is a dead-simple TCP echo server. After the client connects, every line is echoed back to the socket until the client writes 'bye'.

Single threaded

The first version uses the standard POSIX procedures of sys/socket.h.

The server is single-threaded: it waits until a client connects.
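The snippets below assume that list_s is a listening socket that was set up roughly like this beforehand (a sketch; the port and backlog values are made up and error handling is omitted):

#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>

int list_s = socket(AF_INET, SOCK_STREAM, 0);

struct sockaddr_in servaddr;
memset(&servaddr, 0, sizeof(servaddr));
servaddr.sin_family = AF_INET;
servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
servaddr.sin_port = htons(8080);          // example port

bind(list_s, (struct sockaddr *)&servaddr, sizeof(servaddr));
listen(list_s, 10);                       // backlog of pending connections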

// wait for a connection, then accept() it
if ((conn_s = accept(list_s, NULL, NULL)) < 0) {
    // exit w err
}

Then it reads each line from the socket and echoes it back until the client closes the connection or prints the word 'bye' on a line.

int bye = 0;
while (!bye) {
    read_line_from_socket(conn_s, buffer, MAX_LINE - 1);
    if (!strncmp(buffer, "bye\n", MAX_LINE - 1)) bye = 1;
    write_line_to_socket(conn_s, buffer, strlen(buffer));
}

if (close(conn_s) < 0) {
    // exit w err
}