Oleg Andreev



Software designer with focus on user experience and security.

You may start with my selection of articles on Bitcoin.

Some of these articles translated into Russian.



Product architect at Chain.

Author of Gitbox version control app.

Author of CoreBitcoin, a Bitcoin toolkit for Objective-C.

Author of BTCRuby, a Bitcoin toolkit for Ruby.

Former lead dev of FunGolf GPS, the best golfer's personal assistant.



I am happy to give you an interview or provide you with a consultation.
I am very interested in innovative ways to secure property and personal interactions: all the way from cryptography to user interfaces. I am not interested in trading, mining or building exchanges.

This blog enlightens people thanks to your generous donations: 1TipsuQ7CSqfQsjA9KU5jarSB1AnrVLLo

A history of concurrent software design

Early approaches to concurrency

When machines were big and slow, there was no concurrency in software. As machines got faster, people figured out how to run multiple processes together. Concurrent processes proved extremely useful, and the idea was taken further with per-process threads. Concurrency mattered because it powered graphical interactive applications and networking systems, which were becoming ever more popular and advanced.

For some tasks, concurrent processes and threads presented very difficult challenges. Threads participate in preemptive multitasking: the kernel force-switches between them every N milliseconds. At the same time, threads share access to files, system interfaces and in-process memory. A thread does not know when it is about to be switched out by the system, which makes it difficult to safely acquire and release control over shared resources. As a partial solution, various kinds of locks were invented to make multi-threaded programs safe, but they didn't make the work any easier.

Typical code in a multi-threaded environment:

prepareData();
lock(firstResource);
startFirstOperation();
unlock(firstResource);
prepareMoreData();
lock(secondResource);
startSecondOperation();
unlock(secondResource);
finish();


Modern concurrency

The next approach to concurrency was based on the realization that the problem of shared resources lies in the very definition of "shared". What if you create a resource with strictly ordered access to it? It sounds counter-intuitive: how can that be concurrent? It turns out that if you design the interface like a message box (only one process reads it, and nobody blocks waiting for a response), you can build many such resources and they will work concurrently and safely. This idea was implemented in many kinds of interfaces: Unix sockets, higher-level message queues and application event loops. Finally, it found its way into programming languages.

Probably the most widespread programming language today, JavaScript, features function objects that capture the execution state for later execution. This greatly simplifies writing highly concurrent networking programs. In fact, a typical JavaScript program runs on a single thread, and yet it can control many concurrent processes.

Mac OS X 10.6 (Snow Leopard) features a built-in global thread management mechanism and language-level blocks that make writing concurrent programs as easy as in JavaScript, while taking advantage of any number of available processing cores and threads. It is called Grand Central Dispatch (GCD), and what it does is perfectly described by the "message box" metaphor. For every shared resource you wish to access in a concurrent, non-blocking way, you assign a single queue. You access the resource from a block which sits in that queue. When the block is executed, it has exclusive access to the resource without blocking anybody else. To access another resource with the results of the execution, you post another block to another queue. The same design is possible without blocks (or "closures"), but it turned out to be more tedious and limiting, resulting in less concurrent, slower or unstable programs.

Modern concurrent code looks like this:

prepareData();
startFirstOperation(^{
  prepareMoreData();
  startSecondOperation(^{
    finish();
  });
});

Every call with a block starts some task on another thread or at a later time. The block-based API has two major benefits: the block has access to lexically local data, and it executes on the proper thread. That eliminates the need for explicit locks, and the need to move and store local data explicitly just to make it available on the proper thread.

Think of it this way: every block of code inside the curly brackets is executed in parallel with the code it was created in.


Future directions

The upcoming generation of software already is, or soon will be, written this way. But the block-based approach still isn't perfect: you have to manage queues and blocks explicitly. Some experimental languages and systems already have transparent support for "continuations": the code looks linear, written in a blocking fashion, but the process jumps between different contexts and never blocks any threads:

prepareData();
startFirstOperation();
prepareMoreData();
startSecondOperation();
finish();

This is much more natural, and it looks like the naïve approach we started with and fixed with locks. However, to make it work concurrently we have to take what we learned from GCD to the next level.

When you start an operation that works on a different resource and can take some time, instead of wrapping the rest of your code in a block, you put the current procedure into a paused state and let the other procedure resume it later.

Imagine that instead of discrete blocks of code, the kernel manages continuously executing routines. These routines look very much like threads, with one important exception: each routine gives up execution voluntarily. This is called cooperative multitasking, and such routines are called coroutines. Still, each routine can be assigned to a thread just like a block, or be rescheduled from one thread to another on demand, so we retain the advantages of multi-processing systems.

Example: you have a web application which performs many operations on shared resources: it reads and writes to a database, communicates with another application over the network, reads and writes to the disk, and finally streams some data to the client. These operations usually have to be ordered within each request, but you don't want to make a thread wait every time you have a relatively long-running operation. It is also inefficient to run many preemptive threads: switching threads has a cost, and you get all sorts of trouble with random race conditions. GCD and blocks help for the most part, but if you use them for every single operation on a shared resource, you get enormously deep nested code. Remember: even writing to a log means accessing a shared file system, which had better be asynchronous.


15 years later

Today, a lot of trivial operations like writing to a disk or accessing a local database do not seem to deserve asynchronous interfaces. They feel fast enough, and you can always throw more threads or CPUs at the things that aren't. However, coroutines will make even these trivial tasks asynchronous, and sometimes a little faster. So why is that important?

Coroutines are important because every shared resource will get its own independent, isolated coroutine. That means every resource will have not only private data and private functionality, but also a private right to execute. The whole resource will be encapsulated as well as any networking server. The file system, every file, every socket, every external device, every process and every component of an application will have a coroutine and complete control over when it executes. This means there will be no need for shared memory and a central processor. The whole RAM+CPU tandem can be replaced with a GPU-like system of hundreds of tiny processors with private memory banks. Memory access will become much faster, and the kernel will not need to waste energy switching threads and processes.

A single design change which makes programming easier will make a shift to a much, much more efficient architecture possible. It won't just be faster, it will be efficient: servers could be 100 times more productive, while personal devices could be 10 times faster and consume 10 times less energy.


30 years later

By the time operating systems support coroutines and a truly multi-processor architecture, new applications will emerge with capabilities we can only dream about. Today, things like data mining, massive graphics processing and machine learning run mostly in huge data centers. Twenty years later this will be as ubiquitous as 3D games on phones are today. These tasks will require more memory. Finally, common storage will be merged with RAM and the processor, and processing huge amounts of data will become much more efficient.

Given such a great advance in technology, humanity will find its own unpredictably unique ways to educate and entertain itself. As we get closer to that time, it will become clearer what comes next.