Why We Find Concurrency So Hard - Programming on a Multicore Machine with a Single-core Mindset

In this, my first blog (or should I say mental ramble), I'm addressing the business programming community in general, rather than specialist areas of computing such as machine vision, artificial intelligence, computer animation or the simulation of physical processes.

I particularly address those involved in the management of business software design and development.

But before I start my mental meandering, allow me to set the scene...

Back In The [Good|Bad]* Old Days

*Delete as appropriate

If I go right back to my earliest memories of instructing a computer to do stuff, then I have about 35 years' programming experience. I wrote my first programs in BASIC on a Commodore PET with 4KB of RAM and a cassette player for persistent storage. At the time I remember thinking, "Who could write a program as big as 4KB?"

Back then, programming was pretty simple, inasmuch as your task consisted of nothing more than solving the immediate problem at hand. It never crossed my teenage mind that there might be other issues to consider, such as security (my user was authorised to perform any task I wanted, simply because the concept of "users" didn't exist) or protecting against buffer-overrun attacks (there was no network connection). Neither did I have to care about whether my program had exceeded its memory allocation, because I had the whole computer to myself (although the 4KB limit eventually turned out to be somewhat restrictive).

My naive little programming world consisted of:

  • A single, very slow (1 MHz MOS 6502) processor
  • 4KB of RAM
  • No hard disk
  • No network connection
  • No concept of users or authorisation
  • And no need to worry about threads, task schedulers, mutexes or shared state.
  • For all the PET's limitations, we were able to get Space Invaders running with sound and have some spare memory (so there!)

These Times They Are A Changin'

Now, however, the relentless march of technology (which is apparently going in the right direction - or so I'm told) has brought us to the point where everything in the computing world has changed. It is now commonplace to use machines with:

  • Multiple, multi-core, high-speed processors
  • Huge quantities of RAM
  • Virtually unlimited disk storage
  • High-speed access to a vast interconnected network of other computers holding pretty much the sum total of humanity's accumulated knowledge
  • And all the headaches associated with getting these vastly more powerful machines to operate in the most efficient manner

This computing power applies not only to the computers we have on our desks and in our server rooms, but also to the little computers we carry around in our pockets - you know, the ones that also have the ability to make phone calls and send text messages...

Not only has the power of these devices increased exponentially, so has their degree of connectedness.

In addition to this, we have reached the point where the quantity of data we can store is, for all practical purposes, limitless. Just to put this into perspective: all the books that have ever been printed would require about 138TB of storage (less than you'd find in the average data centre), and the output of the world's various film industries occupies about 12PB. And CPU power is heading in the same direction: towards being effectively limitless.

So when considering software design, it seems pointless to me to continue focusing on the previous issues created by a lack of storage or available processor power. Nowadays, these resources can be treated as almost unlimited.

What we should be focusing on, however, are the following two points:

  1. The exponential growth in the number of mobile devices has placed such a huge demand on our networking infrastructure that this part of the physical infrastructure is the last remaining barrier to limitless computing power. I won't be dealing with this aspect much in this blog; suffice it to say that shunting data from one place to another is still the most expensive and energy-hungry aspect of modern computing.
  2. What I do want to examine here is why the speed and efficiency of our software has not improved at the same rate as the hardware upon which it runs, and what we can do about it.
    For those people who pay attention to such things, you'll probably have noticed that older software designed for single-core processors runs no faster (and possibly slower) on your shiny new iWatchamaCallIt with its four quad-core, hyper-threading Intel i7 processors and glow-in-the-dark go-faster stripes.

So what's the issue here?

In short: writing fault-tolerant, scalable software that handles concurrency efficiently.

At this point, many ask "Isn't that the same as all that parallel processing stuff?"

No, not exactly. Parallelism and concurrency are closely related but are not the same thing.

Concurrency means "performing multiple, separate tasks all at once".

Whereas parallelism means "breaking a single, large task into multiple, independent parts and then performing those parts simultaneously".
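To make the distinction concrete, here's a rough Clojure sketch (Clojure being one of the languages discussed later in this post; the task names are invented purely for illustration). The two futures are concurrency - two unrelated jobs in flight at once - while pmap is parallelism - one job split into independent chunks processed simultaneously:

;; Concurrency: two separate, unrelated tasks running at the same time
(def invoice-report (future (Thread/sleep 1000) :invoices-done))
(def stock-check    (future (Thread/sleep 1000) :stock-done))
[@invoice-report @stock-check]   ; both complete after roughly 1 second, not 2

;; Parallelism: one large task split into independent parts
(reduce + (pmap #(reduce + %) (partition 250000 (range 1000000))))
;; => 499999500000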

But I Don't Have To Care About All That Concurrency Stuff; The Hardware Will Do It For Me, Won't It...?

Yes, but only up to a point - and we passed that point several years ago. The truth is that as the number of CPU cores rises, the hardware can only perform so much optimisation for software that was written to run on a single core. Beyond that, we have to rewrite the software in order to see a performance increase on the new hardware. So the answer changes from our previous, tentative yes to an increasingly loud NO.

I run the risk here of overstating the scale of the problem, but exaggeration notwithstanding, I want to highlight a widespread issue that I strongly believe the programming community must treat as a high priority and give its immediate attention.


The one aspect of computing that seems to offer the most resistance to change
is the mindset of the people managing the software developers...


To put it bluntly, if we are to take full advantage of what I like to call the "concurrency-oriented" hardware resources now available to us, we must fundamentally change the way we think about software design. However, the managers who make the major go/no-go development decisions are trained to be risk-averse (admittedly, managers on the left-hand side of the Atlantic are generally less risk-averse than those on the right-hand side), and the general tendency is to stick with the familiar because that is what has worked in the past. In fact, the very reason such solutions have become familiar is that they have worked so well in the past.

But the present is not the past, and yesterday's computers are not today's computers. More to the point, it is becoming increasingly apparent that the software designs that worked so well for yesterday's problems are no longer suitable as templates for solving today's problems.

The relentless march of technological progress is making our comfort zone much smaller and increasingly uncomfortable.

This leads to the inevitable but unpleasant realisation that our tried, tested and well-liked programming languages are themselves becoming an increasingly large part of our inability to handle concurrency, scalability and fault tolerance.

Yes, we like these languages; yes, they're familiar, and that makes them comfortable; yes, we've developed large libraries of useful stuff to make our development experience easier. But how suitable are these software solutions for running on the concurrency-oriented hardware that now sits inside every rack of cloud servers and even everyone's laptop? How suitable are they for building fault-tolerant, always-on systems that can handle the enormous number of requests from an ever-growing number of data-hungry computing devices?

I put it to you that the ways of writing software that have been so successful for the last 30+ years are not going to be able to handle the next wave of problems - because those problems relate to something our hardware has never previously been very good at: concurrency.

As we further develop this train of thought, it leads to the next realisation that in order to survive this enormous shift of problem domain, we must start by acknowledging that our (supposedly) low-risk comfort zone is becoming riskier and therefore less comfortable.

So, the train of thought now moves a little further down the line: this understanding should then help us develop a willingness to adopt the use of:

  • unfamiliar languages that employ
  • unfamiliar constructs, used in
  • unfamiliar ways

There's a recurring theme there that scares many in management: unfamiliarity. But as the old proverb says, "time waiteth for no man", so it's time to face up to the fact that our future success rests largely on our ability to venture into the unknown territory of what some might derisively call "exotic" programming languages.

The languages about which I'm speaking belong to a school of thought called "Functional Programming". This school of thought stands in contrast to the much more widely known and widely understood school of thought called "Imperative programming". Almost all business software is written using languages that belong to the Imperative camp: languages such as C, C++, Java, ABAP, Visual Basic, Objective C and Go (to name but a few). Languages that belong to the Functional camp are perhaps less well known and include Haskell, Scala, F#, Clojure, Erlang, Elixir, Lisp and R (again, to name but a few).

For a while, I resisted moving out of my comfort zone just as much as you might well be doing now. For me, it was not so much the risk involved in using a new language as the mental inertia that had to be overcome in order to learn a completely new way of thinking. However, my motivation grew as I started to see that whilst the march of progress would sooner or later force these changes upon me, there was great value in starting to learn now, because I would acquire a new set of mental tools that would enable me to think about software solutions in a whole new way.

Once I had seen the value of being able to solve a problem in a different way, I saw that there was a good reason to subject my mind to the necessarily large mental rearrangement needed to effect this transition. My mind is still going through this process of rearrangement even now, but the more I progress, the more value I see in continuing.

To illustrate the scale of this transformation, I'll use the analogy of the mind being like a house in which your existing habits of thought are the rooms you find warm and cosy. You like being in the kitchen in the morning and the living room in the afternoon, and you have the furniture just where you want it in each of these rooms. Life is comfortable because it is familiar; but let's not confuse "familiar" with "beneficial".

For an old dog like me, learning the new tricks of functional programming thought has been a rather slow and not-always-comfortable process. Much mental furniture has had to be rearranged. In some cases, I have had to take more drastic action - analogous to knocking new doors through walls to connect my mental "rooms" in new ways, or even going so far as to add entire extensions onto my mental "house".

At this point I can hear some of you growling "Fine, you can tell me that these new exotic languages might be useful in certain edge-case scenarios, but stay away from the way I think. I like it, I'm comfortable with it and besides, it makes me look good in front of my boss because (s)he thinks the same way..."

Sorry, but the elephant in the room here is that the ubiquity of cheap, multicore computers has irrevocably changed the way software should be developed. Like it or not, the options are that you either start to embrace these changes as beneficial and voluntarily start to adapt your thought processes, or you attempt to avoid the inevitable and run the serious risk of sliding into obsolescence and redundancy.

Having been a programmer for 30+ years, I have been just as entrenched in this old (and very comfortable) way of thinking as the next developer; therefore, before speaking sternly to anyone else, I have first had to speak sternly to myself.

The Single-Process, Single-Core Developer's Mindset

As I see it, there is a single major mental obstacle that needs to be identified, deconstructed and then re-assembled in a manner better suited to this new computing environment.

As I grew and matured as a programmer, I developed the mental framework of a single-core, single-process programmer. I had assumed that my programs needed to do nothing more than fulfil the requirements of the specification; yet whilst achieving this, I had also unconsciously absorbed a deeper mode of thought - something that was part of the programming culture of the day. And in common with every other culture in society, that culture was based on a small number of axioms. One of the axioms of writing imperative, single-process apps on a single-core processor is this:


A program must - at all costs - avoid crashing.


Given the hardware of the day, this axiom was quite natural and existed for the simple reason that that one process was all you had to work with: if it died, then everything was dead. Consequently, I learned to write what is known as "defensive code". This is extra coding designed to protect the program against any manner of predictable (and possibly unpredictable) failures that might take it down; and I got pretty good at it, because I saw it as a challenge to ensure that no matter what garbage my program had to deal with at runtime, it wouldn't start making rude binary noises at you as it spun out of control and nose-dived into the ground.

The net result was that I spent a large amount of time and effort adding extra code into my business apps that was not at all related to solving the problem I was supposed to be addressing; its only function was to obey the axiomatic belief that my program must never crash. But then, given the hardware I was working with at that time, it was quite normal practice to write code that placed all its "eggs" into a single-process "egg box" that rode along in a single-core "basket". There were no other "baskets" in which to place your "eggs".
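To show what I mean by "defensive code", here is a purely hypothetical illustration (written in Clojure only because that's the language used later in this post). In the first version, the one line of business logic is buried under checks for things that "should never happen"; the second version just solves the problem and leaves failure to be dealt with elsewhere - which is exactly the approach described in the next section:

;; Defensive: the business logic is buried under protective checks
(defn average-defensive [xs]
  (cond
    (nil? xs)                 0
    (empty? xs)               0
    (not (every? number? xs)) 0
    :else (/ (reduce + xs) (count xs))))

;; Non-defensive: just solve the problem; bad input simply throws
(defn average [xs]
  (/ (reduce + xs) (count xs)))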

However, as technology marched on, I discovered that my mind was becoming reluctant to keep up with the new developments. I liked my comfort zone - hence the adjective "comfort" - but then everything started to change about 5 years ago when I started playing around first with Erlang, and then more recently with Clojure.

That Well-known Typo, Erlnag

With the Erlang programming language, I discovered a small group of highly successful software developers who seemed to take a completely cavalier attitude towards their software crashing - they just didn't (seem to) care!

Their maxim was "let it crash!" At first I thought this madness might be due to the language designers possibly smoking something a lot stronger than tobacco; but as I started to learn their rationale, I discovered it was in fact a very pragmatic and sensible approach to software development.

The reasoning is simple, but not obvious. It is not obvious because the single-process, single-core mindset that I had grown up with blinded me to seeing this possibility on my own. It took someone from outside my own programming culture to point out the limitations of the culture in which I had grown up, and therefore assumed was correct:

  1. The complexity of a normal-sized software system is so great that inconsistencies can neither be tested for nor realistically avoided. For example, just six 32-bit integers can hold more possible states (2^192, roughly 6.3 × 10^57) than there are atoms in the planet Earth. I suggest that you don't try to write 2^(6*32) test cases anytime soon (the arithmetic is spelled out in the snippet after this list).
  2. As long as your mind is not blinded by the axiomatic belief that a program should never crash (as mine was), the first point should then lead naturally to the following simple conclusion: don't waste time and effort writing miles and miles of defensive code.
  3. Hence one of the Erlang maxims is "Let it crash".
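Just to spell out the arithmetic behind point 1, here it is in the Clojure REPL (2 multiplied by itself 6 × 32 = 192 times):

user=> (reduce *' (repeat (* 6 32) 2))
6277101735386680763835789423207666416102355444464034512896N

That's roughly 6.3 × 10^57 possible states from just six small integers.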

Erlang treats software failure as an inevitability and consequently doesn't get upset when it happens. It handles the situation by having a language primitive (spawn) that allows a new process to be created very quickly and cheaply. The task of solving a particular problem is then divided up across often many thousands of lightweight processes arranged in hierarchies known as supervision trees (see The Actor Model). Whenever a child process somewhere in this hierarchy encounters a problem, it makes little or no attempt to solve the problem by itself and simply throws an exception. This error is then trapped by its supervisor process, which must decide whether to 1) restart just the child process that crashed, 2) restart all of its child processes, or 3) terminate itself and pass the error up to its own supervisor.
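Erlang/OTP provides all of this supervision machinery out of the box, so the following is only an analogy: a toy sketch in Clojure (the other language covered in this post) of the bare restart-on-crash idea - a "supervisor" runs a worker on its own thread and restarts it whenever it throws. All of the names here are invented for illustration.

(defn supervise
  "Run worker-fn on its own thread; if it throws, restart it,
   up to max-restarts times. A toy analogue of an Erlang supervisor."
  [worker-fn max-restarts]
  (future
    (loop [restarts 0]
      (let [outcome (try
                      (worker-fn)
                      :ok
                      (catch Exception e
                        (println "worker crashed:" (.getMessage e))
                        :crashed))]
        (when (and (= outcome :crashed) (< restarts max-restarts))
          (recur (inc restarts)))))))

;; A worker that blows up on the bad value is simply restarted
(supervise #(doseq [x [1 2 0 4]] (println (/ 100 x))) 3)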

This simple approach to error handling means that the only code needed in the vast majority of Erlang processes is that related to actually solving the business problem. It also provides you with a very high degree of fault tolerance.

In Erlang you rarely (if ever) write defensive code. Consequently, Erlang applications are around 40% smaller (than the corresponding functionality written in C), more focussed on the task at hand, and easier to test.

At first though, my brain rebelled at the whole "let it crash" concept - my programs didn't crash because I was good at writing defensive code. But then I realised that my imperative programming skills were not applicable in functional programming land.

And that was only one aspect of Erlang that made my brain hurt: other aspects included single assignment of variables, the fact that different processes can communicate only through message passing (the "share nothing" concept), the fact that all looping is done by recursion, and getting used to structuring all units of code as functions.
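Clojure (coming up next) shares most of those traits, so a small Clojure example gives the flavour of "looping by recursion": there is no loop counter being updated in place; the "loop" is just a function calling itself with new argument values until it reaches the base case.

user=> (defn sum-list [xs total]
         (if (empty? xs)
           total
           (recur (rest xs) (+ total (first xs)))))
#'user/sum-list
user=> (sum-list [1 2 3 4 5] 0)
15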

Because Erlang was developed by and for the telecoms industry, it is best suited to problem domains in which a huge number of asynchronous events must be handled concurrently and with a very high degree of fault tolerance. In a business environment this includes programs such as web servers (see Cowboy and Yaws) and message engines (e.g. RabbitMQ) - oh, and I almost forgot... WhatsApp (which regularly handles in excess of 340,000 messages per second).

I would not recommend Erlang as an actual business-processing language; it's the wrong tool for that particular job. Where Erlang is an excellent choice, however, is in situations where you need to assemble various heterogeneous applications into a concurrent system that offers massive scalability and fault tolerance. Here, Erlang acts as the glue that binds the disparate pieces together into a coherent whole, yet at the same time is flexible enough to give you the scalability you need.

Clojure

Clojure is a member of the Lisp family of languages and, in keeping with all members of this family, code is data: a program is itself written as Lisp data structures (chiefly lists), so your program looks like one great big list. Here's the implementation and execution of the factorial function in Clojure:

user=> (defn factorial [n] (reduce * (range 1 (inc n))))  
#'user/factorial
user=> (factorial 10)  
3628800  

Not only is it compact, but it is also very expressive. Starting from the inner set of brackets: the parameter "n" is incremented by one, and that value is used as the (exclusive) upper bound of a range starting at 1 - so the range runs from 1 to n inclusive. Each number in the list created by "range" is then reduced using the multiply ("*") function. This whole expression is defined as a function called "factorial" that takes a single parameter "n".

In addition to its expressiveness, the inventor of Clojure, Rich Hickey, specifically states that he designed the language to make writing concurrent programs easier.

Just so that we don't completely alienate the world of imperative programming, Clojure programs are designed to run on the Java Virtual Machine. This means that when you compile a Clojure program, you end up with JVM bytecode that can be packaged into a standard JAR file and run just like any other program compiled from Java. This offers great interoperability between Clojure and a huge number of existing Java libraries.
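As a small, illustrative taste of that interoperability, Java classes and their methods can be called directly from the Clojure REPL:

user=> (.toUpperCase "concurrency")   ; instance method on java.lang.String
"CONCURRENCY"
user=> (Math/sqrt 256)                ; static method on java.lang.Math
16.0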

(Alternatively, Clojure programs can also be compiled to JavaScript and run in a browser, but this necessarily excludes the possibility of interacting with any existing Java functionality.)

Clojure also works on the principle that all data is immutable (don't let that scare you) and that functions should be pure (meaning their execution creates no side effects). These might sound like odd constraints within which to work, but once you learn this way of thinking, they greatly improve your ability to reason about the correctness of your code.
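A tiny REPL session makes both ideas concrete: "changing" a collection returns a new value and leaves the original untouched, and a pure function such as inc produces output that depends only on its input:

user=> (def prices [10 20 30])
#'user/prices
user=> (conj prices 40)        ; "adding" an element returns a new vector...
[10 20 30 40]
user=> prices                  ; ...the original value is unchanged
[10 20 30]
user=> (map inc prices)        ; pure: same input, same output, no side effects
(11 21 31)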

In imperative programming (Java, C, ABAP etc.), testing a program does not prove it to be correct; it merely proves that, for the given input, it did what you expected. In other words, you can demonstrate that an imperative program is not wrong for the cases you tested, but you can't prove that it is right. With functional programming, on the other hand, you can state with a much higher degree of certainty not only that your code is not wrong, but that it is actually right.

It is somewhat ironic to me that although Clojure runs on the JVM, it has a far better way of handling concurrency than Java itself does. The fact that a Java framework such as Hystrix needs to exist is testimony to the fact that concurrency and fault tolerance get really hard when you have to work within the confines of Java's implementation of object orientation. Whilst the Hystrix framework makes Java-based concurrency much easier, I am arguing that such a framework should not need to exist if the programming language had been designed with concurrency as a fundamental feature, rather than something you must build yourself. Hello Erlang...

Working directly with JVM threads from a Java program can be (and often is) a painful process that can leave life-changing scars on your mind.

Clojure, on the other hand, still deals directly with JVM threads, but removes the rigid boundaries imposed by Java's use of object orientation (that is, always having to think of your code in terms of units called "objects" that can only do stuff by invoking "methods"). Instead, functions are allowed to exist as free-standing units of code aggregated into libraries. This frees you up to think directly about the concurrent tasks your program is performing, without having to care about some intermediate, and possibly unnecessary, packaging layer.
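As a small, illustrative sketch of what that feels like in practice: the shared state lives in an atom, one hundred futures (each dispatched onto a JVM thread behind the scenes) update it concurrently, and there isn't a lock or mutex in sight:

user=> (def hits (atom 0))
#'user/hits
user=> (def workers (doall (repeatedly 100 #(future (swap! hits inc)))))
#'user/workers
user=> (doseq [w workers] @w)   ; wait for all of them to finish
nil
user=> @hits                    ; no locks, no lost updates
100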

Summary

We humans are far more creatures of habit than we might be willing to admit - and this massive shift in computer technology is one scenario that highlights our reluctance to change our way of thinking. We often lull ourselves into a false sense of security by thinking that the relentless march of technology will carry us all along to some utopian future. However, this passive approach often leads us to think that the source of our problems is external, and unrelated to the way we think.

I hope that this blog has presented a compelling argument pointing in exactly the opposite direction. Whilst there may be plenty of external problems out there, in the final analysis our software design problems are perpetuated by our own reluctance to examine our habits of thought objectively. This is where the changes are needed.

Is this a comfortable realisation? Certainly not, but that does not diminish its truth or relevance.

Will we need to move out of our tried and tested comfort zone? Yes, and the sooner the better.

Is there a quick fix for this problem? No! (and why are you even asking that question?) Changing your habits of thought is a slow process that requires prolonged effort over an extended period of time - but then that's how any habit is formed.

Will we get it right first time? I doubt it!

Will there be false trails and casualties along the way? Undoubtedly.

So why should we do this? Your vision of the future critically determines the goals you set and the motivation with which you pursue those goals. This then leads to the understanding that the benefits of change outweigh the discomforts of changing. That's why we should do this.

In terms of the programming languages mentioned here, I have given examples based on Erlang and Clojure because these are two functional languages with which I have experience, and they are also ones that have built-in support for concurrency; there are of course many other functional languages you could try.

The point here is that we all need to start taking concurrent programming seriously because this topic is fast becoming an elephant in the room - and the tools offered by functional programming are far better suited to solving the problems of concurrency than the tools provided by imperative programming.

If we want to achieve the required scalability for a highly-connected, always-online world, then we must embrace this very different way of thinking about software design.

I suggest we should all take a long and careful look at the benefits offered by the functional programming mindset, and see them as a welcome solution to a problem that just will not go away.

Chris W