Every field of human activity has its unique characteristics, and
programming is no exception.
One of the unique aspects of software is how it spans such a large
number of orders of magnitude. A software engineer may be slicing and
dicing nanoseconds, or they may be trying to accelerate a computation
that will run across thousands of cores for a month… and they may
even be doing both at the same time!
A single core in a nanosecond may cover 4 cycles. A thousand cores in
a month covers about 2,600,000,000,000,000,000 cycles. Rounding a
touch, that’s a range of about 19 orders of magnitude. A large
supercomputer cluster, or a GPU if you choose to count its cores,
may stretch that by even another couple of orders of magnitude.
This is not an everyday experience for most programmers, but even an
8-core 4GHz system covers 32,000,000,000 cycles in a second. Again
rounding a bit, that’s 10 orders of magnitude between “my code runs in
a couple of cycles” and “my code takes all my CPU resources for a
second”.
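To make the arithmetic concrete, here’s a quick back-of-the-envelope
sketch; the clock rates and the thirty-day month are round-number
assumptions of mine, not measurements of anything:

```go
package main

import "fmt"

func main() {
	// A single ~4GHz core covers about 4 cycles per nanosecond.
	fmt.Println("one core, one nanosecond: ~4 cycles")

	// A thousand cores running for a month, at a deliberately round
	// 1GHz effective rate per core, to land near the figure above.
	const (
		cores           = 1000.0
		secondsPerMonth = 30 * 24 * 3600.0 // ~2.6 million seconds
		cyclesPerSecond = 1e9
	)
	fmt.Printf("a thousand cores, one month: ~%.2g cycles\n",
		cores*secondsPerMonth*cyclesPerSecond) // ~2.6e18

	// An 8-core 4GHz machine running flat out for one second.
	fmt.Printf("8 cores at 4GHz, one second: ~%.2g cycles\n", 8*4e9) // ~3.2e10
}
```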
I cannot think of very many other disciplines that not only span that
number of orders of magnitude, but are doing engineering across the
entire range. Cosmology may care about quantum mechanics in order to
try to determine the behavior of things like neutron stars, but
there’s a vast swathe in the middle they don’t cover. Other ideas may
leap to your mind, but even 10 orders of magnitude turns out to be
really quite a bit! Thinking about things of one size in one moment,
then something a billion times bigger or smaller, and caring about
both of them and potentially also a range of magnitudes in between, is
not common.
And as such, our human brains are not very good at dealing with
this. We do not have English terminology that can account for systems
that span this range. Is something that takes 500 cycles “fast”?
50,000? 5 million? 50 million? To a human, all but perhaps that last
one are equally “instant”. Then again, try to do them a billion times
each and the differences become quite marked.
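To put rough wall-clock numbers on that, here’s a small sketch,
assuming a single 4GHz core and no parallelism (both my own round
numbers, purely for illustration):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Assume a single 4GHz core: 4 cycles per nanosecond, no parallelism.
	const cyclesPerSecond = 4e9
	const repetitions = 1e9

	for _, cycles := range []float64{500, 50_000, 5_000_000, 50_000_000} {
		once := time.Duration(cycles / cyclesPerSecond * 1e9) // in nanoseconds
		total := time.Duration(cycles * repetitions / cyclesPerSecond * 1e9)
		fmt.Printf("%10.0f cycles: %10v once, %12v a billion times\n",
			cycles, once, total)
	}
}
```

Run once, everything up through millions of cycles looks “instant”;
run a billion times, the same operations stretch from a couple of
minutes to several months.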
“Fast” and “slow” are often not very useful words in software
engineering because of this broad range of orders of
magnitude. Imagine trying to draw the line between “fast” and “slow”
in some sort of general sense across 19 orders of magnitude, in a
world that generally experiences performance in a very linear
manner. You can’t.
The proximal reason for this post is the general idea that “In Go,
using cgo is ‘slow’ and therefore should be avoided.” You might expect
me to now go into defending cgo against the “accusation”, but the
issue I’m addressing here isn’t about its performance. Its performance
is what it is and no amount of jawboning from me will change anything.
The question I’m asking here is whether “slow”, as a word standing
by itself with no further characterization, is even an applicable
concept.
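If the question matters to you, the useful move is to attach a number
to the overhead rather than a word. Here’s a minimal sketch of how you
might measure it, assuming a working C toolchain; the trivial C
function is a made-up stand-in for whatever you would really call, and
a proper testing benchmark would give more trustworthy numbers:

```go
// A sketch, not a verdict: put a number on the per-call overhead
// instead of calling it "slow".
package main

/*
static int add_one(int x) { return x + 1; }
*/
import "C"

import (
	"fmt"
	"time"
)

func main() {
	const n = 1_000_000

	start := time.Now()
	for i := 0; i < n; i++ {
		_ = int(C.add_one(C.int(i))) // crosses the Go/C boundary every call
	}
	fmt.Println("cgo call:", time.Since(start)/n, "per call")

	start = time.Now()
	sum := 0
	for i := 0; i < n; i++ {
		sum += i + 1 // the pure-Go equivalent, for scale
	}
	fmt.Println("pure Go: ", time.Since(start)/n, "per iteration (sum =", sum, ")")
}
```

Whatever numbers come out, they are numbers, with units, on your
hardware; that is already more useful than “slow”.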
Web Frameworks and HTTP Servers
Another example that I see a lot is developers coming along and
analyzing which web framework they should use. So many developers only
seem to see those “requests per second” numbers and analyze their
choices solely in terms of what is fastest.
However, let me observe that frameworks that can, say, handle 10,000
requests per second on reasonably available hardware, with a task just
non-trivial enough to establish that the framework is doing something,
are a commodity now. You have to go to the 493rd slot to get down that
far. Before
worrying about whether you need the framework that can handle ten
thousand requests per second or a million requests per second, you
need to ask whether the code you’re going to run in each request is
itself capable of handling more than 10,000 requests per second.
The odds are that it’s not. Assuming perfect parallelism for
simplicity, 10,000 requests per second gives each request a budget of
one-tenth of a millisecond times the number of cores in your server.
Call it a maximum of 5 milliseconds to account for some overhead. It is
completely normal for web requests to need more than 5 milliseconds to
run. If you’re in the still-normal range of needing 50 milliseconds to
run, even these very “slow” frameworks are not going to be your
problem. Even if you had a framework where the request overhead was a
flat zero, you’d still not be able to process more than 20 requests
per second per core at 50ms per request.
I’m counting full CPU utilization. You may object that you’re not
doing that, in which case by all means, insert your own numbers as
appropriate. These numbers are only intended as examples to warm you
up. On the other hand, you’d be surprised how quickly things
can stack up until you really are doing 50ms of full-CPU work in a web
handler.
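Here’s that budget arithmetic as a sketch, with placeholder numbers
you should replace with your own:

```go
package main

import "fmt"

func main() {
	// Placeholder assumptions: swap in your own numbers.
	const (
		cores       = 8
		cpuMsPerReq = 50.0 // full-CPU milliseconds each handler burns
	)

	// With perfect parallelism and 100% of the CPU available to handlers,
	// each core can finish 1000/cpuMsPerReq requests per second.
	perCore := 1000.0 / cpuMsPerReq
	fmt.Printf("ceiling: %.0f req/s per core, %.0f req/s across %d cores\n",
		perCore, perCore*cores, cores)
	// With these numbers: 20 req/s per core, 160 req/s total, nowhere
	// near the 10,000 req/s the framework itself could have sustained.
}
```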
Most people, most of the time, doing most web work, are so thoroughly
outclassed on speed by their web framework and server that the speed
of their choice is irrelevant. Which means they should be selecting
based on all the other relevant features. The very fastest choices are
often the very fastest choices precisely by virtue of optimizing on
that and leaving out all the other ways a framework can make your
life easier.
Of course, it is worth taking a moment at the beginning of a web
project and thinking this through… do you, in fact, have a system
where you need to answer hundreds of thousands of queries per second,
continuously, and can you in fact write a handler that is fast enough
to keep up with that?
If so, then by all means, take that into account when selecting your stack!
I think it is more common for developers to obsess over irrelevant
performance details and lose more time programming in a suboptimal
environment for what they need, but it is more consequential when
developers make the opposite mistake and choose something that they
should have known from the very beginning would not be able to meet
their performance needs. Generally that is only conclusively
demonstrated after vast work has been poured into the inadequate
solution, by which point solving the problem is very difficult. Both
mistakes can be very expensive (don’t forget the opportunity cost of
all the developer-hours and calendar time lost to an excessively
“performance”-focused platform when a more convenient one would have
done the job), but it’s the second one that involves ramming into a
wall very, very late in the development cycle, often only revealing
itself after multiple full releases.
Database Implementation Languages
My personal favorite example of this is probably people implementing
databases, especially commercial databases. Databases compete in a
space where every nanosecond counts, because every nanosecond is going
to be repeated trillions and quadrillions of times. If you want to
create a commercially-viable database, you need to be thinking from
the moment you select your implementation language about how you are
going to optimize your code.
But for some reason it has been somewhat popular of late to pick Go as
the implementation language. I consider this a poor choice. As I like
to say, Go is generally the slowest member of the fastest class of
languages, the statically compiled ahead-of-time languages. Considered
against the entire landscape of languages, Go is pretty fast.
Considered within the set of languages that let you count nanoseconds
and exert super-deep and detailed control over your code, it has some
notable weaknesses. These weaknesses are often overstated in other
contexts, but in this context, every last one of them matters.
Certain language communities that see themselves in competition with
Go, most notably Rust’s, sometimes point to stories of database vendors
or other super-high-performance people switching away from Go as a sign
that Go can’t handle the very top end of tasks… and there is some
truth to that view. However, what I also see in the vast majority of
those situations is a team that should have known better from the
beginning and should never have started out in Go.
But there are a lot of programming tasks in the world where no one is
going to spend thousands of hours optimizing every loop in the
system. In those cases, we end up falling back on the “Go is pretty
fast” situation.
Is Go “fast”? Is Go “slow”? Both English words obscure facts of the
situation that an engineer selecting a language must pay attention to
if they are going to make an informed decision.
Premature Optimization yada yada yada
Everyone knows that “premature optimization is the root of all evil”
(and the quote goes on but for my point today that wouldn’t change
anything) but the conversation about what is or is not premature
optimization is a complicated one. For myself, I find it very
effective, when doing the cost/benefit analysis, simply to keep in mind
the rough order-of-magnitude cost of the operations I’m performing.
If I want to add lots of numbers together, which is roughly at the
smallest end of our magnitude span, and I am getting those numbers via
an HTTP REST request to a resource on the other side of the world,
then I know that in terms of the final performance of the system, the
addition is completely lost in the noise. I can essentially neglect
the addition portion and focus on making sure the HTTP API request
has enough performance to meet my needs, and I know that something
like “doing the addition two or three times” is inconsequential next
to “needing to make a dozen requests rather than one”.
I know that parsing numbers, which is a couple of orders of magnitude
slower than simply adding them but still much, much faster than my
HTTP API request, is almost certainly also completely negligible. By
contrast, if I want to move
lots of numbers around the world, considering how they are encoded
on the wire may be very important… if I know I have only a
relatively low-bandwidth link I may want to look into compression,
even some very CPU-intensive compression, rather than spending any
time optimizing the parsing routines.
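As a sketch of that ladder, here’s roughly how the first two rungs
compare on a single machine; the exact figures vary wildly by
hardware, but the orders of magnitude are the point:

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

func main() {
	const n = 1_000_000

	// A million numbers as strings, as they might arrive off the wire.
	inputs := make([]string, n)
	for i := range inputs {
		inputs[i] = strconv.Itoa(i)
	}

	// Parsing them: a couple of orders of magnitude slower than adding.
	start := time.Now()
	values := make([]int, n)
	for i, s := range inputs {
		values[i], _ = strconv.Atoi(s)
	}
	parse := time.Since(start)

	// Adding them: roughly the cheapest thing you can do with them.
	start = time.Now()
	sum := 0
	for _, v := range values {
		sum += v
	}
	add := time.Since(start)

	fmt.Println("parse 1M numbers:", parse) // typically tens of milliseconds
	fmt.Println("sum 1M numbers:  ", add)   // typically on the order of a millisecond
	fmt.Println("sum =", sum)
	// A single HTTP round trip to the other side of the world costs on
	// the order of 100ms or more all by itself, so both numbers above
	// vanish next to "one request versus a dozen requests".
}
```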
On the other hand, if I’m interacting with an HTTP API that is in the
same rack, now the speed of the API call is such that maybe I do need
to think about serialization costs. Here we find the many systems
where something like JSON serialization speed does become important
(and maybe JSON is not the correct choice for such systems).
If I’m doing something more complicated than “adding a bunch of
numbers together”, that can also change the balance. If I’m “sticking
them in an associative array” of some sort, well, that’s slower than
addition, but it’s pretty fast for small structures. Then again, if I
know I’m going to have many millions of such things, I know I’m going
to be working at RAM speeds instead of CPU cache speeds, and whether
or not that’s a problem depends on what else I’m doing.
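A crude way to see that shift is to time the same map at different
sizes; the sizes here are arbitrary and the methodology is
deliberately rough:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Rough per-insert cost of a map at different sizes. Small maps live
	// in cache; maps with millions of entries spill out to RAM, and the
	// per-operation cost creeps up. Numbers vary a lot by machine.
	for _, n := range []int{1_000, 100_000, 10_000_000} {
		m := make(map[int]int)
		start := time.Now()
		for i := 0; i < n; i++ {
			m[i] = i
		}
		fmt.Printf("%10d inserts: %8v per insert\n",
			n, time.Since(start)/time.Duration(n))
	}
}
```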
A characteristic of systems spanning so many orders of magnitude is
that, very frequently, one of the things your system does is
head-and-shoulders above everything else it does in cost. If you have a
good sense of your rough orders of magnitude from experience, it should
generally be obvious to you where you need to focus at least a bit of
thought on optimization, and where you can neglect it until it becomes
an actual problem.
A further consequence of that fact is that in our world, sometimes
little things don’t add up. When you live in a world where “things”
tend to span, say, two or maybe three orders of magnitude at once,
little things do add up: it doesn’t take many extra seconds repeatedly
added to a job that takes a couple of minutes before those extra
seconds are a significant proportion of it. By contrast, if a process
is going to take a few minutes, a few nanoseconds here and a few
nanoseconds there may in fact not add up to anything significant.
The minimum noticeable amount of nanoseconds is in the tens of
millions, that is, tens of milliseconds, and to notice them slowing
down a multi-minute process the threshold is a few orders of magnitude
higher than that. It is very easy to look at some code and see
something that will, over the course of the entire computation, waste
many thousands of nanoseconds… but that’s not relevant. It does not,
in fact, “add up” to anything. What is wisdom in one context can be
folly in another.
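Some made-up numbers to illustrate:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Made-up numbers: an avoidably "wasteful" operation costing an extra
	// 100ns, hit a thousand times, inside a job that runs three minutes.
	waste := 100 * time.Nanosecond * 1_000
	job := 3 * time.Minute

	fmt.Println("total waste:", waste) // 100µs
	fmt.Printf("as a fraction of the job: %.5f%%\n",
		float64(waste)/float64(job)*100)
}
```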
A lot of “premature optimization” in the real world is obsessing
over nanoseconds while neglecting the milliseconds you could have been
saving with the same effort.
Many times over the years, I’ve been in a meeting where people from
various teams are having some discussion with each other. They don’t
ever define their terms. They have an extensive discussion and come to
some conclusion that everyone agrees with.
Two weeks later we all conclude our work and try to put the pieces
together, and it turns out they don’t fit.
Everyone thought they agreed with each other, but it turns out
everyone was using different definitions for the same terms. If they had more
carefully explained what they were saying to each other, they would
have discovered they were not in sync after all.
“Fast” and “slow” are great examples of the sorts of terms that lead
to this result. Everyone will agree that the code needs to be “fast”
and “not slow”. But one team may mean that they can make ten million
queries per second and the other team may be thinking about being able
to complete a particular query in under ten milliseconds. When the
second team proudly presents their “fast” API to the first, which does
indeed complete the queries quickly but can’t be scaled beyond ten
thousand queries per second without an architectural overhaul,
now you’ve got the sort of problem that pushes projects back months
or gets them cancelled.
I would suggest to the reader that you try to abstain from describing
things as “fast” or “slow” in the programming world, strike the terms
from your vocabulary, and always try to be more specific: how is the
thing in question fast or slow, against what metric, against what
competition, and on what order of magnitude of time does it operate?
The bare terms are often not just useless but, because of their
ability to convince a group of people who don’t actually agree that
they do, often of negative value.