*** This post was originally posted at http://www.greenfieldtech.net

Here’s a challenging question for the Asterisk technical savvy of you… What is the top performance you can squeeze out of an Asterisk box, running on Amazon EC2 – or to that extent, a cloud infrastructure? If you scout the Internet, you may find various answers – however, most of them aren’t backed up by real numbers or real information,made accessible in a normal readable form.
Recently, we’ve become heavily involved in a project requiring massive usage of cloud based infrastructure. I won’t go into details as to what the project is or what we are doing, however, I felt that some interesting facts about Asterisk 11.0.1 and Cloud infrastructure can be shared with the rest of you.

Before we dig deep into the actual results, let’s talk about the various measurements usually associated with performance assessments of an Asterisk box, mainly, the machines load average. In order to continue, we must first understand what the Linux Load Average actually is. Most of you know load average as the below:

Load Average Example

Most people know the load average as those 3 numbers, ranging from 0 to anything higher, and if the numbers reach a certain level – it’s bad. But the question is: “What is a good number? and what makes a number bad?” First, let’s understand what the number represents. Load average is an exponential average of all your machines processes. Running processes, sleeping processes, waiting processes and on Linux, also processes currently waiting for I/O access. Now, these number are directly correlated to the number of processors/cores your server has. In general terms, a machine with a single core, any number higher than 1 is considered bad – where 1 represents 100% of the resources being consumed. So, if your machine has 4 cores, the number 4 is your top most number – and from there it’s linear. Now, can we calculate HyperThreading into the equation, multiple CPU pipelines, SSD access – in Linux, all these come into play into that equation. In other words, we’ll never know what is the actual top limit, but working with a rule of thumb based upon the number of cores is a good practice – specifically if your operational environment is a virtualized one.

Now, there are 3 numbers in there – a 1 minute average, a 5 minute average and a 15 minute average. Technically speaking, the 1 minute average isn’t really interesting – as it is highly affected by context switches and process bootstrapping, thus, there is a good chande that its number will be higher than the “advised” number. The numbers that are more interesting are the 5 minute and 15 minute average. Technically speaking, if your machine’s load average is considerably higher than the advised at these, something is definitely wrong.