
Statistics, Analytics – stop whacking off

Managers! Project Managers, Sales Managers, Marketing Managers, Performance Managers – they are all obsessed! Why are they obsessed? Because we made them that way – it’s our fault! Since the invention of the electronic spreadsheet, managers have relied on the same tools for making decisions: charts, tables, graphs. Between us, I hate Excel or, for that matter, any other “spreadsheet” product. Managers rely on charts to translate the ever more complex world we live in into calculable, simple to understand, dry and boring numbers.
Just to give a rough idea, I have a friend who’s the CEO of a high-tech company in the “social media” sector. He knows exactly how every dollar he spends on Google AdWords translates back into sales. He is able to tell me exactly how much a new customer costs him, and he is very much capable of quoting these numbers just like that. When we last met he asked me: “Say, how do you determine the performance of your network? Is there a proper, agreed-upon metric you use?” It got me thinking: I’ve been using ASR and ACD for years, but have we been using them wrong?

So, the question is: what is the proper way of calculating your ASR and ACD? And is MoS a truly reputable measure for assessing your service quality?

Calculating ACD and Why MoS is so Biased

ACD stands for Average Call Duration (in most cases): the average duration of answered calls. Normally, ACD is used as a factor to determine whether the quality of your termination is good – in a very empirical manner only, of course. If you ask anyone in the industry, he will say the following: “If the ACD is over 3.5 minutes, your general quality is good. If your ACD is under 1 minute, your quality is degraded or just shitty. Anything in between is a little hard to say.” So, in that respect, MoS comes to the rescue. MoS stands for Mean Opinion Score – in general terms it means, judging from one side of the call, how that side perceives the general quality of the line. MoS is presented as a float number, ranging from 1 to 5, where 1 is the absolute worst quality you can get (to be honest, I’ve never seen anything worse than 3.2) and 5 represents the best quality you can get (again, I’ve never seen anything go above 4.6).

So, this means that if our ACD is anything between 1 minute and 3.5 minutes, we should consult our MoS to see if the quality is ok or not. But here is a tricky question: “Where do you monitor the quality? The client or the server? The connection into the network, or the connection going out of the network?” In other words: too many factors, too many places to check, too much statistical data to analyse. Many graphs, many charts – no real information provided.

If your statistical information isn’t capable of providing you with concise information, like: “The ACD in the past 15 minutes to Canada has dropped 15 points and is currently at 1.8 minutes per call – get this sorted!”, then all the graphs you may have are pointless.
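To make the point concrete, here is a minimal sketch of such an alert. It assumes you can pull call records for two consecutive time windows; the `Call` record, the function names and the 15% threshold are all made up for illustration, and I read “15 points” as a relative drop in percent, which is my assumption.

```python
from dataclasses import dataclass


@dataclass
class Call:
    destination: str      # e.g. "Canada" (hypothetical field)
    duration_sec: float   # call duration in seconds
    answered: bool        # True only if the call was answered


def acd_minutes(calls):
    """Average duration, in minutes, over answered calls only."""
    answered = [c.duration_sec for c in calls if c.answered]
    if not answered:
        return None
    return sum(answered) / len(answered) / 60.0


def acd_alert(destination, previous_window, current_window, max_drop_pct=15.0):
    """Return an alert string if ACD fell by more than max_drop_pct, else None."""
    before = acd_minutes([c for c in previous_window if c.destination == destination])
    now = acd_minutes([c for c in current_window if c.destination == destination])
    if before is None or now is None or before == 0:
        return None
    drop_pct = (before - now) / before * 100.0
    if drop_pct > max_drop_pct:
        return (f"ACD to {destination} dropped {drop_pct:.0f}% "
                f"and is currently at {now:.1f} minutes per call")
    return None
```

The output is one sentence a manager can act on, not another chart. Where the windows come from (CDR database, log files) is left out on purpose.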

Calculating ASR and the Release Cause Forest

While ISDN (Q.931) made the question of understanding your release cause fairly simple, VoIP turned that once fairly clear world into a mess. Why is that? With Q.931, release causes were very much preset for you at the network layer, whereas SIP makes it easy for the admin to set up his own release causes. For example, I have a friend who says: “I translate all 500 errors from my providers to a 486 error to my customers”. Why would he do that? Why in God’s name would somebody deliberately make his customers see a falsified view of their termination quality? Simple: SLAs and commitments. If my commitment to a customer were a 90% success service level, I would make sure that my release causes to him wouldn’t include 5XX errors that much. A SIP 486 isn’t an error or an issue, the subscriber is simply busy – what more can you ask than that?

As I see it, ASR (Answer Seizure Ratio) should be broken down into 3 distinct numbers: SUCCESS, FAILURE and NOS (None Other Specified). NOS is very much similar to the old Q.931 release of “Normal, unspecified” – Release Cause 31. So what goes where exactly?

SUCCESS has only one value in it – ANSWER, or Q.931 Release Cause 16 – Normal Call Clearing

FAILURE will include anything in the range of 5XX errors: “Server failure”, “Congestion”, etc.

NOS will include the following: “No Answer”, “Busy (486)”, “Cancel (487)”, “Number not found (404)”, etc

Each one of these should get a proper percentage number. You will be amazed at your results. We’ve implemented this methodology for several of our customers, who were complaining that all their routes were performing badly. We were amazed to find out that their routes had 40% success, 15% failure and 45% NOS. Are we done? Not even close.
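The three-way split can be sketched in a few lines. This is only an illustration, assuming your CDRs give you the final SIP response code of every call attempt; the function names are hypothetical, and treating every non-200, non-5XX code as NOS is my reading of the “etc.” above.

```python
from collections import Counter


def asr_bucket(sip_code):
    """Map a final SIP response code to SUCCESS, FAILURE or NOS."""
    if sip_code == 200:
        return "SUCCESS"
    if 500 <= sip_code <= 599:
        return "FAILURE"
    # Assumption: everything else (busy, cancel, not found, ...) is NOS.
    return "NOS"


def asr_breakdown(sip_codes):
    """Return the percentage of calls in each of the three buckets."""
    counts = Counter(asr_bucket(code) for code in sip_codes)
    total = len(sip_codes)
    if total == 0:
        return {"SUCCESS": 0.0, "FAILURE": 0.0, "NOS": 0.0}
    return {name: 100.0 * counts[name] / total
            for name in ("SUCCESS", "FAILURE", "NOS")}
```

Feeding it 40 answered calls, 15 calls ending in 503 and 45 calls ending in 486, 487 or 404 reproduces the 40% / 15% / 45% picture described above.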

The NOS Drill Down

Now, NOS should be drilled down – but that analysis should not be part of the general ASR calculation. We should now re-calculate our NOS according to the following grouping:

“BUSY GROUP” – Will include the number of busy release codes examined

“CANCEL GROUP” – Will include the number of cancelled calls examined

“NOT FOUND” – Will include any situation where the number wasn’t found (short number, ported, wrong dialing code, etc)

“ALL OTHERS” – Anything that doesn’t fall into the above categories

This drill down can rapidly show any of the below scenarios:

  • BUSY GROUP is not proportional – Normally will indicate a large number of calls to similar destinations on your network. May indicate one of the following issues:
    • It’s holiday season and many people are on the phone – common
    • You have a large number of call center customers, targeting the same locations – common
    • One of your signalling gateways is being attacked – rare
    • One or more of your termination providers is returning the wrong release code – common
  • CANCEL GROUP is not proportional – Normally will indicate a large number of calls are being canceled at the source, either a routed source or a direct source. May indicate one of the following issues:
    • You have severe latency issues in your network and your PDD (Post Dial Delay) has increased – rare
    • Your network is under attack, causing a higher PDD – common
    • You have a customer originating the annoying “Missed Call” dialing methodology – common
    • One of your termination providers has False Answer Supervision due to usage of SIM gateways – common when dialing Africa
  • NOT FOUND GROUP is not proportional – Normally will indicate a large number of calls are being rejected by your carriers. May indicate one of the following issues:
    • One of your call center customers is using a shitty data list to generate calls – common
    • One of your call center customers is trying to phish numbers – common
    • One of your signalling gateways is under attack and you are currently being scanned – common
    • One of your upstream carriers is returning the wrong release code for error 503 – common
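The drill down itself can be sketched like this. The mapping of SIP codes to groups is my assumption (only 486, 487 and 404 are named in the text), and so is the way “not proportional” is measured here: a group is flagged when its share of NOS exceeds your own historical baseline by more than a tolerance you choose. All names are hypothetical.

```python
from collections import Counter

# Assumed mapping of SIP codes to drill-down groups; adjust to your carriers.
NOS_GROUPS = {
    486: "BUSY GROUP",    # Busy Here
    600: "BUSY GROUP",    # Busy Everywhere
    487: "CANCEL GROUP",  # Request Terminated (caller cancelled)
    404: "NOT FOUND",     # Not Found
    484: "NOT FOUND",     # Address Incomplete (short number)
    604: "NOT FOUND",     # Does Not Exist Anywhere
}


def nos_drill_down(nos_codes):
    """Return each group's share (percent) of the NOS calls."""
    groups = ("BUSY GROUP", "CANCEL GROUP", "NOT FOUND", "ALL OTHERS")
    counts = Counter(NOS_GROUPS.get(code, "ALL OTHERS") for code in nos_codes)
    total = len(nos_codes)
    return {g: (100.0 * counts[g] / total if total else 0.0) for g in groups}


def disproportionate(shares, baseline, tolerance_pct=10.0):
    """List groups whose share exceeds the baseline by more than tolerance_pct points."""
    return [g for g, share in shares.items()
            if share - baseline.get(g, 0.0) > tolerance_pct]
```

The baseline is whatever your network normally looks like; the code only tells you which group to hand over to the tech team, not which of the causes above is to blame.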

So, now the ball is in the hands of the tech teams to investigate the issue and understand the source. The most dangerous issues are the ones where your upstream carrier will change release causes, as these are the most problematic to analyse. If you do find a carrier that does this – just drop them completely, don’t complain, just pay them their dues and walk away. Don’t expect to get your money’s worth out of them, the chances are very slim for that.

 

 


Don’t Replicate – Federate

For many years, the question of high availability has always circled the same old subject of replication: how do we replicate data across nodes? How do we replicate the configuration to stay unified across nodes? Is active-active truly better than active-passive? And most importantly, what happens beyond the two-node scenario?

Since the inception of the Linux-HA project (and I do believe it’s been around for over 15 years now), it has been the pivotal tool for creating Linux based high-availability clusters. Heartbeat, Stonith and Mon will take care of floating the IP numbers and services across nodes – no biggy there. Making sure the data is consistent across the board is something completely different. Recently, one of the better known commercial Asterisk offerings launched an Asterisk-HA solution. It’s been long overdue; it’s just a shame it’s a commercial offering without an Open Source derivative. After all, it is Open Source based (I hope).

However, being a high availability solution doesn’t mean you are truly a clustered solution. It is an active-passive solution with a major caveat (at least as I see it): if your data sync fails for some reason, you end up with a split-brain issue, and your entire solution is now moot. Don’t get me wrong here, I think that for now the solution is the best thing since sliced bread, simply because there is no other solution out there. However, the fact that this is the only solution doesn’t make it the right solution.

What does federating mean in this respect? It means that data doesn’t need to be replicated across the board; it is automatically trickled across the network, making sure all nodes in the network have clear visibility of it. If a node fails inside the cluster, clients automatically redirect themselves to a new node, with no need for floating IP numbers. Call routing is automatically determined upon request and is never preset for the entire platform. And most importantly, the amount of data traversing between the nodes is as minimal as possible, preventing excessive usage of network resources and I/O.

What would it mean to federate the configuration of a PBX system? First of all, make sure each unit is capable of working on its own. Information should be trickled across the nodes via two methodologies: a multicast/broadcast mechanism (for local LAN connected nodes) and a publish/subscribe relation (for externally connected nodes). When a change is made to any of the systems, that change alone is then sent to all the other systems. The configuration is never fully transmitted between nodes (apart from a new node joining the cluster). Routing decisions are made dynamically across the network; they are not predetermined or preconfigured. There is no need to keep the cluster nodes in perfect physical alignment; mixing hardware specifications should be considered the norm. External devices should be able to “speak” to the cluster without being aware of its existence.
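To illustrate the “send only the change” idea, here is a toy sketch, and nothing more than that. The `Node` class, its methods and the per-key version counter are all invented for this example; the direct method calls between nodes stand in for the multicast or publish/subscribe transport, and concurrent conflicting changes are not handled at all.

```python
class Node:
    """One PBX node holding its own copy of the shared configuration."""

    def __init__(self, name):
        self.name = name
        self.config = {}   # key -> (version, value)
        self.peers = []    # other Node objects; stands in for multicast or pub/sub

    def join(self, existing):
        """A new node joining: the only time a full configuration is copied."""
        self.config = dict(existing.config)
        for peer in existing.peers + [existing]:
            peer.peers.append(self)
            self.peers.append(peer)

    def set(self, key, value):
        """Apply a local change and send only that delta to the peers."""
        version = self.config.get(key, (0, None))[0] + 1
        delta = (key, version, value)
        self.receive(delta)
        for peer in self.peers:
            peer.receive(delta)

    def receive(self, delta):
        """Accept a delta only if it is newer than what this node already has."""
        key, version, value = delta
        if version > self.config.get(key, (0, None))[0]:
            self.config[key] = (version, value)

    def get(self, key):
        return self.config[key][1]
```

Each node keeps working on its own copy, and a change made on any node shows up on the others without the full configuration ever crossing the wire again after the join.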

Once we achieve all of the above, we’ll truly get to a point where we’ve clustered Asterisk (or another open source project) the right way.


Federating Asterisk – truth or myth?

At this year’s Asterisk Developers’ Conference, one of the subjects I raised was: “Federating Multiple Asterisk Instances”. Now, for the seasoned Asterisk user/developer, the answer would be simple: use Kamailio/OpenSIPS for scalability, and use Asterisk as a Media Gateway or application server.

But I ask the following: “What if we could federate Asterisk without the need for an external component? What if we could federate Asterisk in such a way that our users aren’t even aware of the federation process, and it’s fully autonomous? What would actually be required in order to do that?”

I’m confronted with these questions on a day to day basis, looking at the problem from different angles and thinking to myself: “Ok, I know the normal box here, but where are the outer limits? What can I do to make it more robust, without truly making a mess out of it?”

A federated database is defined as: “A federated database system is a type of meta-database management system (DBMS), which transparently maps multiple autonomous database systems into a single federated database. The constituent databases are interconnected via a computer network and may be geographically decentralized. Since the constituent database systems remain autonomous, a federated database system is a contrastable alternative to the (sometimes daunting) task of merging several disparate databases. A federated database, or virtual database, is a composite of all constituent databases in a federated database system. There is no actual data integration in the constituent disparate databases as a result of data federation.” – http://en.wikipedia.org/wiki/Federated_database_system

So, we would like to virtually create a “map-reduce” functionality for Asterisk? Can we truly create a map-reduce’ish functionality for Asterisk? Should it be internal? Should it be external?

In order to accomplish this, we are required to create a federator – a device capable of handling the information regarding each user, device, trunk, provider and any other SIP/IAX2 entity connected to our system. The federator, for all practical purposes, is a data store, be it a key-value store, a database, a shared memory environment or some other form of data distribution layer.

Here are some key issues that true federation may be required to tackle:

  1. Geo-Position Agnostic – A truly federated system should render services identically across the board, regardless of where the user is located.
  2. Services Agnostic – A truly federated system doesn’t care if the user is connected to an Asterisk server version 12 or 13, it should behave identically.
  3. Version Agnostic – A truly federated infrastructure can leverage older versions and even other software, without changing the underlying federation layer.
  4. Predictable Scalability – A truly federated infrastructure will allow for growth to be planned linearly, with discrete measure methods.

So, you want a tip on how to start federating your systems? Here’s step number 1: there is no central registry, there is no SIP proxy, there is only the cloud and the services it renders. Start thinking from this point and see where you go.


Dinner with Captain Crunch

It is a fairly rare occasion when one gets to meet one’s childhood (or to be more accurate, teen) hero. For me, growing up as a teenage computer geek in Israel during the late 80’s and early 90’s, the electronic world was a bold new frontier of opportunities and challenges. I distinctly remember the original myths that were spread around among the teenage geeks: there is a box, called a “blue-box”, it’s a box of wonders, enabling you to bypass the local PTT systems and call abroad for FREE. It was the early 90’s, and long distance phone calls were expensive. Beyond expensive – they were outrageous. Calling abroad was even worse; it could easily amount to $2-$3 per minute, doing it the normal way. The “blue-box” for us was a myth, a box of wonders that no one ever got around to actually seeing.

Then, in late 1989, something happened: a friend of mine returned from the US with what I could only call a magazine – back then it was called a zine. I can’t really call it a magazine, as it was a group of dot-matrix printed pages, stapled together. My friend said: “This is a hacker’s magazine, but I can’t understand the blue-box thing”. My eyes lit up. Could it be? Did the pages truly include a description of what the blue-box was? I looked at it and replied: “Of course you don’t understand this, you are a computer science major – not electronics”. I studied electronics, and the blue box made sense to me. The pages included the entire circuit diagram – I was fascinated. I built my first “blue-box” using those diagrams. It was crude, it wasn’t pretty, but it worked – well, it worked for exactly 15 minutes, then the power regulator I used kind of fried. That was my beginning in the world of Hacking and Computer security.

After reading about and building my first “blue-box”, I continued to consume information. I used the box each time for short intervals, each time getting to download more information. I remember being connected to the Channel One BBS in the US, downloading the hacker’s chronicle and reading through it like mad. I learned about the works of a man nicknamed “Captain Crunch”. His work in investigating the various properties of the telephone network amazed me – at that age, for me, he was a modern day Robin Hood: fighting the system from within the system, showing how frail it is, and abusing it to the max. I must say something here: unlike the USA at that time, we didn’t have anti-hacker laws in Israel. Computer crime was so rare that they didn’t even know what to do with hackers – if they ever managed to catch them.

Fast forward 25 years; I’ll be 40 next month. Over the years I’ve learned that Captain Crunch is the alias of John Draper. I first met John in 2000, at a hackers’ convention in Israel called Y2Hack. I didn’t get to chat with him much back then, it was a busy event. This year’s Astricon was in Las Vegas, where John currently lives. After learning about John’s medical condition, I decided I would like to pay the man a visit. Normally, you don’t get around to meeting people who have influenced your life in such a deep manner, but here I had a chance. So, Eric and I contacted John, who was more than happy to join us for dinner.

It is clear that John is not at his best: in severe pain from his latest surgery, and most surely medicated for it. However, sitting down with him for dinner, one thing is very much clear – when it comes to technology, John is as sharp as ever. The conversation rapidly moved from talking about history to talking about modern day cellular technologies, how roaming works, phantom base stations, HTML5, WebRTC and more. At times it would seem that the conversation was floating away, but John rapidly closed in on the subject again – and in his physical condition, that isn’t simple (I guess).

John, very much like other visionaries who haven’t been completely acknowledged by society, is, sorry to say, far from what we would imagine him to be at this age. Normally, we imagine that people like John would be living a good life; after all, the computer age was very much built on much of his work and findings. But the truth is that John’s friends started a qikfunder campaign to fund his medical bills. Amazingly enough, John isn’t a rich man at all. For someone of whom it was said “If it hadn’t been for the blue box, there would have been no Apple” (Steve Jobs, 1994), it is somewhat discomforting to see him like this.

I truly wish John all the best and wish him a speedy recovery – as his mind is as sharp as ever, and I truly hope to see him back at the tech-helm as soon as he can.


Astricon, Vegas and Geekness

So, Astricon 2014 is over and behind us, and I’m now sitting at the Holiday Inn in Chicago. I have to admit that moving from the RedRock Resort and Casino to the Holiday Inn in Chicago is, well, talk about a mind blowing change. Just to give a general idea, the bathroom in Vegas was roughly the size of the entire room here (mental note to self – next time order something better via BA miles).

So, this year’s Astricon was, at least for me personally, one of the best I’ve been to. Various topics that I started talking about years ago have finally made their way to the public’s ear, and the community and adopters are finally picking up on them. Security, privacy, cloud computing, proper usage of Linux and virtualization – these have now become the predominant subjects people are confronted with.

Unlike previous years, I decided to talk about Cloud computing and some tips from the Cloud front line. Cloud computing, and specifically cloud based servers, is an infrastructure that many want to use – but very few truly understand what it means. What kind of impact does SWAP have on your instances? What is the swappiness value? And why the hell would I choose one cloud over another – aren’t they all the same in the end?

This year, we had the first ever Astricon Hackathon. I’ve participated in several Hackathons in the past, but this one was very special to me. While in most Hackathons I’ve participated in, the participants never knew each other (well, at least 95% of them), here most participants knew each other – some on a very personal basis. As you know, my latest Open Source passion is my own pet project – phpari. My hack for the contest was a phpari sandbox; imagine it to be a cross between jsfiddle, Asterisk and PHP. A simple-to-use playground, where you can try various parts of ARI in general and the toolkit in particular. Much to my surprise (as there were other strong candidates), the phpari sandbox won the “Asterisk Developer’s Team” Award, for best use of Asterisk during the Hackathon. To me personally, it means a whole lot. I’ve been dealing and working with Asterisk for over 12 years now. In fact, I was joking around with Corey McFadden that we are currently probably the oldest Asterisk community members around – well, probably oej, joshc and a few others are as old as us. We had never had a chance to actually see how we work together, how we think about various problems and challenges. This was the first time we got to see each other work, how we work, how we look at things – it was exciting. Looking at Tim Panton as he battled the various concepts of Respoke and his application, trying to figure out exactly why “Respoke” didn’t work as he expected, was amusing to say the least.

So, after Astricon, we spent the last evening going out to the Vegas Strip. I have one thing to say right now: “I don’t think I like Vegas all that much”. It’s just too much of everything. Too much “Puttin’ on the Ritz” facade, too much commercialism of everything and anything, just too much for me. Don’t get me wrong, it’s an interesting place to visit, but I don’t believe that being there more than 2-3 days is required in order to appreciate the place. Be it the lights that are always bright, making you believe it is daylight, or the hotel that literally had no windows to the outside, so you won’t know if it’s day or night – your entire system gets screwed up totally.

So, during the night of the “geeks take over Vegas”, the following group of people decided to head to the strip:

  • Allison Smith
  • Peter – Aka: Mr Allison (hey, what do you want, you’re married to the voice of Asterisk)
  • Ben Klang (Adhearsion/Mojo-Lingo)
  • Evan (sorry, can’t recall the rest)
  • Steve (Mojo-Lingo)
  • Dan Jenkins (Respoke)
  • Eric Klein (My partner in crime)
  • Corey McFadden (Venoto)
  • Beth – Corey’s Wife
  • Steve (From South Africa)

So, here we are sitting at the Cosmopolitan, waiting for our table at STK. Once we got in (at 10:45PM), we sat down at the stools waiting for our table. At the table next to us, a man and two young ladies were definitely getting it on. To be more descriptive, apart from actually going at it in front of us all, they were all over the place. As they say, what happens in Vegas stays in Vegas. But what happens at a public restaurant, don’t be surprised to find it on Twitter. Come to think of it, we should have videoed the entire thing. Now, don’t get me wrong, I’m as much a man as the other guy, and I admit that the display was interesting (to say the least) – but c’mon, we’re in a public place – get a bloody room. The funny bit was that Peter came back from the rest rooms, saying that he was delayed due to it being occupied. When the door opened, two girls walked out of the same compartment – and I’ll let your imagination continue from here. So, as Eric commented on TripAdvisor – the music was loud, the service was slow – but the Steak WAS PERFECT. Indeed, one of the finest steaks I’ve had in a long time.

One more thing I need to mention is our dinner (Eric and myself) with John Draper – aka Captain Crunch – but that’s a whole different story altogether.