As the world around us changes, services are rapidly shifting from human-rendered services to bots and applications that run on your mobile device. From your local pizza shop to a multi-billion dollar corporation, everyone is moving to the bot/application economy paradigm in order to facilitate growth and lower their TCO.

According to a SkyHigh Networks study, the following may come as a shock to most: a typical enterprise uses up to 900 different cloud applications, which in turn rely on over 1,500 different cloud services in order to work. Out of these 1,500 cloud services, a group of the 50 top-most cloud services can be observed, normally relating directly to infrastructure. We'll call these "Super Clouds".

The “Super Clouds” can be divided into several “Primary” groups:

– Infrastructure Clouds (Amazon AWS, Google Compute, Microsoft Azure, etc.)
– Customer Relation Clouds (Salesforce, ZenDesk, etc.)
– Real Time Communication Clouds (Twilio, Nexmo, Tropo, etc.)

It is very common for a company to rely solely on various cloud services in order to provide its own service. However, using cloud services has a tipping point: "When is a cloud service no longer commercially viable for my service?" Or, in other words: "When do I become Uber for Twilio?"

Twilio's stock recently dropped significantly, following Uber's announcement – http://bit.ly/2rVbzxG. Judging from the PR, Uber was paying Twilio over $12M a year for its services, which means that for the same cash, Uber could actually buy out a telecom company to provide the same service. And apparently, this is exactly what is going to happen, as Uber works to establish the same level of service with internal resources.

Now, the question that comes to mind is: "What is my tipping point?" While most will not agree with my writing (specifically if they work for an RTC cloud service), every, and I do mean EVERY, type of service has a tipping point. To estimate your tipping point before you get there, try following the rules of thumb below to form an "educated guess".

Rules of Thumb

  • Your infrastructure cloud is the least of your worries
    As storage, CPU, networking and bandwidth costs drop world-wide, so do your infrastructure costs. IaaS and PaaS providers are constantly updating prices and are in constant competition. In addition, once you commit to a certain sizing, prices can be negotiated. I have several colleagues working at the 3 main competitors – the competition is such that they are willing to cover migration costs and render services for free for up to 12 or 24 months in order to win new business.
  • Customer Relation Clouds hold your most critical data
    As your service/product is consumer oriented, your customers are your most important asset. Take great care in choosing your partner and make sure you don't outgrow them. In addition, if you use one, make sure you truly need their service. Sometimes a simple VTiger or other self-hosted CRM will be enough. In other words, Salesforce isn't always the answer.
  • Understand your business
    If your business is selling rides (Uber, Lyft, Via, etc.), tools like Twilio are a pure expense. If your business is building premium rate services or providing custom IVR services, Twilio is part of your pricing model. Understand how each and every cloud provider affects your business, your bottom line and, most importantly, its effect on the consumer.

Normally, most companies in the RTC space will start out using Amazon AWS as their IaaS and services such as Twilio, Plivo, Tropo and others as their CPaaS. Now, let us examine a hypothetical service use case:

– Step 1: User uses an application to dial into an IVR
– Step 2: IVR uses speech recognition to analyze the caller intent
– Step 3: IVR forwards the call to a PSTN line and records the call for future transcription

Let us assume that we utilize Twilio to store the recordings, Google Speech API for transcription, Twilio for the IVR application, and that we forward the call to a phone number in the US. Now, let's assume that the average call duration is 5 minutes. Thus, we can extrapolate the following (a short cost sketch follows the list):

– Cost of transcription using Google Speech API: $0.06 USD
– Cost of call termination: $0.065 USD
– Cost of call recording: $0.0125 USD
– Cost of IVR handling at Twilio: $0.06 USD
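Taken together, these line items give the blended third-party cost for the call flow above. A minimal sketch of that sum – the figures are the ones listed above, the names are mine:

```python
# Cost components for the hypothetical IVR use case, as listed above.
# Names are illustrative; the figures are the ones from the post.
COST_COMPONENTS_USD = {
    "google_speech_transcription": 0.06,
    "pstn_termination_us": 0.065,
    "call_recording": 0.0125,
    "twilio_ivr_handling": 0.06,
}

blended = sum(COST_COMPONENTS_USD.values())
print(f"Blended cost of the listed components: ${blended:.4f} USD")  # $0.1975
```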

So, where is the tipping point for this use case? Let's separate it into 2 distinct business cases: a chargeable service (e.g. a transcription service) and a free service (e.g. an Uber driver connection).

  • A Chargeable Service
    Assumption: we charge a flat $0.25 USD per minute
    Let’s calculate our monthly revenue and expense according to the number of users and minutes served.

– Up-to 1,000 users – generating 50,000 monthly minutes: $12,500 – $9,625 = $2,875
– Up-to 10,000 users – generating 500,000 monthly minutes: $125,000 – $96,250 = $28,750
– Up-to 50,000 users – generating 2,500,000 monthly minutes: $625,000 – $481,250 = $143,750

Honestly, not a bad model for a medium-size business. But the minute you take into account the multitude of marketing costs, office costs, operational costs, etc., you need around 500,000 users in order to truly make your business profitable. Yes, I can negotiate some volume discounts with Twilio and Google, but even after that, my overall discount will be 20%, maybe 30% – so the math will look like this (sketched in code right after the list):

– Up-to 1,000 users – generating 50,000 monthly minutes: $12,500 – $9,625 = $2,875
– Up-to 10,000 users – generating 500,000 monthly minutes with a 30% discount: $125,000 – $67,375 = $57,625
– Up-to 50,000 users – generating 2,500,000 monthly minutes with a 30% discount: $625,000 – $336,875 = $288,125
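For reference, the tier tables above can be reproduced with a few lines of code. The per-minute cost implied by the tables works out to $9,625 / 50,000 ≈ $0.1925; the function and parameter names below are mine:

```python
def monthly_margin(minutes: int,
                   price_per_minute: float = 0.25,
                   cost_per_minute: float = 0.1925,
                   discount: float = 0.0) -> tuple[float, float, float]:
    """Return (revenue, cost, margin) for a month of traffic.

    `discount` is the volume discount negotiated on the cost side,
    e.g. 0.30 for the 30% used in the tables above.
    """
    revenue = minutes * price_per_minute
    cost = minutes * cost_per_minute * (1.0 - discount)
    return revenue, cost, revenue - cost

for minutes, discount in [(50_000, 0.0), (500_000, 0.30), (2_500_000, 0.30)]:
    revenue, cost, margin = monthly_margin(minutes, discount=discount)
    print(f"{minutes:>9,} min @ {discount:.0%} discount: "
          f"${revenue:,.0f} - ${cost:,.0f} = ${margin:,.0f}")
```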

But, just to be honest with ourselves, even at a monthly cost of $67,375 USD, I can actually build my own platform to do the same thing. In this case, the 500,000-minute mark is very much a tipping point.

  • A Free Service
    Assumption: we charge a flat $0.00 USD per minute
    Let’s calculate our monthly revenue and expense according to the number of users and minutes served.

– Up-to 1,000 users – generating 50,000 monthly minutes: $9,625
– Up-to 10,000 users – generating 500,000 monthly minutes with a 30% discount: $67,375
– Up-to 50,000 users – generating 2,500,000 monthly minutes with a 30% discount: $336,875

In this case, there is simply no case for building this service on Twilio or a similar provider, because it will be too darn expensive from the start. Twilio will provide a wonderful test bed and PoV environment, but when push comes to shove, it just won't hold up financially. This is a major part of why services such as Uber, Lyft, Gett and others will eventually leave Twilio-type services: at some point the service they are consuming becomes too expensive, and they must take it back in-house to make sure they stay competitive and profitable.

When Greenfield started working on Cloudonix, we understood this growth issue from the start, and that's why Cloudonix isn't priced or serviced in such a way. In addition, as Cloudonix includes the ability to obtain your own slice of Cloudonix, or even your own on-premise installation, your investment is always safe.

To learn more about our Cloudonix CPaaS and our On-premise offering, click here.

I love the feeling of unboxing a brand new IP phone, specifically when it's one that comes from Digium. Yes, I'm a little biased, I admit it – but I'll do my best to refrain from dancing in the rain in this post.

So, during ITExpo 2017 (Ft. Lauderdale, Florida), Digium unveiled their new D65 color-screen IP phone. Malcolm Davenport and the good people at Digium were kind enough to send me a couple of phones for testing, which I was fairly happy to do – specifically due to the addition of the Opus codec to the hardware.

If you are not familiar with Opus, you have most probably been living under a rock for the past 3-4 years. Opus is the codec that makes tools like Skype, Hangouts and others work so well. Unlike the traditional G.7xx codecs, Opus is a variable bit rate codec, provides HD voice capabilities, handles poor network conditions far better (via FEC) and, all in all, is a far better codec for any VoIP platform. What is FEC? I'll explain later.

Consistency and simplicity are a must – and Digium phones deliver both. One of the things I really like about Digium phones is that they are simple to configure, even without DPMA. The configuration screens are identical to the previous models, and the flow is so tight that getting a phone up and running takes no longer than a few seconds.

Minor disappointment: the phones shipped with a firmware that didn't include the Opus codec, so I had to upgrade the firmware. Ok, no big deal – but a minor nuisance.

So, I proceeded to get the phone configured to work with our Cloudonix.io platform. What is Cloudonix.io? Cloudonix is our home-grown Real Time Communications cloud platform – but that's a different post altogether. The nice thing about Cloudonix is that it utilizes Opus to its full extent: dynamic jitter buffering, forward error correction across the entire media stack, and variable bit rate and sample rate support (via the Cloudonix.io mobile SDK). In other words, if the Digium phone performs as well as the Cloudonix.io mobile SDK, we have a solid winner here.

So, I hooked the phone up and proceeded to do some basic network-condition testing with Opus. All tests were conducted in the following manner (a sketch for reproducing a similar loss profile follows the list of steps):

  • Step 1: Connectivity with no network quality effects
  • Step 2: Introduction of 5% packet loss (using `neteq`)
  • Step 3: Introduction of 10% packet loss (using `neteq`)
  • Step 4: Introduction of 15% packet loss (using `neteq`)
  • Step 5: Introduction of 20% packet loss (using `neteq`)
  • Step 6: Introduction of 25% packet loss (using `neteq`)
  • Step 7: Extreme condition of 40% packet loss (using `neteq`)
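For reference, and with the caveat that my own tests used `neteq`, a comparable static-loss profile can be emulated on a Linux gateway with tc/netem. A minimal sketch, where the interface name and step duration are assumptions:

```python
import subprocess
import time

IFACE = "eth0"                              # assumed test interface
LOSS_LEVELS = [0, 5, 10, 15, 20, 25, 40]    # percent, matching the steps above
STEP_SECONDS = 240                          # roughly one 3-4 minute call per step

def set_loss(iface: str, loss_pct: int) -> None:
    """(Re)apply a netem qdisc with a static packet-loss percentage."""
    subprocess.run(["tc", "qdisc", "del", "dev", iface, "root"], check=False)
    if loss_pct > 0:
        subprocess.run(
            ["tc", "qdisc", "add", "dev", iface, "root", "netem",
             "loss", f"{loss_pct}%"],
            check=True,
        )

for loss in LOSS_LEVELS:
    set_loss(IFACE, loss)
    print(f"Running test step with {loss}% packet loss...")
    time.sleep(STEP_SECONDS)

set_loss(IFACE, 0)  # clean up when done
```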

Test 1: Media Relay and server located under 150mSec away

  • Step 1: Audio was perfect, HD Voice was exhibited all the way
  • Step 2: Audio was perfect, HD Voice was exhibited all the way
  • Step 3: Audio was good, HD Voice was exhibited all the way, minor network reconditioning at the beginning, till FEC kicks fully in
  • Step 4: Audio was good, SD Voice was exhibited all the way, minor network reconditioning at the beginning, till FEC kicks fully in
  • Step 5: Audio was fair, SD Voice was exhibited all the way, moderate network reconditioning at the beginning, till FEC kicks fully in
  • Step 6: Audio was fair, SD Voice was exhibited all the way, major network reconditioning at the beginning, till FEC kicks fully in
  • Step 7: Audio was fair, SD Voice was exhibited all the way, extreme network reconditioning at the beginning, till FEC kicks fully in

Test 2: Media Relay and server located under 250mSec away

  • Step 1: Audio was perfect, HD Voice was exhibited all the way
  • Step 2: Audio was perfect, HD Voice was exhibited all the way
  • Step 3: Audio was good, SD Voice was exhibited all the way, minor network reconditioning at the beginning, till FEC kicks fully in
  • Step 4: Audio was good, SD Voice was exhibited all the way, moderate network reconditioning at the beginning, till FEC kicks fully in
  • Step 5: Audio was fair, SD Voice was exhibited all the way, major network reconditioning at the beginning, till FEC kicks fully in
  • Step 6: Audio was fair, SD Voice was exhibited all the way, major network reconditioning at the beginning, till FEC kicks fully in
  • Step 7: Audio was fair, SD Voice was exhibited all the way, extreme network reconditioning at the beginning, till FEC kicks fully in

Test 3: Media Relay and server located under 450mSec away

  • Step 1: Audio was perfect, SD Voice was exhibited all the way
  • Step 2: Audio was perfect, SD Voice was exhibited all the way
  • Step 3: Audio was good, SD Voice was exhibited all the way, minor network reconditioning at the beginning, till FEC kicks fully in
  • Step 4: Audio was good, SD Voice was exhibited all the way, major network reconditioning at the beginning, till FEC kicks fully in
  • Step 5: Audio was fair, SD Voice was exhibited all the way, major network reconditioning at the beginning, till FEC kicks fully in
  • Step 6: Audio was fair, SD Voice was exhibited all the way, extreme network reconditioning at the beginning, till FEC kicks fully in
  • Step 7: Audio was fair, SD Voice was exhibited all the way, extreme network reconditioning at the beginning, till FEC kicks fully in

Ok, I'm willing to accept that if I can carry a good audio call for almost 3-4 minutes while `neteq` is introducing a static 20% packet-loss condition – that sounds like a winner to me.

All in all, till I get my hands on the Digium D80 to test its Opus capabilities, the D65 is by far my "Go To Market" IP phone for desktop Opus support – 2 thumbs up!

Following yesterday's post, I've decided to collect another set of data, this time from the start of the year and with a specific data profile. What is the profile? Let me describe it:

  1. The honeypot server in this case was a publicly accessible Kamailio server
  2. The honeypot changed its location and IP every 48 hours, over a period of 2 weeks
  3. The honeypot was always located in the same Amazon AWS region – in this case N.California
  4. All calls were replied to with a 200 OK, followed by a playback from an Asterisk server

In this specific case, I wasn't really interested in the numbers being dialed; I was more interested in figuring out where the attacks were coming from. The results were fairly surprising:

The above table shows a list of attacking IP addresses, the number of attempts from each address – and the origin country. For some weird reason, 97% of potential attacks originated in Western Europe. In past years, most of the attempts came from Eastern European countries and the Far East, but now it is mainland Europe (Germany, France, Great Britain).

Can we extrapolate a viable security recommendation from this? Absolutely not – it doesn't mean anything specific – but it could mean one of the following:

  1. The number of hijacked PBX systems in mainland Europe is growing?
  2. The number of hijacked generic services in mainland Europe is growing?
  3. European VoIP PBX integrators are doing a lousy job at securing their PBX systems?
  4. European VPS providers pay less attention to security matters?

If you pay attention to the attempts originating in France, you will notice a highly similar IP range, right down to the same Class-C network. That is no coincidence – that is negligence.

Now, let’s dig deeper into France and see where they are attempting to dial:

So, on the face of it, these guys are trying to call the US. I wonder what these numbers are for?

Ok, that's Verizon… let's dig deeper…

Global Crossing? That is interesting… What else is in there?


So, all these attempts go to landlines – which means these calls are most probably being dialed into yet another hijacked system, in order to validate that a newly hijacked system is working.

Well, if you can give me a different explanation, I'm open to it. Also, if any of the above carriers are reading this, I suggest you investigate these numbers.


Who would believe that in the age of Skype, Whatsapp and Facebook, telephony fraud – one of the most lucrative and cleanest forms of theft – is still going strong. Applications of the social nature are believed to be harming the worldwide carrier market, and carriers are surely complaining to the regulators – for a legitimate reason. Having said that, a look at some alarming fraud-attempt statistics will show you a fairly different story.

So, analysing fraud is one of my things: I enjoy dropping honeypots around the world, letting them live for a few days and then collecting my data. My rig is fairly simplistic:

  1. I have a Homer (www.sipcapture.org) server to capture all my traffic
  2. I have an Amazon AWS CloudFormation script that launches instances of Asterisk, FreeSWITCH and Kamailio
  3. All instances are pre-configured to report everything back to Homer
  4. Upon receiving a call, it is rejected with a 403

Why is this a good honeypot scheme? Simple – it gives the remote bot a response from the server, making it keep hitting the honeypot with different combinations. In order to make the analysis juicy, I decided to concentrate on the period between 24.12.2016 and 25.12.2016 – in other words, Christmas.
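To give a feel for how little the responder side needs, here is a toy sketch of a UDP listener that answers any INVITE with a 403 Forbidden. This is an illustration only – the actual rig, as described above, uses Asterisk, FreeSWITCH and Kamailio instances reporting to Homer:

```python
import socket

# Toy illustration of the "reject every call with a 403" idea above.
BIND = ("0.0.0.0", 5060)

def build_403(request: str) -> str:
    """Copy the mandatory headers from the INVITE into a 403 Forbidden reply."""
    wanted = ("via:", "from:", "to:", "call-id:", "cseq:")
    headers = []
    for line in request.split("\r\n"):
        lower = line.lower()
        if lower.startswith(wanted):
            if lower.startswith("to:") and ";tag=" not in lower:
                line += ";tag=honeypot"     # a response's To header needs a tag
            headers.append(line)
    return "\r\n".join(["SIP/2.0 403 Forbidden", *headers, "Content-Length: 0", "", ""])

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(BIND)
print(f"SIP honeypot listening on {BIND[0]}:{BIND[1]}")

while True:
    data, addr = sock.recvfrom(65535)
    message = data.decode(errors="ignore")
    if message.startswith("INVITE"):
        print(f"INVITE from {addr[0]} -> 403 Forbidden")
        sock.sendto(build_403(message).encode(), addr)
```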

I have to admit, the results were fairly surprising:

  1. A total of 2,000 attacks were registered on the honeypot server
  2. The 2 dominant fraud destinations were the Palestinian Authority and the UK
  3. All attacks originated from only 5 distinct IP addresses

Are you wondering what the actual numbers are? Here is the summary:

Dialed number | 185.40.4.101 | 185.62.38.222 | 195.154.181.149 | 209.133.210.122 | 35.166.87.209 | Grand Total
441224928354  | 19           |               |                 |                 |               | 19
441873770007  |              |               |                 | 204             |               | 204
76264259990   |              |               | 1               |                 |               | 1
17786514103   |              |               |                 |                 | 2             | 2
972592315527  |              | 1774          |                 |                 |               | 1774
Grand Total   | 19           | 1774          | 1               | 204             | 2             | 2000

As you can see, the number 972592315527 was dialed 1,774 times from a single IP address – 185.62.38.222. I can only assume this is a botnet of some sort, but the mix of IP addresses intrigued me. So, a fast analysis revealed the following:

Amsterdam? I wonder if it's a coffee shop or something. The thing that also intrigued me was the phone number – why would the bot keep hitting the same mobile phone number? I couldn't find any documentation of this number anywhere. Also, the 97259 prefix automatically suggests a mobile number in the PA, so my only conclusion is that this is a bot looking for an "IPRN" loophole – which is, again, fraudulent.

So, if this is what happens in 48 hours, you can imagine what happens over a month or a year.
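For what it's worth, the per-destination/per-source pivot above is trivial to reproduce from any capture or CDR export. A sketch assuming a CSV with one row per attempt and `source_ip` / `destination` columns (the file name and column names are mine):

```python
import pandas as pd

# Rebuild the attempts-per-destination / per-source-IP pivot from an export.
attempts = pd.read_csv("honeypot_attempts.csv")

pivot = pd.crosstab(
    attempts["destination"],
    attempts["source_ip"],
    margins=True,
    margins_name="Grand Total",
)
print(pivot)
```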

DISCLAIMER:

The above post contains only partial information, from a specific server in a network of honeypots deployed worldwide. The information is provided as-is, and you may extrapolate or hypothesize what it means as you see fit. I have only raised some points of discussion and interest.

Should you wish to join the lively discussion on HackerNews, please follow this link: https://news.ycombinator.com/item?id=13354693.


Managers! Project managers, sales managers, marketing managers, performance managers – they are all obsessed! Why are they obsessed? Because we made them that way – it's our fault! Since the invention of the electronic spreadsheet, managers have relied on the same tools for making decisions: charts, tables, graphs. Between us, I hate Excel, or for that matter any other "spreadsheet" product. Managers rely on charts to translate the ever more complex world we live in into calculable, simple to understand, dry and boring numbers.
Just to give a rough idea, I have a friend who's the CEO of a high-tech company in the "social media" sector. He knows how to calculate how much every dollar he spends on Google AdWords translates back into sales. He is able to tell me exactly how much a new customer costs him, and he can recite these numbers just like that. When we last met he asked me: "Say, how do you determine your performance on your network? Is there a proper and agreed-upon metric you use?" It got me thinking: I've been using ASR and ACD for years, but have we been using them wrong?

So, the question is: what is the proper way of calculating your ASR and ACD? And is MoS truly a reputable measure for assessing your service quality?

Calculating ACD and Why MoS Is So Biased

ACD stands for Average Call Duration (in most cases), meaning the average duration of answered calls. Normally, ACD is a factor in determining whether the quality of your termination is good – of course, in a very empirical manner only. Ask anyone in the industry and they will say the following: "If the ACD is over 3.5 minutes, your general quality is good. If your ACD is under 1 minute, your quality is degraded or just shitty. Anything in between is a little hard to say." So, in that respect, MoS comes to the rescue. MoS stands for Mean Opinion Score – in general terms, it means: judging from one side of the call, how does that side perceive the general quality of the line. MoS is presented as a float number ranging from 1 to 5, where 1 is the absolute worst quality you can get (to be honest, I've never seen anything worse than 3.2) and 5 represents the best quality you can get (again, I've never seen anything go above 4.6).

So, this means that if our ACD is anywhere between 1 minute and 3.5 minutes, we should consult our MoS to see if the quality is ok or not. But here is a tricky question: where do you monitor the quality? At the client or the server? On the connection into the network, or the connection going out of the network? Too many factors, too many places to check, too much statistical data to analyse – in other words, many graphs, many charts, and no real information provided.

If your statistical information isn't capable of providing you with concise, actionable information, like: "The ACD over the past 15 minutes to Canada has dropped 15 points and is currently at 1.8 minutes per call – get this sorted!", then all the graphs you may have are pointless.
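That kind of actionable alert is easy to produce once you stop staring at charts. A sketch of a rolling per-destination ACD check, where the field names, thresholds and in-memory CDR list are all assumptions:

```python
from datetime import datetime, timedelta, timezone

# cdrs: an iterable of dicts like
#   {"destination": "CA", "answered": True,
#    "duration_sec": 132, "ended_at": datetime(..., tzinfo=timezone.utc)}
WINDOW = timedelta(minutes=15)
ACD_ALERT_MINUTES = 2.0

def acd_last_window(cdrs, destination, now=None):
    """Average call duration (in minutes) of answered calls in the last window."""
    now = now or datetime.now(timezone.utc)
    durations = [
        c["duration_sec"]
        for c in cdrs
        if c["destination"] == destination
        and c["answered"]
        and now - c["ended_at"] <= WINDOW
    ]
    return sum(durations) / len(durations) / 60.0 if durations else None

def check_destination(cdrs, destination):
    acd = acd_last_window(cdrs, destination)
    if acd is not None and acd < ACD_ALERT_MINUTES:
        print(f"ALERT: ACD to {destination} over the last 15 minutes "
              f"is {acd:.1f} minutes per call - get this sorted!")
```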

Calculating ASR and the Release Cause Forest

While ISDN (Q.931) made understanding your release causes fairly simple, VoIP turned that once fairly clear world into a mess. Why is that? Q.931 release causes were very much preset for you at the network layer, while SIP makes it easy for an admin to set up his own release causes. For example, I have a friend who says: "I translate all 500 errors from my providers into a 486 error towards my customers." Why would he do that? Why in god's name would somebody deliberately give his customers a falsified view of their termination quality? Simple: SLAs and commitments. If my commitment to a customer is a 90% success service level, I will make sure that the release causes he sees don't include that many 5XX errors. A SIP 486 isn't an error or an issue – the subscriber is simply busy – what more can you ask for?

As I see it, ASR should be broken down into 3 distinct buckets: SUCCESS, FAILURE and NOS (None Other Specified). NOS is very much similar to the old Q.931 release of "Normal, unspecified" – Release Cause 31. So what goes where exactly?

SUCCESS has only one value in it – ANSWER, or Q.931 Release Cause 16 – Normal Call Clearing.

FAILURE will include anything in the range of 5XX errors: “Server failure”, “Congestion”, etc.

NOS will include the following: “No Answer”, “Busy (486)”, “Cancel (487)”, “Number not found (404)”, etc

Each one of these should get a proper percentage number. You will be amazed at your results. We've implemented such a methodology for several of our customers who were complaining that all their routes were performing badly. We were amazed to find out that their routes had 40% success, 15% failure and 45% NOS. Are we done? Not even close.
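A sketch of that three-way split, assuming you have the final SIP response code of every call attempt. The mapping below is an illustration, not an exhaustive one:

```python
from collections import Counter

def classify(final_code: int) -> str:
    if final_code == 200:              # answered - the Q.931 cause 16 equivalent
        return "SUCCESS"
    if 500 <= final_code <= 599:       # server failure, congestion, ...
        return "FAILURE"
    return "NOS"                       # busy, cancel, not found, no answer, ...

def asr_breakdown(final_codes):
    """Return the SUCCESS / FAILURE / NOS split as percentages."""
    counts = Counter(classify(code) for code in final_codes)
    total = sum(counts.values()) or 1
    return {bucket: 100.0 * counts.get(bucket, 0) / total
            for bucket in ("SUCCESS", "FAILURE", "NOS")}

# Example: ten attempts with mixed outcomes
print(asr_breakdown([200, 486, 503, 487, 200, 404, 500, 200, 480, 600]))
# -> {'SUCCESS': 30.0, 'FAILURE': 20.0, 'NOS': 50.0}
```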

The NOS Drill Down

Now, NOS should be drilled down further – but that analysis should not be part of the general ASR calculation. We should now re-calculate our NOS according to the following grouping (sketched in code after the group definitions):

“BUSY GROUP” – Will include the number of busy release codes examined

“CANCEL GROUP” – Will include the number of cancelled calls examined

“NOT FOUND” – Will include any situation where the number wasn’t found (short number, ported, wrong dialing code, etc)

“ALL OTHERS” – Anything that doesn’t fall into the above categories
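Continuing the ASR sketch from above, the NOS bucket can be re-grouped the same way. The response-code sets are, again, an illustration rather than a definitive mapping:

```python
# Drill-down of the NOS bucket into the groups described above.
NOS_GROUPS = {
    "BUSY GROUP":   {486, 600},         # busy here / busy everywhere
    "CANCEL GROUP": {487},              # request terminated (caller cancelled)
    "NOT FOUND":    {404, 484, 604},    # not found / address incomplete
}

def nos_drilldown(nos_codes):
    """Percentage of each NOS group; anything unmatched falls into ALL OTHERS."""
    counts = {group: 0 for group in NOS_GROUPS}
    counts["ALL OTHERS"] = 0
    for code in nos_codes:
        for group, codes in NOS_GROUPS.items():
            if code in codes:
                counts[group] += 1
                break
        else:
            counts["ALL OTHERS"] += 1
    total = sum(counts.values()) or 1
    return {group: round(100.0 * n / total, 1) for group, n in counts.items()}

print(nos_drilldown([486, 487, 404, 480, 486, 600]))
# -> {'BUSY GROUP': 50.0, 'CANCEL GROUP': 16.7, 'NOT FOUND': 16.7, 'ALL OTHERS': 16.7}
```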

This drill down can rapidly show any of the below scenarios:

  • BUSY GROUP is not proportional – normally indicates a large number of calls to similar destinations on your network. It may point to one of the following issues:
    • It’s holiday season and many people are on the phone – common
    • You have a large number of call center customers, targeting the same locations – common
    • One of your signalling gateways is being attacked – rare
    • One or more of your termination providers is returning the wrong release code – common
  • CANCEL GROUP is not proportional – normally indicates a large number of calls being canceled at the source, either a routed source or a direct source. It may point to one of the following issues:
    • You have severe latency issues in your network and your PDD (Post Dial Delay) has increased – rare
    • Your network is under attack, causing a higher PDD – common
    • You have a customer originating the annoying “Missed Call” dialing methodology – common
    • One of your termination providers has False Answer Supervision due to usage of SIM gateways – common when dialing Africa
  • NOT FOUND GROUP is not proportional – normally indicates a large number of calls being rejected by your carriers. It may point to one of the following issues:
    • One of your call center customers is using a shitty data list to generate calls – common
    • One of your call center customers is trying to phish numbers – common
    • One of your signalling gateways is under attack and you are currently being scanned – common
    • One of your upstream carriers is returning the wrong release code for error 503 – common

So, now the ball is in the hands of the tech teams to investigate the issue and understand its source. The most dangerous issues are the ones where your upstream carrier changes release causes, as these are the most problematic to analyse. If you do find a carrier that does this – just drop them completely. Don't complain; just pay them their dues and walk away. Don't expect to get your money's worth out of them – the chances of that are very slim.