Service Assurance (SA) for telecommunication networks is defined as the set of processes and policies in place to verify that network services meet predefined Service-Level Agreements (SLAs). Service assurance is typically enforced by using one or more network monitoring tools to continuously run checks and tests against network services and applications. Service assurance tools are often referred to as “tests and measurements”, “active monitoring”, “network performance monitoring”, or simply “network assurance”.
Service-level agreements are established between a service provider and a subscriber. Organizations that outsource network connectivity services typically publish requests for bids that define the services needed. Almost all of the requests for bids that I have seen included the following SLAs in the deliverables section:
- Latency – Also called network delay, latency is the time that it takes for one bit of data to travel across the network from one point to another. Generally, the latency SLA is based on the distance covered by the link, or connection, in place (e.g. metropolitan, regional, national, or international/intercontinental).
- Packet Loss – Packet loss is the percentage of packets that arrive malformed, or do not arrive at all, at the destination. Generally, the packet loss SLAs that I have seen are less than or equal to 0.1%.
- Jitter – Jitter is the variation in the delay of received packets. Generally, the jitter SLAs that I have seen range from 2 ms up to 10 ms for short periods of time.
- Throughput – Throughput is the amount of data successfully transferred between a source and a destination host. This is based on the service and bandwidth purchased.
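To make these four definitions concrete, here is a minimal Python sketch that derives them from the raw results of one test run. The function name `summarize_probes` and the sample values are made up for illustration, and computing jitter as the mean absolute difference between consecutive delays is only one simple estimator; tools such as iperf use their own calculation.

```python
from statistics import mean

def summarize_probes(delays_ms, bytes_received, duration_s):
    """Summarize one test run.

    delays_ms: one entry per probe sent; None marks a lost probe.
    bytes_received, duration_s: payload delivered during the test window.
    """
    received = [d for d in delays_ms if d is not None]
    # Packet loss: share of probes that never arrived.
    loss_pct = 100.0 * (len(delays_ms) - len(received)) / len(delays_ms)
    # Latency: average delay of the probes that did arrive.
    latency_ms = mean(received) if received else None
    # Jitter (assumed estimator): mean absolute difference between
    # consecutive received delays.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter_ms = mean(diffs) if diffs else 0.0
    # Throughput: bits delivered per second, expressed in Mbps.
    throughput_mbps = bytes_received * 8 / duration_s / 1e6
    return {
        "latency_ms": latency_ms,
        "loss_pct": loss_pct,
        "jitter_ms": jitter_ms,
        "throughput_mbps": throughput_mbps,
    }

# Hypothetical sample: five probes, the third one lost.
delays = [20.0, 22.0, None, 21.0, 25.0]
summary = summarize_probes(delays, bytes_received=12_500_000, duration_s=10)
print(summary)
```

In practice the inputs would come from a measurement tool rather than a hard-coded list; the point is only to show how each SLA metric maps to a simple calculation.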
Other important factors that should be considered when defining SLAs to be enforced by a network assurance solution are:
- Service Availability – Availability is defined as the percentage of time during which a service can be used for the purpose that it was originally designed and built for.
- Mean Time To Detect (MTTD) – The average time that it takes to detect a failure, whether hardware or software. Oftentimes this is defined in minutes.
- Mean Time To Respond (MTTRSP) – The average time that it takes for the service provider to dispatch a repair resource. Oftentimes this is defined in minutes.
- Mean Time To Repair (MTTR) – The average time that it takes the service provider to repair the fault. Oftentimes this is defined in minutes.
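As a quick worked example of the availability definition, the sketch below computes availability for a reporting period from a list of outage durations. The function name, the 30-day period, the outage values, and the 99.9% target are all assumptions chosen for illustration, not figures from any real contract.

```python
def availability_pct(period_minutes, outage_minutes):
    """Percentage of the period during which the service was usable."""
    downtime = sum(outage_minutes)
    return 100.0 * (period_minutes - downtime) / period_minutes

# Hypothetical 30-day month with two outages of 30 and 13 minutes.
period = 30 * 24 * 60            # 43,200 minutes in the period
value = availability_pct(period, [30, 13])
target = 99.9                    # assumed SLA target
print(f"availability = {value:.3f}% (meets {target}%: {value >= target})")
```

With these numbers, a 99.9% target leaves a budget of about 43.2 minutes of downtime in the month, so 43 minutes of outages only just meets it.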
Network Service Assurance
If you have never implemented network service assurance, where do you start? The open source community has built many tools that can be used to run basic network assurance tests. In the following table, I list some of the options available to enforce the SLAs listed in the previous section.
|Service Level Agreement|Tool|
|---|---|
|Latency|owamp (one-way ping), ping (ICMP echo request/reply)|
|Packet Loss|ping (ICMP echo request/reply)|
|Jitter|iperf UDP mode|
I have included links to owamp and iperf. For ping, almost any operating system nowadays includes a ping utility, so you don’t really need to install anything specific. Just open your terminal and type ping at the prompt.
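Once you have ping output, checking it against an SLA can be scripted. The following Python sketch parses the summary lines that the Linux (iputils) ping prints and compares them with thresholds. The sample text is illustrative rather than a real measurement, other operating systems format the summary differently, and the 50 ms latency threshold is an arbitrary assumption (the 0.1% loss threshold is the value mentioned earlier).

```python
import re

# Illustrative summary in the Linux (iputils) ping format; not a real
# measurement. Other platforms print these lines differently.
SAMPLE = """\
10 packets transmitted, 10 received, 0% packet loss, time 9013ms
rtt min/avg/max/mdev = 11.204/12.345/14.012/0.876 ms
"""

def parse_ping_summary(text):
    """Extract packet loss (%) and average round-trip time (ms)."""
    loss = re.search(r"([\d.]+)% packet loss", text)
    rtt = re.search(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms", text)
    if not loss or not rtt:
        raise ValueError("unrecognized ping summary format")
    return {
        "loss_pct": float(loss.group(1)),
        "avg_rtt_ms": float(rtt.group(2)),  # second field is the average
    }

result = parse_ping_summary(SAMPLE)
# Assumed thresholds: 0.1% loss and 50 ms average round-trip time.
meets_sla = result["loss_pct"] <= 0.1 and result["avg_rtt_ms"] <= 50.0
print(result, meets_sla)
```

Keep in mind that ping reports round-trip time, so a one-way latency SLA would need a tool such as owamp instead.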
To enforce other parameters such as service availability, MTTD, MTTRSP, and MTTR, you can use a network monitoring tool in conjunction with a ticketing system. Let’s take the example of NetBeez 🙂
NetBeez Service Assurance
NetBeez calculates the availability of each remote monitoring agent and reports that on the dashboard and via API call. Availability is based on the time period selected by the user. This value corresponds to the remote site’s availability where the agent is installed. It’s calculated as the percentage of time during which the agent was reachable from the NetBeez server.
For the MTTRSP parameter, NetBeez provides timestamps on notifications of when an incident is detected by the system and when the incident is acknowledged by the operator. In the screenshot below, you can see that the incident was detected at 7:11PM and acknowledged by an operator (nbadmin) at 7:14PM, for a total MTTRSP of 3 minutes.
To calculate the MTTR, you can take the time difference between when the incident is detected and when it is cleared (resolved). In the screenshot below, you can see that the incident was raised (opened) at 6:29PM and cleared (closed) at 6:42PM, for a total of 13 minutes.
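The same arithmetic can be scripted if you export incident timestamps. This is a minimal Python sketch using the times quoted above; the helper `minutes_between` is hypothetical, it assumes both timestamps fall on the same day, and it is not how NetBeez itself computes these values.

```python
from datetime import datetime

FMT = "%I:%M%p"  # 12-hour clock with AM/PM, e.g. "7:11PM"

def minutes_between(start, end):
    """Minutes elapsed between two same-day timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60

# Detected at 7:11PM, acknowledged at 7:14PM.
time_to_respond = minutes_between("7:11PM", "7:14PM")
# Raised at 6:29PM, cleared at 6:42PM.
time_to_repair = minutes_between("6:29PM", "6:42PM")
print(time_to_respond, time_to_repair)
```

Each value here is for a single incident; averaging the per-incident values over a reporting period gives the "mean" in MTTRSP and MTTR.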
Service assurance for network services is a very important function of a network monitoring tool. By defining and enforcing network services’ service-level agreements, a network operations team is able to verify that a good end-user experience is being offered to the network users and subscribers.