Service Assurance (SA) for telecommunication networks is a set of processes and policies in place to verify that network services meet predefined Service-Level Agreements (SLA). Service assurance plays a critical role in network management. This is mostly due to the fact that today’s businesses rely on digital services for communication, work, and productivity. Ultimately, service assurance enforce a satisfying user experience to digital services. This article will cover the key metrics and parameters of service assurance. It will also recommend strategies that will help guaranteeing a solid connectivity and service quality.
Service-Level Agreements
Service Level Agreements define a mutual understanding of performance between a service provider and a subscriber. Generally SLA are enforced by standalone contracts, or included in a Master Services Agreement. In both cases, most agreements will include the following sections:
- The type of service rendered to the customer
- The SLA metrics that will be monitored and enforced
- The SLA parameters that specify how detected issues will be handled
- The escalation policies and repair time in the case of an outage
SLA Metrics
Organizations that outsource network connectivity services typically publish requests for bids that define the services needed. The requests for bids include in the deliverables section all the parameters and values that must be met. We call these parameters SLA metrics.
- Latency – Also called Network Delay, which corresponds to the time that it takes for one bit of data to travel across the network from one point to another. Generally, latency will be based on the distance of the link, or connection, in place (e.g. metropolitan, regional, national, or international/intercontinental).
- Packet Loss – Packet Loss corresponds to the percentage of packets that are received malformed or not received at all at destination. Generally, the number of packet loss related SLAs that I have seen are less or equal than 0.1%.
- Jitter – Jitter is the variation in the delay of received packets. Generally jitter SLAs that I have seen vary between 2 ms. and up to 10 ms. for short periods of time.
- Throughput – Throughput defines the amount of data successfully transferred between a source and a destination host. This is based on the service and bandwidth purchased.
SLA Parameters
Other important metrics included in Service Level Agreements are:
- Service Availability – Availability is defined as the percentage of time during which a service can be used for the purpose that it was originally designed and built for.
- Mean Time to Detect (MTTD) – The time that it takes to detect a failure, whether it is either hardware or software. Oftentimes expressed in minutes.
- Mean Time To Respond (MTTRSP) – The time that it takes for the service provider to dispatch a repair resource. Oftentimes expressed in minutes.
- Mean Time To Repair (MTTR) – The time that it takes to the service provider to repair the fault. Oftentimes expressed in minutes.
Network Monitoring and Service Assurance
Network monitoring tools help ensure that SLAs are met. These tools constantly check and verify that the network is conform to the original specifications. Should a metric not be conform, the network monitoring tools will trigger an alert and issue a notification to the service provider. Network monitoring tools for service assurance sometimes are also referred to as: “tests and measurements”, “active monitoring”, “network performance monitoring”, or simply “network assurance”.
There are commonly available tools that network administrators can run for basic service assurance. In the following table, I list some options available to enforce the SLAs listed in the previous paragraph.
Service Level Agreement | Option |
---|---|
Latency | owamp (one-way ping) |
Round-Trip Time | ping (ICMP echo request/reply) |
Packet Loss | ping (ICMP echo request/reply) |
Jitter | iperf UDP mode |
Throughput | iperf |
Download/Upload speed | NDT or speed test |
To enforce other parameters such as service availability, MTTD, MTTRSP, and MTTR, you can use a network monitoring tool in conjunction with a ticketing system. Let’s take the example of NetBeez.
NetBeez and Service Assurance
NetBeez calculates the availability of each remote monitoring agent and reports that on the dashboard and via API call. Availability is based on the time period selected by the user. This value corresponds to the remote site’s availability where the agent is installed. It’s calculated as the percentage of time during which the agent was reachable from the NetBeez server.
For the MTTRSP parameters, NetBeez provides timestamps that indicate when an incident is detected by the system and when the incident is acknowledged by the operator. In the below screenshot, you can see that the incident was detected at 7:11PM and acknowledged by an operator (nbadmin) at 7:14PM, for a total MTTRSP of 3 minutes.
We calculate the MTTR as the time difference between the incident’s open and closed times. In the below screenshot, you can see that the incident started at 6:29PM and ended at 6:42PM. The TTR was 13 minutes.
Conclusion
Service assurance for network services is a very important function of a network monitoring tool. By defining and enforcing network services’ service-level agreements, a network operations team is able to verify that a good end-user experience is being offered to the network users and subscribers.