Impact of Packet Loss, Jitter, and Latency on VoIP

The challenging part of VoIP traffic is that it has to compete with all other traffic and still be delivered in real time to achieve good audio quality. With email or file downloads, if a packet is received out of order or delayed by a few seconds, the user probably won’t even notice. In contrast, VoIP packets have to arrive in real time for the conversation to be intelligible.

There has been a great deal of research on how to encode and route voice traffic through IP networks. On the encoding side, there are several widely used algorithms that trade off audio quality against required bandwidth. On the routing side, QoS marking can improve the deliverability and timing of voice packets. However, all of us have experienced poor VoIP quality, because latency, jitter, and packet loss can never be completely eliminated from real-world networks. Let’s see how these three factors affect VoIP quality and how we can measure and monitor the audio quality we offer to our employees.

Latency and VoIP

Audio latency consists of two parts: the time it takes to encode the audio and the travel time of the packet. Latency itself doesn’t affect the quality of the delivered audio, but it does affect the interaction between the two end users. At 100 ms of latency, users start talking over each other, and at 300 ms, the conversation becomes impossible to follow.

Jitter and VoIP

Jitter is the variation in the delay of received packets. High jitter results in choppy voice or temporary glitches. VoIP devices implement jitter buffering algorithms to compensate for packets that arrive with highly variable timing, and packets can even get dropped when they arrive with excessive delay.
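One common way to turn this definition into a number is the interarrival jitter estimator from RFC 3550 (the RTP specification), which smooths the change in transit time between consecutive packets. The sketch below assumes that estimator; the function name and the sample timestamps are illustrative, and a particular VoIP device or monitoring tool may compute jitter differently.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate in the style of RFC 3550, section 6.4.1.

    Both arguments are per-packet timestamps in milliseconds, in arrival order.
    """
    jitter = 0.0
    prev_transit = None
    for sent, received in zip(send_times_ms, recv_times_ms):
        transit = received - sent  # one-way delay of this packet
        if prev_transit is not None:
            # Change in delay compared with the previous packet.
            d = abs(transit - prev_transit)
            # Exponential smoothing with gain 1/16, as in the RFC.
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter


# Hypothetical stream: packets sent every 20 ms, delay alternating between 40 and 60 ms.
send = [0, 20, 40, 60, 80]
recv = [40, 80, 80, 120, 120]
print(round(interarrival_jitter(send, recv), 2))
```

With a perfectly constant delay the estimate stays at zero, no matter how large the delay itself is, which is why latency and jitter are measured separately.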

Packet loss and VoIP

VoIP audio is typically carried over UDP, which does not retransmit lost packets, so packets may never arrive at the destination, or may be discarded if they arrive too late or contain errors. This results in missing audio information at the destination.

The industry has adopted the Mean Opinion Score (MOS) as the universal metric to measure and classify the quality of conversations carried over a network. As the name suggests, it is based on the opinion of the user and ranges from 1.0 to 5.0 with the following classifications:

MOS  Quality    Impairment
5    Excellent  Imperceptible
4    Good       Perceptible but not annoying
3    Fair       Slightly annoying
2    Poor       Annoying
1    Bad        Very annoying

Typically, the highest MOS that can be achieved with the G.711 codec is 4.5. The cutoff for a tolerable call is around 2.5. Ideally, the MOS is obtained by asking the participants to rate the conversation. However, this is not practical, so call quality is instead estimated from the network’s latency, jitter, and packet loss. The most popular method is based on the E-model, which calculates a rating factor, R, from which the MOS is then derived.

An R-value of 93.2 corresponds to the maximum MOS score. Depending on latency, jitter, and packet loss, we deduct from 93.2. This may sound like a magic number; its derivation is described in the E-model paper.

Effective Latency

Latency and jitter are related and get combined into a metric called effective latency, which is measured in milliseconds. The calculation is as follows:

effective_latency = latency + 2*jitter + 10.0

We double the effect of jitter because its impact on voice quality is high, and we add a constant of 10.0 ms to account for the delay from the codecs.
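As a quick illustration, the formula can be written as a small Python function. The function and parameter names are chosen here for readability and are not taken from any particular tool; the 10.0 ms codec constant is exposed as a default argument only to make the assumption visible.

```python
def effective_latency(latency_ms, jitter_ms, codec_delay_ms=10.0):
    """Combine latency and jitter into effective latency (all values in ms)."""
    # Jitter counts twice; the constant accounts for codec delay.
    return latency_ms + 2.0 * jitter_ms + codec_delay_ms


# Example: 80 ms of latency with 15 ms of jitter.
print(effective_latency(80.0, 15.0))  # 120.0
```

In this example, 15 ms of jitter adds as much to the result as 30 ms of extra latency would.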

R is reduced based on effective latency as follows:

For effective_latency < 160.0 ms:

R = 93.2 - (effective_latency)/40.0

For effective_latency >= 160.0 ms:

R = 93.2 - (effective_latency - 120.0)/10.0

If the effective latency is less than 160.0 ms, the overall impact on voice quality is moderate. For larger values, the voice quality drops more significantly, which is why R is penalized more.
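The two cases can be sketched as a single piecewise function. This is a minimal illustration of the formulas above, with a hypothetical function name; the sample values are arbitrary.

```python
def r_from_effective_latency(effective_latency_ms):
    """Rating factor R after the latency/jitter deduction, before packet loss."""
    if effective_latency_ms < 160.0:
        # Gentle slope: 1 point of R per 40 ms.
        return 93.2 - effective_latency_ms / 40.0
    # Steep slope: 1 point of R per 10 ms.
    return 93.2 - (effective_latency_ms - 120.0) / 10.0


for eff in (120.0, 160.0, 300.0):
    print(eff, round(r_from_effective_latency(eff), 2))
```

Both branches give R = 89.2 at exactly 160 ms, so the curve has no jump at the boundary; only the slope changes.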

The packet loss (in percentage points) is taken into consideration as follows:

R = R - 2.5 * packet_loss

Finally, the MOS score is calculated with the following formula:

For R <= 0:

MOS = 1.0

For 0 < R < 100.0:

MOS = 1 + 0.035*R + 0.000007*R*(R-60)*(100-R)

For R >= 100.0:

MOS = 4.5
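Putting the steps together, a minimal Python sketch of the whole estimate might look like the following. The function name is hypothetical, the inputs are assumed to be in milliseconds and percentage points, and R = 0 is treated as part of the lowest case (the middle formula also gives 1.0 there, so the choice does not change the result).

```python
def estimate_mos(latency_ms, jitter_ms, packet_loss_pct):
    """Estimate MOS from latency (ms), jitter (ms), and packet loss (%)."""
    # Step 1: effective latency.
    effective_latency = latency_ms + 2.0 * jitter_ms + 10.0

    # Step 2: deduct from R based on effective latency.
    if effective_latency < 160.0:
        r = 93.2 - effective_latency / 40.0
    else:
        r = 93.2 - (effective_latency - 120.0) / 10.0

    # Step 3: deduct 2.5 points of R per percentage point of packet loss.
    r -= 2.5 * packet_loss_pct

    # Step 4: convert R to MOS.
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + 0.000007 * r * (r - 60.0) * (100.0 - r)


# A healthy LAN-like path and a more stressed WAN-like path (made-up values).
print(round(estimate_mos(20.0, 5.0, 0.0), 2))
print(round(estimate_mos(150.0, 30.0, 1.0), 2))
```

Because R never starts above 93.2 in this calculation, the R >= 100 branch is never reached here; it is kept only to mirror the formula as stated.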

It’s interesting to see how the three parameters affect MOS. If you do the math, you will see that for a packet loss larger than 20%, the MOS is lower than 2.5. Below we see the MOS for packet loss 0%, 10% and 15%, latency 0 to 1000 ms and jitter 0 to 500 ms.



For each 200 ms increase in latency or 100 ms increase in jitter, the MOS drops by roughly one point. Overall, for latency below 200 ms and jitter below 100 ms, the MOS stays above 3 points, which is considered acceptable and means you can have a comfortable conversation. Jitter being twice as degrading as latency follows directly from the effective latency calculation, where jitter is multiplied by two.
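To explore these trends numerically, one option is to sweep the inputs and print a small table. The sketch below re-implements the formulas from this article in compact form so that it runs on its own; the function name, the 200 ms step, and the zero-jitter assumption are choices made for this illustration only.

```python
def mos(latency_ms, jitter_ms, loss_pct):
    """Compact version of the latency/jitter/loss to MOS estimate."""
    eff = latency_ms + 2.0 * jitter_ms + 10.0
    deduction = eff / 40.0 if eff < 160.0 else (eff - 120.0) / 10.0
    r = 93.2 - deduction - 2.5 * loss_pct
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + 0.000007 * r * (r - 60.0) * (100.0 - r)


losses = (0.0, 10.0, 15.0)  # packet loss in percent
print("latency_ms " + " ".join(f"loss={int(p)}%".rjust(9) for p in losses))
for latency in range(0, 801, 200):
    # Jitter is held at 0 ms; pass a non-zero value to see its effect.
    row = " ".join(f"{mos(latency, 0.0, p):9.2f}" for p in losses)
    print(f"{latency:10d} {row}")
```

Since jitter enters the calculation with a factor of two, calling `mos(0.0, 200.0, p)` yields the same value as `mos(400.0, 0.0, p)`.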

VoIP systems can calculate the MOS during a call and report it. This gives you the ability to detect poor VoIP performance, but the catch is that MOS is measured only while users are on a call and generating the data needed to calculate it.

At NetBeez, we follow a proactive approach to measure MOS: we generate traffic between two or more agents and measure latency, jitter, and packet loss. This enables early detection of VoIP quality degradation without waiting for a user to place a call and be affected. Below you see the historical MOS data for the path between our San Jose and Pittsburgh offices, which runs over our cloud VPN. Every five minutes we run a 30-second G.711 stream and measure latency, jitter, and packet loss. From those measurements we derive the MOS.
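Once periodic MOS samples like these are available, degradation can be flagged automatically. The following is a hypothetical sketch, not a description of how NetBeez implements alerting: it assumes a plain list of MOS values taken at regular intervals, uses the 2.5 tolerability cutoff mentioned earlier as the default threshold, and requires a minimum number of consecutive low samples to avoid reacting to a single bad measurement.

```python
def degraded_runs(mos_samples, threshold=2.5, min_consecutive=2):
    """Return (start_index, end_index) pairs for runs of low-MOS samples.

    A run is reported only if at least `min_consecutive` samples in a row
    fall below `threshold`.
    """
    runs = []
    start = None  # index where the current low-MOS run began, if any
    for i, value in enumerate(mos_samples):
        if value < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_consecutive:
                runs.append((start, i - 1))
            start = None
    # Close a run that is still open at the end of the series.
    if start is not None and len(mos_samples) - start >= min_consecutive:
        runs.append((start, len(mos_samples) - 1))
    return runs


# Made-up samples, one per five-minute test.
samples = [4.3, 4.2, 2.1, 2.3, 4.1, 2.4, 4.3, 1.9, 2.0, 2.2]
print(degraded_runs(samples))  # [(2, 3), (7, 9)]
```

The isolated dip at index 5 is ignored because it lasts for only one sample.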