How to Leverage Latency Testing and Long Term Trend Collection

In the ever-present quest for uptime and quality of experience, one of the most overlooked and underestimated attributes is latency. The reason for this is not entirely obscure: latency is very hard for the average user to quantify. In the normal melee of problem reports flowing from users to engineers or a NOC, latency trouble may surface as vague complaints, soft failures, or a general sense of poor user experience. In many cases, users of the network and associated services may not report the issues at all – and this is the real problem.

The common misconception of the network being an unreliable entity can make for poor user experience and potentially lead to hasty reactions and uninformed decisions. Where this misconception manifests itself can vary, but one of the most noticeable places where latency issues become apparent is in voice services.

Voice services are typically UDP based, meaning there is no mechanism for reliable delivery; they simply assume there is enough headroom in a given path to deliver their payload. We all know that this assumption is just that, an assumption. As network engineers, we can therefore employ quality of service (QoS) and packet classification to ensure that the important bits are delivered and that the less important bits receive a lower priority.
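As an illustration of packet classification at the host, the sketch below marks a UDP socket's traffic with the DSCP EF (Expedited Forwarding) code point commonly used for voice. The destination address is a placeholder, and the marking only matters if the routers along the path are configured to honor it:

```python
import socket

DSCP_EF = 46            # Expedited Forwarding code point, typical for voice
TOS_EF = DSCP_EF << 2   # DSCP occupies the upper 6 bits of the ToS byte (0xB8)

# Create a UDP socket and set the ToS byte so outgoing datagrams
# carry DSCP EF in the IP header.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_EF)

# Datagrams sent from this socket are now marked; for example
# (192.0.2.10 and port 5004 are placeholders):
# sock.sendto(b"voice payload", ("192.0.2.10", 5004))
```

Setting the mark on the host classifies the packets; the QoS policy itself (queuing, policing, trust boundaries) still lives in the network configuration.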

An area where this process can be less straightforward is large data transfers. Network engineers may not see this type of behavior on their networks regularly, but it is not uncommon and, in certain environments, can be quite complicated. Hosts with NICs as fast as, or even significantly faster than, a given enterprise backbone or upstream transit link can, when performing transfers of large data sets such as offsite backup files, parallel stream transfers, or other long-lived flows, consume all of the available headroom. If these flows share infrastructure with voice and business traffic without appropriately configured QoS, they can wreak havoc on business processes. A multitude of options are available for on-demand testing of one-way and two-way latency; tools implementing OWAMP and TWAMP can perform very granular, fine-tuned latency testing and are wonderful options.
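To make concrete what a two-way probe actually measures, here is a minimal UDP round-trip timer. This is a rough sketch only, assuming an echo responder is already listening at the given host and port; real TWAMP adds timestamps, sequence numbers, and error estimates that this omits:

```python
import socket
import time

def measure_rtt(host, port, count=5, timeout=1.0):
    """Send small UDP probes to an echo responder and time the round trip.
    Returns a list of RTTs in milliseconds and the loss percentage."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    rtts = []
    for seq in range(count):
        payload = seq.to_bytes(4, "big")
        start = time.monotonic()
        sock.sendto(payload, (host, port))
        try:
            data, _ = sock.recvfrom(512)
        except socket.timeout:
            continue  # no echo within the timeout: count it as loss
        if data == payload:
            rtts.append((time.monotonic() - start) * 1000.0)
    sock.close()
    loss = 100.0 * (count - len(rtts)) / count
    return rtts, loss
```

A probe like this gives round-trip numbers only; separating outbound from inbound delay, as the twampy output below does, requires cooperating timestamps on both ends.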

For example, on two hosts, one could run a two-way latency test using twampy, a Python-based toolkit.

On the remote host, we download and run the receiver, after opening all of the appropriate ports and host-based firewall rules:

$ ./receiver

On the local host we do the same but run the sender:

$ ./sender


Direction     Min       Max       Avg       Jitter    Loss
Outbound:     1.80ms    47.63ms   11.02ms   8.30ms    0.0%
Inbound:      16.90ms   24.32ms   18.47ms   1.28ms    0.0%
Roundtrip:    19.10ms   65.03ms   29.49ms   8.95ms    0.0%

Jitter Algorithm [RFC1889]
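The jitter column is computed with the smoothed interarrival jitter estimator from RFC 1889 (carried forward into RFC 3550): J = J + (|D| - J) / 16, where D is the difference between consecutive transit times. A minimal sketch of that calculation:

```python
def rfc1889_jitter(delays_ms):
    """Smoothed interarrival jitter per RFC 1889 (RFC 3550, section 6.4.1).
    Takes a sequence of per-packet transit times in ms and folds the
    absolute difference of each consecutive pair into a running estimate,
    with a gain of 1/16 to dampen transient spikes."""
    jitter = 0.0
    for prev, cur in zip(delays_ms, delays_ms[1:]):
        d = abs(cur - prev)
        jitter += (d - jitter) / 16.0
    return jitter
```

The 1/16 gain is why a single delay spike moves the reported jitter only modestly, while sustained variation pushes it up steadily.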


Clearly, the on-demand data is useful, especially when triaging issues, but tracking this type of information over time can be immeasurably handy in answering the ageless networking question, "What changed?" Many engineers love the command line and enjoy the thrill of the hunt when it comes to rooting out problems; however, there are often larger projects to attend to, and long-term analysis is well suited to tracking down deviations from normal network behavior. Much like creating a baseline for any other network attribute, tracking your latency statistics over time makes that job quick work. A working knowledge of the network, particularly its latency statistics over time, is one of the often underrated but significant tools in the toolbox that is a larger network monitoring strategy.
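As a sketch of the baselining idea, the hypothetical class below keeps a rolling window of periodic RTT samples and flags measurements that deviate sharply from the established baseline. The window size and the 3-sigma threshold are arbitrary choices for illustration; a production setup would persist samples to a time-series database and alert from there:

```python
import statistics
from collections import deque

class LatencyBaseline:
    """Rolling latency baseline with simple anomaly flagging."""

    def __init__(self, window=288):       # e.g. one day of 5-minute samples
        self.samples = deque(maxlen=window)

    def add(self, rtt_ms):
        """Record a sample; return True if it deviates more than three
        standard deviations from the current baseline."""
        anomalous = False
        if len(self.samples) >= 30:       # wait for some history first
            mean = statistics.fmean(self.samples)
            stdev = statistics.pstdev(self.samples)
            if stdev > 0 and abs(rtt_ms - mean) > 3 * stdev:
                anomalous = True
        self.samples.append(rtt_ms)
        return anomalous
```

Even a crude mechanism like this answers "what changed?" far faster than an ad hoc hunt, because the deviation is measured against what the network normally does rather than against memory.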

About the Author

Nick Buraglio has been involved in the networking industry in varying roles since 1997 and currently works on the network planning and architecture team for a major international research network provider. Prior to his current role, Nick spent time working on network and security architecture in the higher education and research facility space, building regional networks for high-performance computing. Nick has also led network engineering teams at early regional broadband internet providers and provided strategy and guidance to many service providers around the world.

Nick specializes in service provider backbone networks but also dabbles in security, broadband technologies, instrumentation, monitoring, data centers, Linux, and everything in between. His personal blog is full of esoteric ramblings and strong opinions.
