How to troubleshoot and validate QoS

Importance of QoS

TL;DR: In this article I want to provide a simple way to troubleshoot QoS.

Not all networks are created equal. Every business has a different set of needs and requirements. How do you ensure your critical services and applications receive priority over everything else happening on your network? Historically, the answer was to add bandwidth. That becomes a challenge in a network where some of your clientele, such as students, will take every bit of bandwidth you throw at them. Quality of service, or QoS, helps to provide a defined level of service across your network. A full end-to-end QoS solution provides a higher level of service to your critical applications and also provides value by keeping your circuits more affordable.

Our entire WAN is serviced by private MPLS services provided by two carriers. Since we serve both employees, such as support staff and faculty, and a general student population, we looked for a solution that would help us validate and test our end-to-end QoS.

QoS Configuration

Our QoS configuration is pretty straightforward. We provide a priority class for Voice, and then define additional classes that we map traffic to and assign a bandwidth percent, or remaining percent, depending on the carrier. Both MPLS carriers honor our markings to ensure a consistent level of service.

policy-map WAN-OUT-CHILD
 class COS1-PRIORITY
  priority percent 30
 class COS2-VIDEO
  bandwidth remaining percent 30
  random-detect dscp-based
 class COS2-DATA
  bandwidth remaining percent 30
  random-detect dscp-based
 class COS3-DATA
  bandwidth remaining percent 20
  random-detect dscp-based
 class COS5-SCAVENGER
  bandwidth remaining percent 5
  random-detect dscp-based
 class class-default
  bandwidth remaining percent 15
  random-detect dscp-based
policy-map WAN-OUT-PARENT
 class class-default
  shape average 8500000
  service-policy WAN-OUT-CHILD
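As a rough sanity check on the policy above, the per-class figures can be worked out from the shape rate. The sketch below is not part of the original configuration; it assumes the "remaining" pool is the shaped rate minus the priority allocation, which agrees with the 1190 kbps that the router reports for COS3-DATA in the output later in this article. The function and variable names are illustrative only.

```python
# Sketch: derive per-class bandwidth figures from the WAN-OUT policy.
# Assumption: remaining bandwidth = shape rate - priority allocation.
SHAPE_BPS = 8_500_000      # shape average 8500000 (WAN-OUT-PARENT)
PRIORITY_PERCENT = 30      # priority percent 30 (COS1-PRIORITY)

# bandwidth remaining percent values from WAN-OUT-CHILD
REMAINING_PERCENT = {
    "COS2-VIDEO": 30,
    "COS2-DATA": 30,
    "COS3-DATA": 20,
    "COS5-SCAVENGER": 5,
    "class-default": 15,
}


def class_rates_kbps(shape_bps, priority_percent, remaining_percent):
    """Return a dict of class name -> bandwidth in kbps (rounded down)."""
    priority_bps = shape_bps * priority_percent // 100
    remaining_bps = shape_bps - priority_bps
    rates = {"COS1-PRIORITY": priority_bps // 1000}
    for name, pct in remaining_percent.items():
        rates[name] = remaining_bps * pct // 100 // 1000
    return rates


RATES = class_rates_kbps(SHAPE_BPS, PRIORITY_PERCENT, REMAINING_PERCENT)
for name, kbps in RATES.items():
    print(f"{name:15s} {kbps:5d} kbps")
```

These numbers are minimum guarantees under congestion, not caps, so a class can exceed its figure whenever other classes leave bandwidth unused.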

On the other end, we mark traffic and configure trust on our L2 switches to ensure consistent marking across our network.

Validate QoS

This QoS configuration has served us well, but the question was always “How do we validate our QoS?”, especially when someone calls the help desk complaining about a particular application such as Citrix, CIFS, or HTTP. Usually, we would begin by looking at graphs from another tool, which show bandwidth utilization per class or DSCP marking. This would help us identify causes such as too many Citrix sessions/terminals at the campus, which degrades Citrix service and starves non-Citrix traffic. A situation like this happens only a few times a year and results in a bandwidth upgrade for the campus.

What if that is not the issue? Next, we tried a file transfer test between two machines. This is never a good test, as you must account for server CPU cycles, disk latency, and so on. We have also used Iperf in the past, but this required installation on a client machine and coordination with our local campus techs to get the machine online and connected. This is where NetBeez became an attractive option.

How to Troubleshoot QoS with NetBeez

To troubleshoot QoS, I will use the Iperf functionality in NetBeez. Each NetBeez agent is a dedicated Iperf client and server.

We have several agents installed at our datacenter, as well as several deployed at our campuses. This is crucial, as we can test the full end-to-end QoS between campuses as well as between campus and datacenter. By default, if you enter only your source agent and your destination agent, a standard Iperf test is launched.

The real power comes from specifying the ‘TOS DSCP/PHB Class’ option.

With this option, we can test a specific DSCP marking/class! We can use the NetBeez output in conjunction with the CLI output to see exactly how our QoS and our circuits are performing. For this, we specify the bandwidth, the DSCP/PHB class, and the test duration.
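If you want to reproduce such a test with a standalone Iperf outside NetBeez, keep in mind that Iperf's `-S`/`--tos` option takes the full TOS byte rather than the DSCP value. The following sketch converts DSCP names to both numbers; the helper names `dscp_value` and `tos_byte` are made up for this example. It relies only on the standard encodings: class selector CSn is 8n, AFxy is 8x + 2y, and the TOS byte carries the DSCP in its upper six bits.

```python
# Sketch: convert DSCP names to DSCP values and TOS bytes.
# Helper names are illustrative, not part of NetBeez or Iperf.

def dscp_value(name):
    """Return the 6-bit DSCP value for names such as 'af21', 'cs2', 'ef'."""
    name = name.strip().lower()
    if name == "ef":
        return 46
    if name == "default":
        return 0
    if len(name) == 3 and name.startswith("cs") and name[2].isdigit():
        selector = int(name[2])
        if 0 <= selector <= 7:
            return 8 * selector
    if len(name) == 4 and name.startswith("af") and name[2:].isdigit():
        af_class, drop_prec = int(name[2]), int(name[3])
        if 1 <= af_class <= 4 and 1 <= drop_prec <= 3:
            return 8 * af_class + 2 * drop_prec
    raise ValueError(f"unknown DSCP name: {name!r}")


def tos_byte(dscp):
    """Shift the DSCP into the upper six bits of the TOS byte."""
    return dscp << 2


# The four markings matched by the COS3-DATA class in this article.
for name in ("cs2", "af21", "af22", "af23"):
    d = dscp_value(name)
    print(f"{name}: dscp={d} tos={tos_byte(d)} (0x{tos_byte(d):02x})")
```

For af21 this gives DSCP 18, which matches the `af21 (18)` shown in the class-map match statement, and a TOS byte of 72 (0x48).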

As you can see below, we have validated 15Mbps via marking af21 in our COS3-DATA class, and the af21 row shows no random or tail drops (the drops counted for the class are on cs2 and af22). We can repeat this process by hitting a DSCP marking in each class to fully validate not only our own end-to-end QoS, but also the carrier QoS.

Class-map: COS3-DATA (match-any)
12538232 packets, 2726764759 bytes
30 second offered rate 14887000 bps, drop rate 6698000 bps
Match: ip dscp cs2 (16) af21 (18) af22 (20) af23 (22)
 12538232 packets, 2726764759 bytes
 30 second rate 14887000 bps
Queueing
queue limit 64 packets
(queue depth/total drops/no-buffer drops) 40/726451/0
(pkts output/bytes output) 11811783/1628660892
bandwidth remaining 20% (1190 kbps)
Exp-weight-constant: 9 (1/512)
Mean queue depth: 41 packets
dscp     Transmitted          Random drop     Tail drop          Minimum Maximum Mark
         pkts/bytes           pkts/bytes      pkts/bytes         thresh  thresh  prob
cs2      11168956/1074076814  11/1056         198/19008          24      40      1/10
af21     5116/554876          0/0             0/0                32      40      1/10
af22     637567/554020262     20693/31286744  705549/1066782798  28      40      1/10
af23     144/8940             0/0             0/0                24      40      1/10
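To compare markings at a glance, the per-DSCP rows above can be reduced to a drop percentage. This is a minimal sketch rather than a general parser for `show policy-map interface`: it handles only the row layout shown above, and it treats transmitted plus dropped packets as the offered load per marking, which is an assumption of this example.

```python
# Sketch: summarize the per-DSCP WRED rows from the output above.
ROWS = """\
cs2      11168956/1074076814  11/1056         198/19008          24      40      1/10
af21     5116/554876          0/0             0/0                32      40      1/10
af22     637567/554020262     20693/31286744  705549/1066782798  28      40      1/10
af23     144/8940             0/0             0/0                24      40      1/10
"""


def parse_wred_rows(text):
    """Return a dict of dscp -> transmitted/dropped packet counts and drop %."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # skip anything that is not a data row
        dscp, transmitted, random_drop, tail_drop = fields[:4]
        tx_pkts = int(transmitted.split("/")[0])
        dropped = int(random_drop.split("/")[0]) + int(tail_drop.split("/")[0])
        offered = tx_pkts + dropped  # assumption: offered = sent + dropped
        stats[dscp] = {
            "tx_pkts": tx_pkts,
            "dropped_pkts": dropped,
            "drop_pct": 100.0 * dropped / offered if offered else 0.0,
        }
    return stats


STATS = parse_wred_rows(ROWS)
for dscp, s in STATS.items():
    print(f"{dscp:5s} tx={s['tx_pkts']:>9d} "
          f"dropped={s['dropped_pkts']:>7d} ({s['drop_pct']:.2f}%)")
```

Summing the random and tail drops across the four rows gives 726451 packets, the same figure as the "total drops" counter in the Queueing section, so the per-DSCP rows account for every drop in the class.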

Troubleshooting Multicast

As a bonus, Iperf can also be used to test multicast. We use a software suite for software/package deployment to our client machines that depends on multicast. When there is a problem, we have historically spent many hours working with the server administrators to find and fix an issue that almost always turns out to be the server configuration. With two NetBeez agents at the campus, we can validate that multicast routing is working as expected and provide this output to the server administrators so they can investigate their server/client multicast configuration.

On the same Iperf tab, select the checkbox for ‘Multicast IPerf’ and specify a group address. Again, we can use the NetBeez output in conjunction with the CLI output to validate multicast routing.

# sh ip igmp groups
IGMP Connected Group Membership
Group Address    Interface Uptime   Expires   Last Reporter  
239.255.255.253  Vlan100   4d17h    00:02:01  10.23.128.10
239.2.3.9        Vlan100   00:00:14 00:02:45  10.23.128.203
224.0.1.40       Vlan100   3w6d     00:02:05  10.23.128.6
224.0.1.40       Loopback1 3w6d     00:02:05  10.23.143.19

(*, 239.2.3.9), 00:00:37/stopped, RP 10.23.143.19, flags: SP
Incoming interface: Null, RPF nbr 0.0.0.0
Outgoing interface list: Null

(10.23.133.203, 239.2.3.9), 00:00:34/00:02:58, flags: PTA
Incoming interface: Vlan103, RPF nbr 0.0.0.0
Outgoing interface list: Null
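When handing this output to the server administrators, it can help to state explicitly which interfaces have joined the test group. The sketch below checks the IGMP membership table above for a given group; it parses only the five-column rows shown in this article, and `igmp_members` and `joined` are hypothetical helper names.

```python
# Sketch: check which interfaces reported membership for a multicast group.
import ipaddress

IGMP_OUTPUT = """\
Group Address    Interface Uptime   Expires   Last Reporter
239.255.255.253  Vlan100   4d17h    00:02:01  10.23.128.10
239.2.3.9        Vlan100   00:00:14 00:02:45  10.23.128.203
224.0.1.40       Vlan100   3w6d     00:02:05  10.23.128.6
224.0.1.40       Loopback1 3w6d     00:02:05  10.23.143.19
"""


def igmp_members(output):
    """Return a list of (group, interface, last_reporter) tuples."""
    members = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # header or unrelated line
        try:
            group = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first column is not an IP address
        if group.is_multicast:
            members.append((str(group), fields[1], fields[4]))
    return members


def joined(output, group):
    """Return (interface, last_reporter) pairs that joined the group."""
    return [(iface, rep) for g, iface, rep in igmp_members(output) if g == group]


print(joined(IGMP_OUTPUT, "239.2.3.9"))
```

An empty list for the test group would point to the receiver side (IGMP join or switch snooping) rather than to multicast routing between sites.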

I hope this post shows the real power of having dedicated NetBeez agents at all of your locations to validate bandwidth, QoS and multicast.

About the Author

Matthew D. Smith (CCIE #26439 R&S) is a network engineer with over 14 years of experience including Fortune 500 companies, Government, ISP, consulting and private sector. 
