Importance of QoS
TL;DR: In this article I want to provide a simple way to troubleshoot QoS.
Not all networks are created equal. Every business has a different set of needs and requirements. How do you ensure your critical services and applications receive priority over everything else happening on your network? Historically, the answer was to add bandwidth. That becomes a challenge in a network where some of your clientele, such as students, will take every bit of bandwidth you throw at them. Quality of service, or QoS, helps provide a defined level of service across your network. A full end-to-end QoS solution gives your critical applications a higher level of service and also provides value by keeping your circuits more affordable.
Our entire WAN runs over private MPLS services provided by two carriers. Because we serve both employees, such as support staff and faculty, and a general student population, we looked for a solution that would help us validate and test our end-to-end QoS.
QoS Configuration
Our QoS configuration is pretty straightforward. We provide a priority class for voice, then define additional classes that we map traffic to and assign either a bandwidth percent or a bandwidth remaining percent, depending on the carrier. Both MPLS carriers honor our markings to ensure a consistent level of service.
policy-map WAN-OUT-CHILD
 class COS1-PRIORITY
  priority percent 30
 class COS2-VIDEO
  bandwidth remaining percent 30
  random-detect dscp-based
 class COS2-DATA
  bandwidth remaining percent 30
  random-detect dscp-based
 class COS3-DATA
  bandwidth remaining percent 20
  random-detect dscp-based
 class COS5-SCAVENGER
  bandwidth remaining percent 5
  random-detect dscp-based
 class class-default
  bandwidth remaining percent 15
  random-detect dscp-based
policy-map WAN-OUT-PARENT
 class class-default
  shape average 8500000
  service-policy WAN-OUT-CHILD
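To make the percentages concrete, here is a rough Python sketch of how the policy above might translate into per-class rates. It assumes the "remaining" bandwidth is the shaped rate minus the priority allocation. That assumption is consistent with the 1190 kbps the router reports for COS3-DATA in the CLI output later in this article, but treat it as an illustration rather than a statement of how every platform computes it. The function name is hypothetical.

```python
# Values copied from the policy-map above.
SHAPE_BPS = 8_500_000      # shape average 8500000
PRIORITY_PERCENT = 30      # COS1-PRIORITY: priority percent 30
REMAINING_PERCENT = {      # bandwidth remaining percent per class
    "COS2-VIDEO": 30,
    "COS2-DATA": 30,
    "COS3-DATA": 20,
    "COS5-SCAVENGER": 5,
    "class-default": 15,
}


def class_rates_kbps(shape_bps, priority_percent, remaining_percent):
    """Return an estimated rate in kbps for each non-priority class.

    Assumption: remaining bandwidth = shaped rate - priority allocation.
    """
    priority_bps = shape_bps * priority_percent // 100
    remaining_bps = shape_bps - priority_bps
    return {
        name: remaining_bps * pct // 100 // 1000
        for name, pct in remaining_percent.items()
    }


rates = class_rates_kbps(SHAPE_BPS, PRIORITY_PERCENT, REMAINING_PERCENT)
for name, kbps in rates.items():
    print(f"{name:15s} {kbps:5d} kbps")
```

A quick check like this is also a handy way to confirm that the remaining percentages add up to 100 before pushing a policy change.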
On the other end, we mark traffic and configure trust on our L2 switches to ensure consistent marking across our network.
Validate QoS
This QoS configuration has served us well, but the question was always “How do we validate our QoS?” This comes up especially when someone calls the help desk to complain about a particular application such as Citrix, CIFS or HTTP. Usually, we would begin by looking at the graphs from another tool, which show us bandwidth utilization per class or DSCP marking. This helps us spot causes such as too many Citrix sessions/terminals at the campus, which degrades Citrix service and starves non-Citrix traffic. A situation like this only happens a few times a year and results in a bandwidth upgrade for the campus.
What if that is not the issue? Next, we tried a file transfer test between two machines. This is never a good test, as you must account for server CPU cycles, disk latency, etc. We have also used Iperf in the past, but this required installing it on a client machine and coordinating with our local campus techs to get the machine online and connected. This is where NetBeez became an attractive option.
How to Troubleshoot QoS with NetBeez
To troubleshoot QoS, I will use the Iperf functionality in NetBeez. Each NetBeez agent is a dedicated Iperf client and server.
We have several agents installed at our datacenter, as well as several deployed at our campuses. This is crucial, as it lets us test the full end-to-end QoS between campuses as well as between campus and datacenter. By default, if you just enter your source agent and your destination agent, a standard Iperf test will be launched.
The real power comes from specifying the ‘TOS DSCP/PHB Class’ option.
With this option, we can test a specific DSCP marking/class. We can use the NetBeez output in conjunction with the CLI output to see exactly how our QoS and our circuits are performing. For this, we specify the bandwidth, the DSCP/PHB class, and the test duration.
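As a side note on the numbers involved, the sketch below shows how the familiar class names map to DSCP values and to the TOS byte that carries them. The AF formula (8 × class + 2 × drop precedence) follows RFC 2597, and the results line up with the values the router prints next to each name, for example af21 (18). Command-line Iperf takes a TOS byte with its -S/--tos option; how NetBeez passes the value internally is not something I am asserting here, and the helper names are my own.

```python
def af_dscp(af_class, drop_precedence):
    """DSCP value for AFxy, e.g. af21 -> af_dscp(2, 1)."""
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF class must be 1-4 and drop precedence 1-3")
    return 8 * af_class + 2 * drop_precedence


def cs_dscp(selector):
    """DSCP value for class selector CSn, e.g. cs2 -> cs_dscp(2)."""
    if not 0 <= selector <= 7:
        raise ValueError("class selector must be 0-7")
    return 8 * selector


def dscp_to_tos(dscp):
    """DSCP occupies the upper six bits of the TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be 0-63")
    return dscp << 2


# The four markings matched by the COS3-DATA class.
markings = {
    "cs2": cs_dscp(2),
    "af21": af_dscp(2, 1),
    "af22": af_dscp(2, 2),
    "af23": af_dscp(2, 3),
}
for name, dscp in markings.items():
    print(f"{name}: DSCP {dscp}, TOS byte {dscp_to_tos(dscp)} ({dscp_to_tos(dscp):#04x})")
```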
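As you can see below, we pushed roughly 15 Mbps (an offered rate of 14,887,000 bps) through our COS3-DATA class using the af21 marking, and the af21 row shows no random or tail drops. The drops recorded for the class as a whole belong to the cs2 and af22 rows. We can repeat this process by hitting a DSCP marking in each class to fully validate not only our own end-to-end QoS, but also the carrier QoS.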
Class-map: COS3-DATA (match-any)
  12538232 packets, 2726764759 bytes
  30 second offered rate 14887000 bps, drop rate 6698000 bps
  Match: ip dscp cs2 (16) af21 (18) af22 (20) af23 (22)
    12538232 packets, 2726764759 bytes
    30 second rate 14887000 bps
  Queueing
  queue limit 64 packets
  (queue depth/total drops/no-buffer drops) 40/726451/0
  (pkts output/bytes output) 11811783/1628660892
  bandwidth remaining 20% (1190 kbps)
    Exp-weight-constant: 9 (1/512)
    Mean queue depth: 41 packets
    dscp   Transmitted           Random drop      Tail drop           Minimum  Maximum  Mark
           pkts/bytes            pkts/bytes       pkts/bytes          thresh   thresh   prob
    cs2    11168956/1074076814   11/1056          198/19008           24       40       1/10
    af21   5116/554876           0/0              0/0                 32       40       1/10
    af22   637567/554020262      20693/31286744   705549/1066782798   28       40       1/10
    af23   144/8940              0/0              0/0                 24       40       1/10
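When reading output like this, it helps to break the class-level drop count down by marking. The following is a small sketch that does so with the packet counters copied by hand from the table above. The data structure and function name are my own; nothing here parses the router output automatically.

```python
# Packet counters copied from the per-DSCP table above:
# (transmitted, random drops, tail drops)
COUNTERS = {
    "cs2": (11168956, 11, 198),
    "af21": (5116, 0, 0),
    "af22": (637567, 20693, 705549),
    "af23": (144, 0, 0),
}
TOTAL_DROPS = 726451    # from "queue depth/total drops/no-buffer drops"
PKTS_OUTPUT = 11811783  # from "pkts output/bytes output"


def summarize(counters):
    """Return dropped packets and drop percentage for each DSCP marking."""
    summary = {}
    for dscp, (transmitted, random_drop, tail_drop) in counters.items():
        dropped = random_drop + tail_drop
        seen = transmitted + dropped
        drop_pct = 100.0 * dropped / seen if seen else 0.0
        summary[dscp] = {"dropped": dropped, "drop_pct": drop_pct}
    return summary


summary = summarize(COUNTERS)
for dscp, row in summary.items():
    print(f"{dscp:5s} dropped {row['dropped']:7d} packets ({row['drop_pct']:.2f}%)")

# Sanity checks: the per-DSCP rows should add up to the class totals.
print("drops match class total:",
      sum(row["dropped"] for row in summary.values()) == TOTAL_DROPS)
print("transmitted matches pkts output:",
      sum(tx for tx, _, _ in COUNTERS.values()) == PKTS_OUTPUT)
```

Running the same breakdown for each class makes it easier to tell whether the marking under test is being dropped or whether the drops come from other traffic sharing the queue.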
Troubleshooting Multicast
As a bonus, Iperf can also be used to test multicast. We use a software/package deployment suite for our client machines that depends on multicast. When there is a problem, we have historically spent many hours working with the server administrators to find and fix an issue that almost always turns out to be the server configuration. With two NetBeez agents at the campus, we can validate that multicast routing is working as expected and provide this output to the server administrators so they can investigate their server/client multicast configuration.
On the same Iperf tab, select the checkbox for ‘Multicast IPerf’ and specify a group address. Again, we can use the NetBeez output in conjunction with the CLI output to validate multicast routing.
# sh ip igmp groups
IGMP Connected Group Membership
Group Address    Interface   Uptime    Expires   Last Reporter
239.255.255.253  Vlan100     4d17h     00:02:01  10.23.128.10
239.2.3.9        Vlan100     00:00:14  00:02:45  10.23.128.203
224.0.1.40       Vlan100     3w6d      00:02:05  10.23.128.6
224.0.1.40       Loopback1   3w6d      00:02:05  10.23.143.19

(*, 239.2.3.9), 00:00:37/stopped, RP 10.23.143.19, flags: SP
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list: Null

(10.23.133.203, 239.2.3.9), 00:00:34/00:02:58, flags: PTA
  Incoming interface: Vlan103, RPF nbr 0.0.0.0
  Outgoing interface list: Null
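For readers curious about what produces the IGMP membership entry for 239.2.3.9 above, here is a minimal Python sketch of a multicast receiver. It is not how NetBeez or Iperf is implemented; it only illustrates the standard socket calls a receiver uses to join a group, which in turn causes the host to send the IGMP report the router lists. The function names and the port number in the usage note are assumptions for illustration.

```python
import ipaddress
import socket
import struct


def build_membership_request(group, interface_ip="0.0.0.0"):
    """Build the ip_mreq structure passed to IP_ADD_MEMBERSHIP.

    Raises ValueError if `group` is not an IPv4 multicast address.
    """
    addr = ipaddress.ip_address(group)
    if addr.version != 4 or not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast group")
    return struct.pack(
        "4s4s", socket.inet_aton(group), socket.inet_aton(interface_ip)
    )


def join_and_receive(group, port, timeout=5.0):
    """Join `group`, wait for one datagram, and return (sender, size) or None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        # Joining the group is what triggers the IGMP membership report.
        sock.setsockopt(
            socket.IPPROTO_IP,
            socket.IP_ADD_MEMBERSHIP,
            build_membership_request(group),
        )
        sock.settimeout(timeout)
        try:
            data, sender = sock.recvfrom(2048)
            return sender, len(data)
        except socket.timeout:
            return None
    finally:
        sock.close()
```

Calling something like `join_and_receive("239.2.3.9", 5001)` on a test host (the port is a placeholder) while a sender transmits to the group should return the sender's address if multicast routing between the two is working, and `None` if nothing arrives before the timeout.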
I hope this post shows the real power of having dedicated NetBeez agents at all of your locations to validate bandwidth, QoS and multicast.
About the Author
Matthew D. Smith (CCIE #26439 R&S) is a network engineer with over 14 years of experience including Fortune 500 companies, Government, ISP, consulting and private sector.