Network Monitoring: Troubleshooting for Fun

Two weeks ago, we launched another Great Troubleshooting Challenge, a competition where we ask participants to troubleshoot a series of network incidents staged in a lab environment. Whoever scores the most points wins, with completion time acting as a tiebreaker. Today, we’re sharing the behind-the-scenes planning that goes into creating the challenge, as well as some of the answers.

The network scenario, shown in the image below, consists of a small, simplified Wide Area Network (WAN) made up of two branch offices plus one headquarters. The two branches and the main office share the same Internet connection, provided by the Internet router at the company’s data center.

Hardware Used
The Branch 2 router was a Cisco 3700; the Branch 3 router, a Cisco 1751V; the main office router, a Linux router; and the Internet router, a small office router provided by Verizon.

The Network Scenario
Network troubleshooting diagram with icons
The following NetBeez agents were deployed in the network to verify connectivity as well as network and application performance: one FastE agent per branch, one FastE agent and two WiFi agents in the main office, and one FastE agent at the data center. The NetBeez deployment also included two external NetBeez agents, running on Amazon Cloud, to get an external point of view on the availability and performance of third-party applications, such as Google, MS Office 365, and Salesforce.

For this challenge, we set up three troubleshooting scenarios. In the first scenario, we asked participants to find out why an access list configured on Branch 3’s router was not blocking users’ access to the application Salesforce, as originally requested by the management at Branch 3, and as reported by the Salesforce target configured on NetBeez.
HTTP test results graph

Here is a snippet of that router’s configuration, where the FastEthernet0/0 interface is the external one, while FastEthernet0/1 is the private one.


The problem with this configuration is that the network administrator is blocking the wrong IP address. In fact, if you check with a DNS lookup utility, like the command dig, you can see that salesforce.com returns IP address 96.43.145.26, while www.salesforce.com returns both 96.43.144.26 and 96.43.145.26:

The correct solution for this first challenge was to either block the IP address 96.43.145.26, or both IP addresses 96.43.144.26 and 96.43.145.26.
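The same check can be scripted. The sketch below is only an illustration, not the tooling we used in the challenge: it compares the addresses each hostname resolves to against the set of addresses an access list denies, and reports what slips through. The resolved addresses are the ones quoted above; the blocked set is an assumption made for the example, and the function names are hypothetical.

```python
from __future__ import annotations

import socket


def resolve_ipv4(hostname: str) -> set[str]:
    """Return the IPv4 addresses the local resolver reports for hostname."""
    infos = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return {info[4][0] for info in infos}


def unblocked(resolved: dict[str, set[str]], blocked: set[str]) -> dict[str, set[str]]:
    """For each hostname, keep the resolved addresses the ACL does not deny."""
    gaps = {}
    for host, addrs in resolved.items():
        missing = addrs - blocked
        if missing:
            gaps[host] = missing
    return gaps


# Addresses as quoted in the article (DNS answers change over time).
resolved = {
    "salesforce.com": {"96.43.145.26"},
    "www.salesforce.com": {"96.43.144.26", "96.43.145.26"},
}

# Assumption for illustration: the ACL denies only this one address.
blocked = {"96.43.144.26"}

gaps = unblocked(resolved, blocked)
print(gaps)
```

To run this against live DNS instead of the quoted values, build the dictionary with `{name: resolve_ipv4(name) for name in ("salesforce.com", "www.salesforce.com")}`. An empty result means every resolved address is covered by the access list.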

In the second part of the challenge, participants were asked to find out why Branch 3 users were getting lower Internet download and upload speeds than users at Branch 2 and the main office. This problem was reported by the speedtests configured on the respective NetBeez agents:
test results graph
upload time test results graph

You can see from the above reports that the Internet speedtests run from both Branch 2 and the main office reach throughputs of around 20 Mbps, and sometimes even 40 Mbps.

Here is a snippet of the router’s configuration at Branch 3. The interface Ethernet0/0 is the external one, while FastEthernet0/0 is the internal one:

The reason Branch 3’s users are getting less Internet throughput than users at the other locations is that the Cisco 1751V router’s external interface is an Ethernet interface, which can’t send and receive more than 8 Mbps. Participants should have recommended upgrading the router or installing a FastEthernet interface in place of the Ethernet one.
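The reasoning here is a bottleneck argument: end-to-end throughput cannot exceed the slowest link on the path. Here is a minimal sketch of that idea. The link rates are assumptions chosen for illustration (nominal 10 Mbps for Ethernet, nominal 100 Mbps for FastEthernet, and roughly 40 Mbps for the shared Internet connection, based on the best speedtest results above), and the hop labels are hypothetical.

```python
def bottleneck(path):
    """Return the (name, rate_mbps) hop with the lowest rate on the path."""
    return min(path, key=lambda hop: hop[1])


# Assumed nominal rates in Mbps; real throughput is lower due to overhead.
branch3_path = [
    ("Branch 3 internal FastEthernet", 100.0),
    ("Branch 3 external Ethernet", 10.0),
    ("Shared Internet connection", 40.0),
]
branch2_path = [
    ("Branch 2 FastEthernet", 100.0),
    ("Shared Internet connection", 40.0),
]

for label, path in (("Branch 3", branch3_path), ("Branch 2", branch2_path)):
    name, rate = bottleneck(path)
    print(f"{label}: limited by {name} at about {rate:g} Mbps")
```

Under these assumptions, Branch 3 is capped by its own external interface well before the shared Internet connection becomes the limit, which matches what the speedtests showed.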

We also asked participants whether this decreased throughput on Branch 3’s router was affecting real-time communications, such as VoIP calls. A report on the VoIP test configured between the NetBeez agent deployed at Branch 2 and the one deployed at Branch 3 made it clear that it was not. As shown in the graph below, jitter is very low, as is latency, and the mean opinion score is 4.3, which is an excellent value for VoIP calls.
test results graph
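To give a feel for how latency, jitter, and packet loss translate into a mean opinion score, here is a rough sketch based on a widely circulated simplification of the ITU-T E-model. It is not necessarily how NetBeez computes MOS; the R-factor adjustments are a rule of thumb, and the sample input values are made up for illustration.

```python
def estimate_mos(latency_ms, jitter_ms, loss_pct):
    """Approximate MOS from one-way latency, jitter, and packet loss."""
    # Rule of thumb: jitter counts double, plus 10 ms for codec delay.
    effective_latency = latency_ms + 2 * jitter_ms + 10

    # Simplified R-factor: mild penalty below 160 ms, steeper above.
    if effective_latency < 160:
        r = 93.2 - effective_latency / 40
    else:
        r = 93.2 - (effective_latency - 120) / 10

    # Rule of thumb: each percent of packet loss costs 2.5 R points.
    r -= 2.5 * loss_pct
    r = max(0.0, min(100.0, r))

    # R-to-MOS conversion from the E-model, floored at the minimum MOS of 1.
    mos = 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)
    return max(1.0, mos)


# Hypothetical measurements, not taken from the challenge reports.
healthy = estimate_mos(latency_ms=20, jitter_ms=2, loss_pct=0)
degraded = estimate_mos(latency_ms=300, jitter_ms=50, loss_pct=5)
print(f"healthy link:  {healthy:.2f}")
print(f"degraded link: {degraded:.2f}")
```

The point of the sketch is that MOS depends on delay, jitter, and loss rather than on raw bandwidth, so a link capped at a few Mbps can still carry VoIP calls with a high score as long as those three stay low.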

The last section of the troubleshooting challenge was all about WiFi monitoring. As mentioned earlier, two 802.11ac NetBeez agents were deployed at the main office. One was monitoring the acme-employees SSID, and the other the acme-guest SSID. First, we asked participants to report the BSSIDs and RF channels that the two 802.11ac agents were connected to. Then, we asked participants to find out what change the network administrator made to the acme-guest SSID, where users had been sporadically losing connectivity. The NetBeez agent monitoring that network detected the problem, and its data revealed that on Thursday, September 29, the administrator moved the access point from channel 1 to channel 6, which had less interference from other locally broadcasting SSIDs.
test results graphs with white background
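Spotting this kind of change in monitoring data comes down to comparing consecutive samples. The sketch below assumes a simple list of (timestamp, BSSID, channel) readings. The record layout, the BSSID, and the times of day are hypothetical; only the move from channel 1 to channel 6 on September 29 comes from the challenge.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    when: str      # timestamp label
    bssid: str     # access point radio MAC address
    channel: int   # RF channel the agent was associated on


def channel_changes(samples):
    """Return (when, old_channel, new_channel) for every channel change."""
    changes = []
    for prev, cur in zip(samples, samples[1:]):
        if cur.channel != prev.channel:
            changes.append((cur.when, prev.channel, cur.channel))
    return changes


# Hypothetical readings for the acme-guest SSID.
guest_samples = [
    Sample("Sep 28 12:00", "aa:bb:cc:dd:ee:ff", 1),
    Sample("Sep 29 09:00", "aa:bb:cc:dd:ee:ff", 1),
    Sample("Sep 29 10:00", "aa:bb:cc:dd:ee:ff", 6),
    Sample("Sep 30 12:00", "aa:bb:cc:dd:ee:ff", 6),
]

changes = channel_changes(guest_samples)
for when, old, new in changes:
    print(f"{when}: channel {old} -> {new}")
```

The same comparison works for BSSID changes, which would indicate the agent roaming to a different access point rather than the administrator reconfiguring one.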

And with that, another Great Troubleshooting Challenge is in the books. We had many participants and three winners. The 1st place winner was Steven Bos, who won an Xbox One; 2nd place was Daniel Hardy, who won a NetBeez WiFi agent; and in 3rd place was Tim Greenwald, who won a NetBeez FastE agent. We had a lot of fun mocking up this network scenario and reviewing the participants’ answers. We hope they enjoyed it as well! Keep an eye out for more troubleshooting challenges from NetBeez in the future.