Network Troubleshooting

Network troubleshooting is the process of acquiring the data and evidence required to identify the root cause of a network outage or performance issue. Network downtime and performance issues can cause significant losses for businesses, and a well-functioning network is essential to business operations in the digital era. For this reason, network troubleshooting tools are essential.

For the most common network problems, there are specific network troubleshooting tools available, as well as a basic troubleshooting process to follow. In this guide, we will review the network troubleshooting tools and techniques available to deal with the most common network issues.

Why Network Troubleshooting Tools Are Important

Network troubleshooting tools are vital assets as they enable organizations to:

Reduce network downtime – Efficient network troubleshooting reduces the amount of network downtime that digital services and information systems experience.
Cost savings – Computer network slowdowns and outages cost money due to interrupted services, SLA penalties, and customer complaints that increase cancellations.
Improved performance – Identifying bottlenecks and implementing preventive measures translate into better quality of service and overall productivity.

Troubleshooting Common Network Issues

There are all kinds of problems that impact a network infrastructure. The most common network issues are typically related to:

Network hardware and link failures due to faulty network equipment or bad cabling, such as degraded connectors or damaged fiber optics. Networking hardware can also be damaged by natural events such as wind and flooding.
Internet outages and network connectivity failures may occur outside the control of a network team, because they originate beyond the organization’s network infrastructure. Even in these cases, it is important for network administrators to understand the origin of the problem so they can inform the organization’s stakeholders.
Network slowness and sluggish application performance could indicate a network congestion issue. Network congestion generally happens when the traffic transmitted by clients is more than what the network can forward.
Network configuration errors can cause total network outages, slowness due to suboptimal configuration, or other unexpected behaviors. These types of failures can be hard to troubleshoot if the organization has neither a configuration change management process nor a network configuration audit tool.
Noise and interference: a wireless network can experience problems ranging from localized RF (radio-frequency) issues, such as low signal or interference, to network infrastructure issues such as RADIUS authentication failures.
User error or perception is one of the most common network problems. When the network is not at fault, the support team still needs to provide proof of innocence.

Network Troubleshooting Tools

Network troubleshooting tools enable IT professionals to collect the data necessary to identify the root cause of network issues. Most basic network troubleshooting tools run from a command prompt. They gather network configuration data, such as a computer’s IP address, or network performance data, such as network latency.

Traceroute and ping are the most common command line network troubleshooting tools used.

Ping

Ping is a network troubleshooting tool that measures network latency and packet loss. It sends and receives packets as defined by the Internet Control Message Protocol (ICMP). The command can be executed from any computer’s console, as almost every operating system supports it. When you execute ping, by default it sends one echo request packet every second to the destination host or IP address passed as an argument. If the destination host is up and running, reachable, and allowed to respond, it replies with an echo reply packet to each echo request it receives from the sender. If the network between the two hosts has packet loss, or total loss (e.g. a disconnection), some or all of these echo request and reply packets are lost.

Consider a ping test against netbeez.net with 10 packets (the -c option sets the count): the source receives only 9 of the 10 replies, so the packet loss is 10%.
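The packet loss arithmetic above can be reproduced from ping’s own summary line. The Python sketch below is an illustration, not part of ping: it assumes the summary formats printed by Linux and macOS ping ("N packets transmitted, M received") and by Windows ping ("Sent = N, Received = M"), and the sample line is made up for the example.

```python
import re

# Summary line printed by Linux and macOS ping, e.g.
# "10 packets transmitted, 9 received, 10% packet loss" (Linux) or
# "10 packets transmitted, 9 packets received, 10.0% packet loss" (macOS).
UNIX_SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")
# Summary line printed by Windows ping, e.g. "Packets: Sent = 4, Received = 4, Lost = 0".
WINDOWS_SUMMARY = re.compile(r"Sent = (\d+), Received = (\d+)")


def packet_loss_percent(ping_output: str) -> float:
    """Compute packet loss (%) from the sent/received counts in ping's summary."""
    match = UNIX_SUMMARY.search(ping_output) or WINDOWS_SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary found in the output")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("ping reports zero packets sent")
    return 100.0 * (sent - received) / sent


# Illustrative summary line matching the 9-out-of-10 example above.
sample = "10 packets transmitted, 9 received, 10% packet loss, time 9012ms"
print(packet_loss_percent(sample))  # 10.0
```

Recomputing the value from the sent and received counts, rather than reading the printed percentage, sidesteps the different rounding and formatting each platform uses.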

In Windows, open a command prompt by typing cmd in the search bar next to the Start menu. Once the command prompt opens, type ping followed by the remote IP address or hostname. By default, Windows ping sends 4 echo requests and then exits, reporting statistics such as packet loss and the minimum, maximum, and average RTT. If you want to discover more ping options, read the article we wrote. If you often find yourself opening multiple CMD windows to compare ping results, consider vmPing as an alternative.
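When scripting ping checks across platforms, the main difference to handle is the count option: Windows ping uses -n, while Linux and Mac OS ping use -c. A minimal sketch, assuming Python 3 and a ping binary on the PATH; the helper names ping_command and run_ping are hypothetical.

```python
import platform
import subprocess
from typing import List, Optional


def ping_command(host: str, count: int = 4, system: Optional[str] = None) -> List[str]:
    """Build a ping command line that sends `count` echo requests to `host`."""
    system = system or platform.system()  # "Windows", "Linux", "Darwin", ...
    # Windows ping takes the count with -n; Linux and macOS ping use -c.
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def run_ping(host: str, count: int = 4) -> str:
    """Run the local ping binary and return its text output."""
    result = subprocess.run(ping_command(host, count), capture_output=True, text=True)
    return result.stdout
```

For example, ping_command("netbeez.net", 10, system="Linux") builds the same 10-packet test used earlier; passing the command as a list avoids invoking a shell.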

Traceroute

Traceroute is a network troubleshooting tool that discovers all the routers (or “routing hops”), and their associated latency, between a source and a destination host. This command is valuable for discovering the route that traffic follows to reach a specific destination. In many cases, it helps pinpoint failure points or suboptimal routes.

To run traceroute, open a command prompt and type the command (tracert on Windows, traceroute on Linux and Mac OS) followed by the destination host. For example, a traceroute to www.google.com reports each intermediate hop’s response time.

By default, traceroute uses ICMP on Windows and UDP on Linux and Mac OS.
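To see why traceroute reveals one router per probe, it helps to model the TTL (Time To Live) mechanism it relies on: each probe is sent with a TTL one higher than the previous one, every router decrements the TTL, and the router that decrements it to zero drops the probe and answers with an ICMP Time Exceeded message. The Python sketch below only simulates this logic over a hypothetical list of hop names; it sends no packets (a real implementation needs raw sockets and usually administrator privileges).

```python
from typing import List, Tuple


def forward(path: List[str], ttl: int) -> Tuple[str, str]:
    """Simulate one probe with the given TTL travelling along `path`.

    `path` lists the routers in order and ends with the destination host.
    Returns (who answered, type of answer).
    """
    for router in path[:-1]:
        ttl -= 1  # every router decrements the TTL before forwarding
        if ttl == 0:
            # The router drops the probe and reports back with ICMP Time Exceeded.
            return router, "time-exceeded"
    return path[-1], "destination-reply"


def traceroute(path: List[str], max_hops: int = 30) -> List[str]:
    """Send probes with TTL 1, 2, 3, ... until the destination itself answers."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, kind = forward(path, ttl)
        hops.append(responder)
        if kind == "destination-reply":
            break
    return hops


print(traceroute(["gateway", "isp-router", "www.google.com"]))
# ['gateway', 'isp-router', 'www.google.com']
```

Routers that never send Time Exceeded messages show up as asterisks in real traceroute output; this simulation assumes every hop answers.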

MTR

MTR, which stands for Matt’s Traceroute, combines the functionality of traceroute and ping. Like traceroute, it discovers the routers between the source and destination, at least those that reply to its probes. Like ping, it runs ping tests against all the discovered routers and reports latency and packet loss.

MTR helps pinpoint network slowdowns at various points along the path(s) to the destination. In one example MTR report, hops 3, 4, and 5 showed more than 25% packet loss, which could indicate performance issues along the Internet path.
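MTR’s per-hop report boils down to grouping probe results by hop and computing loss and average latency for each one. A minimal Python sketch, assuming probe results are already collected in a hypothetical dictionary that maps hop number to a list of RTTs in milliseconds (None marks an unanswered probe); the 25% threshold simply mirrors the example above.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# (hop number, loss %, average RTT in ms or None, flagged as lossy?)
HopSummary = Tuple[int, float, Optional[float], bool]


def summarize_hops(probes: Dict[int, List[Optional[float]]],
                   loss_threshold: float = 25.0) -> List[HopSummary]:
    """Compute MTR-style loss and average latency for each hop."""
    report = []
    for hop in sorted(probes):
        rtts = probes[hop]
        if not rtts:
            continue  # no probes recorded for this hop
        answered = [rtt for rtt in rtts if rtt is not None]
        loss = 100.0 * (len(rtts) - len(answered)) / len(rtts)
        avg = mean(answered) if answered else None
        report.append((hop, loss, avg, loss > loss_threshold))
    return report


# Hypothetical measurements: hop 2 answered only 2 of 4 probes.
for row in summarize_hops({1: [2.0, 4.0, 2.0, 4.0], 2: [10.0, None, 12.0, None]}):
    print(row)
```

When reading such a report, keep in mind that some routers deprioritize or rate-limit replies to probes addressed to themselves, so loss that appears at an intermediate hop but not at later hops or at the destination does not necessarily affect forwarded traffic.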

ARP

The Address Resolution Protocol (ARP) enables TCP/IP hosts to identify the MAC address of a local host given its IP address. Operating systems and networking vendors provide an arp command that lets the user inspect the ARP cache, which lists the associations between IP addresses and MAC addresses. Network administrators use ARP on a default gateway to check whether a host is reachable.

For instance, running arp -a lists all the IP-to-MAC associations the host has learned on its local network.
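The arp -a output looks different on Windows (dash-separated MAC addresses in columns) and on Linux or Mac OS (lines such as "? (192.168.1.20) at ... on en0"), which makes it awkward to compare across hosts. The Python sketch below normalizes both styles into a dictionary; the parsing approach and the sample lines are illustrative assumptions, not output captured from a real network.

```python
import re
from typing import Dict

IPV4_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")
# 1-2 hex digits per octet: macOS may drop leading zeros (e.g. "0:a:95:...").
MAC_RE = re.compile(r"\b([0-9A-Fa-f]{1,2}(?:[:-][0-9A-Fa-f]{1,2}){5})\b")


def parse_arp_table(output: str) -> Dict[str, str]:
    """Map IP address -> MAC address (lowercase, colon-separated) from `arp -a` text."""
    table = {}
    for line in output.splitlines():
        ip = IPV4_RE.search(line)
        mac = MAC_RE.search(line)
        if ip and mac:  # skips headers, interface lines and incomplete entries
            octets = re.split(r"[:-]", mac.group(1))
            table[ip.group(1)] = ":".join(o.zfill(2).lower() for o in octets)
    return table


sample = """Interface: 192.168.1.10 --- 0xb
  Internet Address      Physical Address      Type
  192.168.1.1           00-11-22-33-44-55     dynamic
? (192.168.1.20) at 0:a:95:9d:68:16 on en0 ifscope [ethernet]
? (192.168.1.30) at <incomplete> on eth0"""
print(parse_arp_table(sample))
```

A normalized table makes it easy to spot the same MAC address answering for two IP addresses, or an IP address whose MAC changed between two snapshots.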

Nmap

Nmap is an open source network scanner. This network troubleshooting tool has many functions, including a ping sweep to discover which devices in a specific network respond to ping requests, and port scanning to discover which TCP/IP ports a remote host has open or closed. If you’re interested in extending its functionality, check out these Nmap scripts.
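The simplest kind of port scan just tries to complete a TCP handshake with each port, which is the idea behind Nmap’s TCP connect scan (-sT). The Python sketch below illustrates that idea only; it is not Nmap, it lumps "closed" and "filtered" together (Nmap distinguishes them), and it should only be pointed at hosts you are authorized to scan.

```python
import socket
from typing import Dict, Iterable


def tcp_connect_scan(host: str, ports: Iterable[int], timeout: float = 1.0) -> Dict[int, str]:
    """Mark each port 'open' if a TCP handshake completes, else 'closed/filtered'."""
    results = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            try:
                # connect_ex returns 0 on success instead of raising for refused connections.
                code = sock.connect_ex((host, port))
            except OSError:  # timeouts and name-resolution errors
                code = -1
            results[port] = "open" if code == 0 else "closed/filtered"
    return results
```

For example, tcp_connect_scan("192.168.1.1", [22, 80, 443]) would report which of those common service ports accept connections on that (hypothetical) address.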

Netstat

Netstat is a command-line tool that gathers network statistics from the local host. The command is available in various operating systems, including Linux, Mac OS, and Windows. It displays network-related information such as active network connections, routing tables, interface statistics, masquerade connections, and multicast memberships. Netstat provides details about the network connections established on a system, including the protocol used (TCP or UDP), local and foreign addresses, state of the connection, and more. It’s commonly used for troubleshooting network-related issues, monitoring network activity, and diagnosing network performance problems.
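A quick way to use netstat output during an incident is to count TCP connections by state; for example, a large number of connections stuck in one state can point at a struggling service. The Python sketch below assumes plain netstat -an output (no process or PID columns), where the TCP state is the last column on Linux, Mac OS, and Windows; the sample lines are illustrative.

```python
from collections import Counter


def tcp_state_counts(netstat_output: str) -> Counter:
    """Count TCP connections by state from `netstat -an` style output."""
    counts: Counter = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows start with "tcp", "tcp4", "tcp6" (Linux/macOS) or "TCP" (Windows);
        # without -p/-o options the connection state is the last column.
        if fields and fields[0].lower().startswith("tcp"):
            counts[fields[-1]] += 1
    return counts


sample = "\n".join([
    "Proto Recv-Q Send-Q Local Address           Foreign Address         State",
    "tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN",
    "tcp        0      0 192.168.1.10:22         192.168.1.20:51234      ESTABLISHED",
    "tcp6       0      0 :::80                   :::*                    LISTEN",
    "udp        0      0 0.0.0.0:68              0.0.0.0:*",
])
print(tcp_state_counts(sample))
```

UDP rows are skipped because UDP is connectionless and has no state column.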

Nslookup

Nslookup is a command that resolves a given hostname, or Fully Qualified Domain Name (FQDN), to verify that DNS works. This command is essential for running a DNS check. By default, nslookup uses the DNS server configured on the host.

The user can also specify a DNS server to query as a second argument, in the form nslookup <hostname> <DNS server>.

Other commands that perform a DNS check are host and dig.
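A scripted DNS check can be sketched with Python’s standard socket.getaddrinfo, which asks the operating system’s resolver for a name’s addresses. Note the assumption: unlike nslookup, the system resolver may also consult local sources such as the hosts file, and the standard library offers no way to pick a specific DNS server, so this approximates the default nslookup behavior rather than replacing it.

```python
import socket
from typing import List


def resolve(hostname: str) -> List[str]:
    """Resolve `hostname` with the system resolver; return unique addresses, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []  # the name did not resolve: a possible DNS problem
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

An empty result for a name that should exist, while resolve("localhost") still works, suggests the problem lies with the configured DNS server rather than the host itself.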

Tcpdump

Tcpdump is a tool that captures network traffic for TCP/IP packet analysis. It can sniff all traffic going in and out of any network interface. More importantly, it can filter traffic by interface, host, source or destination host, type of traffic, and many other criteria. When troubleshooting network performance issues, isolating the packets that are relevant makes it easier to identify the root cause.
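Tcpdump can also save a capture to a file with the -w option, and filtering can then be applied offline. To show what a host filter does under the hood, the Python sketch below reads a classic libpcap file from an Ethernet interface and counts the IPv4 packets to or from one address, similar in spirit to the tcpdump filter host <address>. It assumes the classic pcap format (not pcapng) and ignores VLAN tags and IPv6; it is an illustration, not a replacement for tcpdump.

```python
import ipaddress
import struct
from typing import Iterator, Optional, Tuple

LINKTYPE_ETHERNET = 1


def read_pcap(path: str) -> Iterator[bytes]:
    """Yield the raw bytes of each packet in a classic libpcap file."""
    with open(path, "rb") as f:
        header = f.read(24)  # global header
        if len(header) < 24:
            raise ValueError("file too short for a pcap header")
        if header[:4] == b"\xd4\xc3\xb2\xa1":
            endian = "<"  # written on a little-endian machine
        elif header[:4] == b"\xa1\xb2\xc3\xd4":
            endian = ">"
        else:
            raise ValueError("not a classic (microsecond) pcap file")
        if struct.unpack(endian + "I", header[20:24])[0] != LINKTYPE_ETHERNET:
            raise ValueError("only Ethernet captures are handled here")
        while True:
            record = f.read(16)  # ts_sec, ts_usec, captured length, original length
            if len(record) < 16:
                break
            _sec, _usec, incl_len, _orig = struct.unpack(endian + "IIII", record)
            yield f.read(incl_len)


def ipv4_endpoints(frame: bytes) -> Optional[Tuple[str, str]]:
    """Return (source, destination) IPv4 addresses of an Ethernet frame, if any."""
    if len(frame) < 34:  # 14-byte Ethernet header + 20-byte IPv4 header
        return None
    if struct.unpack("!H", frame[12:14])[0] != 0x0800:  # EtherType is not IPv4
        return None
    src = ipaddress.IPv4Address(frame[26:30])
    dst = ipaddress.IPv4Address(frame[30:34])
    return str(src), str(dst)


def count_packets_for_host(path: str, host: str) -> int:
    """Count IPv4 packets sent to or from `host`."""
    count = 0
    for frame in read_pcap(path):
        endpoints = ipv4_endpoints(frame)
        if endpoints and host in endpoints:
            count += 1
    return count
```

In practice the filter would normally be passed to tcpdump itself, so that only relevant packets are captured in the first place; the sketch just makes the matching logic visible.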

Network Troubleshooting Process With the Layered Methodology

When troubleshooting network problems, it’s very important to keep the OSI model in mind and to begin troubleshooting at the physical layer, working up to the application layer.

Layer	Component
(1) Physical	NICs, cables, RF signal, bits, …
(2) Data Link	Switch, Access Point, MAC address, …
(3) Network	Router, IP address, subnet masks, …
(4) Transport	Firewall, TCP/UDP ports, …
(7) Application	Web server, online gaming, video streaming, …

This bottom-up approach works because each layer relies on the layer below it to function properly. In the following sections, we’ll provide the basic network troubleshooting steps a network engineer can follow to work up through the first four layers of the OSI model, and then check the application layer.
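The bottom-up method can be expressed as an ordered list of checks that stops at the first layer that fails, since everything above it depends on that layer. The Python sketch below shows only the control flow; the check functions are hypothetical placeholders (lambdas returning fixed values) standing in for real tests such as checking the link status or pinging the default gateway.

```python
from typing import Callable, List, Optional, Tuple

# Each entry pairs an OSI layer name with a check that returns True if the layer looks healthy.
Check = Tuple[str, Callable[[], bool]]


def first_failing_layer(checks: List[Check]) -> Optional[str]:
    """Run checks from the physical layer upward and stop at the first failure."""
    for layer, check in checks:
        if not check():
            return layer  # higher layers depend on this one, so investigate here first
    return None  # every check passed


# Hypothetical placeholder checks for illustration only.
checks: List[Check] = [
    ("Physical", lambda: True),    # e.g. NIC reports link up
    ("Data Link", lambda: True),   # e.g. gateway MAC present in the ARP cache
    ("Network", lambda: False),    # e.g. ping to the gateway fails
    ("Transport", lambda: True),   # e.g. TCP port reachable
]
print(first_failing_layer(checks))  # Network
```

Stopping at the first failure keeps the investigation focused: a failing Layer 3 check makes Layer 4 and application results unreliable anyway.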

Troubleshooting the Physical Layer (OSI Layer 1)

The physical layer includes everything responsible for transferring data from point A to point B. For this reason, it’s important to verify that network resources are connected properly. When troubleshooting network issues at this layer, you can use the diagnostic tools integrated into network equipment. In some cases, specialized diagnostic tools may even be required, such as spectrum analyzers for WiFi networks or optical time-domain reflectometers (OTDRs) for fiber links.

An optical time-domain reflectometer for fiber testing.

Most WiFi adapters report the signal strength and link quality of the connection established with the access point; on a wireless laptop, for example, you can check the signal strength and noise of the current connection. Read the article we wrote about Wi-Fi troubleshooting if you want to learn more.
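Signal and noise are usually reported in dBm, and the figure that matters for link quality is the gap between them, the signal-to-noise ratio (SNR). Because dBm is a logarithmic unit, the SNR in dB is simply the signal level minus the noise level. A minimal Python sketch; the -60 dBm and -90 dBm values are illustrative, not measurements.

```python
import math


def snr_db(signal_dbm: float, noise_dbm: float) -> float:
    """Signal-to-noise ratio in dB from two power levels expressed in dBm."""
    # Subtracting logarithmic values is equivalent to dividing linear powers.
    return signal_dbm - noise_dbm


def dbm_to_mw(dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (dbm / 10)


print(snr_db(-60, -90))  # 30
# Same result from linear powers: 10 * log10(signal_mW / noise_mW).
print(round(10 * math.log10(dbm_to_mw(-60) / dbm_to_mw(-90)), 6))  # 30.0
```

A higher SNR means the receiver can distinguish the access point’s signal from background noise more easily; a falling SNR with a stable signal level points at a new source of interference.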

Speed and Duplex Mismatch

In the case of wired Ethernet connections, speed and duplex mismatches are common. Speed refers to the rate of data transfer, while duplex refers to whether a network interface can send and receive data simultaneously (full duplex) or only one direction at a time (half duplex). Mismatched speed and duplex settings can result in performance issues, packet loss, or network instability.

To address speed and duplex mismatches, the troubleshooter needs to verify the configuration on both the switch ports and the connected devices. Ethernet cards and managed switches report this information, so the network administrator can verify that both ends of a link use the same speed and duplex settings.

The show interfaces command on a Cisco switch provides the initial information required to troubleshoot speed and duplex issues.

show interfaces command on a Cisco switch.
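Once both ends of the link report their settings, checking for a mismatch is a straightforward comparison. The Python sketch below pulls duplex and speed out of a line such as "Full-duplex, 100Mb/s, media type is 10/100BaseTX" from Cisco IOS show interfaces output and compares it with the host side. The regular expression and sample line are assumptions for illustration; exact wording varies by platform and software version.

```python
import re
from typing import Optional, Tuple

# Matches text such as "Full-duplex, 100Mb/s" or "Full-duplex, 1Gb/s".
DUPLEX_SPEED_RE = re.compile(r"(Full|Half)-duplex,\s*(\d+)(Mb|Gb)/s")

LinkSettings = Tuple[str, int]  # (duplex, speed in Mb/s)


def parse_duplex_speed(show_output: str) -> Optional[LinkSettings]:
    """Extract (duplex, speed in Mb/s) from `show interfaces` text, or None."""
    match = DUPLEX_SPEED_RE.search(show_output)
    if match is None:
        return None
    duplex, value, unit = match.group(1).lower(), int(match.group(2)), match.group(3)
    return duplex, value * 1000 if unit == "Gb" else value


def is_mismatch(switch_side: LinkSettings, host_side: LinkSettings) -> bool:
    """True when the two ends of a link disagree on duplex or speed."""
    return switch_side != host_side


switch = parse_duplex_speed("  Full-duplex, 100Mb/s, media type is 10/100BaseTX")
print(switch, is_mismatch(switch, ("half", 100)))  # ('full', 100) True
```

The host-side values would come from the operating system’s NIC settings; they are hard-coded here for the example.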

Troubleshooting the data-link layer (OSI Layer 2)

To troubleshoot data-link layer issues, the network troubleshooter focuses on the local area network (LAN). In the case of Ethernet connections, the engineer can access the switch command line to inspect the MAC address table, which lists the addresses learned on each switch port.

The show mac-address-table command on a Cisco switch (show mac address-table on newer IOS releases) returns the Layer 2 addresses of all devices connected to the switch.

show mac address table on a Cisco switch
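The MAC address table is filled by a simple learning process: the switch records the source MAC address of each incoming frame against the port it arrived on, forwards frames for known destinations out of the matching port, and floods frames for unknown destinations. The toy Python model below illustrates that process; it is a simplification (no VLANs, no aging timers) and the MAC addresses are made up.

```python
from typing import Dict, List


class LearningSwitch:
    """Toy model of how a switch builds and uses its MAC address table."""

    def __init__(self, ports: List[int]):
        self.ports = ports
        self.mac_table: Dict[str, int] = {}  # MAC address -> port it was learned on

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> List[int]:
        """Process one frame and return the ports it is sent out of."""
        self.mac_table[src_mac] = in_port  # learn (or refresh) the sender's location
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown (or broadcast) destination: flood out every other port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the ingress segment: drop the frame
        return [out_port]


sw = LearningSwitch([1, 2, 3])
print(sw.receive(1, "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb"))  # [2, 3] (flooded)
print(sw.receive(2, "bb:bb:bb:bb:bb:bb", "aa:aa:aa:aa:aa:aa"))  # [1]
print(sw.mac_table)
```

This is why a MAC address appearing on an unexpected port, or moving between ports repeatedly, is a useful clue when troubleshooting Layer 2 problems.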

To troubleshoot Layer 2 communications between hosts, network admins can use passive analysis tools such as Wireshark, which is GUI based, or tcpdump, which is command-line based. These tools record the frames flowing across a network link, switch, or host. In a Wireshark packet capture on a Mac, for example, the user can enter filters to capture and display only certain traffic.

Spanning Tree Protocol

Another important thing to keep in mind when troubleshooting Layer 2 issues is the Spanning Tree Protocol (STP). Spanning tree is a Layer 2 protocol that enables switched networks to build a loop-free topology even when redundant links are introduced in the network design. When a topology has a loop, frames circulate indefinitely without reaching their destination or being discarded, causing broadcast storms. Broadcast storms saturate network links and destabilize the CAM (Content Addressable Memory) table of switches.

The spanning tree protocol prevents this scenario by blocking switch ports that would create loops. However, for it to work properly, all switches in the network must be correctly configured. Familiarity with the spanning tree protocol and the related diagnostic commands on switches is very important for network troubleshooting.
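Two ideas are at the core of spanning tree: switches elect a root bridge (the lowest bridge ID, which compares the configured priority first and the MAC address second), and every redundant link that is not on the best path toward the root ends up blocked. The Python sketch below is a deliberately simplified model of that outcome: it assumes all links have equal cost and at most one link between two switches, ignores STP’s port-level tie-breakers and timers, and uses hypothetical switch names and addresses.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set, Tuple

BridgeId = Tuple[int, str]  # (priority, MAC address as lowercase aa:bb:cc:dd:ee:ff)
Link = Tuple[str, str]


def elect_root(bridges: Dict[str, BridgeId]) -> str:
    """Lowest bridge ID wins: priority first, then MAC address."""
    # Fixed-width lowercase MAC strings sort in the same order as their numeric values.
    return min(bridges, key=lambda name: bridges[name])


def blocked_links(bridges: Dict[str, BridgeId], links: List[Link]) -> Set[Link]:
    """Keep a breadth-first tree rooted at the root bridge and block every other link."""
    root = elect_root(bridges)
    neighbors: Dict[str, List[str]] = {name: [] for name in bridges}
    for a, b in links:
        neighbors[a].append(b)
        neighbors[b].append(a)
    visited = {root}
    tree: Set[FrozenSet[str]] = set()
    queue = deque([root])
    while queue:
        switch = queue.popleft()
        for peer in sorted(neighbors[switch]):
            if peer not in visited:  # first (fewest-hop) path from the root to this switch
                visited.add(peer)
                tree.add(frozenset((switch, peer)))
                queue.append(peer)
    return {link for link in links if frozenset(link) not in tree}


bridges = {
    "SW1": (32768, "00:00:00:00:00:03"),
    "SW2": (32768, "00:00:00:00:00:01"),
    "SW3": (4096, "00:00:00:00:00:09"),  # lowered priority makes SW3 the root
}
triangle = [("SW1", "SW2"), ("SW2", "SW3"), ("SW3", "SW1")]
print(elect_root(bridges), blocked_links(bridges, triangle))  # SW3 {('SW1', 'SW2')}
```

The example also shows why the root bridge priority should be set deliberately: left at the default of 32768 everywhere, the root is decided by the lowest MAC address, which may be an old or poorly placed switch.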

Troubleshooting the network layer (OSI Layer 3)

The top network troubleshooting tools for layer 3 issues are ping and traceroute. With ping you can verify whether a host can reach a destination network or host. With traceroute you can discover the routing hops available between a source and a destination.

When troubleshooting Layer 3 problems, it’s important to consider whether or not the destination host is located within your organization. If it is, the troubleshooting effort aims to figure out whether a network misconfiguration, or something else, is causing the connectivity or performance issues. If, on the other hand, the network path to the destination host traverses a third party, it’s important to collect enough information to prove that the problem lies outside your network.

Other tools at this layer include IP calculators, which help ensure that computers are addressed correctly, and command-line tools that display the IP address configured on a workstation (e.g. ipconfig in Windows or ifconfig in Linux and Mac OS).
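The core of an IP calculator is easy to reproduce with Python’s standard ipaddress module, which is handy for spotting a host configured with the wrong subnet mask. A minimal sketch; the addresses are illustrative, and the usable-host count ignores the special /31 and /32 cases.

```python
import ipaddress


def describe_interface(cidr: str) -> dict:
    """Summarize an address such as '192.168.1.10/24' the way an IP calculator would."""
    iface = ipaddress.ip_interface(cidr)
    net = iface.network
    return {
        "address": str(iface.ip),
        "network": str(net.network_address),
        "netmask": str(net.netmask),
        "broadcast": str(net.broadcast_address),
        # Network and broadcast addresses are not assignable (ignores /31 and /32).
        "usable_hosts": max(net.num_addresses - 2, 0),
    }


def same_subnet(cidr: str, other_ip: str) -> bool:
    """True if `other_ip` is on the same subnet as the interface `cidr`."""
    return ipaddress.ip_address(other_ip) in ipaddress.ip_interface(cidr).network


print(describe_interface("192.168.1.10/24"))
print(same_subnet("192.168.1.10/24", "192.168.1.1"))   # True
print(same_subnet("192.168.1.10/25", "192.168.1.200"))  # False
```

A workstation whose default gateway falls outside its own subnet, as same_subnet would reveal, cannot reach it directly, which is a common cause of "can reach local hosts but nothing else" symptoms.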

Troubleshooting the transport layer (OSI Layer 4)

The transport layer is responsible for ensuring that application data is exchanged between two hosts. TCP provides a connection-oriented option, and UDP a connectionless one. At this layer, several things could prevent applications from working, so different commands come into play. Here are some of the common causes of Layer 4 network issues:

Protocol settings on the source or destination host, including host firewalls that block inbound or outbound traffic; Windows, Mac, and Linux have a netstat command that reports all open TCP/IP socket connections; to troubleshoot host firewalls, each system has its own tools (in Linux, for instance, iptables is a common option).
Network firewalls between the source and destination hosts that block connection attempts; to check whether a firewall is blocking a service, you can use a command like telnet for TCP services, or the open source nmap, which offers several capabilities and scan types (disclaimer: scanning Internet and third-party hosts without authorization can be prosecuted).
Overlay networks can cause MTU mismatches, which make some applications behave inconsistently depending on the request or payload. You can troubleshoot MTU with a ping test that sets the Don’t Fragment (DF) bit and uses a packet size in line with the application’s requirements. Traceroute also offers an option to test the path MTU end-to-end.
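The packet size to use in a DF-bit ping is the MTU minus the headers that ping does not count as payload: 20 bytes of IPv4 header (without options) and 8 bytes of ICMP header, so a standard 1500-byte MTU corresponds to a 1472-byte payload. The Python sketch below computes that value and builds the corresponding command, assuming Windows ping (-f sets DF, -l sets the payload size) or Linux iputils ping (-M do sets DF, -s sets the payload size); other ping implementations use different flags.

```python
from typing import List

IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def df_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in `mtu` bytes without fragmentation."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def df_ping_command(host: str, mtu: int, windows: bool = False) -> List[str]:
    """Ping command that sets the DF bit and fills a packet of exactly `mtu` bytes."""
    size = str(df_ping_payload(mtu))
    if windows:
        return ["ping", "-f", "-l", size, host]  # -f: set DF, -l: payload size
    return ["ping", "-M", "do", "-s", size, host]  # Linux iputils: -M do sets DF, -s: payload size


print(df_ping_payload(1500))  # 1472
print(df_ping_command("10.0.0.1", 1500))
```

If the DF ping fails at a given size while smaller sizes succeed, some link along the path has a smaller MTU than expected, which is typical when an overlay adds its own encapsulation headers.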

Troubleshooting the application layer (OSI Layer 7)

Network troubleshooting typically concludes at the transport layer, but it remains crucial to include basic application troubleshooting in the procedure. This step helps rule out application behavior as the trigger of a given issue. Furthermore, because certain problems manifest only within an application context, examining error messages and application logs, and running tests through the application interface, helps pinpoint the underlying cause of a potential network problem. In application outages where it is unclear whether the source of the issue is the network or the application itself, network and application teams should troubleshoot collaboratively rather than in isolation; combining the expertise of both teams speeds up resolution.
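A basic application-layer test is to request a page and look at both the result and the time it took. The Python sketch below uses the standard urllib module to make that distinction explicit: an HTTP status code (even an error such as 404 or 500) means the network path and the server are working and the problem is likely in the application, while no status at all points back down the stack. It is a minimal illustration, not a monitoring tool.

```python
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple


def http_check(url: str, timeout: float = 5.0) -> Tuple[Optional[int], float]:
    """Fetch `url`; return (HTTP status or None if no HTTP answer, elapsed seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status: Optional[int] = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with an error status (4xx/5xx)
    except OSError:
        status = None  # no HTTP answer: DNS, connection, or timeout failure
    return status, time.monotonic() - start
```

HTTPError is caught before the broader OSError because it is a subclass of it; swapping the order would hide the server’s status code.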

Network Performance Monitoring Tools

In the context of network troubleshooting, network monitoring tools provide alerting, diagnostic data, network performance metrics, logs, and statistics. There are three main types of network monitoring tools that are used during troubleshooting. Since each tool type provides information about a specific aspect of the network, companies should have each one of them in place. Let’s briefly review them one by one.

SNMP pollers

These tools provide status and diagnostic data on network devices; SNMP helps identify events such as hardware or link failures, software errors or bugs, and anything else that could affect a network component.
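One of the most common values derived from SNMP polling is link utilization. Interface counters such as ifInOctets and ifOutOctets (IF-MIB) only ever increase, so a poller reads them twice and divides the byte difference by the interval and the interface speed (ifSpeed, in bits per second). A minimal Python sketch of that arithmetic, assuming 32-bit counters that may wrap around once between polls; fast links are normally polled through the 64-bit ifHCInOctets and ifHCOutOctets counters instead.

```python
COUNTER32_MAX = 2 ** 32  # 32-bit counters wrap back to zero after this value


def utilization_percent(prev_octets: int, curr_octets: int,
                        interval_s: float, if_speed_bps: int) -> float:
    """Utilization of one direction of a link from two polls of an octet counter."""
    delta = curr_octets - prev_octets
    if delta < 0:
        delta += COUNTER32_MAX  # the counter wrapped once between the two polls
    bits = delta * 8
    return 100.0 * bits / (interval_s * if_speed_bps)


# Illustrative numbers: 37.5 MB received in 5 minutes on a 10 Mb/s link.
print(utilization_percent(0, 37_500_000, 300, 10_000_000))  # 10.0
```

Averaging over a 5-minute polling interval smooths out short bursts, which is why counter-based utilization can look low even when users experience momentary congestion.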

Passive analyzers

These tools help identify bottlenecks caused by one or more devices saturating a network’s bandwidth or specific portions of it; they can also inspect sequences of packets to pinpoint performance issues between a client and a server.
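Finding the devices that saturate a link usually starts with a "top talkers" view: total the traffic per source and rank it. The Python sketch below shows that aggregation over a hypothetical list of (source, destination, bytes) records, such as might be derived from flow exports or a packet capture; the record format and sample values are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (source IP, destination IP, bytes) for each observed flow or packet.
FlowRecord = Tuple[str, str, int]


def top_talkers(records: Iterable[FlowRecord], n: int = 3) -> List[Tuple[str, int]]:
    """Rank source hosts by the total number of bytes they sent."""
    totals: Counter = Counter()
    for src, _dst, nbytes in records:
        totals[src] += nbytes
    return totals.most_common(n)


records = [
    ("10.0.0.5", "10.0.0.1", 9000),
    ("10.0.0.7", "10.0.0.1", 1000),
    ("10.0.0.5", "10.0.0.2", 3000),
]
print(top_talkers(records))  # [('10.0.0.5', 12000), ('10.0.0.7', 1000)]
```

Grouping by destination, or by (source, destination) pair, works the same way and helps separate one heavy client from a busy server.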

Active testing tools

Active testing tools measure end-to-end reachability, round-trip time, packet loss, and other network metrics; tools like NetBeez alert on network performance degradation and collect metrics on the end-user experience of network services and applications.

Reduce Root Cause Analysis with NetBeez

NetBeez is an active network performance monitoring platform that enables operations and support teams to quickly troubleshoot common network problems. The solution relies on distributed network monitoring agents that provide end-to-end network and application performance metrics. NetBeez has three key pillars that make it a good solution for network troubleshooting.

Granular Performance Data

NetBeez captures granular network performance metrics for applications and services.

Performance metrics at intervals as short as one second
Help isolate the exact moment a problem occurs
Retains historical data to generate baselines and identify trends and recurring issues

Proactive Incident Detection

NetBeez agents run real-time tests, end-to-end, and from the user perspective.

Continuous active monitoring against networks and applications
Quick detection and alerting on service failures and performance degradation
Enforce and guarantee quality of service and SLAs
Verify and validate configuration changes during maintenance windows

Multi-Platform Deployment

The solution supports flexible deployment options for on-premises, cloud, and remote environments.

Deploy the server on-premises as a virtual appliance or in the cloud as an instance
Support Ethernet, Wi-Fi, virtual, Docker, and Linux based agents
Support Windows and Mac clients
Easily orchestrate and deploy at scale

Conclusion on Network Troubleshooting

Network troubleshooting is an essential component of effective network management, enabling organizations to quickly identify and resolve network issues that impact connectivity, performance, and overall functionality. By leveraging a combination of command-line tools and advanced network monitoring solutions, IT professionals can diagnose problems across all layers of the OSI model, from physical connections to application-level services. Investing in the right network troubleshooting tools and following a structured troubleshooting process helps reduce downtime, improve reliability, and optimize the network environment.

Ultimately, a proactive approach to network troubleshooting supports business continuity and enhances the user experience in today’s complex network environments. Many organizations also adopt a three-tier approach to handle ticket response and escalation. When troubleshooting network performance issues, keep the OSI model and its layers in mind: starting from the bottom layers and working your way up helps ensure that the proper troubleshooting procedures are followed and that problems are resolved faster. If you are a network administrator looking to automate network troubleshooting with a network monitoring tool, request a NetBeez demo.
