Traceroute is one of the most important diagnostic tools that network engineers use every day to identify network issues and troubleshoot connectivity. It is most useful for spotting routing loops, asymmetric routing, and hops with high latency.
The traceroute program is available on most modern operating systems, such as Linux and Windows (where the command is called tracert), as well as on the devices of most networking vendors. When using traceroute to troubleshoot network issues, it’s important to be aware of its limitations. In some specific cases, the command may return incomplete or misleading information. This article covers how traceroute works, how to run it, and where its results need to be read with care.
What is Traceroute?
Traceroute is a network testing tool that discovers the IP addresses of the routers (or “routing hops”) between a source and a destination host. The source host is the client that starts the trace. For each router, the command returns its:
- IP address,
- Fully Qualified Domain Name (FQDN) if available, and
- Round-Trip Time (RTT) for network latency.
Advanced options can also report the Autonomous Systems traversed, the Maximum Transmission Unit (MTU), and more. By default, the utility sends three probes for each hop. As a result, it returns three RTT measurements for each hop and, in the case of multi-path routing, the probes may discover more than one path. We’ll cover this in detail in the sections below. In the meantime, let’s review how to run it.
How to Run Traceroute
To run traceroute, open the command prompt and type the command name followed by the destination host. Below are the traceroute results for www.google.com, which report the round-trip times to each intermediate hop:
$ traceroute www.google.com
traceroute to www.google.com (172.217.7.132), 64 hops max, 52 byte packets
 1  my.meraki.net (10.1.36.1)  10.140 ms  2.565 ms  3.272 ms
 2  164.52.244.85 (164.52.244.85)  5.580 ms  4.006 ms  3.104 ms
 3  64.58.254.226 (64.58.254.226)  4.069 ms  2.501 ms  5.308 ms
 4  * * *
 5  * * *
 6  google-level3-60g.washingtondc.level3.net (4.68.71.186)  85.500 ms  9.336 ms  8.873 ms
 7  108.170.246.1 (108.170.246.1)  10.156 ms  10.853 ms  13.887 ms
 8  216.239.54.205 (216.239.54.205)  8.865 ms  9.400 ms  9.387 ms
 9  iad30s08-in-f132.1e100.net (172.217.7.132)  9.145 ms  9.527 ms  12.434 ms
For each result, the output reports:
- the hop number,
- the fully qualified domain name if available,
- the router’s IP address in parentheses, and
- three RTT measurements.
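As a rough illustration of that layout, the following Python sketch parses a single hop line into its fields. The function name `parse_hop` and the regular expressions are assumptions made for this example, and the sketch only handles the simple layout shown above, where each hop has at most one responding router.

```python
import re

# Matches "name (a.b.c.d)" inside the hop line.
_HOST_RE = re.compile(r"(\S+) \((\d{1,3}(?:\.\d{1,3}){3})\)")
# Matches RTT values such as "10.140 ms".
_RTT_RE = re.compile(r"(\d+(?:\.\d+)?) ms")


def parse_hop(line):
    """Parse one hop line into a dict (hypothetical helper, simple layout only)."""
    parts = line.split(None, 1)
    hop = int(parts[0])
    rest = parts[1] if len(parts) > 1 else ""
    host = _HOST_RE.search(rest)
    return {
        "hop": hop,
        "name": host.group(1) if host else None,   # None for an unresponsive hop
        "ip": host.group(2) if host else None,
        "rtts_ms": [float(v) for v in _RTT_RE.findall(rest)],
    }


print(parse_hop("1 my.meraki.net (10.1.36.1) 10.140 ms 2.565 ms 3.272 ms"))
print(parse_hop("4 * * *"))
```

An unresponsive hop such as `4 * * *` yields no name, no IP address, and an empty RTT list, which makes it easy to filter out before computing latency statistics.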
By default, probes are sent using ICMP on Windows and UDP on Linux and macOS. The Linux and macOS versions also let you change the probe protocol, for example to ICMP or TCP (and GRE on macOS).
How does traceroute work?
Traceroute uses the Internet Control Message Protocol’s (ICMP) Time Exceeded message to discover each intermediate hop to the final destination. The command works by manipulating the Time To Live (TTL) field of an IP packet.
The command completes when it reaches the final destination or the maximum number of hops it was configured with. This number is generally set to 30, although some implementations use a different default, such as the 64 shown in the output above. When a trace stops at the maximum number of hops, it means the command never received a reply from the final destination.
Time Exceeded and Time To Live (TTL)
An IP packet has a field called Time To Live (TTL) that routers use to limit a packet’s lifespan. All routers inspect this field so that packets won’t circulate indefinitely. The TTL’s maximum value is 255. Typically, most TCP/IP implementations set this field to 64.
TTL works as follows: each router that forwards a packet decrements the TTL value by one. When a router receives a packet with a TTL equal to 1, the decrement brings the value to zero and the packet’s time has expired. The router discards the packet and sends an ICMP Time Exceeded error message (Type 11) to the source. This mechanism prevents packets caught in a routing loop from circulating forever. Ethernet frames have no equivalent field, which is why loops in layer 2 switching can cause broadcast storms.
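The effect of the TTL on a packet caught in a routing loop can be sketched in a few lines of Python. This is only a toy model: the function name `hops_before_drop` is hypothetical, and it assumes every router decrements the TTL by exactly one.

```python
def hops_before_drop(initial_ttl):
    """Count how many routers forward a looping packet before it is discarded."""
    ttl = initial_ttl
    forwarded = 0
    while True:
        ttl -= 1  # each router decrements the TTL before forwarding
        if ttl <= 0:
            # This router discards the packet and would send ICMP Time Exceeded.
            return forwarded
        forwarded += 1


for start in (1, 64, 255):
    print(start, hops_before_drop(start))
```

However long the loop is, the packet is guaranteed to be discarded after a bounded number of hops, and the source is notified by the router that dropped it.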
Example of a Time Exceeded packet notification as captured with tcpdump:
IP my.meraki.net > 10.1.36.5: ICMP time exceeded in-transit, length 60
Example of a traceroute run
Let’s see what happens when you run a traceroute command. To discover the first hop, the command sends a UDP packet with a TTL equal to one. The first router to receive the packet decrements the TTL to zero, discards the packet, and sends a Time Exceeded message back to the source. To discover the second hop, the utility sends a new UDP packet with the TTL set to two, and so on. Hop by hop, the command builds the list of routing hops to the destination. When a UDP probe finally reaches the destination host, the host replies with an ICMP Port Unreachable message instead of Time Exceeded, which tells traceroute that the trace is complete. The command terminates when it either reaches the destination host or reaches the maximum number of hops. By default, the maximum number of hops is set to 30. This value can be changed via the command line.
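The loop described above can be sketched with a small simulation. Nothing here touches the network: `send_probe` and `traceroute` are hypothetical helpers, the path is a hard-coded list of addresses taken from the earlier example, and the sketch assumes UDP probes, for which the destination answers with ICMP Port Unreachable.

```python
def send_probe(path, ttl):
    """Simulate one probe and return (responder, reply_type).

    `path` lists the router addresses in order, followed by the destination.
    """
    for router in path[:-1]:
        ttl -= 1  # each router decrements the TTL
        if ttl == 0:
            return router, "time-exceeded"
    return path[-1], "port-unreachable"  # the probe reached the destination


def traceroute(path, max_hops=30):
    """Raise the TTL one step at a time until the destination answers."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, reply = send_probe(path, ttl)
        hops.append((ttl, responder))
        if reply == "port-unreachable":
            break
    return hops


path = ["10.1.36.1", "164.52.244.85", "64.58.254.226", "172.217.7.132"]
for ttl, responder in traceroute(path):
    print(ttl, responder)
```

A real implementation additionally needs raw sockets, timeouts for unanswered probes, and several probes per hop, all of which are left out of this sketch.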
Packet capture from a traceroute command
The following packet capture displays the first traceroute UDP packet sent by a client with the TTL field set to 1:
The following screenshot displays the first set of traceroute hops discovered by the client thanks to the “Time to live exceeded in transit” ICMP message.
MTR
MTR, which stands for Matt’s traceroute, is a utility that reports, along with the usual traceroute information, the packet loss detected at each hop. It works by continuously sending probes to each hop, which lets it measure packet loss over time and identify the performance issues that loss causes.
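The per-hop statistics that this kind of tool derives from repeated probes can be approximated as follows. This is a minimal sketch rather than MTR’s actual code: the function name `hop_stats` is hypothetical, and a lost probe is represented by `None`.

```python
def hop_stats(rtts_ms):
    """Summarize repeated probes to one hop; None marks a lost probe."""
    answered = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    loss_pct = 100.0 * (sent - len(answered)) / sent if sent else 0.0
    avg_ms = sum(answered) / len(answered) if answered else None
    return {"sent": sent, "loss_pct": loss_pct, "avg_ms": avg_ms}


print(hop_stats([9.1, None, 8.9, 9.0, None]))
```

Keeping loss separate from the average RTT matters, because a hop can show low latency on the probes it answers while still dropping a large share of them.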
Traceroute Limits
Traceroute has known limits that, in some cases, impact its ability to draw an accurate picture of the network. There are two main limitations that a network engineer should be aware of when using this command.
Unresponsive hops
When no response is received from a router, traceroute displays an asterisk instead of the router’s IP address or FQDN (see hops 4 and 5 of the output in the example above). This can happen for different reasons: firewalls blocking ICMP probes, virtual routers not equipped with an ICMP stack to process trace probes, etc.
If a firewall is blocking the probe packets, you can change the destination UDP port or test different transport protocols (ICMP, TCP, or UDP). Some firewalls block all traffic, in which case there is very little that you can do.
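One way to try several probe types in turn is to generate the command line programmatically. The sketch below only builds the argument list and runs nothing. It assumes the traceroute implementation commonly shipped with Linux distributions, where `-I` selects ICMP, `-T` selects TCP, `-p` sets the destination port, and `-m` sets the maximum number of hops; other platforms use different flags, and the helper name `build_command` is hypothetical.

```python
# Flags assumed from the common Linux traceroute; check your platform's man page.
PROTOCOL_FLAGS = {"udp": [], "icmp": ["-I"], "tcp": ["-T"]}


def build_command(host, protocol="udp", port=None, max_hops=None):
    """Return the argument list for a traceroute run (nothing is executed)."""
    cmd = ["traceroute"] + PROTOCOL_FLAGS[protocol]
    if port is not None:
        cmd += ["-p", str(port)]
    if max_hops is not None:
        cmd += ["-m", str(max_hops)]
    return cmd + [host]


for proto, port in (("udp", None), ("icmp", None), ("tcp", 443)):
    print(" ".join(build_command("www.google.com", protocol=proto, port=port)))
```

TCP probes to a port the destination actually serves, such as 443 for a web server, are often allowed through firewalls that drop UDP and ICMP probes.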
Equal Cost Multi Path (ECMP) networks
Networks like the Internet are highly redundant. As a result, routers implement load balancing so they can use more than one route to reach a destination.
Consider the case of a user running a traceroute to a destination that can be reached via two redundant upstream links. Router L, which processes the probe packets sent by the source, may load balance them across the two upstream links. Probes with different TTL values can then travel along different paths, so the command may stitch together hops that belong to different paths and report a sequence of hops that no packet actually follows. In the following picture you can see an example of an incorrect traceroute.
A research team at Sorbonne University in Paris noticed that, to most routers, the probe packets generated by traceroute don’t look as if they belong to the same flow, because header fields such as the UDP destination port change from one probe to the next. They also discovered that routers don’t just use the packets’ five-tuples (source and destination IP, source and destination TCP/UDP port, and protocol) to group them in flows and do load balancing. Through experiments, they discovered that routers also use the TOS, the ICMP code, and the ICMP header checksum fields. As a result, the probes of a single trace can be spread across different paths and traceroute can report an incorrect network topology, reducing its efficacy in troubleshooting network performance issues.
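The following Python sketch illustrates per-flow load balancing in a simplified form. It is not how any particular router is implemented: real devices use their own hash functions and field selections, and SHA-256, the `pick_link` helper, and the link names are stand-ins chosen for this example.

```python
import hashlib


def pick_link(flow, links):
    """Hash the flow identifier and map it to one of the equal-cost links."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return links[int.from_bytes(digest[:4], "big") % len(links)]


links = ["link-A", "link-B"]
src, dst, proto, sport = "10.1.36.5", "172.217.7.132", "udp", 40000

# Classic UDP traceroute: the destination port increases with each probe,
# so each probe presents a different flow identifier to the router.
classic = {pick_link((src, dst, proto, sport, 33434 + i), links) for i in range(16)}

# Constant flow identifier: every probe hashes to the same link.
fixed = {pick_link((src, dst, proto, sport, 33434), links) for _ in range(16)}

print("links used with varying ports:", sorted(classic))
print("links used with a fixed port:", sorted(fixed))
```

With a varying destination port the probes will typically be spread over both links, while a fixed flow identifier always maps to a single link.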
Paris Traceroute
Paris traceroute is a utility that overcomes the ECMP issue that affects regular traceroute commands. It crafts its probe packets so that the header fields routers use for load balancing stay constant, which makes all the probes of a trace belong to the same flow and follow the same path. Since those fields can no longer be used to tell probes apart, Paris traceroute identifies each UDP probe through its checksum, which it controls by adjusting the payload of the probe. By deliberately varying the flow identifier between traces, the utility can also identify the multiple paths available.
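To show how a checksum can carry a probe identifier while the ports stay fixed, the sketch below computes a two-byte UDP payload that forces the checksum to a chosen value. It illustrates the idea only and is not Paris traceroute’s actual code: all function names are hypothetical, it covers IPv4 only, and it does not handle the special target value 0xFFFF.

```python
import socket
import struct


def ones_add(a, b):
    """Add two 16-bit values with end-around carry (ones' complement)."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def ones_sum(data):
    """Ones' complement sum of the 16-bit big-endian words in `data`."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total = ones_add(total, word)
    return total


def udp_checksum(src, dst, sport, dport, payload):
    """UDP checksum over the IPv4 pseudo-header, UDP header, and payload."""
    length = 8 + len(payload)
    pseudo = (socket.inet_aton(src) + socket.inet_aton(dst)
              + struct.pack("!BBH", 0, 17, length))
    header = struct.pack("!HHHH", sport, dport, length, 0)
    return ~ones_sum(pseudo + header + payload) & 0xFFFF


def payload_for_checksum(src, dst, sport, dport, target):
    """Find a 2-byte payload that makes the UDP checksum equal `target`."""
    # Sum of everything except the payload (a zero payload adds nothing).
    base = ~udp_checksum(src, dst, sport, dport, b"\x00\x00") & 0xFFFF
    # Solve base + word = ~target in ones' complement arithmetic.
    word = ones_add(~target & 0xFFFF, ~base & 0xFFFF)
    return struct.pack("!H", word)


# Same addresses and ports for every probe; only the payload changes.
for probe_id in (1, 2, 3):
    payload = payload_for_checksum("10.1.36.5", "172.217.7.132", 40000, 33434, probe_id)
    print(probe_id, payload.hex(),
          udp_checksum("10.1.36.5", "172.217.7.132", 40000, 33434, payload))
```

Because the addresses and ports never change, a router that hashes on them sends every probe down the same path, while the source can still match each reply to its probe by reading the checksum of the quoted UDP header.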
Traceroute and Path Analysis in NetBeez
NetBeez is a real-time network performance monitoring solution that supports both versions of traceroute. The traditional implementation is called traceroute, while the Paris variant is called path analysis. To implement path analysis in NetBeez, we adopted the dublin-traceroute command, which is a derivative of paris-traceroute. We also picked this variant because it reports the presence of NAT devices along the path, broken NATs, and MPLS labels if available.
The adoption of dublin-traceroute enables NetBeez users to be more accurate when discovering the real topology of ECMP networks. Path analysis simplifies the troubleshooting of Internet performance issues that remote users are experiencing.
NetBeez traceroute
In NetBeez, the regular command supports the most common traceroute options, including the choice of TCP, UDP, or ICMP as the transport protocol, which makes it easier to get past firewall rules. The data reported in the traceroute output includes network latency, IP address, FQDN, and MTU per hop (when using UDP or ICMP as the transport protocol).
Here’s a quick screenshot of a traceroute output in NetBeez:
The following screenshot displays all the hops and paths discovered by a NetBeez agent tracing the route to netbeez.net. Discovered routers are marked in blue, or in orange for moderate latency (> 100 ms) and red for high latency (> 150 ms). Unresponsive routers are marked in black.
Conclusion on traceroute
Traceroute is, along with ping, one of the most important network tools that engineers use every day for network diagnostics. With traceroute, network engineers can identify high response times at a particular hop, or routing changes.