The September 2015 issue of the Internet Protocol Journal included a very interesting article about RIPE Atlas, an Internet performance measurement network project.
The Atlas project has always fascinated me because it’s a great collaborative effort to measure the network performance of the global IP layer with active traffic. Atlas enables the IP community to:
1- Verify and monitor reachability to a specific host or network from multiple source points (probes)
2- Troubleshoot connectivity and other network issues by executing ad hoc tests
3- Test IPv6 reachability, accelerating adoption
4- Verify the public DNS infrastructure
When the IP Journal’s article was written in July 2015, the Atlas network consisted of more than 8,300 probes, covering more than 173 countries, 11% of the IPv6 Autonomous System Numbers (ASNs), and 6% of all IPv4 ASNs.
(As I am writing this on November 30th, the Atlas Internet measurement network consists of more than 9,000 active monitoring probes performing 9,773 measurements such as ping, traceroute, DNS, SSL, and NTP.)
Here I will highlight some key design principles described in the IP Journal article that relate to probe design, scalability, and security. I would also like to draw some comparisons to NetBeez, a tool that is conceptually, and by design, very similar to Atlas. The main difference between the two is that Atlas is a collaborative, public project mostly used by Internet Service Providers, whereas NetBeez is a commercial solution primarily designed for enterprises that need better visibility into their private IP layer.
RIPE Atlas Probe Design: Hardware vs Software
The RIPE Atlas team chose hardware probes because they are “set and forget”, more resilient to operating system upgrades, and less dependent on network changes or upgrades that could reduce their uptime. A hardware probe adds minimal overhead to the hosting environment because it is just another device on the network, plugged into an access port. It can run 24/7 uninterrupted and provides a predictable environment in terms of CPU and memory resources, increasing the accuracy of the measurements.
At NetBeez, we call probes agents or Beez. While NetBeez supports a variety of agent types (software, virtual, and cloud), the primary choice is Raspberry Pi hardware agents. The reasons for this choice, as Panos mentioned in the “Hardware vs Software Agents” post, are very similar to the design choices I just mentioned. NetBeez customers receive preconfigured hardware agents that are “plug and play”, which speeds up deployment and reduces the management effort for the user (configuration and maintenance of the agent itself). Everything is taken care of by NetBeez.
RIPE Atlas Scalability
It’s clear that the number of probes supported by Atlas is much higher than the number supported by NetBeez. In fact, Atlas is a system designed to monitor a global network, the Internet, while NetBeez is designed to monitor an individual ASN.
Atlas estimates that, at full deployment, the system should be able to support 100,000 probes. This number was derived by multiplying the desired number of probes per ASN (three) by the total number of ASNs (approximately 35,000). When the article was written, Atlas was managing 8,300 probes and processing 2,700 results per second. This tells me that each probe generates on average about 0.3 results per second, or roughly one result every three seconds, a number that is very close to our assumptions. In the case of NetBeez, the number of probes is smaller (approximately 3,000 for a pretty large ASN). In an on-premises environment, NetBeez runs on a single virtual appliance; for large deployments, a scale-out model very similar to Atlas’ is recommended.
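The arithmetic behind these figures can be checked with a short sketch. The constants are the numbers quoted from the article; the function and variable names are my own, and the projected load at full deployment is only an extrapolation that assumes the per-probe result rate stays constant as the network grows.

```python
# Figures quoted from the IP Journal article; all names are illustrative.
PROBES_PER_ASN = 3            # desired probes per ASN
TOTAL_ASNS = 35_000           # approximate number of ASNs
ACTIVE_PROBES = 8_300         # probes managed when the article was written
RESULTS_PER_SECOND = 2_700    # results processed per second at that time
FULL_DEPLOYMENT = 100_000     # the rounded target quoted in the article


def target_probe_count(probes_per_asn: int, total_asns: int) -> int:
    """Probes needed to place a given number of probes in every ASN."""
    return probes_per_asn * total_asns


def per_probe_rate(results_per_second: float, probes: int) -> float:
    """Average number of results one probe produces per second."""
    return results_per_second / probes


if __name__ == "__main__":
    target = target_probe_count(PROBES_PER_ASN, TOTAL_ASNS)
    rate = per_probe_rate(RESULTS_PER_SECOND, ACTIVE_PROBES)

    print(f"Target probe count: {target}")                      # 105000
    print(f"Results per probe per second: {rate:.3f}")          # 0.325
    print(f"Seconds between results per probe: {1 / rate:.2f}") # 3.07

    # Assumption: the per-probe rate does not change with network size.
    projected = rate * FULL_DEPLOYMENT
    print(f"Projected results per second at 100,000 probes: {projected:.0f}")
```

Three probes across roughly 35,000 ASNs gives 105,000, which rounds to the 100,000 figure. The per-probe rate works out to about one result every three seconds.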
RIPE Atlas Security
Security and reliability of the network measurement probes are very important to both projects, and here too both have chosen a similar approach. Atlas probes don’t have any open TCP/IP ports, which reduces the risk of external attacks. In addition, the control channel between the probe and the server is encrypted with Transport Layer Security (TLS) using unique keys, to decrease the risk of compromise.
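To make the idea concrete, here is a minimal sketch of a probe-style control channel in Python: the probe never listens on a port and only opens an outbound TLS connection to its controller. This is not the actual Atlas or NetBeez implementation. The function names are hypothetical, and I am assuming that the “unique keys” take the form of a per-probe client certificate, which the article does not specify.

```python
import socket
import ssl
from typing import Optional


def build_probe_tls_context(certfile: Optional[str] = None,
                            keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Create a client-side TLS context that verifies the controller.

    certfile/keyfile are an assumed per-probe client certificate and key;
    when omitted, only the server side is authenticated.
    """
    # Default client context: certificate validation and hostname checks on.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile is not None:
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context


def open_control_channel(host: str, port: int, context: ssl.SSLContext,
                         timeout: float = 10.0) -> ssl.SSLSocket:
    """Open an outbound-only TLS connection; no listening socket is created."""
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

A probe would call build_probe_tls_context() once at startup and then open_control_channel() with its controller’s hostname and port (both placeholders here), reconnecting when the session drops. Because every connection is initiated by the probe, there is no inbound port for an attacker to reach.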
Conclusion on RIPE Atlas
Network performance monitoring via active traffic has clear benefits. This is proven by the RIPE Atlas project as well as by many commercial solutions, including NetBeez. What I have personally seen is that more and more enterprises are installing hardware or software agents on their WANs and in their data centers to increase visibility of the IP layer, decreasing the time to detect and repair network problems.
I would be very interested in continuing the discussion in the comments section and hearing from other network engineers who have deployed a RIPE Atlas probe and/or an active network monitoring solution for gathering network performance measurements.