Which tools for monitoring network availability and performance?

Everybody understands the importance of having network monitoring tools in place. However, which tools should you consider to support your network infrastructure and, most importantly, your end-users?

There are many tools on the market, probably hundreds of them, and choosing the right one is not easy. If you want to see a list of them, I recommend the page maintained by the Stanford Linear Accelerator Center.

Before venturing into the jungle of network monitoring tools, you might want to consider these three main categories:

SNMP-based network management tools

Most of these tools use the standards-based SNMP protocol to interact with network hardware and poll the real-time status and usage of its resources, such as processor utilization, memory consumption, and bytes transmitted and received on each interface. When a network node is unreachable, or its resources are overloaded or unavailable, the network monitoring tool generates an alert to notify the administrator of the problem.
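To make the polling idea concrete, here is a minimal sketch of what such a tool might do with two successive readings of an interface byte counter. It assumes the readings come from a 32-bit counter such as ifInOctets and that the link speed is known in bits per second; the function names and the 80% threshold are hypothetical, and the SNMP polling itself is left out.

```python
# Illustrative only: turns two polled byte-counter readings into a
# utilization figure and an alert. Names and threshold are assumptions.

COUNTER32_MODULUS = 2 ** 32  # a 32-bit counter wraps back to 0 after this many octets


def utilization_percent(prev_octets, curr_octets, interval_s, link_speed_bps):
    """Return link utilization (%) between two counter readings."""
    # The modulo handles a single counter wrap between the two polls.
    delta_octets = (curr_octets - prev_octets) % COUNTER32_MODULUS
    bits_transferred = delta_octets * 8
    return 100.0 * bits_transferred / (interval_s * link_speed_bps)


def check_interface(name, prev_octets, curr_octets, interval_s,
                    link_speed_bps, threshold_percent=80.0):
    """Return an alert message if utilization reaches the threshold, else None."""
    util = utilization_percent(prev_octets, curr_octets, interval_s, link_speed_bps)
    if util >= threshold_percent:
        return f"ALERT: {name} at {util:.1f}% utilization"
    return None
```

For example, 375,000,000 octets received in 60 seconds on a 100 Mbps link works out to 50% utilization, which stays below the assumed threshold and raises no alert.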

In this class of tools there are many options available, and we can find open source solutions like Nagios and Zabbix as well as commercial ones like HP Network Node Manager (OpenView), SolarWinds, and IBM Tivoli.

Flow-based performance-monitoring tools

These tools capture and process real user traffic (also called traffic flows), so you can obtain aggregate statistics about the protocols and users consuming a link's capacity (top talkers) or inspect a specific sequence of packets to pinpoint performance issues between a client and a server. Traffic flows are captured by an inline device (network tap), a software agent, or the network element that is switching the user traffic. The captured flows are then sent to a central collector for storage and processing. You can also configure a mirror port on a switch to copy traffic for further analysis by a collector. Routers can run NetFlow, sFlow, and other protocols that generate statistics about user traffic.
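As a rough illustration of the aggregation a collector performs, the sketch below computes top talkers and a protocol breakdown from a handful of flow records. The FlowRecord layout, the function names, and the sample addresses are all made up for the example; real NetFlow or sFlow records carry many more fields.

```python
from collections import Counter
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """Hypothetical, simplified flow record as a collector might store it."""
    src: str
    dst: str
    protocol: str
    byte_count: int


def top_talkers(flows, n=3):
    """Return the n source addresses that sent the most bytes."""
    totals = Counter()
    for flow in flows:
        totals[flow.src] += flow.byte_count
    return totals.most_common(n)


def protocol_breakdown(flows):
    """Return each protocol's share of total bytes, as a percentage."""
    totals = Counter()
    for flow in flows:
        totals[flow.protocol] += flow.byte_count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {proto: 100.0 * count / grand_total for proto, count in totals.items()}


# Sample data using documentation-only address ranges.
SAMPLE_FLOWS = [
    FlowRecord("192.0.2.10", "198.51.100.5", "HTTPS", 7000),
    FlowRecord("192.0.2.11", "198.51.100.5", "HTTPS", 2000),
    FlowRecord("192.0.2.10", "198.51.100.9", "DNS", 500),
    FlowRecord("192.0.2.12", "198.51.100.7", "SSH", 500),
]
```

With the sample data, top_talkers(SAMPLE_FLOWS, 2) ranks 192.0.2.10 first with 7,500 bytes, and protocol_breakdown(SAMPLE_FLOWS) attributes 90% of the bytes to HTTPS.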

In this class, you can find open source tools like ntop and Wireshark and commercial solutions like Riverbed and NetScout.

Active monitoring

This type of monitoring is accomplished by injecting real packets into the network to measure end-to-end reachability, round-trip time, packet loss, bandwidth, link utilization, and other network properties. Active monitoring is also used to test applications from the user's perspective by executing real transactions and then measuring their performance, such as execution and response time. This technique lets you test the end result of the network and its applications without having to monitor individual components and then infer their availability and performance. As a result, outages and performance degradation are detected much faster and more reliably.
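A very small active probe can be sketched with nothing more than a TCP connection attempt. The example below times the TCP handshake as an approximation of round-trip time and counts failed attempts as a rough stand-in for loss; the function names are hypothetical, and dedicated tools measure these values far more carefully.

```python
import socket
import time
from statistics import mean


def tcp_probe(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None if the probe fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers timeouts, refused connections, and name resolution errors.
        return None


def summarize(samples):
    """Summarize probe results; None entries count as failed probes."""
    successes = [s for s in samples if s is not None]
    failed = len(samples) - len(successes)
    return {
        "sent": len(samples),
        "failed_percent": 100.0 * failed / len(samples) if samples else 0.0,
        "avg_ms": mean(successes) if successes else None,
        "max_ms": max(successes) if successes else None,
    }


# Example usage (placeholder host; performs real network connections):
#   samples = [tcp_probe("www.example.com", 443) for _ in range(5)]
#   print(summarize(samples))
```

Running such a probe on a schedule from several locations, and storing the summaries, gives the kind of always-on, end-to-end view described above without depending on real user traffic.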

In this class, you can find open source tools like SmokePing and iPerf as well as commercial solutions like NetBeez and AppNeta.

The following breakdown lists the pros and cons that, in my opinion, you should consider for each class of network monitoring tools:

SNMP
Pros:
  • Detects hardware failures and overload of system resources
  • Provides bytes in/out on network interfaces
  • Always-on monitoring (24/7)
  • Fairly easy to set up
Cons:
  • Lacks end-to-end visibility
  • Not appropriate for troubleshooting network performance issues
  • Copes poorly with network complexity and virtualization
  • Cannot detect software configuration errors (routing policies, ACL, …) that affect user traffic

TRAFFIC FLOW
Pros:
  • Accounting and statistics about traffic flows
  • Protocol breakdown across network links
  • Identifies top talkers
  • Deep packet inspection analysis
Cons:
  • High disk space consumption
  • Limited historical data
  • In-line devices (taps) introduce another point of failure
  • Taps are generally expensive, so they cannot be installed everywhere
  • Mirror ports consume system resources and cannot capture all the flows traversing the node
  • Requires expertise and training
  • Reactive troubleshooting

ACTIVE
Pros:
  • Detects performance degradation and trends
  • Always-on monitoring (24/7)
  • Can hold a large amount of historical data
  • Does not require real user traffic to generate KPIs
  • Tests network infrastructure in the pre-deployment phase
  • Validates configuration changes
Cons:
  • In performing real transactions, these tools consume network and/or application resources
  • To be successfully implemented, several hardware or software agents must be deployed in the network

It is clear that each class of tools fills a gap: to successfully support today’s enterprise networks, you need a tool from each category. The network engineer then has to make sure that the different tools integrate well with each other.

I have also observed that, to date, many companies only have solutions from the first two classes (SNMP and traffic flow) and don’t have an active monitoring solution in place. This is unfortunate, because active testing should be a requirement for successfully monitoring an enterprise network and minimizing network downtime.

If you are interested in learning more about this topic, I recommend two previous blog posts, “Troubleshooting remote application issues” and “Distributed Network Monitoring”.

I would like to hear your opinion on the different classes of network monitoring tools that I covered here, so please feel free to comment.