Anomaly Detection in NetBeez

By NetBeez, March 30, 2022


Anomaly detection is one of the primary functions of a network monitoring solution. NetBeez detects network performance issues quickly and proactively by combining the real-time tests performed by the agents with the alert detection process that runs on the BeezKeeper server. The NetBeez anomaly detection system is composed of three main components: alerts, incidents, and notifications.

Alerts

Alerts are the building blocks of the NetBeez anomaly detection system. They are triggered by real-time tests (e.g. ping, DNS, HTTP, …) based on user-defined rules called alert detectors. An alert is opened when a test metric meets a condition defined in an alert detector, and is closed when the condition is no longer true. Alert detectors are assigned to network monitoring targets. NetBeez supports three main alert detector profiles, which are covered in the Alert Profiles section below.

Incidents

Incidents are generated when a certain percentage of tests within an agent, a target, or a Wi-Fi network trigger alerts. They are a good way to alert a NOC or help desk team about blackouts or brownouts affecting one network location or remote user (agent incident), an application or service (target incident), or a corporate Wi-Fi SSID (Wi-Fi incident). Incidents can be acknowledged, and users can post comments to add information about the ongoing performance issue or to explain why a specific incident was acknowledged or de-acknowledged. The integration with PagerDuty (and soon with ServiceNow) will enable users to have NetBeez open and close tickets based on the status of the resource.
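The percentage rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `MonitoredTest`, `alerting_fraction`, and `incident_open` are hypothetical, and the 50% threshold is a placeholder rather than a NetBeez default.

```python
from dataclasses import dataclass


@dataclass
class MonitoredTest:
    """One test on a resource (agent, target, or Wi-Fi network). Hypothetical type."""
    name: str
    alerting: bool  # True while the test has an open alert


def alerting_fraction(tests: list[MonitoredTest]) -> float:
    """Fraction of the resource's tests that currently have an open alert."""
    if not tests:
        return 0.0
    return sum(t.alerting for t in tests) / len(tests)


def incident_open(tests: list[MonitoredTest], threshold: float = 0.5) -> bool:
    """Report an incident while the alerting fraction meets the threshold.

    The 0.5 default is an assumption made for this sketch.
    """
    return alerting_fraction(tests) >= threshold


# Example: two of the four tests on one agent have open alerts.
agent_tests = [
    MonitoredTest("ping", True),
    MonitoredTest("dns", True),
    MonitoredTest("http", False),
    MonitoredTest("traceroute", False),
]
print(incident_open(agent_tests))  # True
```

Grouping the same check by agent, by target, or by Wi-Fi network would yield the three incident types described above.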

Notifications

Notifications are sent when an alert is triggered or an incident is raised. They are delivered via standard protocols such as SMTP, SNMP traps, or syslog messages, and through third-party integrations such as Splunk, Slack, or PagerDuty. Because incidents aggregate multiple alerts, they are a good way to reduce noise. For this reason, it’s recommended to enable notifications for incidents rather than for alerts.

The chart below illustrates how the three items tie together and what their dependencies are.

Alert Profiles

Alert profiles are assigned to targets to detect problems such as loss of connectivity or performance degradation to a remote service or application. Alert profiles are test-specific: each one applies to a specific type of test (ping, DNS, HTTP, or traceroute). There are three main types of alert profiles that can be assigned to tests: up-down, performance baseline, and performance watermark.

Up-Down Alerts

An up-down alert is triggered by a real-time test when it fails for a given number of consecutive tries. This number is the up-down multiplier; it defaults to five and can be adjusted by the user at any time. Up-down alerts are useful to detect loss of reachability to a remote host, network, or application. NetBeez also supports the reverse of an up-down alert, called down-up, which is triggered when a test succeeds. Down-up alerts are used to enforce security policies, such as verifying content filtering (e.g. users can’t access certain websites) or firewall rules (e.g. an isolated network can’t access the Internet).
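As a rough sketch of the up-down logic, the class below counts consecutive failures and opens an alert once the count reaches the multiplier. The class name `UpDownDetector` and its methods are hypothetical, and closing the alert on the first successful test is an assumption of this sketch, not a documented NetBeez behavior.

```python
class UpDownDetector:
    """Opens an alert after `multiplier` consecutive failed tests (illustrative only)."""

    def __init__(self, multiplier: int = 5):
        self.multiplier = multiplier      # default of five, as stated in the text
        self.consecutive_failures = 0
        self.alert_open = False

    def observe(self, success: bool) -> bool:
        """Record one test result and return whether the alert is currently open."""
        if success:
            # Assumption: a single success resets the counter and closes the alert.
            self.consecutive_failures = 0
            self.alert_open = False
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.multiplier:
                self.alert_open = True
        return self.alert_open


detector = UpDownDetector()
states = [detector.observe(False) for _ in range(5)]
print(states)  # [False, False, False, False, True]
```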

Performance Baseline

This type of alert profile is suited to targets that are applied to many agents whose performance against the same application differs because of geographical location or other factors. A baseline alert is triggered by a real-time test when it detects performance degradation. Degradation is detected by comparing a test's short-term moving average to its long-term moving average, which is treated as the performance baseline. For each test, the server calculates the following moving averages:

  1. Short-term: 1 minute, 15 minutes, 1 hour, 4 hours.
  2. Long-term: 1 day, 1 week, 1 month.

If the short-term average of a test is a certain number of times higher than its long-term average, an alert is triggered. We consider this type of alert adaptive to the local performance.
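The comparison can be illustrated with a minimal Python sketch. It makes several assumptions: windows are expressed as sample counts rather than the time spans listed above, the long window includes the most recent samples, and the class name `BaselineDetector` and the `ratio` parameter are hypothetical.

```python
from collections import deque


class BaselineDetector:
    """Flags a sample when the short-term average exceeds ratio x long-term average."""

    def __init__(self, short_window: int, long_window: int, ratio: float):
        # Fixed-length deques drop the oldest sample automatically.
        self.short = deque(maxlen=short_window)
        self.long = deque(maxlen=long_window)
        self.ratio = ratio

    def observe(self, value: float) -> bool:
        """Add one measurement (e.g. round-trip time in ms) and evaluate the rule."""
        self.short.append(value)
        self.long.append(value)
        short_avg = sum(self.short) / len(self.short)
        long_avg = sum(self.long) / len(self.long)  # treated as the baseline
        return short_avg > self.ratio * long_avg


detector = BaselineDetector(short_window=3, long_window=20, ratio=2.0)
steady = [detector.observe(10.0) for _ in range(20)]   # stable latency
spike = detector.observe(100.0)                        # sudden degradation
print(any(steady), spike)  # False True
```

Because the baseline is computed per test, an agent with normally high latency and an agent with normally low latency can share the same ratio without either one alerting constantly.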

Performance Watermark

Watermark alerts are triggered when a real-time test doesn’t meet specific performance requirements. They are configured by comparing a short-term average against a user-defined threshold (e.g. packet loss is higher than 5% or DNS resolution time is higher than 100 ms). These alerts are used to enforce service level agreements with network services and applications.

For instance, if you wish to monitor the quality of Zoom calls for a remote user, the Zoom customer success team recommends the following settings:

| Metric      | Ideal Threshold | Notes                                                |
|-------------|-----------------|------------------------------------------------------|
| Jitter      | < 150 ms        | Variation in time between packets arriving           |
| Latency     | < 300 ms        | Delay between packets being sent/received            |
| Packet Loss | < 20%           | Number of packets failing to reach final destination |
| CPU Usage   | < 90%           | Send and Receive rate you experience during the call |

Conditions like these can be enforced with a watermark alert.
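A minimal sketch of such a check, using the three network rows of the table as thresholds, might look like the following. The names `Watermark`, `ZOOM_WATERMARKS`, and `violations` are hypothetical, and the short-term average is simplified to the mean of the samples passed in.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Watermark:
    """A user-defined threshold for one metric (hypothetical type)."""
    metric: str
    limit: float  # a short-term average at or above this value is a violation
    unit: str


# Thresholds taken from the table above; values at or above the limit are not "ideal".
ZOOM_WATERMARKS = (
    Watermark("jitter", 150.0, "ms"),
    Watermark("latency", 300.0, "ms"),
    Watermark("packet_loss", 20.0, "%"),
)


def violations(samples: dict[str, list[float]], watermarks=ZOOM_WATERMARKS) -> list[str]:
    """Return the metrics whose short-term average reaches or exceeds the watermark."""
    violated = []
    for w in watermarks:
        values = samples.get(w.metric)
        if values and mean(values) >= w.limit:
            violated.append(w.metric)
    return violated


recent = {
    "jitter": [100.0, 250.0],      # average 175 ms
    "latency": [120.0, 130.0],     # average 125 ms
    "packet_loss": [0.0, 1.0],     # average 0.5 %
}
print(violations(recent))  # ['jitter']
```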

Conclusion

Since anomaly detection is a key function of NetBeez, it’s very important to be familiar with the concepts described here. If you wish to learn more about NetBeez, schedule a demo with us.