NFD28: Troubleshooting WFH Work Issues with NetBeez

By NetBeez, May 18, 2022

Two weeks ago NetBeez presented at Networking Field Day 28 with a number of other companies. Each company presented how they solve different networking problems such as automation, monitoring, security, and NetOps. At this event we focused on our user experience solution around monitoring and troubleshooting in the hybrid work reality.

More specifically, my presentation addressed the most common issues employees face when they work from home and how NetBeez enables the support desk to resolve them. Here is a summary of my 25-minute talk.

Agent 

NetBeez captures and monitors the application and network experience of every user by installing a lightweight agent on their Windows or macOS system. The agent collects system information such as signal strength and ISP name, and runs synthetic tests to capture latency, bandwidth, page load time, and other metrics.

PC performance

Problem: When the CPU or memory is maxed out, it can degrade video conferencing quality as well as page load times, especially for pages that are heavy on JavaScript, as most modern web applications are.

Solution: NetBeez collects CPU, memory, and disk utilization from the user’s system, and the help desk can view the real-time or historical data on our dashboard. The dashboard makes it easy to see whether the CPU is reaching capacity and to overlay that with network performance data such as latency.

Home WiFi Network

Problem: WiFi is one of the most common root causes of degraded network performance. The help desk often recommends that users plug into their routers, but when that’s not possible, the help desk is flying blind in terms of the end user’s WiFi connection.

Solution: The WiFi-related data points that NetBeez agents collect are signal strength, link quality, MCS (Modulation and Coding Scheme) index, noise, bitrate, and connection/disconnection events. These help identify whether a user suffers from poor WiFi. For example, a link quality lower than 80% or an MCS index lower than 5 is a strong indicator that a poor user experience is due to poor WiFi. This can be fixed by advising the user to plug into their router, move closer to the router, or (let’s not forget the first line of defense in home WiFi troubleshooting) reboot their router.
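As a rough illustration of how a help desk script could apply the two rules of thumb above, here is a minimal Python sketch. The `WifiSample` class and `wifi_warnings` function are hypothetical names made up for this example; they are not part of the NetBeez product or API, and the thresholds are simply the ones quoted in the paragraph.

```python
from dataclasses import dataclass

# Rules of thumb quoted in the text above.
MIN_LINK_QUALITY_PCT = 80
MIN_MCS_INDEX = 5


@dataclass
class WifiSample:
    """Hypothetical container for one WiFi reading from a user's laptop."""
    link_quality_pct: float
    mcs_index: int


def wifi_warnings(sample: WifiSample) -> list[str]:
    """Return one human-readable warning per threshold the sample violates."""
    warnings = []
    if sample.link_quality_pct < MIN_LINK_QUALITY_PCT:
        warnings.append(
            f"link quality {sample.link_quality_pct}% is below {MIN_LINK_QUALITY_PCT}%"
        )
    if sample.mcs_index < MIN_MCS_INDEX:
        warnings.append(f"MCS index {sample.mcs_index} is below {MIN_MCS_INDEX}")
    return warnings


if __name__ == "__main__":
    # Example reading from a user sitting far from their router.
    for warning in wifi_warnings(WifiSample(link_quality_pct=62, mcs_index=3)):
        print("Poor WiFi suspected:", warning)
```

An empty list means neither threshold was crossed, so the help desk can move on to the next check instead of asking the user to reboot their router.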

Gateway Performance

Problem: For home users, the WiFi access point, router, and gateway are usually the same device, and it is almost always consumer grade. That means it rarely receives bug fixes and firmware updates, and the help desk has no visibility into or control over it.

Solution: A simple ping test to the laptop’s gateway is a health indicator. A healthy average latency is less than 5 ms for wired connections and less than 10 ms for wireless. Packet loss should be close to zero. Each NetBeez agent can ping its own gateway and report the latency and packet loss on the dashboard.
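The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `gateway_is_healthy` is a hypothetical helper, the input is a list of round-trip times you have already collected (with `None` standing in for a lost packet), and "close to zero" packet loss is interpreted here as at most 1%, which is our assumption rather than a NetBeez setting.

```python
from statistics import mean
from typing import Optional, Sequence

# Latency limits quoted in the text above, in milliseconds.
MAX_AVG_LATENCY_MS = {"wired": 5.0, "wireless": 10.0}
# Assumption: "close to zero" packet loss is read as at most 1%.
MAX_PACKET_LOSS = 0.01


def gateway_is_healthy(rtts_ms: Sequence[Optional[float]], link: str) -> bool:
    """Evaluate ping results to the gateway.

    rtts_ms holds one entry per ping sent; None marks a lost packet.
    link is either "wired" or "wireless".
    """
    if not rtts_ms:
        raise ValueError("at least one ping result is required")
    replies = [rtt for rtt in rtts_ms if rtt is not None]
    if not replies:
        return False  # every packet was lost
    loss = 1 - len(replies) / len(rtts_ms)
    return mean(replies) < MAX_AVG_LATENCY_MS[link] and loss <= MAX_PACKET_LOSS


if __name__ == "__main__":
    print(gateway_is_healthy([2.1, 3.4, 2.8, 3.0], "wired"))
    print(gateway_is_healthy([8.0, None, 9.5, 30.2], "wireless"))
```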

Internet Service Provider Speed

Problem: Asking a user to run an Internet speed test against a public server is one of the ways often used to prove that it’s not the network. The problem is that if the user runs a manual speed test only when you ask them to, that bandwidth figure represents the capacity of their network only at that particular time. You don’t know what their historical baseline is or what to expect.

Solution: NetBeez can schedule bandwidth tests at specific times and keep historical information. This helps identify users who have historically poor bandwidth and don’t meet the SLAs required for a good overall user experience.
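To show why a history of scheduled results is more useful than a single manual test, here is a small Python sketch that derives a per-user baseline and lists the users who fall below an SLA. The function name, the use of the median as the baseline, and the 25 Mbps figure are all assumptions chosen for illustration; the article does not specify how NetBeez computes baselines or which SLA values customers use.

```python
from statistics import median


def users_below_sla(history_mbps: dict[str, list[float]], sla_mbps: float) -> list[str]:
    """Return the users whose median download speed is below the SLA.

    history_mbps maps a user name to the results of their scheduled
    bandwidth tests, in Mbps. The median is used as the baseline so that
    a single unusually good or bad run does not skew the result.
    """
    return sorted(
        user
        for user, samples in history_mbps.items()
        if samples and median(samples) < sla_mbps
    )


if __name__ == "__main__":
    # Made-up sample data for two users.
    history = {
        "alice": [95.0, 102.0, 98.0],
        "bob": [12.0, 48.0, 15.0],
    }
    # Assumed SLA of 25 Mbps, for illustration only.
    print(users_below_sla(history, sla_mbps=25.0))
```

Note that bob's one good run of 48 Mbps would have passed a manual spot check, which is exactly the blind spot that historical data removes.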

Internet Outages

Problem: The Internet has become every remote worker’s backbone network, and it’s essentially a black box in terms of routing and performance. A simple traceroute can offer some visibility, but it cannot be run manually, at scale and on a regular basis, from each user’s system.

Solution: Apart from traceroute, NetBeez uses path analysis, which is a more advanced form of Internet routing testing and monitoring. It gives detailed routing information between the user and an end host, and visualizes nodes that introduce abnormally high latency. Here is what a path analysis test looks like from a macOS system to google.com.
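Separately from the NetBeez visualization, the general idea of spotting a node that introduces abnormally high latency can be sketched in Python. This is not the NetBeez path analysis algorithm, which the article does not describe. It is a simplified heuristic that assumes you already have one round-trip time per hop (for example, parsed from traceroute output) and flags any hop whose latency jumps by more than a chosen amount over the previous hop. The function name and the 50 ms default are assumptions.

```python
def suspicious_hops(hop_rtts_ms: list[float], max_jump_ms: float = 50.0) -> list[int]:
    """Return 1-based hop numbers where latency rises sharply.

    hop_rtts_ms[i] is the round-trip time to hop i + 1. A hop is flagged
    when its RTT exceeds the previous hop's RTT by more than max_jump_ms.
    The first hop is compared against 0 ms.
    """
    flagged = []
    previous_rtt = 0.0
    for hop_number, rtt in enumerate(hop_rtts_ms, start=1):
        if rtt - previous_rtt > max_jump_ms:
            flagged.append(hop_number)
        previous_rtt = rtt
    return flagged


if __name__ == "__main__":
    # Made-up RTTs for a five-hop path; the jump happens at hop 4.
    print(suspicious_hops([2.0, 10.0, 12.0, 95.0, 97.0]))
```

Real paths are noisier than this: routers often deprioritize replies to probe packets, so a single slow hop that is not followed by slow hops further along the path is usually not a real problem.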

Application Issues

Problem: At the end of the day, if an application server itself is overloaded and has increased response times, the end user will experience the slowness, even if all network connectivity is up to par all the way to the application server. Monitoring the application performance is the last link in the long chain between the user and their mouse.

Solution: Here is where the NetBeez network agents come into play: install a virtual agent as close as possible to the application server (in the same datacenter or cloud environment) and monitor the application response time. This gives a baseline that you can compare with the application response time measured by the NetBeez agent running on the user’s system. If the virtual agent shows a slow response, then you know that the root cause is the server, and troubleshooting can be assigned to the application team.
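The comparison described above boils down to a simple decision rule, sketched here in Python. Everything in this block is illustrative: `likely_root_cause` is a hypothetical helper, the baselines are values you would take from your own historical data, and the 1.5x tolerance is an assumed cutoff for "slow", not a NetBeez default.

```python
def likely_root_cause(
    server_side_ms: float,
    user_side_ms: float,
    server_baseline_ms: float,
    user_baseline_ms: float,
    tolerance: float = 1.5,
) -> str:
    """Attribute slowness using two vantage points.

    server_side_ms: response time measured by the agent next to the server.
    user_side_ms: response time measured by the agent on the user's system.
    A measurement counts as slow when it exceeds its baseline by the
    tolerance factor (assumed to be 1.5x here).
    """
    if server_side_ms > server_baseline_ms * tolerance:
        # Slow even from next door: the application itself is the suspect.
        return "application"
    if user_side_ms > user_baseline_ms * tolerance:
        # Fast near the server but slow for the user: look at the path or the PC.
        return "network or endpoint"
    return "none"


if __name__ == "__main__":
    print(likely_root_cause(900, 950, server_baseline_ms=200, user_baseline_ms=260))
    print(likely_root_cause(210, 900, server_baseline_ms=200, user_baseline_ms=260))
```

Checking the server-side measurement first mirrors the reasoning in the text: if the application is slow from inside its own datacenter, no amount of home network troubleshooting will help.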

Conclusions

Collecting all these data points through the NetBeez agents allows even unskilled help desk personnel to quickly identify and troubleshoot the simple, but also the most common, issues that end users face. Here is a sample flow chart that some of our customers provide to their Level One Help Desk in parallel with NetBeez:

The biggest impact in reducing ticket escalation to level two or level three engineers can come from simple guides for the first line of defense. Our customers have reported up to a 90% reduction in ticket escalation.