In a previous post on how to monitor the work-from-home experience, we covered the digital end-user experience (DEX) metrics that NetBeez collects: voice quality (latency, jitter, packet loss, MOS), endpoint performance (CPU, memory, and disk utilization), and Wi-Fi performance (signal strength, MCS, noise). In this post I want to walk through a specific case in which a memory utilization issue built up over several hours and eventually left a Windows laptop unresponsive, with a reboot as the only fix. Let’s see how NetBeez makes troubleshooting a memory issue easy.
The Memory Issue
If a laptop’s CPU or memory utilization approaches 100%, all applications and services running on it will suffer. The user might experience this as “slowness” when browsing the Internet or using web applications, or as jittery, choppy video conferencing.
It has become the norm to blame the network or the Wi-Fi first. However, they are not always at fault, and troubleshooting these kinds of problems can be challenging, especially for non-technical users. When users can’t figure out what’s going on, they open a ticket with their help desk, and the burden shifts to the IT team.
Although the help desk has the skills, and resolving these kinds of issues day in, day out is its job, its main bottleneck is extracting DEX data from the user’s device. That becomes even more challenging in the new hybrid work reality, with users in different geographical locations and time zones.
The NetBeez agent runs in the background on each user’s laptop and solves this problem by collecting, among other things, CPU and memory utilization, Wi-Fi SSID and signal strength, VPN status, and ISP name and speed. In the following screenshot you can see how all of this information is reported in one view on a NetBeez dashboard (under each endpoint’s status page).
The Troubleshooting Data
In this example incident, John Doe started work around 8:00 AM on Monday. Here is the sequence of events that were noted while troubleshooting the memory issue:
- 8:00 AM: First Zoom call of the day with acceptable quality
- 9:00-11:00 AM: Laptop was “slow” at loading pages
- 11:00 AM: Second Zoom call was disconnected and the laptop froze
- 11:30 AM: Called help desk and opened a ticket
- 11:35 AM: Help desk reviewed the data from John’s laptop and identified 100% RAM utilization
- 11:45 AM: Help desk advised John to reboot his laptop to resolve issues
- 11:50 AM: Laptop rebooted
Let’s look at the help desk’s view on the NetBeez dashboard, which plots latency to a Zoom server overlaid with RAM utilization:
As you can see, at 8:00 AM the memory utilization was at 80%, and it kept creeping up until 11:30 AM, when it reached 100% and John’s laptop froze. It took the help desk just 5 minutes of checking John’s data on the NetBeez dashboard to determine that all of these symptoms were due to high memory utilization.
Here is another example of a similarly poor experience on a user’s laptop, this time with high CPU utilization:
Before the user starts his day, CPU utilization and latency are within acceptable limits. As soon as he logs in, both increase to unacceptable levels, and they drop again around 6:30 PM when he is done for the day.
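The visual overlap between an endpoint metric and latency can also be checked numerically. The following is a minimal sketch in Python, not part of NetBeez: the `pearson` helper and the sample values are hypothetical, and it assumes you have paired CPU and latency samples taken at the same times.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical paired samples, not read from the NetBeez plots.
cpu_percent = [15, 20, 85, 90, 95, 92, 88, 25]
latency_ms = [22, 25, 140, 160, 180, 170, 150, 30]

r = pearson(cpu_percent, latency_ms)
print(f"CPU/latency correlation: {r:.2f}")
```

A coefficient close to 1 suggests that latency rises and falls together with CPU utilization, which points at the endpoint rather than the network. Correlation alone does not prove the cause, so it is best read alongside the plots.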
The Solution by the Help Desk
The following day, the help desk kept an eye on John’s memory utilization. Although it stayed around 80%, it showed an upward trend, which meant that John might face the same issue in the following days:
The reality is that John’s laptop is a 6-year-old, 2-core/8 GB Windows 11 device that is nearing end-of-life. Ultimately, the help desk decided that John needed a new laptop since the issue kept recurring.
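An upward trend like the one the help desk spotted by eye can be estimated with a simple least-squares fit. The sketch below is illustrative only: the function names `linear_trend` and `hours_until_full` and the sample values are assumptions, and it assumes memory grows roughly linearly, which real workloads may not do.

```python
def linear_trend(hours, percents):
    """Least-squares fit: returns (slope in percent per hour, intercept)."""
    n = len(hours)
    if n != len(percents) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_h = sum(hours) / n
    mean_p = sum(percents) / n
    sxx = sum((h - mean_h) ** 2 for h in hours)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((h - mean_h) * (p - mean_p) for h, p in zip(hours, percents))
    slope = sxy / sxx
    return slope, mean_p - slope * mean_h


def hours_until_full(hours, percents, limit=100.0):
    """Hours after the last sample until the fitted line reaches `limit`.

    Returns None when utilization is flat or falling.
    """
    slope, intercept = linear_trend(hours, percents)
    if slope <= 0:
        return None
    return max(0.0, (limit - intercept) / slope - hours[-1])


# Hypothetical samples: hours since the start of the day, memory in percent.
hours = [0, 1, 2, 3]
memory_percent = [80, 86, 92, 98]

slope, intercept = linear_trend(hours, memory_percent)
remaining = hours_until_full(hours, memory_percent)
print(f"trend: {slope:.1f}% per hour, projected to hit 100% in {remaining:.2f} h")
```

A positive slope sustained over several days is the kind of signal that supports replacing or upgrading a device before the user hits another freeze.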