The Zoom Outage of April 16, 2025: An AI Summary

On April 16, 2025, Zoom experienced a global outage that impacted thousands of users around the world. Users were unable to join meetings or log into the system. The root cause was traced to DNS: the outage originated at the registrar where Zoom stores and distributes its DNS records.

I was able to detect and analyze the Zoom outage. I have a small NetBeez deployment that I use for demos, and in this demo instance Zoom is one of the SaaS applications that I monitor. In this article I want to show the data my demo instance collected. I also want to take the opportunity to demonstrate a new feature that we just released in version 14.5.

Timing of the Zoom outage

From our observation points, the Zoom outage started on April 16 at around 2:43 PM EDT. Most likely the outage began a few minutes earlier, but DNS propagation and caching delayed its detection. Still from our observation points, the outage lasted until approximately 4:25 PM EDT. It was not a total outage in which no user could access the service. Instead, for a little under two hours, some of the Zoom servers were not fully reachable. As a result, a user in California might fail to connect to a Zoom meeting because of a session timeout, while a user in Europe would get through.
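The length of the observed window can be checked with a few lines of Python. This is only a sketch: it uses the approximate start and end times quoted above, and it models EDT as a fixed UTC-4 offset (an assumption made here to avoid depending on a time zone database).

```python
from datetime import datetime, timedelta, timezone

# EDT is UTC-4; a fixed offset keeps the example self-contained.
EDT = timezone(timedelta(hours=-4), "EDT")

# Approximate times observed from the demo agents, as quoted in the text.
first_alert = datetime(2025, 4, 16, 14, 43, tzinfo=EDT)
last_clear = datetime(2025, 4, 16, 16, 25, tzinfo=EDT)

# Length of the window during which the agents saw failures.
observed = last_clear - first_alert
minutes = int(observed.total_seconds() // 60)

print(f"Observed window: {minutes // 60} h {minutes % 60} min")
print(f"Start in UTC: {first_alert.astimezone(timezone.utc):%H:%M}")
```

The result is the window seen from these agents only. The actual outage probably started a little earlier, for the caching reasons described above.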

The NetBeez agents

In my demo instance I have three cloud agents, some network agents, and some remote worker agents. The cloud agents are located in three different AWS regions: one in US West, one in Europe West, and one in Asia Pacific South. This setup gives me a global view of the status and performance of SaaS applications, since I can monitor any of them from multiple locations around the world.

NetBeez cloud agents

I also have several network agents, mostly located in the United States with a few in Europe. Unfortunately, I hadn’t assigned these agents to monitor Zoom, so I can’t use their data to analyze the outage in this report. On the other hand, all three cloud agents were monitoring Zoom. That is still a decent number of observation points to understand what happened before, during, and after the Zoom outage.

The Zoom target in NetBeez

Let’s see how we monitor Zoom in NetBeez. There are two configurations available. In the first, we simply generate test traffic to the base URL included in meeting links. This URL differs from one organization to another. This configuration helps determine whether the overall infrastructure is available and performing well. While it may not catch all problems with Zoom, it gives at least a high-level overview of the service.

NetBeez zoom target
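To illustrate the idea behind the simple configuration, here is a minimal sketch of a single HTTP check against a base URL. It is not how NetBeez implements its tests. The function name `http_check`, the `opener` parameter, and the placeholder URL in the usage comment are all assumptions made for this example.

```python
import time
import urllib.error
import urllib.request


def http_check(url, timeout=5.0, opener=urllib.request.urlopen):
    """Fetch url once and report status code, elapsed time, and any error."""
    started = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            status, error = response.status, None
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx or 5xx status.
        status, error = exc.code, str(exc)
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, or timeout: no HTTP status at all.
        status, error = None, str(exc)
    elapsed = time.monotonic() - started
    return {"url": url, "status": status, "elapsed_s": elapsed, "error": error}


# Usage (hypothetical URL, replace with your organization's base URL):
# result = http_check("https://your-org.zoom.us/")
# print(result["status"], round(result["elapsed_s"], 3), result["error"])
```

A check like this separates "the server answered with an error" from "no answer at all". During a DNS problem, the second case is the one you would expect to see.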

A more advanced option is monitoring different components. In the following screenshot you can see that we have four different resources that we are testing. These resources were recommended by Zoom.

NetBeez zoom advanced target

I am not going through the details of setting up this target. If you are interested in learning more, watch the webinar on monitoring Zoom and RingCentral that we recently hosted. It provides all the details required to build the above configuration.

Detection of Zoom errors

The Zoom monitoring data reported by the NetBeez cloud agents was instrumental in reconstructing the outage. The agent located in the AWS US West 2 region (Oregon) detected the Zoom outage at around 2:45 PM EDT.

NetBeez zoom errors

The very first set of alerts came from the DNS tests that the Oregon-based agent was running against Zoom. As the above screenshot shows, the DNS alert initially cleared. Then, at 3:02 PM EDT, the failure happened again and lasted until the end of the outage, around 4:25 PM EDT.

Zoom alerts during outage

The DNS errors also affected other tests, such as ping and HTTP. The ping test, for instance, failed and logged the error message “host not found”. This error is typical when there is an issue with DNS. Before sending any packet, the agent that runs the ping test performs a DNS lookup to translate the hostname or FQDN, zoom.us in this case, into an IP address. The lookup fails with that message when the DNS server has no valid entry for the query.

Zoom outage errors
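The dependency of ping and HTTP tests on DNS can be sketched in a few lines of Python. This is an illustration under assumptions, not the agent's actual code. The names `resolve_then_test` and `broken_resolver` are hypothetical, and the broken resolver only simulates what a lookup failure looks like to the caller.

```python
import socket


def resolve_then_test(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname first, as a ping or HTTP test must before sending traffic."""
    try:
        infos = resolver(hostname, None)
    except socket.gaierror as exc:
        # Resolution failed, so no ICMP or HTTP packet is ever sent.
        return False, f"host not found: {hostname} ({exc})"
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({info[4][0] for info in infos})
    return True, addresses


def broken_resolver(host, port):
    # Stand-in for a resolver that has no valid entry (hypothetical, for illustration).
    raise socket.gaierror(socket.EAI_NONAME, "Name or service not known")


print(resolve_then_test("zoom.us", resolver=broken_resolver))
```

Because the failure happens before any packet leaves the host, a “host not found” error points at name resolution rather than at the network path to the server.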

Another agent that detected problems with Zoom was the one in the Asia Pacific South 2 region (Hyderabad, India). The outage it observed was much shorter than the one seen by the US-West-2 agent.

Zoom outage reachability

No outage was detected by the European agent.

Zoom DNS results
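Putting the three vantage points side by side is what shows that the outage was partial. The sketch below encodes only what each cloud agent reported in this article. The `AgentObservation` class and `outage_scope` function are hypothetical names for this example and are not part of NetBeez.

```python
from dataclasses import dataclass


@dataclass
class AgentObservation:
    region: str
    saw_failures: bool


def outage_scope(observations):
    """Summarize whether failures were seen from none, some, or all locations."""
    failing = [o.region for o in observations if o.saw_failures]
    if not failing:
        return "no outage observed"
    if len(failing) == len(observations):
        return "outage seen from every location"
    return "partial outage, seen from: " + ", ".join(failing)


# What the three cloud agents reported, per the sections above.
observations = [
    AgentObservation("US West 2 (Oregon)", True),
    AgentObservation("Asia Pacific South 2 (Hyderabad)", True),
    AgentObservation("Europe West", False),
]

print(outage_scope(observations))
```

With only three observation points this is a coarse picture, but it is enough to tell a regional problem apart from a total outage.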

Benefit of synthetic monitoring of Zoom

During the Zoom outage, NetBeez gave me confidence that the problem was not a network issue on my end. In fact, while the outage was developing, I had a meeting coming up with a prospect. There’s nothing worse than not showing up for a meeting with a prospect. They may stop answering your emails, as they see the no-show as a lack of respect for their time. It could damage your reputation for good.

However, thanks to NetBeez, I was able to quickly update the invite with a new videoconferencing system. Without NetBeez, my response would have been much slower. The attendees would probably have waited a few more minutes and then given up. Instead, I was still able to have the call and explain the situation. It actually turned out to be a good call, as I had a real, live outage that I could use to demonstrate the value of NetBeez.

Using the NetBeez AI-based alert summaries

In the recent version 14.5 of NetBeez, we released an experimental feature called “AI-powered summaries”. This feature processes a set of alerts selected by the user and produces a summary. The summary is very helpful when you need to analyze a large number of alerts.

Network engineers who need to report on a network problem or application outage will save a lot of time. They won’t have to spend time cross-referencing test results and alerts, as I did here in this article. The benefits are clear: less time figuring out what happened, more time to build networks.

The feature works this way: the user selects a set of alerts that they want to process, then clicks the AI summary button, which is represented by a brain icon. A few seconds later, a summary is displayed for the user’s review. The user can also copy the content of the summary and paste it into any document.

Zoom outage summary

Conclusion: The AI-summary of the Zoom outage

In the following section I want to report the whole summary I generated by using this new feature. It provides a clear explanation, with facts, of the Zoom outage that happened on April 16, 2025.

Summary of Notable Events and Issues

Time Frame: April 16, 2025, 14:43 – 16:27 EDT

1. DNS Resolution Issues

  • Alerts:
    • Multiple alerts for “DNS server returned no results” were detected from 14:43 until 15:07.
      • Alerts for DNS failures were logged at 14:43:57, 14:48:44, 15:02:59, and 15:07:40.
  • Clear Alerts: Notifications cleared at various times through 16:25.
  • Potential Causes: These DNS failures may indicate an issue with the DNS servers being used, potentially due to misconfigurations or downtime.
  • Advice & Solutions:
    • Check the configuration of DNS servers to ensure they are correctly set up and operational.
    • Consider switching to a more reliable public DNS service (like Google DNS or Cloudflare DNS) as a temporary measure.
    • Review the service logs for the DNS server for any performance degradation or failures.

2. Host Resolution Failures via Ping

  • Alerts:
    • Several “host not found” alerts occurred on multiple occasions:
      • 15:04:52, 15:07:22, 15:08:27, among others, with a high frequency of occurrences until 15:10.
    • Notable repeated alerts for hosts us02www3.zoom.us and us02log.zoom.us at 15:10:41 and 15:10:58.
  • Clear Alerts: These alerts were subsequently cleared from 15:41 through 16:27.
  • Potential Causes: These failures likely stem from the ongoing DNS issues, which prevented successful name resolution.
  • Advice & Solutions:
    • Validate the host entries on DNS configurations to ensure they are correctly pointed.
    • Review and mitigate any firewall policies that might restrict access to the necessary hosts.
    • Verify with the upstream ISP for any reported issues that could affect DNS resolutions.

3. DNS and HTTP Resolution Combined Service Issues

  • Alerts:
    • Alerts indicating a failure to resolve web hosts were noted, including “Could not resolve host: us02www3.zoom.us” at 15:10:58 and other occurrences shortly before them.
  • Clear Alerts: Cleared by 16:27.
  • Potential Causes: These host resolution failures might also be linked back to the previous DNS issues being reported.
  • Advice & Solutions:
    • Similar to above, ensure any user agents or proxies also have the correct DNS configurations.
    • Verify connectivity to the specific Zoom services and confirm service status with Zoom.

Conclusion

Overall, the primary issues found during this timeframe relate to DNS failures which led to multiple incidents of inability to connect to the Zoom services, both via ping and HTTP. Following the proposed steps should help mitigate these issues both immediately and prevent reoccurrence in the future. It’s essential to monitor after each change to assure service reliability and user productivity.
