Is the application down or not available? Is the root cause of the problem the network or the backend systems?
I am sure that everybody understands the difference between uptime and availability. Yet, I still see these terms used incorrectly as synonyms. Think about troubleshooting a remote application issue: you have to determine whether it’s a connectivity error (availability) or an application issue, such as a server being down (uptime). The two concepts are similar, but they don’t mean exactly the same thing. For the sake of clarity, let’s briefly review these two metrics.
Uptime refers to the amount of time that a server, cloud service, or other machine has been powered on and working properly. This metric is typically expressed in days, hours, minutes, and seconds, and in weeks or years for long-running systems. For example, most Unix-like systems provide the uptime command, and network equipment usually exposes an equivalent, with output such as the following:
user@unix# uptime
10:28:24 up 16 days, 1:24, 1 user, load average: 0.16, 0.03, 0.01
switch# show version | include uptime
switch uptime is 2 weeks, 2 days, 2 hours, 30 minutes
Availability is the percentage of time, in a specific time interval, during which a server, cloud service, or other machine can be used for the purpose for which it was designed and built. The formula most commonly used to calculate availability is the following:
Availability (%) = (Uptime / Total Time) x 100
Total Time = Downtime + Uptime
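As a minimal sketch of the formula above, the following Python snippet computes availability from measured uptime and downtime. The function name and the sample figures (a 30-day month with 45 minutes of downtime) are illustrative assumptions, not values from any monitoring tool.

```python
def availability_percent(uptime_s: float, downtime_s: float) -> float:
    """Return availability as a percentage of the total observed time."""
    total_s = uptime_s + downtime_s  # Total Time = Downtime + Uptime
    if total_s <= 0:
        raise ValueError("total observed time must be positive")
    return 100.0 * uptime_s / total_s


# Hypothetical example: a 30-day month with 45 minutes of downtime.
month_s = 30 * 24 * 3600
downtime_s = 45 * 60
print(f"{availability_percent(month_s - downtime_s, downtime_s):.3f}%")  # 99.896%
```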
With this formula we can derive the maximum amount of downtime that a service can suffer and still meet its Service Level Agreement (SLA):
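|Availability |Downtime per year |Downtime per month
|99% |3.65 days |7.3 hours
|99.9% |8.76 hours |43.8 minutes
|99.99% |52.56 minutes |4.38 minutes
|99.999% |5.26 minutes |26.3 seconds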
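The downtime budget for a given availability target follows directly from the availability formula. The sketch below rearranges it to compute the allowed downtime per year and per month. It assumes a 365-day year and an average month of one twelfth of a year; the helper name and the list of targets are chosen for illustration only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600          # assumption: non-leap year
SECONDS_PER_MONTH = SECONDS_PER_YEAR / 12   # assumption: average month length


def max_downtime_s(availability_pct: float, period_s: float) -> float:
    """Maximum downtime (seconds) allowed in a period for a target availability."""
    return period_s * (1 - availability_pct / 100.0)


for target in (99.0, 99.9, 99.99, 99.999):
    per_year = max_downtime_s(target, SECONDS_PER_YEAR)
    per_month = max_downtime_s(target, SECONDS_PER_MONTH)
    print(f"{target}%: {per_year / 60:.1f} min/year, {per_month / 60:.1f} min/month")
```

Changing the two period constants is enough to express the same budget against a different SLA window, such as a quarter.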
Ideally, most enterprises (and cloud service providers) aim to achieve five nines (99.999%). In reality, few of them meet this goal. Keep in mind that generating five-nines reports of network availability requires a tool that provides one-second accuracy. NetBeez can monitor a specific resource at intervals as short as one second with ping. The central server also generates availability reports for each agent.
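To see why the polling interval matters for five-nines reporting, consider what a single failed probe does to the reported number. The sketch below assumes that each probe stands for one full polling interval and uses a 30-day reporting period; both are simplifying assumptions, and the function name is hypothetical.

```python
def availability_with_one_missed_probe(interval_s: float, period_s: float) -> float:
    """Availability (%) reported when exactly one probe in the period fails,
    assuming each probe accounts for one full polling interval."""
    return 100.0 * (1 - interval_s / period_s)


period_s = 30 * 24 * 3600  # assumption: 30-day reporting period
for interval_s in (1, 60, 300):
    pct = availability_with_one_missed_probe(interval_s, period_s)
    print(f"{interval_s:>3}s polling: {pct:.5f}% -> five nines met: {pct >= 99.999}")
```

Under these assumptions, a single missed probe at a 60-second interval already pushes the monthly figure below 99.999%, while a one-second interval leaves room to measure short outages accurately.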
Network availability calculation
The formula to calculate network availability is fairly simple. However, it becomes tricky to determine what should be included in the calculation.
Let’s take the case of a medium-size enterprise that has several remote offices, each with its own Internet connection. Each office connects back to the headquarters via a VPN tunnel. Users at each remote office need to access a mix of internal applications across the VPN tunnel. They also need to access external services via the local Internet connection.
What is the network availability value if the VPN tunnel is down, but not the Internet connection? How does this value change if the Internet connection goes down?
We could calculate network availability based on the uptime of the Internet connection alone. After all, if the Internet connection goes down, both internal and external applications are affected. However, this approach is neither realistic nor precise. For example, a failure of the VPN tunnel would still impact access to internal applications. On the other hand, it would not impact external applications. In this case, network availability can be calculated as a weighted average.
Weighted Average Network Availability
In this calculation we factor in each key component that impacts the resulting network availability. In the previous example, we factor in the availability of the Internet connection and of the VPN tunnel:
Network Availability = Weight_external_apps x Network Availability_Internet + Weight_internal_apps x Network Availability_VPN
The two weights in the formula should be based on the percentage of work that depends on each specific resource, and they should add up to 100%. These values should be discussed and agreed with the business architect.
For example, let’s assume the following:
- 80% of the business applications used by the users are external (reachable via the Internet)
- The remaining 20% of the business applications are hosted internally (reachable via VPN)
If, over a one-year interval, the Internet connection never failed while the VPN tunnel was unavailable for 1 day (364/365 = 99.726% availability), then the overall network availability for that location is:
Network Availability = 80% * 100% + 20% * (99.726%) = 99.945%
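The worked example above can be reproduced with a short Python sketch. The function name and the dictionary layout are assumptions made for illustration; the weights and the one day of VPN downtime are taken from the example.

```python
def weighted_availability(components: dict[str, tuple[float, float]]) -> float:
    """components maps a name to (weight, availability_pct); weights must sum to 1."""
    total_weight = sum(w for w, _ in components.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * a for w, a in components.values())


days_in_year = 365
vpn_availability = 100.0 * (days_in_year - 1) / days_in_year  # 1 day of downtime

result = weighted_availability({
    "internet": (0.80, 100.0),        # external applications
    "vpn": (0.20, vpn_availability),  # internal applications
})
print(f"{result:.3f}%")  # 99.945%
```

Because the components live in a dictionary, adding a third dependency (for example, a hypothetical WAN link) only requires another entry and a rebalancing of the weights.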
I hope this article has not caused too much of a headache. I would really like to hear back from you about how you measure and monitor your network availability.