Is the application down or not available? Is the root cause of the problem the network or the backend systems?
I am sure that everybody understands the difference between uptime and availability; however, I still see these terms used incorrectly as synonyms. Think about troubleshooting a remote application issue: you have to determine whether the application itself is down or whether the network is rendering the service inaccessible to remote users.
For the sake of clarity, let’s briefly review these two metrics.
Uptime refers to the amount of time that a server, cloud service, or other machine has been powered on and working properly. This metric is expressed in units of time, from seconds and minutes up to days, months, and years. For example, Unix-like systems provide the uptime command, and network equipment typically reports the same value through a command such as show version. Typical output looks like this:
user@unix# uptime
 10:28:24 up 16 days, 1:24, 1 user, load average: 0.16, 0.03, 0.01

switch# show version | include uptime
switch uptime is 2 weeks, 2 days, 2 hours, 30 minutes
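As a rough illustration of where this number comes from, the sketch below converts a seconds-since-boot value into the "days, hours:minutes" form shown above. On Linux, the first field of /proc/uptime holds that value; the sample string and the function names here are made up for the example.

```python
def parse_proc_uptime(text: str) -> float:
    """Return seconds since boot from the contents of /proc/uptime."""
    # The file holds two numbers: uptime and cumulative idle time, in seconds.
    return float(text.split()[0])


def format_uptime(seconds: float) -> str:
    """Render seconds as 'D days, H:MM', similar to the uptime command."""
    total_minutes = int(seconds) // 60
    days, rem_minutes = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rem_minutes, 60)
    return f"{days} days, {hours}:{minutes:02d}"


# Sample contents (invented for illustration). On a Linux host you could
# read the real value with open("/proc/uptime").read() instead.
sample = "1387440.52 2770000.00"
print(format_uptime(parse_proc_uptime(sample)))
```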
Availability is the percentage of time, in a specific time interval, during which a server, cloud service, or other machine can be used for the purpose it was designed and built for. The formula most commonly used to calculate availability is the following:
Availability (%) = (Uptime / Total Time) x 100
Total Time = Downtime + Uptime
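The formula translates directly into a few lines of code. This is a minimal sketch; the function name and the choice of days as the unit are assumptions, and any unit works as long as uptime and downtime use the same one.

```python
def availability_percent(uptime: float, downtime: float) -> float:
    """Availability (%) = uptime / (uptime + downtime) * 100."""
    total_time = uptime + downtime
    if total_time <= 0:
        raise ValueError("total time must be positive")
    return 100.0 * uptime / total_time


# One day of downtime over a 365-day year.
print(round(availability_percent(uptime=364, downtime=1), 3))
```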
With this formula we can derive the maximum amount of downtime that a service can suffer in order to meet its Service Level Agreements (SLA):
| Availability | Downtime per year (365 days) | Downtime per month (30 days) |
|---|---|---|
| 99.999% | 5.26 minutes | 25.9 seconds |
| 99.995% | 26.28 minutes | 2.16 minutes |
| 99.99% | 52.56 minutes | 4.32 minutes |
| 99.95% | 4.38 hours | 21.6 minutes |
| 99.9% | 8.76 hours | 43.2 minutes |
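The downtime budgets can be derived by multiplying the length of the period by the allowed unavailability. The sketch below assumes a 365-day year and a 30-day month; other conventions for the length of a month give slightly different monthly figures. The function name is illustrative.

```python
from datetime import timedelta


def downtime_budget(availability_pct: float, period: timedelta) -> timedelta:
    """Maximum downtime allowed in `period` for a given availability target."""
    return period * (1 - availability_pct / 100.0)


YEAR = timedelta(days=365)   # assumption: non-leap year
MONTH = timedelta(days=30)   # assumption: 30-day month

for target in (99.999, 99.995, 99.99, 99.95, 99.9):
    per_year = downtime_budget(target, YEAR).total_seconds() / 60
    per_month = downtime_budget(target, MONTH).total_seconds() / 60
    print(f"{target}%: {per_year:.2f} min/year, {per_month:.2f} min/month")
```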
Ideally, most enterprises (and cloud service providers) aim for five nines (99.999%); in reality, few of them meet this goal. Keep in mind that five nines leaves a budget of only about 26 seconds of downtime per month, so a monitoring tool that polls once a minute cannot resolve outages that small. To generate meaningful five-nines reports of network availability, you need a tool with one-second granularity. NetBeez can monitor a specific resource with PING at intervals as short as one second, and its central server generates availability reports for each agent.
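To see why probe granularity matters, here is a minimal sketch that computes availability from a series of probe results. It does not use any NetBeez API; the sample data, the function names, and the boolean-per-probe representation are assumptions for illustration.

```python
from typing import Sequence


def availability_from_probes(results: Sequence[bool]) -> float:
    """Percentage of probes that succeeded (True = reply received)."""
    if not results:
        raise ValueError("need at least one probe result")
    return 100.0 * sum(results) / len(results)


def downtime_seconds(results: Sequence[bool], interval_s: float = 1.0) -> float:
    """Estimated downtime: each failed probe counts for one full interval."""
    failed = sum(1 for ok in results if not ok)
    return failed * interval_s


# One day of one-second probes with three failed probes (invented data).
samples = [True] * 86_400
for i in (100, 101, 5_000):
    samples[i] = False

print(f"{availability_from_probes(samples):.5f}%")
print(f"{downtime_seconds(samples, interval_s=1.0)} s of downtime")
```

With a 60-second probe interval, a single failed probe would already count as 60 seconds of downtime, which is more than the monthly five-nines budget.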
While the formula to calculate network availability is fairly simple, it can be tricky to determine what should be included in the calculation.
Let’s take the case of a medium-size enterprise that has several remote offices, each with its own Internet connection. Each office connects back to the headquarters via a VPN tunnel. Users at each remote office need to access a mix of internal applications across the VPN tunnel and external services via the local Internet connection.
What is the network availability value if the VPN tunnel is down, but not the Internet connection? How does this value change if the Internet connection goes down?
We could calculate network availability based solely on the uptime of the Internet connection, because if the Internet goes down, both internal and external applications are affected. However, this is not realistic: a failure of the VPN tunnel alone would still cut off access to internal applications while leaving external applications reachable (a partial outage). In this case, network availability can be calculated as the weighted average of the availability of the Internet connection and the availability of the VPN tunnel:
Network Availability = (Weight_external_apps x Network Availability_Internet) + (Weight_internal_apps x Network Availability_VPN)
The two weights in the formula should reflect the share of work that depends on each resource, and together they should add up to 100%. These values should be discussed and agreed with the business architect.
For example, let’s assume that 80% of the business applications are external (Internet) and the remaining 20% are internal (VPN). If, over a one-year interval, the Internet connection never failed while the VPN tunnel was unavailable for 1 day (an availability of 364/365, or 99.726%), then the overall network availability for that location is:
Network Availability = 80% * 100% + 20% * (99.726%) = 99.945%
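The same calculation can be written as a small function that generalizes to more than two resources. This is a sketch under the assumption that the weights are expressed as fractions that sum to 1; the function name and data layout are illustrative.

```python
from typing import Iterable, Tuple


def weighted_availability(components: Iterable[Tuple[float, float]]) -> float:
    """Weighted average of (weight, availability_percent) pairs."""
    components = list(components)
    total_weight = sum(weight for weight, _ in components)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weight * avail for weight, avail in components)


internet = 100.0              # Internet connection never failed
vpn = 100.0 * 364 / 365       # VPN tunnel down for 1 day in a year

overall = weighted_availability([(0.8, internet), (0.2, vpn)])
print(round(overall, 3))
```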
I hope this article has not caused too many headaches. I would really like to hear from you about how you measure and monitor your network availability.