Validating Configuration Changes
It’s not surprising that a large portion of a network engineer’s activities relate to configuration changes. Whether it’s an upgrade of the datacenter infrastructure or the deployment of a new VRF or QoS policy, network engineers are constantly planning the next configuration change that will help move forward the strategic plan and business objectives of their company.
Configuration changes are exciting but, at the same time, nerve-racking. If not planned well, they can lead to disastrous consequences for the enterprise network, with a domino effect on the network’s users, business services, and potentially the network engineer’s job. I am sure everybody in the field has heard horror stories about configuration changes with a bad outcome (who has applied a deny-all ACL to the control plane of every router in the network?). According to a Gartner analysis, “through 2015, 80% of network outages impacting mission-critical services will be caused by people”. It is clear that human error has a big impact on a company’s uptime.
So how can we keep a configuration change from having a disastrous impact on the many stakeholders inside and outside a company? Let’s review the life cycle of a configuration change. There are three major phases around a configuration change: planning, execution, and validation.
As a network engineer, I have always followed common sense for each of these phases. With this post, I want to summarize my best practices for network configuration changes. Please be aware that the content of this article should not replace or substitute your company’s change management processes and/or policies.
Planning Network Configuration Changes
In the planning phase, the network engineer will create all the necessary documentation, including procedures and scripts, needed to execute the change:
- Scripts and procedures – Create the procedure and scripts that will be followed step by step on the day of the configuration change.
- Back-out scripts and procedures – Create the back-out procedure and scripts to execute in case something goes wrong and the change needs to be reverted.
- Checkpoints – Include checkpoints in your scripts and procedures to make sure that each part of the change has the expected outcome. Checkpoints sit in the middle of the procedure and should not be confused with the final validation of the configuration change.
- Validation – The procedure should also include all the conditions that have to pass to declare the change successful and without any unexpected outcome. The last thing that you want is to cause a network outage that is discovered by the users (employees and customers) and not by you. If you have the proper validation tools and procedures in place, you can make sure that even if you have caused an outage during your configuration change, you can detect it and repair it before it has an impact on business operations.
- Group changes – If the configuration change is executed by two or more engineers, make sure to review the final plan together beforehand; also have the configuration scripts reviewed by other engineers and tested in the lab if possible.
- Impact on applications – If the configuration change will have an impact on specific applications or resources, make sure to work with the application group within the organization that is responsible for the application. If they don’t need to be present during the change, make sure at least to notify them and to get their on-call support number in case it’s needed.
- Snapshots – Take a snapshot of the elements and resources of your network infrastructure that will be impacted by the change. You may find out that you have to apply fixes, or execute other configuration changes, before moving forward with the planned change. On the day of the change, you will have to make sure that the network is in the same state it was in when you were planning your change.
- Timeframe of the change – This will be based on different factors, such as the risk and impact level of the configuration change. The time must also be coordinated with the users and business departments that may be impacted. Make sure that the change window is large enough for you to execute the scripts, perform the appropriate checks and verification, and, if needed, run the back-out procedure. The last thing you want is to rush during the execution phase, which increases the chances of failure.
- Change approval – Once all the necessary information is collected and submitted to your organization’s change control system, you will have to wait for approval and possibly follow up on any questions the change control review team may have.
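To make the scripts, back-out, checkpoint, and validation items above more concrete, here is a minimal Python sketch of how a change procedure could be structured. It is only an illustration under my own assumptions: the names Step, run_change, apply, checkpoint, and back_out are hypothetical, and in practice each callable would wrap whatever tooling you use to talk to your devices.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One step of a change procedure (hypothetical structure, for illustration)."""
    name: str
    apply: Callable[[], object]      # pushes this part of the change
    checkpoint: Callable[[], bool]   # mid-procedure check; True means "as expected"
    back_out: Callable[[], object]   # reverts this step


def _back_out(done: List[Step]) -> None:
    """Revert the applied steps, most recent first."""
    for step in reversed(done):
        step.back_out()


def run_change(steps: List[Step], validate: Callable[[], bool]) -> bool:
    """Apply steps in order; back out everything if a checkpoint or the
    final validation fails. Returns True only if the change is kept."""
    done: List[Step] = []
    for step in steps:
        step.apply()
        done.append(step)
        if not step.checkpoint():
            print(f"Checkpoint failed after '{step.name}', backing out")
            _back_out(done)
            return False
    if not validate():
        print("Final validation failed, backing out")
        _back_out(done)
        return False
    return True
```

The point of the design is that the back-out path is written during planning, next to the step it reverts, so nobody has to improvise it during the change window. The final validation is kept separate from the per-step checkpoints, as described in the list above.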
Planning is the most important phase in the life cycle of a network configuration change. Many changes go bad because the network engineer documented poorly, or not at all, the steps that have to be taken during the execution phase. On the other hand, proper documentation and complete configuration change scripts will allow the engineer to feel comfortable and prepared for the task.
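As one way to act on the snapshot advice from the list above, the sketch below compares a snapshot taken during planning with one taken on the day of the change, using Python’s standard difflib module. The function name snapshot_drift and the file labels are my own assumptions, and how you collect the snapshot text (saved configurations, routing tables, and so on) is left to your existing tools.

```python
import difflib
from typing import List


def snapshot_drift(planned: str, current: str) -> List[str]:
    """Return unified-diff lines between the planning-time snapshot and the
    change-day snapshot. An empty list means the two are identical."""
    return list(
        difflib.unified_diff(
            planned.splitlines(),
            current.splitlines(),
            fromfile="planning-snapshot",
            tofile="change-day-snapshot",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Made-up sample text, only to show the call.
    planned = "interface Gi0/1\n description uplink\n"
    current = "interface Gi0/1\n description uplink to core\n"
    drift = snapshot_drift(planned, current)
    if drift:
        print("Network state changed since planning, review before proceeding:")
        print("\n".join(drift))
```

If the function returns any lines, the network is no longer in the state the plan was written against, and the scripts should be reviewed again before the change window opens.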
I would like to hear about your approach to network configuration changes, so please feel free to comment.
Gartner RAS Core Research Note G00208328, Ronni J. Colville, George Spafford, 27 October 2010, RA6 05012011