
Microsoft Azure: A 10-point IT maintenance plan

The best way to prevent serious problems is to watch for early warning signs. Whether you are an Azure newbie or an experienced admin, these 10 items should be on your regular rotation.

It’s been eight years since I first began using Microsoft Azure. At the time, the cloud computing service was complicated to use, and its feature set left me with the impression that Azure had not decided what it wanted to be. Now Azure’s functionality is mature and accepted in enterprise IT, from virtual machines (VMs) to building web apps to data management, and companies are using it extensively. It’s time to look at what you need to do to maintain your Azure services.

Azure, both on premises and in the cloud, has become painless in some areas. However, that doesn’t mean you can just turn on an Azure resource and let it do its thing forever. As I show below, you can do a lot using the Azure Service Health module, which is the go-to section for figuring out what’s wrong. But other areas, less easily viewed, are maintenance-heavy.

With that said, let’s look at the top things a developer or operations professional needs to regularly check. If you’re experienced, these items may be an easy checklist that confirms your knowledge. If you’re relatively new to the field and building your Azure expertise, these may serve as guideposts for further explanation.

1. Check the health of your Azure implementation

Azure veterans, read on: This first checklist item is for the yet-to-be-initiated. But we always need to start with the “Is it plugged in?” question.

One basic tool any Azure admin needs to look at constantly is Azure Service Health. Find the Service Health module and pin it to your dashboard right away. It instantly shows critical errors or events, such as VMs suddenly rebooting, so you can dig in deeper. Service Health even shows you the resource’s physical location, so you can feel a bit like a Bond villain watching over your global resources.

2. Create health alerts

If you have only a single VM or one web application running in Azure, the health monitor is enough to find out what’s going on. But chances are you’re running dozens of services, VMs, or SQL databases. Set up health alerts focused on specific resources to make troubleshooting easier—and to prevent problems before they become critical.

Fire up your dashboard, go to Service Health, and click on Health alerts, as in Figure 1.


Figure 1. Health alerts 

Limit alerts to key assets, such as a critical VM you need to keep close tabs on. Add an action group, a team, or a person (such as yourself) who should get notified via text or email when something goes wrong.
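To make the idea of "key assets plus an action group" concrete, here is a minimal Python sketch of the routing logic. It does not call Azure at all: the `HealthEvent` shape, the resource names, and the email addresses are hypothetical placeholders I made up for illustration, and a real alert rule is configured in the portal rather than in code like this.

```python
from dataclasses import dataclass


@dataclass
class HealthEvent:
    resource: str   # hypothetical resource name, e.g. a VM
    severity: str   # "info", "warning", or "critical"
    summary: str


# Hypothetical routing table: only key assets appear here,
# each mapped to the people who should be notified.
WATCHED = {
    "vm-billing-01": ["oncall@example.com"],
    "sql-orders": ["dba-team@example.com", "oncall@example.com"],
}

SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


def route_alerts(events, watched=WATCHED, min_severity="warning"):
    """Return (recipient, event) pairs for events on watched resources."""
    threshold = SEVERITY_RANK[min_severity]
    notifications = []
    for event in events:
        if event.resource not in watched:
            continue  # not a key asset, so stay quiet
        if SEVERITY_RANK.get(event.severity, 0) < threshold:
            continue  # below the severity we care about
        for recipient in watched[event.resource]:
            notifications.append((recipient, event))
    return notifications
```

The point of the sketch is the filter order: resources first, severity second. Anything not in the watch list never reaches a person, which keeps the alert volume low enough that people still read the messages.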

3. Learn how to use Azure Log Analytics

Azure Log Analytics (ALA) was recently revamped. It’s a key piece of Azure that makes troubleshooting easier. This unifying log heaven pulls together various logs beyond Azure’s own services. It also brings in data from Office 365 or Internet Information Services (IIS) and can summarize all those log files. In my experience, it makes the information much more readable.


Azure Stack users can also combine their local infrastructure into ALA. For example, if you’re using Azure Stack from Hewlett Packard Enterprise, you can integrate HPE’s OneView into ALA and get hardware monitoring right in your Azure dashboard, as Figure 2 demonstrates.


Figure 2. Integrating HPE’s OneView into Azure Log Analytics 

To set it up, go to the Azure portal, create a resource for log analytics, and pull the data from all your VMs, servers, etc., into one place. Then you can search it using simple queries. (This cheat sheet should help.) Once you’ve mastered the tool, you’ll blaze through your logs looking for performance problems in your VMs or bottlenecks in your web apps—or in anything, really!
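As a rough illustration of what such a query looks like, the Python helper below assembles a Log Analytics (Kusto) query for computers whose average CPU crossed a threshold. It assumes your workspace collects Windows performance counters into the `Perf` table with the `% Processor Time` counter; the function name and its parameters are my own invention, and the helper only builds the query text for you to paste into the query editor.

```python
def cpu_query(hours=24, threshold=80.0, bucket="1h"):
    """Build a Kusto query listing computers whose average CPU exceeded a threshold.

    Assumes the workspace collects the "% Processor Time" counter
    into the Perf table; adjust the names if yours differ.
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    lines = [
        "Perf",
        f"| where TimeGenerated > ago({int(hours)}h)",
        '| where ObjectName == "Processor" and CounterName == "% Processor Time"',
        f"| summarize AvgCpu = avg(CounterValue) by Computer, bin(TimeGenerated, {bucket})",
        f"| where AvgCpu > {float(threshold)}",
        "| order by AvgCpu desc",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(cpu_query(hours=6, threshold=90))
```

Most queries follow the same pattern: pick a table, filter by time and counter, summarize into time buckets, then filter and sort the summary.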

Remember: Log files are every admin’s best friend.

4. Check for planned maintenance

Most of Azure’s maintenance happens without you even knowing it. For example, Azure freezes the VM temporarily, updates the underlying system, and resumes within just a few seconds.

However, there are scenarios in which an Azure VM goes down—for instance, when it’s being rebooted or moved to another host. That might happen for, say, BIOS/firmware updates or a switch to Windows Server 2016. Microsoft tries to keep those maintenance updates to once per year, but they can’t be avoided forever.

The good news: All you need to do is regularly check the Service Health dashboard item. Look at when planned maintenance is scheduled and prepare for it (or do it at a convenient time). You have 30 days to react and do it yourself.
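If you track several maintenance notices at once, a few lines of date arithmetic tell you how much of that 30-day window is left. This is a sketch only: the 30-day figure comes from the paragraph above, and the function names and the idea of recording a "notified on" date yourself are assumptions of mine.

```python
from datetime import date, timedelta

SELF_SERVICE_DAYS = 30  # window length as stated in the text above


def self_service_deadline(notified_on):
    """Last day on which you can still trigger the maintenance yourself."""
    return notified_on + timedelta(days=SELF_SERVICE_DAYS)


def days_left(notified_on, today):
    """Days remaining in the self-service window (never negative)."""
    remaining = (self_service_deadline(notified_on) - today).days
    return max(remaining, 0)


if __name__ == "__main__":
    print(days_left(date(2024, 1, 1), date.today()))
```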

5. Set up and control backup

Azure already backs up all your data three times; that happens automatically. But those backups protect you only from server-side problems. They don’t protect you, for example, from malware attacks or from the times when you (or your users) accidentally mess things up.

When you create an Azure VM, it immediately offers to enable backup and create a Recovery Services vault: Do it! If you haven’t already enabled this service, go to your existing VM overview and enable backup.

A billion things could go wrong when trying to back up an Azure resource: The connection to the VM couldn’t be established, the VM agent crashed or isn’t responding, another operation is going on…the list goes on. But my advice isn’t just to make sure that the existing backups haven’t failed; it’s also to make sure that new resources are being backed up.
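Both checks (failed backups and resources that were never enrolled) boil down to comparing two lists. The Python sketch below assumes you have exported your VM names and your backup job history into simple records; the field names `vm`, `finished`, and `status`, and the `"Completed"` status value, are hypothetical and will need adapting to whatever your export actually contains.

```python
def backup_gaps(vm_names, backup_jobs):
    """Return (unprotected, failed).

    unprotected: VMs with no backup job at all.
    failed: VMs whose most recent job did not complete.
    Each job is a dict with hypothetical keys "vm", "finished", "status".
    "finished" is an ISO-8601 UTC string, so plain string comparison
    orders jobs chronologically as long as the format is consistent.
    """
    wanted = set(vm_names)
    latest = {}
    for job in backup_jobs:
        seen = latest.get(job["vm"])
        if seen is None or job["finished"] > seen["finished"]:
            latest[job["vm"]] = job
    unprotected = sorted(wanted - set(latest))
    failed = sorted(
        vm for vm in wanted & set(latest)
        if latest[vm]["status"] != "Completed"
    )
    return unprotected, failed
```

Only the latest job per VM counts here, so a VM that failed last week but succeeded last night is not flagged. If you care about streaks of failures, you would need to keep the full history instead.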

6. Watch your costs

Azure’s costs can get out of control and quickly hit your limit, especially when you’re testing new resources and then forget about them (not saying this ever happened to me *cough*). Don’t be me. Don’t ignore the subscription blade: have a look at the cost per resource to see the current billing situation. Do this once a month.
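If you export your usage data, the monthly review can be reduced to a short script. The sketch below assumes each exported row has a `resource` name and a `cost` amount; those field names and the budget figure are placeholders of mine, not the layout of any particular Azure export.

```python
from collections import defaultdict


def cost_per_resource(rows):
    """Sum cost by resource and return (resource, total) pairs, priciest first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["resource"]] += float(row["cost"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def over_budget(rows, monthly_limit):
    """True if the combined cost of all rows exceeds the monthly limit."""
    return sum(float(row["cost"]) for row in rows) > monthly_limit


if __name__ == "__main__":
    usage = [
        {"resource": "vm-eval-03", "cost": "10.0"},
        {"resource": "sql-orders", "cost": "5.0"},
        {"resource": "vm-eval-03", "cost": "2.5"},
    ]
    for name, total in cost_per_resource(usage):
        print(f"{name}: {total:.2f}")
```

Sorting by total puts the forgotten test VM at the top of the list, which is usually where it belongs.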

7. Use the free Azure Maintenance Tool

The Service Health module and the Azure dashboard are nice to look at and easy to use, but getting to the nitty-gritty details can be difficult. For example, how do you get all public IP addresses in one view and see who can access your VMs from where?

That’s why I wholeheartedly recommend the aptly named Azure Maintenance Tool. It is a simple PowerShell script that, for example, lets you get a list of all storage accounts, all subscriptions, and all VNets. Yes, you can do these things in Azure itself, but doing so sometimes requires finding your way through a jungle of menus. The Azure Maintenance Tool (Figure 3) is a clean-cut command-line tool: You type 4 and tell it to give you all VMs and their current state, and it gives you just that. As an admin, I go through all the “Get” lists once a month and see what’s going on in my Azure environment.

Figure 3. Azure Maintenance Tool

8. Decommission resources you don’t need

Again, don’t be me: I use only a handful of Azure VMs and services for my day-to-day work. The rest I play around with. For example, before I deploy a new web application, I create a new VM and evaluate the application’s performance and compatibility. However, I tend to forget I ever used these resources and they eat up my Azure plan.

So make sure to decommission any VMs or SQL databases you don’t need by going through the list and asking yourself: Do I need this anymore? Delete it if you don’t.
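The "do I need this anymore?" pass is easier if a script shortlists the candidates first. This Python sketch assumes you keep, for each resource, a last-used date and an optional `keep` flag for things that must never be deleted; both fields and the 30-day idle threshold are my own assumptions, not something Azure records for you in this form.

```python
from datetime import date


def stale_resources(resources, today, max_idle_days=30):
    """Return (name, idle_days) for resources idle longer than the threshold.

    Each resource is a dict with hypothetical keys:
    "name" (str), "last_used" (date), and optionally "keep" (bool).
    Results are sorted with the longest-idle resource first.
    """
    stale = []
    for resource in resources:
        if resource.get("keep", False):
            continue  # explicitly protected, never suggest deletion
        idle = (today - resource["last_used"]).days
        if idle > max_idle_days:
            stale.append((resource["name"], idle))
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

The script only produces a shortlist. The actual decision to delete should stay with a human, because "idle" and "unneeded" are not the same thing.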

9. Check the performance of your resources

I do this obsessively and so should you: Check on every live VM manually from time to time. Yes, you can set up alerts, but first you need to understand what situations you might be experiencing and why they came to be. Watch those status charts (Figure 4).

Figure 4. Checking Azure resource performance

Especially in the first days or weeks of deploying Azure VMs, you need to get a feel for why spikes like the one above occur before you go into panic mode. As part of your initial maintenance, I recommend checking the performance stats once per week (or even daily) and finding out what caused any spikes. Better yet, pin the chart to your dashboard.
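Once you have a feel for what normal looks like, you can let a script point out the samples worth a second look. The sketch below flags CPU readings that sit well above the average of the series. The two-standard-deviation cutoff is an arbitrary starting point I chose, and the sample values would come from wherever you export your metrics.

```python
from statistics import mean, pstdev


def find_spikes(samples, sigma=2.0):
    """Return (index, value) for samples more than `sigma` standard
    deviations above the mean of the series."""
    if len(samples) < 2:
        return []
    average = mean(samples)
    spread = pstdev(samples)
    if spread == 0:
        return []  # perfectly flat series, nothing stands out
    cutoff = average + sigma * spread
    return [(i, value) for i, value in enumerate(samples) if value > cutoff]


if __name__ == "__main__":
    cpu_percent = [20, 22, 19, 21, 20, 23, 95, 21, 20, 22]
    print(find_spikes(cpu_percent))
```

A statistical cutoff like this only tells you where to look. Working out why the spike happened (a backup job, a deployment, a runaway process) is still the manual part.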

10. Check privileges

Make sure that only the users and groups that should have access to your resources actually have it. If one department of your company no longer needs a specific VM, make sure to delete that department’s accounts or even its groups. I frequently go to the settings page of my VM, go through the identity and access control management list, and hunt for users who should no longer be able to access it. This limits your attack surface drastically.
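The hunt through the access list is a comparison between what is assigned and what is approved. Here is a minimal Python sketch of that comparison, assuming you have exported the current assignments as (principal, role) pairs and maintain your own approved list; the account names and the export format are hypothetical, though "Owner", "Contributor", and "Reader" are real built-in role names.

```python
def access_to_revoke(assignments, approved):
    """Return assignments that are not on the approved list.

    assignments: iterable of (principal, role) pairs currently in effect.
    approved: dict mapping principal -> set of roles they may hold.
    """
    revoke = []
    for principal, role in assignments:
        if role not in approved.get(principal, set()):
            revoke.append((principal, role))
    return sorted(revoke)


if __name__ == "__main__":
    current = [
        ("alice@example.com", "Owner"),
        ("bob@example.com", "Contributor"),
        ("sales-team", "Reader"),
    ]
    allowed = {
        "alice@example.com": {"Owner"},
        "bob@example.com": {"Reader"},
    }
    for principal, role in access_to_revoke(current, allowed):
        print(f"review: {principal} holds {role}")
```

Treating anything missing from the approved list as a finding (rather than as fine by default) is deliberate: a department that left the project shows up automatically instead of lingering unnoticed.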

Bonus: Build your own Azure maintenance applications

Engineers can make their lives easier by using the Azure REST API and building their own maintenance and service apps with the Azure SDK. A good place to get started is the mile-high view on Azure APIs in this Azure Friday episode and the API references found here.
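To give a flavor of what a home-grown maintenance app starts from, the Python sketch below prepares a request that lists all VMs in a subscription through the Azure Resource Manager REST endpoint. It uses only the standard library and does not send anything. The `api-version` default is an assumption you should check against the current API reference, and obtaining the bearer token (for example through Azure Active Directory) is left out entirely.

```python
import urllib.request
from urllib.parse import quote, urlencode

ARM_ENDPOINT = "https://management.azure.com"


def list_vms_request(subscription_id, token, api_version="2023-03-01"):
    """Build (but do not send) a GET request listing all VMs in a subscription.

    api_version is an assumed default; verify it against the API reference.
    token is a bearer token you must obtain separately.
    """
    path = (
        f"/subscriptions/{quote(subscription_id, safe='')}"
        "/providers/Microsoft.Compute/virtualMachines"
    )
    url = f"{ARM_ENDPOINT}{path}?{urlencode({'api-version': api_version})}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

Passing the result to `urllib.request.urlopen` would perform the call, and the JSON response is expected to carry the VMs in a `value` array. For anything beyond a quick script, the Azure SDK handles authentication, paging, and retries for you, which is why I would treat raw REST calls like this as a learning aid rather than a foundation.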

All images courtesy of the author.

This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.