
Web App outage during maintenance window - File Server is down. What should I do?

If you want to host a web application or a REST API in Microsoft Azure, App Service should be on your shortlist. The service comes with a 99.95% availability SLA, but you still need to be careful about how maintenance windows are handled.

PaaS and Maintenance Window
Like any other PaaS offering, App Service has maintenance windows that Microsoft uses to upgrade the software and hardware behind the Azure services. When the availability of your system is important, you should have the application deployed in two different App Service Plans (ideally in different Azure Regions).
Why is this important? Because you want to be protected when things don't go as planned. There are isolated situations when, during a maintenance window, something fails and your web application remains unavailable until you restart it.
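As a rough illustration, below is a minimal ARM template sketch that provisions two App Service Plans in two regions. The plan names, regions, SKU and API version are placeholders, and the sites themselves, the traffic routing (e.g. Traffic Manager or Front Door) and the deployment pipeline are left out.
      {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
          {
            "comments": "Primary plan - placeholder name, region and SKU",
            "type": "Microsoft.Web/serverfarms",
            "apiVersion": "2022-03-01",
            "name": "contoso-plan-weu",
            "location": "West Europe",
            "sku": { "name": "S1", "tier": "Standard" }
          },
          {
            "comments": "Secondary plan in another region, used as a fallback during maintenance issues",
            "type": "Microsoft.Web/serverfarms",
            "apiVersion": "2022-03-01",
            "name": "contoso-plan-neu",
            "location": "North Europe",
            "sku": { "name": "S1", "tier": "Standard" }
          }
        ]
      }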

What happens when the File Server of an App Service goes down?
Let's analyze the following case and see what options we have. The File Server that connects the App Service worker to the Blob storage where the application files are hosted crashes. This might happen when:
  • An update to the File Server during a maintenance window does not complete successfully and the failure is not detected by the health monitoring system
  • The file access latency increases and the File Server is replaced with another one, but something goes wrong during the replacement procedure and the switchover does not complete successfully
In these situations, the end customer is not able to access the application hosted by the App Service. In the monitoring dashboard of the App Service there are no errors; you will just see that there is no traffic. No logs, no data, no errors - nothing.
You might ask why this is happening and why you cannot see anything. It is expected, because the application package was never loaded from the blob storage. There is no application running anymore, no system files and no application files. The application cannot be fully loaded and crashes during the initialization phase.


Possible solutions
There are two actions you can take to mitigate this situation.
1. Local Cache
The first one is to activate the local cache of the application. When the local cache is active, a copy of the web application content is stored on the worker itself. In the case of a storage failure, the content is still available locally.
There are multiple benefits to having the local cache active:
  • Fewer application restarts caused by shared storage changes
  • Lower storage access latency, because content is served from the local copy instead of Azure Storage
  • During a maintenance window, the disruptions caused by an upgrade of the storage layer (e.g. the File Server) are drastically reduced
The default size of the cache is 1000MB. I would recommend starting with this value and increasing it only if necessary, because a larger cache can increase the loading time of the application; reducing the cache size improves the loading time.
The local cache can be activated (see the ARM sketch after this list):
  • Per web application, by adding the WEBSITE_LOCAL_CACHE_OPTION setting with the value 'Always' to the App Settings
  • Per App Service, by adding WEBSITE_LOCAL_CACHE_OPTION to the app settings in the properties section of the ARM template of the App Service.
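As a rough sketch, the ARM fragment below shows where the setting would sit. The site name, plan reference and API version are placeholders; WEBSITE_LOCAL_CACHE_SIZEINMB is the optional companion setting that controls the cache size.
      {
        "type": "Microsoft.Web/sites",
        "apiVersion": "2022-03-01",
        "name": "contoso-webapp",
        "location": "West Europe",
        "properties": {
          "serverFarmId": "[resourceId('Microsoft.Web/serverfarms', 'contoso-plan-weu')]",
          "siteConfig": {
            "appSettings": [
              { "name": "WEBSITE_LOCAL_CACHE_OPTION", "value": "Always" },
              { "name": "WEBSITE_LOCAL_CACHE_SIZEINMB", "value": "1000" }
            ]
          }
        }
      }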

2. Enable stdout logs
To identify why your application is not starting because of an error that occurs during process start, you should make sure that the stdout output is logged. Using Kudu or other similar tools you can then inspect the output.
Make sure that these logs are written to a file on the worker and not pushed to an external endpoint (e.g. Application Insights), because the application might not be able to load the packages required to make an HTTP request.
      <handlers>
        <add name="aspNetCore" path="*" verb="*" modules="AspNetCoreModule" resourceType="Unspecified" />
      </handlers>
      <!-- replace <project name> with the name of your application assembly -->
      <aspNetCore processPath="dotnet" arguments=".\<project name>.dll" stdoutLogEnabled="true" stdoutLogFile="\\?\%home%\LogFiles\stdout" />
As you can see above, you add a new handler in system.webServer that enables stdout logging and redirects the output to a local file that can be accessed using FTP or Kudu. The logs will contain the errors related to files that were not found or could not be loaded.
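If it helps, the stdout files can usually also be browsed without FTP through the Kudu virtual file system API; 'contoso-webapp' below is a placeholder for your app name:
      GET https://contoso-webapp.scm.azurewebsites.net/api/vfs/LogFiles/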

Remarks
You might ask yourself why not define an Azure Function that checks every 5 minutes whether the application is alive and forces a restart of the web application. No, you should not do such a thing, because the situation described above is not common. Even if I was unlucky enough to hit the same issue twice within a 3 month interval, it is an isolated exception.

Conclusion
When you use a cloud service and a specific technology stack (e.g. .NET, Java), you should always check the current best practices and recommendations. The issue related to the failure of the File Server that is part of an App Service during an update window can be avoided if we activate the local cache of the website and ensure that stdout logs are persisted.
