
Web App outage during maintenance window - File Server is down. What should I do?

If you want to host a web application or a REST API in Microsoft Azure, App Service should be on your shortlist. The service has an SLA of 99.95% availability, but we need to be careful about how we handle maintenance windows.

PaaS and Maintenance Window
As with any other PaaS service, Microsoft uses maintenance windows to upgrade the software and the hardware behind Azure services. When availability matters for your system, you need to deploy the application in two different App Service Plans (ideally in different Azure regions).
You might ask yourself why this is important. It protects you in situations when things don't go as planned. For example, in isolated cases a maintenance window does not go as planned and your web application stays unavailable until you restart it.

What happens when the App Service File Server goes down?
Let's analyze the following case and see what options we have. The File Server that connects the App Service worker to the Blob storage where the files are hosted crashes. This might happen when:
  • An update to the File Server during a maintenance window fails and the crash is not detected by the health monitoring system
  • File access latency increases and the File Server is replaced with another one, but something goes wrong during the replacement and the switch does not complete successfully
In these situations, end customers will not be able to access the application hosted by App Service. The App Service monitoring dashboard shows no errors; you will only see that there is no traffic. No logs, no data, no errors: nothing.
You might ask why this happens and why you cannot see anything. It is expected, because the application package could not be loaded from Blob storage. No application is running anymore, and neither the system files nor your application files are available. The application cannot load fully and crashes during the initialization phase.


Possible solutions
There are two actions you can take to mitigate this situation.
1. Local Cache
The first one is to activate the local cache of the application. When the local cache is active, a copy of the web application content is stored on the worker itself. If the storage fails, the content is still available locally.
Having the local cache active brings multiple benefits, such as:
  • Fewer application restarts caused by shared storage changes
  • Lower latency for file access, because the content is no longer read from Azure Storage
  • During a maintenance window, the disruptions caused by an upgrade of the storage layer (e.g. File Server) are reduced drastically 
The default size of the cache is 1000 MB. I recommend starting with this value and increasing it only if necessary. A larger cache can increase the loading time of the application; likewise, you could reduce the cache size to improve the loading time.
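To decide whether the default is enough, a rough first step is to measure how much content you deploy. The Python sketch below is a hypothetical helper (folder_size_mb and fits_in_local_cache are not Azure APIs): it assumes you have the publish output in a local folder and compares its size with the 1000 MB default, or with whatever value you set through the WEBSITE_LOCAL_CACHE_SIZEINMB App Setting. The local cache may hold more than your publish output, so treat the result as an estimate only.

```python
from pathlib import Path

# Default local cache size mentioned in the post; override it if you set
# WEBSITE_LOCAL_CACHE_SIZEINMB to a different value.
DEFAULT_CACHE_MB = 1000


def folder_size_mb(folder: Path) -> float:
    """Sum the size of every file under `folder`, in MB (1 MB = 1024 * 1024 bytes here)."""
    total_bytes = sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())
    return total_bytes / (1024 * 1024)


def fits_in_local_cache(folder: Path, cache_mb: int = DEFAULT_CACHE_MB) -> bool:
    """Hypothetical check: True when the content size does not exceed the cache size."""
    return folder_size_mb(folder) <= cache_mb
```

For example, fits_in_local_cache(Path("publish")) returns False when the publish output is larger than 1000 MB, which is the signal to consider a bigger cache.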
The local cache can be activated:
  • Per web application, by adding the WEBSITE_LOCAL_CACHE_OPTION setting with the value 'Always' to App Settings
  • Per App Service, by adding WEBSITE_LOCAL_CACHE_OPTION to the properties list in the App Service's ARM template
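For the ARM template route, the settings end up as appSettings entries in the siteConfig of the Microsoft.Web/sites resource. The Python sketch below only generates that fragment as JSON; the site name, the apiVersion and the helper name local_cache_site_resource are placeholders, and a deployable resource also needs properties omitted here, such as serverFarmId. WEBSITE_LOCAL_CACHE_SIZEINMB is optional and is shown with the 1000 MB default discussed above.

```python
import json


def local_cache_site_resource(site_name: str, cache_size_mb: int = 1000) -> dict:
    """Hypothetical helper: a Microsoft.Web/sites fragment that enables the local cache."""
    return {
        "type": "Microsoft.Web/sites",
        "apiVersion": "2018-02-01",  # placeholder: keep the version your template already uses
        "name": site_name,
        "location": "[resourceGroup().location]",
        "properties": {
            # serverFarmId and other required properties are omitted in this sketch
            "siteConfig": {
                "appSettings": [
                    # 'Always' turns the local cache on for this web app
                    {"name": "WEBSITE_LOCAL_CACHE_OPTION", "value": "Always"},
                    # Optional: cache size in MB, stored as a string like any App Setting
                    {"name": "WEBSITE_LOCAL_CACHE_SIZEINMB", "value": str(cache_size_mb)},
                ]
            }
        },
    }


if __name__ == "__main__":
    # Print the fragment so it can be pasted into the "resources" array of a template
    print(json.dumps(local_cache_site_resource("my-web-app"), indent=2))
```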

2. Enable stdout logs
To identify why your application does not start when an error occurs during process startup, ensure that stdout logging is enabled. You can then view the output using Kudu or a similar tool.
Make sure these logs are written as files on the worker and not pushed to an endpoint (e.g. Application Insights), because the application might not be able to load the packages required to make an HTTP request.
      <system.webServer>
        <handlers>
          <add name="aspNetCore" path="*" verb="*" modules="AspNetCoreModule" resourceType="Unspecified" />
        </handlers>
        <!-- Replace YourProject.dll with the assembly name of your application -->
        <aspNetCore processPath="dotnet" arguments=".\YourProject.dll" stdoutLogEnabled="true" stdoutLogFile="\\?\%home%\LogFiles\stdout" />
      </system.webServer>
As you can see above, you add the aspNetCore handler under system.webServer, which routes every request (path="*") to the ASP.NET Core Module, and set stdoutLogEnabled="true" so that the application's stdout is redirected to a local file that can be accessed using FTP or Kudu. The logs will contain errors such as missing files or assembly loading issues.
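Once the LogFiles folder has been downloaded through FTP or Kudu, the newest stdout file is usually the one written during the failed start. The Python sketch below is a hypothetical helper, not part of Kudu or ASP.NET Core: it assumes the logs are in a local folder, that their names start with the stdout prefix set in stdoutLogFile (the module appends a timestamp and process ID to it), and it uses an illustrative keyword list to pick out error lines.

```python
from pathlib import Path
from typing import List, Optional

# Illustrative keywords only; adjust them to the errors you expect to see.
ERROR_KEYWORDS = ("Exception", "FileNotFound", "Could not load")


def newest_stdout_log(log_dir: Path) -> Optional[Path]:
    """Return the most recently modified stdout*.log file, or None if there is none."""
    logs = [p for p in log_dir.glob("stdout*.log") if p.is_file()]
    return max(logs, key=lambda p: p.stat().st_mtime) if logs else None


def error_lines(log_file: Path) -> List[str]:
    """Return the lines of `log_file` that contain one of ERROR_KEYWORDS."""
    text = log_file.read_text(encoding="utf-8", errors="replace")
    return [line for line in text.splitlines() if any(k in line for k in ERROR_KEYWORDS)]
```

For example, after checking that newest_stdout_log(Path("LogFiles")) did not return None, pass its result to error_lines to list the suspicious lines from the last start attempt.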

Remarks
You might ask yourself why not define an Azure Function that checks every 5 minutes whether the application is alive and forces a restart of the web application. NO, you should not do such a thing, because the situation described above is not common. Even though I was unlucky enough to hit the same issue twice within 3 months, it is an isolated exception.

Conclusion
When you use a cloud service and a specific technology stack (e.g. .NET, Java), you should always check the best practices and recommendations. The issue caused by the failure of the App Service File Server during an update window could be avoided, or at least diagnosed quickly, if we activate the local cache of the website and ensure that stdout logs are persisted.
