App Service Warm-Up Demystified

Problem

Initialization is a core concern of any app. It can be complex and take a long time to complete, creating headaches for users if not properly managed. Synchronous (inline) initialization is typically experienced by users as a cold start, a long first request to the app, while asynchronous initialization surfaces errors or nondeterministic behavior during startup. In a scale-out cloud environment like App Service, initialization changes from a one-time event into a lifecycle that must be managed continuously, since

  1. Apps run on multiple load-balanced workers with independent initialization
  2. Apps can be rescheduled to new (cold) workers at any time
  3. Scaling operations can add cold workers at any time

Some orchestration is needed to ensure a smooth experience for apps with long startup. Ideally, this orchestration is seamless and requires no more from developers than would be needed for a single-server app.

Solution

In 2009, Microsoft released the IIS AppInit module, a way of declaring via configuration when an application has completed its initialization and is ready to serve traffic. AppInit was built to gracefully recycle worker processes within a single IIS server. We decided to scale it out for the cloud, coordinating warm-up across multiple instances. The overall App Service integration with IIS AppInit, known internally as AppInit Empathy, is designed to understand an app's initialization behavior and route traffic with awareness of warm and cold instances.

IIS AppInit defines a boolean server variable, APP_WARMING_UP, that it uses to store the state of the warm-up sequence. We added a probe on the server that returns the value of APP_WARMING_UP in response to a ping. When the App Service scheduler adds a new worker to the load balancer rotation, it simultaneously begins sending pings to the worker to

  1. Tell the worker to initiate the AppInit sequence
  2. Continually monitor the APP_WARMING_UP state
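
Conceptually, the scheduler's side of this is a polling loop. Below is a minimal Python sketch of the idea; the probe URL and the response header carrying APP_WARMING_UP are illustrative assumptions, not the actual App Service implementation.

import time
import urllib.request

def wait_until_warm(probe_url, interval=5, timeout=600):
    """Poll a worker's warm-up probe until it reports warm, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # The first ping also kicks off the AppInit sequence on the worker.
        with urllib.request.urlopen(probe_url) as resp:
            # Hypothetical: the probe echoes APP_WARMING_UP in a header.
            if resp.headers.get("App-Warming-Up") != "1":
                return True   # worker is warm; safe to route traffic to it
        time.sleep(interval)
    return False              # still cold after the timeout; routing falls back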

The data from this monitoring is collected and cached on the load balancer, which uses it to route traffic to warm instances.

On the load balancer, the routing algorithm is

  1. If one or more instances are warm, send traffic only to the warm instances
  2. If no instances are warm, send traffic to all instances (default behavior)
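
As a sketch in Python, assuming the load balancer keeps a cached warm flag per instance:

def pick_targets(instances):
    """Apply the two-rule routing decision described above."""
    warm = [i for i in instances if i["warm"]]
    # Rule 1: at least one warm instance -> route only to warm instances.
    # Rule 2: none warm -> fall back to all instances (default behavior).
    return warm if warm else instances

# One warm and one cold instance: only the warm one receives traffic.
print(pick_targets([{"id": "a", "warm": True}, {"id": "b", "warm": False}]))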

This handles both the scale-out and VM rescheduling scenarios in a general way. During scale-out, the existing warm instances handle traffic while the new instances start up; in the case of VM rescheduling, the underlying platform upgrades are guaranteed not to shut down all warm VMs at the same time thanks to Upgrade Domains. The end result is a seamless experience for apps with long startup.

Tutorial

A more hands-on tutorial for setting up AppInit can be found in How to warm up Azure Web App during deployment slots swap. In short,

  1. Create a route in your app which waits until initialization is complete before sending a response. How this is done depends on your language and framework (a sketch follows the config below).
  2. Add a section to your web.config that tells IIS to ping this route (initializationPage) for status.
<system.webServer>
  <applicationInitialization remapManagedRequestsTo="/warmup.html">
    <add initializationPage="/wait-for-init.php" hostName="appinit-site.azurewebsites.net"/>
  </applicationInitialization>
</system.webServer>
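
For step 1, here is a minimal sketch of a blocking warm-up route in Python with Flask. The config above points at a PHP path; this is the same idea in another stack, and the slow_init work with its two-minute delay is a stand-in for whatever your app actually does at startup.

import threading
import time

from flask import Flask

app = Flask(__name__)
init_done = threading.Event()

def slow_init():
    time.sleep(120)   # stand-in for cache priming, migrations, JIT, etc.
    init_done.set()   # initialization finished

threading.Thread(target=slow_init, daemon=True).start()

@app.route("/wait-for-init")
def wait_for_init():
    # Hold the AppInit ping open until initialization completes; the
    # eventual response is what marks the route as initialized.
    init_done.wait()
    return "warm"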

You can have more than one route if desired. The remapManagedRequestsTo attribute is optional and specifies a warmup page to redirect users to during warmup. The purpose of AppInit Empathy is to make sure you never see that page, though I recommend it as a way to easily see what's going on in unsupported corner cases.

Failing vs Hanging Requests

A slightly confusing quirk of AppInit is that it does not look at response status codes and will treat even a 500 as successful initialization of a route. Therefore, to block completion of initialization and keep the instance from being marked warm, the proper behavior is to hang the ping from IIS until initialization is complete.
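
To make the contrast concrete, a sketch reusing the hypothetical init_done event from the tutorial sketch above:

@app.route("/wait-for-init-wrong")
def wait_for_init_wrong():
    # WRONG: AppInit ignores status codes, so this 500 still counts as a
    # completed warm-up and the instance is marked warm prematurely.
    if not init_done.is_set():
        return "not ready", 500
    return "warm"

@app.route("/wait-for-init-right")
def wait_for_init_right():
    # RIGHT: hang the ping until initialization is actually complete.
    init_done.wait()
    return "warm"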

Known Limitations

There are some known cases when AppInit Empathy acts aloof. One is worker size change (scaling up or down). When a size change is requested, all workers of the previous size are simultaneously torn down and replaced with workers of the new size. Since all the new instances are cold, the routing algorithm falls back to routing to all instances.

Technically, it is possible for AppInit Empathy to orchestrate the size change gracefully by waiting for the new set of workers to be warm before tearing down the old ones. Please vote here to get it done if this scenario is important to you.

Another unsupported case is application restart. The reason for this is the same: when an app is restarted, all the instances are restarted at the same time. Advanced Application Restart, however, will perform a rolling restart and correctly partition traffic to warm instances during the operation.

Lastly, AppInit Empathy does not prevent the normal idle-out of worker processes on App Service during periods of zero traffic. If all instances idle out, the next request must go to a cold instance of the app. This can be addressed with Always On, which prevents apps from idling out by periodically pinging all instances.

Timeout

Each app instance is guaranteed 10 minutes to start up gracefully before it begins receiving traffic, 10 minutes being the amount of warning the underlying Azure fabric provides before tearing down a VM. In practice, these VM recycles are few and far between, and AppInit Empathy will normally give the application up to 30 minutes to complete initialization.

Personally, I recommend staying on the safe side of 10 minutes. If an unlikely combination of factors can happen, somewhere out there, at scale, it will.

Appendix

Deployment Slot Integration

When swapping Site Slots, we ping the workers to probe APP_WARMING_UP and ensure the swap does not complete until all instances are warm. See How to warm up Azure Web App during deployment slots swap.

Local Cache Integration

When using the Local Cache feature, construction of the instance file cache is treated exactly the same as an asynchronous app initialization. Therefore, we preferentially route traffic to workers running from cache if any are ready. If an app uses both Local Cache and AppInit, we wait for both before marking the instance as warm.
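
In other words, the instance's warm flag becomes the conjunction of the two conditions. As a sketch (the flag names are hypothetical):

def is_warm(instance):
    # Warm only once both the AppInit sequence and the Local Cache
    # construction have finished.
    return instance["app_init_complete"] and instance["local_cache_ready"]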

Application Crashes

If a worker process dies or crashes unexpectedly (for example, due to a fault in the application code), a notification will be sent to the load balancer within a few seconds to recalculate the routing plan, mitigating the otherwise outsized impact of process crashes in apps with long initialization.
