Determining the size of your Provisioning Services Write-Cache

Posted on 2012/10/02

I have heard recommendations for write-cache sizes ranging from 1 GB to the maximum size of the disk. I have been on-site with clients who told me that consultants had recommended a write-cache size, and that they then lost their whole server farm in a matter of two hours when the write-caches filled up. Frequently, I am asked, “What is the correct size for my write-cache?” It is a good question, and as a good consultant, I will tell you “it depends.” In today’s blog, I will share the guidelines I use for determining the write-cache drive size.

What is Provisioning Services, a write-cache, and why does it matter?

Before we begin, a little background on Citrix Provisioning Services (PVS) and how it works. Provisioning Services gives administrators the ability to virtualize a hard disk or workload and then stream it out to multiple devices. The workloads, which can be server or desktop, are ripped from a physical or virtual disk into Microsoft’s virtual hard disk (VHD) format and treated as a golden master image called a vDisk. This master image is then streamed over the network, from a Windows server running the stream service, to multiple target devices that PXE boot.

Device drivers (network and disk) installed on the target devices (physical or virtual) route disk requests over the network to the PVS streaming servers, which in turn provide the requested data. The entire vDisk is not streamed across the network; only the portions the operating system actually requests are. This means that a 30 GB Windows Server 2008 R2 workload booting from a streamed vDisk may transfer only 200 MB across the network.

When a vDisk is in private mode, it can be edited, with all reads and writes going directly to the vDisk via the streaming server. While in private mode, only one target device may access the vDisk. When a vDisk is in standard mode, it is read-only and no changes can be made to it. In standard mode, read operations are redirected to the streaming server and all write operations are redirected to what is referred to as a write-cache file. The device drivers redirect writes to the write-cache file and, when necessary, read newly written data from the write-cache file instead of the streaming server. In standard mode, multiple target devices may access the vDisk at the same time. Finally, whenever a target device is rebooted, its write-cache is deleted and recreated, so the device boots in a pristine state.
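To make the standard-mode behavior concrete, here is a toy Python model of the redirection described above. It is a sketch under simplifying assumptions: the vDisk is treated as a dictionary of numbered blocks, and the class and method names (`StandardModeDisk`, `write`, `read`, `reboot`) are hypothetical, not part of any Citrix API.

```python
# Toy model of a standard-mode vDisk with a per-device write-cache.
# ASSUMPTION: block-level granularity; StandardModeDisk is a hypothetical
# class, not the PVS driver or any Citrix API.


class StandardModeDisk:
    def __init__(self, vdisk_blocks: dict[int, bytes]) -> None:
        self._vdisk = vdisk_blocks                # shared, read-only golden image
        self._write_cache: dict[int, bytes] = {}  # this device's private writes

    def write(self, block: int, data: bytes) -> None:
        # Writes never touch the vDisk; they are redirected to the write-cache.
        self._write_cache[block] = data

    def read(self, block: int) -> bytes:
        # Newly written data is served from the write-cache; everything else
        # is read from the vDisk (i.e., streamed from the server).
        if block in self._write_cache:
            return self._write_cache[block]
        return self._vdisk[block]

    def cache_blocks(self) -> int:
        # Number of blocks currently held in the write-cache.
        return len(self._write_cache)

    def reboot(self) -> None:
        # On reboot the write-cache is discarded: the device boots pristine.
        self._write_cache.clear()


# Two target devices share one golden image but keep separate write-caches.
golden = {0: b"bootloader", 1: b"system-file-v1"}
device_a = StandardModeDisk(golden)
device_b = StandardModeDisk(golden)

device_a.write(1, b"system-file-v2")
print(device_a.read(1))  # b'system-file-v2' (from device A's write-cache)
print(device_b.read(1))  # b'system-file-v1' (device B is unaffected)

device_a.reboot()
print(device_a.read(1))  # b'system-file-v1' (pristine again)
```

Note that the golden image itself is never modified; only each device's overlay changes, which is why the write-cache is the only thing that needs sizing.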

What factors are important in determining the write-cache file size?

When using Citrix Provisioning Services with the vDisk in standard mode, you have a write-cache drive that holds all the writes for the operating system. If the write-cache file fills up unexpectedly, the operating system behaves as if the drive ran out of space without any warning; in other words, it will blue screen.

The optimum size of the write-cache drive depends on several factors:

Frequency of server reboots. The write-cache file is reset upon each server boot, so it only needs to be large enough to hold the data written between reboots.

Amount of free space available on the c: drive. The free space on the vDisk’s c: drive is the space the operating system can use for new files. This is a key value when determining the write-cache drive size.

Amount of data being saved to the c: drive. Data written to the c: drive during operation is automatically stored in the write-cache file. New files are stored in the write-cache file and decrease the amount of free space on c:. Replacements for existing files are also written to the write-cache file but barely affect the amount of free space. For instance, installing a service pack on a standard-mode disk will leave the write-cache file holding all the updated files, with very little change in available space.

Size and location of the pagefile. When a local NTFS-formatted drive is found, Provisioning Services moves the Windows pagefile off of the c: drive to the first available NTFS drive, which is also the location of the write-cache file. Therefore, in the default configuration, the write-cache drive will end up holding both the write-cache file and the pagefile. To learn more about correctly sizing your pagefile, see Nick Rintalan’s blog, “The Pagefile Done Right!”.

Location of the write-cache file. The location of the write-cache file is also a factor in determining its size. The write-cache file can be held on the target device’s local disk, in the target device’s RAM, or on the streaming server.

  • Target device disk: If the write-cache file is held on the target device’s disk, that disk could be local to the physical client, local to the hypervisor, or network or SAN storage attached to the hypervisor.
  • Target device RAM: If the write-cache file is held in the target device’s RAM, response times will be faster, and in some cases the additional RAM is less expensive than SAN disk.
  • Streaming server: If the write-cache file is on the server, no preset size is necessary. However, when using a server-side write-cache, the Provisioning Services streaming server must have enough disk space to hold the write-cache files for all the target devices it manages.

Determining the correct write-cache drive size is mostly a logical exercise once you understand how the write-cache file and the pagefile share the write-cache drive.
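The factors above all land on the same drive. The following sketch, assuming sizes in GB and a hypothetical `WriteCacheDrive` class, shows why new files and replaced files affect the write-cache differently and why the pagefile has to be added on top. The 2 GB service pack and 1 GB of new files are made-up values for illustration.

```python
# Sketch: how writes to c: consume the write-cache drive.
# ASSUMPTIONS: sizes in GB; WriteCacheDrive is hypothetical, and "replacing"
# a file is modeled as adding its full size to the write-cache.
from dataclasses import dataclass


@dataclass
class WriteCacheDrive:
    c_free_gb: float       # free space the OS sees on the vDisk's c: drive
    pagefile_gb: float     # pagefile relocated onto the write-cache drive
    cache_gb: float = 0.0  # current size of the write-cache file

    def write_new_file(self, size_gb: float) -> None:
        # New files grow the write-cache AND reduce free space on c:.
        self.cache_gb += size_gb
        self.c_free_gb -= size_gb

    def replace_file(self, size_gb: float) -> None:
        # Replacing an existing file (e.g. a service pack update) grows the
        # write-cache but leaves c: free space roughly unchanged.
        self.cache_gb += size_gb

    def drive_used_gb(self) -> float:
        # The write-cache drive holds both the write-cache file and the pagefile.
        return self.cache_gb + self.pagefile_gb


drive = WriteCacheDrive(c_free_gb=14, pagefile_gb=8)
drive.replace_file(2.0)    # hypothetical 2 GB of updated system files
drive.write_new_file(1.0)  # hypothetical 1 GB of new files
print(drive.cache_gb, drive.c_free_gb, drive.drive_used_gb())  # 3.0 13.0 11.0
```

The write-cache grew by 3 GB while free space on c: dropped by only 1 GB, which is why free space alone can underestimate the write-cache for update-heavy workloads.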

Guidelines for determining write-cache size

In the old days, we would recommend running with server-side write-cache for the duration of the pilot project and then finding the largest write-cache file on the server before the target devices were rebooted. From there, we would simply double or triple that size and make it the default size for a write-cache file. That approach works most of the time, but it is not very efficient with disk space.
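The old-school approach fits in a few lines. This hypothetical helper, assuming the pilot peaks were recorded in GB, takes the largest observed write-cache file and applies the double-or-triple safety factor; the sample peaks are made up.

```python
# Sketch of the "old school" pilot method: largest observed server-side
# write-cache file, multiplied by a safety factor of 2 or 3.
# old_school_cache_size_gb is a hypothetical helper; sizes are in GB.


def old_school_cache_size_gb(observed_peaks_gb: list[float], factor: float = 2.0) -> float:
    if not observed_peaks_gb:
        raise ValueError("record at least one write-cache size during the pilot")
    if factor < 1:
        raise ValueError("the safety factor should be at least 1")
    return max(observed_peaks_gb) * factor


pilot_peaks = [3.2, 4.5, 2.8]  # hypothetical peaks recorded before reboots
print(old_school_cache_size_gb(pilot_peaks, factor=2))  # 9.0
print(old_school_cache_size_gb(pilot_peaks, factor=3))  # 13.5
```

The multiplier is where the disk-space inefficiency comes from: every target gets the padded size, whether or not it ever approaches the pilot peak.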

Below are the guidelines I use when recommending a size for the client-side write-cache drive.

  1. Write-cache drive = write-cache file + pagefile (if pagefile is stored on the write-cache drive)
  2. The write-cache file size should equal the amount of free space left on the vDisk image. This works in most situations, except those where servers receive large file updates immediately after booting. As a rule, your vDisk should not be updated while running in standard mode.
  3. Always account for the pagefile location and size. If it is configured to reside on the c: or d: drive, include it in all size calculations.
  4. Set the pagefile to a predetermined size to make it easier to account for. When Windows manages the pagefile, it starts at 1x RAM, but the size can vary. Manually setting it to a known value gives you a static number to use in your calculations.
  5. During the pilot, use server-side write caching to get an idea of the maximum size the write-cache file reaches between server reboots. Obviously, the servers should be under full load and subject to the normal production reboot cycle for this to be of value.
  6. If people die when servers blue screen, set the write-cache drive to the size of the vDisk plus the pagefile size.

In most situations, the recommended write-cache drive size will be the free space available on the vDisk image plus the pagefile size. For instance, if you have a 30 GB Windows Server 2008 R2 vDisk with 16 GB used (14 GB free) and are running with an 8 GB pagefile, I would recommend a 22 GB write-cache drive, calculated as 14 GB of free space + 8 GB for the pagefile. If space doesn’t permit, you have a few options, not all of which may be available to you.

  1. If the storage location for the write-cache drive supports thin provisioning, use thin-provisioned disks for the write-cache drives to save space.
  2. Use dynamic VHDs (instead of fixed VHDs), though this approach is generally only recommended for XenDesktop workloads. If you choose this approach, you will probably need to periodically reset the size of the dynamic VHD, which can be done with a PowerShell script.
  3. Reboot the servers more frequently, which in turn reduces the maximum size the write-cache file can reach.
  4. Move the pagefile to a different drive or run without a pagefile.
  5. Use the old school method mentioned earlier to select a write-cache file size that is equal to or larger than the largest write-cache file recorded during the pilot stage. Be aware, though, that this option may still result in blue screens.
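Option 3 follows from the first sizing factor: because the write-cache resets at every boot, its peak size scales with the time between reboots. The sketch below assumes, purely for illustration, that the cache grows at a steady daily rate (real growth is burstier); `peak_cache_gb` and the 1.5 GB/day figure are hypothetical.

```python
# Sketch: peak write-cache size vs. reboot interval.
# ASSUMPTION: the write-cache grows at a constant daily rate; real workloads
# are burstier (patches, logs, profiles). peak_cache_gb is hypothetical.


def peak_cache_gb(daily_write_gb: float, reboot_interval_days: float) -> float:
    if daily_write_gb < 0 or reboot_interval_days <= 0:
        raise ValueError("need a non-negative write rate and a positive interval")
    # The cache is reset at each boot, so it peaks just before the next reboot.
    return daily_write_gb * reboot_interval_days


for days in (7, 3.5, 1):
    print(f"reboot every {days} day(s): peak ~{peak_cache_gb(1.5, days)} GB")
```

Under this linear assumption, halving the reboot interval halves the peak; in practice, measure the real peak during the pilot rather than relying on an average rate.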

Of course, if you require 100% uptime and you have the disk space available, the sure-fire approach is to set the write-cache drive to the size of the vDisk plus the pagefile size (when the pagefile is placed on the write-cache drive). In other words, if the Windows Server 2008 R2 vDisk image is 30 GB and you have an 8 GB pagefile configured, setting the write-cache drive size to 38 GB will protect against any unforeseen blue screens. However, not everyone has that kind of space available, especially when using expensive SAN storage for the write-cache drives.
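The typical and sure-fire recommendations reduce to simple arithmetic. The function below is a hypothetical helper (sizes in GB) that applies guidelines 1, 2, 3, and 6 to the 30 GB example; it does not capture the exceptions noted in guideline 2, such as large updates immediately after boot.

```python
# Sketch of the sizing guidelines; write_cache_drive_gb is hypothetical.
# All sizes are in GB.


def write_cache_drive_gb(vdisk_size_gb: float, vdisk_used_gb: float,
                         pagefile_gb: float, pagefile_on_cache_drive: bool = True,
                         zero_tolerance: bool = False) -> float:
    if not 0 <= vdisk_used_gb <= vdisk_size_gb:
        raise ValueError("used space must be between 0 and the vDisk size")
    if zero_tolerance:
        # Guideline 6 / sure-fire option: size the cache for the entire vDisk.
        cache_gb = vdisk_size_gb
    else:
        # Guideline 2: write-cache file = free space left on the vDisk image.
        cache_gb = vdisk_size_gb - vdisk_used_gb
    # Guidelines 1 and 3: add the pagefile only if it lives on this drive.
    return cache_gb + (pagefile_gb if pagefile_on_cache_drive else 0)


# 30 GB vDisk, 16 GB used, 8 GB pagefile on the write-cache drive.
print(write_cache_drive_gb(30, 16, 8))                       # 22
print(write_cache_drive_gb(30, 16, 8, zero_tolerance=True))  # 38
```

Passing `pagefile_on_cache_drive=False` models option 4 (moving the pagefile elsewhere), which drops the typical recommendation to 14 GB.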

Scalability implications

Just a quick note: in large-scale environments, the best-practice recommendation is to place the write-cache on the client hard disk rather than on the server. Generally speaking, you get about 40-60% more target devices on a single Provisioning Server with client-side write-cache than with server-side write-cache. In addition, failover works better because the client target device has its write-cache available no matter which server is streaming the vDisk.

Client-side write-cache provides the maximum scalability for the Provisioning Services streaming server because the server does not need to perform both reads and writes for all target devices; rather, the server only needs to read the vDisk once, cache the contents, and then stream them out over the network. This saves both CPU and network bandwidth on the streaming server, allowing it to manage more target devices.
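The scalability argument can be put in rough numbers. This sketch, with a hypothetical `streaming_server_load` helper and made-up per-target figures, compares the write-related burden on the streaming server under each mode: with server-side caching it stores every target’s write-cache and receives all of their writes, while with client-side caching both drop to zero on the server. It does not model CPU or read traffic, and it does not reproduce the 40-60% density figure above, which comes from field experience.

```python
# Sketch: write-related burden on the streaming server for N target devices.
# streaming_server_load and all inputs are hypothetical; read streaming,
# CPU, and caching of the vDisk itself are not modeled.
from dataclasses import dataclass


@dataclass
class ServerWriteLoad:
    cache_storage_gb: float  # disk needed on the server for write-caches
    daily_write_gb: float    # write traffic arriving at the server per day


def streaming_server_load(n_targets: int, cache_gb_per_target: float,
                          write_gb_per_target_day: float,
                          server_side: bool) -> ServerWriteLoad:
    if server_side:
        # The server must hold every target's write-cache and absorb its writes.
        return ServerWriteLoad(n_targets * cache_gb_per_target,
                               n_targets * write_gb_per_target_day)
    # Client-side: writes stay on each target's own write-cache drive,
    # so the server only streams reads of the shared vDisk.
    return ServerWriteLoad(0.0, 0.0)


print(streaming_server_load(200, 10.0, 1.5, server_side=True))
print(streaming_server_load(200, 10.0, 1.5, server_side=False))
```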

Hopefully, today’s blog gives you a bit more guidance around correctly sizing your write-cache drive. If you liked this blog and want to be notified of future blogs, please feel free to follow me on Twitter @pwilson98.