
9 Storage Considerations for VM Administrators

What a VM administrator needs to know to avoid performance problems caused by insufficient storage capacity and plan for environment growth

WHITEPAPER BY ALEX ROSEMBLAT

Table of Contents
Introduction
Insufficient Storage Capacity Causes VM Performance Problems
    How VM Data Flows to Shared Storage: From Host to SAN and Back
    The Hardware and Design Decisions that Determine Storage Capacity
    Storage Space
    Throughput
        Inter-connectivity/Fabric
        Host Bus Adaptor
        Storage Controller
        Spindles
How to Forecast Storage Capacity Requirements
Virtualization will Vastly Increase Storage Volume
Virtual Environments Employ Numerous Host to Datastore Connection Permutations
Virtualization Will Cause Unpredictable Throughput Concentrations
Additional Storage Abstraction Layers are Used with Virtualization
Virtualization Introduces Environment Dynamism and Automation
Conclusion

© 2010 VKernel Corporation. All rights reserved.

Introduction
Many VM performance issues stem from bottlenecks within a data center's storage throughput capacity. These issues originate from inadequate planning for a virtual environment's storage needs and can be avoided with visibility into the storage infrastructure and a planning process that addresses storage resource expansion in line with the application usage growth the environment will face. Even for data centers that employed Storage Area Networks (SANs) for their physical servers before shifting to a virtual infrastructure, virtualization adds intricacy that must not be overlooked and needs to be factored into storage planning decisions, documentation procedures, and issue troubleshooting. This whitepaper presents nine considerations that VM administrators must know about storage to avoid VM performance problems, forecast storage capacity needs, and better understand how virtualization makes data center storage more complex.

Insufficient Storage Capacity Causes VM Performance Problems


Planning for storage in a data center is critical, as environments with insufficient storage capacity will experience VM performance problems. Virtual environments that use a SAN for their storage needs involve a complex resource sharing arrangement: VMs and hosts connect to a separate storage area and share the bandwidth of that connection as well as the actual storage space. Disk I/O and the storage space itself are two vital resources that a VM requires to function. If a VM needs storage space and there is none left, the VM will cease to function. Without sufficient disk I/O capacity, commands will be delayed as they wait their turn to pass through the interconnecting fabric to communicate with the SAN or host. When commands reach the actual disks, if the disks do not have the physical capacity or have not been configured to handle the volume of requests, commands will once again be delayed as they wait their turn to access the disk.

Because capacity bottlenecks will cause VM performance problems, storage implementations must be finely tuned to an organization's requirements. This can become a complicated process in and of itself, as there is an immense number of vendors and configurations to select from, with a wide spectrum of costs at nearly every level of the storage infrastructure. Finding the right mix for a data center's needs at an appropriate price can be challenging. To complicate the process further, many organizations employ a separate IT department to manage the storage infrastructure. That department may not have visibility into the firm's application requirements, usage growth rates, and expansion forecasts, which are vital to determining the amount of storage capacity a data center needs.

How VM Data Flows to Shared Storage: From Host to SAN and Back
Shared storage requires an infrastructure with several different hardware components. Importantly, data will move through the entire infrastructure only at the maximum capacity of the smallest-capacity component, the weakest link, so to speak. That means that if an organization has hypothetically deployed high-capacity spindles (the actual disks being written to), but is using a much lower-capacity fabric for inter-connectivity with the ESX or Hyper-V hosts, the advantage of having the higher-capacity spindles will be negated. Consequently, the entire storage system will only be able to handle the amount of data that can fit through the fabric. Due to the systemic nature of storage, a holistic view of an organization's application performance needs is required to define storage and storage access capacity requirements.

Figure 1 illustrates how data flows from a host to a disk. As a command leaves the ESX or Hyper-V host for the datastore, it travels to the Host Bus Adaptor, which acts as a travel agent for the command, telling it where it needs to go within the SAN and how to get there. After the data passes through the Host Bus Adaptor, it passes through the inter-connectivity layer, or fabric, to arrive at the SAN. The command emerges in the Storage Controller, which points it to the correct spindle. The command reaches the physical spindle, where it executes its instructions and is given a response to deliver back to the ESX or Hyper-V host. It then travels through the same infrastructure levels in reverse, back to the host. A bottleneck or issue at any level in this flow will cause performance issues that will be detected as increased latency.

Figure 1: The Hardware Components in the Storage Data Flow (ESX/Hyper-V Host → Host Bus Adaptor → Inter-connectivity/Fabric → Storage Controller → Spindle, and back)
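To make the weakest-link principle concrete, the short Python sketch below computes the effective end-to-end throughput of a storage path as the minimum of its component capacities. The throughput figures are purely illustrative and not taken from any particular hardware.

```python
# Illustrative sketch of the "weakest link" idea as a calculation.
# All figures below are made-up example values, in MB/s.
component_throughput_mb_s = {
    "host_bus_adaptor": 800,
    "fabric": 400,            # e.g. a lower-capacity inter-connectivity layer
    "storage_controller": 1000,
    "spindles": 900,
}

# Data can only move end to end as fast as the slowest component allows.
bottleneck = min(component_throughput_mb_s, key=component_throughput_mb_s.get)
effective_throughput = component_throughput_mb_s[bottleneck]

print(f"Bottleneck component: {bottleneck}")
print(f"Effective end-to-end throughput: {effective_throughput} MB/s")
```

In this hypothetical configuration, the fabric caps the whole path at 400 MB/s, so the extra capacity paid for in the spindles and controller goes unused.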

The Hardware and Design Decisions that Determine Storage Capacity


As described above, there are many moving parts within the storage infrastructure. As a result, there are many decision points that go into determining the specifications and configurations of the different hardware parts that make up this resource. To determine those specifications, information on current data production, the number of data transactions, and, importantly, the forecasted growth in applications and application use is vital for planning environment growth appropriately. As mentioned above, the weakest link in the storage infrastructure chain will determine the total throughput capacity of the entire storage resource.

Figure 2 shows how the hardware decisions that must be made are interconnected. The hardware supplied at each of these points and its configuration will determine total storage capacity. At its highest level, storage capacity is made up of actual storage space (i.e., the rough number of gigabytes available to store data) and the data throughput that the SAN can take in. Throughput is determined by the volume of data the connection from host to SAN can carry, and also by the capacity of the hardware that processes the data to read and write it to disk, namely the host bus adaptors, storage controllers, and spindles. More detailed descriptions of the data needed to determine a requirement at each level are included below Figure 2.


Figure 2: The Connections in Hardware Decision Points that Determine Storage Capacity. Storage capacity breaks down into storage space and throughput; throughput depends on the inter-connectivity/fabric and the storage hardware (host bus adaptor, storage controller, and spindles), with overhead at the spindle level feeding back into the storage space required.

Storage Space
Storage space refers to the actual amount of disk space needed for a virtual environment to continue running. Importantly, if a VM needs storage space and there is no more available, the VM or an application on the VM will cease functioning.

Virtualized infrastructure storage needs can be much larger than the storage that was necessary when the same applications ran on physical servers, as VMs will likely require more storage space than just their allocations. Each VM may have an associated snapshot for quick maintenance purposes. Additional storage will also be needed to host all templates used to quickly provision VMs, and there may be use cases where multiple VM images of the same VMDK or VHD file must exist. Additionally, because VMs can move around with vMotion, the entire storage resource is always fluid and dynamic, and a buffer must be left so that the environment has enough slack to rebalance itself when necessary.

Notably, decisions on data redundancy and the RAID scheme employed will impact the amount of storage space needed. Depending on the RAID scheme, up to double the amount of physical storage space will be needed for the actual data being stored. To add further complexity, different parts of the SAN may be equipped at different RAID levels based on the criticality and performance needs of the information being stored and accessed. Accordingly, the decisions made at the spindle level will directly impact the amount of storage space that is needed.
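As a hypothetical illustration of how these factors compound, the Python sketch below converts VM allocations, assumed snapshot and template overhead, a rebalancing buffer, and a RAID scheme into an estimate of raw disk required. All ratios and figures are assumptions chosen for the example, not recommendations.

```python
# Hypothetical sketch of how allocated VM storage translates into raw capacity.
vm_allocated_gb = 40 * 100          # e.g. 40 VMs with 100 GB allocated each
snapshot_overhead = 0.25            # assume snapshots add ~25% on top of allocations
template_gb = 500                   # space reserved for provisioning templates
vmotion_buffer = 0.15               # slack so the environment can rebalance itself

usable_gb_needed = (vm_allocated_gb * (1 + snapshot_overhead) + template_gb) * (1 + vmotion_buffer)

# RAID overhead: mirroring (RAID 1/10) roughly doubles raw space; RAID 5 adds one
# parity disk per group (shown here for a 4+1 group).
raid_factor = {"raid10": 2.0, "raid5_4plus1": 5 / 4}

for scheme, factor in raid_factor.items():
    raw_gb = usable_gb_needed * factor
    print(f"{scheme}: ~{raw_gb:,.0f} GB of raw disk for {usable_gb_needed:,.0f} GB usable")
```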

Throughput
Disk throughput is the resource in which capacity bottlenecks most commonly arise. These issues can be difficult to pinpoint and are often first sensed as high latency. Thus, ensuring that the hardware in the storage solution accurately fits the expected data transaction needs is critical.

Inter-connectivity/Fabric
The connection between the host and the datastore largely determines the amount of throughput available to the SAN. Several options for both connection technology and file standards exist, with price varying almost directly with the bandwidth available, and thus the throughput capacity, of such hardware. Although some high-capacity solutions such as Fibre Channel can be very expensive, because storage throughput has such a high impact on VM performance, they may be necessary to maintain a high level of service for the types of applications running in the environment.

Host Bus Adaptor
The host bus adaptor is the piece of hardware that directs a command from the host to the disk and then catches the return message. Generally, this component is specified by the host hardware vendor.

Storage Controller
The storage controller is the piece of hardware that receives commands from the fabric at the SAN and integrates with the spindles. It is important to note that, based on the RAID standard used, additional calculations may be required for complex RAID performance or data redundancy implementations. If such RAID standards are used, additional milliseconds will be added every time a command is sent to disk or returned, which can impact performance, especially if other areas of the storage infrastructure become constricted.

Spindles
The spindle is the actual disk that stores the data being written and accessed by the virtual environment. Tremendous variability exists in the technical capabilities of spindles, with price directly mapping to performance and the amount of storage space. Additionally, a fabric must exist between the storage controller and the spindles that, like the fabric connecting the SAN to the hosts, can vary greatly in throughput capacity and price. As mentioned previously, if the throughput capacity at the spindle level does not match the other components in the storage infrastructure stack (most critically, the fabric connecting the host to the SAN), some of the capacity that a disk has will not be accessible, or vice versa.

The RAID standards to be employed must be decided on at the spindle level, and they directly affect the aggregate performance of all disk reads and writes. RAID standards will also drive the total amount of storage needed, as more intensive RAID standards for data redundancy or performance require higher storage overhead, which translates into greater amounts of storage space. Additionally, any level of deduplication of the data (i.e., redundancies shared by several data objects, such as when an often-used operating system within a VM is housed in one place) will make storage usage more efficient.
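The following back-of-the-envelope Python sketch shows how a RAID write penalty inflates the number of spindles needed to meet an I/O target. The IOPS-per-spindle figure, write mix, and penalty values are rule-of-thumb assumptions used only for illustration.

```python
# Rough sketch of sizing spindles for a throughput target under different RAID schemes.
required_iops = 6000          # measured or forecast peak transactions per second
write_fraction = 0.4          # share of operations that are writes
iops_per_spindle = 180        # e.g. a 15k RPM disk; SSDs would be far higher

# Common rule-of-thumb write penalties: each front-end write costs extra back-end I/Os.
raid_write_penalty = {"raid10": 2, "raid5": 4}

for scheme, penalty in raid_write_penalty.items():
    backend_iops = (required_iops * (1 - write_fraction)
                    + required_iops * write_fraction * penalty)
    spindles = -(-backend_iops // iops_per_spindle)   # ceiling division
    print(f"{scheme}: ~{backend_iops:.0f} back-end IOPS -> at least {int(spindles)} spindles")
```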

How to Forecast Storage Capacity Requirements


Without adequate storage or disk throughput resources, VM performance will be gravely affected and an environment may be unable to grow. Thus, storage and disk I/O needs must be calculated well in advance, and if a separate department manages storage, visibility into application-side growth must be provided for accurate storage capacity forecasting.

To begin forecasting capacity, system administrators must first take a baseline measurement of their current environments. Importantly, the following questions must be answered:

- What is the total volume of data transacted between read and write operations per second for every second of a typical week, and what is the breakdown per application?
- What are the high-water marks in total data volume transacted, and when do they occur?
- What are the average and maximum data transaction sizes for the whole environment and per application?
- What are the average and maximum numbers of actual transactions at any given second for the whole environment and per application?
- What are the peak time periods for data volume and number of transactions?
- Is there a measure of VM sprawl and waste (i.e., storage taken up by abandoned VM images, unused snapshots, unused templates, powered-off VMs that have not been deleted, and zombie VMs)?

After establishing a solid baseline of the number of transactions that occur each second and the data size of all transactions, and having stratified this data by application type, a subjective assessment of application growth should be added by answering these questions:

- Which applications will see increased usage in the near future, and by what growth margin?
- Which new applications or increased instances of existing applications will be provisioned in the near future?

With assumptions on application and usage growth in hand, the amount of storage space and throughput can be forecast. Importantly, a percentage of excess capacity should be added as well; forecasting assumptions should remain on the liberal side because running out of storage space or disk I/O will have grave ramifications for both VM performance and the ability to immediately provision new VMs or increase allocations on existing VMs.

A storage administrator should be given the total amount of storage space projected for the upcoming purchase period, as well as a list of new applications or application expansions with a requested performance level (low, medium, high) for each. Any information on baseline data transaction volume for existing applications will be highly valuable in this process.

A storage administrator will then be able to make appropriate decisions on how to increase storage and disk I/O capacity to serve the expansion of the environment.
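As a simple sketch of this forecasting arithmetic, the Python example below combines a measured per-application baseline with assumed growth rates and an excess-capacity buffer to produce the totals that would be handed to a storage administrator. The application names and all numbers are invented for the example.

```python
# Hypothetical forecasting sketch: measured baseline * assumed growth * safety margin.
baseline = {
    # app: (storage used in GB, peak throughput in MB/s)
    "erp":   (2000, 120),
    "email": (1500, 60),
    "web":   (800,  40),
}
assumed_growth = {"erp": 0.20, "email": 0.10, "web": 0.50}   # subjective estimates
headroom = 0.25                                              # excess-capacity buffer

total_gb, total_mb_s = 0.0, 0.0
for app, (gb, mb_s) in baseline.items():
    factor = (1 + assumed_growth[app]) * (1 + headroom)
    total_gb += gb * factor
    total_mb_s += mb_s * factor

print(f"Request to storage team: ~{total_gb:,.0f} GB and ~{total_mb_s:,.0f} MB/s of throughput")
```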

Virtualization will Vastly Increase Storage Volume


Environments that virtualize will experience significant growth in storage needs. This stems from two sources:

1. Additional File Creation for VM Maintenance
Virtual disks are nothing more than VMDK (VMware) or VHD (Microsoft Hyper-V) files. These are large files, as they include the data for operating systems and other supporting software. Each VM may also have several additional files of similar size created for maintenance purposes, such as snapshots or copies of the VM image. Data-intensive templates may also be needed for each kind of VM instance that is provisioned. Additionally, many environments suffer from VM data waste in abandoned VM images, powered-off VMs that are not needed but have not been deleted, unused snapshots, unused templates, and zombie VMs that are left powered on and unused. These data objects can be difficult to find and clean up, and can take up large amounts of storage.

2. IT Usage Growth
As VMs are fast and easy to create, virtualization often unlocks pent-up demand within the organization for new applications or extended use of existing applications. This translates into more VM images, with their associated overhead in snapshots, templates, and other files, as well as growth in the data produced within the applications that must be stored. Also, if an environment is thin provisioned, this IT usage growth can cause hard-to-detect additional storage growth if a backup fails or other problems occur. In such a scenario, VM rebalancing can take place for many VMs sharing the storage, and log files will be written at a breakneck pace until that growth is noticed, a full backup occurs, or the log files completely fill the available storage.

This growth in data creation should be anticipated, as storage administrators must not only extend their storage capacity but may also have to upgrade to higher-capacity architectures and new hardware to handle the increased storage and storage access demands.
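The thin-provisioning point can be illustrated with a small calculation. In the hypothetical Python sketch below, allocations are heavily overcommitted against physical capacity, and steady in-guest data growth fills the physical disks long before the allocations themselves are exhausted; all figures are made up.

```python
# Simplified sketch of why thin provisioning makes storage growth hard to see coming.
physical_capacity_gb = 10_000
vms = 150
allocated_per_vm_gb = 100          # thin-provisioned allocation per VM
used_per_vm_gb = 35                # what each VM actually consumes today
monthly_growth_gb_per_vm = 4       # steady data growth inside each VM

allocated_total = vms * allocated_per_vm_gb
used_total = vms * used_per_vm_gb

print(f"Allocated: {allocated_total:,} GB against {physical_capacity_gb:,} GB of disk "
      f"({allocated_total / physical_capacity_gb:.1f}x overcommitted)")

months_until_full = (physical_capacity_gb - used_total) / (vms * monthly_growth_gb_per_vm)
print(f"At current growth the physical capacity fills in ~{months_until_full:.1f} months, "
      f"long before allocations are consumed")
```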

Virtual Environments Employ Numerous Host to Datastore Connection Permutations


Physical servers that employ shared storage feature a host-to-datastore architecture that is typically quite simple. There is usually a one-to-one mapping between host and Logical Unit Number (LUN), as shown in Figure 3.


Figure 3: Physical Host to LUN Connection

In the virtualized world, a host can map to an unlimited number of datastores based on the needs of the VMs within the host. Because an issue can occur within any one of these host-to-datastore connections, the number of areas that must be monitored increases drastically. As Figure 4 shows, the sheer number of permutations can add significant complexity if, for example, a latency issue needs to be investigated to find the root cause. Some environments may also choose to replicate the existing connections for redundancy, causing the total number of connections to grow even further.
Figure 4: Multiple Virtualized Host to Datastore Connections (each virtualized host maps to multiple datastores)
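A quick count makes the difference tangible. The Python sketch below compares the one-to-one physical mapping with a hypothetical virtualized environment in which every host connects to every datastore, optionally with redundant paths; the host, datastore, and path counts are illustrative.

```python
# Toy illustration of how host-to-datastore connection permutations grow.
physical_hosts = 10                   # one-to-one host-to-LUN mapping in the physical world
physical_paths = physical_hosts * 1

virtual_hosts = 10
datastores = 8
redundant_paths_per_connection = 2    # e.g. multipathing for redundancy

virtual_paths = virtual_hosts * datastores * redundant_paths_per_connection

print(f"Physical: {physical_paths} connections to monitor")
print(f"Virtual:  {virtual_paths} connections to monitor")
```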

Virtualization Will Cause Unpredictable Throughput Concentrations


As hosts and VMs share storage, they must also share the connections to the SAN. Each VM, however, functions independently and experiences usage peaks and increased loads for its own reasons. Additionally, because VMs can move around and gain access to different datastores, it is impossible to know how much throughput will occur at any given time in any host-to-datastore connection. Hence, the concentration of throughput at any part of the virtual infrastructure is unpredictable, and an environment must have enough capacity to handle not only regular operating needs but also peaks in usage. With this multiple host-to-datastore connection structure, issues can occur quickly with little to no warning and can be hard to track down as the entire virtual environment keeps shifting.
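A toy simulation can show why per-VM averages understate the load a shared connection must absorb. In the Python sketch below, each VM independently bursts above its average a small fraction of the time, yet the observed peak when bursts coincide is well above the sum of the averages. The burst probability and sizes are arbitrary assumptions.

```python
# Small simulation of independent per-VM usage peaks concentrating on a shared connection.
import random

random.seed(1)
vms = 30
avg_mb_s_per_vm = 10
hours = 24 * 7   # one simulated week, hourly samples

peak_concurrent = 0
for _ in range(hours):
    # Each VM independently bursts to 5x its average about 10% of the time.
    total = sum(avg_mb_s_per_vm * (5 if random.random() < 0.10 else 1) for _ in range(vms))
    peak_concurrent = max(peak_concurrent, total)

print(f"Sum of per-VM averages: {vms * avg_mb_s_per_vm} MB/s")
print(f"Observed peak when bursts coincide: {peak_concurrent} MB/s")
```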

Additional Storage Abstraction Layers are Used with Virtualization


While physical hosts connected to the SAN generally link to a single LUN (itself an abstraction layer over several physical spindles), virtualization introduces an additional layer of abstraction: the datastore (a collection of LUNs). Although this new level of abstraction allows for flexibility and additional robustness in storage redundancy, performance tiering, deploying VMs, and allowing the virtual environment to automatically balance itself, it also adds complexity. A VM knows which datastore it is mapped to, but additional investigation is required to determine which LUNs make up the datastore, and then which spindles make up each LUN. This adds time and extra steps to maintenance and issue troubleshooting.
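The extra investigation steps can be pictured as a walk through the mapping layers. The Python sketch below models a hypothetical VM-to-datastore-to-LUN-to-spindle topology and resolves which physical disks ultimately sit behind a VM; the names and topology are invented for illustration.

```python
# Hypothetical sketch of the lookups the added abstraction layer creates:
# VM -> datastore -> LUNs -> spindles.
datastore_to_luns = {
    "datastore01": ["lun_07", "lun_12"],
    "datastore02": ["lun_03"],
}
lun_to_spindles = {
    "lun_03": ["disk_0", "disk_1", "disk_2", "disk_3"],
    "lun_07": ["disk_4", "disk_5", "disk_6"],
    "lun_12": ["disk_7", "disk_8"],
}
vm_to_datastore = {"vm_sql01": "datastore01"}

def spindles_behind_vm(vm: str) -> list[str]:
    """Walk every abstraction layer to find the physical disks behind a VM."""
    datastore = vm_to_datastore[vm]
    spindles = []
    for lun in datastore_to_luns[datastore]:
        spindles.extend(lun_to_spindles[lun])
    return spindles

print("vm_sql01 ultimately lives on:", spindles_behind_vm("vm_sql01"))
```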

Virtualization Introduces Environment Dynamism and Automation


One of the great cost and time savers of virtualization is that aspects of data center maintenance can be automated, and because resources are shared, VMs can easily shift their resource usage and move around when necessary. However, this flexibility also means that additional planning and capacity must be available to enable this functionality. Slack capacity is needed in storage so that DRS and vMotion can deploy and shift VMs appropriately; otherwise errors will occur. Also, because the environment is dynamic and ever-changing, problems can appear and disappear, and troubleshooting or finding the root cause of an issue becomes harder and more tedious. Further, without adequate documentation and the analytic ability to piece together all circumstances present when an issue occurred, finding its cause can become impossible.
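A minimal headroom check along these lines might look like the Python sketch below, which flags datastores that no longer keep an assumed slack percentage free for rebalancing operations; the threshold and datastore figures are examples only.

```python
# Illustrative slack check: flag datastores without enough free space for rebalancing.
required_slack = 0.20   # keep at least 20% of each datastore free (assumed policy)

datastores = {
    # name: (capacity in GB, used in GB)
    "datastore01": (4000, 3500),
    "datastore02": (4000, 2600),
}

for name, (capacity_gb, used_gb) in datastores.items():
    free_fraction = (capacity_gb - used_gb) / capacity_gb
    status = "OK" if free_fraction >= required_slack else "WARNING: too little slack"
    print(f"{name}: {free_fraction:.0%} free, {status}")
```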

Conclusion
Having visibility into storage space and disk throughput capacity is critical to maintaining a high-performing environment. Many capacity bottlenecks occur in disk throughput, cause massive performance problems, and can be hard to track down. Because of the dynamic nature of shared storage, slack must be built into storage resource calculations to allow for the additional capacity needed at peak times or when self-balancing actions such as vMotion occur. As environments are always growing, storage administrators need visibility into application and application usage growth to adequately plan and purchase the hardware necessary to accommodate that growth. The infrastructure that operates shared storage is complex and has many moving parts, and its total bandwidth is only as robust as its weakest link. Without adequate storage planning, virtualized environments run the risk of running out of disk throughput capacity and facing VM performance problems or stunted growth.


