
03. Planning a vSAN Cluster

This is part of the VMware vSAN Technical Deep Dives and Tutorials series. Using the following link, you can access and explore more objectives from the VMware vSAN series.
VMware vSAN [v8] – Technical Deep Dives and Tutorials

    When building a vSAN cluster, several factors (number of failures to tolerate, number of VMs, number of VMDKs and sizes of the VMDKs, IOPS, etc.) determine the hardware components needed to build a vSAN cluster. This is especially important when planned outages (maintenance, patching, or hardware replacements) or unplanned outages (hardware failure, network partition, driver/firmware mismatch) occur. We want to make sure that we are properly planning for the worst-case scenarios to make sure that our data is as safe and resilient as possible. This may require additional equipment, above the bare minimum, to be added to the environment.

    Planning for Failures to Tolerate – FTT

    One of the primary drivers of how much equipment is needed is how many failures you want to tolerate. vSAN can support anywhere from zero failures, meaning a failure of any device could result in data loss, all the way up to three failures, meaning three items could fail and data would remain available. As the number of failures to tolerate starts to scale up, the amount of equipment needed also scales up. For example, a RAID-1 policy that supports one failure would require a minimum of three ESXi hosts, but a RAID-1 policy that supports three failures would require a minimum of seven ESXi hosts.
    RAID-1 with Failures to Tolerate (FTT) of 1 is the default storage policy.

    In addition to the hosts, the amount of storage also needs to scale. Using an example of a VMDK consuming 100 GB, RAID-1/FTT 1 would require 200 GB of storage (two-way mirror). A RAID-1/FTT 3 storage policy would require 400 GB of storage (four-way mirror).
    Regarding RAID-5, roughly 33% of additional space is consumed on top of the base size. For example, with a 100 GB VMDK, RAID-5/FTT 1 would consume 133 GB, compared to RAID-1/FTT 1, which would consume 200 GB. Both support a single failure, but the way that failure is supported differs: striping with parity for RAID-5 and mirroring for RAID-1. The two also have different performance characteristics.
    Note: By default, thin-provisioning is used when creating objects in vSAN.
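
    To make the overhead math concrete, the following minimal Python sketch (not a VMware tool; the policy multipliers simply encode the examples above) computes the raw datastore capacity a single VMDK consumes under a given policy. The figures ignore thin provisioning; they describe the space the objects would consume if fully written.

    # Hypothetical helper: raw vSAN capacity consumed by one VMDK under a storage policy.
    # The multipliers encode the examples above: RAID-1 keeps FTT+1 full copies,
    # RAID-5 with FTT=1 adds roughly 33% parity overhead.
    RAID_MULTIPLIERS = {
        ("RAID-1", 1): 2.0,   # two-way mirror
        ("RAID-1", 2): 3.0,   # three-way mirror
        ("RAID-1", 3): 4.0,   # four-way mirror
        ("RAID-5", 1): 1.33,  # striping with parity
    }

    def raw_capacity_gb(vmdk_gb: float, raid: str, ftt: int) -> float:
        """Return the raw datastore capacity (GB) consumed by a single VMDK."""
        try:
            return vmdk_gb * RAID_MULTIPLIERS[(raid, ftt)]
        except KeyError:
            raise ValueError(f"Unsupported policy: {raid} with FTT={ftt}")

    print(raw_capacity_gb(100, "RAID-1", 1))  # 200.0 GB, as in the example above
    print(raw_capacity_gb(100, "RAID-1", 3))  # 400.0 GB
    print(raw_capacity_gb(100, "RAID-5", 1))  # ~133 GB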

    Planning for the Number of Hosts per Storage Policy

    Each storage policy requires a minimum number of hosts to satisfy the data layout. VMware (by Broadcom) recommends at least one more host than the minimum to allow for flexible administration.

    In the chart, we compare how many hosts would be required if we use vSAN OSA vs vSAN ESA. The only change between the two architectures is RAID-5. That is because RAID-5 stripes the data differently between the architectures. That will be discussed more in-depth in the storage policy module.

    A RAID-0 policy would suit application-level clustering, applications that do not need redundancy, or VDI with linked clones. While the minimum number of hosts is one, having two allows one host to be placed in maintenance mode while the data remains available.
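
    The host minimums can also be expressed as simple rules. The sketch below is an illustration, not an official calculator: mirroring needs 2 × FTT + 1 hosts, erasure coding needs one host per stripe component, and the ESA RAID-5 small-stripe value reflects the chart discussed above and is an assumption.

    # Illustrative host-minimum rules (assumptions noted inline).
    def min_hosts_raid1(ftt: int) -> int:
        """RAID-1 mirroring needs 2 * FTT + 1 hosts (data copies plus witnesses)."""
        return 2 * ftt + 1

    def min_hosts_erasure_coding(data: int, parity: int) -> int:
        """Erasure coding needs at least one host per stripe component."""
        return data + parity

    print(min_hosts_raid1(1))              # 3 hosts: the RAID-1/FTT=1 default
    print(min_hosts_raid1(3))              # 7 hosts: RAID-1/FTT=3
    print(min_hosts_erasure_coding(3, 1))  # 4 hosts: vSAN OSA RAID-5 (3+1)
    print(min_hosts_erasure_coding(2, 1))  # 3 hosts: vSAN ESA RAID-5 small stripe (2+1, assumption)
    # Recommendation from above: provision at least one host beyond each minimum.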

    vSAN OSA: HW Requirements

    • Certified compute node
      vSAN OSA supports two different ways to do that. Method one is purchasing a ReadyNode server from a vSAN-supported vendor. Method two is building your own using components that are certified for vSAN OSA. Either method is fine as long as every component is compatible with vSAN OSA.
    • 32 GB RAM
      Inside the host, we want to make sure we have at least 32 GB of RAM. This is the bare minimum to run vSAN OSA. We would certainly recommend more.
    • Dedicated 1 Gbps NIC for Hybrid configurations
      For a hybrid disk configuration, we want to have a dedicated 1 Gbps NIC, at a minimum. A hybrid configuration with heavy I/O can saturate a 1 Gbps NIC.
    • Dedicated or Shared 10 Gbps NIC for All-Flash configurations
      For an all-flash configuration, vSAN OSA supports a dedicated or shared 10 Gbps NIC. If a shared NIC is used, we want to use a vDS with Network I/O Control (NIOC) to make sure the appropriate amount of bandwidth is allocated to vSAN traffic.

    vSAN ESA: HW Requirements

    vSAN ESA only supports servers that are ReadyNode compliant and has the following minimum requirements (a quick spec-check sketch follows the list):

    • vSAN ReadyNode Only
    • Minimum of 16 CPUs
    • Minimum of 128 GB of RAM
    • Minimum 10 Gbps NIC for vSAN traffic
    • Minimum 1 Gbps NIC for VM and management traffic
    • Minimum 2 NVMe-based SSDs per host
    • SAS and SATA devices are not supported
    • NVMe SSDs must be at least 1.6 TB in size
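
    A quick way to sanity-check a candidate host against these minimums is a small script like the sketch below. The field names and the example host are hypothetical; the thresholds come straight from the list above.

    # Hypothetical host-spec check against the vSAN ESA minimums listed above.
    ESA_MINIMUMS = {
        "cpu_cores": 16,
        "ram_gb": 128,
        "vsan_nic_gbps": 10,
        "mgmt_nic_gbps": 1,
        "nvme_ssd_count": 2,
        "nvme_ssd_min_tb": 1.6,
    }

    def check_esa_host(spec: dict) -> list:
        """Return the reasons a host spec misses the ESA minimums (empty list = OK)."""
        problems = []
        for key in ("cpu_cores", "ram_gb", "vsan_nic_gbps", "mgmt_nic_gbps", "nvme_ssd_count"):
            if spec.get(key, 0) < ESA_MINIMUMS[key]:
                problems.append(f"{key} below minimum ({spec.get(key, 0)} < {ESA_MINIMUMS[key]})")
        if any(size < ESA_MINIMUMS["nvme_ssd_min_tb"] for size in spec.get("nvme_ssd_sizes_tb", [])):
            problems.append("one or more NVMe SSDs are smaller than 1.6 TB")
        if spec.get("has_sas_or_sata_devices", False):
            problems.append("SAS and SATA devices are not supported")
        return problems

    host = {"cpu_cores": 32, "ram_gb": 256, "vsan_nic_gbps": 25, "mgmt_nic_gbps": 1,
            "nvme_ssd_count": 4, "nvme_ssd_sizes_tb": [3.2, 3.2, 3.2, 3.2]}
    print(check_esa_host(host) or "Meets the listed ESA minimums")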

    vSAN Hardware Quick Reference Guide 

    vSAN ESA ReadyNode Hardware Guidance

    vSAN ESA: Storage Capacity Devices

    vSAN ESA supports two different types of capacity drives: low-endurance and high-endurance. Reads do not wear out flash cells, but writes do. A low-endurance drive is often called read-intensive (RI), whereas a high-endurance drive is often called mixed-use (MU). The names are a bit misleading: they imply that a mostly read-heavy environment should use the low-endurance device and that mixed-use devices are only for environments that are both read- and write-intensive, when the real question is how much write endurance the workload demands.

    • Devices are measured in Drive Writes Per Day (DWPD); the higher the DWPD, the higher the endurance of the drive (a conversion sketch follows this list)
    • Read-Intensive (RI) devices offer larger capacities, are less expensive per TB, and do not require a high DWPD rating
    • Mixed-Use (MU) devices offer lower capacities, are more expensive per TB, and have higher endurance (3 DWPD)
    • Workload requirements will determine capacity and endurance requirements
    • MU and RI devices cannot be mixed in a single host or within a cluster
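
    DWPD translates directly into the total volume of writes a drive is rated to absorb over its warranty period. The short sketch below shows the standard conversion; the drive capacities and the five-year warranty are illustrative assumptions.

    # Convert a DWPD rating into total terabytes written (TBW) over the warranty:
    # TBW = DWPD x capacity (TB) x 365 x warranty years.
    def tbw(dwpd: float, capacity_tb: float, warranty_years: int = 5) -> float:
        """Total terabytes the drive is rated to absorb over its warranty period."""
        return dwpd * capacity_tb * 365 * warranty_years

    # Example: a 3 DWPD mixed-use 1.6 TB drive vs. a 1 DWPD read-intensive 3.84 TB drive
    print(tbw(3, 1.6))    # 8760.0 TBW
    print(tbw(1, 3.84))   # 7008.0 TBW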

    Storage Capacity Sizing Guidelines

    Now that we know all that goes into building our vSAN cluster, it’s important to take into consideration some storage capacity sizing guidelines.

    • Storage space required for VMs and anticipated growth
      How much storage space do the VMs we are running today need, and how much will we need in the future? What is our anticipated growth over the lifespan of this cluster? Will this cluster be used for 1 year, 4 years, 10 years, or more?
    • Failure tolerance
      We then determine how many failures we want to tolerate in the environment, which sets the protection overhead applied to every object.
    • vSAN operational overhead
      What is our operational overhead? We want some extra space so that a host can be placed in maintenance mode to apply patches or make configuration changes. If that host goes away, do we have enough storage in the environment to move its data to other hosts? The same applies to an outage: if we need to rebuild data, do we have enough space available to rebuild whatever is missing? A rough sizing sketch follows this list.
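
    Pulling the three guidelines together, the following sketch estimates the raw capacity a cluster would need. The growth rate, lifespan, and operational-slack percentage are placeholder assumptions to be replaced with your own figures.

    # Rough vSAN capacity sizing sketch combining the three guidelines above.
    # All percentages here are illustrative assumptions, not published figures.
    def required_raw_capacity_tb(
        vm_used_tb: float,                 # space the VMs consume today
        annual_growth: float,              # e.g. 0.20 for 20% growth per year
        years: int,                        # planned lifespan of the cluster
        protection_multiplier: float,      # e.g. 2.0 for RAID-1/FTT=1, 1.33 for RAID-5/FTT=1
        operational_slack: float = 0.30,   # headroom for maintenance and rebuilds (assumption)
    ) -> float:
        future_used = vm_used_tb * (1 + annual_growth) ** years
        protected = future_used * protection_multiplier
        return protected * (1 + operational_slack)

    # Example: 20 TB of VMs today, 20% yearly growth over 4 years, RAID-1/FTT=1
    print(round(required_raw_capacity_tb(20, 0.20, 4, 2.0), 1))  # about 107.8 TB raw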

    Planning Capacity for VMs

    When planning the storage capacity of the vSAN datastore, consider the space required for VM objects such as the VM home namespace, the VMDKs, swap objects, and snapshot delta and snapshot memory objects.

    Designing vSAN Network

    vSAN requires a high-speed, low-latency network for internode communication, preferably less than 1 ms round trip time (RTT).
    If using a vSphere Distributed Switch, the hosts enabled for vSAN attach to a vSAN VMkernel port group on that switch; a vSAN license includes the right to use distributed switches. If using standard switches, each host has its own standard switch configuration for the vSAN network.
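
    As a quick latency sanity check between two hosts' vSAN VMkernel interfaces, you would typically run vmkping from the ESXi shell. The hypothetical Python sketch below does the equivalent from a machine that can reach the vSAN network by parsing the standard ping utility's summary line; the peer address is an example.

    # Hypothetical RTT check against the < 1 ms guidance above.
    # On an ESXi host the usual tool is: vmkping -I vmkX <peer-vsan-ip>
    import re
    import subprocess

    def avg_rtt_ms(peer_ip: str, count: int = 5) -> float:
        """Average round-trip time in milliseconds, parsed from Linux ping output."""
        out = subprocess.run(["ping", "-c", str(count), peer_ip],
                             capture_output=True, text=True, check=True).stdout
        # Summary line looks like: "rtt min/avg/max/mdev = 0.113/0.221/0.301/0.040 ms"
        return float(re.search(r"= [\d.]+/([\d.]+)/", out).group(1))

    rtt = avg_rtt_ms("192.168.10.12")  # example vSAN VMkernel peer address
    print(f"average RTT {rtt} ms -> {'OK' if rtt < 1.0 else 'too high for vSAN'}")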

    Note: It is recommended not to share the vSAN VMkernel adapter with other traffic types, like vMotion, management, fault tolerance, etc.

    Consider the following networking features that vSAN supports to provide availability and performance:

    • Distributed vSwitch
    • NIC teaming and failover
    • Route based on originating virtual port (Active/Passive)
    • Route based on IP hash using static EtherChannel for vSS or LACP for vDS (Active/Active)
    • Route based on physical network adapter load (Active/Active)
    • Network I/O Control
    • Jumbo frames
    • Remote Direct Memory Access (RDMA)

    References:
    VCF and vSAN ReadyNodes
    vSAN I/O Controller
    VMware (by Broadcom) Docs
