VMware Cloud Foundation (VCF) 9 now supports a variety of storage types as principal storage, such as vSAN ESA/OSA, Fibre Channel, and NFS v3. In this blog article, I’ll demonstrate the necessary steps to deploy the Management Domain of a VCF 9 instance with NFS as principal storage.
I’m using the same design blueprint as with my previously described vSAN setup. The difference is that the ESX hosts are only configured with a small 10 GB hard disk for the OS and no additional NVMe disk, and that we’re going to use VLAN 40 (10.230.40.0/24) as the NFS network.
The NFS server is also located in this network and is defined as nfs1.sddc.lab with IP address 10.230.40.8. It will export one share for the Management Domain called /srv/nfs/vmware-m01 and one share for the Workload Domain called /srv/nfs/vmware-w01, both to the network 10.230.40.0/24.
We will use Jumbo Frames for better performance of the NFS traffic, which requires an MTU size of 9000 bytes along the entire network path, i.e. on the NFS server, the pfSense router, and the ESX hosts.
NFS server preparation
I’ve set up an NFS server on a VM running Ubuntu 22.04. Besides the OS hard disk, this server has some larger disks attached to store the VM files. Using LVM during the initial Ubuntu installation, I’ve added these disks to a volume group called vmware-vg. From this volume group, I’ve created two logical volumes — one called m01-lv for the Management Domain m01, and the other called w01-lv for the Workload Domain w01.
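If you didn’t create the volume group during the Ubuntu installation, you can also create it afterwards. A minimal sketch, assuming the additional disks show up as /dev/sdb and /dev/sdc (these device names are examples — adjust them to your system):
sudo pvcreate /dev/sdb /dev/sdc
sudo vgcreate vmware-vg /dev/sdb /dev/sdc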
Let’s create the two logical volumes, create a filesystem on them, and create their mount points:
sudo lvcreate -L 950G -n m01-lv vmware-vg
sudo lvcreate -L 400G -n w01-lv vmware-vg
sudo mkfs.ext4 /dev/vmware-vg/m01-lv
sudo mkfs.ext4 /dev/vmware-vg/w01-lv
sudo mkdir -p /srv/nfs/vmware-m01
sudo mkdir -p /srv/nfs/vmware-w01
In /etc/fstab, we reference the two logical volumes by their IDs.
/dev/disk/by-id/dm-uuid-LVM-T8efkqBCU5B39wnrCzjVXN0TTkCEqV2bMD4lYYwZf94YmsKYeoTaT9uWXaqszOtC /srv/nfs/vmware-m01 ext4 defaults 0 1
/dev/disk/by-id/dm-uuid-LVM-T8efkqBCU5B39wnrCzjVXN0TTkCEqV2b6wZ0XqoZxMQfyyiizz1xgL96n3pwlUoW /srv/nfs/vmware-w01 ext4 defaults 0 1
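The dm-uuid identifiers above are specific to my volume group. To look up the corresponding IDs on your system, you can, for example, list the by-id symlinks of the LVM devices:
ls -l /dev/disk/by-id/ | grep dm-uuid-LVM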
Now let’s mount them.
sudo mount -a
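A quick check with df -h should now show both volumes mounted under /srv/nfs:
df -h /srv/nfs/vmware-m01 /srv/nfs/vmware-w01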
Next, we’re going to install the NFS server.
sudo apt install -y nfs-kernel-server
sudo systemctl enable --now nfs-server
sudo systemctl start nfs-server
Define the exports in /etc/exports:
/srv/nfs/vmware-m01 10.230.40.0/24(rw,sync,no_subtree_check,no_root_squash)
/srv/nfs/vmware-w01 10.230.40.0/24(rw,sync,no_subtree_check,no_root_squash)
And reload them:
sudo exportfs -ra
We can check the exports using exportfs -v; it should show the following:
/srv/nfs/vmware-m01
10.230.40.0/24(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
/srv/nfs/vmware-w01
10.230.40.0/24(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
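If you want to sanity-check the exports from another Linux machine in the 10.230.40.0/24 network (not from the ESX hosts — see the important note further below), you could query and test-mount them. A small sketch, assuming the client has the nfs-common package installed and /mnt/test is just a temporary mount point:
showmount -e 10.230.40.8
sudo mkdir -p /mnt/test
sudo mount -t nfs -o vers=3 10.230.40.8:/srv/nfs/vmware-m01 /mnt/test
sudo umount /mnt/test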
Now let’s change the MTU of the NFS server from the default of 1500 bytes to 9000. In Ubuntu 22.04, we can do this using netplan in /etc/netplan/50-cloud-init.yaml. We simply have to add the mtu property to our NIC device:
network:
  ethernets:
    ens34:
      mtu: 9000
      addresses:
        - 10.230.40.8/24
      nameservers:
        addresses:
          - 10.230.10.4
        search:
          - sddc.lab
          - vcf.sddc.lab
      routes:
        - to: default
          via: 10.230.40.1
  version: 2
Apply the settings and check them:
sudo netplan apply
ip link sh ens34
The output of the ip link command should show mtu with a value of 9000:
2: ens34: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 00:0c:29:d8:f1:e9 brd ff:ff:ff:ff:ff:ff
altname enp2s2
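Optionally, we can already verify that jumbo frames make it from the NFS server to the router, assuming the pfSense interface in VLAN 40 has also been set to an MTU of 9000 (8972 bytes of ICMP payload plus 28 bytes of headers equal 9000 bytes):
ping -M do -s 8972 -c 3 10.230.40.1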
ESX server preparation
The ESX servers in this lab will be deployed as nested appliances on my physical lab ESX server.
For the four Management Domain ESX servers, I’ll use the following specs:
- 24 vCPUs. VCF Automation requires this large number of CPUs for the initial single-node setup, although we’ll be able to scale the appliance down after deployment.
- Expose hardware assisted virtualization to the guest OS
- 192 GB Memory
- 2x VMXNET3 network adapter
- Connected to my Trunk portgroup on the physical ESX host (VLAN 4095)
- 1x VMware Paravirtual SCSI Controller
- 1x 20 GB Harddisk connected to the SCSI Controller
- VM Boot Options
- Whether or not to enable UEFI secure boot for this VM: Disabled
For the three Workload Domain ESX servers, I’ll use the following specs:
- 16 vCPUs
- Expose hardware assisted virtualization to the guest OS
- 64 GB Memory
- 2x VMXNET3 network adapter
- Connected to my Trunk portgroup on the physical ESX host (VLAN 4095)
- 1x VMware Paravirtual SCSI Controller
- 1x 20 GB Harddisk connected to the SCSI Controller
- VM Boot Options
- Whether or not to enable UEFI secure boot for this VM: Disabled
Now that we have the deployment specification for our nested ESX servers, we must install and configure them.
First, get the ESX 9.0.1 installer ISO from the Broadcom Support Portal.
Now use the following procedure for each of the ESX hosts:
Mount the ESX installer ISO to the ESX server you’re going to install. Follow the installation wizard as usual; just make sure to install the OS to the hard disk on the SCSI controller. Also make sure to provide a sufficiently complex password for the root user.
After we’ve installed ESX, we configure the Management network and enable SSH using the ESX DCUI, e.g. for m01-esx01:
- Network Adapters: vmnic0
- VLAN: 20
- IPv4 Address: 10.230.20.211
- Subnet Mask: 255.255.255.0
- Default Gateway: 10.230.20.1
- Primary DNS Server: 10.230.10.4
- Hostname: m01-esx01.vcf.sddc.lab
- Custom DNS Suffixes: vcf.sddc.lab
Next, we must configure the ESX OS for VCF host commissioning. First, we set up NTP:
esxcli system ntp set -s=10.230.10.4
esxcli system ntp set -e=yes
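To double-check, esxcli system ntp get should list the configured server and show the service as enabled:
esxcli system ntp get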
Set the vSwitch MTU size to 9000 bytes:
esxcli network vswitch standard set -m 9000 -v vSwitch0
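To confirm the new MTU, you can list the vSwitch configuration; the output should show an MTU of 9000 for vSwitch0:
esxcli network vswitch standard list -v vSwitch0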
Regenerate the host SSL certificate so that it reflects the new hostname:
/sbin/generate-certificates
Reboot the ESX host:
reboot
After the ESX host has been rebooted, connect to the ESX host client using a web browser and check if the new SSL certificate has been applied and shows the correct hostname, e.g.:

Important note: We must not configure any VMkernel interface for the NFS network and we must not mount the NFS datastore yet, as it would lead to errors during the VCF installer validation checks!
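If you want to double-check that a host is still in the expected clean state before commissioning, the following commands should show only the Management VMkernel adapter (vmk0) and no mounted NFS datastores — a quick sanity check, not strictly required:
esxcli network ip interface ipv4 get
esxcli storage nfs list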
VCF Installer Deployment
The procedure is the same as described in my blog article “VMware Cloud Foundation 9.0 Deployment Guide”.
VCF Installer Binary Management
The procedure is the same as described in my blog article “VMware Cloud Foundation 9.0 Deployment Guide”.
VCF Installer Deployment Wizard
The procedure is almost the same as described in my blog article “VMware Cloud Foundation 9.0 Deployment Guide”.
As we’re now using NFS as the storage backend, step 7 Storage differs. Here, we select NFS v3 and provide the details of the NFS server mount. Also make sure to enable the setting Configure VMkernel Binding for NFS Datastore. Without binding, if the VMkernel adapter that ESX uses for NFS traffic fails, the network stack redirects the traffic to an alternative route, and as a result the NFS traffic might unintentionally flow through a random VMkernel adapter.
Note that during the validation tasks, the VCF installer workflows will configure the appropriate NFS portgroup and VMkernel adapter on the ESX hosts, and will also mount the NFS datastore accordingly.

Step 9 Network is also different, as we configure the NFS portgroup.

Also verify in step 10 Distributed Switch that the NFS portgroup is defined.

In my lab, I got a warning regarding the NFS datastore size, but this is ok and we can click Deploy to start the VCF installation.

Troubleshooting Tips
Initially, I had some issues during the first step of the VCF installation, where the VCSA is deployed. For me, the setup failed with the following error:
vCSACliInstallLogger - ERROR - Exception message: The free space of datastore 'm01-cl01-ds-nfs01' (0 GB) in host 'm01-esx01.vcf.sddc.lab' is less than the minimum size required (25 GB). Use a different datastore, or increase the datastore size above the required minimum.
This message is quite misleading. It has nothing to do with the datastore itself or with the export settings on the NFS server; it turned out that the problem was caused by a wrong MTU size on the NFS server, where the default of 1500 bytes was still configured. Changing it to 9000 bytes solved the issue. To check whether the packet size is the problem, I pinged the ESX host’s NFS VMkernel adapter address from the NFS server with a packet size larger than 1500 bytes:
ping -M do -s 8972 10.230.40.211
If the MTU size doesn’t fit, you’ll get the following output:
PING 10.230.40.211 (10.230.40.211) 8972(9000) bytes of data.
ping: local error: message too long, mtu=1500
ping: local error: message too long, mtu=1500
ping: local error: message too long, mtu=1500
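Once the VCF installer has created the NFS VMkernel adapter on the ESX hosts, the same check also works in the other direction with vmkping. A sketch, assuming vmk2 is the NFS VMkernel adapter (you can verify the adapter name with esxcli network ip interface ipv4 get):
vmkping -d -s 8972 -I vmk2 10.230.40.8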
That’s all for now 😉