Debug School

Suyash Sambhare


vSAN HCL Health Issues

vSAN HCL Health – Controller Release Support check

This health check verifies whether the vSAN controller is certified by VMware for the currently installed ESXi version.

Purpose of the vSAN HCL Health – Controller Driver

This health check assesses whether the storage I/O controller driver is compatible with the controller installed on the ESXi host and whether it aligns with the current release of ESXi.

Assuming that the Controller on vSAN HCL (Hardware Compatibility List) and Controller Release Support health checks have already passed, this additional check verifies that the driver version in use is included in the list of supported drivers.

This verification is crucial because drivers play a pivotal role in maintaining the stability and integrity of vSAN. Vendors frequently release driver updates to address critical bugs. In some cases, VMware may revoke the certification status of an older driver and only endorse the new version. Consequently, this health check could transition from a healthy green (OK) state to displaying warnings after refreshing the VCG (VMware Compatibility Guide) database.

Furthermore, a vendor may publish an updated driver on its website before VMware has certified it for vSAN, either because certification has not happened yet or because the driver failed certification. Upgrading to such a driver is not recommended until it appears on the VCG.

Before proceeding with a driver upgrade, it’s advisable to refresh the VCG database, review the details section, and verify the list of supported drivers. If the new driver is indeed supported, you can confidently proceed with the upgrade.

Understanding the error state

If either the Controller on vSAN HCL or Controller Release Support check fails, this subsequent check will also fail. If the driver is absent from the VMware Compatibility Guide (VCG) for this specific device and ESXi release, the vSAN environment could be at risk. Alternatively, if the driver appears in the VCG for this device and ESXi release but the check still generates warnings, the driver may not be certified with the current firmware.

Troubleshooting an error state

Typically, an error state indicates that the current ESXi release is not certified by VMware, or that the Hardware Compatibility List (HCL) database is not up to date.

Fixes

If the health check displays a warning, please ensure that the vSAN HCL (Hardware Compatibility List) database is up-to-date. Additionally, upgrading ESXi to a supported release might resolve the issue.

If the health check result is empty, it could be due to the ‘vSAN HCL DB up-to-date’ or ‘SCSI controller is VMware certified’ checks not being in the green (OK) state. I recommend addressing any warnings from these two health checks first.

In the event of a test failure, I recommend searching the VMware Compatibility Guide (VCG) to identify a supported driver and firmware pair, then updating the driver or firmware accordingly. It’s possible that the hardware (or its firmware) or the driver release was recently added to the VCG.

Keep in mind that vSAN requires both the controller driver and firmware to be a certified pair for compliance. If the driver is not currently certified with the current controller firmware, an error will be reported. In such cases, you should either update the controller driver or update the controller firmware to ensure they form a certified pair.
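The pairing rule above can be sketched in a few lines of Python. This is only an illustration: the `CertifiedPair` type, the `suggest_fix` helper, and the driver and firmware strings are hypothetical placeholders, not values taken from the VCG, and the real list of certified pairs has to be read from the VCG entry for your controller.

```python
from typing import Iterable, NamedTuple


class CertifiedPair(NamedTuple):
    """One driver/firmware combination listed as certified (placeholder data)."""
    driver: str
    firmware: str


def suggest_fix(driver: str, firmware: str, certified: Iterable[CertifiedPair]) -> str:
    """Say what to update so that driver and firmware form a certified pair."""
    pairs = set(certified)
    if CertifiedPair(driver, firmware) in pairs:
        return "ok: driver and firmware are a certified pair"
    # Firmware versions certified together with the driver already installed.
    firmware_options = sorted({p.firmware for p in pairs if p.driver == driver})
    # Driver versions certified together with the firmware already installed.
    driver_options = sorted({p.driver for p in pairs if p.firmware == firmware})
    if firmware_options:
        return "update firmware to one of: " + ", ".join(firmware_options)
    if driver_options:
        return "update driver to one of: " + ", ".join(driver_options)
    return "update both driver and firmware to a certified pair"


# Hypothetical certified list; real values come from the VCG.
certified_pairs = [
    CertifiedPair("drv-A", "fw-1"),
    CertifiedPair("drv-B", "fw-2"),
]
print(suggest_fix("drv-B", "fw-1", certified_pairs))
```

The sketch prefers a firmware update when both routes are possible. That ordering is an arbitrary choice for the example; which side to update depends on your maintenance constraints.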


vSAN HCL Health – vSAN HCL DB up-to-date check

The vSAN HCL Health – vSAN HCL DB up-to-date check ensures that the VMware Compatibility Guide (VCG) database used for hardware compatibility checks remains current. Unlike direct checks against the HCL on the VMware website, these VCG checks are performed against a local copy stored on the vCenter Server.

Each release of the health feature ships with a snapshot of the HCL database that was current at that time. Over time, this local copy becomes outdated, particularly as new certifications with partners are added to the VCG.

Hardware vendors frequently update their drivers and submit them for VMware certification. Older drivers may even be removed from the VCG due to identified issues. Therefore, maintaining an up-to-date local copy of the HCL database is crucial for stability and compatibility in vSAN environments.

Purpose of the vSAN HCL Health – SCSI Controller on vSAN HCL check

This health check provides details about the local storage I/O controller on the ESXi hosts participating in the vSAN cluster.

The purpose of this check is to verify that this crucial hardware component is listed in the VMware Compatibility Guide (VCG). The lookup is based on the controller's PCI ID information: Vendor ID, Device ID, SubVendor ID, and SubDevice ID.
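To make the PCI ID lookup concrete, here is a minimal Python sketch. It assumes a device line laid out as address, `vendor:device`, `subvendor:subdevice`, owner, and alias (similar to what `vmkchdev -l` prints on an ESXi host). The sample line, the `known_controllers` table, and the function names are all hypothetical; the real check matches against the HCL database, whose layout is not reproduced here.

```python
import re
from typing import Optional, Tuple

PciIds = Tuple[str, str, str, str]  # (vendor, device, subvendor, subdevice)

# Assumed layout: "<address> <vid>:<did> <svid>:<ssid> <owner> <alias>"
PCI_LINE = re.compile(
    r"^\S+\s+(?P<vid>[0-9a-fA-F]{4}):(?P<did>[0-9a-fA-F]{4})\s+"
    r"(?P<svid>[0-9a-fA-F]{4}):(?P<ssid>[0-9a-fA-F]{4})\s+\S+\s+\S+$"
)


def parse_pci_ids(line: str) -> Optional[PciIds]:
    """Extract the four PCI IDs from one device line, or None if it does not match."""
    match = PCI_LINE.match(line.strip())
    if match is None:
        return None
    return tuple(match.group(k).lower() for k in ("vid", "did", "svid", "ssid"))


def is_listed(ids: PciIds, known: dict) -> bool:
    """All four IDs must match an entry; a partial match does not count."""
    return ids in known


# Illustrative values only, not taken from the VCG.
SAMPLE_LINE = "0000:02:00.0 1000:005d 1028:1f47 vmkernel vmhba0"
known_controllers = {("1000", "005d", "1028", "1f47"): "Example RAID controller"}

ids = parse_pci_ids(SAMPLE_LINE)
print(ids, is_listed(ids, known_controllers))
```

Matching on the full four-part key matters because the same controller chip (Vendor ID and Device ID) is often sold by several server vendors under different SubVendor and SubDevice IDs.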

Understanding the error state

The vSAN HCL DB up-to-date check uses age thresholds of 90 and 180 days: a local database older than 90 days raises a warning, and one older than 180 days raises an error. While these thresholds are relatively high, it is strongly recommended to keep the database updated as frequently as operationally feasible.

Additionally, starting from version 6.7 U3, this warning will also trigger if one or more hosts are running an ESXi build that is not listed on the Hardware Compatibility List (HCL). This includes scenarios where a custom hot patch or a pulled build is in use.
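The 90-day and 180-day age thresholds can be expressed as a small Python function. This is a sketch, not the health service's actual logic: the function name is made up, and whether the boundary days themselves count as warning or error is an assumption.

```python
from datetime import date, timedelta

WARNING_AGE_DAYS = 90   # threshold named in the text
ERROR_AGE_DAYS = 180    # threshold named in the text


def classify_db_age(db_date: date, today: date) -> str:
    """Map the age of the local HCL database copy to ok, warning, or error."""
    age_days = (today - db_date).days
    # Assumption: reaching a threshold exactly already triggers that state.
    if age_days >= ERROR_AGE_DAYS:
        return "error"
    if age_days >= WARNING_AGE_DAYS:
        return "warning"
    return "ok"


snapshot = date(2024, 1, 1)  # arbitrary example date
for offset in (30, 120, 200):
    print(offset, classify_db_age(snapshot, snapshot + timedelta(days=offset)))
```

The sketch covers only the age rule. The unlisted ESXi build condition described above is a separate trigger and is not modeled here.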

Troubleshooting an error state

  • If your vCenter Server has Internet access:
      • Click the “Get Latest Version Online” button.
      • Note that vCenter Servers without DNS support will encounter an error message: “Unable to get the latest HCL database version online.”
  • If your vCenter Server doesn’t have Internet access:
      • Obtain the latest HCL (Hardware Compatibility List) database.
      • Save the downloaded file as a .json file locally.
      • Use the “Update From File” button in vCenter Server to upload the local copy of the HCL database.
  • If this check fails:
      • Begin by updating the VCG database.
      • It’s possible that the hardware (or its firmware) was recently added to the VCG.
      • If the device is still not listed in the VCG (and the health check fails), the vSAN environment may be at risk.
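For the offline path above, it can help to confirm that the downloaded file is intact JSON before using “Update From File”, since a truncated download or a saved HTML error page would otherwise only surface during the upload. The Python sketch below is a generic sanity check under stated assumptions: it presumes the database is a non-empty JSON object at the top level and does not validate any vSAN-specific schema. The function names and the file name are hypothetical.

```python
import json
from pathlib import Path


def is_json_object(text: str) -> bool:
    """True if text parses as JSON and the top level is a non-empty object."""
    try:
        data = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    # Assumption: the HCL database is an object, not a list or a scalar.
    return isinstance(data, dict) and len(data) > 0


def file_is_json_object(path: str) -> bool:
    """Read a local file and apply the same check; unreadable files fail."""
    try:
        text = Path(path).read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError):
        return False
    return is_json_object(text)


# Hypothetical file name for the locally saved copy.
print(file_is_json_object("hcl-database.json"))
```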

Ref: https://knowledge.broadcom.com/external/article?legacyId=2109262
