So, first let me define some terms. The Cisco VIC is also called the Palo card/adapter. The current generation of Palo is the VIC-1240 and VIC-1280. The VIC-1240 is a built-in option on the M3 blades.
The VIC itself will show you whether zoning and LUN masking are correct. However, the LUNLIST command is a very powerful debugging tool, and one that should always be used when troubleshooting a boot from SAN problem. There are prerequisites that must be met for the VIC to show success during POST, and when POST does not show you what you expect, you don’t always know where to start. Is the problem in the profile, the Fabric Interconnect, the upstream FC switch, or the array itself? It’s this kind of thing that makes SAN engineers frustrated with boot from SAN, and it is why many people tend to stay away from using non-persistent storage.
Have no fear though: debugging boot from SAN isn't that hard or complex if you know the basics of how HBAs, switch zoning, and LUN masking work.
Cisco UCS certainly makes boot from SAN tremendously easier, but I believe there is always room for improvement, and this area is no different. So, with UCS 2.0, Cisco introduced LUNLIST.
LUNLIST only works prior to the OS HBA driver loading. Once the driver loads, the VIC BIOS is no longer in control and will not return valid data.
To get to the command, you need to gain access to the UCS CLI and run the following:
- connect: connects to the VIC's management processor
- adapter: the chassis/blade/interface card (example: connect adapter 1/1/1 will connect you to chassis 1, blade 1, adapter 1)
- attach-fls: attaches to the fabric login service of the adapter
- vnic: displays the vNICs for the Palo
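As a rough aid, the sequence can be sketched as a small Python helper that builds the command strings for a given chassis/blade/adapter. This is only an illustration: the function name `lunlist_commands` is hypothetical, the ordering of the commands is my assumption based on the list above, and passing a vNIC ID to `lunlist` is also an assumption that may not apply to every firmware release.

```python
def lunlist_commands(chassis, blade, adapter, vnic_id=None):
    """Return, in order, the CLI commands used to reach lunlist on one adapter.

    The ordering is an assumption drawn from the list in the text; check it
    against your own UCS CLI session before relying on it.
    """
    for name, value in (("chassis", chassis), ("blade", blade), ("adapter", adapter)):
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")

    commands = [
        f"connect adapter {chassis}/{blade}/{adapter}",  # pick the adapter
        "connect",      # connect to the VIC's management processor
        "attach-fls",   # attach to the fabric login service
        "vnic",         # list the vNICs on the Palo
    ]
    # Assumption: lunlist can optionally take a vNIC ID taken from the vnic output.
    commands.append("lunlist" if vnic_id is None else f"lunlist {vnic_id}")
    return commands


if __name__ == "__main__":
    for command in lunlist_commands(1, 1, 1):
        print(command)
```

The helper only produces text to type or paste into the CLI; it does not talk to the Fabric Interconnect.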
Once you run lunlist, you see output similar to the below. This one is from a server where the end-to-end configuration is correct and the server can boot from SAN:
Now let’s break it apart and describe what you are seeing:
Incorrect LUN masking:
Here is the LUNLIST output from a server that is having an issue with incorrect LUN masking. The host has not been allowed access to the assigned LUN. The same problem would likely result if the host is not set up in the array at all, or if it was created on the array but someone mistyped the host’s WWPN. Zoning is correct because the Nameserver Query Response succeeds (line 11) and returns a target WWPN that matches the target WWPN in the boot policy (line 5). The HBA successfully logged into the fabric and was able to see that a LUN with ID 0x00 is visible (line 9). But when the LUN is queried for additional information, it fails with “access failure” (line 7).
Incorrect Zoning:
The host is not zoned correctly. It is either in a zone by itself or not zoned at all. This is an easier one to troubleshoot because the host can see neither a LUN nor any available target WWPNs. Look at lines 8 and 9 and notice that no response is returned for either of these queries. Note that the PLOGI is unsuccessful (fc_id in line 5 is 0x000000) because the host was unable to establish a session with the target.
Incorrect SAN Boot Target in the boot policy:
You can clearly see that the WWPN configured in the boot policy (line 5) does not match the available target found on the fabric (line 10). In this situation, the PLOGI (line 5) is once again unsuccessful because a session cannot be established between the host and the target.
Incorrect LUN ID in the boot policy:
Here an incorrect LUN ID was entered into the boot policy for the server (line 7), and it does not match the LUN ID found on the fabric (line 9).
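The four failure cases above boil down to a short decision procedure. The sketch below encodes it in Python, assuming you have already read the relevant values off the LUNLIST output by hand. The class, field names, and result strings are all hypothetical, and the WWPNs in the usage example are made-up placeholders; nothing here parses real LUNLIST output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LunlistObservation:
    """Values read manually from LUNLIST output (hypothetical field names)."""
    boot_target_wwpn: str            # target WWPN configured in the boot policy
    boot_lun_id: int                 # LUN ID configured in the boot policy
    nameserver_wwpns: List[str] = field(default_factory=list)  # targets returned by the nameserver query
    reported_lun_ids: List[int] = field(default_factory=list)  # LUN IDs returned by the LUN report
    lun_access_failure: bool = False  # LUN query ended in "access failure"


def diagnose(obs: LunlistObservation) -> str:
    """Map an observation to one of the failure cases described in the text."""
    targets = {wwpn.lower() for wwpn in obs.nameserver_wwpns}
    if not targets:
        # No targets visible at all: host is alone in its zone or not zoned.
        return "incorrect zoning"
    if obs.boot_target_wwpn.lower() not in targets:
        # Fabric offers a target, but not the one named in the boot policy.
        return "incorrect SAN boot target in the boot policy"
    if obs.lun_access_failure:
        # Target is reachable and a LUN is visible, but the array denies access.
        return "incorrect LUN masking"
    if obs.boot_lun_id not in obs.reported_lun_ids:
        # LUNs are presented, but none has the ID the boot policy asks for.
        return "incorrect LUN ID in the boot policy"
    return "no problem detected by these checks"


if __name__ == "__main__":
    # Placeholder WWPNs, not taken from any real system.
    obs = LunlistObservation(
        boot_target_wwpn="50:00:00:00:00:00:00:01",
        boot_lun_id=1,
        nameserver_wwpns=["50:00:00:00:00:00:00:01"],
        reported_lun_ids=[0],
    )
    print(diagnose(obs))
```

The checks run from the fabric outward (zoning, then target, then masking, then LUN ID), which mirrors the order in which a boot from SAN attempt can fail.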
This example shows a properly configured host that has access to multiple presented LUNs.
When you run LUNLIST while the OS is up and running with the VIC driver loaded (which means LUNLIST won’t work), you will see the following: