CloudBoot and Integrated Storage Checklist
Package install checks:
- Are the integrated storage, cloudboot images and CP server RPMs installed on the CP server?
rpm -qa | grep -E "onapp-store-install|onapp-cp-[456]|onapp-ramdisk"
- Are they the most up to date versions? https://docs.onapp.com/rn/updates-and-packages-versions
Pre-Checks of Cloudboot environment:
- Does the CP server have an additional NIC configured for HV management?
- What is the IP address of this HV management NIC?
- Is cloudboot enabled in the Settings -> Configuration wizard?
- Do the fields 'Static Config Target' and 'CP Server Cloudboot Target' have the IP address as determined in step 2?
- Have the cloudboot IP addresses + netmask been entered in the Hypervisor Settings window? (Settings -> Hypervisors + Cloudboot IPs menu)
- Does the address from step 2 fall within the network IP address range?
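The range check in the last item can be done with integer arithmetic. The following is a minimal sketch, not an OnApp tool: the function names ip_to_int and ip_in_subnet are hypothetical, and the addresses in the usage comment are placeholders for your own values.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# Runs in a subshell so the IFS change does not leak to the caller.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed (exit 0) if IP lies inside the network described by
# NETWORK and NETMASK.
# Usage: ip_in_subnet <IP> <NETWORK> <NETMASK>
ip_in_subnet() (
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( ip & mask )) -eq $(( net & mask )) ]
)

# Example (placeholder addresses):
#   ip_in_subnet 192.168.1.1 192.168.1.0 255.255.255.0 && echo "in range"
```

The sketch assumes plain decimal octets without leading zeros and a shell with 64-bit arithmetic.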
Pre-check of dhcp, tftp and nfs server settings:
- Verify the contents of /home/onapp/dhcpd.conf. Does it contain the correct network settings as identified above? (Later versions may use /onapp/configuration/dhcp/dhcpd.conf instead, in which case /etc/dhcp/dhcpd.conf should contain an include statement for it.)
- Is dhcpd running? (service dhcpd status)
- Verify that the tftp service is enabled and running ('grep disable /etc/xinetd.d/tftp' should show 'no', and check 'service xinetd status').
- Verify that NFS exports are set up correctly:
PROMPT> cat /etc/exports
/onapp/templates <MGT SUBNET>(ro,no_root_squash)
/tftpboot/export <MGT SUBNET>(ro,no_root_squash)
/tftpboot/images/centos5/diskless/snapshot <MGT SUBNET>(rw,sync,no_root_squash)
where <MGT SUBNET> should match the HV management subnet set up earlier.
- Verify that nfs service is running correctly (service nfs status).
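Two of the file checks above can be scripted. This is a rough sketch only: missing_exports and tftp_enabled are hypothetical helper names, and the checks look at file contents only, not at whether the services are actually running.

```shell
#!/bin/sh
# Print each expected export path that has no entry in the exports file.
# Usage: missing_exports <EXPORTS FILE> <PATH>...
missing_exports() (
    file=$1
    shift
    for path in "$@"; do
        # An entry counts as present when the first field equals the path.
        awk -v p="$path" '$1 == p { found = 1 } END { exit !found }' "$file" \
            || echo "$path"
    done
)

# Succeed if the xinetd service file contains a "disable = no" line.
# Usage: tftp_enabled <SERVICE FILE>
tftp_enabled() {
    grep -Eq '^[[:space:]]*disable[[:space:]]*=[[:space:]]*no[[:space:]]*$' "$1"
}

# Example:
#   missing_exports /etc/exports /onapp/templates /tftpboot/export \
#       /tftpboot/images/centos5/diskless/snapshot
#   tftp_enabled /etc/xinetd.d/tftp || echo "tftp is disabled in xinetd"
```

No output from missing_exports means all three paths are exported; the subnet and options on each line still need to be checked by eye.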
Pre-check of PXE boot templates:
- Do the default template files exist?
PROMPT> ls -1 /tftpboot/pxelinux.cfg/template-*
- Do the servers boot off a NIC other than eth0? If so, you should edit the default template and add the ETHERNET=<ETH DEV> parameter:
PROMPT> cat /tftpboot/pxelinux.cfg/default
append initrd=images/centos7/ramdisk-default/initrd.img NFSNODEID=default NFSROOT=192.168.1.1:/tftpboot/export/centos7/default CFGROOT=192.168.1.1:/tftpboot/images/centos5/diskless/snapshot ADDTOBRIDGE=mgt pcie_aspm=off selinux=0 cgroup_disable=memory net.ifnames=0 biosdevname=0
We recommend making the same change to the /tftpboot/pxelinux.cfg/template-default file so that the change persists across updates made through the CP UI.
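The edit can be previewed before touching either file. The sketch below assumes the parameter belongs at the end of the append line, alongside the other boot parameters shown above; that placement is an assumption, and add_ethernet_param is a hypothetical helper name. It prints the modified file to standard output and changes nothing on disk.

```shell
#!/bin/sh
# Print a PXE config with ETHERNET=<ETH DEV> added to the end of every
# "append" line that does not already carry an ETHERNET= parameter.
# Usage: add_ethernet_param <ETH DEV> <PXE CONFIG FILE>
add_ethernet_param() {
    awk -v dev="$1" '
        /^[ \t]*append[ \t]/ && !/ETHERNET=/ { $0 = $0 " ETHERNET=" dev }
        { print }
    ' "$2"
}

# Example (eth1 is a placeholder device name):
#   add_ethernet_param eth1 /tftpboot/pxelinux.cfg/default
#   add_ethernet_param eth1 /tftpboot/pxelinux.cfg/template-default
```

Review the output, then apply the same change to the real files by hand or by redirecting to a temporary file and moving it into place.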
Boot-time visual check of a server from the server console:
- Do you see the server attempting to PXE boot and acquire a DHCP address before looking for an internal storage drive?
- Is it requesting a DHCP address on the correct Ethernet device, the one attached to the CP server management subnet?
- Does it successfully acquire an IP address that matches the one you entered in the UI?
- Run 'grep DHCP /var/log/messages' on the CP server and verify that there is a recent entry for the new HV, showing its MAC address and the assigned IP address.
- Does the MAC address appear in the /var/lib/dhcpd/dhcpd.leases file on the CP server?
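The lease lookup in the last item can be scripted so that the assigned IP is easy to compare with the one entered in the UI. This sketch assumes the standard ISC dhcpd lease layout ("lease <IP> {" followed by a "hardware ethernet <MAC>;" line); lease_ips_for_mac is a hypothetical helper name, and the leases file path is passed in rather than hard-coded.

```shell
#!/bin/sh
# Print the IP address of every lease entry recorded for a MAC address.
# The match is case-insensitive. One line is printed per lease entry,
# so a renewed lease may appear more than once.
# Usage: lease_ips_for_mac <MAC> <LEASES FILE>
lease_ips_for_mac() {
    awk -v mac="$(printf '%s' "$1" | tr 'A-F' 'a-f')" '
        $1 == "lease" { ip = $2 }
        $1 == "hardware" && $2 == "ethernet" {
            m = tolower($3)
            sub(/;$/, "", m)      # drop the trailing semicolon
            if (m == mac) print ip
        }
    ' "$2"
}

# Example (placeholder MAC address):
#   lease_ips_for_mac 00:11:22:aa:bb:cc <LEASES FILE>
```

No output means the DHCP server has not recorded a lease for that MAC address.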
Enabling additional debug for the PXE boot process:
- Edit the /etc/xinetd.d/tftp file and adjust server args to match the following line:
server_args = -v -v -s /tftpboot
- Restart xinetd (service xinetd restart).
You should now see additional logging written to /var/log/messages whenever a server boots.
All logs from the HV are available on the CP in /var/log/messages.
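The server_args change can be previewed with sed before editing the file. This is a small sketch; with_verbose_tftp is a hypothetical helper name, and it only prints the rewritten file rather than modifying it.

```shell
#!/bin/sh
# Print the xinetd tftp service file with its server_args line replaced
# by the verbose setting from the checklist. Indentation is preserved.
# Usage: with_verbose_tftp <SERVICE FILE>
with_verbose_tftp() {
    sed -E 's|^([[:space:]]*server_args[[:space:]]*=).*|\1 -v -v -s /tftpboot|' "$1"
}

# Example:
#   with_verbose_tftp /etc/xinetd.d/tftp
```

Once the output looks right, make the same edit in /etc/xinetd.d/tftp and restart xinetd as described above.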
Post-bootup analysis once server has successfully booted to a prompt:
- What is the output from ifconfig at the terminal?
- Does the MAC address show up in the drop down menu in the 'Add new cloud boot hypervisor' wizard?
- Can you log in to the server over ssh from the CP server? Using the IP address of the NIC from the ifconfig output above, issue:
ssh root@<IP ADDRESS>
After accepting the host key, it should log in without requiring a password.
- If you can select the MAC address and proceed to the next stage of the wizard, try booting the node as both XEN and KVM in separate tests. Check that the node comes up cleanly with the correct IP address assigned in each case, and that after boot it shows as active in the CP server UI.
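The passwordless ssh check can be made non-interactive, which is convenient when testing several nodes. This is a sketch under assumptions: can_ssh_passwordless is a hypothetical helper name, and SSH_BIN is a hypothetical override for dry runs, not an ssh or OnApp setting. BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# Succeed if root can log in to the host over ssh without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password, so a
# missing key shows up as a non-zero exit status rather than a prompt.
# Usage: can_ssh_passwordless <IP ADDRESS>
can_ssh_passwordless() {
    # SSH_BIN is a hypothetical override for dry runs; defaults to ssh.
    "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true
}

# Example:
#   can_ssh_passwordless <IP ADDRESS> && echo "passwordless login OK"
```

Because BatchMode also suppresses the host key prompt, the check fails for a host whose key has not yet been accepted; log in once interactively first.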
Verifying Integrated Storage configuration and status
- Verify that all HVs in the same zone have all NICs assigned to the SAN attached to the same logical subnet.
- Ensure that all HVs in the same zone are of the same type (XEN or KVM).
- Use a different channel for the storage SAN between different zones to ensure connections are logically separate.
- When adding new HVs, remember to select the 'Format disks' option in the UI to initialize all new drives.
- Check that the 'Diagnostic page' in the 'Integrated Storage' section of the UI reports no errors.
- Log on to one of the hypervisors and use the storage CLI to list the storage nodes that are visible ('onappstore nodes').
- Make sure that all the nodes are accessible over IP.
Storage nodes not visible in the UI or from the CLI:
- Start by disabling all hardware passthrough for the HVs and see if the drives appear. If they do, then the issue probably lies in the hardware configuration.
- Try each of the hardware enablement options (memory alignment, no_pci_disable) and see whether it makes a difference for detecting drives.
- Contact the OnApp Support team and supply a detailed hardware description, including the motherboard, storage controllers and attached drives.
- When booted as KVM, log on to the hypervisor and verify which drives are visible locally ('ls -lh /dev/sd*; ls -lh /dev/cciss/').
- When booted as KVM, log on to the hypervisor and verify whether the IO controller VM is running ('virsh dominfo STORAGENODE').
Storage nodes only partially visible:
- Verify all network connections, make sure they are connected to the correct logical subnets.
- Verify any bonded connections and make sure that all members of the bond are connected to the same logical subnet.
- Make sure all HVs are utilising the same storage channel.
- Check that the same MTU is set on all HVs and that jumbo frames are enabled on the switches.
- Check if multicast is allowed on your network equipment.
- Another possible cause is insufficient free space inside the storage controllers, especially if over-commit is enabled.
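The MTU comparison above is easy to automate once the values are collected. The sketch below assumes you have gathered one "host mtu" pair per line, for example by reading /sys/class/net/<IFACE>/mtu on each HV over ssh; mtus_consistent is a hypothetical helper name and <IFACE> is a placeholder for the SAN interface.

```shell
#!/bin/sh
# Read "host mtu" pairs on standard input. Print each distinct MTU value
# once, and succeed only if exactly one distinct value was seen.
mtus_consistent() {
    awk '
        NF >= 2 && !($2 in seen) { seen[$2] = 1; n++; print $2 }
        END { exit (n == 1 ? 0 : 1) }
    '
}

# Example (placeholder host names and values):
#   printf 'hv1 9000\nhv2 9000\n' | mtus_consistent
```

More than one value in the output means at least one HV has a different MTU. The check says nothing about the switches, which still need to be verified separately.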