Issue —
- All VMs on a hypervisor show as offline in the Control Panel (CP) interface even though they are actually online.
Environment—
- OnApp version - 3.2.2
- Hypervisor - Xen, KVM - Static - Federation HV
- Storage Type - Local storage
Resolution—
Verify that SNMP is running properly on the hypervisor (HV) and reporting status to the CP.
1. Check the /onapp/interface/log/production_snmp_stats_runner.log file to see whether the hypervisor is checking in:
tail -f /onapp/interface/log/production_snmp_stats_runner.log
2. If the output of the above command shows the following error, there is likely a problem with the SNMP process:
[INFO][28784] 2014-05-29 14:14:14 +0100 L1 [HV: 1] undefined method `split' for nil:NilClass
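Watching the log with tail -f works, but a one-off count can be easier to read on a busy CP. The following is a minimal sketch, not part of OnApp: the function name count_snmp_split_errors and the default of 200 lines are assumptions chosen for illustration.

```shell
# Hypothetical helper (not an OnApp tool): count occurrences of the
# "undefined method `split' for nil:NilClass" error in the last N lines
# of the SNMP stats runner log.
count_snmp_split_errors() {
    log_file="${1:-/onapp/interface/log/production_snmp_stats_runner.log}"
    lines="${2:-200}"   # assumed default; adjust as needed
    # The dots stand in for the backtick and quote around "split".
    # grep -c exits non-zero when the count is 0, so mask that status.
    tail -n "$lines" "$log_file" | grep -c "undefined method .split. for nil:NilClass" || true
}
```

A result greater than 0 suggests that the CP is receiving empty SNMP responses from at least one hypervisor; the [HV: N] field in the matching lines shows which one.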
3. To verify the SNMP issue, SSH into the hypervisor and check whether the SNMP processes are running:
ps aux | grep snmp
4. There should be an snmpd process and an snmptrapd process running. If they are not running, start them by running:
/etc/init.d/snmpd restart
/etc/init.d/snmptrapd restart
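Note that "ps aux | grep snmp" also lists the grep command itself, which can be mistaken for a running daemon. As a rough sketch, the check can be made explicit by matching exact process names. The function name missing_daemons is an assumption for illustration, not an OnApp utility.

```shell
# Hypothetical helper: reads process names (one per line) on stdin and
# prints every daemon named in the arguments that is NOT in that list.
missing_daemons() {
    proc_list=$(cat)
    for daemon in "$@"; do
        # -x matches the whole line, so "snmpd" does not match "snmptrapd"
        if ! printf '%s\n' "$proc_list" | grep -qx "$daemon"; then
            echo "$daemon"
        fi
    done
}

# Usage on the HV (ps prints only the command name of each process):
#   ps -e -o comm= | missing_daemons snmpd snmptrapd
```

Empty output means both daemons were found; any name printed is a daemon that needs to be started.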
5. After restarting the daemons, go back to the CP and check whether the VMs are showing online. If not, check the /onapp/interface/log/onapp.err file for errors:
tail -f /onapp/interface/log/onapp.err
6. If the following error appears, there is a network issue between the CP and the hypervisor:
Timeout: No Response from udp:10.25.0.5:161.
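When several hypervisors are attached to the CP, it can help to list which addresses are timing out. The sketch below assumes the error lines have the same format as the example above; the function name snmp_timeout_hosts is hypothetical.

```shell
# Hypothetical helper: print the unique IP addresses that appear in
# "Timeout: No Response from udp:<IP>:161" lines of the given log.
snmp_timeout_hosts() {
    err_log="${1:-/onapp/interface/log/onapp.err}"
    sed -n 's/.*Timeout: No Response from udp:\([0-9.]*\):161.*/\1/p' "$err_log" | sort -u
}
```

If only one address is listed, the problem is probably specific to that HV or its network path rather than to the CP.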
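7. Telnet from the CP to the HV on port 161 to verify the port is open. Note that telnet only tests TCP, while the error above is reported for UDP, so a successful connection shows that the HV is reachable but does not prove that UDP SNMP traffic is getting through.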
telnet <HV_IP> 161
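The timeout in step 6 is reported for udp:10.25.0.5:161, whereas telnet opens a TCP connection. A UDP-level probe with the Net-SNMP snmpget tool can complement the telnet check. This is only a sketch: the function name probe_snmp_udp is hypothetical, and the community string must be the one configured in snmpd on the HV, which is not given in this article.

```shell
# Hypothetical helper: ask the HV for sysUpTime (OID 1.3.6.1.2.1.1.3.0)
# over UDP. Arguments: HV IP address and SNMP community string
# (the community string is an assumption; use the value from snmpd.conf).
probe_snmp_udp() {
    hv_ip="$1"
    community="$2"
    # -t 2: wait 2 seconds per attempt, -r 1: retry once
    if snmpget -v 2c -c "$community" -t 2 -r 1 "udp:${hv_ip}:161" 1.3.6.1.2.1.1.3.0 >/dev/null 2>&1; then
        echo "SNMP over UDP answered from $hv_ip"
    else
        echo "no SNMP answer over UDP from $hv_ip"
        return 1
    fi
}

# Usage from the CP:
#   probe_snmp_udp <HV_IP> <community>
```

A failure here while telnet connects would point at UDP traffic specifically, which is consistent with the MTU check in the next step.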
8. If it connects, check the MTU setting on the CP NIC that is used to connect to the HV:
# ifconfig eth1
eth1 Link encap:Ethernet HWaddr 00:16:3E:6F:F7:9E
inet addr:10.25.0.4 Bcast:10.25.0.255 Mask:255.255.255.0
inet6 addr: fe80::216:3eff:fe6f:f79e/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3104352 errors:0 dropped:0 overruns:0 frame:0
TX packets:3319144 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:587530312 (560.3 MiB) TX bytes:1412152936 (1.3 GiB)
Interrupt:32
Then log into the HV and check the same setting on the NIC that SNMP uses to report:
# ifconfig eth2
eth2 Link encap:Ethernet HWaddr AC:16:2D:B9:21:C1
inet addr:10.25.0.5 Bcast:10.25.0.255 Mask:255.255.255.0
inet6 addr: fe80::ae16:2dff:feb9:21c1/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:19863436 errors:1 dropped:0 overruns:0 frame:0
TX packets:19592604 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:8089028444 (7.5 GiB) TX bytes:4331136805 (4.0 GiB)
9. If the MTU values differ, change them so they match. If the HV is not using the NIC for any other purpose, the MTU on the HV can be set to 1500. Otherwise, if the CP NIC supports it, you can change the MTU on the CP NIC to 9000.
To change the MTU on the fly, run "ifconfig eth2 mtu 1500".
To make the MTU of 1500 persist across reboots, edit the /etc/sysconfig/network-scripts/ifcfg-eth2 file and update the MTU setting accordingly.
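The comparison in steps 8 and 9 can be sketched as two small helpers that pull the MTU out of ifconfig output and compare the two values. The function names mtu_from_ifconfig and compare_mtu are hypothetical, and the interface names follow the example above (eth1 on the CP, eth2 on the HV); yours may differ.

```shell
# Hypothetical helper: read ifconfig output on stdin and print the MTU.
# Handles both the older "MTU:1500" and the newer "mtu 1500" layouts.
mtu_from_ifconfig() {
    grep -oiE 'mtu[: ][0-9]+' | head -n 1 | grep -oE '[0-9]+'
}

# Hypothetical helper: compare the CP MTU ($1) with the HV MTU ($2).
compare_mtu() {
    if [ "$1" -eq "$2" ]; then
        echo "MTU values match ($1)"
    else
        echo "MTU mismatch: CP=$1 HV=$2"
        return 1
    fi
}

# Usage from the CP (root SSH access to the HV is an assumption):
#   cp_mtu=$(ifconfig eth1 | mtu_from_ifconfig)
#   hv_mtu=$(ssh root@<HV_IP> ifconfig eth2 | mtu_from_ifconfig)
#   compare_mtu "$cp_mtu" "$hv_mtu"
```

With the example output shown in step 8, this would report a mismatch between 1500 on the CP and 9000 on the HV.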
Cause—
- Differing MTU values can cause communication errors between the servers, because packets larger than the smaller MTU may be dropped or fragmented on the way. The values should match to avoid packet loss.
Comments
If the issue is not resolved by the steps above, change the SNMP protocol to TCP in /onapp/interface/config/on_app.yml:
https://docs.onapp.com/display/32AG/Advanced+Configuration+Settings
Add snmp_stats_protocol: tcp to /onapp/interface/config/on_app.yml and restart the OnApp daemon.
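Before and after editing the file, it may be useful to confirm what the setting currently is. The sketch below only reads the file; the function name snmp_stats_protocol_value is hypothetical, and it assumes the key is written on a single line as "snmp_stats_protocol: <value>".

```shell
# Hypothetical helper: print the value of snmp_stats_protocol from the
# OnApp configuration file, or nothing if the key is not set.
snmp_stats_protocol_value() {
    config_file="${1:-/onapp/interface/config/on_app.yml}"
    sed -n 's/^[[:space:]]*snmp_stats_protocol:[[:space:]]*//p' "$config_file"
}
```

Empty output means the key is absent from the file; "tcp" means the change described above is already in place.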