Hello everybody,
I am facing a strange issue with my vSphere environment, and VMware Technical Support hasn't been able to help me so far. The case has been pending with them for about 2 weeks now.
We have ESXi 5.1 running on 10 HP blade servers. vCenter Server is a virtual machine. We were getting intermittent "Host connection failure" alarms: an ESXi host would go into the Not Responding state for 2 or 3 seconds and then come back. Last week the problem got worse. The virtual machines on the ESXi hosts were losing network connectivity, and all the ESXi hosts had the "Host connection failure" alarm. I tried to ping one ESXi host from another over SSH, but it showed the host as down.
We have the management network and storage (NFS) on a VSS, and the VM network and vMotion on a VMware VDS. On the physical switch side, link aggregation (LACP) was enabled for the vMotion and virtual machine networks. After reading about it, I found out that the load balancing policy on the vmnics should be "Route based on IP hash" when LACP is configured on the physical switch. I changed it to IP hash on my VSS and VDS, and the problem went away for 2 days, then started happening again.
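For anyone following along, here is a rough Python sketch of how I understand IP hash uplink selection to work. It assumes the uplink is chosen by XORing the source and destination IPv4 addresses and taking the result modulo the number of active uplinks. That is my reading of the VMware documentation, not something I have confirmed against the actual ESXi implementation. The function name `select_uplink` and the addresses are made up for illustration.

```python
import ipaddress


def select_uplink(src_ip: str, dst_ip: str, uplink_count: int) -> int:
    """Return the index of the uplink a source/destination pair maps to.

    Assumption: uplink = (src XOR dst) mod uplink_count, using the
    32-bit integer form of each IPv4 address.
    """
    if uplink_count < 1:
        raise ValueError("uplink_count must be at least 1")
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % uplink_count


if __name__ == "__main__":
    # Hypothetical addresses: one VM talking to three destinations
    # over a team of two uplinks.
    vm_ip = "10.0.0.1"
    for dst_ip in ("10.0.0.2", "10.0.0.3", "10.0.0.4"):
        index = select_uplink(vm_ip, dst_ip, uplink_count=2)
        print(f"{vm_ip} -> {dst_ip}: uplink index {index}")
```

If this is right, the same VM can send traffic out of different uplinks depending on the destination, so the physical switch has to treat those ports as a single logical link. That is why I expected the teaming policy and the physical switch configuration to have to match.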
Finally, with no resolution from tech support, we removed LACP from the physical switch. I also changed the load balancing policy back to "Route based on originating virtual port ID". It worked for 3 more days, and then the problem started happening again. I am again getting "Host connection failure" alarms in vCenter. The host goes into the Not Responding state for 2 or 3 seconds and comes back. It is intermittent. I am afraid it might again affect the entire production environment, and I would really appreciate any suggestions to fix this.
Now we do not have LACP on the physical switch. The management network and storage (NFS) network ports are configured as access ports. The vMotion and VM network ports are configured as trunk ports, and I have assigned the VLAN ID on the port groups. Just a point to note: between the VMware virtual switches (VSS or VDS) and the physical Cisco switches, there is an HP switch (HP Virtual Connect). We are using NetApp NFS for shared datastores.
Many thanks in advance.
AG