Hi all,
This is possibly more for my own reference, but if it helps others then I'm happy to share.
Recently I've been experiencing issues with the following inherited system:
IBM XIV SAN storage
IBM BladeCenter H with HS22 & HS23 blade servers
QLogic 81xx Converged Network Adapter (CNA)
Fibre Channel over Ethernet (FCoE)
vSphere 5.0
The system was inherited from another team that merged with ours after an internal reshuffle, and it was pointed out early on that the entire VM farm suffered from lag, latency, server pauses and generally slow performance. Investigation also showed that we were experiencing datastore disconnects, as the FCoE connection effectively disappeared. What ensued was 2 solid months of investigative work with IBM (as we originally believed the fault to lie with the XIV) and VMware.
To cut a long story short, we eventually found that the QLogic driver (934.5.6.0-1OEM.500.0.0.472560) was at fault, as per http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2014323
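In case it saves anyone some typing, here is a rough Python sketch for checking whether a host reports the affected driver. It assumes you already have the full driver version string to hand (for example, copied from the host's installed package list). The function names are my own, and the idea that the part before the first hyphen is the driver version proper is an assumption based on how the strings in this post look.

```python
# The driver version named in this post as the faulty one.
AFFECTED_DRIVER = "934.5.6.0-1OEM.500.0.0.472560"


def split_driver_version(full_version):
    """Split a version string at the first hyphen.

    Assumption: the text before the hyphen is the driver version and
    the text after it is the packaging/release suffix.
    """
    driver, _, release = full_version.strip().partition("-")
    return driver, release


def is_affected(full_version):
    """Return True only for an exact match with the faulty driver."""
    return full_version.strip() == AFFECTED_DRIVER


if __name__ == "__main__":
    for version in (AFFECTED_DRIVER, "902.k1.1-9vmw.510.0.0.799733"):
        print(version, split_driver_version(version), is_affected(version))
```

This only does an exact match, so it won't tell you anything about other driver builds; the KB article is the place to check those.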
We had been advised to upgrade to the latest version of vSphere 5.0, but as we were also planning an upgrade to 5.1 in the near future, we brought that forward and skipped ahead. Without realising it, we also solved the primary issues.
5.1 uses QLogic driver version 902.k1.1-9vmw.510.0.0.799733, provided by VMware themselves rather than being an OEM version. Since the upgrade we've not experienced any performance issues or LUN disconnects, and everything appears to be working much better. The only custom setting we've applied is an increase of the queue size from the default of 32 to 64.
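For anyone wanting to script the queue change, here is a minimal Python sketch that just builds the esxcli command lines rather than running them. It assumes the setting is the QLogic driver's maximum queue depth, that the driver module is called `qla2xxx` and that the parameter is `ql2xmaxqdepth`; please verify both names against VMware's documentation for your driver before using them. The helper names are hypothetical.

```python
# Assumptions (not stated in the post): module and parameter names for
# the QLogic driver. Verify these on your own host before use.
DRIVER_MODULE = "qla2xxx"
QDEPTH_PARAM = "ql2xmaxqdepth"


def build_set_command(depth):
    """Build the esxcli command that sets the queue depth parameter."""
    if not isinstance(depth, int) or depth < 1:
        raise ValueError("queue depth must be a positive integer")
    return [
        "esxcli", "system", "module", "parameters", "set",
        "-m", DRIVER_MODULE,
        "-p", f"{QDEPTH_PARAM}={depth}",
    ]


def build_list_command():
    """Build the esxcli command that lists the module's parameters."""
    return [
        "esxcli", "system", "module", "parameters", "list",
        "-m", DRIVER_MODULE,
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running on a host.
    print(" ".join(build_set_command(64)))
    print(" ".join(build_list_command()))
```

Module parameters are normally only picked up when the driver loads, so expect to reboot the host before the new value takes effect.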
Hope this helps someone else too!