Monday, November 26, 2012

Poor iSCSI Performance to EqualLogic SANs

Last week, before the Thanksgiving break, I had to deal with an issue on our vCenter that had me chasing ghosts.  I first heard about the problem when some users of a particular VM mentioned that they were experiencing poor performance.  Initially, I could not find anything wrong.  The next day, the VM was locked up.  It is a Linux VM, and it was showing hardware errors on the console.  I initiated a reboot, to no avail.  Since it was the only VM running on our two-and-a-half-year-old Dell EqualLogic, I thought about moving the machine files to another SAN to rule out the storage.  What I found was that the storage vMotion was terribly slow, and it would eventually bomb out after a couple of hours.  This particular VM is only 300 GB.  We also found that the disk latency on this VM was sky high... 2000 ms and higher.

I started all sorts of troubleshooting.  We found that jumbo frames were not enabled on one particular network port serving the SAN this VM was on.  After enabling them, I initiated the storage vMotion again.  This time it was successful... after 8 hours.  Obviously a problem still loomed, but at least the highly utilized VM was working properly again.
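To put that 8 hours in perspective, here is a rough back-of-the-envelope calculation in Python.  It is only a sketch: it assumes the full 300 GB was actually copied, that 1 GB means 10**9 bytes, and that the iSCSI link is 1 Gbit/s.  None of those details were measured, so treat the numbers as an estimate.

```python
# Rough throughput estimate for the 8-hour storage vMotion.
# Assumptions (not measured): the full 300 GB was copied,
# 1 GB = 10**9 bytes, and the iSCSI link runs at 1 Gbit/s.

VM_SIZE_BYTES = 300 * 10**9
DURATION_SECONDS = 8 * 3600
ASSUMED_LINK_BITS_PER_SECOND = 10**9  # hypothetical link speed


def throughput_mb_per_s(size_bytes, seconds):
    """Average transfer rate in decimal megabytes per second."""
    return size_bytes / seconds / 10**6


def link_utilization(size_bytes, seconds, link_bits_per_second):
    """Fraction of the link's raw bandwidth the transfer used on average."""
    return (size_bytes * 8 / seconds) / link_bits_per_second


rate = throughput_mb_per_s(VM_SIZE_BYTES, DURATION_SECONDS)
util = link_utilization(VM_SIZE_BYTES, DURATION_SECONDS,
                        ASSUMED_LINK_BITS_PER_SECOND)

print(f"{rate:.1f} MB/s, {util:.1%} of the assumed link")
# prints "10.4 MB/s, 8.3% of the assumed link"
```

Under those assumptions the copy averaged roughly 10 MB/s, a small fraction of what the link could carry, which fits with the feeling that something was still wrong.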

What I found after a few days of troubleshooting was that a storage vMotion to our two Dell EqualLogic SANs was fine, but a storage vMotion from the EqualLogics was terribly slow.  That told me we had a disk read problem on these SANs.

This is where I contacted Dell and downloaded SAN HQ.  I highly recommend having SAN HQ on your network to monitor your Dell EqualLogics.  It measures so many metrics live, and archives them, that it really gives you a clear picture of what is happening on your EqualLogics.  It is a bit daunting at first, but take your time and learn it... it quickly becomes your friend.

What SAN HQ told me was that there was high disk latency on reads from the SANs.  It was averaging 102 ms.  I did some more digging and found a couple of good articles that led me back to the VMware side of things.  Basically, turning off Delayed Acknowledgement (Delayed ACK) on each host in the vCenter resolved the problem!  Average disk latency dropped to 5 ms or less!  Steps for turning off Delayed ACK, along with a good explanation, are in the following articles.

http://www.modelcar.hk/?p=5768

http://virtualgeek.typepad.com/virtual_geek/2011/03/performance-art-not-science-with-delayedack-10gbe-example.html

Happy Virtualizing!
