6.20.2009

VMware vSphere 4 / Veeam Backup 3.1 Followup

We experienced some drastic speed problems trying to back up our ESX 4 virtual machines with Veeam Backup. Where we'd previously had backup speeds around 40MB/sec for a full backup and easily 3-4x that on the follow-up incrementals, we were now stuck at 20-30MB/sec for ALL backups.

A quick post on the Veeam forums and a week's worth of troubleshooting and testing later, we're finally back to the backup speeds we previously saw. One further change yielded even better overall performance.

Here's what we know:


  1. Backups that use Service Console agents are throttled by the Service Console disk reads. Any time you go through the Service Console, you're going to top out at 20-30MB/sec.

  2. VCB backups using the SAN are extremely fast if you've got the backup storage speed to handle it.

  3. Use Direct to Target mode in Veeam Backup to work around a Samba/CIFS issue that Veeam is still trying to fix.



We use a Linux / Samba server as our backup target. It's a whitebox 16-drive RAID50 array connected to a 3ware 9650SE controller. CentOS 5.3 is installed on the box and we've done no tuning to the box (no TCP settings, no Samba settings, etc).

The VCB Backup mode in Veeam can be configured to back up to a Samba share just by entering the typical \\server\share. When we backed up using that method, we found that after the data portion of the backup had completed there was a long hang as logs and config files were transferred. The larger the guest, the longer the pause.

That pause DOESN'T seem to happen if you use Direct to Target mode.

In this mode, you add the Samba server itself as a Linux host and point to a file system path on that server (for example, /mnt/backups/veeam) instead of a share. Once we did that, the hang was completely gone.

We run Veeam Backup and VCB on the EVA Management server (a required server if you're using EVA SANs, and otherwise largely doing nothing useful) since it already had Fibre Channel HBAs installed in it.

After all of this was done, we're now seeing full backups being completed at around 85MB/sec (gigabit ethernet is the bottleneck at this point), and incrementals are finishing up at around 120-200MB/sec. The speeds vary because of the various amounts of unused disk space in the guests.
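A rough way to see why the incremental numbers can exceed what gigabit Ethernet carries: if the reported rate is the guest's disk size divided by elapsed time (an assumption about how the figure is calculated, not something Veeam has confirmed here), then any blocks that never cross the wire inflate the apparent speed. The sketch below is a simplified model with a hypothetical 10 GB guest and made-up function names; it ignores the time spent reading and skipping empty blocks on the SAN side.

```python
def effective_rate_mb_s(vm_size_mb, transferred_mb, wire_rate_mb_s):
    """Apparent backup rate when only part of the disk crosses the wire."""
    elapsed_s = transferred_mb / wire_rate_mb_s  # time spent moving real data
    return vm_size_mb / elapsed_s                # rate measured against full disk size


# Hypothetical 10 GB guest on a link that sustains 85 MB/s.
vm_size_mb = 10 * 1024
for sent_fraction in (1.0, 0.7, 0.5):
    rate = effective_rate_mb_s(vm_size_mb, vm_size_mb * sent_fraction, 85.0)
    print(f"{sent_fraction:.0%} of disk transferred: {rate:.0f} MB/s apparent")
```

Under this model, a guest where half the disk is skipped shows up at roughly twice the wire rate, which is in line with the spread we see between guests.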

Veeam does a terrific job with both their product and their support. The product manager was one of the first people to respond in the thread and got us doing useful tests that narrowed down where the problem was. It seems we're all waiting on VMware to either fix it or announce "that's the way it is", but it's not really something we're too concerned about any longer.

9 comments:

Anton Gostev said...

Steve, great post! Best thing about using Linux box as backup target, is that in this case Veeam Backup automatically deploys non-persistent agent on the target Linux box which allows for network traffic compression. Also, when using "Best" compression, processing gets offloaded to this remote agent too, instead of loading Veeam Backup server! Distributed computing at its best ;)

ElGreco said...

hi!
If I understood well, you installed Veeam Backup and VCB on the evamanager server, so you could use fiber to back up and then transfer over the LAN to the Linux box?

Steve Philp said...

ElGreco - Exactly right.

The only "special" thing we did was to use Veeam's direct connection to the Linux server instead of a Samba share on that Linux server.

When we used the Samba share, we encountered a lengthy stall at the end of the backup as it read through files. That stall doesn't happen when using the direct connection.

ElGreco said...

Thanks for your reply!

i have 2 more Q

1 Do you have to present the SAN disks on the evamanager server as local disks (without automount of course)?

2. I'm using an openSUSE box with 5 1gb sata disks in LVM.
I read that you have 16 disks on CentOS. Can you suggest what to change on my Linux box for the disks to be a bit faster, e.g. what controller I should buy, etc.?

Steve Philp said...

ElGreco - Yes, you'll need to present the VMFS LUN to the EVA management server. It's a quick two-step process in CommandView EVA (create the host, present the LUN to the host).

To prevent the LUN from showing twice in Windows, you'll want to download and install the MPIO DSM and control applet from HP's site. Prior to installing those pieces, we saw two copies of the LUN on the Windows host, both in black. Once they were installed, the LUN reverted to a single copy (in Disk Management) and was then colored blue (healthy).


We used a 3ware 9650SE-16 controller on our machine because of the number of disks. If you decide to use that (or similar cards) DEFINITELY buy the battery for the cache. It enables a couple speed options that result in about a 2x increase in write speed.

We did the hardware controller simply because of the number of drives we put in the machine.

Our backups are NETWORK limited, and I would think that once you get past 2 or 3 disks, you'll see the same. A full backup with Veeam runs at about 85MB/s -- about 85% utilization of the network adapter on either the EVA Management server or the Linux box.

NIC Bonding doesn't help in this case because you're only talking between the two hosts, so there's no load balancing that can happen.

With disks hitting 50-60MB/s each, two or three would handle it. Anything more than that and you'll be putting them in for RAID levels or capacity needs.

I will say that we saw a dramatic decrease in the capacity requirements for our backups. Our SAN holds approx 1TB of ESX4 guests that are backed up nightly with Veeam. We're currently set up to retain the last 60 days' worth of backups. In total, those backups take up 3.6TB of space! The de-dupe and compression help a bunch.
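For a rough sense of scale, the sketch below compares that 3.6TB against a naive baseline of 60 separate uncompressed full copies. The baseline is purely an assumption for illustration (it is not a configuration described here), and the function name is made up.

```python
def naive_full_copies_tb(guest_data_tb, retention_days):
    """Space needed if every retained day were an uncompressed full copy."""
    return guest_data_tb * retention_days


guest_data_tb = 1.0   # approximate size of the guests on the SAN
retention_days = 60   # days of backups retained
actual_tb = 3.6       # space the retained backups actually use

naive_tb = naive_full_copies_tb(guest_data_tb, retention_days)
reduction = naive_tb / actual_tb
print(f"Naive: {naive_tb:.0f} TB, actual: {actual_tb} TB, about {reduction:.1f}x smaller")
```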

ElGreco said...

>>60 days worth of backups. In total, those backups take up 3.6TB ?
Only 3.6TB ? u do full backup or incremental?

Since I installed a new vSphere 4, my backups and restores are really slow; even cloning a VM between hosts is slower.

I will try to do backups using VCB and vranger vcb plugin.

a good link how to install vcb with some backup products
http://viops.vmware.com/home/docs/DOC-1133

ElGreco said...

Ok
I installed veeam backup and fastSCP ver 3.1.1 trial

on the server i installed multipath and presented a storage i have on my EVA4x

Veeam is doing a backup of a windows xp and its really slow....
0 of 1 VMs processed (0 failed, 0 warnings)

Total size of VMs to backup: 10,00 GB
Processed size: 4,70 GB
Avg. performance rate: 17 MB/s

Start time: 7/9/2009 4:27:38 μμ
Time remaining: 00:05:32
+++++++++++++++++++++++++++++++++++====
8 of 8 files processed

Total VM size: 10,00 GB
Processed size: 4,80 GB
Avg. performance rate: 17 MB/s
Backup mode: VCB SAN

Start time: 7/9/2009 4:27:44 μμ
Time remaining: 00:05:33

Completing current object backup process

================
what can i do to improve from 17MB/s to your stats?

Steve Philp said...

ElGreco -

Which EVA model do you have? What's the Fibre Channel bandwidth (1Gb/2Gb/4Gb/8Gb)?

A couple things that helped us:

* Defragment the guest disks. We used JkDefrag (now called MyDefrag) for this.
* Run sdelete on the guest disks. This will zero out the unused blocks so that they're not backed up by Veeam Backup.

Both of those things gave us a speed increase on the backups.

Next, try copying a file using SMB from the EVA management server to the backup target server (the Linux/Samba box). What kind of speed do you get?
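If you'd rather have a number than a feel for it, a small script can time the copy. This is a minimal sketch, not a proper benchmark: the function names are made up, the demo copies between two local temp directories, and you would point `dst_dir` at the mounted share (a mapped drive or UNC path) to test the real link. Use a file large enough that caching doesn't flatter the result.

```python
import os
import shutil
import tempfile
import time


def make_test_file(directory, size_mb):
    """Write size_mb of random data so compression along the path can't inflate the rate."""
    path = os.path.join(directory, "copytest.bin")
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(os.urandom(1024 * 1024))
    return path


def measure_copy_rate(src_path, dst_dir):
    """Copy src_path into dst_dir and return the observed rate in MB/s."""
    size_mb = os.path.getsize(src_path) / (1024 * 1024)
    dst_path = os.path.join(dst_dir, os.path.basename(src_path))
    start = time.perf_counter()
    shutil.copyfile(src_path, dst_path)
    elapsed_s = time.perf_counter() - start
    return size_mb / elapsed_s


# Demo between two local temp directories; replace dst_dir with the share path.
with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as dst_dir:
    src = make_test_file(src_dir, 16)
    print(f"{measure_copy_rate(src, dst_dir):.1f} MB/s")
```

If the plain copy is also slow, the problem is in the network or the target box rather than in Veeam.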

It looks like I forgot to answer your previous question about the backup sizes. Using Veeam Backup, only the initial backup is a "traditional" full backup.

My understanding of the backups that occur after that is that they update the original backup file (so you have just one BIG file), and write differences to the nightly file. So, it's a sort of incremental backup.

The incrementals will go MUCH faster than the initial backup -- we see speeds 2-3x faster on the nightlies than on the initial backup.

Hope this helps!

ElGreco said...

I have eva 4x and on my evamanager i have 2Gb FC

I will do some more tests to see the performance. I'm still disappointed in how VMware changed the backup policy...

i will let you know what my tests show asap