Hi!
I administer a vSphere 5.5 environment using IBM Tivoli Storage Manager for Virtual Environments (TSM-VE) as our backup solution. TSM-VE uses Changed Block Tracking (CBT) to make incremental-forever backups of our VMs, meaning that it only backs up the entire VM once; all subsequent backups are incrementals. We back up our VMs once a day.
I'm currently investigating why certain Windows VMs in our vSphere environment generate huge incremental backups. During my investigation I hit on something I don't understand.
As a quick-and-dirty test, I did a file scan on a VM, to list all files that had been modified on the VM during the last 24 hours. I then added up the total file size of all those modified files. Then, I compared this combined file size with the size of the incremental TSM-VE backup for that day. I repeated this test on a number of VMs, smaller as well as larger ones.
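A scan like the one described above could be sketched as follows. This is only a minimal Python illustration, not necessarily the tool that was actually used: the function name `modified_bytes`, the 24-hour window, and the reliance on the last-modified timestamp are all assumptions made for the example.

```python
import os
import sys
import time


def modified_bytes(root, window_seconds=24 * 3600, now=None):
    """Return (file_count, total_bytes) for files under root whose
    last-modified time falls within the last window_seconds."""
    now = time.time() if now is None else now
    cutoff = now - window_seconds
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                # Locked, vanished, or inaccessible files are skipped.
                continue
            if st.st_mtime >= cutoff:
                count += 1
                total += st.st_size
    return count, total


if __name__ == "__main__":
    # Hypothetical usage: pass the drive or folder to scan as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    files, size = modified_bytes(root)
    print(f"{files} files modified in the last 24 hours, {size} bytes in total")
```

Note that a scan of this kind only sees files with a changed timestamp. It counts the full size of each such file and sees nothing of writes that do not show up as a modified file.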
I had expected the combined file size to be much larger than the size of the incremental backup. After all, a modified file has not necessarily changed in its entirety, so only some of the CBT blocks it occupies should need to be backed up. I expected a CBT backup to be more efficient, size-wise, than an incremental file-based backup.
Instead, I found that on all VMs, the incremental TSM-VE backup was consistently 1.5 to 2 times the size of the combined modified files, exactly the opposite of the result I expected.
I've tried to think of a few things that could cause this discrepancy.
1) In-guest disk defrag. This would change the blocks without changing the files, messing up the way CBT works. However, there are no scheduled or unscheduled defrags on our VMs.
2) The files on the VM are smaller than the CBT blocks. That could cause a small file to mark a larger CBT block as changed. However, as I understand it, CBT blocks are usually quite small (and are not the same as VMFS blocks).
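To make the arithmetic behind point 2 concrete, here is a small Python sketch of how writes that are smaller than the tracking block get rounded up to whole blocks. The 64 KiB block size and the write offsets are purely illustrative assumptions, not the actual CBT granularity of any particular disk, and `changed_block_bytes` is a hypothetical helper.

```python
def changed_block_bytes(writes, block_size):
    """Given (offset, length) byte ranges that were written, return the
    number of bytes covered by every tracking block they touch."""
    touched = set()
    for offset, length in writes:
        if length <= 0:
            continue
        first = offset // block_size
        last = (offset + length - 1) // block_size
        touched.update(range(first, last + 1))
    return len(touched) * block_size


# Assumed values for illustration only.
BLOCK_SIZE = 64 * 1024
writes = [(0, 4096), (1_000_000, 4096), (5_000_000, 4096)]

written = sum(length for _offset, length in writes)
marked = changed_block_bytes(writes, BLOCK_SIZE)
print(written, marked)  # 12288 196608
```

In this made-up case, three scattered 4 KiB writes mark three separate 64 KiB blocks as changed, so the block-level delta is 16 times the data actually written. How strongly this applies in practice depends on the real block size and on how scattered the writes are.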
What am I missing here? Is there some other process that changes the VMDK storage blocks of my Windows VMs without changing the actual files? Is my quick-and-dirty file scan too simplistic? I really hope that someone can explain this to me, thanks!