lvs reports different utilization sizes on peers with thin-lvm

`lvs` works at the block layer, so with thinly provisioned volumes its utilization reporting is, at best, an educated guess. DRBD and its trim/discard handling can lead to discrepancies, and to confusion, between the nodes.
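For example, comparing the reported utilization on both nodes with something like the following (the VG/LV and host names are illustrative) will often show different Data% values for the very same DRBD-backed volume:

```
# Thin-volume allocation as seen by LVM; data_percent tracks pool chunks,
# not filesystem-level usage. Names (vg0, thinvol, peer-node) are examples.
lvs -o lv_name,lv_size,data_percent,pool_lv vg0/thinvol

# The same query on the peer may report a noticeably different Data%:
ssh peer-node lvs -o lv_name,lv_size,data_percent,pool_lv vg0/thinvol
```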

A thin-lvm volume starts out completely "unmapped". Reading from an "unmapped" block returns "virtual zeros". Before DRBD® 8.4.7, a sync would have replicated those "virtual zeros" as genuine zeros written to the peer's disk. That would have caused the SyncTarget node to report 100% utilization, since every block on the peer's disk would have been written, which was clearly undesirable.
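A quick way to see the "unmapped" behavior on a scratch pool (the vg0/pool0 names below are only examples) is that a freshly created thin volume reads back as nothing but zeros while allocating nothing:

```
lvcreate -T vg0/pool0 -V 10G -n thinvol          # new, fully "unmapped" thin volume
lvs -o lv_name,data_percent vg0/thinvol          # Data% is 0.00
dd if=/dev/vg0/thinvol bs=1M count=16 | hexdump -C | head -n 3   # all zero bytes
lvs -o lv_name,data_percent vg0/thinvol          # still 0.00 -- reads do not allocate
```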

With 8.4.7 and newer, DRBD is trim/discard aware. Instead of writing genuine zeros to the peer's disk, DRBD performs zero-data detection: when it detects an all-zero data payload, it passes a trim/discard request to the peer instead. So even if a range of zeros was actually written and allocated on the Primary, the peer only receives a discard for that range. This is likely the main reason for the utilization discrepancy between the nodes.
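The disk options controlling this behavior look roughly like the sketch below. The option names exist in DRBD 8.4.7+ and 9.x, but the values and the resource layout here are only examples; check the drbd.conf(5) man page for your version and its defaults before copying anything.

```
resource r0 {
    disk {
        # Allow DRBD to use discards (and advertise zeroing semantics)
        # even when a thin backend reports discard_zeroes_data=0.
        # Default is "yes" since 8.4.7.
        discard-zeroes-if-aligned yes;

        # During resync, send discards of this size for all-zero ranges
        # instead of writing zeros to the SyncTarget (0 disables this).
        rs-discard-granularity 65536;
    }
    # ... volumes and "on <hostname>" sections omitted ...
}
```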

Additionally, thin-lvm allocates space in chunks. Say the pool has a "chunk size" of 64k and we write 4k at some particular offset. Only 4k was actually written, but thin-lvm regards the whole 64k chunk as allocated. This can also explain why thin-lvm's reported usage may be much higher than the amount of data actually written.
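You can observe this directly on a scratch thin volume; again, the names and the 64k chunk size are only examples:

```
lvs -o lv_name,chunk_size vg0/pool0              # show the pool's chunk size (e.g. 64.00k)
dd if=/dev/urandom of=/dev/vg0/thinvol bs=4k count=1 seek=1000 oflag=direct
lvs -o lv_name,data_percent vg0/thinvol          # allocation grows by a whole 64k chunk, not 4k
```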

To bring things back into rough agreement, you can run `fstrim` against the mounted filesystem on the Primary while all nodes are connected and healthy. However, there is no guarantee this will improve things; it depends on the filesystem and other parts of the storage subsystem.
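A minimal sketch of that procedure, assuming the resource is named r0 and mounted at /mnt/r0 (both just examples):

```
drbdadm status r0        # confirm Connected/UpToDate (use "cat /proc/drbd" on 8.4)
fstrim -v /mnt/r0        # ask the filesystem to discard its unused blocks
lvs -o lv_name,data_percent vg0/thinvol   # re-check utilization here and on the peer
```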

Created by DJV 2021-10-20