lvs reports different utilization sizes on peers with thin-lvm

`lvs` works at the block layer, so with thinly provisioned volumes its utilization reporting is, at best, an educated guess. On top of that, DRBD and its trim/discard handling can lead to discrepancies, and confusion, between what each node reports.
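To see what LVM itself reports on each node, compare the thin volume's allocation percentages side by side (the volume group name `vg0` below is just a placeholder):

```
# Run on each node and compare; data_percent is LVM's block-layer view
# of allocated chunks, not the filesystem's actual usage.
lvs -a -o lv_name,pool_lv,data_percent,metadata_percent vg0
```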

A thin-lvm volume starts out completely "unmapped". Reading from an "unmapped" block returns "virtual zeros". Before DRBD 8.4.7, those "virtual zeros" would have been replicated as genuine zeros on the peer's disk. That meant writing to every block on the peer, leaving the SyncTarget node reporting 100% utilization, which was clearly undesirable.
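A quick way to see this starting state is to create a fresh thin volume and read it back. The names below (`vg0/pool0`, `demo`) are only placeholders for an existing thin pool:

```
# A brand-new thin LV is fully unmapped and shows 0% data usage
lvcreate --virtualsize 10G --thin vg0/pool0 --name demo
lvs -o lv_name,data_percent vg0/demo

# Reading unmapped blocks returns "virtual zeros"
dd if=/dev/vg0/demo bs=1M count=1 status=none | hexdump -C | head -n 2
```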

With 8.4.7 and newer, DRBD is trim/discard aware. Instead of writing genuine zeros to the peer disk, DRBD performs zero-data detection: when it detects an all-zero data payload, it simply passes a trim/discard request to the peer. So if a range of zeros has actually been written and allocated on the primary, the peer only receives a discard for that range, and its thin volume stays unallocated there. This is likely the main reason for the discrepancy between the nodes.
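The related tunables live in the resource's disk section. The sketch below shows what such a configuration might look like; treat the option names and values as an illustration and verify them against the drbd.conf(5) man page for your DRBD version:

```
resource r0 {
  disk {
    # Allow DRBD to use discards on backends such as thin LVM, where a
    # discard of an aligned, whole chunk reliably reads back as zeros.
    discard-zeroes-if-aligned yes;

    # During resync, send discards instead of writing out zeros for
    # all-zero ranges; typically sized to match the thin pool chunk size.
    rs-discard-granularity 65536;
  }
  # ... connection/volume definitions omitted ...
}
```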

Additionally, thin-lvm allocates space in units of its "chunk size", say 64k for example. If we write 4k to some particular offset, only 4k was actually written, but thin-lvm regards that whole 64k chunk as allocated. This also helps explain why thin-lvm can report utilization much higher than what was actually written.
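This is easy to demonstrate with a pool created with a 64k chunk size (all names below are illustrative):

```
# Thin pool with a 64K chunk size and a 1G thin volume inside it
lvcreate --size 10G --chunksize 64K --thin vg0/pool64k
lvcreate --virtualsize 1G --thin vg0/pool64k --name chunkdemo

# Write a single 4k block somewhere in the middle of the volume...
dd if=/dev/urandom of=/dev/vg0/chunkdemo bs=4k count=1 seek=100 oflag=direct

# ...and LVM accounts for the entire 64K chunk containing that block
lvs -o lv_name,data_percent vg0/chunkdemo
```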

To bring the reported utilization back into closer agreement, you can run `fstrim` against the mounted filesystem on the Primary while all nodes are connected and healthy. However, there is no guarantee this will improve things; it depends on the filesystem and the other parts of the storage subsystem.
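A minimal example, assuming the DRBD device is mounted at `/mnt/data` on the Primary (the mount point and volume group name are placeholders):

```
# On the Primary, with all nodes connected and UpToDate:
fstrim -v /mnt/data

# Then compare LVM's view again on both nodes
lvs -o lv_name,data_percent vg0
```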

Created by DJV 2021-10-20