Sometimes you might want to examine what happens to a system when an I/O error occurs on the backing storage. In these cases, it is useful to know how to create a test disk that you can mount and use to simulate failures and errors. Doing so helps you prepare for when these events happen on a disk in your production deployments. This article describes one method that you can use to do this.
- Add your testing disk, either physically or virtually depending on your use case, and take note of the device name assigned to it. For ease of reference, define that name as a shell variable. The following example uses DEV.
[root@linbit-0 /]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0 19.5G  0 disk
└─sda1   8:1    0 19.5G  0 part /
sdb      8:16   0    8G  0 disk
sdc      8:32   0    4G  0 disk
[root@linbit-0 /]# DEV=/dev/sdc
- Using a blockdev command to determine the device size, create a linear target device on top of the backing disk with the device mapper, passing the size into its table. The following example uses the name faildev as the simulated device name.
[root@linbit-0 /]# dmsetup create faildev --table "0 $( blockdev --getsz $DEV ) linear $DEV 0"
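The size that blockdev --getsz reports is in 512-byte sectors, which is also the unit used in device mapper tables. As a quick sanity check on the numbers, the following minimal sketch converts a byte count to sectors and prints the single-segment linear table used above. The function names are hypothetical helpers for illustration only; they are not part of dmsetup or blockdev.

```shell
# Hypothetical helpers for illustration (not part of dmsetup or blockdev).
# Device mapper tables count in 512-byte sectors, the same unit that
# "blockdev --getsz" reports.
SECTOR_BYTES=512

# Convert a size in bytes to a whole number of 512-byte sectors.
bytes_to_sectors() {
    echo $(( $1 / SECTOR_BYTES ))
}

# Print a one-line table that maps the whole device linearly.
# $1 = backing device, $2 = device size in sectors
linear_table() {
    echo "0 $2 linear $1 0"
}

# Example usage (blockdev needs root privileges and a real device):
#   bytes_to_sectors $(( 4 * 1024 * 1024 * 1024 ))   # sectors in 4 GiB
#   linear_table "$DEV" "$(blockdev --getsz "$DEV")"
```

For a 4 GiB disk, the conversion gives 8388608 sectors, which is the device size assumed in the examples in this article.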
- After creating the device mapper device, load another table that contains some number of error target sectors, which will trigger an I/O error when they are accessed. The table in the following example uses 100,000 error sectors; specify an amount that is appropriate for your device size and use case. This example table also places these sectors in the middle of the device mapper device, but you can place error sectors wherever is most applicable to the failure scenario that you are trying to simulate.
For a 4 GiB (8388608 sector) disk named /dev/sdc, with 100000 error sectors in the middle of the device, the table for the device mapper would be structured as follows:
0 4194303 linear /dev/sdc 0
4194303 100000 error
4294303 4094305 linear /dev/sdc 4294303
In the example above, each linear segment lists the starting sector first, then the number of sectors in that segment, the target type, the backing device, and finally the offset into the backing device (the same as the starting sector in this case). The error segment is similar, except that it takes no device or offset.
You can write the table to a text file and pass that file to the dmsetup load command, or pipe the table through STDIN to dmsetup load, as in the example below. Each table segment must be on its own line. Note that this example starts the error segment one third of the way into the device rather than at the midpoint:
SIZE=$( blockdev --getsz $DEV )
ERR_START=$(( SIZE / 3 ))
ERR_END=$(( ERR_START + 100000 ))
printf '%s\n' \
  "0 $ERR_START linear $DEV 0" \
  "$ERR_START 100000 error" \
  "$ERR_END $(( SIZE - ERR_END )) linear $DEV $ERR_END" \
  | dmsetup load faildev
📝 NOTE: Although the new table has been loaded, it is not yet active. You can show the table by using the --inactive flag with the dmsetup table command. Output from the dmsetup command identifies the backing device by its major and minor numbers, 8 and 32.
[root@linbit-0 /]# dmsetup table faildev --inactive
0 2796202 linear 8:32 0
2796202 100000 error
2896202 5492406 linear 8:32 2896202
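The three-segment layout described in this step can be generated for any device size and error position. The following is a minimal sketch, assuming a single error region with linear storage on both sides. The function name error_table and its argument order are hypothetical, chosen only for this illustration.

```shell
# Hypothetical helper: print a three-segment table with one error
# region surrounded by linear storage.
# $1 = backing device      $2 = device size in sectors
# $3 = first error sector  $4 = number of error sectors
# The body runs in a subshell, so its variables stay local and "exit"
# only leaves the function.
error_table() (
    dev=$1 size=$2 err_start=$3 err_len=$4
    err_end=$(( err_start + err_len ))

    # Require at least one linear sector on each side of the error
    # region so that every segment has a nonzero length.
    if [ "$err_start" -le 0 ] || [ "$err_len" -le 0 ] || [ "$err_end" -ge "$size" ]; then
        echo "error_table: error region does not fit inside the device" >&2
        exit 1
    fi

    printf '%s\n' \
        "0 $err_start linear $dev 0" \
        "$err_start $err_len error" \
        "$err_end $(( size - err_end )) linear $dev $err_end"
)

# Example usage (requires root privileges and an existing faildev device):
#   error_table "$DEV" "$(blockdev --getsz "$DEV")" 4194303 100000 | dmsetup load faildev
```

Called with /dev/sdc, 8388608, 4194303, and 100000, the sketch prints the same table as the 4 GiB example earlier in this step.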
- For some test cases, it might be necessary to prevent LVM from accessing the backing storage underneath the device mapper device. This ensures that any LVM-related entity uses only the device you created, with the error targets included, during your testing.
You can do this by modifying the filters in lvm.conf. First, create a copy of the original configuration file for easy rollback, then create a file with the configuration changes, and then replace lvm.conf with the updated configuration.
[root@linbit-0 /]# cp /etc/lvm/lvm.conf /etc/lvm/lvm.conf.copy
[root@linbit-0 /]# lvmconfig --type current --mergedconfig \
    --config 'devices { filter=["a|/dev/mapper/faildev|", "r|.*|"] global_filter=["a|/dev/mapper/faildev|", "r|.*|"] }' \
    > /etc/lvm/lvm.conf.tmp
[root@linbit-0 /]# mv /etc/lvm/lvm.conf.tmp /etc/lvm/lvm.conf
mv: overwrite '/etc/lvm/lvm.conf'? y
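If you expect to repeat this step, it can help to guard the copy-and-replace sequence so that a second run does not overwrite the original backup with an already filtered configuration. The following sketch shows one possible approach. The function name replace_with_backup is hypothetical, and the .copy suffix simply follows the naming used in the commands above.

```shell
# Hypothetical helper: install a new version of a file while keeping
# the first backup intact.
# $1 = file to replace, $2 = file with the new contents
# The backup is written to "$1.copy". The body runs in a subshell, so
# "exit" only leaves the function.
replace_with_backup() (
    target=$1 new=$2 backup=$1.copy

    # Refuse to install a missing or empty file, for example when the
    # command that generated it failed and left only an empty file.
    if [ ! -s "$new" ]; then
        echo "replace_with_backup: $new is missing or empty" >&2
        exit 1
    fi

    # Keep an existing backup so that running this twice does not
    # replace the original with an already modified file.
    if [ ! -e "$backup" ]; then
        cp -p "$target" "$backup" || exit 1
    fi

    mv -f "$new" "$target"
)

# Example usage (requires root privileges):
#   replace_with_backup /etc/lvm/lvm.conf /etc/lvm/lvm.conf.tmp
```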
- Now you are ready to use the device mapper device. This example simply creates a file system on the device and mounts it locally. However, you can reference the device in any configuration just as you would the backing disk. For example, you might use the device in the disk section of a DRBD resource configuration file, or add it to an LVM volume group which you then add to a LINSTOR storage pool. In any case, complete your intended configuration by using the device mapper device so that you can begin to use the disk.
[root@linbit-0 /]# mkfs.xfs /dev/mapper/faildev
meta-data=/dev/mapper/faildev    isize=512    agcount=4, agsize=262144 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=0 inobtcount=0
data     =                       bsize=4096   blocks=1048576, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@linbit-0 /]# mkdir /mnt/io_test
[root@linbit-0 /]# mount /dev/mapper/faildev /mnt/io_test
- Now that you have mounted the device, run dmsetup resume to make the previously loaded error target table active.
[root@linbit-0 /]# dmsetup resume faildev
📝 NOTE: Activating this table allows disk I/O errors to begin to occur, so enter this command only when your test environment has been configured to your satisfaction and you are ready to encounter such errors.
- So that an I/O error happens in a timely fashion, you can use a fio command to perform random writes to the device over a period of 60 seconds. However, any preferred means of accessing the storage should work here, provided that the I/O reaches the disk (as opposed to being served from a system buffer or cache).
[root@linbit-0 /]# fio --filename=/mnt/io_test/write_test.out --rw=randwrite \
> --direct=1 --bs=8k --ioengine=libaio --runtime=60 --numjobs=1 --time_based \
> --name=ran_write --iodepth=8 --size=$( blockdev --getsize64 $DEV )
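Because the error sectors sit at known offsets, another way to provoke the failure is to read one of them directly instead of waiting for random I/O to land on it. The following sketch parses a device mapper table from standard input and prints where the first error segment starts. The function name first_error_sector is hypothetical, and the dd invocation in the comment is an assumption about how you might use the result; it presumes a 512-byte logical sector size so that a direct read of a single sector is allowed.

```shell
# Hypothetical helper: read a device mapper table on standard input and
# print the starting sector of the first "error" segment. Returns a
# nonzero status if the table has no error segment.
first_error_sector() {
    awk '$3 == "error" { print $1; found = 1; exit } END { exit !found }'
}

# Example usage (requires root privileges and an active error table):
#   sector=$(dmsetup table faildev | first_error_sector)
#   dd if=/dev/mapper/faildev of=/dev/null bs=512 count=1 skip="$sector" iflag=direct
```

With the error table active, the read in the usage comment should fail with an input/output error, because it bypasses the page cache and targets a sector that is mapped to the error target.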
- If the fio command encounters one of the error target sectors on the disk during a write operation, it reports the error in its output:
fio: pid=14279, err=5/file:io_u.c:1803, func=io_u error, error=Input/output error
At this point, you can examine your configured environment to learn the impact of the I/O error.
- If you want to toggle the disk back to a working state, simply reload and resume the fully linear target table.
[root@linbit-0 /]# dmsetup reload faildev --table "0 $( blockdev --getsz $DEV ) linear $DEV 0"
[root@linbit-0 /]# dmsetup resume faildev
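To confirm that the device really is back to a working state, you can check that the active table no longer maps any sectors to the error target. The following is a small sketch of such a check. The function name error_sector_count is hypothetical.

```shell
# Hypothetical helper: read a device mapper table on standard input and
# print how many sectors are mapped to the "error" target.
error_sector_count() {
    awk '$3 == "error" { total += $2 } END { print total + 0 }'
}

# Example usage (requires root privileges):
#   dmsetup table faildev | error_sector_count
```

After the fully linear table has been resumed, the command in the usage comment should print 0. While the example error table from this article is active, it should print 100000.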
- Finally, to remove the device mapper device, use the dmsetup remove command. Be sure to roll back to your original lvm.conf file to remove the filter that you created earlier.
[root@linbit-0 /]# mv /etc/lvm/lvm.conf.copy /etc/lvm/lvm.conf
mv: overwrite '/etc/lvm/lvm.conf'? y
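The complete teardown can be sketched as a short sequence: unmount the file system, remove the device mapper device, and restore the saved LVM configuration. The sketch below assumes the names used in this article (faildev, /mnt/io_test, and /etc/lvm/lvm.conf.copy) and that you applied the LVM filter step. The function names and the DRY_RUN variable are hypothetical additions that let you preview the commands before running them.

```shell
# Hypothetical teardown sketch. Set DRY_RUN=1 to print the commands
# instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = device mapper device name (default: faildev)
# $2 = mount point (default: /mnt/io_test)
# The body runs in a subshell, so "exit" only leaves the function.
teardown_faildev() (
    name=${1:-faildev} mnt=${2:-/mnt/io_test}

    # The device cannot be removed while it is still mounted.
    run umount "$mnt" || exit 1
    run dmsetup remove "$name" || exit 1
    # Restore the LVM configuration saved before the filter was added.
    run mv -f /etc/lvm/lvm.conf.copy /etc/lvm/lvm.conf
)

# Example usage:
#   DRY_RUN=1; teardown_faildev      # preview only
#   unset DRY_RUN; teardown_faildev  # requires root privileges
```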
Created 2024/05/20 - JAI
Reviewed 2024/05/20 - DJV, MAT