diff --git a/web/support/kb/ceph/ceph-backfill-impact.rst b/web/support/kb/ceph/ceph-backfill-impact.rst
index d0e1b9eb87e659b37eb24716594366eadd60a6ef..f060b022a4aa83d7897e60c034de901ec7b23d5e 100644
--- a/web/support/kb/ceph/ceph-backfill-impact.rst
+++ b/web/support/kb/ceph/ceph-backfill-impact.rst
@@ -1,5 +1,5 @@
-Limit impact of backfill and recovery
-=====================================
+Throttle impact of backfill and recovery
+========================================
 
 Whenever a problem arises, like a broken disk or a pool in degraded state,
 Ceph tries to recover as quickly as possible. This, however, may have an
@@ -15,6 +15,25 @@ or server node, you can execute::
 
    $ ceph tell osd.* injectargs '--osd-client-op-priority 63'
    $ ceph tell osd.* injectargs '--osd-recovery-max-active 1'
 
-You can verify the value of current settings with the command::
+Note that settings applied this way are lost upon an OSD server reboot
+or a restart of the ceph-osd service for a specific disk. If you need to
+make the settings permanent, you have to act on the Ceph configuration file.
 
-   $ ceph --show-config
+You can verify the value of the current settings (the default ones, active
+at boot time) with the command::
+
+   $ ceph config dump
+
+A quick way to throttle the speed of backfill and recovery is to combine some
+of the settings shown previously and also act on the ``recovery sleep``
+parameters, which introduce a small lag between recovery operations (the
+higher the value, the nicer you will be to your cluster users): the default
+value for ``hdd`` is ``0.1s``, while for ``ssd`` the default value is ``0.0s``.
+For example::
+
+   # full steam, for quick hdd recovery
+   $ ceph --cluster ceph tell osd.* injectargs --osd-recovery-sleep-hdd=0.0 --osd-max-backfills=8 --osd-recovery-max-active=8 --osd-recovery-max-single-start=4
+   # kind-and-gentle recovery
+   $ ceph --cluster ceph tell osd.* injectargs --osd-recovery-sleep-hdd=0.1 --osd-max-backfills=2 --osd-recovery-max-active=1 --osd-recovery-max-single-start=1
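+
+As a minimal sketch of how to make such settings permanent (assuming the
+classic ``/etc/ceph/ceph.conf`` layout; the option values below are only an
+example, adjust them to your cluster), you could add them to the ``[osd]``
+section of the configuration file::
+
+   [osd]
+   osd_max_backfills = 2
+   osd_recovery_max_active = 1
+   osd_recovery_sleep_hdd = 0.1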
diff --git a/web/support/kb/ceph/ceph-extend-rocksdb-partition.rst b/web/support/kb/ceph/ceph-extend-rocksdb-partition.rst
new file mode 100644
index 0000000000000000000000000000000000000000..8773bd4e6e6df776646f9a157ab774dbbebf702d
--- /dev/null
+++ b/web/support/kb/ceph/ceph-extend-rocksdb-partition.rst
@@ -0,0 +1,32 @@
+Extend RocksDB partition on OSD
+===============================
+
+Ok, so you have a nice Ceph cluster, based on Nautilus or later, and you took
+advantage of SSD/NVMe disks to create a logical volume hosting RocksDB/WAL
+for your spinning disks. You also followed the well-known suggestion to size
+such an LV at ~40GB, as larger sizes would be useless unless you scale up
+to ~300GB.
+
+But things evolve, and Nautilus introduced a better way of handling extra
+space possibly allotted to RocksDB/WAL. Indeed, ``ceph health detail`` shows
+a lot of ``spillover`` warnings, so you decide to use all the available space
+on the fast disks and go from a 40GB RocksDB LV to, say, an 80GB one.
+
+The trickiest part is identifying which LV is associated with which OSD ID:
+the output of ``ceph-volume lvm list`` will provide you with this information.
+In my case I opted to upgrade all existing OSDs, so I did not have to perform
+the actions serially for each OSD and used a bit of bash/awk/grep to make
+loops (a sketch of such a loop follows the steps below). However, here are
+the steps for each OSD:
+
+* Extend the LVM volume::
+
+     $ lvextend -L80G <VGname>/<LVname>
+
+* IMPORTANT: I needed to reboot my server to make the OS aware of the change.
+  Maybe the same result could be obtained some other way, but I did not mind
+  rebooting.
+
+* Stop the OSD, play some magic, resume the OSD::
+
+     $ systemctl stop ceph-osd@<OSDnumber>
+     # check that the following command really claims to be extending the partition
+     $ ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-<OSDnumber>
+     $ systemctl restart ceph-osd@<OSDnumber>
+
+* Force an OSD compact, to get rid of the ``spillover`` message::
+
+     $ ceph daemon osd.<OSDnumber> compact
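+
+A minimal sketch of such a loop (assuming the LVs have already been extended
+and the server rebooted, with the OSD IDs simply listed by hand; replace
+``0 1 2 3`` with the IDs actually hosted on the node)::
+
+   # run on the OSD node, as root
+   for osd in 0 1 2 3; do
+       systemctl stop ceph-osd@${osd}
+       ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-${osd}
+       systemctl restart ceph-osd@${osd}
+       sleep 30   # give the OSD time to come back before talking to its admin socket
+       ceph daemon osd.${osd} compact
+   done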