Commit 1378554d authored by Alberto Colla
Update restore-ceph-from-mon-disaster.rst

Another command is `lsblk`::
    disk
    ├─nvme2n1p1 259:2 0 3.9T 0 part
    └─ceph--5fd8f96e--2ccb--460f--87b8--359ff81cff8a-osd--block--5fd8f96e--2ccb--460f--87b8--359ff81cff8a 253:0 0 3.9T 0 lvm
In our case we need to zap (delete everything, including the partition table) device nvme3n1p1, the device that was added but not initialized, and then re-add it to Ceph.

To recover, first fix the keys on the OSD units.
On a mon, get the client.bootstrap-osd and client.osd-upgrade keys::

    ceph auth get client.bootstrap-osd
    ceph auth get client.osd-upgrade
If they are not present, create them with::

    ceph auth get-or-create client.bootstrap-osd mon "allow profile bootstrap-osd"
    ceph auth get-or-create client.osd-upgrade mon "allow command \"config-key\"; allow command \"osd tree\"; allow command \"config-key list\"; allow command \"config-key put\"; allow command \"config-key get\"; allow command \"config-key exists\"; allow command \"osd out\"; allow command \"osd in\"; allow command \"osd rm\"; allow command \"auth del\""
Replace the key value in the following files on EACH OSD unit::

    /var/lib/ceph/bootstrap-osd/ceph.keyring                <---- client.bootstrap-osd key
    /var/lib/ceph/osd/ceph.client.osd-upgrade.keyring       <---- client.osd-upgrade key
Those keys were created when the new mons were installed.
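For reference, a minimal sketch of what the bootstrap-osd keyring on an OSD unit should look like after the replacement (the key value below is a placeholder for the one printed by `ceph auth get`; the osd-upgrade keyring uses a `[client.osd-upgrade]` section in the same way)::

    [client.bootstrap-osd]
        key = <key from 'ceph auth get client.bootstrap-osd'>
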
Now, FOR EACH OSD:
- ssh to the OSD unit and zap nvme3n1p1::

    !!! Please note this will DESTROY all data on nvme3n1p1 and it CANNOT be recovered. !!!
    !!! Please double check before proceeding. !!!

    ceph-volume lvm zap /dev/nvme3n1p1 --destroy
- recreate the partition (a verification sketch follows this list)::

    parted -a optimal /dev/nvme3n1 mkpart primary 0% 4268G
- Now go back to the juju client machine and use juju to remove the device from juju's internal db::

    juju run-action --wait ceph-osd/X zap-disk devices=/dev/nvme3n1p1 i-really-mean-it=true
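Before moving on, it can be worth double-checking from the OSD unit that the zap removed the old ceph LVM metadata and that the new partition is in place; a minimal sketch using the device names from this example::

    # no ceph LVM volume should remain, and nvme3n1p1 should exist again
    lsblk /dev/nvme3n1
    parted /dev/nvme3n1 print
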
Then check juju status; it can be forced to update with::

    juju run --unit ceph-osd/X 'hooks/update-status'
At this stage the OSD status should be back to normal (green).

If not, run the following commands::

    juju run-action --wait ceph-osd/21 zap-disk devices=/dev/nvme3n1p1 i-really-mean-it=true
    juju run-action ceph-osd/21 add-disk osd-devices="/dev/nvme3n1p1"
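Once the disk has been re-added, the recovery can be cross-checked from a mon with the usual status commands, for example::

    ceph -s          # overall cluster health
    ceph osd tree    # the recovered OSD should be listed as up/in again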