From 7cbed75c46d7a0a92f70049353564958871bebed Mon Sep 17 00:00:00 2001
From: Alberto Colla <alberto.colla@garr.it>
Date: Wed, 6 Oct 2021 11:11:17 +0000
Subject: [PATCH] Update restore-ceph-from-mon-disaster.rst

---
 .../ceph/restore-ceph-from-mon-disaster.rst | 202 ++++++++++++++++--
 1 file changed, 183 insertions(+), 19 deletions(-)

diff --git a/web/support/kb/ceph/restore-ceph-from-mon-disaster.rst b/web/support/kb/ceph/restore-ceph-from-mon-disaster.rst
index 4f8645c6..402cbf82 100644
--- a/web/support/kb/ceph/restore-ceph-from-mon-disaster.rst
+++ b/web/support/kb/ceph/restore-ceph-from-mon-disaster.rst
@@ -35,23 +35,187 @@ On the juju client machine, rebuild old mon map using existing OSDs with the fol
 .. doc:: ../../images/restore-osd.sh>

-* create keyring with following from new MONs: (this is my example [3])
-- admin key, same on all mons, get one copy
-- mon key, same on all mons, get one copy
-- mgr key, different on all mons, get one copy from each
-- put all the above into one file called keyring
-- add proper permissions to mgr keys
-ceph-authtool keyring -n mgr.juju-74bc65-default-40 --cap mon 'allow profile mgr' --cap osd 'allow *' --cap mds 'allow *'
-ceph-authtool keyring -n mgr.juju-74bc65-default-41 --cap mon 'allow profile mgr' --cap osd 'allow *' --cap mds 'allow *'
-ceph-authtool keyring -n mgr.juju-74bc65-default-42 --cap mon 'allow profile mgr' --cap osd 'allow *' --cap mds 'allow *'
-* copy local ./mon-store and keyring from juju client machine to a MON unit, e.g. under /tmp
-* on that MON unit rebuild mon store, note that mon ids are new MONs' hostnames. also the order is important, the order needs to match "mon host" in /etc/ceph/ceph.conf on MON unit, otherwise mons cannot start and say it cannot bind IP etc
-ceph-monstore-tool /tmp/mon-store/ rebuild -- --keyring /tmp/keyring --mon-ids juju-74bc65-default-42 juju-74bc65-default-40 juju-74bc65-default-41
-* on all MON units rename existing /var/lib/ceph/mon/ceph-*/store.db to /var/lib/ceph/mon/ceph-*/store.db.bak
-* copy store-db in /tmp/mon-store to /var/lib/ceph/mon/ceph-* on all MON units, and chown ceph:ceph -R /var/lib/ceph/mon/ceph-*/store.db
-* start all mon and mgr services
-* start all osds
-juju run-action --wait ceph-osd/17 start osds=all
-juju run-action --wait ceph-osd/16 start osds=all
-juju run-action --wait ceph-osd/15 start osds=all

Keyring creation
----------------

Create a keyring file with the following data from the new mons:

- admin key: copy the content of ``/etc/ceph/ceph.client.admin.keyring`` from any mon;
- mon key: copy the content of ``/var/lib/ceph/mon/ceph-*/keyring`` from any mon;
- mgr key: copy the content of ``/var/lib/ceph/mgr/ceph-*/keyring`` from *each* mon.

The keyring file will look like the following::

    [mon.]
    key = AQAr9TphSxvCFxAACOS8KkIROPsvgVCfcFjh1Q==
    caps mon = "allow *"
    [client.admin]
    key = AQAmBTthVM5wEhAALJS9IEVTuRKiHYRUztxgng==
    caps mds = "allow *"
    caps mgr = "allow *"
    caps mon = "allow *"
    caps osd = "allow *"
    [mgr.juju-74bc65-21-lxd-49]
    key = AQAvBTth4HObLhAAlM140CBeuYrLRhxnuSwdKQ==
    caps mds = "allow *"
    caps mon = "allow profile mgr"
    caps osd = "allow *"
    [mgr.juju-74bc65-21-lxd-50]
    key = AQAvBTthqwQ5LRAAISUJt9j4Qb3MZ5jn2B1SwQ==
    caps mds = "allow *"
    caps mon = "allow profile mgr"
    caps osd = "allow *"
    [mgr.juju-74bc65-21-lxd-51]
    key = AQAtBTthU36NJBAAjMhYPoPcdDh5L6Coj2grqw==
    caps mds = "allow *"
    caps mon = "allow profile mgr"
    caps osd = "allow *"
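
One way to assemble this file is to collect the keys from the juju client. The snippet below is only a sketch: it assumes the three mon units are ``ceph-mon/49``, ``ceph-mon/50`` and ``ceph-mon/51`` (as in the ``juju status`` output shown further down) and that passwordless sudo is available on the units::

    # admin and mon keys are identical on all mons: take them from a single unit
    juju ssh ceph-mon/49 'sudo cat /etc/ceph/ceph.client.admin.keyring'  > keyring
    juju ssh ceph-mon/49 'sudo cat /var/lib/ceph/mon/ceph-*/keyring'    >> keyring

    # mgr keys differ on every mon: collect one from each unit
    for unit in ceph-mon/49 ceph-mon/50 ceph-mon/51; do
        juju ssh "$unit" 'sudo cat /var/lib/ceph/mgr/ceph-*/keyring' >> keyring
    done

Check the resulting file against the example above and fix the caps by hand if any section is missing them.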

**N.B.** If needed, add the proper capabilities to the mgr keys with ``ceph-authtool``::

    ceph-authtool keyring -n mgr.juju-74bc65-21-lxd-49 --cap mon 'allow profile mgr' --cap osd 'allow *' --cap mds 'allow *'
    ceph-authtool keyring -n mgr.juju-74bc65-21-lxd-50 --cap mon 'allow profile mgr' --cap osd 'allow *' --cap mds 'allow *'
    ceph-authtool keyring -n mgr.juju-74bc65-21-lxd-51 --cap mon 'allow profile mgr' --cap osd 'allow *' --cap mds 'allow *'


Rebuild the mon store
---------------------

Copy the local ``./mon-store`` directory (``mon-store.new`` in this example) and the ``keyring`` file from the juju client machine to one mon unit, e.g. under ``/tmp``::

    juju scp -- -r mon-store.new keyring ceph-mon/51:/tmp

Log on to that mon unit and rebuild the mon store with the following command (the mon ids are the new mons' hostnames)::

    ceph-monstore-tool /tmp/mon-store/ rebuild -- --keyring /tmp/keyring --mon-ids juju-74bc65-21-lxd-49 juju-74bc65-21-lxd-50 juju-74bc65-21-lxd-51

**Important:** the order of ``--mon-ids`` matters: it must match the "mon host" order in ``/etc/ceph/ceph.conf`` on the mon units, otherwise the mons won't start!

To find the right order, look up the IP addresses in ``/etc/ceph/ceph.conf``, e.g.::

    mon host = 10.7.5.131 10.7.5.132 10.7.5.133

and compare them with the addresses shown by ``juju status ceph-mon``::

    # juju status ceph-mon
    ...
    Unit          Workload  Agent  Machine    Public address  Ports  Message
    ceph-mon/49   active    idle   21/lxd/29  10.7.5.132             Unit is ready and clustered
    ceph-mon/50*  active    idle   22/lxd/21  10.7.5.133             Unit is ready and clustered
    ceph-mon/51   active    idle   24/lxd/21  10.7.5.131             Unit is ready and clustered

In this example 10.7.5.131 is the address of machine 24/lxd/21, 10.7.5.132 of 21/lxd/29 and 10.7.5.133 of 22/lxd/21, so ``--mon-ids`` must be::

    --mon-ids juju-c1a2b7-24-lxd-21 juju-c1a2b7-21-lxd-29 juju-c1a2b7-22-lxd-21


Copy the mon store to all mons
------------------------------

The ``ceph-monstore-tool`` command writes the rebuilt data into ``/tmp/mon-store``.

Now copy the rebuilt ``/tmp/mon-store`` to all the other mon units (e.g. again under ``/tmp``).

Then, on every mon unit (``ceph-xxx`` is the local mon directory, named after the mon's hostname)::

    mv /var/lib/ceph/mon/ceph-xxx/store.db /var/lib/ceph/mon/ceph-xxx/store.db.bak
    cp -r /tmp/mon-store/store.db /var/lib/ceph/mon/ceph-xxx/store.db
    chown ceph:ceph -R /var/lib/ceph/mon/ceph-xxx/store.db

Start services
--------------

Start the mon and mgr services on each mon unit, using its own hostname in the service name::

    systemctl start ceph-mon@juju-c1a2b7-24-lxd-21.service
    systemctl start ceph-mgr@juju-c1a2b7-24-lxd-21.service

Start all OSDs
--------------

Start all the osds from the juju client::

    juju run-action --wait ceph-osd/17 start osds=all
    juju run-action --wait ceph-osd/16 start osds=all
    juju run-action --wait ceph-osd/15 start osds=all


Troubleshooting
---------------

Check the status of the cluster::

    # ceph -s
      cluster:
        id:     5d26e488-cf89-11eb-8283-00163e78c363
        health: HEALTH_WARN
                mons are allowing insecure global_id reclaim
                67 pgs not deep-scrubbed in time
                67 pgs not scrubbed in time

      services:
        mon: 3 daemons, quorum juju-c1a2b7-24-lxd-21,juju-c1a2b7-21-lxd-29,juju-c1a2b7-22-lxd-21 (age 3m)
        mgr: juju-c1a2b7-21-lxd-29(active, since 4m), standbys: juju-c1a2b7-22-lxd-21, juju-c1a2b7-24-lxd-21
        osd: 6 osds: 6 up (since 78s), 6 in (since 3M)

      data:
        pools:   19 pools, 203 pgs
        objects: 5.92k objects, 22 GiB
        usage:   69 GiB used, 23 TiB / 23 TiB avail
        pgs:     203 active+clean

In our case the ceph-osd units were blocked with the message "non-pristine devices detected". This happened because we had tried to add new devices to the ``osd-devices`` option of ceph-osd while the cluster was down.
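
If a device really does have to be cleaned up and re-added, the ceph-osd charm ships actions for that. The following is only a sketch: the unit and the device are taken from the examples on this page, and ``zap-disk`` irreversibly destroys whatever is on the disk, so double-check before running it::

    # see which disks the charm can use on one OSD host
    juju run-action --wait ceph-osd/15 list-disks

    # wipe a device so that the charm considers it pristine again (DESTRUCTIVE!)
    juju run-action --wait ceph-osd/15 zap-disk devices=/dev/nvme2n1p2 i-really-mean-it=true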

Here are some commands to check the status of the devices on the osd hosts::

    # ceph-volume lvm list

    ====== osd.0 =======

      [block]       /dev/ceph-9eff9b57-a7d7-497d-b0bc-a502e5658e32/osd-block-9eff9b57-a7d7-497d-b0bc-a502e5658e32

          block device              /dev/ceph-9eff9b57-a7d7-497d-b0bc-a502e5658e32/osd-block-9eff9b57-a7d7-497d-b0bc-a502e5658e32
          block uuid                oCfbdk-nKLh-yH2G-R82v-WhbI-cg3Y-Wo3L5A
          cephx lockbox secret
          cluster fsid              5d26e488-cf89-11eb-8283-00163e78c363
          cluster name              ceph
          crush device class        None
          encrypted                 0
          osd fsid                  9eff9b57-a7d7-497d-b0bc-a502e5658e32
          osd id                    0
          osdspec affinity
          type                      block
          vdo                       0
          devices                   /dev/nvme1n1p1

    ...

Another command is ``lsblk``::

    # lsblk
    NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
    loop0 7:0 0 55.5M 1 loop /snap/core18/2074
    loop1 7:1 0 72.5M 1 loop /snap/lxd/21497
    loop2 7:2 0 61.8M 1 loop /snap/core20/1081
    loop3 7:3 0 73.2M 1 loop /snap/lxd/21624
    loop4 7:4 0 69M 1 loop
    loop5 7:5 0 55.4M 1 loop /snap/core18/2128
    loop6 7:6 0 32.3M 1 loop /snap/snapd/12883
    loop8 7:8 0 32.3M 1 loop /snap/snapd/13170
    nvme2n1 259:0 0 11.7T 0 disk
    ├─nvme2n1p1 259:2 0 3.9T 0 part
    │ └─ceph--5fd8f96e--2ccb--460f--87b8--359ff81cff8a-osd--block--5fd8f96e--2ccb--460f--87b8--359ff81cff8a 253:0 0 3.9T 0 lvm
    ├─nvme2n1p2 259:3 0 3.9T 0 part
    └─nvme2n1p3 259:4 0 3.8T 0 part
    nvme3n1 259:1 0 11.7T 0 disk
    ├─nvme3n1p1 259:22 0 3.9T 0 part
    │ └─ceph--eedc5ae8--eab5--4640--9ed7--46726a37b320-osd--block--eedc5ae8--eab5--4640--9ed7--46726a37b320 253:1 0 3.9T 0 lvm
    │   └─zaWz94-2Ggu-HA5i-zepj-OkfQ-50ZG-XaznBn 253:4 0 3.9T 0 crypt
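
Once the devices look right and the OSDs are back up and in, the usual Ceph status commands give a final confirmation; nothing below is specific to this procedure::

    # detailed view of any remaining health warnings
    ceph health detail

    # verify that all OSDs are up and in, and sit where expected in the CRUSH tree
    ceph osd tree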