Ceph: restore a cluster after losing all ceph-mon units
=======================================================

Here we describe how to restore a Ceph cluster after a disaster in which all ceph-mon units are lost. Of course, we assume that the data on the OSD devices is preserved!

The procedure refers to a Ceph cluster deployed with Juju.

Suppose that you have lost (or removed by mistake) all ceph-mon units. We start by recreating them, so that we have three new ceph-mon units.


Stop all OSDs
-------------

N.B. change the unit IDs according to your cluster::

    juju run-action --wait ceph-osd/15 stop osds=all
    juju run-action --wait ceph-osd/16 stop osds=all
    juju run-action --wait ceph-osd/17 stop osds=all


Stop all MONs and MGRs
----------------------

SSH to each MON unit and use systemctl to stop the mon and mgr services, e.g.::

    systemctl stop ceph-mon@juju-c1a2b7-24-lxd-21.service
    systemctl stop ceph-mgr@juju-c1a2b7-24-lxd-21.service


Collect the cluster maps from the OSDs
--------------------------------------

On the Juju client machine, rebuild the old mon store from the existing OSDs with the following script, which saves the result to ``./mon-store``::


Create the keyring
------------------

Create a keyring containing the following keys from the new MONs (this is my example [3]):

- the admin key: it is the same on all MONs, so take one copy;
- the mon key: it is the same on all MONs, so take one copy;
- the mgr keys: each MON has a different one, so take a copy from each.

Put all of the above into one file called ``keyring``, then add the proper capabilities to the mgr keys::

    ceph-authtool keyring -n mgr.juju-74bc65-default-40 --cap mon 'allow profile mgr' --cap osd 'allow *' --cap mds 'allow *'
    ceph-authtool keyring -n mgr.juju-74bc65-default-41 --cap mon 'allow profile mgr' --cap osd 'allow *' --cap mds 'allow *'
    ceph-authtool keyring -n mgr.juju-74bc65-default-42 --cap mon 'allow profile mgr' --cap osd 'allow *' --cap mds 'allow *'


Rebuild the mon store on a MON unit
-----------------------------------

Copy the local ``./mon-store`` directory and the ``keyring`` file from the Juju client machine to one MON unit, e.g. under ``/tmp``.

On that MON unit, rebuild the mon store. Note that the mon IDs are the hostnames of the new MONs. The order also matters: it must match the ``mon host`` entry in ``/etc/ceph/ceph.conf`` on the MON unit, otherwise the mons fail to start and complain that they cannot bind to their IP address::

    ceph-monstore-tool /tmp/mon-store/ rebuild -- --keyring /tmp/keyring --mon-ids juju-74bc65-default-42 juju-74bc65-default-40 juju-74bc65-default-41


Replace the store on all MONs
-----------------------------

On all MON units, rename the existing ``/var/lib/ceph/mon/ceph-*/store.db`` to ``/var/lib/ceph/mon/ceph-*/store.db.bak``.

Copy the rebuilt ``store.db`` from ``/tmp/mon-store`` into ``/var/lib/ceph/mon/ceph-*`` on all MON units (it exists only on the unit where it was rebuilt, so copy it to the other MON units first). Then fix its ownership with ``chown -R ceph:ceph /var/lib/ceph/mon/ceph-*/store.db``.


Start all MONs and MGRs
-----------------------

On each MON unit, start the mon and mgr services again, e.g.::

    systemctl start ceph-mon@juju-c1a2b7-24-lxd-21.service
    systemctl start ceph-mgr@juju-c1a2b7-24-lxd-21.service


Start all OSDs
--------------

Finally, start all OSDs::

    juju run-action --wait ceph-osd/17 start osds=all
    juju run-action --wait ceph-osd/16 start osds=all
    juju run-action --wait ceph-osd/15 start osds=all
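
One step runs on the Juju client machine and rebuilds the mon store from the existing OSDs. It refers to a script that is not reproduced in this article. Below is a minimal sketch of such a script. It is adapted from the "Recovery using OSDs" procedure in the upstream Ceph monitor troubleshooting documentation, not taken from the original procedure, and it rests on several assumptions: the Juju 2.x ``juju ssh``/``juju scp`` commands, ``sudo`` access on the OSD units, and the default OSD data paths ``/var/lib/ceph/osd/ceph-*``. The function name ``collect_mon_store``, the ``DRY_RUN`` switch and the scratch path ``/tmp/mon-store`` on the OSD units are hypothetical. The maps are merged one unit at a time. The store collected so far is copied to the unit, and every OSD on that unit adds its maps with ``ceph-objectstore-tool --op update-mon-db``. The result is then copied back to ``./mon-store``::

    #!/usr/bin/env bash
    # Hypothetical helper: collect the cluster maps kept on the stopped OSDs
    # into ./mon-store, adapted from the upstream Ceph "Recovery using OSDs"
    # procedure. Assumes Juju 2.x (juju ssh / juju scp), sudo on the OSD units
    # and the default OSD data paths /var/lib/ceph/osd/ceph-*.
    set -euo pipefail

    MS=./mon-store            # local store that accumulates the maps
    REMOTE_MS=/tmp/mon-store  # scratch copy on each OSD unit (assumed path)

    # Run a command, or only print it when DRY_RUN=1.
    run() {
      if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
      else
        "$@"
      fi
    }

    # Usage: collect_mon_store ceph-osd/15 ceph-osd/16 ...
    collect_mon_store() {
      local unit
      # Executed as root on each unit: merge the maps of every local OSD into
      # the scratch store, then hand it back to the SSH user for copying.
      local remote_cmd="for osd in /var/lib/ceph/osd/ceph-*; do ceph-objectstore-tool --data-path \"\$osd\" --no-mon-config --op update-mon-db --mon-store-path $REMOTE_MS; done; chown -R \"\$SUDO_USER\" $REMOTE_MS"

      run mkdir -p "$MS"
      for unit in "$@"; do
        # Push the store collected so far to this unit.
        run juju ssh "$unit" "sudo rm -rf $REMOTE_MS"
        run juju scp -- -r "$MS" "$unit:$REMOTE_MS"
        # Let the OSDs on this unit add their maps.
        run juju ssh "$unit" "sudo bash -c '$remote_cmd'"
        # Pull the updated store back, replacing the local copy.
        run rm -rf "$MS"
        run juju scp -- -r "$unit:$REMOTE_MS" "$MS"
      done
    }

    if [ "$#" -gt 0 ]; then
      collect_mon_store "$@"
    fi

Run it on the Juju client machine after all OSDs have been stopped, e.g. ``bash collect-mon-store.sh ceph-osd/15 ceph-osd/16 ceph-osd/17`` (the file name is hypothetical). Setting ``DRY_RUN=1`` prints the commands instead of executing them, which is a cheap way to review them before touching the OSDs.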