Commit 3b3bb1db authored by Fulvio Galeazzi
2023-04-03: FG; Ceph added instructions to upgrade to Pacific, updated k8s dashboard address.

@@ -3,7 +3,7 @@ Dashboard Access
 You can access a `Kubernetes` dashboard for controlling your cluster through a GUI
 at theURL::
-    https://k8s.cloud.garr.it
+    https://container-platform-k8s.cloud.garr.it
 To log in to the dashboard you need to authenticate. Follow this procedure:

=======================================
Ceph upgrade from Nautilus to Pacific
=======================================
It is possible to upgrade from Nautilus directly to Pacific, skipping
the intermediate Octopus release.
We followed the `official documentation <https://docs.ceph.com/en/pacific/releases/pacific/#upgrading-from-octopus-or-nautilus>`_.
In the following we will proceed with:

- preparing the cluster
- upgrading the MONs
- upgrading the MGRs (in our setup, colocated with the MONs)
- upgrading the OSDs
- final steps

It is assumed the cluster is managed via ``ceph-ansible``, although some
commands and the overall procedure are valid in general.
Prepare cluster
===============
Set the `noout` flag during the upgrade::

  ceph osd set noout

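Before restarting anything, it is worth confirming the flag is actually active. The sketch below checks a sample ``ceph osd dump`` header; the here-doc (and its output shape) is an assumption standing in for the real command on a live cluster:

```shell
# Sample header of `ceph osd dump`; on the cluster, pipe the real
# command instead of the here-doc (the output shape is an assumption).
osd_dump=$(cat <<'EOF'
epoch 4711
flags noout,sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
EOF
)
# Succeeds (and prints a confirmation) only when `noout` is among the flags:
echo "$osd_dump" | grep -q '^flags .*noout' && echo "noout is set"
```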
Upgrade MON
===========
Perform the following actions on each MON node, one by one, and
check that after the upgrade the node manages to join the cluster::

  sed -i -e 's/nautilus/pacific/' /etc/yum.repos.d/ceph_stable.repo
  yum -y update
  systemctl restart ceph-mon@<monID>

Verify the MON has joined the cluster::

  ceph -m <monIP> -s
  ceph -m <monIP> mon versions

Verify all monitors report the `pacific` string in the mon map::

  ceph mon dump | grep min_mon_release

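The `min_mon_release` check can be scripted. The sketch below parses a sample ``ceph mon dump`` fragment; the here-doc is a stand-in for the live command, and the exact output format is an assumption:

```shell
# Sample fragment of `ceph mon dump`; replace the here-doc with the
# real command on the cluster (output shape is an assumption).
mon_dump=$(cat <<'EOF'
epoch 12
min_mon_release 16 (pacific)
EOF
)
# Last field of the min_mon_release line is the release name:
release=$(echo "$mon_dump" | awk '/^min_mon_release/ {print $NF}')
echo "release: $release"
```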
Upgrade MGR
===========
Proceed as for the MONs, upgrading packages and restarting the Ceph daemons.
In our setup, MON and MGR are co-located, so by the time you get here the MGR
nodes have already been upgraded, as can be checked with::

  ceph versions

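A quick way to spot daemons still running an old release is to filter the ``ceph versions`` output. The JSON below is a trimmed sample (an assumption, not real cluster output); on a live cluster, pipe the real command instead of the here-doc:

```shell
# Trimmed sample of `ceph versions` JSON (assumed shape).
versions=$(cat <<'EOF'
{
    "mon": { "ceph version 16.2.11 pacific (stable)": 3 },
    "mgr": { "ceph version 16.2.11 pacific (stable)": 3 },
    "osd": { "ceph version 14.2.22 nautilus (stable)": 116 }
}
EOF
)
# Lines that mention a ceph version but not pacific still need upgrading:
echo "$versions" | grep 'ceph version' | grep -v pacific
```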
Upgrade OSD
===========
Proceed as above for the MONs, one node at a time, by first updating the
package manager configuration file and then upgrading the packages::

  sed -i -e 's/nautilus/pacific/' /etc/yum.repos.d/ceph_stable.repo
  yum -y update

Finally, restart all OSD daemons on the node with::

  systemctl restart ceph-osd.target

Check with::

  ceph versions

Note that after the upgrade to Pacific, OSD daemons need to perform a one-time
internal conversion (see the official docs for more info), which takes some
time: this results in some PGs being `active+clean+laggy`.
The consequence is that at some point you may see "slow ops": if this is the
case, pause any OSD restart and wait for the cluster to calm down before
proceeding.
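One way to honour that advice is to poll the cluster between OSD restarts and only continue once nothing is laggy. In this sketch, `cluster_status` is a mock standing in for ``ceph -s``; the polling logic, not the mock, is the point:

```shell
# Mock for `ceph -s`: reports a laggy PG on the first two polls, then a
# clean cluster. On a real node, call `ceph -s` instead.
echo 0 > /tmp/poll_count
cluster_status() {
    n=$(cat /tmp/poll_count); n=$((n + 1)); echo "$n" > /tmp/poll_count
    if [ "$n" -lt 3 ]; then
        echo "pg 2.1f active+clean+laggy"
    else
        echo "all PGs active+clean"
    fi
}
# Pause between OSD restarts until no PG reports as laggy:
while cluster_status | grep -q 'laggy'; do
    sleep 1    # on a real cluster, wait much longer between polls
done
echo "cluster is quiet, safe to restart the next OSD"
```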
Final steps
===========
OSD omap update
---------------
After the upgrade, ``ceph -s`` will show ``HEALTH_WARN`` with a message similar to::

  116 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats

To fix that you will need to log into each OSD server and execute something
similar to the following (more info
`here <https://docs.ceph.com/en/latest/rados/operations/health-checks/#bluestore-no-per-pool-omap>`_)::

  df | grep ceph | awk '{print $NF}' | awk -F- '{print "systemctl stop ceph-osd@"$NF" ; sleep 10 ; ceph osd set noup ; sleep 3 ; time ceph-bluestore-tool repair --path "$0" ; sleep 5 ; ceph osd unset noup ; sleep 3 ; systemctl start ceph-osd@"$NF" ; sleep 300"}' > /tmp/do
  . /tmp/do

Please note that the above command may cause `slow ops`, both during the repair
and during the OSD restarts, so ensure you allow enough time between OSDs and
carefully pick the time when you perform the upgrade.
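The one-liner is dense; the sketch below generates the same per-OSD sequence into ``/tmp/do`` in a more readable form (sleeps omitted for brevity). The sample ``df`` output is an assumption; on a live OSD node use ``df | grep ceph`` instead of the here-doc:

```shell
# Sample `df` lines for two OSD mounts (assumed); on a live OSD node
# replace the here-doc with `df | grep ceph`.
df_sample=$(cat <<'EOF'
/dev/sdb1 3905109820 1022 3905108798 1% /var/lib/ceph/osd/ceph-12
/dev/sdc1 3905109820 1022 3905108798 1% /var/lib/ceph/osd/ceph-37
EOF
)
# Generate one stop/repair/start sequence per OSD into /tmp/do, to be
# reviewed and then sourced, exactly like the one-liner above
# (the `sleep` pauses between steps are omitted here for brevity):
echo "$df_sample" | awk '{print $NF}' | while read -r path; do
    id=${path##*-}    # trailing number of the mount point is the OSD id
    printf 'systemctl stop ceph-osd@%s\n' "$id"
    printf 'ceph osd set noup\n'
    printf 'time ceph-bluestore-tool repair --path %s\n' "$path"
    printf 'ceph osd unset noup\n'
    printf 'systemctl start ceph-osd@%s\n' "$id"
done > /tmp/do
cat /tmp/do
```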
OSD enable RocksDB sharding
---------------------------
This needs to be done once, if OSD disks were upgraded from previous versions; also read
`here <https://docs.ceph.com/en/pacific/rados/configuration/bluestore-config-ref/#bluestore-rocksdb-sharding>`_.
As it requires the OSD to be stopped, it may be useful to combine this step with the one above.
The operation needs to be performed only on OSD disks which have not yet been
sharded; check the output of the following command::

  systemctl stop ceph-osd@##
  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-## --command show-sharding
  systemctl start ceph-osd@##

If the OSD needs to be sharded, execute::

  systemctl stop ceph-osd@##
  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-## --sharding="m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P" reshard
  systemctl start ceph-osd@##

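Resharding only the disks that need it can be scripted. In the sketch below, `show_sharding` is a mock standing in for ``ceph-bluestore-tool ... --command show-sharding``, and it is an assumption that an already-sharded OSD reports exactly the target spec:

```shell
# Target sharding spec, as used in the reshard command above.
target='m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P'
# Mock for `ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-$1
#           --command show-sharding`:
# assume OSD 12 is already sharded and OSD 37 is not.
show_sharding() {
    if [ "$1" = "12" ]; then echo "$target"; else echo ""; fi
}
# List only the OSDs whose current sharding differs from the target:
needs=$(for id in 12 37; do
    if [ "$(show_sharding "$id")" != "$target" ]; then
        echo "OSD ${id} needs resharding"
    fi
done)
echo "$needs"
```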
MON insecure global id reclaim
------------------------------
This warning comes from Ceph addressing a security vulnerability.
The warning can be silenced: please check, for example,
`this page <https://www.suse.com/support/kb/doc/?id=000019960>`_.
If you want to address the issue, note that there are two sides to it: clients
using insecure global_id reclaim, and MONs allowing insecure global_id reclaim.
The output of `ceph health detail` will clearly show whether you are affected
by either one.
Clients using insecure global_id reclaim need to be updated before proceeding.
They are clearly shown in the output of `ceph health detail`.
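The two sides can be told apart by filtering ``ceph health detail``. The sample below is an assumed output shape (including the hypothetical client address) showing a cluster affected by both:

```shell
# Sample `ceph health detail` fragment (assumed shape); replace the
# here-doc with the real command's output.
health=$(cat <<'EOF'
[WRN] AUTH_INSECURE_GLOBAL_ID_RECLAIM: clients are using insecure global_id reclaim
    client.admin at 10.0.0.5:0/123 is using insecure global_id reclaim
[WRN] AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure global_id reclaim
    mon.cephmon1 has auth_allow_insecure_global_id_reclaim set to true
EOF
)
# The trailing colon distinguishes the client-side warning from the
# mon-side (_ALLOWED) one:
echo "$health" | grep -q 'AUTH_INSECURE_GLOBAL_ID_RECLAIM:' && echo "clients still need updating"
echo "$health" | grep -q 'AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED:' && echo "mons still allow insecure reclaim"
```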
Once all clients are updated and `ceph health detail` only complains about the MONs, like this::

  [WRN] AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure global_id reclaim
      mon.cephmon1 has auth_allow_insecure_global_id_reclaim set to true

you can disable the insecure mechanism with::

  ceph config set mon auth_allow_insecure_global_id_reclaim false
Tidying it up
-------------
Please take a minute to check the official docs: we assume that other suggested
configurations have already been applied to your cluster (e.g., msgr v2 or
straw2 buckets), so we won't discuss them here.
Finally, disallow pre-Pacific OSDs and unset the `noout` flag::

  ceph osd require-osd-release pacific
  ceph osd unset noout