From 066bbcc3471095ed1af25fd26ca43655f692d94c Mon Sep 17 00:00:00 2001
From: Matteo <matteo.difazio@garr.it>
Date: Wed, 13 Jul 2022 11:00:50 +0200
Subject: [PATCH] 20220713 Matteo: edit gpu doc

---
 web/containers/gpus.rst | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/web/containers/gpus.rst b/web/containers/gpus.rst
index e323439a..0129193d 100644
--- a/web/containers/gpus.rst
+++ b/web/containers/gpus.rst
@@ -41,10 +41,7 @@ Therefore, we recommend that you make a backup of your most important data befor
 Getting a GPU
 -------------
 
-In order to abtain access to one or more GPUs, please send us a request via email to `cloud-support@garr.it` specifying the following:
-
-1. The research activity/purpose that requires the use of one or more GPUs
-2. The number of GPUs requested
+In order to obtain access to one or more GPUs, please send us a request via the `web portal <https://support.garr.it/jira/plugins/servlet/loginfreeRedirMain?portalid=49>`_ (Common requests -> Reserve GPU).
 
 Each user request will be queued and satisfied in cronological order as long as the GPUs requested
 are free and can be allocated.
@@ -52,7 +49,8 @@ are free and can be allocated.
 Users will then receive an email that confirms that they can access the GPU(s) along with
 informations regarding the time period in which the GPU(s) will be exclusively reserved to the user.
 
-Once the confirmation email has been received, it is sufficient to require the resource `nvidia.com/gpu` in the `Pod` deployment.
+Once the confirmation email has been received, it is sufficient to require the resource `nvidia.com/gpu` in the `Pod` deployment and add the tolerations section.
+The key in the tolerations section can be either 'vgpu' or 'gpu'.
 
 For example, to deploy the `digits` container, put this into file `digits.yaml`::
   apiVersion: v1
@@ -60,12 +58,17 @@ For example, to deploy the `digits` container, put this into file `digits.yaml`:
   metadata:
     name: gpu-pod
   spec:
+    tolerations:
+    - key: "vgpu"
+      operator: "Equal"
+      value: "true"
+      effect: "NoSchedule"
     containers:
     - name: digits-container
       image: nvcr.io/nvidia/digits:19.12-tensorflow-py3
       resources:
         limits:
-          nvidia.com/gpu: 1 # requesting 1 GPUs
+          nvidia.com/gpu: 1 # requesting 1 GPU
 
 Now you can deploy it with::
 
--
GitLab
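
For reference, this is a minimal sketch of the complete `digits.yaml` as it reads with the patch applied, assuming the `vgpu` toleration key shown above (the `gpu` key is used in exactly the same way)::

  apiVersion: v1
  kind: Pod
  metadata:
    name: gpu-pod
  spec:
    tolerations:
    - key: "vgpu"            # or "gpu", matching the taint applied to the GPU node
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
    containers:
    - name: digits-container
      image: nvcr.io/nvidia/digits:19.12-tensorflow-py3
      resources:
        limits:
          nvidia.com/gpu: 1  # request one GPU for this Pod

Assuming `kubectl` is configured for the target cluster, the Pod can then be created with, for example, `kubectl apply -f digits.yaml`.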