This topic describes how to manage nodes on kURL clusters. It includes procedures for how to safely reset, reboot, and remove nodes when performing maintenance tasks.
Before you manage a node on a kURL cluster, you must install the Embedded kURL Cluster Operator (EKCO) add-on on the cluster. The EKCO add-on is a utility tool used to perform maintenance operations on a kURL cluster.
For information on how to install the EKCO add-on to a kURL cluster, see EKCO Add-on.
Resetting a node is the process of attempting to remove all Kubernetes packages and host files from the node.
Resetting a node can be useful if you are creating and testing a kURL specification in a non-production environment. Some larger changes to a kURL specification cannot be deployed for testing by rerunning the kURL installation script on an existing node. In this case, you can attempt to reset the node so that you can reinstall kURL to test the change to the kURL specification.
Warning: Do not attempt to reset a node on a cluster in a production environment. Attempting to reset a node can permanently damage the cluster, which makes any data from the cluster irretrievable. Reset a node on a cluster only if you are able to delete the host VM and provision a new VM if the reset script does not successfully complete.
To reset a node on a cluster managed by kURL:
Run the kURL reset script on a VM that you are able to delete if the script is unsuccessful. The kURL reset script first runs the EKCO shutdown script to cordon the node. Then, it attempts to remove all Kubernetes packages and host files from the node.
Online:
curl -sSL https://kurl.sh/latest/tasks.sh | sudo bash -s reset
Air Gapped:
cat ./tasks.sh | sudo bash -s reset
If the reset does not complete, delete the host VM and provision a new VM.
The reset script might not complete successfully if the removal of the Kubernetes packages and host files from the node also damages the cluster itself. When the cluster is damaged, the tools used by the reset script, such as the kubectl command-line tool, can no longer communicate with the cluster and the script cannot complete.
Rebooting a node is useful when you are performing maintenance on the operating system (OS) level of the node. For example, after you perform a kernel update, you can reboot the node to apply the change to the OS.
To reboot a node on a cluster managed by kURL:
Run the EKCO shutdown script on the node:
/opt/ekco/shutdown.sh
The shutdown script deletes any Pods on the node that mount volumes provisioned by Rook. It also cordons the node, so that the node is marked as unschedulable and kURL does not start any new containers on the node. For more information, see EKCO Add-on.
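After the shutdown script runs, it can help to confirm that the node is actually cordoned. The following is a minimal sketch that reads the output of kubectl get nodes and checks whether a node reports SchedulingDisabled in its STATUS column. The is_cordoned function name is a hypothetical helper, not part of kURL or EKCO.

```shell
# Succeeds when the node named in $1 appears in `kubectl get nodes` output
# (read from stdin) with SchedulingDisabled in its STATUS column.
# is_cordoned is a hypothetical helper name, not a kURL or EKCO command.
is_cordoned() {
  awk -v node="$1" '
    $1 == node && $2 ~ /SchedulingDisabled/ { found = 1 }
    END { exit(found ? 0 : 1) }
  '
}

# Usage on a cluster (NODE_NAME is a placeholder):
#   kubectl get nodes | is_cordoned NODE_NAME && echo "NODE_NAME is cordoned"
```

The function only inspects text, so it can be tried against saved kubectl output before it is used on a live cluster.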
As part of performing maintenance on a multi-node cluster managed by kURL, it is often required to remove a node from the cluster and replicate its data to a new node. For example, you might need to remove one or more nodes during hardware maintenance.
This section describes how to safely remove nodes from a kURL cluster that uses the Rook add-on for Rook Ceph storage. For more information about the Rook add-on, see Rook Add-on.
For information about how to remove a node from a cluster that does not use Rook Ceph, see kubectl drain in the Kubernetes documentation.
Review the following requirements and considerations before you remove one or more nodes from Rook Ceph and etcd clusters:
etcd cluster health: To remove a primary node from etcd clusters, you must meet the following requirements to maintain etcd quorum:
Complete the following prerequisites before you remove one or more nodes from a Rook Ceph cluster:
Upgrade Rook Ceph to v1.4 or later.
The two latest minor releases of Rook Ceph are actively maintained. It is recommended to upgrade to the latest stable release available. For more information, see Release Cycle in the Rook Ceph documentation.
Attempting to remove a node from a cluster that uses a Rook Ceph version earlier than v1.4 can cause Ceph to enter an unhealthy state. For example, see Rook Ceph v1.0.4 is Unhealthy with Mon Pods Not Rescheduled under Troubleshoot Node Removal below.
Set isBlockStorageEnabled to true. This is the default for Rook Ceph v1.4 and later.
Ensure that you can access the ceph CLI from a Pod that can communicate with the Ceph Storage Cluster. To access the ceph CLI, you can do one of the following:
Use the rook-ceph-tools Pod to access the ceph CLI.
Use the same version of the Rook toolbox as the version of Rook Ceph that is installed in the cluster.
By default, the rook-ceph-tools Pod is included on kURL clusters with Rook Ceph v1.4 and later.
For more information about rook-ceph-tools Pods, see Rook Toolbox in the Rook Ceph documentation.
Use kubectl exec to enter the rook-ceph-operator Pod, where the ceph CLI is available.
(Optional) Open an interactive shell in the rook-ceph-tools or rook-ceph-operator Pod to run multiple ceph CLI commands in a row. For example:
kubectl exec -it -n rook-ceph deployment/rook-ceph-tools -- bash
If you do not create an interactive shell, precede each ceph CLI command in the rook-ceph-tools or rook-ceph-operator Pod with kubectl exec. For example:
kubectl exec -it -n rook-ceph deployment/rook-ceph-tools -- ceph status
Verify that Ceph is in a healthy state by running one of the following ceph status commands in the rook-ceph-tools Pod in the rook-ceph namespace:
Rook Ceph v1.4.0 or later:
kubectl -n rook-ceph exec deployment/rook-ceph-tools -- ceph status
Rook Ceph v1.0.0 to 1.3.0:
kubectl -n rook-ceph exec deployment/rook-ceph-operator -- ceph status
Note: It is not recommended to use versions of Rook Ceph earlier than v1.4.0.
The output of the command shows health: HEALTH_OK if Ceph is in a healthy state.
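The health check can also be scripted. The following is a minimal sketch that extracts the health value from ceph status output, assuming the health: line format shown in this topic. The ceph_health and ceph_is_healthy names are hypothetical helpers.

```shell
# Prints the health value (for example HEALTH_OK) found in `ceph status`
# output read from stdin. Assumes a line of the form "health: HEALTH_OK".
# ceph_health is a hypothetical helper name.
ceph_health() {
  awk '$1 == "health:" { print $2; exit }'
}

# Succeeds only when the reported health value is exactly HEALTH_OK.
ceph_is_healthy() {
  [ "$(ceph_health)" = "HEALTH_OK" ]
}

# Usage on a cluster with Rook Ceph v1.4 or later:
#   kubectl -n rook-ceph exec deployment/rook-ceph-tools -- ceph status | ceph_is_healthy
```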
This procedure ensures that the data held in Rook Ceph is safely replicated to a new node before you remove a node. Rebalancing your data is critical for preventing data loss that can occur when removing a node if the data stored in Ceph has not been properly replicated.
To manually remove a node, you first use the Ceph CLI to reweight the Ceph OSD to 0 on the node that you want to remove and wait for Ceph to rebalance the data across OSDs.
Then, you can remove the OSD from the node, and finally remove the node.
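Waiting for the rebalance to finish can be automated with a polling loop. The following is a minimal sketch that repeatedly runs a status command until its output contains HEALTH_OK or a maximum number of attempts is reached. The wait_for_health_ok name, the interval, and the attempt count are assumptions for illustration.

```shell
# Runs the given command up to ATTEMPTS times, sleeping INTERVAL seconds
# between tries, and succeeds as soon as the output contains HEALTH_OK.
# wait_for_health_ok is a hypothetical helper name.
# Arguments: INTERVAL ATTEMPTS COMMAND [ARGS...]
wait_for_health_ok() {
  local interval="$1" attempts="$2" n=0 out
  shift 2
  while [ "$n" -lt "$attempts" ]; do
    # Capture the status output; a failed command counts as "not healthy yet".
    out="$("$@" 2>/dev/null)"
    case "$out" in
      *HEALTH_OK*) return 0 ;;
    esac
    n=$((n + 1))
    sleep "$interval"
  done
  return 1
}

# Usage inside the rook-ceph-tools Pod: poll every 10 seconds, up to 60 times.
#   wait_for_health_ok 10 60 ceph status
```

A bounded number of attempts is used so that the loop does not run indefinitely if Ceph stays in HEALTH_WARN.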
Note: The commands in this procedure assume that you created an interactive shell in the rook-ceph-tools or rook-ceph-operator Pod. It also helps to have another shell to use kubectl commands at the same time.
For more information, see Rook Ceph Cluster Prerequisites above.
To manually rebalance data and remove a node:
Add as many new nodes to the cluster as you intend to remove. For example, if you intend to remove a total of two nodes, add two new nodes.
Ceph rebalances the existing placement groups to the new OSDs.
After Ceph completes rebalancing, run the following command to verify that Ceph is in a healthy state:
ceph status
Run the following command to display a list of all the OSDs in the cluster and their associated nodes:
ceph osd tree
Example output:
[root@rook-ceph-tools-54ff78f9b6-gqsfm /]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.97649 root default
-3 0.19530 host node00.foo.com
0 hdd 0.19530 osd.0 up 1.00000 1.00000
-7 0.19530 host node01.foo.com
2 hdd 0.19530 osd.1 up 1.00000 1.00000
-5 0.19530 host node02.foo.com
1 hdd 0.19530 osd.2 up 1.00000 1.00000
-9 0.19530 host node03.foo.com
3 hdd 0.19530 osd.3 up 1.00000 1.00000
-11 0.19530 host node04.foo.com
4 hdd 0.19530 osd.4 up 1.00000 1.00000
Run the following command to reweight the OSD to 0 on the first node that you intend to remove:
ceph osd reweight OSD_ID 0
Replace OSD_ID with the Ceph OSD on the node that you intend to remove. For example, ceph osd reweight 1 0.
Ceph rebalances the placement groups off the OSD that you specify in the ceph osd reweight command.
To view progress, run ceph status or watch ceph status. Ceph might display a HEALTH_WARN state during the rebalance, but returns to HEALTH_OK when the rebalance completes.
Example output:
[root@rook-ceph-tools-54ff78f9b6-gqsfm /]# watch ceph status
cluster:
id: 5f0d6e3f-7388-424d-942b-4bab37f94395
health: HEALTH_WARN
Degraded data redundancy: 1280/879 objects degraded (145.620%), 53 pgs degraded
...
progress:
Rebalancing after osd.2 marked out (15s)
[=====================.......] (remaining: 4s)
Rebalancing after osd.1 marked out (5s)
[=============...............] (remaining: 5s)
After the ceph osd reweight command completes, run the following command to verify that Ceph is in a healthy state:
ceph status
Then, run the following command to mark the OSD as down:
ceph osd down OSD_ID
Replace OSD_ID with the Ceph OSD on the node that you intend to remove. For example, ceph osd down 1.
Note: The OSD might not report as down until after the next step.
In another terminal, outside of the rook-ceph-tools Pod, run the following kubectl command to scale the corresponding OSD deployment to 0 replicas:
kubectl scale deployment -n rook-ceph OSD_DEPLOYMENT --replicas 0
Replace OSD_DEPLOYMENT with the name of the Ceph OSD deployment. For example, kubectl scale deployment -n rook-ceph rook-ceph-osd-1 --replicas 0.
Back in the rook-ceph-tools Pod, run the following command to ensure that the OSD is safe to remove:
ceph osd safe-to-destroy osd.OSD_ID
Replace OSD_ID with the ID of the OSD. For example, ceph osd safe-to-destroy osd.1.
Example output:
OSD(s) 1 are safe to destroy without reducing data durability.
Purge the OSD from the Ceph cluster:
ceph osd purge OSD_ID --yes-i-really-mean-it
Replace OSD_ID with the ID of the OSD. For example, ceph osd purge 1 --yes-i-really-mean-it.
Example output:
purged osd.1
Outside of the rook-ceph-tools Pod, delete the OSD deployment:
kubectl delete deployment -n rook-ceph OSD_DEPLOYMENT
Replace OSD_DEPLOYMENT with the name of the Ceph OSD deployment. For example, kubectl delete deployment -n rook-ceph rook-ceph-osd-1.
Repeat the steps in this procedure for any remaining nodes that you want to remove. Always verify that Ceph is in a HEALTH_OK state before making changes to Ceph.
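To keep track of the per-OSD commands when removing several nodes, the procedure above can be sketched as two small helpers: one that finds the OSDs listed under a host in ceph osd tree output, and one that prints (without running) the commands for a single OSD ID. This is a sketch under stated assumptions: the function names are hypothetical, OSD lines are assumed to include the CLASS column as in the example output, and the OSD deployment is assumed to be named rook-ceph-osd-ID as in the examples in this topic.

```shell
# Prints the names of the OSDs (for example osd.1) listed under the host
# named in $1, reading `ceph osd tree` output from stdin.
# Assumes OSD lines include the CLASS column, as in the example output.
osds_on_host() {
  awk -v host="$1" '
    $3 == "host"                      { current = $4 }
    $4 ~ /^osd\./ && current == host  { print $4 }
  '
}

# Prints, without running them, the commands from the procedure for one
# OSD ID, in order. Assumes the deployment name is rook-ceph-osd-<ID>.
print_osd_removal_plan() {
  local id="$1"
  cat <<EOF
ceph osd reweight $id 0
ceph status
ceph osd down $id
kubectl scale deployment -n rook-ceph rook-ceph-osd-$id --replicas 0
ceph osd safe-to-destroy osd.$id
ceph osd purge $id --yes-i-really-mean-it
kubectl delete deployment -n rook-ceph rook-ceph-osd-$id
EOF
}

# Usage inside the rook-ceph-tools Pod (node01.foo.com is a placeholder):
#   ceph osd tree | osds_on_host node01.foo.com
#   print_osd_removal_plan 1
```

Printing the plan rather than executing it keeps each step manual, so that Ceph health can be verified between commands as the procedure requires.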
You can use EKCO add-on scripts to programmatically cordon and purge a node so that you can then remove the node from the cluster.
Warnings: Consider the following warnings about data loss before you proceed with this procedure:
Ceph health: The EKCO scripts in this procedure provide a quick method for cordoning a node and purging Ceph OSDs so that you can remove the node. This procedure is not recommended unless you are able to confirm that Ceph is in a healthy state. If Ceph is not in a healthy state before you remove a node, you risk data loss.
To verify that Ceph is in a healthy state, run the following ceph status command in the rook-ceph-tools or rook-ceph-operator Pod in the rook-ceph namespace for Rook Ceph v1.4 or later:
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status
Data replication: A common Ceph configuration is three data replicas across three Ceph OSDs. It is possible for Ceph to report a healthy status without data being replicated properly across all OSDs. For example, in a single-node cluster, there are not multiple machines where Ceph can replicate data. In this case, even if Ceph reports healthy, removing a node results in data loss because the data was not properly replicated across multiple OSDs on multiple machines.
If you are not certain that Ceph data replication was configured and completed properly, or if Ceph is not in a healthy state, it is recommended that you first rebalance the data off the node that you intend to remove to avoid data loss. For more information, see (Recommended) Manually Rebalance Ceph and Remove a Node above.
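One partial way to check the replication configuration is to look at the configured size of each pool. The following is a minimal sketch that reads the output of ceph osd pool ls detail and prints the pools whose replica size is below a threshold. It assumes each pool line has the form pool ID 'NAME' ... size N ..., and the pools_below_size name is a hypothetical helper. A passing result shows only the configured replica count. It does not confirm that the replicas are placed on different machines.

```shell
# Prints the names of pools whose configured "size" is lower than $1,
# reading `ceph osd pool ls detail` output from stdin.
# Assumes lines of the form: pool 1 'replicapool' replicated size 3 min_size 2 ...
# pools_below_size is a hypothetical helper name.
pools_below_size() {
  awk -v want="$1" -v q="'" '
    $1 == "pool" {
      name = $3
      gsub(q, "", name)   # strip the quotes around the pool name
      for (i = 1; i < NF; i++) {
        if ($i == "size" && ($(i + 1) + 0) < (want + 0)) { print name }
      }
    }
  '
}

# Usage inside the rook-ceph-tools Pod: list pools with fewer than 3 replicas.
#   ceph osd pool ls detail | pools_below_size 3
```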
To use the EKCO add-on to remove a node:
Verify that Ceph is in a healthy state before you proceed. Run the following ceph status command in the rook-ceph-tools or rook-ceph-operator Pod in the rook-ceph namespace for Rook Ceph v1.4 or later:
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status
Run the EKCO shutdown script on the node:
/opt/ekco/shutdown.sh
The shutdown script deletes any Pods on the node that mount volumes provisioned by Rook. It also cordons the node, so that the node is marked as unschedulable and kURL does not start any new containers on the node. For more information, see EKCO Add-on.
On another primary node in the cluster, run the EKCO purge script for the node that you intend to remove:
ekco-purge-node NODE_NAME
Replace NODE_NAME with the name of the node that you powered down in the previous step.
For information about the EKCO purge script, see Purge Nodes in EKCO Add-on.
This section includes information about troubleshooting issues with node removal in Rook Ceph clusters.
After you remove a node from a Rook Ceph v1.0.4 cluster and you run kubectl -n rook-ceph exec deployment.apps/rook-ceph-operator -- ceph status, you see that Ceph is in an unhealthy state where a Ceph monitor (mon) is down.
For example:
health: HEALTH_WARN
1/3 mons down, quorum a,c
Additionally, under services, one or more mons are out of quorum:
services:
mon: 3 daemons, quorum a,c (age 5min), out of quorum: b
When you run kubectl -n rook-ceph get pod -l app=rook-ceph-mon, you see that the mon pod is in a Pending state.
For example:
NAME READY STATUS RESTARTS AGE
rook-ceph-mon-a 1/1 Running 0 20m
rook-ceph-mon-b 0/1 Pending 0 9m45s
rook-ceph-mon-c 1/1 Running 0 13m
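The name of a Pending mon pod can be pulled out of this output with a short filter. The following is a minimal sketch, assuming the default kubectl get pod column layout shown above, where STATUS is the third column. The pending_pods name is a hypothetical helper.

```shell
# Prints the names of pods whose STATUS column is Pending, reading
# `kubectl get pod` output (including its header line) from stdin.
# pending_pods is a hypothetical helper name.
pending_pods() {
  awk 'NR > 1 && $3 == "Pending" { print $1 }'
}

# Usage on a cluster:
#   kubectl -n rook-ceph get pod -l app=rook-ceph-mon | pending_pods
```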
This is caused by an issue in Rook Ceph v1.0.4 where the rook-ceph-mon-endpoints ConfigMap still maps a node that was removed.
To address this issue, you must return the Ceph cluster to a healthy state and upgrade to Rook Ceph v1.4 or later.
To return Ceph to a healthy state so that you can upgrade, manually delete the mapping to the removed node from the rook-ceph-mon-endpoints ConfigMap, and then rescale the operator.
To return Ceph to a healthy state and upgrade:
Stop the Rook Ceph operator:
kubectl -n rook-ceph scale --replicas=0 deployment.apps/rook-ceph-operator
Edit the rook-ceph-mon-endpoints ConfigMap to delete the removed node from the mapping:
kubectl -n rook-ceph edit configmaps rook-ceph-mon-endpoints
Warning: Ensure that you remove the correct rook-ceph-mon-endpoint from the mapping field in the ConfigMap. Removing the wrong rook-ceph-mon-endpoint can cause unexpected behavior, including data loss.
Find the name of the Pending mon pod:
kubectl -n rook-ceph get pod -l app=rook-ceph-mon
Delete the Pending mon pod:
kubectl -n rook-ceph delete pod MON_POD_NAME
Replace MON_POD_NAME with the name of the mon pod that is in a Pending state from the previous step.
Rescale the operator:
kubectl -n rook-ceph scale --replicas=1 deployment.apps/rook-ceph-operator
Verify that all mon pods are running:
kubectl -n rook-ceph get pod -l app=rook-ceph-mon
The output of this command shows that each mon pod has a Status of Running.
Verify that Ceph is in a healthy state:
kubectl -n rook-ceph exec deployment.apps/rook-ceph-operator -- ceph status
The output of this command shows health: HEALTH_OK if Ceph is in a healthy state.
For more information about these steps, see Managing nodes when the previous Rook version is in use might leave Ceph in an unhealthy state where mon pods are not rescheduled in Replicated Community.