Upgrading OS to Oracle Linux 8
The Big Data Service (BDS) cluster OS is upgraded from Oracle Linux 7.9 to Oracle Linux 8.10. All packages in the cluster are upgraded to the equivalent packages built for OL8.
The Leapp utility is used for the upgrade. For more information, see Upgrading Systems With Leapp.
Prerequisites
The OL8 patch is an OS patch (ol8.10-x86_64-2.0.0.0-0.0) that becomes available only after the following version prerequisites are fulfilled:
| Version type | Required version before upgrade | Version after upgrade |
|---|---|---|
| bds version | 3.0.29.5 | 3.1.0.2 |
| odh version | 2.0.10 | 2.0.10 |
| os version | ol7.9-x86_64-1.29.1.999-0.0 | ol8.10-x86_64-2.0.0.0-0.0 |
Other OS patch prerequisites also apply. See Updating Big Data Service Clusters.
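The following is a minimal sketch, using the OCI Python SDK, of how you might confirm the cluster's current versions and whether the OL8 patch is offered. The BdsClient method and field names used here (get_bds_instance, list_patches, cluster_details.bds_version, and so on) should be verified against the current SDK reference, and the example OCID is a placeholder for your cluster's OCID.

```python
# Sketch: check version prerequisites and whether the OL8 OS patch is offered.
# Assumes the OCI Python SDK (pip install oci); verify BdsClient method and
# field names against the SDK reference before relying on this.
import oci

OL8_PATCH_VERSION = "ol8.10-x86_64-2.0.0.0-0.0"

config = oci.config.from_file()                        # default ~/.oci/config profile
bds = oci.bds.BdsClient(config)
bds_instance_id = "ocid1.bdsinstance.oc1..example"     # placeholder: your cluster OCID

cluster = bds.get_bds_instance(bds_instance_id).data.cluster_details
print("bds version:", cluster.bds_version)             # must be 3.0.29.5 before upgrade
print("odh version:", cluster.odh_version)             # must be 2.0.10
print("os version: ", cluster.os_version)              # must be ol7.9-x86_64-1.29.1.999-0.0

# The OL8 patch only shows up once the prerequisites above are met.
available = [p.version for p in bds.list_patches(bds_instance_id).data]
print("OL8 patch offered:", OL8_PATCH_VERSION in available)
```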
OL8 Upgrade Restrictions
OL8 upgrade doesn't support the following scenarios:
- FIPS enabled clusters.
- Clusters with bare metal shape nodes.
- Clusters with Cloud SQL nodes.
Features Supported for OL8 Upgrade
Dry run only mode: Lets the customer dry run the upgrade and then stop, to identify potential risks and blockers before the actual upgrade.
The dry run checks prerequisites such as network and storage requirements and any configuration that would render a node unusable after the upgrade. When the dryrunOnly option is selected, patching stops after the dry run so that the customer can review the warnings and the actual package upgrades that would be performed. If dryrunOnly isn't selected and no error is detected during the dry run, patching continues; otherwise, patching stops.
After a successful dry run, aggregated reports are generated on the MN0 node under /opt/oracle/bds/ospatch/ol7-8-upgrade/dryrun/:
- leapp-preupgrade-report-aggregated.json: Shows potential risks for the upgrade, classified by severity and aggregated by the nodes affected. These risks need review, but resolving them isn't required unless a risk is classified at the "high (inhibitor)" level. The customer must resolve upgrade inhibitors and any warnings they deem important.
- package-diff-aggregated.json: Shows the difference between the actual package upgrade plan and the prepared upgrade plan shown in the Console or API response.
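The following is a minimal sketch for reviewing these reports on the MN0 node. The JSON layout assumed here (a list of entries with severity, title, and nodes keys) is hypothetical; inspect the actual files to confirm their structure before relying on this.

```python
# Sketch: summarize the aggregated dry run reports on MN0.
# The entry keys ("severity", "title", "nodes") are assumptions for illustration.
import json
from pathlib import Path

DRYRUN_DIR = Path("/opt/oracle/bds/ospatch/ol7-8-upgrade/dryrun")

report = json.loads((DRYRUN_DIR / "leapp-preupgrade-report-aggregated.json").read_text())
for entry in report:                                   # assumed: a list of risk entries
    severity = str(entry.get("severity", "")).lower()
    if "inhibitor" in severity:                        # inhibitors must be resolved
        print(f"INHIBITOR: {entry.get('title')} on nodes {entry.get('nodes')}")
    elif severity == "high":
        print(f"HIGH:      {entry.get('title')} on nodes {entry.get('nodes')}")

diff = json.loads((DRYRUN_DIR / "package-diff-aggregated.json").read_text())
print("package plan differences:", len(diff))
```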
Patching by batches of nodes: The OL8 upgrade supports patching in batches, which minimizes downtime and limits the blast radius if patching fails.
On average, patching one batch of nodes takes about an hour. The supported configurations are listed in the following table; a rough time-estimate sketch appears after the table.
| Patching configuration | Patching behaviour | Number of patches | Approximate time to complete patching |
|---|---|---|---|
| Downtime patching | All nodes are taken down for patching. | 1 | 1 hour |
| Patching by Availability Domain (AD) or Fault Domain (FD) | Patch the following nodes individually in a sequence: mn0, un0, wn0, mn1, un1. Next, patch nodes by the AD they're assigned to (or FD if in a single AD region). | 8 | 9 hours |
| Patch by a specified batch size | Patch the following nodes individually in a sequence: mn0, un0, wn0, mn1, un1. Next, patch nodes based on the specified batch size. | Customer decides | Not applicable |
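As a rough illustration of the estimates in the table (five nodes patched individually, the remaining nodes in batches, at roughly one hour per batch), the following sketch computes an approximate duration for a given batch size. It's an illustration only, not a BDS calculation.

```python
# Rough arithmetic from the table above: five nodes (mn0, un0, wn0, mn1, un1)
# are patched one at a time, the rest in batches, at roughly one hour per batch.
import math

def estimate_hours(total_nodes: int, batch_size: int) -> int:
    individually_patched = 5                      # mn0, un0, wn0, mn1, un1
    remaining = max(total_nodes - individually_patched, 0)
    batches = individually_patched + math.ceil(remaining / batch_size)
    return batches                                # ~1 hour per batch

print(estimate_hours(total_nodes=20, batch_size=5))   # -> 8 batches, ~8 hours
```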
Failure or Rollback Behavior
During the OL8 upgrade, each node being patched reboots. If a failure happens before the node reboots, an automatic rollback is performed and the node isn't affected. After the reboot, rollback isn't possible.
After patching each batch, BDS runs a health check on the cluster regardless of whether the batch succeeded. If the number of failed nodes exceeds the customer-specified failure tolerance, patching stops; otherwise, patching continues to the next batch.
After the patch work request finishes with no failures, the work request is marked as successful and the cluster bds version is updated to 3.1.0.2.
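The following is a conceptual sketch of the batching and failure-tolerance behavior described above. It isn't BDS code; the function and parameter names are illustrative only.

```python
# Conceptual sketch of batch patching with a failure tolerance: after each batch
# the cluster is health-checked, and patching stops once the number of failed
# nodes exceeds the customer-specified tolerance.
from typing import Callable, Iterable, List

def patch_in_batches(
    batches: Iterable[List[str]],
    patch_node: Callable[[str], bool],   # returns True if the node patched successfully
    failure_tolerance: int,
) -> bool:
    failed: List[str] = []
    for batch in batches:
        for node in batch:
            if not patch_node(node):
                failed.append(node)
        # Health check runs whether or not the batch succeeded.
        if len(failed) > failure_tolerance:
            print(f"Stopping: {len(failed)} failed nodes exceed tolerance {failure_tolerance}")
            return False
    return True
```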
If a failure occurs, the action to take depends on the cluster state:
| Scenario | Action required |
|---|---|
| Cluster is healthy | If all nodes are healthy: Retrigger patching. Patching continues on nodes that aren't yet updated. If some nodes are unhealthy: Retrigger patching. BDS retries patching on the failed nodes to bring them up. If nodes still fail after the retry, remove the nodes from the Oracle Cloud Console and retry patching. |
| Cluster fails | If some nodes fail, fix the issue on the nodes or remove the nodes from the cluster in the Oracle Cloud Console, and then try to fix the service health in Ambari. |
OL8 Upgrade Log Locations
| Location | Purpose |
|---|---|
| OCI work request log/error | Shows patching progress and errors encountered at a high level. |
| MN0 | MN0 orchestrates patching across the cluster. For orchestration-related errors, see the log file provided. |
| On each node: | Leapp upgrade utility log. |
| On each node: | BDS configuration update log. |