Exadata Patching – Best Practices and Lessons Learned

“With Great Power Comes Great Responsibility”

One of the biggest ongoing responsibilities that comes after commissioning an Exadata appliance is keeping the firmware and software of the machine's various components up to date. As you have probably guessed, this article is about patching an Exadata appliance.

An Exadata appliance has three layers that require software maintenance: the Exadata Storage Servers (storage cells), which sit in the bottom and top sections of the rack; the compute (database) nodes; and the InfiniBand switches, as depicted in the image below.

Let us take the case of a recent client whose business-critical application databases ran across a range of environments, from Sandpit through End to End to Production, on Exadata machines of varying configurations: from an Exadata X4-8 Quarter Rack in Sandpit to an Exadata X5-2/X6-2 Full Rack in Production. We successfully patched these Exadata machines to the latest patchset levels approved by the business.

Meticulous planning is essential to make patching these components a low-risk change: one that maintains continuity of service, keeps the process as time-efficient as possible and produces a predictable outcome, namely a successful patch cycle.

Based on our years of experience, these are some of the best practices we follow:

1. Patching Approach: Clearly separate the components that can be patched online from those that require downtime, e.g. online patching for the storage cells and InfiniBand switches, and an outage for the database node OS and Grid Infrastructure.

2. Patch Staging Area: Set up a standard, uniform patch staging area on the first compute node of each Exadata machine being patched, holding the patches for all the components.

3. Proactive SR: Always open a proactive Service Request (SR) with Oracle well in advance, and use it to share your patching schedule, your patching procedure, precheck output such as Exachk reports and patchmgr precheck reports, and any issues encountered. A proactive SR also ensures that Oracle Support personnel are pre-allocated and on standby while you patch your Exadata machine.

4. Integrated Lights-out Management (ILOM) Access: Always ensure ILOM access is enabled for each component being patched, i.e. storage cells, InfiniBand switches and compute nodes, so that you retain console access if a component fails to come back up during patching.

5. SSH Passwordless Access: Ensure that passwordless SSH works from the first compute node to every component being patched.

6. Exachk Reports: Ensure the latest version of the Exachk utility has been used, and that any health issues it reports have been carefully reviewed, discussed with Oracle Support and fixed (where applicable) before you proceed with patching.

7. Preparing Exadata Components: Run a patchmgr precheck and make sure it completes without any issues before actually patching any component, whether storage cells, InfiniBand switches or compute nodes.

8. Maintenance Mode: Ensure that each component being patched is in maintenance mode, i.e. under a blackout, to avoid unwanted notifications and repeated alerts during patching.

9. Component Patching: Only after points 1 to 8 above have all been carried out should the actual patching of an Exadata component begin, following the agreed patching approach and using the patchmgr utility in a rolling or non-rolling fashion, as approved by the business.

10. Review Patching Outcome: Carefully review all patchmgr output and logs. Report any unexpected errors or deviations to Oracle Support through the proactive SR.

11. Post Patching Checks: Cross-check the image version (as reported by imageinfo) of every patched component of the Exadata appliance.

Then run a health check with the Exachk utility, and carefully analyse and compare its report with the one taken just before patching. Report any concerns to Oracle Support.

12. End Maintenance Mode: Ensure that maintenance mode (the blackout) is ended immediately after patching.
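To make the go/no-go discipline of this checklist concrete, the Python sketch below models it as data. It is a minimal illustration under our own assumptions: the prerequisite names, the ComponentPlan class and the ready_to_patch and version_drift helpers are hypothetical, and nothing here calls patchmgr, dcli or Exachk. version_drift assumes one "host: version" line per host, which is the shape dcli typically produces when it runs imageinfo -ver across a group of servers; confirm this against your own output before relying on it.

```python
from dataclasses import dataclass, field

# Hypothetical names for checklist items 2 to 8, ticked off per component.
PREREQUISITES = (
    "staging_area_ready",
    "proactive_sr_open",
    "ilom_access_verified",
    "passwordless_ssh_verified",
    "exachk_reviewed",
    "patchmgr_precheck_clean",
    "blackout_started",
)


@dataclass
class ComponentPlan:
    name: str      # e.g. "cells", "ibswitches", "dbnodes"
    online: bool   # item 1: online patching (True) or outage window (False)
    done: set = field(default_factory=set)  # prerequisites already evidenced

    def missing(self):
        """Prerequisites not yet ticked off, in checklist order."""
        return [p for p in PREREQUISITES if p not in self.done]


def ready_to_patch(plan):
    """Item 9: patching may start only when no prerequisite is missing."""
    return not plan.missing()


def version_drift(imageinfo_output, target):
    """Item 11: map each host whose reported image version differs from target.

    Assumes one "host: version" line per host (dcli-style prefixing).
    """
    drift = {}
    for line in imageinfo_output.strip().splitlines():
        host, _, version = line.partition(":")
        if version.strip() != target:
            drift[host.strip()] = version.strip()
    return drift
```

In practice each prerequisite is ticked off only when there is evidence for it (a clean patchmgr precheck log, an Exachk report reviewed with Oracle Support); the point of the sketch is that patching starts only when missing() returns an empty list, and that any host left in version_drift() after patching is reported through the proactive SR.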

Each Exadata patching cycle tends to widen your understanding of how the different components of an Exadata machine behave, or may behave, under varying environmental conditions, and of the practices required when working with them.

We also have some lessons from our own experience to share:

Failure During Compute Node OS Patching:

While we were patching the OS on the compute nodes of an Exadata X4-8 machine in an End to End environment, the patchmgr run hit a fatal timeout while it was updating libraries at the OS level.

In the hours that followed, the immediate need was to roll back the patch and return the database node OS to its pre-patch image version so that the disrupted application and database services could be restored. Here we faced a second setback: the patchmgr ‘rollback’ option could not roll back the partially applied patch.

Over the next few days, after many hours of consulting with Oracle Support experts, the cause was identified: a custom filesystem layout, with “/var” on its own filesystem, caused the backups that patchmgr takes at the start of the run to be overwritten during the actual patching.

On the compute nodes, the customised filesystem layout looked something like this:

A filesystem standardisation activity was carried out on the compute nodes of the Exadata machine, and the custom layout was merged into the root filesystem “/”. The correct filesystem layout then appeared as follows:

Subsequent patching and rollback attempts on the compute node succeeded.

Always be wary of any customised system filesystem layout on your compute nodes: it can make the patchmgr utility behave unexpectedly and lead to failures.
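One way to catch this before a patch window is to compare the disk-backed mount points on each compute node with the layout you expect. The Python sketch below is a minimal example under stated assumptions: EXPECTED_MOUNTS is a hypothetical placeholder, not Oracle's standard layout (the supported layout varies between Exadata image versions, so take it from the documentation for your release), and unexpected_mounts simply parses text in /proc/mounts format.

```python
# Hypothetical placeholder: replace with the layout documented for your image version.
EXPECTED_MOUNTS = frozenset({"/", "/boot", "/u01"})


def unexpected_mounts(proc_mounts_text, expected=EXPECTED_MOUNTS):
    """Return disk-backed mount points that are not in the expected layout.

    proc_mounts_text: contents of /proc/mounts (device, mount point, fstype, ...).
    """
    found = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        device, mount_point = fields[0], fields[1]
        # Skip pseudo filesystems (proc, sysfs, tmpfs, ...) not backed by a /dev device.
        if not device.startswith("/dev/"):
            continue
        if mount_point not in expected:
            found.append(mount_point)
    return found
```

On a compute node you would pass it the contents of /proc/mounts and review anything it returns with Oracle Support before patching, rather than treating every extra mount point as an error.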

Precheck Failure During InfiniBand Patching:

While preparing for InfiniBand patching on an Exadata X6-2 machine in the End to End environment, the patchmgr prechecks failed at the “verifying the network topology” stage.

This check verifies that every storage cell and compute node in the Exadata stack is redundantly connected to the available InfiniBand leaf switches.
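The redundancy rule behind this check can be expressed in a few lines. The Python sketch below is an illustration only: it assumes you already know which leaf switch each host channel adapter port is cabled to (the real verify-topology tool discovers this from the fabric itself), and it takes "redundantly connected" to mean cabled to at least two distinct leaf switches. The function name and its input format are hypothetical.

```python
def non_redundant_nodes(cabling, leaf_switches):
    """Return, sorted by name, the nodes not cabled to two distinct leaf switches.

    cabling: hypothetical dict mapping node name -> list of switch names,
             one entry per cabled InfiniBand port.
    leaf_switches: names of the leaf switches available in the rack.
    """
    bad = []
    for node, ports in sorted(cabling.items()):
        # Distinct leaf switches this node actually reaches.
        leaves = {switch for switch in ports if switch in leaf_switches}
        if len(leaves) < 2:
            bad.append(node)
    return bad
```

A node with both ports on the same leaf switch is flagged just like a node with a single cable, because losing that one switch would isolate it; that is exactly the situation the cabling fix below resolved.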

A sample of the exception raised during the patchmgr precheck is shown below:

Running the verify-topology check explicitly shows the following exceptions:

The verify-topology output also identifies the compute and cell nodes affected by these exceptions (not shown in the image above).

The fix required an Oracle Field Engineer to visit the datacenter and correct the InfiniBand cabling. Once the cabling on the IB switches had been corrected, both the patchmgr precheck and the verify-topology check for InfiniBand succeeded.

It is well worth running the following utility to check the InfiniBand topology well in advance, even before you run the patchmgr prechecks and schedule the patching of your InfiniBand switches:

/opt/oracle.SupportTools/ibdiagtools/verify-topology
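If you want to script this early check, for example across several machines ahead of a patch window, a thin Python wrapper is enough. This is a sketch under assumptions: the helper names are hypothetical, we assume the tool is run as root on a compute node with no arguments, and we assume failing checks are marked with the uppercase token ERROR (for example "[ERROR]"). Confirm both against a real run on your image version before trusting the filter.

```python
import re
import subprocess

# Path taken from the article; assumed to be run as root on a compute node.
VERIFY_TOPOLOGY = "/opt/oracle.SupportTools/ibdiagtools/verify-topology"

# Assumption: failed checks carry the uppercase token ERROR, e.g. "[ERROR]".
FAILURE = re.compile(r"\bERROR\b")


def run_verify_topology(path=VERIFY_TOPOLOGY):
    """Run the tool with no arguments and return stdout and stderr combined."""
    result = subprocess.run([path], capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


def failed_checks(output):
    """Return the output lines that look like failed checks."""
    return [line.strip() for line in output.splitlines() if FAILURE.search(line)]
```

An empty result from failed_checks(run_verify_topology()) is the cue to move on to the patchmgr prechecks; anything else should go to Oracle Support through the proactive SR before the InfiniBand patching is scheduled.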

In a nutshell, meticulous planning and attention to small details and to any exceptions encountered really pay off in the long run. This is especially true when patching an Exadata appliance in a “no scope for error” zone such as a production environment, where the results must be expected and predictable.

We hope this has added some insight to your Exadata patching outlook!

Consult with Fusion Professionals to discover how our best practices can ensure your Exadata patching is done right the first time, with no unexpected results or downtime. Let our experience work for you.

