Oracle RAC Database Oracle Clusterware Daemon Failure

Oracle Clusterware is an essential resource for all RAC databases in a site. When the crsd or evmd daemon aborts because of a failure, it is automatically restarted on the node. When the cssd daemon aborts because of a failure, the node is restarted. The RAC MNP stack continues to run with one less instance on the site.

The Site Controller Package continues to run uninterrupted as long as at least one RAC MNP instance is running and the RAC MNP package has not failed. However, if the failed RAC database instance was the last surviving instance on the site, the node restart initiates a failover of the Site Controller Package to the remote site. During startup at the remote site, the Site Controller Package detects the failure and performs a site failover, starting the RAC MNP stack configured at that site.
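The outcomes described above can be summarized as a small decision table. The following shell function is only an illustrative model of that behavior, not part of Serviceguard or Oracle Clusterware. The function name clusterware_failure_outcome and its second argument (the number of RAC MNP instances still running on the site after the failure) are assumptions introduced for this sketch.

```shell
#!/bin/sh
# Illustrative model only: maps a failed Clusterware daemon to the outcome
# described in the text. Nothing here calls Serviceguard or Oracle tools.
#   $1 = failed daemon (crsd, evmd, or cssd)
#   $2 = RAC MNP instances still running on the site after the failure
#        (hypothetical input; defaults to 0)
clusterware_failure_outcome() {
    daemon="$1"
    remaining="${2:-0}"
    case "$daemon" in
        crsd|evmd)
            # These daemons are restarted in place on the same node.
            echo "daemon restarted on node"
            ;;
        cssd)
            # A cssd failure restarts the whole node.
            if [ "$remaining" -gt 0 ]; then
                echo "node restarted; site continues with one less instance"
            else
                echo "node restarted; Site Controller Package fails over to remote site"
            fi
            ;;
        *)
            echo "unknown daemon: $daemon" >&2
            return 1
            ;;
    esac
}

clusterware_failure_outcome evmd 2
clusterware_failure_outcome cssd 1
clusterware_failure_outcome cssd 0
```

The third call models the case in which the failed instance was the last one on the site, which is the only case that leads to a site failover.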

Administering the Site Aware Disaster Tolerant Metrocluster Environment

This section describes the procedures that you must perform to administer the SADTA environment.

This section addresses the following topics:

“Administering the SADTA Configuration” (page 395)

“Administering Metrocluster for RAC” (page 400)

Administering the SADTA Configuration

This section describes the procedures for administering a SADTA configuration in which complex workloads other than Oracle RAC are configured.

This section addresses the following topics:

“Maintaining a Node” (page 395)

“Maintaining the Site” (page 396)

“Maintaining the Metrocluster Environment File” (page 396)

“Moving the Site Controller Package to a Node at the Local Site” (page 396)

“Maintaining Site Controller Package” (page 396)

“Upgrading the Site Controller Package” (page 397)

“Deleting the Site Controller Package” (page 397)

“Starting a Complex Workload” (page 398)

“Shutting Down a Complex Workload” (page 398)

“Moving a Complex Workload to the Remote Site” (page 398)

“Restarting a Failed Site Controller Package” (page 399)

“Migrating Complex Workloads Using Legacy SG SMS CVM/CFS Packages to Modular SG SMS CVM/CFS Packages with Minimal Downtime” (page 399)

Maintaining a Node

To perform maintenance procedures on a cluster node, you must first remove the node from the cluster. Run the cmhaltnode -f command to move the node out of the cluster. This command halts the complex workload package instance running on the node. As long as there are other nodes in the site and the Site Controller Package is still running on the site, the site aware disaster tolerant workload continues to run with one less instance on the same site.
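The halt step can be sketched as a small shell wrapper. This is a minimal sketch under stated assumptions: the node name node1, the run helper, and the DRY_RUN variable are hypothetical and introduced only for illustration. By default the script only prints the commands, so it can be reviewed on a system without Serviceguard installed. The cmviewcl -v calls are an optional check of cluster and package status before and after the halt, not a required part of the procedure.

```shell
#!/bin/sh
# Sketch only: prints each command unless DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Halt one node so that maintenance can be performed on it.
#   $1 = node name (hypothetical example below: node1)
halt_node_for_maintenance() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: halt_node_for_maintenance <node_name>" >&2
        return 2
    fi
    # Record the cluster and package state before the change.
    run cmviewcl -v
    # Halt the node; -f also halts the package instances running on it.
    run cmhaltnode -f "$node"
    # Check that the workload is still running on the remaining nodes.
    run cmviewcl -v
}

halt_node_for_maintenance node1
```

To execute the commands rather than print them, set DRY_RUN=0 in the environment on a cluster node, after confirming that other nodes in the site are still running the workload.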

Once the node maintenance procedures are complete, join the node to the cluster using the cmrunnode command. If the Site Controller Package is running on the site that the node belongs to, the active complex-workload package instances on the site that have the auto_run flag set
