Friday, January 3, 2020

Oracle RAC Cluster Startup Sequence

     Remember how Oracle 10g clusterware located the OCR and voting files? The OCR location was maintained in the /etc/oracle/ocr.loc file, and the voting disk locations were in turn recorded in the OCR. On startup, the 10g clusterware started OCSSD (Oracle Cluster Synchronization Services Daemon) using the voting disks/files and CRSD (Cluster Ready Services Daemon) using the OCR files.
     From Oracle 11gR2 onward, however, the clusterware architecture and startup process changed, and this article describes the new sequence.

     Let's first understand the main components involved in 11gR2 clusterware startup (not much changed in 12c).

Four Important Clusterware files used during startup:

1. Voting Files: Store information about node membership. You can have multiple voting disks/files, and each file must be accessible by all the nodes in the cluster for those nodes to be members of the cluster. While the cluster is running, each node writes a disk heartbeat to the voting files in addition to the network heartbeat it sends over the cluster interconnect. CSSD uses both heartbeats to decide which nodes remain in the cluster and so avoid a split-brain scenario.

2. OCR (Oracle Cluster Registry) files: Store Oracle clusterware and RAC database configuration information such as node membership, software version, and the configuration and status of RAC DB resources (instances, listeners, services, etc.).

3. OLR (Oracle Local Registry) file: In Oracle 11gR2, an additional component related to the OCR, called the OLR, is installed on each node of the cluster.
     The OLR is a local registry for node-specific resources. It is located at CRS_HOME/cdata/<hostname>.olr, and that location is recorded in /etc/oracle/olr.loc. OHASD (Oracle High Availability Services Daemon) uses the OLR to start up cluster resources.

     Important information held in the OLR includes the active CRS version, CRS_HOME, GPnP details, node names, and the time and location of the latest OCR backup.

     If the OLR is missing or corrupted, clusterware cannot be started on that node.

Note: If your OLR is corrupted or missing, see the article "How to restore local OLR in Oracle 11gR2 RAC?".

4. GPnP Profile: The Grid Plug and Play (GPnP) profile is an XML file located at CRS_HOME/gpnp/<hostname>/profiles/peer/profile.xml. Each node in the cluster keeps a local copy of this profile, which is maintained by ora.gpnpd (the GPnP daemon) together with ora.mdnsd (the mDNS daemon).

The GPnP profile.xml contains information such as the network interfaces/IPs for the public and private interconnects, the ASM server parameter file (SPFILE) location, the CSS voting disks, cluster name, cluster id, hostname, the ASM diskgroup discovery string, and the name of the ASM diskgroup containing the voting files.
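
To make the profile contents concrete, here is a minimal Python sketch that pulls a few of these fields out of a profile. The sample XML is a simplified, hypothetical stand-in: its element and attribute names are modeled on a typical 11gR2 profile.xml, but the XML namespaces and the signature block of a real profile are left out, and the cluster name, adapters and paths are invented for illustration. The helper names (local_name, summarize_profile) are also my own, not Oracle's.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical profile. Real profiles carry XML namespaces and a
# signature block; names and values below are illustrative assumptions only.
SAMPLE_PROFILE = """\
<GPnP-Profile ClusterName="demo-cluster" ProfileSequence="4">
  <Network-Profile>
    <HostNetwork id="gen" HostName="*">
      <Network id="net1" IP="192.0.2.0" Adapter="eth0" Use="public"/>
      <Network id="net2" IP="10.0.0.0" Adapter="eth1" Use="cluster_interconnect"/>
    </HostNetwork>
  </Network-Profile>
  <CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
  <ASM-Profile id="asm" DiscoveryString="/dev/oracleasm/disks/*"
               SPFile="+DATA/demo-cluster/asmparameterfile/spfile.ora"/>
</GPnP-Profile>
"""


def local_name(tag):
    """Drop a leading '{namespace}' so namespaced profiles match as well."""
    return tag.rsplit("}", 1)[-1]


def summarize_profile(xml_text):
    """Collect cluster name, network usage, and CSS/ASM settings in a dict."""
    root = ET.fromstring(xml_text)
    summary = {"cluster_name": root.get("ClusterName"), "networks": {}}
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "Network":
            # Map each adapter to its role (public or cluster_interconnect).
            summary["networks"][elem.get("Adapter")] = elem.get("Use")
        elif name == "CSS-Profile":
            summary["css_discovery"] = elem.get("DiscoveryString")
        elif name == "ASM-Profile":
            summary["asm_discovery"] = elem.get("DiscoveryString")
            summary["asm_spfile"] = elem.get("SPFile")
    return summary


if __name__ == "__main__":
    for key, value in summarize_profile(SAMPLE_PROFILE).items():
        print(f"{key}: {value}")
```

Matching on the local element name rather than a fixed namespace keeps the sketch usable against a real profile without hard-coding namespace URIs. Treat it as read-only inspection: the profile is signed, so it should not be edited by hand.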
     Now that you know the four important clusterware files, the startup sequence will be easier to follow.

     There are many clusterware processes, but here we concentrate on the four most important ones involved in clusterware startup: init/systemd, OHASD, OCSSD and CRSD.

1. init/systemd: Once the OS finishes its bootstrap process, the init daemon (on classic SysV init systems) reads /etc/inittab, which contains an entry that runs the /etc/init.d/init.ohasd script. That script triggers the startup of OHASD.

Note: In recent Linux distributions, init has been replaced by the systemd daemon. For more details on init and systemd, see this very good article: "The Story Behind ‘init’ and ‘systemd’: Why ‘init’ Needed to be Replaced with ‘systemd’ in Linux".

2. OHASD: The root process for bringing up Oracle clusterware. OHASD has access to the OLR stored on the local file system, and the OLR provides the data OHASD needs to complete its initialization.

     OHASD starts GPnPD and CSSD. CSSD has access to the GPnP profile, which tells it where to look for the voting files. It finds them through the well-known pointers in the ASM disk headers, completes its initialization, and joins the existing cluster.

     OHASD now starts the ASM instance, which can operate because CSSD is initialized and running. The ASM instance uses special code to locate the contents of the ASM SPFILE, assuming it is stored in a diskgroup.
     With the ASM instance running and its diskgroup mounted, the clusterware OCR file becomes accessible to CRSD, so OHASD starts CRSD. Clusterware then completes initialization and brings up the other services under its control.
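
The ordering above can be written down as a small dependency graph. The following Python sketch is only a model of the sequence as this article describes it, not something read from a live cluster: the component names are labels I chose, and the edges encode the article's description. It uses graphlib.TopologicalSorter from the standard library, so it assumes Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each component maps to the components that must be up before it starts.
# The edges mirror the sequence described in this article (an assumption of
# this sketch, not data obtained from the clusterware itself).
STARTUP_DEPENDENCIES = {
    "OHASD": {"init"},   # init/systemd launches OHASD, which reads the OLR
    "GPnPD": {"OHASD"},  # OHASD starts the GPnP daemon
    "CSSD": {"GPnPD"},   # CSSD needs the GPnP profile to find voting files
    "ASM": {"CSSD"},     # ASM can run once CSSD is initialized
    "CRSD": {"ASM"},     # the OCR sits in an ASM diskgroup, so CRSD is last
}


def startup_order(dependencies):
    """Return one valid start order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def prerequisites(component, dependencies):
    """Return every component that must be running before `component`."""
    seen = set()
    stack = [component]
    while stack:
        for dep in dependencies.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


if __name__ == "__main__":
    print(" -> ".join(startup_order(STARTUP_DEPENDENCIES)))
    print("ASM needs:", sorted(prerequisites("ASM", STARTUP_DEPENDENCIES)))
```

Because the graph is a single chain, there is only one valid order. Asking for the prerequisites of "ASM" returns CSSD and everything before it but not CRSD, which matches the fact that ASM comes up before the OCR is accessible.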

Please note that ASM does not depend on the OCR or OLR being online. ASM depends on OCSSD (and therefore on the voting disks/files) being online.
Also note that many more processes/daemons are involved in clusterware startup and normal operation, as shown in the diagram referenced below. This article covers only the most important components.

For more information on the clusterware startup sequence, please refer to Oracle Support Doc ID 1053147.1.

Please refer to the 12c clusterware startup sequence diagram below for more information: