Friday, January 3, 2020

Oracle Clusterware Startup Sequence

     In Oracle 10g clusterware, recall how the clusterware located the OCR and voting files: the OCR location was maintained in the /etc/oracle/ocr.loc file, and on startup the 10g clusterware was able to start OCSSD (Oracle Cluster Synchronization Service Daemon), which used the voting disks/files, and CRSD (Cluster Ready Services Daemon), which used the OCR.
      However, from Oracle 11gR2 onward the clusterware architecture and startup process have changed a bit, which is described in this article.

       Let's first understand the main components involved in 11gR2 clusterware startup (these did not change much in 12c).

Four Important Clusterware files used during startup:

1. Voting Files: Store information about node membership. You can have multiple voting disks/files, and each file must be accessible by all the nodes in the cluster for those nodes to be members of the cluster. While the cluster is running, the voting files are used to check the heartbeat among all available nodes over the cluster interconnect network, to avoid a split-brain scenario.

2. OCR (Oracle Cluster Registry) files: Store Oracle Clusterware and RAC database configuration information such as node membership, software version, and the configuration and status of RAC DB resources, i.e. instances, listeners, services, etc.

3. OLR (Oracle Local Registry) file: In Oracle 11gR2, an additional component related to the OCR, called the OLR, is installed on each node of the cluster.
  The OLR is a local registry for node-specific resources. It is located at CRS_HOME/cdata/<hostname>.olr; this location is recorded in /etc/oracle/olr.loc and is used by OHASD (Oracle High Availability Service Daemon) to start up cluster resources.

     Some important information the OLR contains: the active CRS version, CRS_HOME, GPnP details, node names, the latest OCR backup time and location, etc.

     If the OLR is missing or corrupted, clusterware cannot be started on that node.

Note: Please use the article "How to restore local OLR in Oracle 11gR2 RAC?" in case your OLR gets corrupted or goes missing.

4. GPnP Profile: The Grid Plug aNd Play (GPnP) profile is an XML file located at CRS_HOME/gpnp/profiles/peer/profile.xml. Each node in the cluster maintains a local copy of this profile, which is maintained by ora.gpnpd (GPnP daemon) together with ora.mdnsd (mDNS daemon).

This GPnP profile.xml contains information such as: network interfaces/IPs for the public and private interconnects, the ASM server parameter (SP) file location, the CSS voting disks, the cluster name, cluster id, hostname, the ASM diskgroup discovery string, the name of the ASM diskgroup containing the voting files, etc.
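On a running cluster, each of these four files can be inspected with the standard Grid Infrastructure tools. A minimal sketch (run from the Grid home's bin directory, typically as the grid or root user; output varies per cluster):

```shell
crsctl query css votedisk   # voting files and their locations
ocrcheck                    # OCR location and integrity
ocrcheck -local             # OLR (reads the path recorded in /etc/oracle/olr.loc)
gpnptool get                # dump the GPnP profile XML
```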
     Now that you know the four important file components of the clusterware, it will be easy to understand the clusterware startup sequence.

     There are many clusterware processes; however, we will concentrate here mainly on four important components involved in clusterware startup, namely init/systemd, OHASD, OCSSD and CRSD.
1. init/systemd: Once your OS finishes the bootstrap process, the initialization daemon (init) reads /etc/inittab, which contains an entry to respawn init.ohasd; this in turn triggers the startup of OHASD.

Note: In the latest Linux distributions, init has been replaced by the systemd daemon. For more details on init and systemd, please read this very good article: The Story Behind ‘init’ and ‘systemd’: Why ‘init’ Needed to be Replaced with ‘systemd’ in Linux

2. OHASD: The root process for bringing up Oracle Clusterware. OHASD has access to the OLR stored on the local file system; the OLR provides the data needed to complete OHASD initialization.

     OHASD triggers GPnPD and CSSD. The CSSD daemon has access to the GPnP profile, from which it locates the well-known pointers to the voting file locations in the ASM disk headers; CSSD can then complete initialization and join the existing cluster.

     OHASD now starts the ASM instance, which can operate once CSSD is initialized. The ASM instance uses special code to locate the contents of the ASM SPFILE, assuming it is stored in a diskgroup.
     With the ASM instance operating and its diskgroup mounted, the clusterware OCR file becomes accessible, and hence OHASD starts CRSD. Clusterware then completes initialization and brings up the other services under its control.

Please note that ASM does not depend on the OCR or OLR being online; ASM depends on OCSSD (and thus the voting disks/files) being online.
Also note that many more processes/daemons are involved in clusterware startup and its normal operation, as displayed in the diagram below; however, we have covered only the most important components of clusterware in this article.
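To observe this sequence on a live node, you can look at the lower-stack resources that OHASD itself manages (ora.cssd, ora.asm, ora.crsd, etc.) and the overall stack status. A sketch, run from the Grid home's bin directory:

```shell
crsctl stat res -t -init    # resources started directly by OHASD
crsctl check crs            # status of the CRS, CSS and EVM daemons on this node
```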

For more information on the clusterware startup sequence, please refer to Oracle Support Doc ID 1053147.1.

Please refer to the 12c Clusterware startup sequence diagram below for more info:

Oracle RAC 12c Clusterware Startup Sequence


Tuesday, December 31, 2019

What is the Fast Sync Oracle Data Guard feature?


FAST SYNC is a new Data Guard feature introduced in Oracle 12cR1, and it requires an Oracle Active Data Guard (ADG) license for production use.

Maximum Availability mode now allows the LOG_ARCHIVE_DEST_n attributes SYNC and NOAFFIRM to be used together for the redo transport service. This enables a synchronous standby database to be deployed at a greater distance from the primary site without increasing the impact on primary DB performance.

The NOAFFIRM attribute in the LOG_ARCHIVE_DEST_n parameter instructs the standby to acknowledge receipt of redo changes without waiting for the Remote File Server (RFS) process to write them to a standby redo log (SRL).

This mode is only available in the Maximum Availability protection mode.

Example of the SYNC NOAFFIRM attributes in the LOG_ARCHIVE_DEST_n parameter on the primary DB:

SQL> show parameter log_archive_dest_2

NAME                 TYPE     VALUE
-------------------- -------- ------------------------------------------------
log_archive_dest_2   string   service="STNDBYDB", SYNC NOAFFIRM delay=0
                              optional compression=disable max_failure=0
                              max_connections=1 reopen=300
                              db_unique_name="PRIMEDB" net_timeout=30
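A minimal sketch of how such a destination could be set; the service name STNDBYDB and db_unique_name PRIMEDB are the placeholder names from the output above, so substitute your own:

```sql
-- Set SYNC NOAFFIRM (Fast Sync) on the redo transport destination
ALTER SYSTEM SET log_archive_dest_2 =
  'service="STNDBYDB" SYNC NOAFFIRM net_timeout=30 db_unique_name="PRIMEDB" valid_for=(ONLINE_LOGFILE,PRIMARY_ROLE)'
  SCOPE=BOTH;

-- Fast Sync requires the Maximum Availability protection mode
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;
```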


Monday, December 30, 2019

What is the Far Sync Oracle Data Guard feature?

Far Sync: 

Far Sync feature was  first introduced in 12c Release 1 and it requires Oracle Active Data Guard (ADG) License to use for production purpose.

An Oracle Far Sync instance is a remote log-transport proxy: a standby instance that accepts redo changes from the primary DB, writes them into standby redo logs (SRLs) at the Far Sync instance's local destination, and then ships that redo to the target standby DB. It also archives those SRLs to a local destination at the Far Sync instance.

A Far Sync instance requires only a control file and SRLs; it does not have data files, and hence you cannot open a Far Sync instance to read/write data.


It is recommended that the Far Sync instance be placed near the primary DB (roughly 1-150 miles apart) so that there is low network latency between the primary and the Far Sync instance. This helps minimize the impact on commit response time while guaranteeing higher data protection.
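As a minimal sketch of the first step in setting one up: the special Far Sync control file is created on the primary and then used to mount the Far Sync instance (the path below is a placeholder):

```sql
-- On the primary: generate the control file for the Far Sync instance
ALTER DATABASE CREATE FAR SYNC INSTANCE CONTROLFILE AS '/tmp/farsync_ctl.ctl';
```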


Tuesday, December 17, 2019

Why Extra Standby Redo Log Group is required at Oracle Standby Database?

If you create fewer standby redo log (SRL) groups than online redo log (ORL) groups, or only the same number, you may run into trouble when the primary has a high rate of redo generation, especially if the primary is a RAC DB. You should have enough SRL groups that the Network Server SYNC (NSSn) processes (used in synchronous transport for Maximum Protection/Availability modes) and Network Server ASYNC (NSAn) processes (used in Maximum Performance mode) can write redo from all of the primary's ORL groups into SRLs at the standby. The common recommendation is one more SRL group than ORL groups, per thread.

For a better understanding, consider the following scenario:

In a standalone (non-RAC) DB, suppose the primary has two ORL groups, #1 and #2, and log switches are frequent due to heavy DML activity; we want to make sure the standby can keep up with the primary. If the LGWR on the primary has just finished #1 and switched to #2, and now needs to switch back to #1 because #2 has filled up, the standby must catch up first; otherwise the primary cannot reuse its #1 because the standby is still archiving its own SRL #1. If you have an extra SRL group #3 on the standby, the standby can start using #3 while its SRL #1 is being archived. That way, the primary can reuse its #1 without delay.
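Following the scenario above, with two ORL groups on the primary you would create three SRL groups on the standby. A sketch (the paths and sizes below are placeholders; SRLs should be the same size as the ORLs):

```sql
-- Create one more SRL group than the number of ORL groups
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 3 ('/u01/oradata/srl3a.log') SIZE 500M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 4 ('/u01/oradata/srl4a.log') SIZE 500M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 5 ('/u01/oradata/srl5a.log') SIZE 500M;
```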


A nice article on the why and how of SRLs by Brian Peasland:

Wednesday, February 29, 2012

Creating SQL Profile

You can use the code below to create a SQL profile:
DECLARE
  my_task_name       VARCHAR2(30);
  my_sqltext         CLOB;
  my_sqlprofile_name VARCHAR2(30);
BEGIN
  my_sqltext := 'select * from emp where emp_id=1555';
  my_task_name := DBMS_SQLTUNE.CREATE_TUNING_TASK(
                    sql_text    => my_sqltext,
                    user_name   => 'SCOTT',
                    scope       => 'COMPREHENSIVE',
                    time_limit  => 60,
                    task_name   => 'my_sql_tuning_task',
                    description => 'Test Task to tune a query');
  -- Alternatively, use a sql_id from the shared pool; query v$sqlarea or v$sql
  -- to find the sql_id of a previously executed query:
  -- my_task_name := DBMS_SQLTUNE.CREATE_TUNING_TASK(
  --                   sql_id      => '8wvyt72dd4zur',
  --                   user_name   => 'SCOTT',
  --                   scope       => 'COMPREHENSIVE',
  --                   time_limit  => 60,
  --                   task_name   => 'my_sql_tuning_task',
  --                   description => 'Test Task to tune a query');
  DBMS_SQLTUNE.EXECUTE_TUNING_TASK(task_name => 'my_sql_tuning_task');
  my_sqlprofile_name := DBMS_SQLTUNE.ACCEPT_SQL_PROFILE(
                          task_name => 'my_sql_tuning_task',
                          name      => 'my_sql_profile');
END;
/

You can use the statement below to check the details of the tuning task:
select DBMS_SQLTUNE.report_tuning_task(task_name =>'my_sql_tuning_task') from dual;

You can use the statement below to drop the tuning task:
execute dbms_sqltune.drop_tuning_task(task_name =>'my_sql_tuning_task');

To drop the SQL profile:
execute DBMS_SQLTUNE.drop_SQL_PROFILE (name => 'my_sql_profile');
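To confirm the profile was created and is enabled, you can query the DBA_SQL_PROFILES view:

```sql
-- Check the status of the newly accepted profile
SELECT name, status, created
FROM   dba_sql_profiles
WHERE  name = 'my_sql_profile';
```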

Friday, December 16, 2011

Oracle RAC Interview Questions

Q-1 : What is the split-brain scenario?
A-1: In Oracle RAC, split-brain is the scenario in which one or more nodes update the database files without coordinating with the other nodes. In that scenario there is a high possibility of compromising database integrity and introducing corruption into the database.

Q-2: What is the role of voting disk/file in RAC?
A-2: In Oracle RAC, the voting disk/file is used to determine the state of each node in the cluster. Each node must write a heartbeat to the voting disk at a predetermined interval (i.e. 1 sec), so the other nodes in the cluster know that the node is alive. If a node cannot register its heartbeat to the voting disk within the stipulated time frame, it is fenced out of the cluster to avoid a split-brain scenario, which might introduce corruption into the database. The Oracle Cluster Synchronization Service Daemon (OCSSD) is responsible for maintaining synchronization of the cluster using the voting disk.
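The heartbeat tolerances mentioned above are configurable and can be checked with crsctl. A sketch (run from the Grid home's bin directory; the defaults are 30 seconds for the network heartbeat and 200 seconds for the disk heartbeat):

```shell
crsctl get css misscount     # network heartbeat timeout, in seconds
crsctl get css disktimeout   # voting disk I/O timeout, in seconds
```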

Monday, February 7, 2011

How to restore local OLR in Oracle 11gR2 RAC?

When you see the following errors in your OHASD log file ($CRS_HOME/log/<hostname>/ohasd/ohasd.log in 11gR2):

2009-10-16 15:02:43.664: [ default][3046311632] OHASD Daemon Starting. Command string :restart
2009-10-16 15:02:43.668: [ default][3046311632] Initializing OLR
2009-10-16 15:02:43.672: [ OCROSD][3046311632]utopen:6m':failed in stat OCR file/disk /u01/app/11.2.0/grid/cdata/server1.olr, errno=2, os err string=No such file or directory
2009-10-16 15:02:43.672: [ OCROSD][3046311632]utopen:7:failed to open any OCR file/disk, errno=2, os err string=No such file or directory
2009-10-16 15:02:43.673: [ OCRRAW][3046311632]proprinit: Could not open raw device
2009-10-16 15:02:43.673: [ OCRAPI][3046311632]a_init:16!: Backend init unsuccessful : [26]
2009-10-16 15:02:43.673: [ CRSOCR][3046311632] OCR context init failure. Error: PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
2009-10-16 15:02:43.673: [ default][3046311632] OLR initalization failured, rc=26
2009-10-16 15:02:43.674: [ default][3046311632]Created alert : (:OHAS00106:) : Failed to initialize Oracle Local Registry
2009-10-16 15:02:43.674: [ default][3046311632][PANIC] OHASD exiting; Could not init OLR
2009-10-16 15:02:43.674: [ default][3046311632] Done.

you can restore the OLR from its automatic backup, kept under $CRS_HOME/cdata/<hostname>/. With the clusterware stack down on that node, run the following as root:

cd /oracle_crs/product/11.2.0/crs_1/cdata

touch lkcme25070.olr

cd /oracle_crs/product/11.2.0/crs_1/bin

./ocrconfig -local -restore /oracle_crs/product/11.2.0/crs_1/cdata/lkcme25070/backup_20101130_154551.olr
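After the restore, you can verify the OLR and then bring the stack up on this node. A sketch (run as root from the same Grid home bin directory):

```shell
./ocrcheck -local               # verify the restored OLR integrity
./ocrconfig -local -showbackup  # list the OLR backups ocrconfig knows about
./crsctl start crs              # start clusterware on this node
```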