niedziela, 22 maja 2011

What will happen after putting a tablespace in a backup mode ?

OS: Oracle Enterprise Linux 5.4
Oracle: 10.2.0.5 (PSU 3)

What will happen after putting a tablespace in a backup mode ?
Some of the Oracle DBA believes in a myth that a datafile is "frozen" and there are no writes to the datafiles.

This is a bullshit and I will shatter this myth.

To perform a test I will create a tablespace TEST and put in a backup mode.

16:31:44 SQL> create tablespace TEST datafile '/u01/app/oracle/oradata/ORCL2/test01.dbf' size 10M;

Tablespace created.

16:32:00 SQL> alter tablespace TEST begin backup;

Tablespace altered.

I am checking in a alert log if the tablespace was put in a backup mode.

Sun May 22 16:31:51 CEST 2011
create tablespace TEST datafile '/u01/app/oracle/oradata/ORCL2/test01.dbf' size 10M
Sun May 22 16:31:51 CEST 2011
Starting control autobackup
Control autobackup written to DISK device
        handle '/u01/app/oracle/ORCL2/autobackup/2011_05_22/o1_mf_s_751825911_6xl7lqy9_.bkp'
Completed: create tablespace TEST datafile '/u01/app/oracle/oradata/ORCL2/test01.dbf' size 10M
Sun May 22 16:32:10 CEST 2011
alter tablespace TEST begin backup
Sun May 22 16:32:10 CEST 2011
Completed: alter tablespace TEST begin backup
Sun May 22 16:34:00 CEST 2011


16:32:05 SQL> select file_id, tablespace_name, file_name
from dba_data_files where tablespace_name='TEST';

FILE_ID TABLESPACE_NAME FILE_NAME
------- --------------- ----------------------------------------
      2 TEST            /u01/app/oracle/oradata/ORCL2/test01.dbf


16:32:10 SQL> select a.FILE#, a.STATUS, a.CHANGE#, a.TIME
from v$backup a, v$datafile b
where a.FILE#=b.FILE# and a.status='ACTIVE';

FILE# STATUS   CHANGE#   TIME 
----- ------- --------- ---------------------
   2  ACTIVE   13309240  2011-05-22 16:32:10

After putting the tablespace TEST in the backup mode I will create a table.

16:32:22 SQL> create table BARTEK (string varchar2(20)) tablespace TEST;

Table created.

16:32:30 SQL> select object_id from dba_objects where object_name='BARTEK';

 OBJECT_ID
----------
     57788

16:32:44 SQL> select FILE#,BLOCK#,DIRTY from v$bh where OBJD=57788;

     FILE#     BLOCK# D
---------- ---------- -
         2         11 Y
         2          9 Y
         2         10 Y

The table BARTEK is in the database buffer cache but the blocks are dirty.
I will make them not dirty by forcing a checkpoint.

16:32:57 SQL> alter system checkpoint;

System altered.

16:34:00 SQL> select FILE#,BLOCK#,DIRTY from v$bh where OBJD=57788;

     FILE#     BLOCK# D
---------- ---------- -
         2         11 N
         2          9 N
         2         10 N

Now I am inserting one row into the table BARTEK.

16:34:06 SQL> insert into BARTEK values ('Hello World !');

1 row created.

16:34:24 SQL> select dbms_rowid.rowid_relative_fno(rowid) file_num,
dbms_rowid.rowid_block_number(rowid) block_num, string 
from BARTEK ;

  FILE_NUM  BLOCK_NUM STRING
---------- ---------- --------------------
         2         15 Hello World !

16:34:43 SQL> select FILE#,BLOCK#,DIRTY 
from v$bh where OBJD=57788 and BLOCK#=15;

     FILE#     BLOCK# D
---------- ---------- -
         2         15 Y

16:35:09 SQL> commit;

Commit complete.

16:35:20 SQL> select FILE#,BLOCK#,DIRTY from v$bh where OBJD=57788 and BLOCK#=15;

     FILE#     BLOCK# D
---------- ---------- -
         2         15 Y

Even I inserted one row into the table BARTEK and committed but the block no 15 is still dirty in the database buffer cache.
So it was not written into the datafile.

osb3@ORCL2(16:36)[/u01/app/oracle/oradata/ORCL2]$ dd if=test01.dbf bs=8192 skip=15 count=1 | strings | grep 'Hello World'
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 2.3467e-05 seconds, 349 MB/s

I will make the checkpoint to write a data to the datafiles.

16:37:41 SQL> alter system checkpoint;

System altered.

16:38:08 SQL> select FILE#,BLOCK#,DIRTY from v$bh where OBJD=57788 and BLOCK#=15;

     FILE# BLOCK# D
---------- ------ -
         2     15 N

And the data were written to the datafile.

osb3@ORCL2(16:37)[/u01/app/oracle/oradata/ORCL2]$ dd if=test01.dbf bs=8192 skip=15 count=1 | strings | grep 'Hello World'
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 2.3187e-05 seconds, 353 MB/s
Hello World !

Finally I can end the backup mode.

16:40:40 SQL> alter tablespace TEST end backup;

Tablespace altered.

16:41:15 SQL> select a.FILE#, b.name, a.STATUS, a.CHANGE#, a.TIME
from v$backup a, v$datafile b
where a.FILE#=b.FILE# and a.status='ACTIVE';

no rows selected

16:41:22 SQL> select * from v$backup;

     FILE# STATUS                CHANGE# TIME
---------- ------------------ ---------- -------------------
         1 NOT ACTIVE                  0
         2 NOT ACTIVE           13309240 2011-05-22 16:32:10
         3 NOT ACTIVE                  0
         4 NOT ACTIVE                  0
         5 NOT ACTIVE                  0
         6 NOT ACTIVE                  0
        10 NOT ACTIVE                  0

7 rows selected.

16:43:14 SQL> select FILE#, CHECKPOINT_CHANGE#,CHECKPOINT_TIME from  v$datafile;

FILE# CHECKPOINT_CHANGE# CHECKPOINT_TIME
----- ------------------ -------------------
1 13309523         2011-05-22 16:40:36
2 13309523         2011-05-22 16:40:36
3 13309523         2011-05-22 16:40:36
4 13309523          2011-05-22 16:40:36
5 13309523         2011-05-22 16:40:36
6 13309523         2011-05-22 16:40:36
10 13309523         2011-05-22 16:40:36

7 rows selected.

Sun May 22 16:41:15 CEST 2011
alter tablespace test end backup
Sun May 22 16:41:15 CEST 2011
Completed: alter tablespace test end backup
Sun May 22 16:44:36 CEST 2011

My test was based on the very good article under the website

piątek, 20 maja 2011

When a metalink lies about bug fixes

OS: Oracle Enterprise Linux 5.4
Oracle: 10.2.0.5


I tried to run RMAN backup using the following command

osb3@ORCL2(09:47)[/home/oracle]$ rman target /

Recovery Manager: Release 10.2.0.5.0 - Production on Fri May 20 09:47:38 2011

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database: ORCL2 (DBID=3949336569)

RMAN> run {
backup database include current controlfile TAG FULLY
SKIP READONLY
SKIP OFFLINE;
}2> 3> 4> 5>

Starting backup at 2011-05-20 09:48:55
using channel ORA_DISK_1
skipping read-only file 5
skipping offline file 6
RMAN-06060: WARNING: skipping datafile compromises tablespace SOE_INDEX recoverability
RMAN-06060: WARNING: skipping datafile compromises tablespace SOE_INDEX recoverability
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00010 name=/u01/app/oracle/oradata/ORCL2/undotbs.dbf
input datafile fno=00001 name=/u01/app/oracle/oradata/ORCL2/system01.dbf
input datafile fno=00004 name=/u01/app/oracle/oradata/ORCL2/users01.dbf
input datafile fno=00003 name=/u01/app/oracle/oradata/ORCL2/sysaux01.dbf
channel ORA_DISK_1: starting piece 1 at 2011-05-20 09:48:55
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 05/20/2011 09:49:30
ORA-19694: some changed blocks were not found in the change tracking file
continuing other job steps, job failed will not be re-run
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
including current control file in backupset
channel ORA_DISK_1: starting piece 1 at 2011-05-20 09:49:30
channel ORA_DISK_1: finished piece 1 at 2011-05-20 09:49:31
piece handle=/u01/app/oracle/ORCL2/backupset/2011_05_20/o1_mf_ncnnf_FULLY_6xd78bsv_.bkp tag=FULLY comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================

RMAN-03009: failure of backup command on ORA_DISK_1 channel at 05/20/2011 09:49:30
ORA-19694: some changed blocks were not found in the change tracking file


The block change tracking is enabled.

SQL> select*from v$block_change_tracking;

STATUS    FILENAME                                            BYTES
---------------------------------------------------------------------
ENABLED   /u01/app/oracle/ORCL2/block_change_tracking.trc     11599872


On the metalink I found the note:
BLOCK CHANGE TRACKING GIVES ORA-19694 WHEN USING DIRECT LOAD [ID 300989.1]

It is written:


Likelihood of Occurrence 
~~~~~~~~~~~~~~~~~~~~~~~~ 
  This will affect those using SQL Direct Load and RMAN Fast Incremental
  Backups with Change Tracking enabled.
 
Possible Symptoms 
~~~~~~~~~~~~~~~~~ 
  RMAN-571: ===========================================================
  RMAN-569: =============== ERROR MESSAGE STACK FOLLOWS ===============
  RMAN-571: ===========================================================
  RMAN-3009: failure of backup command on ORA_DISK_1 channel at 02/14/2005
  ORA-19694: some changed blocks were not found in the change tracking file

When this error occurs the old incremental backups are suspect and should
not be kept/used for recovery. The incremental level 0 backups are not
affected as that is like a full backup and reads all blocks.
 
Workaround 
~~~~~~~~~~
  Disable Change Tracking

To disable change tracking, use this SQL statement:

  SQL> ALTER DATABASE DISABLE BLOCK CHANGE TRACKING;

Patches
~~~~~~~
  
  Fixed in Patchset10.1.0.5, and one-off patches are available 
  for 10.1.0.4 and 10.1.0.3.


I have 10.2.0.5 version with PSU 2.

SQL> select * from registry$history;

ACTION_TIME       ACTION NAMESPACE VERSION  ID COMMENTS       BUNDLE_SERIES
----------------- ------ --------- -------- -- -------------- -------------
11/02/03 08:37:29 APPLY  SERVER    10.2.0.5 2  PSU 10.2.0.5.2 PSU


osb3@ORCL2(12:07)[/u01/app/oracle/product/10.2.0.5/OPatch]$ ./opatch lsinventory
Invoking OPatch 10.2.0.5.1

Oracle Interim Patch Installer version 10.2.0.5.1
Copyright (c) 2010, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/10.2.0.5
Central Inventory : /u01/app/oracle/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 10.2.0.5.1
OUI version       : 10.2.0.5.0
OUI location      : /u01/app/oracle/product/10.2.0.5/oui
Log file location : /u01/app/oracle/product/10.2.0.5/cfgtoollogs/opatch/opatch2011-05-20_12-07-59PM.log

Patch history file: /u01/app/oracle/product/10.2.0.5/cfgtoollogs/opatch/opatch_history.txt

Lsinventory Output file location : /u01/app/oracle/product/10.2.0.5/cfgtoollogs/opatch/lsinv/lsinventory2011-05-20_12-07-59PM.txt

--------------------------------------------------------------------------
Installed Top-level Products (2):

Oracle Database 10g                                                  10.2.0.1.0
Oracle Database 10g Release 2 Patch Set 4                           10.2.0.5.0a
There are 2 products installed in this Oracle Home.


Interim patches (1) :

Patch  10248542     : applied on Thu Feb 03 08:35:01 CET 2011
Unique Patch ID:  13098962
   Created on 28 Dec 2010, 00:46:13 hrs PST8PDT
   Bugs fixed:
     6402302, 9713537, 8350262, 9949948, 8394351, 10327179, 8546356, 9711859
     9714832, 9952230, 10248542, 8544696, 9963497, 9772888, 8664189, 10249537
     7519406, 9952270, 8277300, 9726739


Of course the workaround works. The backup runs fine when I disabled the block change tracking.
But the bug should be fixed in 10.2.0.5
Once again, don`t trust what Oracle guys write in the Metalink notes.

wtorek, 17 maja 2011

When a process is consuming a CPU

OS: AIX 5300-06
Oracle: 9.2.0.8


I received an alert from the monitoring tool that an oracle process is consuming too much CPU.
First of all quick view on topas.


The monitoring tool is complaining that the monitoring process is consuming too much of CPU. Great.
Let`s check this process.

SQL> select vs.username, vs.program, vs.status,
     to_char(vs.logon_time,'YYYY.MM.DD HH24:MI:SS') LOGON_TIME,
     vs.sid, vs.serial#, vp.spid bg_pid
     from v$session vs, v$process vp
     where vs.paddr(+) = vp.addr and vp.spid=1495068;

USERNAME   PROGRAM          STATUS   LOGON_TIME          SID   SERIAL# BG_PID
---------- ---------------- -------- ------------------- ---- ---------- -----
TIVOLI     /opt/IBM/ITM//ai ACTIVE   2011.03.30 18:21:51  26   51336  1495068
           x536/bin/kor09co
           l@xxx

SQL> SELECT a.INVALIDATIONS,a.SQL_TEXT
     FROM v$sqlarea a, v$session b
     WHERE a.ADDRESS=b.SQL_ADDRESS AND b.serial#=51336;

SQL_TEXT
-----------------------------------------------------------------------
SELECT sess.PADDR PADDR, COUNT(op.HASH_VALUE) OPENCURS from v$open_cursor op, 
v$session sess where sess.SADDR = op.SADDR(+) group by sess.PADDR

sys@XXX> @spid_info.sql

PL/SQL procedure successfully completed.

Enter the OS process id: 1495068
paddr     : 070000002C6C7CB8
SID       : 26
Username  : TIVOLI
Machine   : XXX
Program   : /opt/IBM/ITM//aix536/bin/kor09col@XXX
Status    : ACTIVE
OS User   : itmora
Logged On : 30-03-2011 18:21:51
Waiting   : SQL*Net message to client 214965 secs, WAITED KNOWN TIME driver
id=1650815232, #bytes=1, =0
Event     : SQL*Net message from client, waited 354933526 secs
Event     : db file sequential read, waited 802647 secs
Event     : latch free, waited 49019 secs
Event     : control file sequential read, waited 3104 secs
Event     : buffer busy waits, waited 1044 secs
Event     : SQL*Net message to client, waited 619 secs
Event     : SQL*Net more data to client, waited 38 secs
Log read% : 32
Cur SQL   :
------------
SQL Address   : 070000002F170D60
SQL Hash value: 4272527880
------------------------------------------
SELECT sess.PADDR               PADDR,   COUNT(op.HASH_VALUE)
OPENCURS from v$open_cursor op, v$session sess where sess.SADD
R = op.SADDR(+) group by sess.PADDR

I will try to trace this session

sys@XXX> exec dbms_system.set_ev(26,51336,10046,12,'');
PL/SQL procedure successfully completed.

sys@XXX> exec dbms_system.set_sql_trace_in_session(sid=>26,serial#=>51336,sql_trace=>true); 
PL/SQL procedure successfully completed.

sys@XXX> exec dbms_system.set_ev(si=>26,se=>51336,ev=>10046, le=>8, nm=>' ');
PL/SQL procedure successfully completed.

sys@XXX> exec dbms_system.set_ev(si=>26,se=>51336,ev=>10046, le=>0, nm=>' ');
PL/SQL procedure successfully completed.

After execute every of these packages I checked udump directory but I didn`t find any trace file.
Last shot

sys@XXX> oradebug setospid 1495068

Oracle pid: 17, Unix process pid: 1495068, image: oracle@XXX (TNS V1-V3)
sys@XXX>  oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 8;
...


And the session is hanging. Now I have two 2 processed which are high consuming CPU.
The session which was trying to trace another one was hanging.

sys@aresmf4> @spid_info.sql
Enter the OS process id: 819210
paddr     : 070000002C6C9CC8
SID       : 24
Username  : SYS
Machine   : XXX
Program   : sqlplus@XXX(TNS V1-V3)
Status    : INACTIVE
OS User   : oraaesmf
Logged On : 16-05-2011 15:44:02
Waiting   : ksdxexeother 33 secs, WAITED KNOWN TIME =0, =0, =0
Event     : SQL*Net message from client, waited 6302 secs
Event     : ksdxexeother, waited 215 secs
Log read% : 0
Cur SQL   :
------------
SQL Address   : 070000002F5D6000
SQL Hash value: 2346790304
------------------------------------------
select lower(user)||'@'||lower(global_name) id from global_name


Even truss didn`t help so much.

oraaesmf[XXX](/home/oraaesmf)$ truss -aef -o bartek.truss -p 819210

+1  819210: psargs: oraclearesmf4 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
+2  819210: yield()                                         =
+3  819210: kill(1495068, 0)                                = 0
...
+60171  819210: kill(1495068, 0)                                = 0
+60172  819210: getprocs(0x0FFFFFFFFFFF9018, 1536, 0x0000000000000000, 0, 0x0FFFFFFFFFFF9010, 1) = 1
+60173  819210: yield()   


Let`s try to kill the session

sys@XXX> alter system kill session '26,51336';

ERROR at line 1:
ORA-00031: session marked for kill

USERNAME          SID SERIAL# LOGON_TIME          PROGRAM              SPID    STATUS   OSUSER
--------------- ----- ------- ------------------- -------------------- ------- -------- ------
TIVOLI             26   51336 2011-03-30 18:21:51 /opt/IBM/ITM//aix536 1495068 KILLED   itmora
                                                  /bin/kor09col@XXX



And the process with PID 1495068 was still consuming the CPU.
Finally an Unix guy had to kill it.

Probably we are hitting the Oracle bug described in the notes

Killed Session Hung Consuming 100% CPU [ID 458881.1]
Session Marked As Killed But Continues To Consume 100% CPU [ID 752676.1]


Next time I will try to investigate a little bit deeper.