zielinski.bartosz oracle dba: maja 2011

niedziela, 22 maja 2011

What will happen after putting a tablespace in a backup mode ?

OS: Oracle Enterprise Linux 5.4
Oracle: 10.2.0.5 (PSU 3)

What will happen after putting a tablespace in a backup mode ?
Some of the Oracle DBA believes in a myth that a datafile is "frozen" and there are no writes to the datafiles.

This is a bullshit and I will shatter this myth.

To perform a test I will create a tablespace TEST and put in a backup mode.

16:31:44 SQL> create tablespace TEST datafile '/u01/app/oracle/oradata/ORCL2/test01.dbf' size 10M;

Tablespace created.

16:32:00 SQL> alter tablespace TEST begin backup;

Tablespace altered.

I am checking in a alert log if the tablespace was put in a backup mode.

Sun May 22 16:31:51 CEST 2011

create tablespace TEST datafile '/u01/app/oracle/oradata/ORCL2/test01.dbf' size 10M

Sun May 22 16:31:51 CEST 2011

Starting control autobackup

Control autobackup written to DISK device

handle '/u01/app/oracle/ORCL2/autobackup/2011_05_22/o1_mf_s_751825911_6xl7lqy9_.bkp'

Completed: create tablespace TEST datafile '/u01/app/oracle/oradata/ORCL2/test01.dbf' size 10M

Sun May 22 16:32:10 CEST 2011

alter tablespace TEST begin backup

Sun May 22 16:32:10 CEST 2011

Completed: alter tablespace TEST begin backup

Sun May 22 16:34:00 CEST 2011

16:32:05 SQL> select file_id, tablespace_name, file_name
from dba_data_files where tablespace_name='TEST';

FILE_ID TABLESPACE_NAME FILE_NAME
------- --------------- ----------------------------------------
2 TEST /u01/app/oracle/oradata/ORCL2/test01.dbf

16:32:10 SQL> select a.FILE#, a.STATUS, a.CHANGE#, a.TIME

from v$backup a, v$datafile b

where a.FILE#=b.FILE# and a.status='ACTIVE';

FILE# STATUS CHANGE# TIME

----- ------- --------- ---------------------

2 ACTIVE 13309240 2011-05-22 16:32:10

After putting the tablespace TEST in the backup mode I will create a table.

16:32:22 SQL> create table BARTEK (string varchar2(20)) tablespace TEST;

Table created.

16:32:30 SQL> select object_id from dba_objects where object_name='BARTEK';

OBJECT_ID

----------

57788

16:32:44 SQL> select FILE#,BLOCK#,DIRTY from v$bh where OBJD=57788;

FILE# BLOCK# D

---------- ---------- -

2 11 Y

2 9 Y

2 10 Y

The table BARTEK is in the database buffer cache but the blocks are dirty.
I will make them not dirty by forcing a checkpoint.

16:32:57 SQL> alter system checkpoint;

System altered.

16:34:00 SQL> select FILE#,BLOCK#,DIRTY from v$bh where OBJD=57788;

FILE# BLOCK# D

---------- ---------- -

2 11 N

2 9 N

2 10 N

Now I am inserting one row into the table BARTEK.

16:34:06 SQL> insert into BARTEK values ('Hello World !');

1 row created.

16:34:24 SQL> select dbms_rowid.rowid_relative_fno(rowid) file_num,

dbms_rowid.rowid_block_number(rowid) block_num, string

from BARTEK ;

FILE_NUM BLOCK_NUM STRING

---------- ---------- --------------------

2 15 Hello World !

16:34:43 SQL> select FILE#,BLOCK#,DIRTY

from v$bh where OBJD=57788 and BLOCK#=15;

FILE# BLOCK# D

---------- ---------- -

2 15 Y

16:35:09 SQL> commit;

Commit complete.

16:35:20 SQL> select FILE#,BLOCK#,DIRTY from v$bh where OBJD=57788 and BLOCK#=15;

FILE# BLOCK# D

---------- ---------- -

2 15 Y

Even I inserted one row into the table BARTEK and committed but the block no 15 is still dirty in the database buffer cache.

So it was not written into the datafile.

osb3@ORCL2(16:36)[/u01/app/oracle/oradata/ORCL2]$ dd if=test01.dbf bs=8192 skip=15 count=1 | strings | grep 'Hello World'

1+0 records in

1+0 records out

8192 bytes (8.2 kB) copied, 2.3467e-05 seconds, 349 MB/s

I will make the checkpoint to write a data to the datafiles.

16:37:41 SQL> alter system checkpoint;

System altered.

16:38:08 SQL> select FILE#,BLOCK#,DIRTY from v$bh where OBJD=57788 and BLOCK#=15;

FILE# BLOCK# D

---------- ------ -

2 15 N

And the data were written to the datafile.

osb3@ORCL2(16:37)[/u01/app/oracle/oradata/ORCL2]$ dd if=test01.dbf bs=8192 skip=15 count=1 | strings | grep 'Hello World'

1+0 records in

1+0 records out

8192 bytes (8.2 kB) copied, 2.3187e-05 seconds, 353 MB/s

Hello World !

Finally I can end the backup mode.

16:40:40 SQL> alter tablespace TEST end backup;

Tablespace altered.

16:41:15 SQL> select a.FILE#, b.name, a.STATUS, a.CHANGE#, a.TIME

from v$backup a, v$datafile b

where a.FILE#=b.FILE# and a.status='ACTIVE';

no rows selected

16:41:22 SQL> select * from v$backup;

FILE# STATUS CHANGE# TIME

---------- ------------------ ---------- -------------------

1 NOT ACTIVE 0

2 NOT ACTIVE 13309240 2011-05-22 16:32:10

3 NOT ACTIVE 0

4 NOT ACTIVE 0

5 NOT ACTIVE 0

6 NOT ACTIVE 0

10 NOT ACTIVE 0

7 rows selected.

16:43:14 SQL> select FILE#, CHECKPOINT_CHANGE#,CHECKPOINT_TIME from v$datafile;

FILE# CHECKPOINT_CHANGE# CHECKPOINT_TIME

----- ------------------ -------------------

1 13309523 2011-05-22 16:40:36

2 13309523 2011-05-22 16:40:36

3 13309523 2011-05-22 16:40:36

4 13309523 2011-05-22 16:40:36

5 13309523 2011-05-22 16:40:36

6 13309523 2011-05-22 16:40:36

10 13309523 2011-05-22 16:40:36

7 rows selected.

Sun May 22 16:41:15 CEST 2011

alter tablespace test end backup

Sun May 22 16:41:15 CEST 2011

Completed: alter tablespace test end backup

Sun May 22 16:44:36 CEST 2011

My test was based on the very good article under the website

http://www.bluegecko.net/oracle/oracle-tablespace-hot-backup-mode-revisited/

piątek, 20 maja 2011

When a metalink lies about bug fixes

OS: Oracle Enterprise Linux 5.4
Oracle: 10.2.0.5

I tried to run RMAN backup using the following command

osb3@ORCL2(09:47)[/home/oracle]$ rman target /

Recovery Manager: Release 10.2.0.5.0 - Production on Fri May 20 09:47:38 2011

connected to target database: ORCL2 (DBID=3949336569)

RMAN> run {

backup database include current controlfile TAG FULLY

SKIP READONLY

SKIP OFFLINE;

}2> 3> 4> 5>

Starting backup at 2011-05-20 09:48:55

using channel ORA_DISK_1

skipping read-only file 5

skipping offline file 6

RMAN-06060: WARNING: skipping datafile compromises tablespace SOE_INDEX recoverability

channel ORA_DISK_1: starting full datafile backupset

channel ORA_DISK_1: specifying datafile(s) in backupset

input datafile fno=00010 name=/u01/app/oracle/oradata/ORCL2/undotbs.dbf

input datafile fno=00001 name=/u01/app/oracle/oradata/ORCL2/system01.dbf

input datafile fno=00004 name=/u01/app/oracle/oradata/ORCL2/users01.dbf

input datafile fno=00003 name=/u01/app/oracle/oradata/ORCL2/sysaux01.dbf

channel ORA_DISK_1: starting piece 1 at 2011-05-20 09:48:55

RMAN-03009: failure of backup command on ORA_DISK_1 channel at 05/20/2011 09:49:30

ORA-19694: some changed blocks were not found in the change tracking file

continuing other job steps, job failed will not be re-run

channel ORA_DISK_1: starting full datafile backupset

channel ORA_DISK_1: specifying datafile(s) in backupset

including current control file in backupset

channel ORA_DISK_1: starting piece 1 at 2011-05-20 09:49:30

channel ORA_DISK_1: finished piece 1 at 2011-05-20 09:49:31

piece handle=/u01/app/oracle/ORCL2/backupset/2011_05_20/o1_mf_ncnnf_FULLY_6xd78bsv_.bkp tag=FULLY comment=NONE

channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01

RMAN-00571: ===========================================================

RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============

RMAN-00571: ===========================================================

RMAN-03009: failure of backup command on ORA_DISK_1 channel at 05/20/2011 09:49:30

ORA-19694: some changed blocks were not found in the change tracking file

The block change tracking is enabled.

SQL> select*from v$block_change_tracking;

STATUS FILENAME BYTES

---------------------------------------------------------------------

ENABLED /u01/app/oracle/ORCL2/block_change_tracking.trc 11599872

On the metalink I found the note:

BLOCK CHANGE TRACKING GIVES ORA-19694 WHEN USING DIRECT LOAD [ID 300989.1]

It is written:

Likelihood of Occurrence

~~~~~~~~~~~~~~~~~~~~~~~~ 
  This will affect those using SQL Direct Load and RMAN Fast Incremental
  Backups with Change Tracking enabled.
 
Possible Symptoms 
~~~~~~~~~~~~~~~~~ 
  RMAN-571: ===========================================================
  RMAN-569: =============== ERROR MESSAGE STACK FOLLOWS ===============
  RMAN-571: ===========================================================
  RMAN-3009: failure of backup command on ORA_DISK_1 channel at 02/14/2005
  ORA-19694: some changed blocks were not found in the change tracking file

When this error occurs the old incremental backups are suspect and should
not be kept/used for recovery. The incremental level 0 backups are not
affected as that is like a full backup and reads all blocks.
 
Workaround 
~~~~~~~~~~
  Disable Change Tracking

To disable change tracking, use this SQL statement:

  SQL> ALTER DATABASE DISABLE BLOCK CHANGE TRACKING;

Patches
~~~~~~~
  
  Fixed in Patchset10.1.0.5, and one-off patches are available 
  for 10.1.0.4 and 10.1.0.3.

I have 10.2.0.5 version with PSU 2.

SQL> select * from registry$history;

ACTION_TIME ACTION NAMESPACE VERSION ID COMMENTS BUNDLE_SERIES

----------------- ------ --------- -------- -- -------------- -------------

11/02/03 08:37:29 APPLY SERVER 10.2.0.5 2 PSU 10.2.0.5.2 PSU

osb3@ORCL2(12:07)[/u01/app/oracle/product/10.2.0.5/OPatch]$ ./opatch lsinventory

Invoking OPatch 10.2.0.5.1

Oracle Interim Patch Installer version 10.2.0.5.1

Oracle Home : /u01/app/oracle/product/10.2.0.5

Central Inventory : /u01/app/oracle/oraInventory

from : /etc/oraInst.loc

OPatch version : 10.2.0.5.1

OUI version : 10.2.0.5.0

OUI location : /u01/app/oracle/product/10.2.0.5/oui

Log file location : /u01/app/oracle/product/10.2.0.5/cfgtoollogs/opatch/opatch2011-05-20_12-07-59PM.log

Patch history file: /u01/app/oracle/product/10.2.0.5/cfgtoollogs/opatch/opatch_history.txt

Lsinventory Output file location : /u01/app/oracle/product/10.2.0.5/cfgtoollogs/opatch/lsinv/lsinventory2011-05-20_12-07-59PM.txt

--------------------------------------------------------------------------

Installed Top-level Products (2):

Oracle Database 10g 10.2.0.1.0

Oracle Database 10g Release 2 Patch Set 4 10.2.0.5.0a

There are 2 products installed in this Oracle Home.

Interim patches (1) :

Patch 10248542 : applied on Thu Feb 03 08:35:01 CET 2011

Unique Patch ID: 13098962

Created on 28 Dec 2010, 00:46:13 hrs PST8PDT

Bugs fixed:

6402302, 9713537, 8350262, 9949948, 8394351, 10327179, 8546356, 9711859

9714832, 9952230, 10248542, 8544696, 9963497, 9772888, 8664189, 10249537

7519406, 9952270, 8277300, 9726739

Of course the workaround works. The backup runs fine when I disabled the block change tracking.
But the bug should be fixed in 10.2.0.5
Once again, don`t trust what Oracle guys write in the Metalink notes.

wtorek, 17 maja 2011

When a process is consuming a CPU

OS: AIX 5300-06
Oracle: 9.2.0.8

I received an alert from the monitoring tool that an oracle process is consuming too much CPU.
First of all quick view on topas.

The monitoring tool is complaining that the monitoring process is consuming too much of CPU. Great.
Let`s check this process.

SQL> select vs.username, vs.program, vs.status,

to_char(vs.logon_time,'YYYY.MM.DD HH24:MI:SS') LOGON_TIME,

vs.sid, vs.serial#, vp.spid bg_pid

from v$session vs, v$process vp

where vs.paddr(+) = vp.addr and vp.spid=1495068;

USERNAME PROGRAM STATUS LOGON_TIME SID SERIAL# BG_PID

---------- ---------------- -------- ------------------- ---- ---------- -----

TIVOLI /opt/IBM/ITM//ai ACTIVE 2011.03.30 18:21:51 26 51336 1495068

x536/bin/kor09co

l@xxx

SQL> SELECT a.INVALIDATIONS,a.SQL_TEXT

FROM v$sqlarea a, v$session b

WHERE a.ADDRESS=b.SQL_ADDRESS AND b.serial#=51336;

SQL_TEXT

-----------------------------------------------------------------------

SELECT sess.PADDR PADDR, COUNT(op.HASH_VALUE) OPENCURS from v$open_cursor op,

v$session sess where sess.SADDR = op.SADDR(+) group by sess.PADDR

sys@XXX> @spid_info.sql

PL/SQL procedure successfully completed.

Enter the OS process id: 1495068

paddr : 070000002C6C7CB8

SID : 26

Username : TIVOLI

Machine : XXX

Program : /opt/IBM/ITM//aix536/bin/kor09col@XXX

Status : ACTIVE

OS User : itmora

Logged On : 30-03-2011 18:21:51

Waiting : SQL*Net message to client 214965 secs, WAITED KNOWN TIME driver

id=1650815232, #bytes=1, =0

Event : SQL*Net message from client, waited 354933526 secs

Event : db file sequential read, waited 802647 secs

Event : latch free, waited 49019 secs

Event : control file sequential read, waited 3104 secs

Event : buffer busy waits, waited 1044 secs

Event : SQL*Net message to client, waited 619 secs

Event : SQL*Net more data to client, waited 38 secs

Log read% : 32

Cur SQL :

------------

SQL Address : 070000002F170D60

SQL Hash value: 4272527880

------------------------------------------

SELECT sess.PADDR PADDR, COUNT(op.HASH_VALUE)

OPENCURS from v$open_cursor op, v$session sess where sess.SADD

R = op.SADDR(+) group by sess.PADDR

I will try to trace this session

sys@XXX> exec dbms_system.set_ev(26,51336,10046,12,'');

PL/SQL procedure successfully completed.

sys@XXX> exec dbms_system.set_sql_trace_in_session(sid=>26,serial#=>51336,sql_trace=>true);

PL/SQL procedure successfully completed.

sys@XXX> exec dbms_system.set_ev(si=>26,se=>51336,ev=>10046, le=>8, nm=>' ');

PL/SQL procedure successfully completed.

sys@XXX> exec dbms_system.set_ev(si=>26,se=>51336,ev=>10046, le=>0, nm=>' ');

PL/SQL procedure successfully completed.

After execute every of these packages I checked udump directory but I didn`t find any trace file.
Last shot

sys@XXX> oradebug setospid 1495068

Oracle pid: 17, Unix process pid: 1495068, image: oracle@XXX (TNS V1-V3)

sys@XXX> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 8;

...

And the session is hanging. Now I have two 2 processed which are high consuming CPU.

The session which was trying to trace another one was hanging.

sys@aresmf4> @spid_info.sql

Enter the OS process id: 819210

paddr : 070000002C6C9CC8

SID : 24

Username : SYS

Machine : XXX

Program : sqlplus@XXX(TNS V1-V3)

Status : INACTIVE

OS User : oraaesmf

Logged On : 16-05-2011 15:44:02

Waiting : ksdxexeother 33 secs, WAITED KNOWN TIME =0, =0, =0

Event : SQL*Net message from client, waited 6302 secs

Event : ksdxexeother, waited 215 secs

Log read% : 0

Cur SQL :

------------

SQL Address : 070000002F5D6000

SQL Hash value: 2346790304

------------------------------------------

select lower(user)||'@'||lower(global_name) id from global_name

Even truss didn`t help so much.

oraaesmf[XXX](/home/oraaesmf)$ truss -aef -o bartek.truss -p 819210

+1 819210: psargs: oraclearesmf4 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

+2 819210: yield() =

+3 819210: kill(1495068, 0) = 0

...

+60171 819210: kill(1495068, 0) = 0

+60172 819210: getprocs(0x0FFFFFFFFFFF9018, 1536, 0x0000000000000000, 0, 0x0FFFFFFFFFFF9010, 1) = 1

+60173 819210: yield()

Let`s try to kill the session

sys@XXX> alter system kill session '26,51336';

ERROR at line 1:

ORA-00031: session marked for kill

USERNAME SID SERIAL# LOGON_TIME PROGRAM SPID STATUS OSUSER

--------------- ----- ------- ------------------- -------------------- ------- -------- ------

TIVOLI 26 51336 2011-03-30 18:21:51 /opt/IBM/ITM//aix536 1495068 KILLED itmora

/bin/kor09col@XXX

And the process with PID 1495068 was still consuming the CPU.
Finally an Unix guy had to kill it.

Probably we are hitting the Oracle bug described in the notes

Killed Session Hung Consuming 100% CPU [ID 458881.1]
Session Marked As Killed But Continues To Consume 100% CPU [ID 752676.1]

Next time I will try to investigate a little bit deeper.