piątek, 19 sierpnia 2011

CTWR Consuming Lots Of Cpu

OS: Red Hat Enterprise Linux AS release 4 (Nahant Update 5) 
Oracle: 11.2.0.2

I received an alert that CPU utilization is high on one of the server.


First look at top command.






oracle@asrwed003(/home/oracle)$ ps auxw | grep 4149 | grep -v grep
oracle    4149 84.9  0.2 2525648 18540 ?     Rs   Aug16 3366:47 ora_ctwr_emrep



CTWR process is the block change tracking one.
Let`s search on the Metalink.
Bingo.

Bug 10170431 - CTWR consuming lots of CPU cycles with block tracking (Doc ID 10170431.8)

CTWR Consuming Lots Of Cpu Cycles With Block Tracking (Doc ID 1310304.1)

There are solution to apply the patch but it requires a downtime.
I use a workaround: disable block change tracking.

Let`s check the session on the database using SQL*Plus.


SQL> select * from v$block_change_tracking;

STATUS FILENAME  BYTES
----------
----------------------------------------------------------------------------------------------------
ENABLED /opt/oracle/admin/emrep/bct/blockchange.track   11599872

SQL> alter database disable block change tracking;

Database altered. 



After that CPU utilization came back to normal.



wtorek, 2 sierpnia 2011

SQL from the hell

OS: AIX 5300-09
Oracle: 11.2.0.2

During  some performance research on one of the database I discovered SQL from hell.


USERNAME               SID     SERIAL SQL_ID        EVENT                                  P1         P2         P3 SECONDS_IN_WAIT
--------------- ---------- ---------- ------------- ------------------------------ ---------- ---------- ---------- -------------------------------------------
SCALASIGNAL           1728      31921 cd1wv1fyuhbn0 db file sequential read               440     209309          1            1808 


SQL> select SQL_FULLTEXT from v$sqlarea where sql_id='cd1wv1fyuhbn0';

 /* 1049 */
SELECT /*+ first_rows */ DISTINCT IKV_ID, IKV_ID, WAARDE, DATAANVTV, TS_MUT, HIS_TS_END, 'IKT', null
FROM (SELECT /*+ ordered use_nl(tr ikt tab ikv ikp iko ikpt ivah ivph ikvh ikgh srg srgt)  */ ikt.ikv_id
,ikt.dataanvtv
,ikt.ts_mut
,NVL(ikt.ts_ovr, to_timestamp('99991231','yyyymmdd')) his_ts_end
, s.sect || ';' || s.risgrp || ';' || s.dataanv waarde
FROM   np_sector_risicogroep s
      ,np_inkomstenverhouding i
      ,la_ikv_transactie ikt
      ,la_tijdvakaangifte_ontvangst ta
      ,np_aangiftetijdvak a
      ,np_administratieve_eenheid e
      ,np_loonaangifte_bericht b
WHERE  ikt.ikv_id = i.ikv_id_core
AND    ikt.ikv_type <> 'I'
AND    i.ikv_id = s.ikv_id
AND    s.cdverw = 3
AND    i.atv_id = a.atv_id
AND    a.aeh_id = e.aeh_id
AND    a.atv_type = ta.tva_type
AND    b.ber_id = e.ber_id
AND    b.beridbd_ref = ta.beridbd
AND    b.berdlnr_ref = ta.berdlnr
AND    ikt.tvao_id = ta.tvao_id
AND    ikt.dataanvtv = TO_NUMBER(REPLACE(a.dataanvtv, '-')))
WHERE 1=1  AND TS_MUT > to_timestamp('2011-07-31 13:31:49','yyyy-mm-ddhh24:mi:ss')    
AND DATAANVTV BETWEEN 20070101 AND 99991230
ORDER BY null, 1,2;



Let`s have a look for an explain plan.

SQL> select * from TABLE(dbms_xplan.display_cursor('cd1wv1fyuhbn0', NULL, NULL);




WTF! It looks like never ending SQL.


First let`s remove these 2 hints in the SQL: /*+ first_rows */ and
/*+ ordered use_nl(tr ikt tab ikv ikp iko ikpt ivah ivph ikvh ikgh srg srgt)  */



SQL> alter session set current_schema=SCALASIGNAL;

SQL> explain plan for
    /* 1049 */
    SELECT DISTINCT IKV_ID, IKV_ID, WAARDE, DATAANVTV, TS_MUT, HIS_TS_END, 'IKT', null
    FROM (SELECT
    ikt.ikv_id
    ,ikt.dataanvtv
    ,ikt.ts_mut
    ,NVL(ikt.ts_ovr, to_timestamp('99991231','yyyymmdd')) his_ts_end
    , s.sect || ';' || s.risgrp || ';' || s.dataanv waarde
   FROM   np_sector_risicogroep s
   ,np_inkomstenverhouding i
   ,la_ikv_transactie ikt
   ,la_tijdvakaangifte_ontvangst ta
    ,np_aangiftetijdvak a
    ,np_administratieve_eenheid e
    ,np_loonaangifte_bericht b
    WHERE  ikt.ikv_id = i.ikv_id_core
    AND    ikt.ikv_type <> 'I'
    AND    i.ikv_id = s.ikv_id
    AND    s.cdverw = 3
    AND    i.atv_id = a.atv_id
    AND    a.aeh_id = e.aeh_id
   AND    a.atv_type = ta.tva_type
   AND    b.ber_id = e.ber_id
   AND    b.beridbd_ref = ta.beridbd
   AND    b.berdlnr_ref = ta.berdlnr
   AND    ikt.tvao_id = ta.tvao_id
   AND    ikt.dataanvtv = TO_NUMBER(REPLACE(a.dataanvtv, '-')))
   WHERE 1=1  AND TS_MUT > to_timestamp('2011-07-31 13:31:49','yyyy-mm-dd hh24:mi:ss')      AND DATAANVTV BETWEEN 20070101 AND 99991230
ORDER BY null, 1,2;
 

Explained.

sys@pppls> SELECT PLAN_TABLE_OUTPUT FROM
TABLE(DBMS_XPLAN.DISPLAY('SYS.PLAN_TABLE',NULL,'ALL'))

 
What a difference. Of course never rely on time in explain plan because Oracle optimizer cannot estimate the real execution time.
It is better to look at Cost (%CPU).

Lesson learn: don`t use hints in the SQL when it is not necessary.

poniedziałek, 11 lipca 2011

When a process is consuming a lot of memory

OS: AIX 5300-09
Oracle: 11.2.0.2


I saw by topas command a strange situation on one of the server.



OK, let`s check how much real memory is used by this process.
oraappls[a3ppls](/home/oraappls)$ ps v 1360106
     PID    TTY STAT  TIME PGIN  SIZE   RSS   LIM  TSIZ   TRS %CPU %MEM COMMAND
 1360106      - A    1700:19    1 8553784 5481240 32768 38138 13680  5.4 38.0 rman ta



Simple calculation:
REAL MEMORY USAGE = RSS - TRS [KB]

REAL MEMORY USAGE = (5481240 - 13680) / 1024 = 5339 [MB]

Wow, over 5GB by one process.


Let`s check the session on the database using SQL*Plus.


USERNAME          SID SERIAL# LOGON_TIME          PROGRAM              SPID    STATUS   OSUSER        ACTION     CLIENT_INFO
--------------- ----- ------- ------------------- -------------------- ------- -------- ------------- ---------- -----------
SYS               177      27 2011-07-07 15:53:49 rman@XXX 1720476 ACTIVE   oraappls      0000011 FI
                                                   (TNS V1-V3)                                        NISHED129

SYS               216      35 2011-07-07 15:53:49 rman@XXX 880826  INACTIVE oraappls
                                                   (TNS V1-V3)

SYS                 7       1 2011-07-07 15:53:51 rman@XXX 1540298 INACTIVE oraappls      0000087 FI rman channe
                                                   (TNS V1-V3)                                        NISHED129  l=ORA_SBT_T
                                                                                                                 APE_1

SYS               174    3397 2011-07-11 07:42:28 sqlplus@XXX 1495152 ACTIVE   oraappls
                                                   (TNS V1-V3)



oraappls[XXX](/home/XXX)$ ps -ef | egrep '1720476|880826|1540298' | grep -v grep
oraappls  880826 1360106   0   Jul 07      -  0:00 oracleaXXX (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oraappls 1540298 1360106   0   Jul 07      -  0:02 oracleXXX (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oraappls 1720476 1360106  44   Jul 07      - 1549:35 oracleXXX (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))



These three processes has a child process with PID 1360106.

So, RMAN process is consuming a lot of memory. Let`s check why.

There are no waits in v$session_wait so let`s try another trick.


sys@a3ppls> @spid_info.sql
Enter the OS process id: 1720476
paddr     : 070000006350B3A8
SID       : 177
Username  : SYS
Machine   : XXX
Program   : rman@XXX (TNS V1-V3)
Status    : ACTIVE
OS User   : oraappls

Logged On : 07-07-2011 15:53:49
Waiting   : SQL*Net message to client 0 secs, WAITED SHORT TIME driver
id=1650815232, #bytes=1, =0
Event     : SQL*Net message from client, waited 17101820 secs
Event     : SQL*Net message to client, waited 125501 secs
Event     : events in waitclass Other, waited 797 secs
Event     : control file sequential read, waited 360 secs
Event     : db file sequential read, waited 41 secs
Event     : control file single write, waited 25 secs
Event     : control file parallel write, waited 3 secs
Event     : library cache: mutex X, waited 2 secs
Event     : Disk file operations I/O, waited 2 secs
Log read% : 0
Cur SQL   :
------------
SQL Address   : 07000000626D0510
SQL Hash value: 3465914171
------------------------------------------
begin sys.dbms_backup_restore.createRmanOutputRow( l0row_id    =
> :l0row_id, l0row_stamp => :l0row_stamp, row_id      => :row_id
, row_stamp   => :row_stamp, txt         => :txt, sameline    =>
:i_sameline); end;
------------------------------------------




sys@a3ppls> @spid_info.sql
Enter the OS process id: 880826
paddr     : 070000006350C418
SID       : 216
Username  : SYS
Machine   : XXX
Program   : rman@XXX (TNS V1-V3)
Status    : INACTIVE
OS User   : oraappls
Logged On : 07-07-2011 15:53:49
Waiting   : SQL*Net message from client 317460 secs, WAITING driver
id=1650815232, #bytes=1, =0
Event     : SQL*Net message from client, waited 14027 secs
Event     : control file sequential read, waited 4 secs
Log read% : 0
Cur SQL   :
------------
SQL Address   : 00
SQL Hash value: 0



sys@a3ppls> @spid_info.sql     
Enter the OS process id: 1540298
paddr     : 070000006350D488
SID       : 7
Username  : SYS
Machine   : XXX
Program   : rman@XXX (TNS V1-V3)
Status    : INACTIVE
OS User   : oraappls
Logged On : 07-07-2011 15:53:51
Waiting   : SQL*Net message from client 317515 secs, WAITING driver
id=1650815232, #bytes=1, =0 Event     : SQL*Net message from client, waited 5920 secs
Event     : Backup: MML query backup piece, waited 3444 secs
Event     : Backup: MML write backup piece, waited 1730 secs
Event     : Backup: MML commit backup piece, waited 260 secs
Event     : SQL*Net break/reset to client, waited 194 secs
Event     : Backup: MML create a backup piece, waited 191 secs
Event     : control file sequential read, waited 83 secs
Event     : RMAN backup & recovery I/O, waited 59 secs
Event     : control file single write, waited 25 secs
Event     : control file parallel write, waited 13 secs
Event     : Disk file operations I/O, waited 13 secs
Event     : Backup: MML initialization, waited 7 secs
Event     : events in waitclass Other, waited 3 secs
Log read% : 0
Cur SQL   :
------------
SQL Address   : 00
SQL Hash value: 0




The 3rd one looks like interesting. Let`s have a look closer.


oraappls[XXX](/usr/local/oracle/sql)$ sqlplus "/as sysdba"
 

SQL*Plus: Release 11.2.0.2.0 Production on Mon Jul 11 08:08:31 2011
 

sys@a3ppls> EXECUTE DBMS_MONITOR.SESSION_TRACE_ENABLE(7,1,waits=>TRUE, binds=>TRUE, plan_stat=>'ALL_EXECUTIONS');

PL/SQL procedure successfully completed.




oraappls[a3ppls](/a570/app/oracle/diag/rdbms/XXX/XXX/trace)$ tail -f a3ppls_ora_1540298.trc
*** 2011-07-07 15:55:45.349
*** ACTION NAME:(0000077 STARTED111) 2011-07-07 15:55:45.349

Starting control autobackup

*** 2011-07-07 15:55:50.014
Control autobackup written to SBT_TAPE device
        comment 'API Version 2.0,MMS Version 5.5.2.0',
        media '22'
        handle 'c-2235007239-20110707-0c'


And nothing more.
The previous SQL for this process was
select count(*) from sys.x$kcrmx


Unfortunately there is no information about it on the Metalink. 
In this case only one solution is killing the process.



oraappls[a3ppls](/home/oracle)$ ps -ef | grep rman | grep -v grep
oraappls 1360106       1  81   Jul 07      - 1715:09 rman target / nocatalog
oraappls[a3ppls](/home/oracle)$ kill -9 1360106





Please have a look that before killing was 54195 free pages ( 54195 * 4096 bytes = 211 MB) and after kill the process 1424898 free pages (1424898 * 4096 bytes = 5566 MB).

piątek, 8 lipca 2011

ORA-01591: lock held by in-doubt distributed transaction


OS: Red Hat Enterprise Linux Server release 5.3
Oracle: 11.1.0.7


One a the application support guy asked me for help.
He received the error


Error starting at line 3 in command:
update scv_cv_publish_xml set status = 'A' where status = 'V' and verwerking_status='A'
Error report:
SQL Error: ORA-01591: lock held by in-doubt distributed transaction 63.8.92859
01591. 00000 -  "lock held by in-doubt distributed transaction %s"
*Cause:    Trying to access resource that is locked by a dead two-phase commit
           transaction that is in prepared state.
*Action:   DBA should query the pending_trans$ and related tables, and attempt
           to repair network connection(s) to coordinator and commit point.
           If timely repair is not possible, DBA should contact DBA at commit
           point if known or end user for correct outcome, or use heuristic
           default if given to issue a heuristic commit or abort command to
           finalize the local portion of the distributed transaction.

So, first look in the database.


SQL> select local_tran_id, state from dba_2pc_pending where local_tran_id='63.8.92859';


LOCAL_TRAN_ID       STATE
---------------------- ----------------
63.8.92859                prepared

SQL> select a.sql_text, s.osuser, s.username
     from v$transaction t, v$session s, v$sqlarea a where s.taddr = t.addr
     and a.address = s.prev_sql_addr
     and t.xidusn = 63
     and t.xidslot = 8
     and t.xidsqn = 92859;

no rows selected

Ups. Hmm. Metalink help me, please.
Bingo !

How To Resolve Stranded DBA_2PC_PENDING Entries [ID 401302.1]




SQL>  SELECT KTUXEUSN, KTUXESLT, KTUXESQN, /* Transaction ID */
            KTUXESTA Status,
            KTUXECFL Flags
            FROM x$ktuxe
            WHERE ktuxesta!='INACTIVE'
            AND ktuxeusn= 63;

  KTUXEUSN   KTUXESLT KTUXESQN STATUS  FLAGS
---------- ---------- ---------- ---------------- ------------------------
63    8   92859 PREPARED  SCO|COL|REV|DEAD|EXTDTX

SQL> rollback force '63.8.92859';

Rollback complete.

SQL> commit;

Commit complete.

SQL> select local_tran_id, state from dba_2pc_pending where local_tran_id='63.8.92859';

LOCAL_TRAN_ID       STATE
---------------------- ----------------
63.8.92859                forced rollback

Unfortunately it didn`t help.



SQL> delete from sys.pending_trans$ where local_tran_id='63.8.92859';

1 row deleted.

sys@avac> delete from sys.pending_sessions$ where local_tran_id ='63.8.92859';

1 row deleted.

sys@avac> delete from sys.pending_sub_sessions$ where local_tran_id = '63.8.92859';

0 rows deleted.

sys@avac> commit;

Commit complete.

sys@avac> select local_tran_id, state from dba_2pc_pending where local_tran_id='63.8.92859';

no rows selected

Solved.

czwartek, 2 czerwca 2011

MIssing data in v$backup_async_io and v$backup_sync_io

OS: AIX 5300-06 
Oracle: 9.2.0.8



I hit a strange problem. I had a problem with a performance of RMAN backup and according to Oracle Metalink note
Known RMAN Performance Problems [ID 247611.1]

RMAN incremental backup performance degrades over time in NOCATALOG mode
Version affected : 8.1 9.0.1
Version fixed    : 8174 9014 9201
Affects          : Target database
Diagnosis        : Many entries in X$KSFQP
Workaround    : SQL> alter system set events 'immediate trace name ksfqp_limit level 1';

I executed on RMAN session

RMAN> sql "alter system set events ''immediate trace name ksfqp_limit level 1''";

sql statement: alter system set events ''immediate trace name ksfqp_limit level 1''
After that I wanted to check a backup performance using the views v$backup_async_io and 
v$backup_sync_io but the datas weren`t up to date.

SQL> select max(open_time),max(close_time) from v$backup_async_io;
MAX(OPEN_TIME)      MAX(CLOSE_TIME)
------------------- -------------------
2011-05-31 10:26:39 2011-05-31 10:26:39

SQL> select max(open_time),max(close_time) from v$backup_sync_io;
MAX(OPEN_TIME)      MAX(CLOSE_TIME)
------------------- -------------------
2011-05-31 10:26:39 2011-05-31 10:26:39
Where are the data after 2011-05-31 10:26:39 ?
Let`s check an alert.log

...
Tue May 31 10:26:38 2011
Starting control autobackup
Tue May 31 10:28:01 2011
Control autobackup written to SBT_TAPE device
        comment 'API Version 2.0,MMS Version 5.4.1.0',
        media '2'
        handle 'c-1601303025-20110531-26'
Tue May 31 10:31:19 2011
OS Pid: 2113700 executed alter system set events 'immediate trace name ksfqp_limit level 1'
Tue May 31 10:41:01 2011
Thread 1 advanced to log sequence 59271
  Current log# 3 seq# 59271 mem# 0: /r655/oradata/XXX/redo03.log
  Current log# 3 seq# 59271 mem# 1: /r656/oradata/XXX/redo03.log
Tue May 31 10:41:01 2011
ARCH: Evaluating archive   log 2 thread 1 sequence 59270
ARCH: Beginning to archive log 2 thread 1 sequence 59270
Creating archive destination LOG_ARCHIVE_DEST_1: '/a655/app/oracle/admin/XXX/arch/1_59270.arch'
Tue May 31 10:41:01 2011
ARC0: Evaluating archive   log 2 thread 1 sequence 59270
ARC0: Unable to archive log 2 thread 1 sequence 59270
      Log actively being archived by another process
Tue May 31 10:41:01 2011
ARCH: Completed archiving  log 2 thread 1 sequence 59270
Tue May 31 10:41:58 2011
Starting control autobackup
Tue May 31 10:43:41 2011 
...

Indeed at 10:31:19 I set an event and it was stamped in the alert log.
On the metalink I found the note: Data missing from V$backup_async_io [ID 277605.1]


Symptoms
After taking Rman backup you do not see any data populated in v$backup_aysnc_io.

Cause
You have set this event in init.ora or session level 
alter system set events 'immediate trace name ksfqp_limit level 1

Fix
This Event limits the maximum number of rows in v$backup_[a]sync_io table setting it to 1, 
maximum number of rows is 0; by setting it to 2, 
maximum number of rows is 2 by setting it to 3, 
maximum number of rows is 4 (ie 2 power of (3-1)) and so on.



I fix the problem by executing


SQL> alter system set events 'immediate trace name ksfqp_limit level 20';
System altered.

How to speed up a RMAN backup ?

OS: AIX 5300-06 
Oracle: 9.2.0.8


I had a problem with RMAN backup on one of the database.
The database has 20GB but RMAN backup takes about 1 hour during a night.
It is too long if I compare a size of the database.
RMAN uses Tivoli Data Protection for Oracle and goes to the tape.
I ran the backup manually with the tracing and the debugging options.



orapdvs[XXX](/aXXX/app/oracle/XXX)$ rman target / log=rman1.log debug trace=rman1_debug.trc


RMAN> run {
      allocate channel c1 type 'sbt_tape' DEBUG=5 TRACE=5 parms 
      'ENV=(TDPO_OPTFILE=/a655/app/oracle/etc/tdp/pdvs.opt)';
      backup incremental level 0 cumulative database 
      include current controlfile format 'DB_%u_%p_%c' TAG BARTEK;


In the rman1_debug.trc I found



...
 +3755  DBGRPC:   krmxrpc: xc=4568322960 starting longrunning RPC #28 to target: DBMS_BACKUP_RESTORE.BACKUPPIECECREATE
 +3756  DBGRPC:   krmxr: xc=4568322960 started long running rpc
 +3757  DBGRPC:   krmxpoq: xc=4568322960, action="0000027 FINISHED", col_l=16, ind=0, sid=20
 +3758  DBGRPC:   krmxr: callback returned TRUE, skipping sleep
 +3759  DBGRPC:   krmxpoq: xc=4568322960, action="0000028 STARTED", col_l=15, ind=0, sid=20
 +3760  DBGRPC:   krmxr: sleeping for 1 seconds
 +3761  DBGRPC:   krmxpoq: xc=4568322960, action="0000028 STARTED", col_l=15, ind=0, sid=20
 +3762  DBGRPC:   krmxr: sleeping for 2 seconds
 +3763  DBGRPC:   krmxpoq: xc=4568322960, action="0000028 STARTED", col_l=15, ind=0, sid=20
 +3764  DBGRPC:   krmxr: sleeping for 4 seconds
 +3765  DBGRPC:   krmxpoq: xc=4568322960, action="0000028 STARTED", col_l=15, ind=0, sid=20
 +3766  DBGRPC:   krmxr: sleeping for 8 seconds
 +3767  DBGRPC:   krmxpoq: xc=4568322960, action="0000028 STARTED", col_l=15, ind=0, sid=20
 +3768  DBGRPC:   krmxr: sleeping for 10 seconds
...


On the metalink I found the note: RMAN Hangs Using Tivoli TDP [ID 221212.1]
In this note Oracle writes about parameter called TDPO_NUM_BUFFERS.
In the IBM Redbook "Backing Up Oracle Using Tivoli Storage Management" is written:



Tivoli Data Protection for Oracle now includes an option to transfer Oracle 
data to the TSM server using multiple buffers to help alleviate network 
bottlenecks. 
This option specifies the number of buffers TDP for Oracle uses to support 
multi-buffering. By entering a value, multi buffering is activated.
We measured this options with runtimes of a full database backup with values 
of 2, 4, 8 and without this option. We achieved the best results with only 2 
buffers.


I decided to test using different configurations: without specifying TDPO_NUM_BUFFERS and with values from 2 to 5.
I ran rman incremental level 0 backup on one tape channel.
These results are very interesting.


TDPO_NUM_BUFFERS1st time 2nd time 3rd time
not specified 18 min
16 sec
17 min
25 sec
17 min
30 sec
 2 6 min
36 sec
5 min
26 sec
5 min
46 sec
 3 4 min
45 sec
4 min
36 sec
4 min
35 sec
 4 5 min
45 sec
6 min
46 sec
5 min
5 sec
 5 6 min
56 sec
5 min
16 sec
5 min
5 sec


Time of the backup was decreased about 3 times. Nice improvement.