poniedziałek, 11 lipca 2011

When a process is consuming a lot of memory

OS: AIX 5300-09
Oracle: 11.2.0.2


I saw by topas command a strange situation on one of the server.



OK, let`s check how much real memory is used by this process.
oraappls[a3ppls](/home/oraappls)$ ps v 1360106
     PID    TTY STAT  TIME PGIN  SIZE   RSS   LIM  TSIZ   TRS %CPU %MEM COMMAND
 1360106      - A    1700:19    1 8553784 5481240 32768 38138 13680  5.4 38.0 rman ta



Simple calculation:
REAL MEMORY USAGE = RSS - TRS [KB]

REAL MEMORY USAGE = (5481240 - 13680) / 1024 = 5339 [MB]

Wow, over 5GB by one process.


Let`s check the session on the database using SQL*Plus.


USERNAME          SID SERIAL# LOGON_TIME          PROGRAM              SPID    STATUS   OSUSER        ACTION     CLIENT_INFO
--------------- ----- ------- ------------------- -------------------- ------- -------- ------------- ---------- -----------
SYS               177      27 2011-07-07 15:53:49 rman@XXX 1720476 ACTIVE   oraappls      0000011 FI
                                                   (TNS V1-V3)                                        NISHED129

SYS               216      35 2011-07-07 15:53:49 rman@XXX 880826  INACTIVE oraappls
                                                   (TNS V1-V3)

SYS                 7       1 2011-07-07 15:53:51 rman@XXX 1540298 INACTIVE oraappls      0000087 FI rman channe
                                                   (TNS V1-V3)                                        NISHED129  l=ORA_SBT_T
                                                                                                                 APE_1

SYS               174    3397 2011-07-11 07:42:28 sqlplus@XXX 1495152 ACTIVE   oraappls
                                                   (TNS V1-V3)



oraappls[XXX](/home/XXX)$ ps -ef | egrep '1720476|880826|1540298' | grep -v grep
oraappls  880826 1360106   0   Jul 07      -  0:00 oracleaXXX (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oraappls 1540298 1360106   0   Jul 07      -  0:02 oracleXXX (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oraappls 1720476 1360106  44   Jul 07      - 1549:35 oracleXXX (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))



These three processes has a child process with PID 1360106.

So, RMAN process is consuming a lot of memory. Let`s check why.

There are no waits in v$session_wait so let`s try another trick.


sys@a3ppls> @spid_info.sql
Enter the OS process id: 1720476
paddr     : 070000006350B3A8
SID       : 177
Username  : SYS
Machine   : XXX
Program   : rman@XXX (TNS V1-V3)
Status    : ACTIVE
OS User   : oraappls

Logged On : 07-07-2011 15:53:49
Waiting   : SQL*Net message to client 0 secs, WAITED SHORT TIME driver
id=1650815232, #bytes=1, =0
Event     : SQL*Net message from client, waited 17101820 secs
Event     : SQL*Net message to client, waited 125501 secs
Event     : events in waitclass Other, waited 797 secs
Event     : control file sequential read, waited 360 secs
Event     : db file sequential read, waited 41 secs
Event     : control file single write, waited 25 secs
Event     : control file parallel write, waited 3 secs
Event     : library cache: mutex X, waited 2 secs
Event     : Disk file operations I/O, waited 2 secs
Log read% : 0
Cur SQL   :
------------
SQL Address   : 07000000626D0510
SQL Hash value: 3465914171
------------------------------------------
begin sys.dbms_backup_restore.createRmanOutputRow( l0row_id    =
> :l0row_id, l0row_stamp => :l0row_stamp, row_id      => :row_id
, row_stamp   => :row_stamp, txt         => :txt, sameline    =>
:i_sameline); end;
------------------------------------------




sys@a3ppls> @spid_info.sql
Enter the OS process id: 880826
paddr     : 070000006350C418
SID       : 216
Username  : SYS
Machine   : XXX
Program   : rman@XXX (TNS V1-V3)
Status    : INACTIVE
OS User   : oraappls
Logged On : 07-07-2011 15:53:49
Waiting   : SQL*Net message from client 317460 secs, WAITING driver
id=1650815232, #bytes=1, =0
Event     : SQL*Net message from client, waited 14027 secs
Event     : control file sequential read, waited 4 secs
Log read% : 0
Cur SQL   :
------------
SQL Address   : 00
SQL Hash value: 0



sys@a3ppls> @spid_info.sql     
Enter the OS process id: 1540298
paddr     : 070000006350D488
SID       : 7
Username  : SYS
Machine   : XXX
Program   : rman@XXX (TNS V1-V3)
Status    : INACTIVE
OS User   : oraappls
Logged On : 07-07-2011 15:53:51
Waiting   : SQL*Net message from client 317515 secs, WAITING driver
id=1650815232, #bytes=1, =0 Event     : SQL*Net message from client, waited 5920 secs
Event     : Backup: MML query backup piece, waited 3444 secs
Event     : Backup: MML write backup piece, waited 1730 secs
Event     : Backup: MML commit backup piece, waited 260 secs
Event     : SQL*Net break/reset to client, waited 194 secs
Event     : Backup: MML create a backup piece, waited 191 secs
Event     : control file sequential read, waited 83 secs
Event     : RMAN backup & recovery I/O, waited 59 secs
Event     : control file single write, waited 25 secs
Event     : control file parallel write, waited 13 secs
Event     : Disk file operations I/O, waited 13 secs
Event     : Backup: MML initialization, waited 7 secs
Event     : events in waitclass Other, waited 3 secs
Log read% : 0
Cur SQL   :
------------
SQL Address   : 00
SQL Hash value: 0




The 3rd one looks like interesting. Let`s have a look closer.


oraappls[XXX](/usr/local/oracle/sql)$ sqlplus "/as sysdba"
 

SQL*Plus: Release 11.2.0.2.0 Production on Mon Jul 11 08:08:31 2011
 

sys@a3ppls> EXECUTE DBMS_MONITOR.SESSION_TRACE_ENABLE(7,1,waits=>TRUE, binds=>TRUE, plan_stat=>'ALL_EXECUTIONS');

PL/SQL procedure successfully completed.




oraappls[a3ppls](/a570/app/oracle/diag/rdbms/XXX/XXX/trace)$ tail -f a3ppls_ora_1540298.trc
*** 2011-07-07 15:55:45.349
*** ACTION NAME:(0000077 STARTED111) 2011-07-07 15:55:45.349

Starting control autobackup

*** 2011-07-07 15:55:50.014
Control autobackup written to SBT_TAPE device
        comment 'API Version 2.0,MMS Version 5.5.2.0',
        media '22'
        handle 'c-2235007239-20110707-0c'


And nothing more.
The previous SQL for this process was
select count(*) from sys.x$kcrmx


Unfortunately there is no information about it on the Metalink. 
In this case only one solution is killing the process.



oraappls[a3ppls](/home/oracle)$ ps -ef | grep rman | grep -v grep
oraappls 1360106       1  81   Jul 07      - 1715:09 rman target / nocatalog
oraappls[a3ppls](/home/oracle)$ kill -9 1360106





Please have a look that before killing was 54195 free pages ( 54195 * 4096 bytes = 211 MB) and after kill the process 1424898 free pages (1424898 * 4096 bytes = 5566 MB).

piątek, 8 lipca 2011

ORA-01591: lock held by in-doubt distributed transaction


OS: Red Hat Enterprise Linux Server release 5.3
Oracle: 11.1.0.7


One a the application support guy asked me for help.
He received the error


Error starting at line 3 in command:
update scv_cv_publish_xml set status = 'A' where status = 'V' and verwerking_status='A'
Error report:
SQL Error: ORA-01591: lock held by in-doubt distributed transaction 63.8.92859
01591. 00000 -  "lock held by in-doubt distributed transaction %s"
*Cause:    Trying to access resource that is locked by a dead two-phase commit
           transaction that is in prepared state.
*Action:   DBA should query the pending_trans$ and related tables, and attempt
           to repair network connection(s) to coordinator and commit point.
           If timely repair is not possible, DBA should contact DBA at commit
           point if known or end user for correct outcome, or use heuristic
           default if given to issue a heuristic commit or abort command to
           finalize the local portion of the distributed transaction.

So, first look in the database.


SQL> select local_tran_id, state from dba_2pc_pending where local_tran_id='63.8.92859';


LOCAL_TRAN_ID       STATE
---------------------- ----------------
63.8.92859                prepared

SQL> select a.sql_text, s.osuser, s.username
     from v$transaction t, v$session s, v$sqlarea a where s.taddr = t.addr
     and a.address = s.prev_sql_addr
     and t.xidusn = 63
     and t.xidslot = 8
     and t.xidsqn = 92859;

no rows selected

Ups. Hmm. Metalink help me, please.
Bingo !

How To Resolve Stranded DBA_2PC_PENDING Entries [ID 401302.1]




SQL>  SELECT KTUXEUSN, KTUXESLT, KTUXESQN, /* Transaction ID */
            KTUXESTA Status,
            KTUXECFL Flags
            FROM x$ktuxe
            WHERE ktuxesta!='INACTIVE'
            AND ktuxeusn= 63;

  KTUXEUSN   KTUXESLT KTUXESQN STATUS  FLAGS
---------- ---------- ---------- ---------------- ------------------------
63    8   92859 PREPARED  SCO|COL|REV|DEAD|EXTDTX

SQL> rollback force '63.8.92859';

Rollback complete.

SQL> commit;

Commit complete.

SQL> select local_tran_id, state from dba_2pc_pending where local_tran_id='63.8.92859';

LOCAL_TRAN_ID       STATE
---------------------- ----------------
63.8.92859                forced rollback

Unfortunately it didn`t help.



SQL> delete from sys.pending_trans$ where local_tran_id='63.8.92859';

1 row deleted.

sys@avac> delete from sys.pending_sessions$ where local_tran_id ='63.8.92859';

1 row deleted.

sys@avac> delete from sys.pending_sub_sessions$ where local_tran_id = '63.8.92859';

0 rows deleted.

sys@avac> commit;

Commit complete.

sys@avac> select local_tran_id, state from dba_2pc_pending where local_tran_id='63.8.92859';

no rows selected

Solved.