poniedziałek, 11 lipca 2011

When a process is consuming a lot of memory

OS: AIX 5300-09
Oracle: 11.2.0.2


I saw by topas command a strange situation on one of the server.



OK, let`s check how much real memory is used by this process.
oraappls[a3ppls](/home/oraappls)$ ps v 1360106
     PID    TTY STAT  TIME PGIN  SIZE   RSS   LIM  TSIZ   TRS %CPU %MEM COMMAND
 1360106      - A    1700:19    1 8553784 5481240 32768 38138 13680  5.4 38.0 rman ta



Simple calculation:
REAL MEMORY USAGE = RSS - TRS [KB]

REAL MEMORY USAGE = (5481240 - 13680) / 1024 = 5339 [MB]

Wow, over 5GB by one process.


Let`s check the session on the database using SQL*Plus.


USERNAME          SID SERIAL# LOGON_TIME          PROGRAM              SPID    STATUS   OSUSER        ACTION     CLIENT_INFO
--------------- ----- ------- ------------------- -------------------- ------- -------- ------------- ---------- -----------
SYS               177      27 2011-07-07 15:53:49 rman@XXX 1720476 ACTIVE   oraappls      0000011 FI
                                                   (TNS V1-V3)                                        NISHED129

SYS               216      35 2011-07-07 15:53:49 rman@XXX 880826  INACTIVE oraappls
                                                   (TNS V1-V3)

SYS                 7       1 2011-07-07 15:53:51 rman@XXX 1540298 INACTIVE oraappls      0000087 FI rman channe
                                                   (TNS V1-V3)                                        NISHED129  l=ORA_SBT_T
                                                                                                                 APE_1

SYS               174    3397 2011-07-11 07:42:28 sqlplus@XXX 1495152 ACTIVE   oraappls
                                                   (TNS V1-V3)



oraappls[XXX](/home/XXX)$ ps -ef | egrep '1720476|880826|1540298' | grep -v grep
oraappls  880826 1360106   0   Jul 07      -  0:00 oracleaXXX (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oraappls 1540298 1360106   0   Jul 07      -  0:02 oracleXXX (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oraappls 1720476 1360106  44   Jul 07      - 1549:35 oracleXXX (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))



These three processes has a child process with PID 1360106.

So, RMAN process is consuming a lot of memory. Let`s check why.

There are no waits in v$session_wait so let`s try another trick.


sys@a3ppls> @spid_info.sql
Enter the OS process id: 1720476
paddr     : 070000006350B3A8
SID       : 177
Username  : SYS
Machine   : XXX
Program   : rman@XXX (TNS V1-V3)
Status    : ACTIVE
OS User   : oraappls

Logged On : 07-07-2011 15:53:49
Waiting   : SQL*Net message to client 0 secs, WAITED SHORT TIME driver
id=1650815232, #bytes=1, =0
Event     : SQL*Net message from client, waited 17101820 secs
Event     : SQL*Net message to client, waited 125501 secs
Event     : events in waitclass Other, waited 797 secs
Event     : control file sequential read, waited 360 secs
Event     : db file sequential read, waited 41 secs
Event     : control file single write, waited 25 secs
Event     : control file parallel write, waited 3 secs
Event     : library cache: mutex X, waited 2 secs
Event     : Disk file operations I/O, waited 2 secs
Log read% : 0
Cur SQL   :
------------
SQL Address   : 07000000626D0510
SQL Hash value: 3465914171
------------------------------------------
begin sys.dbms_backup_restore.createRmanOutputRow( l0row_id    =
> :l0row_id, l0row_stamp => :l0row_stamp, row_id      => :row_id
, row_stamp   => :row_stamp, txt         => :txt, sameline    =>
:i_sameline); end;
------------------------------------------




sys@a3ppls> @spid_info.sql
Enter the OS process id: 880826
paddr     : 070000006350C418
SID       : 216
Username  : SYS
Machine   : XXX
Program   : rman@XXX (TNS V1-V3)
Status    : INACTIVE
OS User   : oraappls
Logged On : 07-07-2011 15:53:49
Waiting   : SQL*Net message from client 317460 secs, WAITING driver
id=1650815232, #bytes=1, =0
Event     : SQL*Net message from client, waited 14027 secs
Event     : control file sequential read, waited 4 secs
Log read% : 0
Cur SQL   :
------------
SQL Address   : 00
SQL Hash value: 0



sys@a3ppls> @spid_info.sql     
Enter the OS process id: 1540298
paddr     : 070000006350D488
SID       : 7
Username  : SYS
Machine   : XXX
Program   : rman@XXX (TNS V1-V3)
Status    : INACTIVE
OS User   : oraappls
Logged On : 07-07-2011 15:53:51
Waiting   : SQL*Net message from client 317515 secs, WAITING driver
id=1650815232, #bytes=1, =0 Event     : SQL*Net message from client, waited 5920 secs
Event     : Backup: MML query backup piece, waited 3444 secs
Event     : Backup: MML write backup piece, waited 1730 secs
Event     : Backup: MML commit backup piece, waited 260 secs
Event     : SQL*Net break/reset to client, waited 194 secs
Event     : Backup: MML create a backup piece, waited 191 secs
Event     : control file sequential read, waited 83 secs
Event     : RMAN backup & recovery I/O, waited 59 secs
Event     : control file single write, waited 25 secs
Event     : control file parallel write, waited 13 secs
Event     : Disk file operations I/O, waited 13 secs
Event     : Backup: MML initialization, waited 7 secs
Event     : events in waitclass Other, waited 3 secs
Log read% : 0
Cur SQL   :
------------
SQL Address   : 00
SQL Hash value: 0




The 3rd one looks like interesting. Let`s have a look closer.


oraappls[XXX](/usr/local/oracle/sql)$ sqlplus "/as sysdba"
 

SQL*Plus: Release 11.2.0.2.0 Production on Mon Jul 11 08:08:31 2011
 

sys@a3ppls> EXECUTE DBMS_MONITOR.SESSION_TRACE_ENABLE(7,1,waits=>TRUE, binds=>TRUE, plan_stat=>'ALL_EXECUTIONS');

PL/SQL procedure successfully completed.




oraappls[a3ppls](/a570/app/oracle/diag/rdbms/XXX/XXX/trace)$ tail -f a3ppls_ora_1540298.trc
*** 2011-07-07 15:55:45.349
*** ACTION NAME:(0000077 STARTED111) 2011-07-07 15:55:45.349

Starting control autobackup

*** 2011-07-07 15:55:50.014
Control autobackup written to SBT_TAPE device
        comment 'API Version 2.0,MMS Version 5.5.2.0',
        media '22'
        handle 'c-2235007239-20110707-0c'


And nothing more.
The previous SQL for this process was
select count(*) from sys.x$kcrmx


Unfortunately there is no information about it on the Metalink. 
In this case only one solution is killing the process.



oraappls[a3ppls](/home/oracle)$ ps -ef | grep rman | grep -v grep
oraappls 1360106       1  81   Jul 07      - 1715:09 rman target / nocatalog
oraappls[a3ppls](/home/oracle)$ kill -9 1360106





Please have a look that before killing was 54195 free pages ( 54195 * 4096 bytes = 211 MB) and after kill the process 1424898 free pages (1424898 * 4096 bytes = 5566 MB).

Brak komentarzy:

Prześlij komentarz