wtorek, 17 maja 2011

When a process is consuming a CPU

OS: AIX 5300-06
Oracle: 9.2.0.8


I received an alert from the monitoring tool that an oracle process is consuming too much CPU.
First of all quick view on topas.


The monitoring tool is complaining that the monitoring process is consuming too much of CPU. Great.
Let`s check this process.

SQL> select vs.username, vs.program, vs.status,
     to_char(vs.logon_time,'YYYY.MM.DD HH24:MI:SS') LOGON_TIME,
     vs.sid, vs.serial#, vp.spid bg_pid
     from v$session vs, v$process vp
     where vs.paddr(+) = vp.addr and vp.spid=1495068;

USERNAME   PROGRAM          STATUS   LOGON_TIME          SID   SERIAL# BG_PID
---------- ---------------- -------- ------------------- ---- ---------- -----
TIVOLI     /opt/IBM/ITM//ai ACTIVE   2011.03.30 18:21:51  26   51336  1495068
           x536/bin/kor09co
           l@xxx

SQL> SELECT a.INVALIDATIONS,a.SQL_TEXT
     FROM v$sqlarea a, v$session b
     WHERE a.ADDRESS=b.SQL_ADDRESS AND b.serial#=51336;

SQL_TEXT
-----------------------------------------------------------------------
SELECT sess.PADDR PADDR, COUNT(op.HASH_VALUE) OPENCURS from v$open_cursor op, 
v$session sess where sess.SADDR = op.SADDR(+) group by sess.PADDR

sys@XXX> @spid_info.sql

PL/SQL procedure successfully completed.

Enter the OS process id: 1495068
paddr     : 070000002C6C7CB8
SID       : 26
Username  : TIVOLI
Machine   : XXX
Program   : /opt/IBM/ITM//aix536/bin/kor09col@XXX
Status    : ACTIVE
OS User   : itmora
Logged On : 30-03-2011 18:21:51
Waiting   : SQL*Net message to client 214965 secs, WAITED KNOWN TIME driver
id=1650815232, #bytes=1, =0
Event     : SQL*Net message from client, waited 354933526 secs
Event     : db file sequential read, waited 802647 secs
Event     : latch free, waited 49019 secs
Event     : control file sequential read, waited 3104 secs
Event     : buffer busy waits, waited 1044 secs
Event     : SQL*Net message to client, waited 619 secs
Event     : SQL*Net more data to client, waited 38 secs
Log read% : 32
Cur SQL   :
------------
SQL Address   : 070000002F170D60
SQL Hash value: 4272527880
------------------------------------------
SELECT sess.PADDR               PADDR,   COUNT(op.HASH_VALUE)
OPENCURS from v$open_cursor op, v$session sess where sess.SADD
R = op.SADDR(+) group by sess.PADDR

I will try to trace this session

sys@XXX> exec dbms_system.set_ev(26,51336,10046,12,'');
PL/SQL procedure successfully completed.

sys@XXX> exec dbms_system.set_sql_trace_in_session(sid=>26,serial#=>51336,sql_trace=>true); 
PL/SQL procedure successfully completed.

sys@XXX> exec dbms_system.set_ev(si=>26,se=>51336,ev=>10046, le=>8, nm=>' ');
PL/SQL procedure successfully completed.

sys@XXX> exec dbms_system.set_ev(si=>26,se=>51336,ev=>10046, le=>0, nm=>' ');
PL/SQL procedure successfully completed.

After execute every of these packages I checked udump directory but I didn`t find any trace file.
Last shot

sys@XXX> oradebug setospid 1495068

Oracle pid: 17, Unix process pid: 1495068, image: oracle@XXX (TNS V1-V3)
sys@XXX>  oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 8;
...


And the session is hanging. Now I have two 2 processed which are high consuming CPU.
The session which was trying to trace another one was hanging.

sys@aresmf4> @spid_info.sql
Enter the OS process id: 819210
paddr     : 070000002C6C9CC8
SID       : 24
Username  : SYS
Machine   : XXX
Program   : sqlplus@XXX(TNS V1-V3)
Status    : INACTIVE
OS User   : oraaesmf
Logged On : 16-05-2011 15:44:02
Waiting   : ksdxexeother 33 secs, WAITED KNOWN TIME =0, =0, =0
Event     : SQL*Net message from client, waited 6302 secs
Event     : ksdxexeother, waited 215 secs
Log read% : 0
Cur SQL   :
------------
SQL Address   : 070000002F5D6000
SQL Hash value: 2346790304
------------------------------------------
select lower(user)||'@'||lower(global_name) id from global_name


Even truss didn`t help so much.

oraaesmf[XXX](/home/oraaesmf)$ truss -aef -o bartek.truss -p 819210

+1  819210: psargs: oraclearesmf4 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
+2  819210: yield()                                         =
+3  819210: kill(1495068, 0)                                = 0
...
+60171  819210: kill(1495068, 0)                                = 0
+60172  819210: getprocs(0x0FFFFFFFFFFF9018, 1536, 0x0000000000000000, 0, 0x0FFFFFFFFFFF9010, 1) = 1
+60173  819210: yield()   


Let`s try to kill the session

sys@XXX> alter system kill session '26,51336';

ERROR at line 1:
ORA-00031: session marked for kill

USERNAME          SID SERIAL# LOGON_TIME          PROGRAM              SPID    STATUS   OSUSER
--------------- ----- ------- ------------------- -------------------- ------- -------- ------
TIVOLI             26   51336 2011-03-30 18:21:51 /opt/IBM/ITM//aix536 1495068 KILLED   itmora
                                                  /bin/kor09col@XXX



And the process with PID 1495068 was still consuming the CPU.
Finally an Unix guy had to kill it.

Probably we are hitting the Oracle bug described in the notes

Killed Session Hung Consuming 100% CPU [ID 458881.1]
Session Marked As Killed But Continues To Consume 100% CPU [ID 752676.1]


Next time I will try to investigate a little bit deeper.

Brak komentarzy:

Prześlij komentarz