OS: AIX 5300-05
Oracle: 9.2.0.8
My colleague received an incident that RMAN backup level 0 didn`t run for almost 2 weeks for production database and decided to run manually.
The database has about 300GB so it is not a big deal run the backup during the day.
I don`t understand why people thing that backup has to be done only in the night. But never mind.
He executed the backup script and looked into a log.
Tue May 10 15:40:08 CEST 2011 -- Day : 2 - Field : 6 -- -- CONNECT AS : target / nocatalog -- TDP OPT FILE: /XXX/app/oracle/etc/tdp/XXX.opt -- TAPE: 'sbt_tape' parms 'ENV=(TDPO_OPTFILE=/XXX/app/oracle/etc/tdp/XXX.opt)’ |
And nothing more. We were waiting and waiting but there was no progress.
It seems that a backup script was stuck somewhere.
I checked the sessions on the database and there was not 'rman' at all.
The background process which was executed was running but there was no progress at all.
Even a truss command didn`t show the reason
3674300: 3907729: __libc_sbrk(0x0000000000110020) = 0x00000000103D4DE0 6922378: 10072253: kread(3, "1001 Ó p / ñ ä\0 \0 f `".., 4096) (sleeping...) 8634422: 7528505: kread(0, "\0\0\0\0\0\0\0\0\0\0\0\0".., 4096) (sleeping...) 8319216: 11698395: kread(10, "02 æ\0\006\0\0\0\0\0101D".., 2064) (sleeping...) |
OK. So first of all let`s check using topas what is going on at the system.
oraXXX[XXX](/home/oraXXX)$ ps auxw | grep 7073904 | grep -v grep oraXXX 7073904 6.7 0.0 51524 33220 - A 15:40:14 2:15 oracleXXX (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))) |
LOCAL=YES, so it was run from a server.
sys@XXX> select vs.username, vs.program, vs.status, to_char(vs.logon_time,'YYYY.MM.DD HH24:MI:SS') LOGON_TIME, vs.sid, vs.serial#, vp.spid bg_pid from v$session vs, v$process vp where vs.paddr(+) = vp.addr and vp.spid=7073904; USERNAME PROGRAM STATUS LOGON_TIME SID SERIAL# BG_PID ---------- ---------------- -------- ------------------- ---- --------------- SYS sqlplus@XXXXXXXX ACTIVE 2011.05.10 15:40:14 1558 14384 7073904 XXXXXXX (TNS V1- V3) |
Let`s have a look what this session is doing.
sys@XXX> SELECT a.SQL_TEXT FROM v$sqlarea a, v$session b WHERE a.ADDRESS=b.SQL_ADDRESS AND b.serial#='14384'; SQL_TEXT ------------- ----------------------------------------------------------------select decode(count(s.sid),0,'ENQUEUEFREE') from v$session s, v$process p, v$transaction t, v$rollstat r, v$rollname n where s.paddr = p.addr and s.taddr = t.addr (+) and t.xidusn = r.usn (+) and r.usn = n.usn (+) and s.username is not null and s.sid = (select s.sid from v$session s, v$enqueue_lock l where l.sid = s.sid and l.type = 'CF' and l.id1 = 0 and l.id2 = order by 1 |
Hmm, is this SQL is a part of the backup script ?
view /XXX/local/oracle/bin/rman.sh ... check_enqueue() { sqlplus -s "/ as sysdba" <<EOF | grep [A-Z] set heading off set feedback off set timing off select decode(count(s.sid),0,'ENQUEUEFREE') from v\$session s, v\$process p, v\$transaction t, v\$rollstat r, v\$rollname n where s.paddr = p.addr and s.taddr = t.addr (+) and t.xidusn = r.usn (+) and r.usn = n.usn (+) and s.username is not null and s.sid = (select s.sid from v\$session s, v\$enqueue_lock l where l.sid = s.sid and l.type = 'CF' and l.id1 = 0 and l.id2 = 2) order by 1 ; EOF } waitEnqueue() { # checking if the controlfile enqueue is available if [ "`check_enqueue`" != "ENQUEUEFREE" ] then echo "Sleeping 60 seconds: controlfile enqueue not available" | tee -a ${LOGFILE} sleep 60 waitEnqueue else doBackup fi } … |
Bingo. Let`s see what the session is waiting for ?
No idle waiting events for this session in the v$session_wait. Only topas shows that a session is consuming a CPU.
Let`s trace the sql.
SQL> alter session set timed_statistics = true; SQL> alter session set statistics_level=ALL; SQL> alter session set max_dump_file_size=unlimited; SQL> alter session set tracefile_identifier='without_rule' SQL> alter session set events = '10046 trace name context forever, level 12 : 10053 trace name context forever, level 1'; SQL> select decode(count(s.sid),0,'ENQUEUEFREE') from v$session s, v$process p, v$transaction t, v$rollstat r, v$rollname n where s.paddr = p.addr and s.taddr = t.addr (+) and t.xidusn = r.usn (+) and r.usn = n.usn (+) and s.username is not null and s.sid = (select s.sid from v$session s, v$enqueue_lock l where l.sid = s.sid and l.type = 'CF' and l.id1 = 0 and l.id2 = 2) order by 1; SQL> select * from dual; SQL> alter session set events = '10046 trace name context off : 10053 trace name context off '; SQL> exit; |
After using tkprof I have
select decode(count(s.sid),0,'ENQUEUEFREE') from v$session s, v$process p, v$transaction t, v$rollstat r, v$rollname n where s.paddr = p.addr and s.taddr = t.addr (+) and t.xidusn = r.usn (+) and r.usn = n.usn (+) and s.username is not null and s.sid = (select s.sid from v$session s, v$enqueue_lock l where l.sid = s.sid and l.type = 'CF' and l.id1 = 0 and l.id2 = 2) order by 1 call count cpu elapsed disk query current rows ------- ------ -------- ---------- ------ ------- --------- ------- Parse 1 0.05 0.06 0 4 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 2 599.16 785.91 0 0 0 1 ------- ------ -------- ---------- ------ ------- --------- ------- total 4 599.21 785.98 0 4 0 1 Misses in library cache during parse: 1 Optimizer goal: CHOOSE Parsing user id: SYS Rows Row Source Operation ------- --------------------------------------------------- 1 SORT AGGREGATE (cr=0 r=0 w=0 time=785917722 us) 0 HASH JOIN OUTER (cr=0 r=0 w=0 time=785917712 us) 0 NESTED LOOPS (cr=0 r=0 w=0 time=785917251 us) 0 HASH JOIN OUTER (cr=0 r=0 w=0 time=785917249 us) 0 HASH JOIN OUTER (cr=0 r=0 w=0 time=785916653 us) 0 FIXED TABLE FULL X$KSUSE (cr=0 r=0 w=0 time=785915416 us) 0 FIXED TABLE FULL X$KTCXB 0 FIXED TABLE FULL X$KTURD 0 FIXED TABLE FIXED INDEX X$KSUPR (ind:1) 0 VIEW 0 NESTED LOOPS 0 FIXED TABLE FULL X$KTURD 0 INDEX UNIQUE SCAN I_UNDO1 (object id 34) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ SQL*Net message to client 2 0.00 0.00 SQL*Net message from client 2 328.41 328.41 |
PARSE: Translates the SQL statement into an execution plan, including checks for proper security authorization and checks for the existence of tables, columns, and other referenced objects.
EXECUTE: Actual execution of the statement by Oracle. For INSERT, UPDATE, and DELETE statements, this modifies the data. For SELECT statements, this identifies the selected rows.
FETCH: Retrieves rows returned by a query. Fetches are only performed for SELECT statements.
SQL Trace Statistic
COUNT: Number of times a statement was parsed, executed, or fetched.
CPU: Total CPU time in seconds for all parse, execute, or fetch calls for the statement. This value is zero (0) if TIMED_STATISTICS is not turned on.
ELAPSED: Total elapsed time in seconds for all parse, execute, or fetch calls for the statement. This value is zero (0) if TIMED_STATISTICS is not turned on.
DISK: Total number of data blocks physically read from the datafiles on disk for all parse, execute, or fetch calls.
QUERY: Total number of buffers retrieved in consistent mode for all parse, execute, or fetch calls. Usually, buffers are retrieved in consistent mode for queries.
CURRENT: Total number of buffers retrieved in current mode. Buffers are retrieved in current mode for statements such as INSERT, UPDATE, and DELETE.
Let`s have a look at tkprof output. We have
FETCH/CPU 599.21 and FETCH/ELAPSED 785.98. This select is very costly.
I try to force using a RULE optimizer in this select.
SQL> select /*+ RULE */ decode(count(s.sid),0,'ENQUEUEFREE') from v$session s, v$process p, v$transaction t, v$rollstat r, v$rollname n where s.paddr = p.addr and s.taddr = t.addr (+) and t.xidusn = r.usn (+) and r.usn = n.usn (+) and s.username is not null and s.sid = (select s.sid from v$session s, v$enqueue_lock l where l.sid = s.sid and l.type = 'CF' and l.id1 = 0 and l.id2 = 2) order by 1; |
Let`s see an output from TKPROF.
select /*+ RULE */ decode(count(s.sid),0,'ENQUEUEFREE') from v$session s, v$process p, v$transaction t, v$rollstat r, v$rollname n where s.paddr = p.addr and s.taddr = t.addr (+) and t.xidusn = r.usn (+) and r.usn = n.usn (+) and s.username is not null and s.sid = (select s.sid from v$session s, v$enqueue_lock l where l.sid = s.sid and l.type = 'CF' and l.id1 = 0 and l.id2 = 2) order by 1 call count cpu elapsed disk query current rows ------- ------ -------- ---------- ------- ------- --------- ------ Parse 1 0.01 0.01 0 2 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 2 0.30 0.38 0 2 1258 1 ------- ------ -------- ---------- ------- ------- --------- ------ total 4 0.31 0.39 0 4 1258 1 Misses in library cache during parse: 1 Optimizer goal: RULE Parsing user id: SYS Rows Row Source Operation ------- --------------------------------------------------- 1 SORT AGGREGATE (cr=2 r=0 w=0 time=381085 us) 0 FILTER (cr=2 r=0 w=0 time=381075 us) 1440 MERGE JOIN OUTER (cr=2 r=0 w=0 time=169179 us) 1440 SORT JOIN (cr=0 r=0 w=0 time=165098 us) 1440 MERGE JOIN OUTER (cr=0 r=0 w=0 time=158664 us) 1440 SORT JOIN (cr=0 r=0 w=0 time=155327 us) 1440 MERGE JOIN OUTER (cr=0 r=0 w=0 time=148218 us) 1440 SORT JOIN (cr=0 r=0 w=0 time=64458 us) 1440 MERGE JOIN (cr=0 r=0 w=0 time=56813 us) 1446 FIXED TABLE FULL X$KSUPR (cr=0 r=0 w=0 time=13101 us) 1440 SORT JOIN (cr=0 r=0 w=0 time=41178 us) 1440 FIXED TABLE FULL X$KSUSE (cr=0 r=0 w=0 time=32526 us) 40 SORT JOIN (cr=0 r=0 w=0 time=81663 us) 40 FIXED TABLE FULL X$KTCXB (cr=0 r=0 w=0 time=80434 us) 40 SORT JOIN (cr=0 r=0 w=0 time=1763 us) 72 FIXED TABLE FULL X$KTURD (cr=0 r=0 w=0 time=1005 us) 40 SORT JOIN (cr=2 r=0 w=0 time=1670 us) 72 VIEW (cr=2 r=0 w=0 time=958 us) 72 NESTED LOOPS (cr=2 r=0 w=0 time=924 us) 72 FIXED TABLE FULL X$KTURD (cr=0 r=0 w=0 time=649 us) 72 INDEX UNIQUE SCAN I_UNDO1 (cr=2 r=0 w=0 time=143 us)(object id 34) 0 MERGE JOIN (cr=0 r=0 w=0 time=211198 us) 0 SORT JOIN (cr=0 r=0 w=0 time=211196 us) 0 MERGE JOIN (cr=0 r=0 w=0 time=211183 us) 0 SORT JOIN (cr=0 r=0 w=0 time=211182 us) 0 MERGE JOIN (cr=0 r=0 w=0 time=211168 us) 3 SORT JOIN (cr=0 r=0 w=0 time=64881 us) 3 FIXED TABLE FULL X$KSQRS (cr=0 r=0 w=0 time=64826 us) 0 SORT JOIN (cr=0 r=0 w=0 time=146249 us) 663 FIXED TABLE FULL X$KSQEQ (cr=0 r=0 w=0 time=145238 us) 0 SORT JOIN 0 FIXED TABLE FULL X$KSUSE 0 SORT JOIN 0 FIXED TABLE FULL X$KSUSE Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ SQL*Net message to client 2 0.00 0.00 SQL*Net message from client 2 8.38 8.38 |
What a difference ! Now I have
FETCH/CPU 0.31 and FETCH/ELAPSED 0.39 so it is almost 2000 times better than before.
Brak komentarzy:
Prześlij komentarz