今天客户反馈某一个应用部署补丁的时候,执行了一个脚本一个多小时还没有执行完。
语句是下面这样的形式。
insert into em1_rater_00068_01
(select *
from em1_rater_00050_01_backup a
where a.record_id and not exists (select b.record_id
from em1_rater_00068_01 b
where a.record_id = b.record_id));
查看执行计划发现语句的执行计划信息真是惊人,执行计划中竟然出现了27T的字样,但是查看预估的时间却只有35秒左右。而且这个预估是在4个并行的基础上。
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | | | 2879 (100)| | | | | | |
| 1 | PX COORDINATOR | | | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10003 | 15M| 27T| 2879 (3)| 00:00:35 | | | Q1,03 | P->S | QC (RAND) |
| 3 | INDEX MAINTENANCE | EM1_RATER_00068_01 | | | | | | | Q1,03 | PCWP | |
| 4 | PX RECEIVE | | 15M| 27T| 2879 (3)| 00:00:35 | | | Q1,03 | PCWP | |
| 5 | PX SEND RANGE | :TQ10002 | 15M| 27T| 2879 (3)| 00:00:35 | | | Q1,02 | P->P | RANGE |
| 6 | LOAD AS SELECT | | | | | | | | Q1,02 | PCWP | |
| 7 | HASH JOIN | | 15M| 27T| 2879 (3)| 00:00:35 | | | Q1,02 | PCWP | |
| 8 | PX RECEIVE | | 1816K| 43M| 814 (1)| 00:00:10 | | | Q1,02 | PCWP | |
| 9 | PX SEND BROADCAST | :TQ10001 | 1816K| 43M| 814 (1)| 00:00:10 | | | Q1,01 | P->P | BROADCAST |
| 10 | HASH JOIN RIGHT ANTI | | 1816K| 43M| 814 (1)| 00:00:10 | | | Q1,01 | PCWP | |
| 11 | PX RECEIVE | | 21 | 273 | 0 (0)| | | | Q1,01 | PCWP | |
| 12 | PX SEND BROADCAST | :TQ10000 | 21 | 273 | 0 (0)| | | | Q1,00 | P->P | BROADCAST |
| 13 | PX PARTITION RANGE ALL | | 21 | 273 | 0 (0)| | 1 | 5 | Q1,00 | PCWC | |
| 14 | INDEX RANGE SCAN | EM1_RATER_00068_01_PK | 21 | 273 | 0 (0)| | 1 | 5 | Q1,00 | PCWP | |
| 15 | PX PARTITION RANGE ALL | | 1816K| 20M| 810 (1)| 00:00:10 | 1 | 5 | Q1,01 | PCWC | |
| 16 | INDEX RANGE SCAN | EM1_RECORD_STRM_PERIODKEY_PK | 1816K| 20M| 810 (1)| 00:00:10 | 1 | 5 | Q1,01 | PCWP | |
| 17 | PX PARTITION RANGE ALL | | 25M| 45T| 2013 (1)| 00:00:25 | 1 | 5 | Q1,02 | PCWC | |
| 18 | TABLE ACCESS BY LOCAL INDEX ROWID| EM1_RATER_00050_01_BACKUP | 25M| 45T| 2013 (1)| 00:00:25 | 1 | 5 | Q1,02 | PCWP | |
| 19 | INDEX RANGE SCAN | EM1_RATER_00050_01_BK_PK | 4506K| | 7 (0)| 00:00:01 | 1 | 5 | Q1,02 | PCWP | |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
如果不启用并行,执行计划的情况就更糟糕了。
Execution Plan
----------------------------------------------------------
Plan hash value: 3489211022
--------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25M| 45T| 25M (1)| 83:25:53 | | |
| 1 | PARTITION RANGE ALL | | 1251K| 2319G| 387 (1)| 00:00:05 | 1 | 5 |
| 2 | TABLE ACCESS BY LOCAL INDEX ROWID| EM1_RATER_00050_01_BACKUP | 1251K| 2319G| 387 (1)| 00:00:05 | 1 | 5 |
|* 3 | INDEX RANGE SCAN | EM1_RATER_00050_01_BK_PK | 225K| | 26 (4)| 00:00:01 | 1 | 5 |
| 4 | PARTITION RANGE ALL | | 2 | 26 | 1 (0)| 00:00:01 | 1 | 5 |
|* 5 | INDEX RANGE SCAN | EM1_RATER_00068_01_PK | 2 | 26 | 1 (0)| 00:00:01 | 1 | 5 |
--------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("A"."RECORD_ID" filter( NOT EXISTS (SELECT 0 FROM "EM1_RATER_00068_01" "B" WHERE "B"."RECORD_ID"=:B1))
5 - access("B"."RECORD_ID"=:B1)
实际上这个表中的数据只有几十G,根本不会出现几十T的可能。
可以看出执行计划落差很大,查看了表的统计信息,发现还是存在很大的落差,先启用并行收集统计信息。
exec dbms_stats.gather_table_stats(OWNNAME=>null,tabname=>'EM1_RATER_00050_01_BACKUP',estimate_percent =>dbms_stats.auto_sample_size,METHOD_OPT =>'FOR ALL INDEXED COLUMNS SIZE 1',granularity=>'DEFAULT',cascade=>TRUE,degree=>8,block_sample=>TRUE);
PL/SQL procedure successfully completed.
Elapsed: 00:03:14.68
可以借着这个机会看到收集统计信息的时候,后台还是做了大量的信息计算。
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 8h0z7512pn17a, child number 0
-------------------------------------
/* SQL Analyze(0) */ select /*+ full(t) parallel(t,8)
parallel_index(t,8) dbms_stats cursor_sharing_exact use_weak_name_resl
dynamic_sampling(0) no_monitoring no_substrb_pad
*/to_char(count("RECORD_ID")),to_char(substrb(dump(min("RECORD_ID"),16,0
,32),1,120)),to_char(substrb(dump(max("RECORD_ID"),16,0,32),1,120)),to_c
har(count("PERIOD_KEY")),to_char(substrb(dump(min("PERIOD_KEY"),16,0,32)
,1,120)),to_char(substrb(dump(max("PERIOD_KEY"),16,0,32),1,120)),to_char
(count("RECORD_STATUS")),to_char(count("RECORD_TYPE")),to_char(count("RE
SOLUTION_STATUS")),to_char(count("FIELD_00033_O")),to_char(count("FIELD_
00033_C")),to_char(count("FIELD_00279_O")),to_char(count("FIELD_00279_C"
)),to_char(count("FIELD_00436_O")),to_char(count("FIELD_00436_C")),to_ch
ar(count("FIELD_00361_O")),to_char(count("FIELD_00361_C")),to_char(count
("FIELD_00148_O")),to_char(count("FIELD_00148_C")),to_char(count("FIELD_
00341_O")),to_char(count("FIELD_00341_C")),to_char(count("FIELD_00116_O"
)),to_char(count("FIELD_00116。。。。。。
如果这个时候好奇想查看收集统计信息的语句的执行计划,发现更是惊人,里面有901T的字样,绝对是海量数据。
Plan hash value: 2890548601
--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 251K(100)| | | | | | |
| 1 | SORT AGGREGATE | | 1 | 1933K| | | | | | | |
| 2 | PX COORDINATOR | | | | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 1933K| | | | | Q1,00 | P->S | QC (RAND) |
| 4 | SORT AGGREGATE | | 1 | 1933K| | | | | Q1,00 | PCWP | |
| 5 | APPROXIMATE NDV AGGREGATE| | 500M| 901T| 251K (42)| 00:50:14 | | | Q1,00 | PCWP | |
| 6 | PX BLOCK ITERATOR | | 500M| 901T| 251K (42)| 00:50:14 | 1 | 5 | Q1,00 | PCWC | |
|* 7 | TABLE ACCESS FULL | EM1_RATER_00050_01_BACKUP | 500M| 901T| 251K (42)| 00:50:14 | 1 | 5 | Q1,00 | PCWP | |
--------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
7 - access(:Z>=:Z AND :Z 经过了短暂等待的3分钟,一切就绪,再次查看语句的执行计划,指标一下子就降了下来。
然后对这个语句进行了初步分析,发现其实还是可以尝试使用minus操作来做数据过滤。
insert into em1_rater_00068_01
(select b.*
from em1_rater_00050_01_backup b,( select record_id
from em1_rater_00050_01_backup
where record_id
minus
select record_id
from em1_rater_00068_01
where record_id )temp
where b.record_id=temp.record_id
)
对过滤后的数据再次关联就会轻松很多。在不启用并行的情况下执行计划如下:
Plan hash value: 3652964767
------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 9709K| 24G| | 2412K (1)| 08:02:31 | | |
| 1 | LOAD TABLE CONVENTIONAL | EM1_RATER_00068_01 | | | | | | | |
|* 2 | HASH JOIN | | 9709K| 24G| 231M| 2412K (1)| 08:02:31 | | |
| 3 | VIEW | | 9709K| 120M| | 70833 (3)| 00:14:10 | | |
| 4 | MINUS | | | | | | | | |
| 5 | SORT UNIQUE | | 9709K| 55M| 111M| | | | |
| 6 | PARTITION RANGE ALL| | 9709K| 55M| | 2560 (1)| 00:00:31 | 1 | 5 |
|* 7 | INDEX RANGE SCAN | EM1_RATER_00050_01_BK_PK | 9709K| 55M| | 2560 (1)| 00:00:31 | 1 | 5 |
| 8 | SORT UNIQUE | | 10M| 57M| 115M| | | | |
| 9 | PARTITION RANGE ALL| | 10M| 57M| | 3195 (1)| 00:00:39 | 1 | 5 |
|* 10 | INDEX RANGE SCAN | EM1_RATER_00068_01_PK | 10M| 57M| | 3195 (1)| 00:00:39 | 1 | 5 |
| 11 | PARTITION RANGE ALL | | 9709K| 24G| | 1079K (2)| 03:35:51 | 1 | 5 |
| 12 | TABLE ACCESS FULL | EM1_RATER_00050_01_BACKUP | 9709K| 24G| | 1079K (2)| 03:35:51 | 1 | 5 |
------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("B"."RECORD_ID"="TEMP"."RECORD_ID")
7 - access("RECORD_ID" 10 - access("RECORD_ID" 26 rows selected.
使用并行后,执行计划就好多了,根据初步的测试大概在10分钟左右。