.comment-link {margin-left:.6em;}

Oracle Sponge -- Now Moved To Wordpress

Please use http://oraclesponge.wordpress.com

Saturday, December 17, 2005

Optimizing Materialized Views Part V: HOLAP Query Performance

HOLAP Performance With Different Aggregation Levels

I'm going to construct a slightly different test data set for this experiment, involving the aggregation of 10,000,000 by three different columns each having around 1000 distinct values.

drop table master;

create table master
(
col1 not null,
col2 not null,
col3 not null,
tx_qty not null
)
pctfree 0
nologging
as
select
trunc(dbms_random.value(1,1000)),
trunc(dbms_random.value(1,1000)),
trunc(dbms_random.value(1,1000)),
trunc(dbms_random.value(1,100),2)
from
dual
connect by
1=1 and
level <= 1000000;

The performance test queries will take the form of a query such as the following:

clear screen
alter session set workarea_size_policy = manual;
alter session set sort_area_size = 1000000000;
alter session set sort_area_retained_size = 1000000;
set timing on autotrace on feedback off

-- loop: the following executed four times
alter system flush buffer_cache;
select sum(s_tx_qty) from
(Select
sum(tx_qty) s_tx_qty,
{grouped column list}
From master group by grouping sets
({grouping sets});
-- end of loop

In the four executions of the query the first invokes a hard parse and the autotrace statistics are discarded. The autotrace results of the following three results are averaged.

We will try aggregations at a number of different levels. Some of these will involve hierarchies and some will not, but the aim is to expose as full a range of optimizations as we can. For the sake of my time, my sanity, and blog length I'm going to run these to get simple wall clock times and autotrace statistics for now. The host machine and database will be as quiescent as possible.

We'll start with simple, conventional aggregations to get benchmarks for familiar operations.


Benchmark 1

Benchmark 2

Benchmark 3

Benchmark 4

Group By

None

col1

col1,col2

Col, col2, col3

rows

1

999

997,955

9,949,574

consistent gets

25,074

25,074

25,074

25,074

physical reads

25,062

25,062

25,062

25,062

sorts (memory)

0

1

1

1

Elapsed Time

8.87

18.95

59.70

92.42



In the following table we present the results of the various grouping set tests, with the “Y” indicating that the grouping set was included as part of that test.

Grouping Set(s) Used

Test 1

Test 2

Test 3

Test 4

Test 5

Test 6

Test 7

Test 8

()

Y


Y

Y

Y

Y

Y

Y

(COL1)


Y

Y

Y

Y

Y

Y

Y

(COL2)




Y

Y

Y


Y

(COL3)






Y


Y

(COL1,COL2)





Y

Y

Y

Y

(COL1,COL3)






Y


Y

(COL2,COL3)






Y


Y

(COL1,COL2,COL3)







Y

Y

rows

1

999

1,000

1,999

999,954

2,996,849

10,948,529

12,946,423

recursive calls

0

0

0

185

25

299

270

319

db block gets

0

0

0

23,575

2803

38,219

34745

40,872

consistent gets

25,074

25,074

25074

71,498

33286

122,911

62313

130,765

physical reads

25,062

25,062

25062

71,434

27799

122,776

62242

130,622

redo size

0

0

0

1,929

2,147

3525.33

1513

3,080

sorts (memory)

0

1

1

2

3

4

2

4

sorts (disk)

0

0

0

0

0

0

0

0

Elapsed Time

12.33

16.34

17.84

63.57

71.49

230.23

173.08

266.69



Test 1 is a straight aggregation to find the sum of tx_qty, but expressed as a grouping set of () rather than the usual “Select Sum(my_col) From my_table” syntax. The autotrace statistics are also very conventional and show physical reads very close to consistent gets, with no db block gets or recursive SQL.

This test is logically the same as Benchmark 1, but required an elapsed time 50% longer. The execution plans are in fact different, with the SORT AGGREGATE of the conventional method being replaced with a SORT GROUP BY NOSORT in the grouping sets method.

Test 2 is similarly conventional, finding the sum of tx_qty grouped by col1. Again this is expressed as a grouping set (col1) rather than the regular syntax and the autotrace statistics are very similar to Test 1 with the addition of an in-memory sort. The performance is very similar to that of Benchmark 2, to which it is logically equivalent, and both queries are using a SORT GROUP BY operation.

Test 3 gives our first hierarchical result set, combining the results of Test 1 and Test 2. In fact the result set for Test 3 is logically equivalent to (Benchmark1 UNION ALL Benchmark 2), and the benefits of the HOLAP cube begin to show themselves. The average elapsed time of 17.84 seconds is only 62% of the elapsed times for Tests 1 and 2 combined. The execution plan shows a SORT (GROUP BY ROLLUP) operation which is evidently an optimization specifically for hierarchical cube queries. It appears either that the result for the (COL1) grouping set is being used as the basis for the () grouping set calculation, or that the results for both sets are being computed in the same operation. As we will shortly see, the latter appears to be the more probable scenario.

Test 4 adds into Test 3 a grouping set of (Col2), giving two sets at the same level with no more detailed level included. This result ought to be logically similar to (Test 2 UNION ALL Test 3) -- with the appropriate columns being used -- or (Benchmark 1 UNION ALL Benchmark 2 UNION ALL Benchmark 3). The result is surprising because this time the performance is well in excess of those comparisons, and gives the first hint at a possible contra-indication for HOLAP cube MV's. The elapsed time of 63.57 seconds is 86% higher than the sum of the elapsed times for Tests 2 and 3, of 34.18 seconds.

The explain plan show the creation of two temporary tables, although when the statement is executed these temporary table names are of course different.

------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 170 (85)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | | | |
| 2 | VIEW | | 707K| | 170 (85)| 00:00:02 |
| 3 | TEMP TABLE TRANSFORMATION | | | | | |
| 4 | LOAD AS SELECT | | | | | |
| 5 | TABLE ACCESS FULL | MASTER | 10M| 114M| 162 (88)| 00:00:02 |
| 6 | LOAD AS SELECT | | | | | |
| 7 | SORT GROUP BY | | 1 | 26 | 3 (34)| 00:00:01 |
| 8 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6665_1214756C | 1 | 26 | 2 (0)| 00:00:01 |
| 9 | LOAD AS SELECT | | | | | |
| 10 | SORT GROUP BY ROLLUP | | 1 | 26 | 3 (34)| 00:00:01 |
| 11 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6665_1214756C | 1 | 26 | 2 (0)| 00:00:01 |
| 12 | VIEW | | 1 | 39 | 2 (0)| 00:00:01 |
| 13 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6666_1214756C | 1 | 39 | 2 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------------

Here are the STAT lines from a 10046 trace file:

STAT #6 id=1 cnt=1999 pid=0 pos=1 obj=0 op='TEMP TABLE TRANSFORMATION  (cr=71632 pr=66687 pw=23191 time=58587487 us)'
STAT #6 id=10 cnt=1999 pid=1 pos=1 obj=0 op='VIEW (cr=144 pr=6 pw=0 time=27785 us)'
STAT #6 id=11 cnt=1999 pid=10 pos=1 obj=-40016278 op='TABLE ACCESS FULL SYS_TEMP_4254951018 (cr=144 pr=6 pw=0 time=19771 us)'
STAT #6 id=12 cnt=1 pid=1 pos=2 obj=0 op='LOAD AS SELECT (cr=25074 pr=20328 pw=23185 time=19896368 us)'
STAT #6 id=13 cnt=10000000 pid=2 pos=1 obj=260950 op='TABLE ACCESS FULL OBJ#(260950) (cr=25074 pr=20328 pw=0 time=30000104 us)'
STAT #6 id=14 cnt=1 pid=1 pos=3 obj=0 op='LOAD AS SELECT (cr=23208 pr=23185 pw=3 time=18510637 us)'
STAT #6 id=15 cnt=999 pid=4 pos=1 obj=0 op='SORT GROUP BY (cr=23208 pr=23185 pw=0 time=18508219 us)'
STAT #6 id=16 cnt=10000000 pid=5 pos=1 obj=-40016279 op='TABLE ACCESS FULL SYS_TEMP_4254951017 (cr=23208 pr=23185 pw=0 time=30022328 us)'
STAT #6 id=17 cnt=1 pid=1 pos=4 obj=0 op='LOAD AS SELECT (cr=23206 pr=23168 pw=3 time=20142290 us)'
STAT #6 id=18 cnt=1000 pid=7 pos=1 obj=0 op='SORT GROUP BY ROLLUP (cr=23206 pr=23168 pw=0 time=20140880 us)'
STAT #6 id=19 cnt=10000000 pid=8 pos=1 obj=-40016279 op='TABLE ACCESS FULL SYS_TEMP_4254951017 (cr=23206 pr=23168 pw=0 time=30018947 us)'

Cleaned up a little, this reads more easily as:

TEMP TABLE TRANSFORMATION
VIEW
TABLE ACCESS FULL SYS_TEMP_4254951018
LOAD AS SELECT
TABLE ACCESS FULL OBJ#(260950)
LOAD AS SELECT
SORT GROUP BY
TABLE ACCESS FULL SYS_TEMP_4254951017
LOAD AS SELECT
SORT GROUP BY ROLLUP
TABLE ACCESS FULL SYS_TEMP_4254951017

The temporary table definitions are as follows:

CREATE GLOBAL TEMPORARY TABLE "SYS"."SYS_TEMP_0FD9D6669_1214756C"
("C0" NUMBER,
"C1" NUMBER,
"A0" NUMBER )
IN_MEMORY_METADATA CURSOR_SPECIFIC_SEGMENT STORAGE (OBJNO 4254951017 ) NOPARALLEL;

CREATE GLOBAL TEMPORARY TABLE "SYS"."SYS_TEMP_0FD9D666A_1214756C"
("C0" NUMBER,
"C1" NUMBER,
"D0" NUMBER,
"A0" NUMBER )
IN_MEMORY_METADATA CURSOR_SPECIFIC_SEGMENT STORAGE (OBJNO 4254951018 ) NOPARALLEL

You'll note that the global temporary tables' OBJNO clause correlate to part of the temporary table name referenced in the STAT lines – a bit unintuitive that, if you ask me.

So how do we interpret this? Fortunately, a 10046 trace file contains some interesting clues.

Although it shows the creation of the temporary tables it does not reveal any other recursive SQL against them. However it does show wait events of course, and where there is a 'db file scattered read' there are also p1 and p2 parameters which tell you the file number and block number respectively for the object being read. Now the only objects involved here ought to be our MASTER table and our two temporary tables, and we know the file and block numbers for the MASTER table are available from DBA_EXTENTS.

So I'm going to break off there for this entry, and present the analysis of the execution plan, and more, next time.

0 Comments:

Post a Comment

<< Home