should be made to determine if the needed statistics are available. Also in environments where long running queries are run only one time, it may be beneficial to ensure that statistics are available prior to running the queries.

Some properties of database column statistics are as follows:

yColumn statistics occupy little storage, on average 8-12k per column.

yColumn Statistics are gathered through one full scan of the database file for any given number of columns in the database file.

yColumn statistics are maintained periodically through means of statistics refreshing mechanisms that require a full scan of the database file.

yColumn statistics are packed in one concise data structure that requires few I/Os to page it into main memory during query optimization.

As stated above, statistics may have a direct effect on the quality of the access plan chosen by the query optimizer and thereby influence the end user query performance. Shown below is an illustrative example that underscores the effect of statistics on access plan selection process.

Statistic Usage Example:

Select * from T1, T2 where T1.A=T2.A and T1.B = ’VALUE1’ and T2.C = ‘VALUE2’

Database characteristics: indexes on T1.A and T2.A exist, NO column statistics, T1 has 100 million rows, T2 has 10 million rows. T1 is 1 GB and T2 0.1 GB

Since statistics are not available, the optimizer has to consider default estimates for selectivity of T1.B = ’VALUE1’ ==> 10% T2.C = ‘VALUE2’ ==> 10%

The actual estimates are T1.B = ’VALUE1’ ===>10% and T2.C = ‘VALUE2’ ===>0.00001%

Based on selectivity estimates the optimizer will select the following access plan

Scan(T1) - Probe (T2.A index) - > Probe (T2 Table) ---

the real cost for the above access plan would be approximately 8192 I/Os + 3600 I/Os ~ 11792 I/Os

If column statistics existed on T2.C the selectivity estimate for T2.C = ‘VALUE2’ would be 10 rows or 0.00001%

And the query optimizer would select the following plan instead

Scan(T2) - Probe (T1.A index) - > Probe (T1 Table)

Accordingly the real cost could be calculated as follows:

819 I/Os + 10 I/Os ~ 830 I/Os. The result of having statistics on T2.C led to an access plan that is faster by order of magnitude from a case where no statistics exist.

For more information on database statistics collection see the DB2 for i5/OS Database Performance and Query Optimization manual.

IBM i 6.1 Performance Capabilities Reference - January/April/October 2008

© Copyright IBM Corp. 2008

Chapter 4 - DB2 Performance

 

51

Page 51
Image 51
Intel 170 Servers, AS/400 RISC Server, 7xx Servers manual Statistic Usage Example