Jonathan Lewis
jonathanlewis.wordpress.com
www.jlcomp.demon.co.uk
Who am I ?
Independent Consultant
28+ years in IT
24+ using Oracle
Strategy, Design, Review,
Briefings, Educational,
Trouble-shooting
Member of the Oak Table Network
Oracle ACE Director
Oracle author of the year 2006
Select Editors choice 2007
UKOUG Inspiring Presenter 2011
UKOUG Council member 2012
ODTUG 2012 Best Presenter (d/b)
O1 visa for USA
Jonathan Lewis
2011
O-1 Visa
Highlights
Why Histograms
Current mechanisms
Problems and workarounds
New mechanisms
COUNT(*)
52,352
9,416,360
3,499
86,084
CODE  DESCRIPTION
A     ASSIGNED
B     HANDED BACK
C     CLOSED
L     LOGGED
O     HANDED OVER
P     PENDING
Standard Strategy
Frequency histogram with literals in SQL
Other ideas
Change 'commonest value' to null
Virtual columns / Function-based indexes
List partitions
Problems
Limits (a)
select
    specifier, count(*)
from
    messages
group by
    specifier
order by
    count(*) desc
;
Distinct Specifiers = 352
Frequency Limit is 254
Height-balanced less precise
Popular values use lots of buckets
SPECIFIER     COUNT(*)
BVGFJB       1,851,177
LYYVLH         719,582
MTVMIE         672,823
YETSDP         659,661
DAJYGS         504,641
...
KDCFVJ          75,328
JITCRI          74,104
DNRYKC          70,029
BEWPEQ          68,681
...
JXXXRE               1
OHMNVU               1
YGOBWQ               1
UBBWQH               1
Limits (b)
Interesting arithmetic - for THIS data set
Top N values    % of data
         140        99.00
         210        99.90
         250        99.98
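The arithmetic behind a "Top N values / % of data" table can be sketched as follows. This is only an illustration: the counts used here are hypothetical, since the full list of 352 specifier counts is not shown, and `top_n_coverage` is a name invented for the example.

```python
def top_n_coverage(counts, n):
    """Percentage of all rows accounted for by the n most frequent values."""
    ordered = sorted(counts, reverse=True)
    return 100.0 * sum(ordered[:n]) / sum(ordered)


# Hypothetical row counts per distinct value (NOT the MESSAGES data).
counts = [500, 300, 100, 50, 30, 10, 5, 3, 1, 1]

for n in (1, 3, 5):
    print(f"top {n:>2} values cover {top_n_coverage(counts, n):6.2f}% of the data")
```

With heavily skewed data a small number of values covers almost everything, which is why a histogram limited to 254 buckets can still describe most of the rows.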
Limits (c)
12c allows 2,048 buckets
The default is still 254
Don't be in a rush to use the maximum
Don't forget the optstat history tables
There are several new columns
There are some new costs
Precision (a)
select
    status, count(*)
from
    orders
group by
    status
order by
    status
;
S   COUNT(*)
C    529,100
P        300
R        300
S        300
X    500,000
begin
    dbms_stats.gather_table_stats(
        ownname          => user,
        tabname          => 'orders',
        estimate_percent => dbms_stats.auto_sample_size,
        method_opt       => 'for columns status size 10'
    );
end;
/
Precision (b)
select
    endpoint_number,
    endpoint_number - nvl(prev_endpoint,0)      frequency,
    chr(to_number(substr(hex_val, 2,2),'XX'))   status
from
    (
    select
        endpoint_number,
        lag(endpoint_number,1) over(
            order by endpoint_number
        )                                       prev_endpoint,
        to_char(endpoint_value,'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') hex_val
    from
        user_tab_histograms
    where
        table_name = 'ORDERS'
    and column_name = 'STATUS'
    )
order by
    endpoint_number
/
http://jonathanlewis.wordpress.com/2010/10/05/frequency-histogram-4/
Precision (c)
Results 11.2.0.3 - four attempts
ENDPOINT_NUMBER  FREQUENCY  STATUS
           2741       2741  C
           2742          1  P
           2743          1  R
           5331       2588  X

ENDPOINT_NUMBER  FREQUENCY  STATUS
           2848       2848  C
           2849          1  P
           5629       2780  X

ENDPOINT_NUMBER  FREQUENCY  STATUS
           2706       2706  C
           2708          2  P
           5355       2647  X

ENDPOINT_NUMBER  FREQUENCY  STATUS
           2852       2852  C
           2854          2  P
           2856          2  R
           2859          3  S
           5472       2613  X
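Why the rare values come and go between attempts can be seen with a little arithmetic. The sketch below takes the sample sizes from the final endpoint numbers of the four attempts (5,331 to 5,629 rows out of 1,030,000) and assumes, as a simplification that is not stated in the slides, that each row is sampled independently with the same probability.

```python
TOTAL_ROWS = 529_100 + 300 + 300 + 300 + 500_000   # 1,030,000 rows in ORDERS
RARE_ROWS = 300                                    # rows for each of P, R and S


def rare_value_odds(sample_rows, rare_rows=RARE_ROWS, total_rows=TOTAL_ROWS):
    """Return (expected sampled rows, probability of sampling none) for a rare value."""
    fraction = sample_rows / total_rows
    expected = rare_rows * fraction
    p_missed = (1.0 - fraction) ** rare_rows   # binomial approximation
    return expected, p_missed


for sample_rows in (5331, 5629, 5355, 5472):
    expected, p_missed = rare_value_odds(sample_rows)
    print(f"sample {sample_rows}: expect {expected:.2f} rows, "
          f"chance of missing the value {p_missed:.0%}")
```

Under that assumption each 300-row status is expected to contribute only one or two rows to the sample, and has a substantial chance of not appearing at all, which is consistent with P, R and S showing frequencies of 1 to 3 or going missing.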
Basic Cost
select
substrb(dump(val,16,0,32),1,120) ep, cnt
from
Solution (a)
declare
    srec        dbms_stats.statrec;
    c_array     dbms_stats.chararray;

    m_distcnt   number;
    m_density   number;
    m_nullcnt   number;
    m_avgclen   number;
begin
    m_distcnt   := 5;
    m_density   := 0.00001;
    m_nullcnt   := 0;
    m_avgclen   := 1;

Solution (b)
    -- assumed from the ORDERS data under Precision (a):
    -- the five status values and their row counts
    c_array     := dbms_stats.chararray('C', 'P', 'R', 'S', 'X');
    srec.bkvals := dbms_stats.numarray(529100, 300, 300, 300, 500000);
    srec.epc    := 5;

    dbms_stats.prepare_column_values(srec, c_array);
    dbms_stats.set_column_stats(
        ownname => user,
        tabname => 'ORDERS',
        colname => 'STATUS',
        distcnt => m_distcnt,
        density => m_density,
        nullcnt => m_nullcnt,
        srec    => srec,
        avgclen => m_avgclen
    );
end;
/
http://jonathanlewis.wordpress.com/2009/05/28/frequency-histograms/
Precision (12c)
11.2.0.3

ENDPOINT_NUMBER  FREQUENCY  STATUS
           2741       2741  C
           2742          1  P
           2743          1  R
           5331       2588  X

           2848       2848  C
           2849          1  P
           5629       2780  X

           2706       2706  C
           2708          2  P
           5355       2647  X

           2852       2852  C
           2854          2  P
           2856          2  R
           2859          3  S
           5472       2613  X

12.1.0.0

ENDPOINT_NUMBER  FREQUENCY  STATUS
         529100     529100  C
         529400        300  P
         529700        300  R
         530000        300  S
        1030000     500000  X
Basic Principle
[Figure: diagram labelled 0, 240, 15, 255]
Minimising cost
[Figure: diagram labelled 0, 240, 15, 255]
We only keep 16,384 items in the hash table for each column.
We discard half the table each time we reach this limit.
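The "keep a bounded hash table, discard half when it fills" idea can be sketched in a few lines. This is a minimal illustration of the general technique, not Oracle's implementation: the hash function (BLAKE2b), the choice of which half to discard (hashes with a given low bit set) and the name `approximate_ndv` are all assumptions made for the example. Only the 16,384 limit comes from the slide.

```python
import hashlib


def approximate_ndv(values, limit=16_384):
    """Estimate the number of distinct values while keeping at most `limit` hashes."""
    level = 0      # how many times half the table has been discarded
    kept = set()   # hashes that survive every split so far
    for value in values:
        digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        if h & ((1 << level) - 1):
            continue               # this hash falls in a half already discarded
        kept.add(h)
        while len(kept) > limit:   # table full: discard (roughly) half of it
            level += 1
            mask = (1 << level) - 1
            kept = {x for x in kept if not (x & mask)}
    # each surviving hash stands for 2**level distinct values
    return len(kept) * (1 << level)


print(approximate_ndv(range(1000)))                            # below the limit
print(approximate_ndv(i % 100_000 for i in range(200_000)))    # above the limit
```

While the number of distinct values stays under the limit no split happens and the count is exact apart from hash collisions; once splits occur the result is an estimate whose error depends on how many hashes remain in the table.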
Top-Frequency (12c)
select
    skewed, count(*)
from
    t1
group by
    skewed
order by
    skewed
;
SKEWED  COUNT(*)
     1         4
     2         8
     3        12
     4        16
     5        20
     6        24
     7        28
     8        32
     9        36
    10       116
    11        44
    12        48
    13        52
    14        56
    15        60
    16        64
    17        68
    18        72
    19        76
    20         4
Top-Frequency (12c)
select
    endpoint_value                  epv,
    endpoint_number                 epn,
    endpoint_number - lag(endpoint_number,1) over (
        order by endpoint_number
    )                               freq
from
    user_tab_histograms
where
    table_name  = 'T1'
and column_name = 'SKEWED'
order by
    endpoint_value
;
(There is still a little flaw)
EPV  EPN  FREQ
  1    1
  4   17    16
  5   37    20
  6   61    24
  7   89    28
  8  121    32
  9  157    36
 10  273   116
 11  317    44
 12  365    48
 13  417    52
 14  473    56
 15  533    60
 16  597    64
 17  665    68
 18  737    72
 19  813    76
 20  814     1
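The FREQ column is just the difference between consecutive cumulative endpoint numbers. A minimal sketch of that step, using the EPN values from the output above (the function name `frequencies` is invented for the example):

```python
def frequencies(endpoint_numbers):
    """Mimic endpoint_number - lag(endpoint_number): the first row has no predecessor."""
    freqs = []
    prev = None
    for epn in endpoint_numbers:
        freqs.append(None if prev is None else epn - prev)
        prev = epn
    return freqs


# EPN column from the T1.SKEWED histogram
epn = [1, 17, 37, 61, 89, 121, 157, 273, 317, 365,
       417, 473, 533, 597, 665, 737, 813, 814]
# COUNT(*) column from the T1 query, for values 1 to 20
table_counts = [4, 8, 12, 16, 20, 24, 28, 32, 36, 116,
                44, 48, 52, 56, 60, 64, 68, 72, 76, 4]

freq = frequencies(epn)
print(freq)
print("rows in table:", sum(table_counts), " rows in histogram:", epn[-1])
```

The histogram accounts for 814 of the 840 rows: values 2 and 3 do not appear, and values 1 and 20 each hold 4 rows in the table but add only 1 to the cumulative endpoint number.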
23 38 33 33 12
28 27 19 39 31
24 20 34 16 26
29 28 45 35 13
36 29 28 22 35
27 42 42 32 31
13 26 33 38 41
30 19 27 20 44
46 16 38 34 29
43 33 35 18 22
29 26 21 37 30
25 43 35 27 33
20 18 12 29 33
39 19  8 30 43
38 31 59 50 31
20 32 35 33 28
33 35 34 27 32
29 28 31 27 17
35 22 24 15 28

Sort

 8 21 28 32 37
12 22 28 33 38
12 22 28 33 38
13 22 28 33 38
13 23 29 33 38
13 23 29 33 38
15 24 29 33 39
16 24 29 33 39
16 25 29 33 40
17 26 29 34 41
18 26 30 34 42
18 26 30 34 42
19 27 30 35 43
19 27 31 35 43
19 27 31 35 43
20 27 31 35 44
20 27 31 35 45
20 27 31 35 46
20 28 32 35 50
20 28 32 36 59
[Figure: the sorted data set, repeated]
(10.2.0.4+)
29 31 32 33 34 35 36 38 41 43 59
Hybrid Histogram
select
    endpoint_number,
    endpoint_value,
    endpoint_repeat_count
from
    user_tab_histograms
where
    table_name = 'T1'
;

EPN  EPV  REP
  1    8    1
  6   13    3
 12   18    2
 20   20    5
 26   23    2
 32   26    3
 38   27    6
 44   28    6
 50   29    6
 58   31    5
 69   33    8
 79   35    7
 86   38    5    (7 rows in the bucket; 38 appears 5 times)
 90   41    1
 92   42    2
 95   43    3
 96   44    1
 97   45    1
 98   46    1
100   59    1

This looks like an old frequency histogram, but each bucket has a "repeat count" showing how often the highest value appears in the bucket.
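The relationship between EPN, EPV and REP can be illustrated with a simplified bucketing rule. The sketch below is an assumption made for illustration, not Oracle's algorithm: it puts the lowest value in a bucket of its own, then closes a bucket once it holds at least rows/buckets rows, never splitting a value across buckets. On the 100-row data set it reproduces the first 13 buckets of the output above, but it groups the tail (values above 38) differently from the 12c result.

```python
# (value, row count) pairs read from the sorted 100-row data set
RUNS = [(8, 1), (12, 2), (13, 3), (15, 1), (16, 2), (17, 1), (18, 2), (19, 3),
        (20, 5), (21, 1), (22, 3), (23, 2), (24, 2), (25, 1), (26, 3), (27, 6),
        (28, 6), (29, 6), (30, 3), (31, 5), (32, 3), (33, 8), (34, 3), (35, 7),
        (36, 1), (37, 1), (38, 5), (39, 2), (40, 1), (41, 1), (42, 2), (43, 3),
        (44, 1), (45, 1), (46, 1), (50, 1), (59, 1)]


def hybrid_sketch(runs, n_buckets):
    """Return (endpoint_number, endpoint_value, repeat_count) per bucket."""
    target = sum(count for _, count in runs) / n_buckets
    buckets = []
    epn = 0    # cumulative row count so far
    size = 0   # rows in the bucket currently being filled
    for i, (value, count) in enumerate(runs):
        epn += count
        size += count
        is_last = i == len(runs) - 1
        # close the bucket on the lowest value, when it is full, or at the end
        if i == 0 or size >= target or is_last:
            buckets.append((epn, value, count))   # count = repeats of the top value
            size = 0
    return buckets


for bucket in hybrid_sketch(RUNS, 20):
    print(bucket)
```

The repeat count is what lets the optimizer see that, for example, 38 accounts for 5 of the 7 rows in its bucket rather than assuming the rows are spread evenly over the values in the bucket.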
/* NDV,NIL,NIL*/
/* TOPN,NIL,NIL,RWID,U18U*/
)
order by "VALUE"
SQL (hybrid)
select
    substrb(dump(val,16,0,64),1,20) ep, freq, cdn, ndv,
    (sum(pop) over()) popcnt, (sum(pop * freq) over()) popfreq,
    substrb(dump(max(val) over(),16,0,64),1,20) maxval,
    substrb(dump(min(val) over(),16,0,64),1,20) minval
from
    (
    select
        val, freq, (sum(freq) over()) cdn, (count(*) over()) ndv,
        (case when freq > ((sum(freq) over())/15) then 1 else 0 end) pop
    from (
        select /*+ lots of hints */
            "VALUE" val, count("VALUE") freq
        from
            "TEST_USER"."T1" t
        where
            "VALUE" is not null
        group by
            "VALUE"
        )
    )
order by val
/
With only 15 buckets this dataset got a hybrid histogram
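What the query calculates can be followed in a few lines: cdn is the total row count, ndv the number of distinct values, and a value is flagged as popular when its frequency exceeds cdn / 15. The sketch below applies the same arithmetic to the 100-row data set from the earlier slides; that T1 in this query holds exactly that data is an assumption.

```python
# value -> row count, read from the sorted 100-row data set (assumed to be T1)
FREQ = {8: 1, 12: 2, 13: 3, 15: 1, 16: 2, 17: 1, 18: 2, 19: 3, 20: 5,
        21: 1, 22: 3, 23: 2, 24: 2, 25: 1, 26: 3, 27: 6, 28: 6, 29: 6,
        30: 3, 31: 5, 32: 3, 33: 8, 34: 3, 35: 7, 36: 1, 37: 1, 38: 5,
        39: 2, 40: 1, 41: 1, 42: 2, 43: 3, 44: 1, 45: 1, 46: 1, 50: 1, 59: 1}
BUCKETS = 15   # the divisor that appears in the query

cdn = sum(FREQ.values())   # sum(freq) over()
ndv = len(FREQ)            # count(*) over()
popular = {v: f for v, f in FREQ.items() if f > cdn / BUCKETS}
popcnt = len(popular)              # sum(pop) over()
popfreq = sum(popular.values())    # sum(pop * freq) over()

print(f"cdn={cdn} ndv={ndv} popcnt={popcnt} popfreq={popfreq}")
print("popular values:", sorted(popular))
```

With 37 distinct values and only 15 buckets a frequency histogram is not possible, and on this data only the values occupying more than one fifteenth of the rows count as popular.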
Hybrid
Captures far more popular values, but still samples and is costly