
Advanced geostatistics in Reservoir Modeling

Statistics for Geoscience Applications

Stephen Tyson

Australian School of Petroleum, Adelaide

School of Petroleum Engineering, UNSW

Statistics for Geoscience Applications

Univariate Statistics

Bivariate Statistics

Multivariate Statistics

September 2006 1

Univariate Statistics

Frequency tables and histograms


Measures of location
Measures of dispersion
Measures of shape
Box plots

A Porosity Dataset

12.64 14.56 15.89 16.26 16.85 17.68 18.55 19.31 19.94 20.74
12.85 14.74 15.90 16.50 17.15 17.72 18.62 19.32 20.04 21.12
13.56 15.03 15.96 16.54 17.18 17.75 18.85 19.33 20.07 21.14
13.62 15.28 16.04 16.58 17.24 17.78 18.88 19.36 20.08 21.50
14.03 15.36 16.08 16.58 17.27 17.82 18.90 19.42 20.12 21.75
14.09 15.39 16.09 16.59 17.42 18.04 18.93 19.59 20.17 22.38
14.13 15.42 16.15 16.75 17.50 18.04 19.00 19.62 20.34 22.43
14.21 15.43 16.17 16.79 17.58 18.06 19.08 19.76 20.35 22.53
14.25 15.43 16.23 16.83 17.62 18.24 19.24 19.84 20.49 23.31
14.51 15.67 16.25 16.85 17.66 18.48 19.25 19.90 20.58 23.34


Frequency Tables and Histograms


Given a set of data:
Look for the min & max values
Divide the range of values into certain intervals (bins)
Count the number of data that fall within each interval (bin)
Make a frequency table
Make a histogram by plotting the data in the frequency table

Porosity  Frequency
12        0
13        2
14        2
15        8
16        11
17        18
18        14
19        11
20        15
21        10
22        4
23        3
24        2
[Figure: histogram of frequency vs. porosity, 12 to 24]
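A minimal Python sketch of the steps above, applied to the porosity dataset. It assumes each bin is labelled by its upper limit and holds the values in [label − 1, label); that convention is inferred rather than stated on the slide, but it reproduces every count in the table (for example, 19.00 falls in bin 20). The function name is illustrative.

```python
# Porosity values (%) transcribed from the 100-sample table.
POROSITY = [
    12.64, 14.56, 15.89, 16.26, 16.85, 17.68, 18.55, 19.31, 19.94, 20.74,
    12.85, 14.74, 15.90, 16.50, 17.15, 17.72, 18.62, 19.32, 20.04, 21.12,
    13.56, 15.03, 15.96, 16.54, 17.18, 17.75, 18.85, 19.33, 20.07, 21.14,
    13.62, 15.28, 16.04, 16.58, 17.24, 17.78, 18.88, 19.36, 20.08, 21.50,
    14.03, 15.36, 16.08, 16.58, 17.27, 17.82, 18.90, 19.42, 20.12, 21.75,
    14.09, 15.39, 16.09, 16.59, 17.42, 18.04, 18.93, 19.59, 20.17, 22.38,
    14.13, 15.42, 16.15, 16.75, 17.50, 18.04, 19.00, 19.62, 20.34, 22.43,
    14.21, 15.43, 16.17, 16.79, 17.58, 18.06, 19.08, 19.76, 20.35, 22.53,
    14.25, 15.43, 16.23, 16.83, 17.62, 18.24, 19.24, 19.84, 20.49, 23.31,
    14.51, 15.67, 16.25, 16.85, 17.66, 18.48, 19.25, 19.90, 20.58, 23.34,
]


def frequency_table(data, first_label, last_label, width=1):
    """Return (label, count) rows for bins [label - width, label).

    Assumption: each bin is labelled by its upper limit and includes its
    lower limit.
    """
    rows = []
    label = first_label
    while label <= last_label:
        count = sum(1 for x in data if label - width <= x < label)
        rows.append((label, count))
        label += width
    return rows


min_value, max_value = min(POROSITY), max(POROSITY)  # step 1: the data range
for label, count in frequency_table(POROSITY, 12, 24):
    # A text histogram: one '*' per sample in the bin.
    print(f"{label:>3} {count:>3} {'*' * count}")
```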

Cumulative Plots

Given a frequency table:


Add up the number of data that are smaller than a given value
Divide this number by the total number of data
Make a cumulative frequency table
Make a cumulative histogram
Porosity  Frequency  Cumulative Freq  Cumulative %
12        0          0                0%
13        2          2                2%
14        2          4                4%
15        8          12               12%
16        11         23               23%
17        18         41               41%
18        14         55               55%
19        11         66               66%
20        15         81               81%
21        10         91               91%
22        4          95               95%
23        3          98               98%
24        2          100              100%
[Figure: cumulative histogram, cumulative % (0 to 100%) vs. porosity, 12 to 24]
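The cumulative columns are a running sum of the frequency column, divided by the total for the percentage. A short sketch that starts from the frequency table rather than the raw data (the names are illustrative):

```python
# Frequency table from the slide: (porosity bin label, count).
FREQUENCY = [
    (12, 0), (13, 2), (14, 2), (15, 8), (16, 11), (17, 18), (18, 14),
    (19, 11), (20, 15), (21, 10), (22, 4), (23, 3), (24, 2),
]


def cumulative_table(freq):
    """Return (label, frequency, cumulative count, cumulative fraction) rows."""
    total = sum(count for _, count in freq)
    rows = []
    running = 0
    for label, count in freq:
        running += count  # number of data below this bin's upper limit
        rows.append((label, count, running, running / total))
    return rows


for label, count, cum, frac in cumulative_table(FREQUENCY):
    print(f"{label:>3} {count:>3} {cum:>4} {frac:>5.0%}")
```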


Probability Distribution
[Figure: left, histogram of frequency vs. porosity (12 to 24); right, "Frequency % Prob Distribution", the same histogram with frequency expressed in % (0 to 20%)]

Measures of Location

Extreme values (minimum and maximum)

Quartiles (Q1 = 25th percentile, Q3 = 75th percentile)

Median (Q2 = 50th percentile)

Mode: the most frequent value (the value with the tallest bar)

Arithmetic mean: $m = \frac{1}{n}\sum_{i=1}^{n} x_i$

Geometric mean: $m_G = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$

Harmonic mean: $\frac{1}{m_H} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{x_i}$
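The location measures translate directly into code. This sketch implements the formulas as written (the geometric mean through logarithms, which is mathematically equivalent and avoids overflow) and runs them on the first column of the porosity table only to keep the example short. The mode is not computed: for continuous data it is read from the tallest histogram bar.

```python
import math


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # (x1 * x2 * ... * xn) ** (1/n), evaluated through logs; needs all x > 0.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    # 1/m = (1/n) * sum(1/x_i)  =>  m = n / sum(1/x_i); needs all x > 0.
    return len(xs) / sum(1.0 / x for x in xs)


def median(xs):
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


# First column of the porosity table (10 values), used only as a demo.
sample = [12.64, 12.85, 13.56, 13.62, 14.03, 14.09, 14.13, 14.21, 14.25, 14.51]
print("min, max:", min(sample), max(sample))
print("median:", median(sample))
print("arithmetic, geometric, harmonic:",
      arithmetic_mean(sample), geometric_mean(sample), harmonic_mean(sample))
```

For positive data the three means always satisfy harmonic ≤ geometric ≤ arithmetic.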


Measures of Dispersion

Range: $r = x_{max} - x_{min}$

Interquartile range: $IQR = Q_3 - Q_1$

Variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - m)^2$

Standard deviation: $\sigma = \sqrt{\sigma^2}$

Coefficient of variation: $C_v = \frac{\sigma}{m}$
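A sketch of the dispersion measures, using the 1/n variance from the slide. The quartiles come from Python's `statistics.quantiles` with `method="inclusive"`; that is only one of several quartile conventions, so the IQR may differ slightly from other software on small samples. The test data are illustrative.

```python
import math
import statistics


def dispersion(xs):
    """Range, IQR, variance (1/n), standard deviation and Cv of a sample."""
    n = len(xs)
    m = sum(xs) / n
    variance = sum((x - m) ** 2 for x in xs) / n  # population form, as on the slide
    sd = math.sqrt(variance)
    # Quartiles by linear interpolation ("inclusive" method).
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "range": max(xs) - min(xs),
        "iqr": q3 - q1,
        "variance": variance,
        "std": sd,
        "cv": sd / m,
    }


print(dispersion([2, 4, 4, 4, 5, 5, 7, 9]))
```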

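Measures of Shape

Modality: number of modes (unimodal, bimodal, polymodal)

Skewness (degree of symmetry):

$$s = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - m)^3}{\sigma^3}$$

[Figure: example distributions with s > 0 and s < 0]

Kurtosis (degree of peakedness):

$$k = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - m)^4}{\sigma^4} - 3$$

[Figure: example distributions with k > 0, k = 0 and k < 0]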
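Skewness and kurtosis as moment ratios, with 3 subtracted from the kurtosis so that k = 0 corresponds to a normal distribution. The two small datasets below are illustrative only.

```python
import math


def skewness_kurtosis(xs):
    """Moment skewness s and excess kurtosis k (k = 0 for a normal distribution)."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)  # 1/n standard deviation
    s = sum((x - m) ** 3 for x in xs) / n / sd ** 3
    k = sum((x - m) ** 4 for x in xs) / n / sd ** 4 - 3.0
    return s, k


print(skewness_kurtosis([1, 2, 3, 4, 5]))   # symmetric and flat-topped
print(skewness_kurtosis([1, 1, 1, 10]))     # long tail to the right
```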


Example of Summary Statistics
[Figure: distribution of a variable annotated with x_min, Q1, Q2, Q3 and x_max, the range r and the interquartile range IQR]

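Box Plots
[Figure: box-plot anatomy. The box runs from the lower hinge (Q1) to the upper hinge (Q3), with the median (Q2) inside; its length is the IQR. Lower and upper whiskers extend from the box; points more than 1.5 x IQR beyond the hinges are plotted as outliers (o) and points more than 3 x IQR beyond them as extremes (x).]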
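One common way to turn the box-plot anatomy into numbers, sketched under stated assumptions: hinges are taken as Q1 and Q3 from `statistics.quantiles(..., method="inclusive")`, points more than 1.5 × IQR beyond a hinge are outliers, and points more than 3 × IQR beyond a hinge are extremes. Software packages differ in these details, and the data are made up.

```python
import statistics


def box_plot_summary(xs):
    """Hinges, whiskers, outliers and extremes using 1.5 x IQR and 3 x IQR fences."""
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr
    inside = [x for x in xs if inner_lo <= x <= inner_hi]
    return {
        "hinges": (q1, q3),
        "median": q2,
        # Whiskers reach the most extreme data values inside the inner fences.
        "whiskers": (min(inside), max(inside)),
        # Outliers lie between the inner and outer fences ...
        "outliers": sorted(x for x in xs
                           if outer_lo <= x < inner_lo or inner_hi < x <= outer_hi),
        # ... and extremes lie beyond the outer fences.
        "extremes": sorted(x for x in xs if x < outer_lo or x > outer_hi),
    }


print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 18, 30]))
```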


Box Plots: Example
[Figure: box plots of deep resistivity for facies A to F; N = 16, 67, 82, 23, 25, 18]

Bivariate Statistics

Scatterplots

Correlation coefficient

Regression


Common Variables that are related

Porosity and permeability


Porosity and water saturation
Porosity and seismic amplitude
Density and molecular weight
Oil density and viscosity
Formation thickness and productivity
Sand-body width and thickness

Dependency
[Figure: left, Variable B vs. Variable A, independent (no correlation); right, Variable D vs. Variable C, perfect correlation]


Partial Dependency
[Figure: Variable F vs. Variable E showing partial dependency; "This is usually the real world"]

Scatterplots
[Figure: scatterplot of true value vs. estimate, both axes 0.0 to 16.0]

Bivariate display, typically of:
two covariates (e.g. porosity and permeability) at the SAME location
the same variable at different locations, separated by some distance vector
estimated value vs. true value
Good for spotting aberrant data
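For the second use listed above, a scatterplot of the same variable at two locations, the pairs can be built by shifting a regularly sampled profile against itself. A minimal sketch for a 1-D profile with the lag counted in sample steps; `lag_pairs` and the profile values are hypothetical.

```python
def lag_pairs(values, lag):
    """Pairs (z(u), z(u + h)) along a regularly sampled 1-D profile.

    `lag` is the separation h counted in sample steps.
    """
    if lag < 1 or lag >= len(values):
        raise ValueError("lag must be between 1 and len(values) - 1")
    return list(zip(values[:-lag], values[lag:]))


# Hypothetical, regularly spaced porosity log (illustrative values only).
profile = [18.2, 19.0, 17.5, 16.8, 15.9, 16.4, 17.7, 18.9]
for lag in (1, 3):
    pairs = lag_pairs(profile, lag)
    print(f"lag {lag}: {len(pairs)} pairs, first two {pairs[:2]}")
```

Plotting the first element of each pair against the second gives the scatterplot for that separation.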


Log10(k) v Porosity (500 samples)
[Figure: log10 permeability (0.0 to 4.0) vs. porosity (0 to 50%) for 500 samples]

Scatterplots
[Figure: scatterplot of log10 permeability vs. porosity, with the marginal histogram of permeability and the marginal histogram of porosity along the axes]


Bivariate Histogram for Porosity & Log10 Permeability

Porosity  Log10 Permeability
(%)       0-0.5  0.5-1  1-1.5  1.5-2  2-2.5  2.5-3  3-3.5  3.5-4  Total
5         0      3      16     3      0      0      0      0      22
10        0      2      24     21     0      0      0      0      47
15        0      2      32     39     4      1      0      0      78
20        0      0      19     56     21     1      0      0      97
25        0      0      5      42     45     6      1      0      99
30        0      0      0      16     37     20     4      0      77
35        0      0      0      5      18     20     4      0      47
40        0      0      0      0      6      8      8      0      22
45        0      0      0      0      1      4      3      0      8
50        0      0      0      0      0      1      2      0      3
Total     0      7      96     182    132    61     22     0      500

The Total column is the marginal distribution of porosity; the Total row is the marginal distribution of log10 permeability; the row labelled 35 is the conditional distribution of log10 permeability for 30 < Porosity <= 35.

[Figure: 3-D bivariate histogram, number of samples (0 to 60) by porosity (5 to 50%) and log10 permeability (0.5 to 4.0, md)]
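Marginal and conditional distributions can be read off the bivariate table by normalizing row and column totals. This sketch transcribes the counts and assumes, as the "30 < Porosity <= 35" label suggests, that each porosity row is labelled by its upper limit; the eight columns are the log10-permeability classes.

```python
# Bivariate histogram transcribed from the table. Keys are porosity class
# labels (%); each list holds counts for the 8 log10-permeability classes.
COUNTS = {
    5: [0, 3, 16, 3, 0, 0, 0, 0],
    10: [0, 2, 24, 21, 0, 0, 0, 0],
    15: [0, 2, 32, 39, 4, 1, 0, 0],
    20: [0, 0, 19, 56, 21, 1, 0, 0],
    25: [0, 0, 5, 42, 45, 6, 1, 0],
    30: [0, 0, 0, 16, 37, 20, 4, 0],
    35: [0, 0, 0, 5, 18, 20, 4, 0],
    40: [0, 0, 0, 0, 6, 8, 8, 0],
    45: [0, 0, 0, 0, 1, 4, 3, 0],
    50: [0, 0, 0, 0, 0, 1, 2, 0],
}


def proportions(counts):
    total = sum(counts)
    return [c / total for c in counts]


grand_total = sum(sum(row) for row in COUNTS.values())
# Marginal of porosity: row totals / grand total.
marginal_porosity = {label: sum(row) / grand_total for label, row in COUNTS.items()}
# Marginal of log10 permeability: column totals / grand total.
column_totals = [sum(col) for col in zip(*COUNTS.values())]
marginal_logk = [c / grand_total for c in column_totals]
# Conditional distribution of log10 permeability given 30 < porosity <= 35.
conditional_logk = proportions(COUNTS[35])

print("marginal log10 k:", [round(p, 3) for p in marginal_logk])
print("conditional log10 k | 30 < phi <= 35:", [round(p, 3) for p in conditional_logk])
```

Comparing the conditional row with the marginal row shows how much knowing porosity narrows the spread of log10 permeability.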


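Covariance
The degree to which x and y go up and down together is quantified by a calculation known as the covariance:

$$c(x,y) = \mathrm{Cov}_{xy} = \sigma_{xy} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar X)(y_i - \bar Y)$$

where n is the number of data points and $\bar X$ and $\bar Y$ denote the average values of x and y, respectively. Compare the variance:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar X)^2$$

The covariance has the units of x multiplied by the units of y (just as the variance has the units of the data squared). Equivalently, it is the mean of the products minus the product of the means:

$$\sigma_{xy} = \overline{XY} - \bar X\,\bar Y$$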
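The two covariance expressions give the same number. This sketch checks that on illustrative values and also shows the unit dependence: expressing permeability in darcy instead of md shrinks the covariance by a factor of 1000.

```python
def covariance(xs, ys):
    """(1/n) * sum((x_i - mean_x) * (y_i - mean_y))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def covariance_shortcut(xs, ys):
    """Mean of the products minus the product of the means."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / n - (sum(xs) / n) * (sum(ys) / n)


phi = [10.0, 15.0, 20.0, 25.0]          # hypothetical porosity, %
k_md = [50.0, 120.0, 300.0, 610.0]      # hypothetical permeability, md
k_darcy = [k / 1000.0 for k in k_md]    # the same values in darcy
print(covariance(phi, k_md), covariance_shortcut(phi, k_md))
print(covariance(phi, k_darcy))         # 1000 times smaller: units are % x darcy
```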

Types of correlation
[Figure: three scatterplots: Y is positively correlated with X; Y and X are not correlated; Y is negatively correlated with X]


Quantifying (Linear) Dependency with a Correlation Coefficient
The correlation coefficient, r, is a dimensionless value that describes the strength of the linear relationship between two variables:

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar X)(y_i - \bar Y)}{n\,\sigma_x\sigma_y} = \frac{\sum_{i=1}^{n}x_iy_i - n\bar X\bar Y}{n\,\sigma_x\sigma_y}$$

where
$\sigma_x$ = standard deviation of variable x
$\sigma_y$ = standard deviation of variable y
Used by most software, e.g. Microsoft Excel: CORREL(array1, array2)
Dividing by the product of the standard deviations makes r unitless, and −1 ≤ r ≤ 1

Note: independence implies r = 0, but not vice versa
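A direct transcription of the correlation formula (with 1/n standard deviations), followed by three illustrative checks: two perfect straight lines, and the y = x² case, where y depends completely on x and yet r = 0. That last case is the "not vice versa" in the note.

```python
import math


def pearson_r(xs, ys):
    """r = (sum(x*y) - n*mean_x*mean_y) / (n * sd_x * sd_y), sd with 1/n."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (sxy - n * mx * my) / (n * sx * sy)


x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [2 * v + 1 for v in x]))   # perfect positive line
print(pearson_r(x, [-v for v in x]))          # perfect negative line
print(pearson_r(x, [v * v for v in x]))       # y = x^2: dependent, yet r = 0
```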

Example Correlation Coefficient
Porosity  Perm      Porosity  Perm      Porosity  Perm
29.06     737.10    8.27      309.00    30.00     1476.40
30.00     1218.20   17.23     627.20    30.00     1166.30
21.61     670.70    16.90     1028.00   30.00     1413.40
10.08     419.90    30.00     1528.00   24.38     2093.10
30.00     988.30    7.95      563.30    30.00     1815.30
30.00     1255.50   30.00     1431.10   26.22     982.70
30.00     3225.80   7.23      440.50    30.00     1661.70
17.70     785.40    14.47     1343.10   16.23     633.50
30.00     1050.70   23.38     849.50    15.12     336.60
30.00     1131.90   25.72     603.40    30.00     899.50


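Example Correlation Coefficient
                    Porosity  Perm
Mean                23.39     1089.50
Standard Deviation  7.91      594.36

$\sum x_iy_i = 847{,}292.145$, n = 30

$$r = \frac{847{,}292.145 - 30 \times 23.39 \times 1089.50}{30 \times 7.91 \times 594.36} \approx 0.5879$$

(computed with the unrounded means and standard deviations; the rounded values shown give about 0.587)

Microsoft Excel: CORREL(array1, array2) = 0.5879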
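The worked example can be checked in a few lines. The sketch transcribes the 30 pairs column pair by column pair and uses 1/n standard deviations, which is the convention that reproduces the 7.91 and 594.36 quoted above.

```python
import math

# The 30 (porosity, permeability) pairs from the table, read column pair by column pair.
POROSITY = [
    29.06, 30.00, 21.61, 10.08, 30.00, 30.00, 30.00, 17.70, 30.00, 30.00,
    8.27, 17.23, 16.90, 30.00, 7.95, 30.00, 7.23, 14.47, 23.38, 25.72,
    30.00, 30.00, 30.00, 24.38, 30.00, 26.22, 30.00, 16.23, 15.12, 30.00,
]
PERM = [
    737.10, 1218.20, 670.70, 419.90, 988.30, 1255.50, 3225.80, 785.40, 1050.70, 1131.90,
    309.00, 627.20, 1028.00, 1528.00, 563.30, 1431.10, 440.50, 1343.10, 849.50, 603.40,
    1476.40, 1166.30, 1413.40, 2093.10, 1815.30, 982.70, 1661.70, 633.50, 336.60, 899.50,
]

n = len(POROSITY)
mean_phi = sum(POROSITY) / n
mean_k = sum(PERM) / n
# Standard deviations with 1/n, matching the slide's convention.
sd_phi = math.sqrt(sum((p - mean_phi) ** 2 for p in POROSITY) / n)
sd_k = math.sqrt(sum((k - mean_k) ** 2 for k in PERM) / n)
sum_xy = sum(p * k for p, k in zip(POROSITY, PERM))

r = (sum_xy - n * mean_phi * mean_k) / (n * sd_phi * sd_k)
print(f"means {mean_phi:.2f}, {mean_k:.2f}; std devs {sd_phi:.2f}, {sd_k:.2f}")
print(f"sum of x*y = {sum_xy:.3f}, r = {r:.4f}")
```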

Correlation Coefficient - Interpretation

What do I do with this number?

0.5879 indicates that porosity and permeability are partially correlated
A correlation coefficient of 0.5879 means that 34.6% (= 0.5879²) of the variation in permeability is associated with the variability in porosity
r is a measure of how close the points come to falling on a straight line


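Interpreting Correlation Coefficients
[Figure: example scatterplots for r = 0, intermediate values of r, r = −1 and r = +1]

Correlation: Problem areas (look at the scatter plot)
r low: outliers significantly reduce the correlation coefficient
r high: a single point induces a high correlation coefficient
r = 0: no linear correlation, but clearly NOT independent
r undefined: one variable is constant, so its standard deviation (and the denominator of r) is zero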
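The "single point induces high correlation" panel is easy to reproduce with made-up numbers: four points with r = 0, plus one distant point, give r = 0.98. This version of `pearson_r` uses the deviation form of r (algebraically equal to the slide's formula) and raises an error when a variable is constant, the case in which r is undefined.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


cloud_x, cloud_y = [1, 2, 3, 4], [2, 4, 1, 3]
print(pearson_r(cloud_x, cloud_y))                # 0.0
print(pearson_r(cloud_x + [20], cloud_y + [20]))  # 0.98
```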


Non-linear Correlation


Correlation Coefficient - Interpretation

Since r is a measure of how close the points come to falling on a straight line, it indicates how successful we might be in predicting one variable from another
If |r| is high, then for a given value of one variable the other variable is restricted to only a small range of values
If |r| is low, then knowing the value of one variable does not give us much information about the other


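Relationship between Correlation Coefficient and Covariance

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar X)(y_i - \bar Y)}{n\,\sigma_x\sigma_y} = \frac{\sum_{i=1}^{n}x_iy_i - n\bar X\bar Y}{n\,\sigma_x\sigma_y}, \qquad \sigma_{xy} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar X)(y_i - \bar Y)$$

So,

$$r = \frac{\sigma_{xy}}{\sigma_x\sigma_y} = \frac{\mathrm{Cov}_{xy}}{\sigma_x\sigma_y}$$

That is, a dimensionless covariance.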
Rank Correlation Coefficient

When linear correlation is a poor measure, we can correlate the ranks of the values instead
Rank is the position of a data value when sorted in ascending order: the smallest has rank 1 and the largest has rank n
Sort in ascending order of the first (independent) variable

$$r_R = \frac{\mathrm{Cov}_{R_xR_y}}{\sigma_{R_x}\sigma_{R_y}} = \frac{\sum_{i=1}^{n}(R_{x_i} - \bar R_x)(R_{y_i} - \bar R_y)}{n\,\sigma_{R_x}\sigma_{R_y}}$$

What is the relationship between the mean of Rx and the mean of Ry?
If n is large, what is a good approximation of the means?
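A sketch of the rank correlation: rank each variable, then apply the ordinary correlation formula to the ranks. Tied values (such as the many 30.00 porosities in the earlier example) get the average of the ranks they span, a common convention that the slide does not specify. The `pearson_r` helper uses the deviation form of r, which is algebraically equal to the formula given earlier.

```python
import math


def ranks(values):
    """Ranks 1..n in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for t in range(i, j + 1):
            result[order[t]] = average_rank
        i = j + 1
    return result


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank_correlation(xs, ys):
    """Correlation coefficient computed on the ranks instead of the values."""
    return pearson_r(ranks(xs), ranks(ys))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]   # monotonic but not linear
print(pearson_r(x, y), rank_correlation(x, y))
```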


Example: Sample of 100 Porosities


30.3 29.7 16.9 9.2 21.1 23.5 17.8 26.3 28.3 30.9

39.8 27.4 19.1 20.9 5.1 35.6 22.8 34.2 17.9 23.4

37.5 29.4 29.3 25.5 16.2 19.5 28.2 28.1 26.8 38.1

14.7 21.4 31.7 24.3 26.5 34.9 14.3 5.7 22.2 37.0

23.7 26.0 29.6 28.4 11.5 17.8 22.1 23.0 7.6 13.3

25.0 29.9 26.1 15.1 10.8 26.3 26.0 18.4 20.7 22.4

33.8 29.2 31.9 34.6 11.3 24.4 9.5 4.1 15.8 27.2

12.0 24.0 39.1 12.9 42.1 35.1 11.7 14.7 43.6 12.2

20.5 26.9 20.1 29.5 31.5 32.5 16.5 17.3 21.2 13.0

7.8 9.1 25.9 8.0 2.5 21.9 11.1 28.3 12.4 18.3

Example: 100 Permeabilities measured on the same samples as the 100 porosities
442.5 155.0 26.5 41.4 741.6 36.9 143.8 75.2 97.3 166.6

729.6 328.2 93.9 153.2 17.9 179.2 101.7 125.1 69.2 65.6

436.1 133.7 630.9 853.4 58.6 48.2 102.5 545.1 99.0 2949.7

44.2 42.7 1072.7 64.1 97.6 923.8 67.2 26.9 192.4 1196.4

35.3 139.9 282.8 206.5 22.0 54.7 143.0 160.9 36.3 33.4

72.1 203.2 80.2 31.1 76.1 40.6 59.3 89.0 151.9 46.0

2715.6 474.6 318.8 137.9 52.3 110.8 30.1 12.0 105.2 625.9

44.0 100.7 1979.1 92.1 2467.2 865.1 49.6 33.4 404.4 30.9

157.3 781.0 46.4 303.1 307.1 683.6 61.5 19.9 297.4 29.2

24.3 55.6 107.5 36.8 14.9 111.7 33.7 136.2 56.4 90.0


Correlation coefficients between porosity and permeability measurements
[Figure: permeability (md, 0 to 3000) vs. porosity (%, 0 to 50) for the 100 samples; r = 0.564, rrank = 0.816]

Interpreting Rank Correlation

Learning from the difference between rrank and r:
If rrank > r: a few outliers may be spoiling an otherwise good correlation, or there may be a non-linear relationship
If rrank < r: a few outliers may be enhancing an otherwise poor correlation
If rrank = 1: the relationship is monotonic, but not necessarily linear
e.g. for positive X, Y = X² gives rrank = 1 even though r < 1
In these cases a non-linear transform of one covariate can make r = 1
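An illustration of the last two points, using a hypothetical exponential porosity-permeability trend: the relationship is monotonic, so rrank = 1 while r is well below 1, and taking log10 of permeability makes r = 1.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simple_ranks(values):
    # Ranks 1..n in ascending order; assumes there are no tied values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


porosity = [10.0, 15.0, 20.0, 25.0, 30.0]
perm = [10 ** (0.1 * p) for p in porosity]   # hypothetical exponential trend, md

r = pearson_r(porosity, perm)
r_rank = pearson_r(simple_ranks(porosity), simple_ranks(perm))
r_log = pearson_r(porosity, [math.log10(k) for k in perm])
print(f"r = {r:.3f}, rrank = {r_rank:.3f}, r after log10 transform = {r_log:.3f}")
```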


Regression: Estimating one variable from another

Want to use measurements of one independent variable, X, to predict another dependent variable, Y.
Fit a line: $\hat y = mx + b$
What line is best? One that minimizes the error in the prediction, error $= \hat y - y$.
Define the error as the sum of squared differences between prediction and true value: minimize $\sum (\hat y - y)^2$.
[Figure: porosity (%, 10 to 25) vs. velocity (km/s, 3.0 to 4.0) with a candidate line $\hat y = mx + b$ (slope m, intercept b) and the error $\hat y - y$ marked for one point]

Regression
The m and b that minimize the sum of squared deviations from the line Y = mX + b are given by

$$m = r\,\frac{\sigma_y}{\sigma_x}, \qquad b = \bar Y - m\bar X$$

or

$$m = \frac{\sum_{i=1}^{n}x_iy_i - n\bar X\bar Y}{n\,\sigma_x^2}$$

A measure of goodness-of-fit, R², is given by

$$R^2 = \frac{\sum_{i=1}^{n}(\hat y_i - \bar Y)^2}{\sum_{i=1}^{n}(y_i - \bar Y)^2}$$

which is algebraically equivalent to the correlation coefficient squared; hence the earlier statement that r is a measure of how close the data fit a straight line.
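The regression formulas can be checked numerically: this sketch fits y = mx + b using m = r σy/σx and b = Ȳ − mX̄, then computes R² from its definition and compares it with r². The four data points are illustrative.

```python
import math


def fit_line(xs, ys):
    """Least-squares line y = m*x + b via m = r * sd_y / sd_x, plus R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sd_x = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    r = (sum(x * y for x, y in zip(xs, ys)) - n * mx * my) / (n * sd_x * sd_y)
    m = r * sd_y / sd_x
    b = my - m * mx
    y_hat = [m * x + b for x in xs]
    # Explained sum of squares over total sum of squares.
    r2 = (sum((yh - my) ** 2 for yh in y_hat)
          / sum((y - my) ** 2 for y in ys))
    return m, b, r, r2


m, b, r, r2 = fit_line([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 6.0])
print(f"y = {m:.3f}x + {b:.3f}, r = {r:.4f}, R^2 = {r2:.4f}, r^2 = {r * r:.4f}")
```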


Regression: What does it mean

R² tells how much of the variance in the dependent variable (y) is explained by variance in the independent variable (x)
If there is perfect correlation (r = +1 or −1, so R² = 1), then all of the variation in y is explained by variation in x
If there is zero correlation (R² = 0), then none of the variation in y is explained by variation in x
If 0 < R² < 1, then some of the variation in y is explained by variation in x

Linear Regression: Example & Residual Plot

residual = $y - \hat y$
[Figure: left, "Vel-to-PHI Plot", porosity vs. velocity (km/s) with fitted line y = −8.38x + 46.85, R² = 0.527; right, "Residual Plot", residual (data, −4 to 4) vs. velocity (km/s)]

What would a residual plot look like for a bad estimator?
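A sketch that computes the residuals y − ŷ for plotting against x. For a least-squares line with an intercept the residuals always sum to zero, so a residual plot is centred on zero by construction. The toy data are hypothetical; `slide_porosity` simply evaluates the fitted line quoted on the slide.

```python
def least_squares(xs, ys):
    """Slope m and intercept b of the y-on-x least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return m, my - m * mx


def residuals(xs, ys, m, b):
    """residual = y - y_hat for the line y_hat = m * x + b."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


def slide_porosity(velocity_km_s):
    # Fitted line quoted on the slide: y = -8.38x + 46.85 (R^2 = 0.527).
    return -8.38 * velocity_km_s + 46.85


xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 6.0]   # hypothetical data
m, b = least_squares(xs, ys)
res = residuals(xs, ys, m, b)
print(res, sum(res))               # values to plot against x; they sum to ~0
print(slide_porosity(3.5))         # porosity predicted at 3.5 km/s
```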


Regression
Application to what generic types of variables? And some specific examples?

Spatial (x is location, y is the variable), e.g. depth, or distance from a well
Hard (x) - soft (y), e.g. core permeability - log porosity
Time (x) - seismic amplitude (y)

How will the histogram of the predicted points compare to that of the data points from which the predictor was derived?

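Other Linear Regression Models

Minimize the error in one variable: y-on-x or x-on-y
Minimize the total error in both variables: major axis (MA, perpendicular regression) or reduced major axis (RMA)
[Figure: for MA the error is the perpendicular distance d_i from the point (x, y) to the line; for RMA the error is the area of the triangle formed by the point and its projections (x*, y*) onto the line]
In what circumstances would you use each?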


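Comparison of Regression Models
[Figure: porosity vs. velocity (km/s) with the Y-on-X, X-on-Y, RMA and MA lines, all passing through the point of means $(\bar x, \bar y)$; the RMA line lies between the X-on-Y and Y-on-X lines (its slope is the geometric mean of their slopes)]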
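The four lines can be compared through their slopes, all expressed as dy/dx and all passing through the means. This sketch uses the standard closed forms, where S denotes sums of squared or cross-products of deviations: y-on-x Sxy/Sxx, x-on-y re-expressed as Syy/Sxy, RMA sign(Sxy)·sqrt(Syy/Sxx), and MA the principal-axis slope. It assumes Sxy ≠ 0, and the data are illustrative.

```python
import math


def regression_slopes(xs, ys):
    """Slopes (dy/dx) of four straight-line fits through the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slopes = {
        "y_on_x": sxy / sxx,                                # minimizes vertical errors
        "x_on_y": syy / sxy,                                # minimizes horizontal errors
        "RMA": math.copysign(math.sqrt(syy / sxx), sxy),    # sign(r) * sd_y / sd_x
        "MA": ((syy - sxx) + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy),
    }
    return slopes, (mx, my)


slopes, centre = regression_slopes([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 6.0])
for name, slope in slopes.items():
    intercept = centre[1] - slope * centre[0]
    print(f"{name:7s} slope {slope:.4f}  intercept {intercept:.4f}")
```

With these data the RMA slope sits between the two least-squares slopes and equals their geometric mean.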

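Correlation vs Regression

$$r = \frac{\sum_{i=1}^{n}x_iy_i - n\bar X\bar Y}{n\,\sigma_x\sigma_y} \qquad m = \frac{\sum_{i=1}^{n}x_iy_i - n\bar X\bar Y}{n\,\sigma_x^2}$$

The only difference between correlation and regression is the denominator.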


Conditional Expectations
Two issues:
Not only do we want to estimate a true value of one variable from another, we want to know its variability (uncertainty)
The relationship might not be linear (especially when the variables are the same attribute, e.g. porosity, but at different locations)
[Figure: permeability (md) vs. porosity (%); at a known primary (porosity) value, the distribution of possible permeability values]
Prediction of conditional distributions is at the heart of geostatistical algorithms

Conditional Expectation Curve

The average value of permeability is calculated for successive porosity intervals
[Figure: conditional expectation curve, permeability (0 to 500) vs. porosity (0 to 50)]
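The class-average version can be sketched as follows. Classes are taken as (lo, hi] intervals, which is an assumption, empty classes return None, and the data pairs are hypothetical.

```python
def conditional_expectation(xs, ys, edges):
    """Mean of y for x in each class (lo, hi]; None where a class is empty.

    Returns (class mid-point, mean y, number of pairs) for each class.
    """
    curve = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_class = [y for x, y in zip(xs, ys) if lo < x <= hi]
        mean_y = sum(in_class) / len(in_class) if in_class else None
        curve.append(((lo + hi) / 2, mean_y, len(in_class)))
    return curve


# Hypothetical porosity (%) and permeability (md) pairs.
porosity = [4.0, 6.0, 12.0, 14.0, 22.0]
perm = [5.0, 15.0, 60.0, 100.0, 400.0]
for mid, mean_k, count in conditional_expectation(porosity, perm, [0, 10, 20, 30]):
    print(f"porosity class centred on {mid}: mean perm {mean_k} md from {count} samples")
```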


Conditional Expectation Curve

A smooth curve can be fitted by using a moving window for the averaging:
Sort the N data pairs in ascending order of x
Choose a window of M pairs (10 < M < N/10)
Calculate the average x and y for the first M pairs
Move the window forward one step and repeat
Plot the (N − M + 1) paired averages
Predict perm from por by linear interpolation
[Figure: smoothed conditional expectation curve, permeability (0 to 500) vs. porosity (0 to 50)]
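A sketch of the moving-window recipe: sort by x, average each run of M consecutive pairs, then interpolate linearly between the window averages. With N pairs there are N − M + 1 window positions. Holding the end values constant outside the curve is an arbitrary choice made here, not something the slide specifies, and the data pairs are hypothetical (a real dataset would use the larger M recommended above).

```python
from bisect import bisect_left


def moving_window_curve(pairs, m):
    """Average x and y over windows of m consecutive pairs sorted by x.

    Returns N - m + 1 (x_mean, y_mean) points for N pairs.
    """
    pts = sorted(pairs)                        # ascending order of x
    n = len(pts)
    if not 1 <= m <= n:
        raise ValueError("window size must be between 1 and the number of pairs")
    curve = []
    for start in range(n - m + 1):             # move the window one step at a time
        window = pts[start:start + m]
        curve.append((sum(x for x, _ in window) / m,
                      sum(y for _, y in window) / m))
    return curve


def predict(curve, x):
    """Linear interpolation along the curve; holds the end values outside it."""
    xs = [cx for cx, _ in curve]
    if x <= xs[0]:
        return curve[0][1]
    if x >= xs[-1]:
        return curve[-1][1]
    i = bisect_left(xs, x)                     # xs[i - 1] < x <= xs[i]
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical (porosity %, permeability md) pairs.
data = [(8, 12), (11, 30), (14, 45), (17, 80), (19, 95), (22, 160), (25, 240), (28, 310)]
curve = moving_window_curve(data, 3)
print(curve)
print(predict(curve, 20.0))
```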
