
GEOSTATISTICS

Syahrul
26 September 2018
Course syllabus

• Introduction to geostatistics (today's lecture)
• Non-spatial statistics (today's lecture)
• Spatial statistics
• Estimation
• Simulation
Introduction to Geostatistics
CONTENT
• What is geostatistics?
• Application of spatial statistics
• Basic assumptions in spatial statistics
• Key concepts in geostatistics
• Exploratory data analysis (EDA) for non-spatial
statistics
• Spatial description
Geostatistics
• Geostatistics: the branch of statistics that
deals with spatially correlated data
• Basic assumptions:
– Sample values are not independent
– Spatial dependency exists
• Goal of geostatistics:
– Spatial continuity model
– Use the model for estimation and/or
simulation of spatial data distribution
Geostatistics
• The term "geostatistics" was used by Hart (1952) for the application of
statistics in a geographic context
• Matheron (1962, 1963) used the term in a geological
context, for inferring ore reserves from data spatially
distributed within an ore body:
– Developed the theory of regionalized variables
– Formally introduced a new statistic, the
semivariogram
– Used kriging to obtain the best estimate of a property (e.g.
ore grade) at some location in an ore deposit
– Built the theory on the practical work of Krige (1951, 1960)
What is special about spatial data?
• Location of a sample → an intrinsic part
of its definition
• All data sets → implicitly related by
their coordinates (models of spatial
structure)
• Data values may be related to their
coordinates → spatial trend
What is special about spatial data?

• Values at sample points can NOT be
assumed to be independent
• That is, there may be a spatial structure
to the data
– Classical statistics → independence
– Implications for sampling design
Key Concepts

• Spatial dependence: the value of a
variable at a point in space is related
to its value at nearby points
• Spatial structure: the nature of the
spatial relation
• Support of a sample: the physical
dimensions it represents
Geostatistics application
Reservoir property distribution using the available well
log, core, and/or seismic data
[Figure: flowchart with boxes "Raw Data", "Geostatistical Analysis", "Selection of Model", and "Appropriate Estimation or Stochastic Algorithm"]
Geostatistics application
• Quantify uncertainty using multiple geologically and
statistically valid models
[Figure: property (PHI, K) distributions numbered 1 to n pass through a reservoir flow simulator; the individually numbered simulation runs give a distribution of outcomes (recovery)]
Geostatistics application
[Figure: 3D static earth model with well-log data]
Geostatistics and Earth Modelling
Limitations of geostatistics
Geostatistics does NOT:
• Create Data or Eliminate the Value of
Obtaining Additional Good Data
• Replace Sound Qualitative
Understanding and Expert Judgment
• Necessarily Save Time, At Least in the
Short Term.
• Work Well as a “Black Box”
Some useful sites
• The central information server for
Geostatistics and Spatial Statistics
http://www.ai-geostats.org/
• gstat: http://www.gstat.org
• ArcGIS Geostatistical Analyst:
http://www.esri.com/software/arcgis/arcgisxtensions/geostatistical/
• Geostatistical analysis tutor (Colorado
School of Mines):
http://uncert.mines.edu/tutor/
The first law of geography, put forward by Tobler,
states that everything is related to everything else,
but near things are more related than distant
things (Anselin, 1988)
Geostatistics: Prediction and Interpolation
• Estimating a value at a location that cannot be
sampled (missing data) requires a model
• Some studies lack the means to do this: there may be no model,
only a single data sample, or no inference technique that can be
used to estimate the unsampled values
• Geostatistics addresses this by providing estimation
methods that remain based on a model
• Methods for predicting or estimating missing data:
– Nearest neighbour
– Inverse distance
– Trend surface analysis
– Kriging
– Cokriging
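As a rough illustration of one of the simpler estimators in this list, the sketch below implements inverse distance weighting in plain Python. The function name `idw_estimate`, the power of 2, and the sample coordinates and values are assumptions chosen for illustration; they do not come from the lecture.

```python
import math

def idw_estimate(samples, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    samples: list of (x, y, value) tuples (hypothetical data layout)
    target:  (x, y) location to estimate
    """
    numerator = 0.0
    denominator = 0.0
    for x, y, value in samples:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a sample: return the data value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Hypothetical samples: (x, y, porosity in %)
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 30.0)]
estimate = idw_estimate(samples, (1.0, 1.0))
print(estimate)
```

The weights here depend on distance only. Kriging differs in that its weights are derived from a model of spatial continuity.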
Variogram and Semivariogram
• Model the spatial structure of the data that will be estimated
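A minimal sketch of how an experimental semivariogram might be computed, assuming the classical estimator γ(h) = (1 / 2N(h)) Σ [z(x) − z(x + h)]², where N(h) is the number of sample pairs separated by lag h. The regularly spaced 1D transect and the function name are hypothetical simplifications; real data usually need 2D or 3D lag bins with distance tolerances.

```python
def experimental_semivariogram(values, max_lag):
    """Semivariance for lags 1..max_lag on a regularly spaced 1D transect."""
    gamma = {}
    for lag in range(1, max_lag + 1):
        # All pairs of values separated by exactly `lag` sample spacings
        pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
        squared_differences = [(a - b) ** 2 for a, b in pairs]
        gamma[lag] = sum(squared_differences) / (2 * len(pairs))
    return gamma

# Hypothetical transect with a steady trend
transect = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gamma = experimental_semivariogram(transect, 3)
for lag, semivariance in gamma.items():
    print(lag, semivariance)
```

Semivariance that grows with lag, as in this transect, indicates that nearby values are more alike than distant ones.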
Types of Spatial Data
• Point Data (Point Pattern Analysis)
Indicates the location in the form of a point, for
example in the form:
– Longitude and latitude
– x and y
• Line Data (Geostatistical Data)
– Continuous spatial surface
• Area Data (Polygons or Lattice Data)
Shows the location in the form of an area, such as a
country, district, city, etc.
[Figures: examples of point data, line data, and area data]
Spatial Pattern
Forms of spatial pattern: random, uniform, clustered
[Figure: point maps illustrating random, uniform, and clustered patterns]
Non-spatial statistics

Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) is an approach/philosophy
for data analysis that employs a variety of techniques
(mostly graphical).
For example, multidimensional scaling is an EDA technique that uses
visual representations of distances or similarities between sets
of objects; it is up to the user to interpret exactly what the
distances represent.
The purpose of exploratory data analysis is to:

1. Check for missing data and other mistakes.
2. Gain maximum insight into the data set and its
underlying structure.
3. Uncover a parsimonious statistical model, one which explains the
data with a minimum number of predictor variables.
4. Check assumptions associated with any model
fitting or hypothesis test.
5. Create a list of outliers or other anomalies.
6. Find parameter estimates and their associated
confidence intervals or margins of error.
7. Identify the most influential variables.
Univariate description

Measures that characterize a data population:

• Mean
• Variance/standard deviation
• Histograms
• Spread/central tendency
• Skewness
Frequency Table
Values:
2, 4, 1, 5, 2, 3, 6.9, 2, 5, 7, 2.1, 3.4, 4.2, 2.2, 2.9, 1.7, 3.5, 6.2

Interval     Frequency   Cumulative Frequency
1 - 1.999    2           2
2 - 2.999    6           8
3 - 3.999    3           11
4 - 4.999    2           13
5 - 5.999    2           15
6 - 6.999    2           17
7 - 8.000    1           18
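The counting behind a frequency table can be sketched in a few lines of plain Python. The function name `frequency_table` and the choice to make every interval closed on the left and open on the right (except the last, which includes its upper edge) are assumptions that mirror the interval labels used here.

```python
values = [2, 4, 1, 5, 2, 3, 6.9, 2, 5, 7, 2.1, 3.4, 4.2, 2.2, 2.9, 1.7, 3.5, 6.2]
edges = [1, 2, 3, 4, 5, 6, 7, 8]

def frequency_table(data, bin_edges):
    """Return (lower, upper, frequency, cumulative frequency) for each bin."""
    rows = []
    cumulative = 0
    last_bin = len(bin_edges) - 2
    for k in range(len(bin_edges) - 1):
        lower, upper = bin_edges[k], bin_edges[k + 1]
        if k == last_bin:
            # The last bin also includes its upper edge
            count = sum(1 for v in data if lower <= v <= upper)
        else:
            count = sum(1 for v in data if lower <= v < upper)
        cumulative += count
        rows.append((lower, upper, count, cumulative))
    return rows

table = frequency_table(values, edges)
for lower, upper, count, cumulative in table:
    print(f"{lower} - {upper}: frequency {count}, cumulative {cumulative}")
```

The final cumulative frequency should equal the number of values, which is a quick check that no sample was missed or counted twice.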
Histograms
[Figure: histograms of the data, plotted as frequency and as relative frequency against data value]
Histograms
• Shape varies with number of bins
• Rule of thumb:
Number of Bins ≈ √(Number of Samples)
Cumulative Distributions
[Figure: cumulative distribution, cumulative frequency (0 to 1.0) against data value]
• Number of samples below bin maximum
• Relative frequency below bin maximum
• Probability of grade below bin maximum
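The three readings listed above are the same count expressed in different ways. A small sketch, using the data values from the frequency table and a hypothetical helper name:

```python
values = [2, 4, 1, 5, 2, 3, 6.9, 2, 5, 7, 2.1, 3.4, 4.2, 2.2, 2.9, 1.7, 3.5, 6.2]

def cumulative_relative_frequency(data, threshold):
    """Fraction of samples with a value strictly below `threshold`."""
    samples_below = sum(1 for v in data if v < threshold)
    return samples_below / len(data)

# Evaluate at each bin maximum to trace out the cumulative curve
for bin_maximum in range(2, 9):
    fraction = cumulative_relative_frequency(values, bin_maximum)
    print(bin_maximum, round(fraction, 3))
```

Read as a probability, each printed fraction estimates the chance that a value drawn from the same population falls below that bin maximum.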
Central Tendency Measurements

• Arithmetic Mean = Sum of values / Number of values
• Mode = value with the highest probability (i.e. 'tallest' bin in
histogram)
• Median = 50th percentile (i.e. 50% of values
are below the median)
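These three measures can be computed with Python's standard `statistics` module. The sketch below reuses the data values from the frequency table; note that `statistics.mode` returns the most frequently repeated raw value, which is an assumption about how to read "mode" and may differ from the midpoint of the tallest histogram bin.

```python
import statistics

values = [2, 4, 1, 5, 2, 3, 6.9, 2, 5, 7, 2.1, 3.4, 4.2, 2.2, 2.9, 1.7, 3.5, 6.2]

mean_value = statistics.mean(values)      # sum of values / number of values
median_value = statistics.median(values)  # 50th percentile
mode_value = statistics.mode(values)      # most frequently occurring value

print(mean_value, median_value, mode_value)
```

For these data the mean is larger than the median, which is a first hint that the distribution is positively skewed.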
Spread Measurements
• How different are values from the central
value?
– Range
• Maximum - Minimum
– Variance or Standard Deviation
– Inter-Quartile Range (IQR)
• 75th percentile - 25th percentile
• A wider alternative: 90th percentile - 10th percentile
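A short sketch of the spread measures using the standard `statistics` module and the data values from the frequency table. Two choices here are assumptions: the population form of the variance (`pvariance`) and the "inclusive" percentile interpolation; other conventions give slightly different numbers.

```python
import statistics

values = [2, 4, 1, 5, 2, 3, 6.9, 2, 5, 7, 2.1, 3.4, 4.2, 2.2, 2.9, 1.7, 3.5, 6.2]

value_range = max(values) - min(values)
variance = statistics.pvariance(values)   # population variance (assumed convention)
std_dev = statistics.pstdev(values)       # square root of the variance

# Quartile cut points: 25th, 50th, and 75th percentiles
quartiles = statistics.quantiles(values, n=4, method="inclusive")
iqr = quartiles[2] - quartiles[0]

print(value_range, variance, std_dev, iqr)
```

The range depends only on the two extreme values, so the IQR is usually the more robust description when outliers are present.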
Skewness Measurements
• How symmetrical is the distribution, and how heavy are its tails?
– Skewness
– Kurtosis

Skewness and kurtosis are easiest to appreciate when the
data distribution is viewed graphically
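Both measures can also be computed numerically. The sketch below uses the moment-based definitions (third and fourth central moments divided by powers of the variance) and reports excess kurtosis, so a Normal distribution scores 0 on both. The function name and the choice of population moments are assumptions; software packages often apply small-sample corrections.

```python
import statistics

def skewness_and_excess_kurtosis(data):
    """Moment-based skewness and excess kurtosis (population form)."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((v - mean) ** 2 for v in data) / n  # variance
    m3 = sum((v - mean) ** 3 for v in data) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in data) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis

values = [2, 4, 1, 5, 2, 3, 6.9, 2, 5, 7, 2.1, 3.4, 4.2, 2.2, 2.9, 1.7, 3.5, 6.2]
skewness, excess_kurtosis = skewness_and_excess_kurtosis(values)
print(skewness, excess_kurtosis)
```

A positive skewness indicates a longer tail towards high values, and a negative one a longer tail towards low values.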
Normal Distribution
• Many biological characteristics (e.g. height,
weight) follow a symmetrical distribution with
a predictable shape
• Called a Normal (or "Gaussian") distribution
Normal Distribution
• Defined by mean and variance
[Figure panels: Same Mean, Different Variances; Same Variance, Different Means]

• Examples:
– Grain size, porosity, permeability, etc
Normal Distribution

[Figure: frequency against grade for a Normal distribution, with mean, mode, and median marked]
• Shape has known equation
• Mean = Median = Mode
Normal Distribution

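f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
• Where μ = Mean, σ = Standard Deviation
• From the equation, the proportion of values within a given
number of standard deviations of the mean can be calculated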
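As a worked example of that calculation: for a Normal distribution, the proportion of values within k standard deviations of the mean is erf(k / √2), which does not depend on μ or σ. The helper name below is hypothetical.

```python
import math

def proportion_within(k):
    """Proportion of a Normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

proportions = {k: proportion_within(k) for k in (1, 2, 3)}
for k, proportion in proportions.items():
    print(f"within {k} standard deviation(s): {proportion:.4f}")
```

These are the familiar figures of roughly 68%, 95%, and 99.7%.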
Probability Plots

• Straight line if data is Normally distributed


Skewed Distribution
• Unfortunately, most variables in geology
follow a skewed, non-symmetrical shape
[Figure: positively skewed and negatively skewed distributions, % frequency against grade]
LogNormal Distribution
• Some variables in geology have a
LogNormal Distribution
– The logarithm of the values has a Normal Distribution
– Sometimes it is the logarithm of (value + constant)
[Figure: frequency (f%) against grade for the raw data, and against log-grade for the log-transformed data]
LogNormal Distribution

• Mean ≠ Median ≠ Mode
• Mean is NOT the antilog of the mean of the logs!
– Mean = Antilog (Mean of logs + 0.5 × Variance of logs), using natural logarithms
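The difference between the two back-transforms can be checked numerically. The sketch below draws synthetic lognormal values (the seed, sample size, and parameters are arbitrary assumptions, not lecture data) and compares the arithmetic mean with the plain antilog of the mean of the logs and with the corrected antilog.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
samples = [rng.lognormvariate(0.0, 1.0) for _ in range(20_000)]

logs = [math.log(v) for v in samples]  # natural logarithms
mean_log = statistics.fmean(logs)
variance_log = sum((v - mean_log) ** 2 for v in logs) / len(logs)

arithmetic_mean = statistics.fmean(samples)
naive_back_transform = math.exp(mean_log)                   # antilog of mean of logs
corrected_back_transform = math.exp(mean_log + 0.5 * variance_log)

print(arithmetic_mean, naive_back_transform, corrected_back_transform)
```

The plain antilog gives the geometric mean, which is always at or below the arithmetic mean; the variance term makes up the difference when the data are truly lognormal.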
Mixed Distributions

[Figure: frequency against grade for a distribution with more than one peak]
• More than one mode
• Could be due to
– mixed domains
– multiple phases of mineralisation
Sample Support
• The characteristics of a sample:
– sample size (core diameter, sample length)
– sampling method (diamond, reverse circulation)
– assay method (fire assay)
Volume-Variance
• Variance of data set changes according to
the support of the data (i.e. the volume of
material)
• The larger the sample volume, the lower the
variance of the samples
Volume-Variance
• Variance decreases as volume (support size)
increases
• Blasthole samples have a higher variance
than mining blocks
• Small model blocks have a higher variance
than bigger model blocks

[Figure: relative sizes of samples, mining blocks, and model blocks]
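The volume-variance effect can be sketched by averaging synthetic point values into larger and larger blocks. Everything in this example is an assumption made for illustration: the values are random and spatially uncorrelated, and a "block" is simply a run of consecutive samples. With spatially correlated data the variance still falls as support grows, but more slowly.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
point_values = [rng.gauss(2.0, 1.0) for _ in range(4096)]  # hypothetical grades

def block_averages(data, block_size):
    """Average consecutive groups of `block_size` point values."""
    return [
        statistics.fmean(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]

variance_by_support = {
    size: statistics.pvariance(block_averages(point_values, size))
    for size in (1, 4, 16)
}
for size, variance in variance_by_support.items():
    print(f"support of {size} sample(s): variance {variance:.3f}")
```

The mean stays essentially the same at every support; only the spread of values around it shrinks.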
Outliers
• Outlier values may be cut to reduce their
impact on the arithmetic mean and estimation

g/t    g Au       % of total
2      20,000     1%
4      40,000     3%
3      30,000     2%
7      70,000     4%
90     900,000    58%
15     150,000    10%
10     100,000    6%
20     200,000    13%

• E.g. if Top-Cut = 20, a value > 20 is changed
to 20
• Derive Top-Cut from Probability Plot
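A minimal sketch of applying a top-cut, using the grades (g/t) listed in the table and the example top-cut of 20. The function name is hypothetical, and in practice the cut value would be chosen from the probability plot rather than fixed in code.

```python
grades = [2, 4, 3, 7, 90, 15, 10, 20]  # g/t
top_cut = 20

def apply_top_cut(values, cut):
    """Replace every value above `cut` with `cut`; leave the rest unchanged."""
    return [min(v, cut) for v in values]

cut_grades = apply_top_cut(grades, top_cut)

mean_before = sum(grades) / len(grades)
mean_after = sum(cut_grades) / len(cut_grades)
print(mean_before, mean_after)
```

Only the single 90 g/t sample is changed, yet the arithmetic mean falls by almost half, which shows how strongly one outlier can drive the average.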
De-Clustering
• Statistics relies on samples being random
and un-biased
• In mining, we are more interested in ore
than waste
• Usually in mining, more drillholes are
located in high-grade areas
• So sampling is inherently biased - more
samples in high grade areas
De-Clustering
• Overcome this bias with de-clustering
• Put (3D) grid over data
• Within each cell take sample closest to
centre (or average of all samples)
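The steps above can be sketched as a simple cell declustering routine. This is a 2D illustration under stated assumptions: the function name, the tuple layout, the cell size, and the sample values are all hypothetical, and a 3D version would add a third index to the cell key.

```python
import math
from collections import defaultdict

def cell_decluster(samples, cell_size, use_average=True):
    """Return one value per occupied grid cell.

    samples: list of (x, y, value) tuples
    use_average: average all samples in a cell, or take the one nearest its centre
    """
    cells = defaultdict(list)
    for x, y, value in samples:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((x, y, value))

    declustered = {}
    for (ix, iy), members in cells.items():
        if use_average:
            declustered[(ix, iy)] = sum(m[2] for m in members) / len(members)
        else:
            centre_x = (ix + 0.5) * cell_size
            centre_y = (iy + 0.5) * cell_size
            nearest = min(
                members,
                key=lambda m: math.hypot(m[0] - centre_x, m[1] - centre_y),
            )
            declustered[(ix, iy)] = nearest[2]
    return declustered

# Hypothetical data: four clustered high-grade samples and one isolated low-grade sample
samples = [(1.0, 1.0, 8.0), (2.0, 1.0, 9.0), (1.0, 2.0, 10.0),
           (2.0, 2.0, 9.0), (15.0, 5.0, 1.0)]

naive_mean = sum(v for _, _, v in samples) / len(samples)
declustered = cell_decluster(samples, cell_size=10.0)
declustered_mean = sum(declustered.values()) / len(declustered)
print(naive_mean, declustered_mean)
```

The naive mean is pulled towards the densely drilled high-grade cluster, while the declustered mean gives each occupied cell equal weight. The result depends on the cell size chosen.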
[Figure: two grid cells, one labelled "Use average of 4 samples" and the other "Use single sample"]
→ Continue using this single value per cell


De-Clustering
• Any geological knowledge that allows the data to be split into
separate domains must be used
• Only decluster if no geological separation is
possible
Bivariate description
• Comparing two distributions
• Scatterplot
• Correlation
• Linear regression
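A small sketch of the last two items: the Pearson correlation coefficient and a least-squares regression line, written in plain Python. The function name and the paired values are hypothetical and chosen only to show the calculation.

```python
import math
import statistics

def correlation_and_regression(x, y):
    """Return (correlation coefficient, slope, intercept) for y regressed on x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    correlation = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return correlation, slope, intercept

# Hypothetical paired measurements of two variables
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]

correlation, slope, intercept = correlation_and_regression(x_values, y_values)
print(correlation, slope, intercept)
```

The correlation coefficient measures only the strength of a linear relationship, so a scatterplot should still be inspected for curvature and outliers.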
Non-spatial statistics
Scatterplot
[Figure: scatterplot with regression line; correlation coefficient 0.7]
Non-spatial statistics
[Figure: scatterplots for different values of the correlation coefficient]

(Picture taken from Dubrule, 2003)
