PANSE
and
P.
V.
SUKHATME:
Published by:
The Indian Council of Agricultural Research,
New Delhi, India
By
GEORGE
W.
SNEDECOR:
Statistical Methods
Published by:
The Iowa State College Press, Ames, Iowa, U.S.A.
Sampling Theory
of Surveys
with Applications
by
PANDURANG V. SUKHATME
Ph.D., D.Sc.
Chief, Statistics Branch, Economics Division,
Food and Agriculture Organization of the
United Nations. Formerly Statistical Adviser,
Indian Council of Agricultural Research, New
Delhi, India.
To
Professor Jerzy Neyman
PREFACE
This book is an outgrowth of lectures on sample surveys which
the author has delivered since 1945 at the Indian Council of
Agricultural Research, subsequently at the International School
on Censuses and Statistics in 1949-50 held at Delhi under the
auspices of the Food and Agriculture Organization of the United
Nations, at the two summer sessions conducted by the Indian
Society of Agricultural Statistics in 1950 and 1951, and finally at
the Statistical Laboratory of the Iowa State College, Ames,
Iowa, U.S.A., in the spring of 1952.
treatment has become too terse. Such sections have been marked
with an asterisk to indicate that the portion can be left over from
the first reading without losing the continuity of the text.
The author has received considerable assistance in preparing
the book from his former colleagues in India. First of all he
gratefully acknowledges the encouragement and help which he
received from his former Chief, Mr. P. M. Kharegat, then
Secretary to the Ministry of Agriculture, Government of India,
to whose farsightedness are principally due the advances which
India has made in the field of sampling. He is indebted to
Messrs. V. G. Panse, G. R. Seth, K. Kishen, R. D. Narain,
O. P. Aggarwal and B. V. Sukhatme who read parts of the
manuscript and made numerous suggestions to improve the
presentation; to Messrs. K. S. Krishnan, S. H. Ayer and
K. V. R. Sastry who worked through the examples; and to
Mrs. Evans of the Statistics Branch of FAO who checked through
them and also helped in the preparation of the index to the
book; to Dr. P. N. Saxena who shouldered a particularly heavy
responsibility of reading critically the manuscript and the proofs;
and to Suzanne Brunelle and Mary Nakano for their typing and
secretarial help. The author would also like to express his thanks to
Dr. T. A. Bancroft, Dr. D. J. Thompson and other members of
the staff of the Statistical Laboratory, Iowa State College, with
whom he had the opportunity to work as visiting professor during
the spring term of 1952 and to Marshall Townsend of the Iowa
State College Press for their interest and encouragement in the
publication of the book. Last but not least the author is
indebted to Mr. Norris E. Dodd, Director-General of the FAO,
who invited the author to come to FAO to head the Statistics
Branch, which gave him the opportunity to appreciate more fully
the urgent need for promoting sampling for improving agricultural
statistics in under-developed countries; and to Dr. A. H. Boerma,
Director of Economics Division of the FAO, for his constant
encouragement and advice.
September 1953.
PANDURANG V. SUKHATME
CONTENTS

PREFACE
LIST OF EXAMPLES
INDEX TO PRINCIPAL NOTATION
SPECIAL SYMBOLS

CHAPTER

I.

II. BASIC THEORY
  A.
  B.
    2b.1  Introduction
    2b.2  Sampling with Replacement: Sample Estimate and its Variance
    2b.3  Sampling with Replacement: Estimation of the Sampling Variance
    2b.4  Sampling without Replacement
    2b.5  Sampling without Replacement: General Case
    APPENDIX: Tables

III.
    3a.1  Introduction
    3a.2  Estimate of the Population Mean and its Variance
    3a.3  Choice of Sample Sizes in Different Strata
    3a.4  Size of Sample for Estimating the Mean with a Given Variance under (a) Optimum (Neyman), and (b) Proportional Allocations
    3a.5  Comparison of Stratified with Unstratified Simple Random Sampling
    3a.6* Practical Difficulties in Adopting the Neyman Method of Allocation
    3a.7  Evaluation from the Sample of the Gain in Precision due to Stratification
    3a.8  Use of Strata Sizes for Improving the Precision of an Unstratified Sample
    3a.9  Effect of Increasing the Number of Strata on the Precision of the Estimate
    3a.10 Effects of Inaccuracies in Strata Sizes
  B.
    3b.1 to 3b.5

IV.
    4a.1  Introduction
    4a.2  Notation and Definition of the Ratio Estimate
    4a.3  Expected Value of the Ratio Estimate
    4a.4* Second Approximation to the Expected Value of the Ratio Estimate
    4a.5  Variance of the Ratio Estimate
    4a.6  Estimate of the Variance of the Ratio Estimate
    4a.7* Second Approximation to the Variance of the Ratio Estimate
    4a.8  Conditions for a Ratio Estimate to be the Best Unbiased Linear Estimate
    4a.9  Confidence Limits
    4a.10 Efficiency of the Ratio Estimate
    4a.11 Ratio Estimate in Stratified Sampling
    4a.12 Ratio Method for Qualitative Characters: Two Classes
    4a.13 Extension to k Classes
  B. SAMPLING WITH VARYING PROBABILITIES OF SELECTION
    4b.1

V.
    5.1   Simple Regression
    5.2   Simple Regression Estimate and its Variance
    5.3   Estimation of the Variance of Simple Regression Estimate
    5.4 to 5.11*

VI.
  A. EQUAL CLUSTERS
    6a.1  Cluster Sampling
    6a.2  Efficiency of Cluster Sampling
    6a.3  Efficiency of Cluster Sampling in Terms of Intra-Class Correlation
    6a.4  Estimation from the Sample of the Efficiency of Cluster Sampling
    6a.5  Relationship between the Variance of the Mean of a Single Cluster and its Size
    6a.6  Optimum Unit of Sampling and Multipurpose Surveys
  B. UNEQUAL CLUSTERS
    6b.1 to 6b.4

VII.
    7.1   Introduction
    7.2 to 7.17

VIII.
    8.1, 8.2, 8.13, 8.14*

IX.
    9.1   Introduction
    9.2   The Sample Mean and its Variance
    9.3 to 9.8

X. NON-SAMPLING ERRORS
  A. OBSERVATIONAL ERRORS
    10a.1 Introduction
    10a.2 Mathematical Model for the Measurement of Observational Errors
    10a.3 The Sample Mean and its Variance
    10a.4 Estimation of the Different Components of Variance
    10a.5 The Mean and Variance of a Stratified Sample in which Enumerators are Assigned the Units in their Respective Strata
    10a.6 to 10a.9
  B. INCOMPLETE SAMPLES
    10b.1 The Problem
    10b.2 The Solution of the Problem of Incomplete Samples

INDEX
SPECIAL SYMBOLS

c.v.      Coefficient of variation
Cov       Covariance
c         Cost function
          Coefficients of cost functions
M.S.E.    Mean square error
S.E.      Standard error
          Proportional
CHAPTER I
$$\frac{N-1}{N}\cdot\frac{N-2}{N-1}\cdots\frac{N-r+1}{N-r+2}\cdot\frac{1}{N-r+1} = \frac{1}{N}.$$
The
unit, it follows that every one of the units in the population has
the same chance of being included in the sample under the
procedure of simple random sampling. This, in fact, has sometimes been used as the definition of simple random sampling.
However, this definition does not completely specify the procedure
of simple random sampling, for, as will be clear in Chapter IX,
there can be other procedures of sampling which do not give the
same chance of selection at the first draw to each unit of the
population and yet the probability that any specified unit is
included in the sample is $n/N$.
The method of simple random sampling is also equivalent to
giving an equal probability to each possible cluster of n units
to form the sample of the population. The possible clusters of
n are $\binom{N}{n}$ in number.
Random sampling implies that every one of these possible clusters
will have an equal probability, namely, $1/\binom{N}{n}$,
of being selected as the sample. Thus, if the population consists
of 4 farms serially numbered 1, 2, 3 and 4, having 2, 3, 4 and
7 acres under corn respectively, then the possible clusters of 2 farms
from this population will be the following six:
Serial Numbers of Farms      Acres under Corn
in the Cluster
        1, 2                      2, 3
        1, 3                      2, 4
        1, 4                      2, 7
        2, 3                      3, 4
        2, 4                      3, 7
        3, 4                      4, 7
$$\frac{1}{N}\cdot\frac{1}{N-1}\cdots\frac{1}{N-n+1}$$
Since the order in which the units are selected is immaterial, the
probability of any given n units to form the sample is thus
given by
$$\frac{n!}{N(N-1)\cdots(N-n+1)} \quad\text{or}\quad \frac{1}{\binom{N}{n}}$$
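The equivalence described above can be checked by brute force for the four-farm population of the text. The following is only a minimal sketch: the variable names are illustrative, and the enumeration simply lists every unordered cluster of 2 farms, assigns each the probability $1/\binom{N}{n}$, and recovers the inclusion probability $n/N$ for each farm.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Acres under corn for the four farms of the text, keyed by serial number.
acres = {1: 2, 2: 3, 3: 4, 4: 7}
n = 2  # sample size

# Every possible cluster (unordered sample) of n farms.
clusters = list(combinations(sorted(acres), n))

# Under simple random sampling each cluster is equally likely.
prob_cluster = Fraction(1, comb(len(acres), n))

# Probability that a specified farm is included in the sample:
# add up the probabilities of all clusters that contain it.
inclusion = {
    farm: sum(prob_cluster for c in clusters if farm in c)
    for farm in acres
}

for c in clusters:
    print(c, tuple(acres[i] for i in c), prob_cluster)
print(inclusion)
```

Exact fractions are used so that the probabilities can be compared with $1/6$ and $2/4$ without rounding.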
of random numbers, and (c) taking for the sample the n units
whose numbers correspond to those drawn from the table of
random numbers. The following examples will illustrate the
procedure:
Example 1.1
Select a sample of 34 villages from a list of 338 villages.
Using the three-figure numbers given in columns 1 to 3, 4 to 6,
etc., of the table given in the Appendix and rejecting numbers
greater than 338 (and also the number 000), we have for the
sample:
206, 338, 165, 331, 223, 326, 114, 131, 218,
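The rejection procedure of Example 1.1 can be sketched in code. This is an illustration only: a pseudo-random generator stands in for the printed table of random numbers, the function name `draw_sample` is hypothetical, and skipping a number that has already been drawn is an assumption made here so that the 34 villages are distinct.

```python
import random

def draw_sample(population_size, sample_size, digits=3, seed=None):
    """Select distinct serial numbers by reading fixed-width random numbers,
    rejecting 000 and any number greater than population_size."""
    rng = random.Random(seed)  # stands in for the printed random-number table
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randrange(10 ** digits)  # a three-figure number, 000 to 999
        if number == 0 or number > population_size:
            continue  # rejected, as in the example
        if number in chosen:
            continue  # assumption: a repeated number is skipped
        chosen.append(number)
    return chosen

villages = draw_sample(338, 34, seed=1)
print(villages)
```

With three-figure numbers roughly two thirds of the draws are rejected for a list of 338 villages, which is the price paid for giving every village the same chance.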
Example 1.2
Example 1.3
Nine villages in a certain administrative area contain 793, 170,
970, 657, 1721, 1603, 864, 383 and 826 fields respectively. Make a
random selection of 6 fields, using the method of random sampling.
The total number of fields in all the 9 villages is 7987. The
first step in the selection of a random sample of fields is to
have these serially numbered from 1 to 7987, by taking successive
cumulative totals:
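The cumulative totals, and the step from a random serial number back to a village and a field within it, can be sketched as follows. The helper `locate` and the seeded generator are illustrative assumptions; only the nine field counts come from the example.

```python
import random
from bisect import bisect_left
from itertools import accumulate

# Number of fields in each of the nine villages (from the example).
fields_per_village = [793, 170, 970, 657, 1721, 1603, 864, 383, 826]

# Successive cumulative totals; the last one is the total number of fields.
cumulative = list(accumulate(fields_per_village))

def locate(serial):
    """Map a serial number 1..total to (village number, field number in village)."""
    village = bisect_left(cumulative, serial)  # first cumulative total >= serial
    start = cumulative[village - 1] if village > 0 else 0
    return village + 1, serial - start

rng = random.Random(0)  # stands in for the table of random numbers
serials = rng.sample(range(1, cumulative[-1] + 1), 6)
selection = [locate(s) for s in serials]
print(cumulative)
print(selection)
```

A village with more fields covers a longer stretch of serial numbers, so each field, not each village, has the same chance of selection.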
Example 1.4
be divisible into units which are distinct so that every unit area
of the population belongs to one and only one sampling unit is
thus not fulfilled, with the consequence that the central areas get
a relatively greater chance of selection than those near the
border.
1.7 Non-Random Methods of Sampling
Methods of sampling, which are not based on laws of chance
but in which units of the population to be included in the sample
are determined by the personal judgment of the enumerator, are
called purposive or non-random methods. An example of this
method, where personal judgment is introduced in the selection
of a sample, is provided by the old official method in India of
selecting fields for sample-harvesting for determining the average
yield of a crop. Under this method, the experimenter was required
to select fields which, in his judgment, had an average crop. It
was found that the experimenter tended to select fields which were
poorer than the average when the season was good and better
than the average when the season was bad. The result was a
tendency to over-estimate yields in bad years and to under-estimate
them in good years. The quota method of sampling, so extensively
used in the United States of America in opinion surveys,
is another example of this method. Here quotas are set up for
the different categories of the population to be included in the
sample and the selection of units from each category is left to the
personal discretion of the enumerator. The method is convenient
to use in practice. Its cost is also low relative to that of the
method of probability sampling. However, the sample does not
provide any means of judging the reliability of the estimates based
thereon. If we want to have unbiased estimates of the population
character whose accuracy can be measured from the samples
themselves, probability sampling alone should be used.
under (a) and (b), are usually grouped under the heading "non-sampling errors". Deming (1944, 1950) has listed the different
sources of errors and biases arising from (a). These, in his words,
are principally due to arbitrariness in definition and variable
performance of the man. An eye-estimate of the crop provides
an example of this source of errors. Eye-estimate is a form of
measurement which cannot, in the very nature of things, give
a unique result even when the same field is observed at different
times by the same enumerator. The result will depend upon the
personal judgment of the enumerator, no matter how well he is
trained and consequently there will be variation from enumerator
to enumerator observing the same field and in repeated observations by the same enumerator. A character like damage to a crop
in the field from rust will similarly involve a certain amount of
arbitrariness in definition and, therefore, give variable response.
Even with factual characters like the area under the crop in a field,
there is found to be marked variation in performance of the same
enumerator measuring the acreage at different times or of different
enumerators measuring the same field. In an inspection carried
out by the statistical staff to test the reliability of the area records
maintained by the patwaris (village officials) in 61 villages selected
at random in the Lucknow District (India), about 20% of the
reports were found to be in disagreement. Part of the discrepancies could, of course, be explained by carelessness or even
dishonesty but most of them were due to differing descriptions
of the same situation given by different agencies (Sukhatme and
Kishen, 1951).
An example of faulty canvassing of a selected sample is reported
by Kiser (1934) who selected a random sample of households for
studying morbidity. The relative frequency distribution of the size
of households included in the sample and as revealed by the Census
is given in Table 1.1, which shows that the sample is considerably
deficient in the frequency of households of size 2. Kiser attributed
the deficiency to the failure on the part of the enumerators to
re-visit missed households in which childless married women
working away from homes are likely to predominate.
A similar bias attributed to the poor execution by the field force
of the selected sample arose in a survey for estimating the yield
TABLE 1.1
Relative Frequency Distribution of the Size of the Households

Size of Household    Percentage Frequency    Percentage Frequency
                     in Sample               in Census
2                    19.4                    26.8
3                    25.9                    26.5
4                    2?.5                    21.9
5                    15.4                    13.0
6                     8.1                     5.9
7                     3.5                     3.2
8                     1.9                     1.4
9 and over            2.2                    (illegible)
lot to lot. The results show that there is a consistent overestimation in the method of sampling individual fibres. The results
of sampling by method (b), however, show that in 12 out of 24
lots, the sample estimate is larger than the corresponding population value, in 11 cases it is lower, and in the one remaining lot
the two are equal, thus showing absence of bias in the method
of sampling by bunches. It is clear that some conscious or
unconscious tendency to select longer fibres of wool is introduced
when method (a) of sampling is adopted. The procedure of
random sampling implies identification of each one of the fibres
in the population with the serial numbers 1 to N, and then
selecting a sample of fibres with the help of random numbers.
This, however, is an impracticable procedure to adopt in the
sampling of wool. A practicable procedure is the one that was
actually followed of spreading the lot in a thin layer across a
velvet and selecting fibres from random positions with the help
of a scale placed across it. The method, however, gives scope
to the observer to select one fibre out of the several possible
fibres in the neighbourhood of the random position. This scope,
and with it the bias, is reduced when sampling is done by bunches.
Example 1.6
TABLE 1.3
Average Yield of Paddy in Lb./Acre for Plots of Different Sizes
Paddy Survey in Krishna District (Madras)

Size of Plot         Area in    (heading    Average Yield   (heading    Percentage
                     Sq. ft.    illegible)  in Lb./Acre     illegible)  Over-Estimation
Whole field            ..          ..         1939.2          107.3         ..
50 x 20 (links)^2     435.60      108         1954.1          105.0         0.8
3' circle              28.27      216         2025.9          125.8         4.5
2' circle              12.57      216         2113.2          129.1         9.0
It is seen that while the yield estimate from the official plot size of
50 links X 20 links is in close agreement with that from harvesting the
whole field, those from small plots are considerable over-estimates.
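The percentage over-estimation in Table 1.3 can be recomputed from the average yields. The sketch below assumes the yields as read from the table with the decimal points restored (1939.2, 1954.1, 2025.9 and 2113.2 lb. per acre); the dictionary keys are illustrative labels.

```python
# Average yields in lb./acre, as read from Table 1.3 (decimal points restored,
# which is an assumption about the damaged figures).
whole_field_yield = 1939.2
plot_yields = {
    "50 x 20 links": 1954.1,
    "3 ft circle": 2025.9,
    "2 ft circle": 2113.2,
}

# Over-estimation of each plot size relative to harvesting the whole field.
over_estimation = {
    name: 100.0 * (y - whole_field_yield) / whole_field_yield
    for name, y in plot_yields.items()
}

for name, pct in over_estimation.items():
    print(f"{name}: {pct:.1f}% over-estimation")
```

The recomputed percentages agree with the last column of the table to one decimal place, which also supports the placement of the restored decimal points.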
The instructions for locating the starting point and for marking
of plot were as objective as they possibly could be. Nevertheless,
anyone who has had experience of measuring the length of a field
and walking from a given point in the field along the direction
of its length will agree that the starting point of the plot and the
direction along which it is to be laid in the field could at best
be determined only approximately. Even if the same observer
were to locate and mark the plot determined by a given pair of
random numbers at different times in the same field, the plots
may occupy different positions. The inclusion or exclusion of
particular plants on the border of the plot in demarcating it will
similarly depend upon the judgment of the experimenter. The
area actually cut may also vary from the one intended to be cut
due to unevenly sown crops and errors in measurement. If all
these deviations could be ascribed to a random element, one would
expect the errors to cancel out. The results given in Table 1.3,
however, indicate that this is not the case. They show that small
plots significantly over-estimate the yield, although the degree of
over-estimation becomes smaller with larger plots. It is obvious
that the overall influence of the various non-sampling errors
relative to the produce harvested becomes smaller with the increase
in plot size until, when the plot size is large enough, such as is
used in official crop-sampling work, the bias becomes negligible.
The above examples will show the need for exercising care in preparing the design of a survey so that, as far as possible, biases are
absent. Where it is not practicable to ensure absence of bias,
one should at least satisfy oneself that the bias present, if any, in the
sample estimate is so small as to be negligible in comparison with
its standard error.
REFERENCES
1. Tippett, L. H. C. (1927). Random Sampling Numbers, Tracts for Computers, XV, Cambridge University Press.
2. I.C.A.R., New Delhi (1951). Sample Surveys for the Estimation of Yield of Food Crops (1944-49), Bulletin No. 72.
3. Deming, W. E. (1944). "On Errors in Surveys," Amer. Sociol. Rev., 9, 359-69.
4. --- (1950).
5. Sukhatme, P. V. and Kishen, K. (1951).
6. Kiser, C. V. (1934).
7. Sukhatme, P. V. (1947).
APPENDIX: Table of Random Numbers (table not legible in this copy)
CHAPTER II
BASIC THEORY
A. SIMPLE RANDOM SAMPLING
denote
$y_i$,
$$\bar{y}_N = \frac{1}{N}\sum_{i=1}^{N} y_i \tag{1}$$
$$S^2 = \frac{\sum_{i=1}^{N} y_i^2 - N\bar{y}_N^2}{N-1} \tag{2}$$
$$V(y) = \frac{1}{N}\sum_{i=1}^{N} (y_i - \bar{y}_N)^2 \tag{3}$$
$$V(y) = \frac{N-1}{N}\,S^2 \tag{4}$$
$$\bar{y}_n = \frac{1}{n}\sum_{i=1}^{n} y_i \tag{5}$$
and $s^2$ the sample mean square given by
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y}_n)^2}{n-1} = \frac{\sum_{i=1}^{n} y_i^2 - n\bar{y}_n^2}{n-1} \tag{6}$$
(7)
00
=~
h =
We write
$$\text{Est. } \bar{y}_N = \bar{y}_n \tag{9}$$
and
$$\text{Est. } S^2 = s^2 \tag{10}$$
TABLE
Values
of Units
in the
Sample
2,3
25
05
-1'5
225
-25/6
625/36
2,4
3'0
2'0
-1'0
100
-16/6
256/36
2,7
45
12'5
0'5
025
47/6
2209/36
3,4
35
05
-05
025
-25/6
625/36
3, 7
50
80
1'0
}OO
20/6
400/36
4. 7
55
45
15
225
Total
24'0
280
7'0
4116/36
Mean
40
4,
Ii
1905
y"
Y,,-YN (}.-YN)I
S2-S
1/6
(SI-S2)2
1/36
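The entries of Table 2.1 can be reproduced by enumerating all six samples of 2 from the population 2, 3, 4, 7. The following is a minimal sketch with illustrative function names; exact fractions are used so that the results can be compared with $\bar{y}_N = 4$, $S^2 = 14/3$, and the totals of the table.

```python
from fractions import Fraction
from itertools import combinations

population = [2, 3, 4, 7]  # acres under corn on the four farms
N, n = len(population), 2

def mean(values):
    return Fraction(sum(values), len(values))

def mean_square(values):
    """Sum of squared deviations from the mean, divided by (count - 1)."""
    m = mean(values)
    return sum((Fraction(v) - m) ** 2 for v in values) / (len(values) - 1)

y_bar_N = mean(population)   # population mean
S2 = mean_square(population) # population mean square

samples = list(combinations(population, n))
means = [mean(s) for s in samples]
s2_values = [mean_square(s) for s in samples]

# Averages over all equally likely samples, i.e. expected values.
expected_mean = sum(means) / len(samples)
expected_s2 = sum(s2_values) / len(samples)
var_of_mean = sum((m - y_bar_N) ** 2 for m in means) / len(samples)
var_of_s2 = sum((v - S2) ** 2 for v in s2_values) / len(samples)

print(expected_mean, expected_s2, var_of_mean, float(var_of_s2))
```

The average of the sample means equals the population mean and the average of the sample mean squares equals $S^2$; the variance of the sample mean comes out as $7/6$, which is $\frac{N-n}{N}\cdot\frac{S^2}{n}$ for $N = 4$ and $n = 2$.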
Markoff's Theorem
2a.3*
i:
(Yi -
1=1
yN )2 is minimum,
We write
E(Y.)
~ E {:,
t y,}
(11)
where Yi stands for the value of the i-th unit of the population,
and the summation is taken over all the n units in the sample.
Numbering the units in the sample serially, as 1,2, ... , r, ... , n,
we may write (11) as
$$E(\bar{y}_n) = E\left\{\frac{1}{n}\sum_{r=1}^{n} y_r'\right\} \tag{12}$$
where Yr' now stands for the value of the unit included in the
sample at the r-th draw.
By a well-known theorem in probability, the expected value of
a sum is the sum of the expected values. We, therefore, write
$$E(\bar{y}_n) = \frac{1}{n}\sum_{r=1}^{n} E(y_r') \tag{13}$$
* This section may be skipped over at the first reading without losing continuity
of the text.
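Now, by definition,
$$E(y_r') = \sum_{i=1}^{N} P_{ir}\, y_i \tag{14}$$
where $P_{ir}$ denotes the probability of selecting the $i$-th unit at the $r$-th draw, which is $1/N$ under simple random sampling. Hence
$$E(y_r') = \frac{1}{N}\sum_{i=1}^{N} y_i = \bar{y}_N \qquad (r = 1, 2, \ldots, n) \tag{15}$$
and therefore
$$E(\bar{y}_n) = \bar{y}_N \tag{16}$$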
where
$$a_i = \begin{cases} 1 & \text{if } y_i \text{ is in the sample,} \\ 0 & \text{otherwise.} \end{cases}$$
Now, clearly,
$$E(a_i) = \frac{n}{N} \tag{19}$$
$$E(y_r'^2) = \frac{1}{N}\sum_{i=1}^{N} y_i^2 \tag{20}$$
Adding and subtracting $\bar{y}_N^2$ from the right-hand side and using
(2), we can write (20) in an alternative form as:
$$E(y_r'^2) = \bar{y}_N^2 + \left(1 - \frac{1}{N}\right) S^2 \tag{21}$$
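It follows that
$$E\left\{\frac{1}{n}\sum_{i=1}^{n} y_i^2\right\} = E\left\{\frac{1}{n}\sum_{r=1}^{n} y_r'^2\right\} = \bar{y}_N^2 + \left(1 - \frac{1}{N}\right) S^2 \tag{22}$$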
Again, if $y_r'$ and $y_s'$ are used to denote the values of the units
drawn at the $r$-th and $s$-th draws respectively, say $y_i$ and $y_j$, we
have, by definition,
(23)
$$P_{ir} = \frac{1}{N} \tag{24}$$
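Hence, substituting for $P_{ir}$ and $P_{js \cdot ir}$ from (24) and (25) in (23),
we get
$$E(y_r' y_s') = \frac{1}{N(N-1)} \sum_{i \neq j = 1}^{N} y_i y_j \tag{26}$$
It follows that
$$\frac{1}{n(n-1)}\, E\left\{\sum_{i \neq j}^{n} y_i y_j\right\} = \frac{1}{n(n-1)}\, E\left\{\sum_{r \neq s = 1}^{n} y_r' y_s'\right\} = \frac{1}{N(N-1)} \sum_{i \neq j = 1}^{N} y_i y_j \tag{27}$$
Alternatively,
$$\frac{1}{n(n-1)}\, E\left\{\sum_{i \neq j}^{n} y_i y_j\right\} = \frac{1}{n(n-1)} \left[\sum_{i \neq j = 1}^{N} E(a_i a_j)\, y_i y_j\right] \tag{28}$$
Now
$$E(a_i a_j) = E(a_i)\, E(a_j \mid a_i = 1) \tag{29}$$
and
$$E(a_j \mid a_i = 1) = \frac{n-1}{N-1} \tag{30}$$
so that
$$E(a_i a_j) = \frac{n(n-1)}{N(N-1)} \tag{31}$$
Hence
$$\frac{1}{n(n-1)}\, E\left\{\sum_{i \neq j}^{n} y_i y_j\right\} = \frac{1}{N(N-1)} \sum_{i \neq j = 1}^{N} y_i y_j \tag{32}$$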
$$E(y_r' y_s') = \frac{\left(\sum_{i=1}^{N} y_i\right)^2 - \sum_{i=1}^{N} y_i^2}{N(N-1)} \tag{33}$$
or
$$E(y_r' y_s') = \bar{y}_N^2 - \frac{S^2}{N} \tag{34}$$
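$$E(\bar{y}_n^2) = E\left\{\frac{1}{n}\sum_{i=1}^{n} y_i\right\}^2 = \frac{1}{n^2}\, E\left\{\sum_{i=1}^{n} y_i^2 + \sum_{i \neq j}^{n} y_i y_j\right\}$$
$$= \frac{1}{n^2}\left[n\bar{y}_N^2 + \left(n - \frac{n}{N}\right) S^2 + n(n-1)\,\bar{y}_N^2 - \frac{n(n-1)}{N}\, S^2\right]$$
$$= \bar{y}_N^2 + \left(\frac{1}{n} - \frac{1}{N}\right) S^2 \tag{35}$$
Hence
$$E(s^2) = E\left\{\frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}_n^2\right)\right\} = \frac{1}{n-1}\left\{E\sum_{i=1}^{n} y_i^2 - nE(\bar{y}_n^2)\right\}$$
$$= \frac{n}{n-1}\left[\bar{y}_N^2 + \left(1 - \frac{1}{N}\right) S^2 - \bar{y}_N^2 - \left(\frac{1}{n} - \frac{1}{N}\right) S^2\right] = S^2 \tag{36}$$
showing that $s^2$ is an unbiased estimate of $S^2$.
2a.6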
Then,
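$$V(\bar{y}_n) = E[\bar{y}_n - E(\bar{y}_n)]^2 = E(\bar{y}_n^2) - \{E(\bar{y}_n)\}^2 \tag{37}$$
$$= \left(\frac{1}{n} - \frac{1}{N}\right) S^2 \tag{38}$$
$$= \frac{N-n}{N}\cdot\frac{S^2}{n} \tag{39}$$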
The reader may verify that the value of the sampling variance
derived from this formula is the value actually obtained in
Table 2.1 for a sample of size 2.
The factor (N - n)/ N in (39) is a correction for the finite size
of the population and is called the finite population correction
factor or simply the finite multiplier. When n is small as compared
with N, the multiplier will approach unity and the sampling
variance of the mean will approximate to that for the mean of
a sample drawn from an infinite population.
Usually, the value of $S^2$ will not be known. Its estimate from
the sample will, therefore, have to be used in calculating the
sampling variance. Thus
$$\text{Est. } V(\bar{y}_n) = \frac{N-n}{N}\cdot\frac{s^2}{n} \tag{40}$$
and
$$\text{Est. S.E.}(\bar{y}_n) = \sqrt{\frac{N-n}{N}}\cdot\frac{s}{\sqrt{n}} \tag{41}$$
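As a rough illustration of (40) and (41), the estimate of the sampling variance and of the standard error can be computed directly from a sample. The function name `estimate_mean` is hypothetical; `statistics.variance` uses the divisor $n - 1$, which matches the sample mean square $s^2$.

```python
from math import sqrt
from statistics import mean, variance

def estimate_mean(sample, population_size):
    """Return the sample mean, its estimated sampling variance, and the
    estimated standard error under simple random sampling without replacement."""
    n = len(sample)
    y_bar = mean(sample)
    s2 = variance(sample)  # sample mean square, divisor n - 1
    finite_multiplier = (population_size - n) / population_size
    est_variance = finite_multiplier * s2 / n
    return y_bar, est_variance, sqrt(est_variance)

# The sample (2, 7) drawn from the population of 4 farms used in the text.
y_bar, est_var, est_se = estimate_mean([2, 7], population_size=4)
print(y_bar, est_var, est_se)

# When n is small compared with N the finite multiplier is close to unity.
print((10_000 - 20) / 10_000)
```

For the sample (2, 7) the sample mean square is 12.5, so the estimated variance is $\frac{2}{4}\cdot\frac{12.5}{2} = 3.125$.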
and
E (y - 1-'1)2 = 1-'2
We may write
(ji. - ILl)
= (ji"
- YN)
+ (jiN -
1-'1)
+ (YN -
ILl)2
+ 2 (y" -
{(y" -
+ E (YN - ILl)2
+ 2E (YN - ILl) E {(y" - h) IYJ'
... , YN}
or
E{V(ji,,)
1)'1. "YN}=(!
k)
IL2
It follows that
V {(Y.) 1Yl' ... , YN}
G- ~ ) f4z
of 3 respectively.
(iii) In general, (p/" Pan, .. Ph TTh) is a p-part partition of the
number w, where Pi is repeated 7I'i times (i = I, 2, ... , h),
A
P = ,E 7I'i
'=1
and w = ,E Pi7l'i
'-=1
EYi
will be
l-l
denoted by g (3).
3' g (J3).
I"pJ-l
,E
,,,p,,,,.,t=l
Yi.vi Yk by
will be denoted by
(42)
(3)
= J; Yi 3
i=l
s (13)
{s (I)}3
{:t Yi
In general,
s (Pl n, P2"' ... l'~nh)
}"h
= {.?:n y/' }", {n
.?: y/,' }n, . . . {n
.J; y/,~
1-1
1-1
1=1
-f
log (I - Yi a ) ==
Slet
+ Sa a2 + S8 a3 + ...
I a I < maxy!
-_1... giving us
g (PI'" Pa"' ... )
= J; g, (P,
(43)
and
(44)
= g (4)
(22)
(321) = g (6)
+ 2g (22)
+ g (51) + g (42)+ 2g (3 2 ) + g (321)
i.e.,
3
n(n - l ) ... (n - p + 1)
= -N(N-I)
--... (N-p+l)
(46)
Proof:
E {g (Pi", P2", ,Ph"I,)}
a = 1,
and
a
= 0 otherwise.
Hence
But
E (a)
=
_
1 Probability (a = 1)
n (n -
+ 1)
1) .. . (n - p
- N(N= 1) .. . (N -
+ 1)
ep
Hence
E {g (Pi"' p~"'" h"h)}
ep
.2
By definition,
$$V(s^2) = E\{(s^2)^2\} - \{E(s^2)\}^2 = E\{(s^2)^2\} - S^4 \tag{48}$$
Now
(49)
= {g (4) + 2g (22)}
~ {g
(4)
+ 2g (31) + 2g (22)
+ ii22
(n 2 - 2n
(n - 1)2
~--n2- e1G
+ 3) e2G (22)
4 (n - 1)
(4) - ---e2G (31)
nz-
n42 (n -
3) eaG (212)
+ 2!
e,G (1 ')
n
(50)
+ iJ~_;1)
n
eaS.
(51)
NP.4, S2
NP.2 and
(52)
It can be verified that the value 19.05 for the sampling variance
= 0 for
Nie J
<j
and
n (n - 1) .. . (n - j
+ 1)
Hence
(53)
where
(S2)
= n-.::- i s'
(55)
and
(56)
$$V(s^2) = \frac{\mu_4 - \mu_2^2}{n} + \frac{2}{n(n-1)}\,\mu_2^2$$
where $\mu_2$ and $\mu_4$ denote the second and the fourth moments of the
population. However, sometimes, we also need to know the
behaviour of $s$, the standard deviation. This is obtained as follows:
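Let
$$s^2 = S^2 + \epsilon$$
where
$$E(\epsilon) = 0$$
and
$$E(\epsilon^2) = V(s^2)$$
We may write
$$s = (S^2 + \epsilon)^{1/2} = S\left(1 + \frac{\epsilon}{S^2}\right)^{1/2} = S\left\{1 + \frac{1}{2}\,\frac{\epsilon}{S^2} + \frac{\frac{1}{2}\left(\frac{1}{2} - 1\right)}{2!}\left(\frac{\epsilon}{S^2}\right)^2 + \cdots\right\}$$
so that, to a first approximation,
$$E(s) = S\left\{1 - \frac{V(s^2)}{8S^4}\right\} \tag{57}$$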
$$V(s) = E\{s - E(s)\}^2 = E(s^2) - \{E(s)\}^2$$
$$= S^2 - S^2\left\{1 - \frac{V(s^2)}{8S^4}\right\}^2$$
$$= S^2\left\{1 - 1 + \frac{V(s^2)}{4S^4} - \cdots\right\}$$
$$\simeq \frac{V(s^2)}{4S^2} \tag{58}$$
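$$\frac{\bar{y}_n - \bar{y}_N}{\sqrt{\dfrac{N-n}{Nn}}\; S} \tag{59}$$
$$|\bar{y}_n - \bar{y}_N| \le t_{(\alpha,\infty)}\sqrt{\frac{N-n}{Nn}}\; S \tag{60}$$
$$\bar{y}_n - t_{(\alpha,\infty)}\sqrt{\frac{N-n}{Nn}}\; S \;\le\; \bar{y}_N \;\le\; \bar{y}_n + t_{(\alpha,\infty)}\sqrt{\frac{N-n}{Nn}}\; S \tag{61}$$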
TABLE 2.2

             Confidence Limits ($1 - \alpha = .68$)     Confidence Limits ($1 - \alpha = .95$)
Sample No.   Based on S        Based on s            Based on S        Based on s
   (1)          (2)               (3)                   (4)               (5)
    1         1.4, 3.6          1.8, 3.2              0.3, 4.7          -2.0,  7.0
    2         1.9, 4.1          1.7, 4.3              0.8, 5.2          -6.0, 12.0
    3         3.4, 5.6          1.2, 7.8              2.3, 6.7         -18.0, 27.0
    4         2.4, 4.6          2.8, 4.2              1.3, 5.7          -1.0,  8.0
    5         3.9, 6.1          2.4, 7.6              2.8, 7.2         -13.0, 23.0
    6         4.4, 6.6          3.5, 7.5              3.3, 7.7          -8.0, 19.0
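$$\frac{|\bar{y}_n - \bar{y}_N|}{\sqrt{\dfrac{N-n}{Nn}}\; s} \le t_{(\alpha,\, n-1)} \tag{62}$$
$$\bar{y}_n - t_{(\alpha,\, n-1)}\sqrt{\frac{N-n}{Nn}}\; s \;\le\; \bar{y}_N \;\le\; \bar{y}_n + t_{(\alpha,\, n-1)}\sqrt{\frac{N-n}{Nn}}\; s \tag{63}$$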
For the six samples in Section 1.5 and for the same confidence
coefficients as given above, these confidence limits based on $s^2$ are
given in cols. 3 and 5 of Table 2.2. The values of $t_{(.32,\,1)}$ and
$t_{(.05,\,1)}$ have been interpolated from the t-table, being 1.85 and 12.7
respectively (Fisher and Yates, 1938).
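The limits of Table 2.2 can be recomputed along the following lines. This is a sketch under stated assumptions: the t-values 1.85 and 12.7 are those quoted in the text, while the normal deviates 1.0 and 2.0 used for the limits based on $S$ are an assumption that appears to reproduce the tabulated figures after rounding. The helper `limits` is illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev

population = [2, 3, 4, 7]
N, n = len(population), 2

S = stdev(population)               # square root of S^2 (divisor N - 1)
factor = sqrt((N - n) / (N * n))    # sqrt((N - n) / (N n))

NORMAL_68, NORMAL_95 = 1.0, 2.0     # assumed deviates for limits based on S
T_68, T_95 = 1.85, 12.7             # t-values quoted in the text (1 d.f.)

def limits(centre, deviate, sd):
    """Lower and upper confidence limits: centre -/+ deviate * factor * sd."""
    half_width = deviate * factor * sd
    return centre - half_width, centre + half_width

table = []
for sample in combinations(population, n):
    y_bar, s = mean(sample), stdev(sample)
    table.append((
        sample,
        limits(y_bar, NORMAL_68, S),  # col. 2
        limits(y_bar, T_68, s),       # col. 3
        limits(y_bar, NORMAL_95, S),  # col. 4
        limits(y_bar, T_95, s),       # col. 5
    ))

for row in table:
    print(row[0], [tuple(round(x, 1) for x in pair) for pair in row[1:]])
```

The limits based on $s$ vary greatly in width from sample to sample, since with only one degree of freedom $s$ is a very unstable estimate of $S$.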
with which one wants to make sure that the estimate is within the
permissible margin of error. Thus, if the error permissible in the
estimate of the population value of the mean is, say, $\epsilon\,\bar{y}_N$ and
the degree of assurance desired is $1 - \alpha$, then clearly we need to
know the size of the sample so that
(64)
11=
(65)
had I(a.
upon n.
(66)
42
Evidently, $Np$ will be the number of sampling units in the population belonging to class 1, $Nq$ the number of sampling units in class 2,
and $Np + Nq = N$. Now, clearly, the probability $P(n_1)$ that in
a sample of $n$ selected out of $N$ by the method of simple random
sampling, $n_1$ will occur in class 1 and $n_2$ in class 2 will be given by
$$P(n_1) = \binom{Np}{n_1}\binom{Nq}{n_2}\bigg/\binom{N}{n} \tag{67}$$
(68)
or
(69)
CJ
pft' (1 - p)"'
(n - 1)! (N - n)!
(N -I)!
44
= np \ ' --
W
.,=1
(NI)
n -'I
Now
1 -
represents the probability that in a sample of n - 1, n1 fall in class 1 and n 2 will fall in class 2. Consequently
W
.,=1
(NPnl -- II____)(Nq)
n
(Nn --I)
1
2
will
= 1
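Hence
$$E(n_1) = np \tag{70}$$
$$E\left(\frac{n_1}{n}\right) = p \tag{71}$$
$$\text{Est. } p = \frac{n_1}{n} = p_n \tag{72}$$
Similarly,
$$\text{Est. } q = \frac{n_2}{n} = q_n \tag{73}$$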
By definition,
$$V(n_1) = E(n_1^2) - \{E(n_1)\}^2$$
Now
$$n_1^2 = n_1(n_1 - 1) + n_1 \tag{74}$$
so that
$$V(n_1) = E\{n_1(n_1 - 1)\} + E(n_1) - \{E(n_1)\}^2 \tag{75}$$
Also
$$E\{n_1(n_1 - 1)\} = \sum_{n_1} n_1(n_1 - 1)\,P(n_1) = \frac{n(n-1)\,Np\,(Np-1)}{N(N-1)} \sum_{n_1} \frac{(Np-2)!}{(n_1-2)!\,(Np-n_1)!} \cdot \frac{(Nq)!}{n_2!\,(Nq-n_2)!} \cdot \frac{(n-2)!\,(N-n)!}{(N-2)!} = \frac{n(n-1)\,Np\,(Np-1)}{N(N-1)} \tag{76}$$
since the sum of the terms under the summation sign is evidently
1. Substituting the result in (75), we have
$$V(n_1) = \frac{n(n-1)\,Np\,(Np-1)}{N(N-1)} + np - n^2p^2 = \frac{N-n}{N-1}\,npq \tag{77}$$
so that
$$V(p_n) = \frac{N-n}{N-1} \cdot \frac{pq}{n} \tag{78}$$
and the standard error of $p_n$ is
$$\sqrt{\frac{N-n}{N-1} \cdot \frac{pq}{n}} \tag{79}$$
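The results $E(n_1) = np$ and $V(n_1) = \frac{N-n}{N-1}\,npq$ can be checked by direct enumeration of the probabilities $P(n_1)$. The following is a minimal sketch in Python, not part of the original text; the function name and the population figures (N = 20, Np = 8, n = 5) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from math import comb


def class_count_moments(N, Np, n):
    """Exact mean and variance of n1, the number of class-1 units in a
    simple random sample of n drawn without replacement from N units,
    of which Np belong to class 1."""
    Nq = N - Np
    total = comb(N, n)
    mean = Fraction(0)
    second = Fraction(0)
    # n1 can only take values for which both classes can supply the units.
    for n1 in range(max(0, n - Nq), min(n, Np) + 1):
        prob = Fraction(comb(Np, n1) * comb(Nq, n - n1), total)
        mean += n1 * prob
        second += n1 * n1 * prob
    return mean, second - mean * mean


# Hypothetical population and sample sizes, for illustration only.
N, Np, n = 20, 8, 5
p = Fraction(Np, N)
q = 1 - p
mean, var = class_count_moments(N, Np, n)

print(mean == n * p)                              # compares with (70)
print(var == Fraction(N - n, N - 1) * n * p * q)  # compares with (77)
```

Exact rational arithmetic is used so that the comparison with the formulae is not blurred by rounding.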
46
y" =
S2
Sl
=
=
I:YI = ~1
;",1
N-l
..
1: Y 2
-*-__T
nji
(81)
= P..
Np _ Np2
.
N-I
n _nL2
1
n
= li..:__T
= N-I
n
= Ii _
p (I -p)
1 p .. (1 - p .. )
(82)
(83)
{n ~-1 p.. (1 -
p ..)}
N_ 1 p (1
- p)
(84)
n~ 1 l!;_I
p.. (1 _ P.. )
and not by
p. (1 - p,,)
(1
n (N - 1)
= (n-1) N P..
- P..)
(85)
41
BASte tHEORY
IN;;
n p.. ~l~t'll)
(87)
Size
The confidence limits for the proportion are derived on the same
assumptions as for the quantitative characters, namely, that the
sample proportion $p_n$ is normally distributed. This will approximately be so, unless $p$ is too small (or large) and $n$ is small. The
limits are given by
(88)
=p
'II
I(
)
a,oo
n- 1
(89)
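The size of sample required for estimating the population proportion with a specified precision is obtained from (88). If the
error permissible in the estimate is, say, $\epsilon p$ and the degree of
assurance desired is $1 - \alpha$, we then have
$$n = \frac{\dfrac{t^2_{(\alpha,\infty)}\,q}{\epsilon^2 p}}{1 + \dfrac{1}{N}\left(\dfrac{t^2_{(\alpha,\infty)}\,q}{\epsilon^2 p} - 1\right)} \tag{90}$$
For $N$ large,
$$n = \frac{t^2_{(\alpha,\infty)}\,q}{\epsilon^2 p} \tag{91}$$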
Example 2.1
Material for the construction of 5000 wells was issued during
the year 1944 in a certain district as part of the Grow-More-Food
Campaign in India. The list of cultivators to whom it was issued,
together with the proposed location of each well, is available.
A large part of the material was reported to have been misused
by diverting it to other purposes. It is proposed to assess the
extent of the misuse by means of a sample spot check. In other
words, it is proposed to estimate the proportion of wells actually
constructed and used for irrigation purposes. The sample is
proposed to be selected by the method of simple random sampling
from the total population of wells for which the material was
issued. The permissible margin of error in the estimated value
is 10% and the degree of assurance desired is 95%. Determine
the size of sample for values of $p$ ranging from .5 to .9.
We are given $N = 5000$, $\epsilon = .10$ and $t_{(\alpha,\infty)} = 1.960$.

p     .5     .6     .7     .8     .9
n     357    244    159    94     42
Since the worst critics do not place the misuse at more than
half of the material issued, a sample of 357 would appear to be
adequate for the proposed check.
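The sample sizes of Example 2.1 can be reproduced with a few lines of Python. This is a sketch, not the author's computation: it assumes the sample size is obtained by equating $t$ times the standard error of $p_n$ to the permissible error $\epsilon p$, which gives $n = n_0 / \{1 + (n_0 - 1)/N\}$ with $n_0 = t^2 q/(\epsilon^2 p)$. The function name is hypothetical.

```python
def sample_size(p, eps, t, N):
    """Sample size needed to estimate a proportion p with relative
    error eps at the normal deviate t, for a population of N units."""
    q = 1.0 - p
    n0 = t * t * q / (eps * eps * p)       # value for N large
    return n0 / (1.0 + (n0 - 1.0) / N)     # adjustment for finite N


sizes = [round(sample_size(p, 0.10, 1.960, 5000))
         for p in (0.5, 0.6, 0.7, 0.8, 0.9)]
print(sizes)  # [357, 244, 159, 94, 42]
```

The required size falls quickly as $p$ rises, which is why the example fixes the sample at the value for the least favourable proportion considered, $p = .5$.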
1) N, = N
;~l
Now, if a simple random sample of n is drawn out of this population then it can be seen by analogy with the distribution of two
classes that the probability that ni units occur in the i-th class
and n - ni in all the other k - 1 classes together is given by
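$$P(n_i) = \binom{N_i}{n_i}\binom{N - N_i}{n - n_i}\bigg/\binom{N}{n} \tag{92}$$
so that
$$E(n_i) = np_i \tag{93}$$
where
$$p_i = \frac{N_i}{N}$$
and
$$V(n_i) = \frac{N-n}{N-1}\,np_i(1 - p_i) \tag{94}$$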
SO
Thus
P'I = ~--
(96)
Now
Ni,Nj
"/::1
=;
n (11 - 1)
(N - 1) . N.Nj
=N
(97)
N N,N/
N-n
- N:::T npjPI
(99)
51
BASIC THEORY
Unirrigatcd
SubNot
Not
Manured Total Manured Manured
(I)
(2)
Population
Nu
N12
Nl
Sample
1111
1112
III
.-~---.--------.------~-
-----
(3)
N21
lin
----.--~--~
Sub
Total
Total
N22
N2
1122
112
II
(4)
~-------
(101)
"11=0
52
= _nn
(102)
+ N12)
+ n12
--
n - N12)
(Nn -- N
n - n12
(~)
ll
-----
(103)
I nI}
(104)
fJ_H
PI
(107)
53
BASIC THEORY
--
E [E
2
_
{nll (nIl -n 1) + nIl I J)J _ /1111
N
1
111
By
(108)
1) Nil (Nil -
- n; Nl (Nl _:_ If
1)
(110)
(Ill)
Now let
where
and
54
nl ~ n~1 {I - n~l
n:;1 ~ - ... }
Hence
E
(nil) ~
npi
nl --
npi
{I
N - nIl
N - I n PI
(I _
PI
)1
f
(112)
+ NN-n
- I
n"PI:' (I -- PI)
(113)
( 114)
BASIC THEORY
5S
Proportion of wells in the population actually constructed; for convenience we will designate these wells
as belonging to class 1; and
q -
56
Evidently,
Np
+ Nq=
~:I
.5-"',) ~
E {E
=- E
(~
11N YN.
lip
( 116)
(117)
51
BASIC THEORY
Np
n, \ ' 2
Np L..JYi
n, (11, - J)
Np(Np -I)
j::q
Hence
E [{ Yi
rJ
E [E {( Yi
~p (t
rIn,} ]
",)
+ Np (N; _
I)
E (n,)
(.
~,N.)
(II'})
and
E (11,. 11, -
I)
Np (Np - 1)
N (N- J)
11 (n - J)
t. Y,'}
(n - I) N2
+ N11 (N:_
I)
P YN.
2-
; Z=7
{(Np - I) S,2
+ NpYN,2}
n (n - I) N2 2- 2
-1)
P YN,
InN
(120)
S8
N (n - 1)
+ n[R-=-l)
N"
2-
.p YN, -
N(N - n)
n (N _:T) (Np - I) SI
+ N2pYN,2
N2
2-
:P YN,
{nfN-_1lI)
+ ~~;;~
N- d
since
11 -
III
= nYn
It follows, therefore, that
v {: . I1 Yn,}
I
V {NYn}
N2. N -n. S2
1/
{L:
N
_-- N (N - n)
n (N _ I)
1.,
2_ N-YN2}
.1',
N(N-I1){~ 2_(Np)2_ 2}
L./'
N YN,
n (N _ I)
1=1
S9
BASIC THEORY
Np~-
and
{N.
-} .__ N (N
- n) { p S
n
n1Y"1
~ 11
. 12
+ P (1
- 2)r
- P) YN
1
(122)
Nil.
N
{C +
1
I -
Pt
(123)
(ii) those from (124), i.e., after neglecting the finite multiplier.
It will be seen that a sample of 690 wells will be required for
estimating the area benefited with a degree of accuracy as large
60
2.3
'7
Cla.~s
327
253
350
267
.-._-----
(i)
(ii)
-~-.------
690
536
419
800
600
457
---.-.----~------
.. . -
-_--------
2b.1 Introduction
61
BAstC THEORY
Clearly,
(i
1,2, ... , N)
(125)
and
PI,
};
1(#1)=1
(126)
where
(127)
It is thus seen that $P_{i2}$ is not equal to $P_i$, unless $P_i = 1/N$. The
62
zn
E (z,)
= 1: P,z,
j
(130)
Zi
z.. , at
each
d~aw.
\' P
, = L..J'
E (z)
Y,_
NP,
'"'1
YN
(131)
It follows that
(132)
63
BASIC THEORY
E {Zn - E (Zn)J2
E (Z/) - {E (Zn)}2
zn,
we have
d{:tZJ-"
n2 {t Z.2 + t Z,ZI} =
Z..2
i~j
Now
E (z/)
p,Z.2
.E
(134)
i=l
and
E (Z,Zj) = E (z,) . E (Zj)
\=1
(135)
~2 {n
%. 2
1=1
0"2
(136)
64
where
u z2
= lJ Pi (Z;
(137)
ZJ2
;=1
and
N
u,2
.L:
(Yi - YN)2
("I
= 0'n11
N-I
S2
(138)
Lastly, we remark that when the selection probability is proportional to the value of the variate, in other words, when Pi is
proportional to Yi, say, Pi=yi/fL, z assumes a constant value for all
i, and in consequence (136) reduces to zero. In practice the values
of the variate will, of course, not be known in advance but values
of another variate correlated with the variate under study may be
known. We may, therefore, expect that when Pi is proportional
to the measure of size of Yi, the estimate may be considerably
more efficient than that based on simple random sampling.
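The remark can be illustrated numerically. The sketch below is not from the text: it assumes sampling with replacement, takes $z_i = y_i/(NP_i)$ as the value recorded at each draw, and computes the exact mean and variance of the average of $n$ such values as $\sum P_i z_i$ and $\sigma_z^2/n$ with $\sigma_z^2 = \sum P_i (z_i - \bar{z}_.)^2$. The four variate values and the function name are hypothetical.

```python
from fractions import Fraction


def pps_mean_and_variance(y, P, n):
    """Exact expectation and variance of the mean of n values
    z_i = y_i / (N P_i), each draw selecting unit i with probability P_i
    (sampling with replacement)."""
    N = len(y)
    z = [Fraction(yi) / (N * Pi) for yi, Pi in zip(y, P)]
    z_bar = sum(Pi * zi for Pi, zi in zip(P, z))               # E(z)
    sigma_z2 = sum(Pi * (zi - z_bar) ** 2 for Pi, zi in zip(P, z))
    return z_bar, sigma_z2 / n


y = [2, 5, 9, 4]                              # hypothetical variate values
total = sum(y)
equal = [Fraction(1, len(y))] * len(y)        # equal probabilities
prop = [Fraction(yi, total) for yi in y]      # P_i proportional to y_i

mean_eq, var_eq = pps_mean_and_variance(y, equal, n=2)
mean_pp, var_pp = pps_mean_and_variance(y, prop, n=2)
print(mean_eq, var_eq)
print(mean_pp, var_pp)
```

Both schemes give the population mean in expectation, but only the probabilities proportional to $y_i$ make every $z_i$ identical, so that the variance vanishes.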
Zi'S
= __!_
,
n-l
S 2
\ ' (z -
W '
Z )2
"
(139)
65
BASIC THEORY
(','l
~ n~ I {t z,' - n'.'}
{t (~!I
II-I
- nE
I',,'l}
(140)
Now, by definition,
so that
E (z,.2)
+ 2 .. 2
V (2,,)
.,
az
I~
11
1141 )
Iff
11-
II
L...
p
o"
i~;---",
---/1Z.-"}
1=1
( 142)
.1'/
s "-
tl
It follows
(143)
2b.4
66
and
= N
(s -
Yi
I
~;p)
(145)
Pj
where $y_i$ and $y_j$ are values of the units drawn at the first and
second draws respectively.
Clearly,
N
E (Zl') =
~Pi
w '
Yi
NP i1
1=1
(146)
= .YN
Also,
L:
N
E (Z2')
'-C"O
Yi
P J,
1=1
N(S -
~ip)
PI
(147)
= J'N
It follows that a simple arithmetic mean of $z_1'$ and $z_2'$ will provide
us with an unbiased estimate of the population mean.
Let
2
Z, -
(s + 1 -
y,
1 ~'p)
(148)
P,
BASIC THEORY
67
Clearly,
N
E (%(nc2)
= t 1.: E (a,)
(150)
Z,
+ {s
~~
P,
{s + 1 _
~ipJ
Pi
Pi } P j
1 - Pi
(151 )
E (%1.:2')
.L: {s +
1 -
~i Pi}
PiZ j
i==l
Zi
(152)
x (S
+I
1 ~i
pJ
Pi
(153)
E (%21.: 2,) - YN 2
Z(n:2)'
We write
68
Pj
+ PhP!,"
PI
-F,
+P
(155)
On substituting for E (ai) from (151) and for E (aiaj) from (155)
in (154), we have
~!
V(i, ,,)
[t {s +
+ [PiP I {I
~'p,l P,z.'
I -
~ p~
~7J
Z;ZI] -
YN
(156)
i#j::q
= !
{t ( -+
I -
P,
1 _ P,
)2 .P 2Z; 2
j
1:1
{t(s+
I -
~'p.) P.z;}
E(a;}
(s + 1 -
~jp) P,z,2
;"1
(159)
69
BASIC THEORY
Also,
L
,01-/=1
E(a,a j )
(S + I
Pi ) (
I -- Pi
S
+I
Pj )
I - Pi
x __ z'=J_' __
1
+ I
I -PI
I -p!
(160)
70
Let
= __
z
I
nYl___
(162)
NE (a j )
{I W{;
n
fly;}
N E (a.)
nL;
(163)
(164)
which for n
Zi
2 reduces to (56).
We
71
BASIC THEORY
Z2 _
n
t Est,
N2
{t
y{
+,~,M}
= Z.2 -_
N2
1'2
. I
E (a,)
N2
yv
i, j
E (a,al)
(166)
i~j
('i
"",
N)
11
1:
=0
(aj) {I .- (ai)}
j(oi)=1
to recast the expression for the variance $V(\bar{z}_n)$ as a linear function of the
squares of the differences of the $z$'s. Thus, (165) can be written as
v (Z-
) =
Sul~stituting
N2
(a()
EN 1_--._
)'. 2
(a,)
1=1
.,
.j
LJ
j~j=1
for $y_i$ in terms of $z_i$ from (162), and using the above result,
they obtain
I
N
V{i.) = 2
L' fECal) (al) - (ajajl} (Zi - Zj)2
21< ,""'1=1
It is easy to see that an unbiased estimate of $V(\bar{z}_n)$ is given by
_
Est. V (t.)
= 2n
i~i
2
ZI)
72
Pi,
+ (I
- Pi,) N _ 1
P;
+ (I
- P;) N
+ (I
~ 1 + {I
- Pi, - Pi)
Iv __: 2 + ...
- P, - (I - Pi) N
x N _ '2
N-n
N - 1 Pi
n-I
+ IV --1
I}
+,"
(167)
while
n-l
P; N . . :.:. I
n - 1
-
PI
n-l
IV _ 1
N
... - n (P j
N-I { N-2
+ (1
- P; - Pj)
(n -
I) (n - 2)
x (N -
if (N - 2)
n - 2}
+ P)I + ---N-2
(168)
REFERENCES
1. David, F. N. and Neyman, J. (1938)
2. Sukhatme, P. V. (1938)
3. --- (1944)
4. Fisher, R. A. and Yates, F. (1938)
5. Neyman, J. (1934)
6. Sukhatme, P. V. (1935)
7. Bartlett, M. S. (1937)
8. Narain, R. D. (1951)
9. Horvitz, D. G. and Thompson, D. J. (1952)
10. Midzuno, H. (1950)
APPENDIX
[Tables of $\pi_1!\,\pi_2! \ldots g_s(P, Q)$ for $w = 1$ to $8$; the tabular entries are not recoverable.]
CHAPTER III
STRATIFIED SAMPLING
A. SELECTION WITH EQUAL PROBABILITY
3a.1 Introduction
We have seen that the precision of a sample estimate of the
population mean depends upon two factors: (1) the size of the
sample, and (2) the variability or heterogeneity of the population.
Apart from the size of the sample, therefore, the only way of
increasing the precision of an estimate is to devise sampling
procedures which will effectively reduce the heterogeneity. One
such procedure is known as the procedure of stratified sampling.
It consists in dividing the population into k classes and drawing
random samples of known sizes, one each from the different
classes. The classes into which the population is divided are called
the strata and the process is termed the procedure of stratified
sampling as distinct from the procedure considered in the previous
chapters, called unrestricted or unstratified sampling. An example
of stratified sampling is furnished by the survey for estimating
the average yield of a crop per acre in which administrative areas
are taken as the strata and random samples of predetermined
numbers of fields are selected from each of the several strata. The
geographical proximity of fields within a stratum makes it more
homogeneous than the entire population and thus helps to increase
the precision of the estimate. In this chapter we shall consider
the theory applicable to the procedure of stratified sampling.
Stratified sampling is a common procedure in sample surveys.
The procedure ensures any desired representation in the sample
of all the strata in the population. In unstratified sampling, on
the other hand, adequate representation of all the strata cannot
always be ensured and indeed a sample may be so distributed among
the different strata that certain strata may be over-represented and
others under-represented. The procedure of stratified sampling
is thus intended to give a better cross-section of the population
than that of unstratified sampling. It follows that one would
.E N,
1=1
and
k
.E nl = n
1=1
YN
~L
N;YNi
'=1
(I)
Since
ni
85
STRATIFIED SAMPLING
(ji")
= YNi
and, therefore,
E
CV.,)
= E L~
Pi.i-".j}
pJ'N, = .I'N
(3)
E (V., - YN)2
i=l
(4)
i-l'"j=1
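$$E(\bar{y}_{n_i} - \bar{y}_{N_i})^2 = \left(\frac{1}{n_i} - \frac{1}{N_i}\right) S_i^2 \tag{5}$$
where $S_i^2$ is the mean square of the population in the i-th stratum
defined by
$$S_i^2 = \frac{1}{N_i - 1}\sum_{j=1}^{N_i} (y_{ij} - \bar{y}_{N_i})^2 \tag{6}$$
The value of the second term in (4) is clearly zero, since samples
are selected independently from each stratum. We therefore have
$$V(\bar{y}_w) = \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) p_i^2 S_i^2 \tag{7}$$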
S6
3a.3
87
STRATIFIED SAMPLING
(9)
C = en
V(y .. ) +",C
i=l
-t
+ terms
independent of n, (10)
= 1,2,
... , k)
(11)
88
Hence
P"SiCO
;/(::(--.1;
P,siV
,=1
(13)
C )i
PIS,
= n --1.:----
(14)
}J PIS,
STRATIFIED SAMPLING
89
(16)
+~
Vo
L:
i
p,Sj2
== 1
Hence
(17)
.
l-o
I \ , S')
N W Pi ,-
i~1
When
Ci
C,
(17) reduces to
k
,Ep,S,
IIi
,=
PIS.
1=1
VlI + ~
L:
P,SI2
( 18)
j=1
so that the minimum sample required for estimating the mean with
fixed variance Vo is given by
II
kL:
(19)
Vu
p,S,2
'=1
3a.4
We have seen that for $n_i$'s arbitrary, the variance of the mean
is given by
$$V(\bar{y}_w) = \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) p_i^2 S_i^2 \tag{20}$$
90
Substituting for
ni
k
k
V (Y.,)N
.L:
{ j=l
L'P,S,
npi:'-;-
_ l}p,",
N1
'~i
i=l
=:1
C )'
~ PiSI
I
N Lp,S,2
(22)
1=1
(24)
where the subscript P symbolises the variance under the proportional system of allocation.
('''12;k PIS. )2
n =
Vo
+~
LPIS ,2
I-I
(25)
91
STRATIFIED SAMPLING
--~'=~I--k~---
Vo
(26)
+ ~ .z= p,S,2
1=1
V (ji)
"- (
lOP
1 -- 1) \ ' p S
11
N
w"
(27)
~C
r)
- ~)S2
N,
2.: 1:
;=1
(Yo -
jiN)2
Nj
== ,1'}.;
.i=1
{Y'l - _VNi
+ YNi -
YN)2
i=l j=1
==
N,
J-=l
+ l) N, (YN; -
jiN)2
4-1
92
N-l~1
N
-
and
or
(30)
$$V(\bar{y}_n)_{us} - V(\bar{y}_w)_P \simeq \frac{N-n}{Nn} \sum_{i=1}^{k} p_i (\bar{y}_{N_i} - \bar{y}_N)^2 \tag{32}$$
The expression shows that the more the strata differ in their
means, the larger is the gain in precision due to proportional
sampling over unstratified simple random sampling.
where
k
S'D = 1: p,S,
'.1
STRATIFfED SAMPLING
93
obtain V(Y.)N
i==l
t.=1
(34)
i=l
i=l
$$V(\bar{y}_w)_P - V(\bar{y}_w)_N = \frac{1}{n} \sum_{i=1}^{k} p_i (S_i - \bar{S}_w)^2 \tag{35}$$
We see that the larger the differences between the strata standard
deviations, the larger is the gain in precision of optimum over
proportional allocation. Further, on substituting for $V(\bar{y}_w)_P$ from
(31), we obtain
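$$V(\bar{y}_w)_N = \frac{N-n}{Nn}\left\{S^2 - \sum_{i=1}^{k} p_i (\bar{y}_{N_i} - \bar{y}_N)^2 - \frac{N}{N-n}\sum_{i=1}^{k} p_i (S_i - \bar{S}_w)^2\right\} \tag{36}$$
Since the first term on the right-hand side of (36) represents the
variance of the mean of an unstratified sample of $n$, we may write
$$V(\bar{y}_n)_{us} - V(\bar{y}_w)_N \simeq \frac{N-n}{Nn}\left\{\sum_{i=1}^{k} p_i (\bar{y}_{N_i} - \bar{y}_N)^2 + \frac{N}{N-n}\sum_{i=1}^{k} p_i (S_i - \bar{S}_w)^2\right\} \tag{37}$$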
94
e- ~)
We therefore have
V (Yn)US - V ll'w)s
G- ~)
[
i=l
G. - ~)
p,28,2
95
STRATIFIED SAMPLING
G- ~) 1:
Pi (PHI - YH)2
(39)
or
n., =n
S2
P.,
(41)
1: p,S,2
1=1
giving us (32). As the allocation departs from (40) or (41), the first
term may not only become negative but be larger in magnitude
than the second, thus making a stratified sample less efficient than
an unstratified sample. The result is important and suggests the
need for care in the allocation of the sample among the strata.
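The point that a badly allocated stratified sample can be less efficient than an unstratified one is easily seen on a small numerical case. The sketch below is illustrative only: the two strata, their means and mean squares, and the function names are hypothetical, and the variance of the stratified mean is taken as the sum of $(1/n_i - 1/N_i)\,p_i^2 S_i^2$ over the strata.

```python
def var_stratified(N_i, S2_i, n_i):
    """Variance of the weighted mean: sum of (1/n_i - 1/N_i) p_i^2 S_i^2."""
    N = sum(N_i)
    return sum((1.0 / m - 1.0 / Nh) * (Nh / N) ** 2 * S2
               for Nh, S2, m in zip(N_i, S2_i, n_i))


def var_unstratified(N, n, S2):
    """Variance of the mean of a simple random sample of n out of N."""
    return (1.0 / n - 1.0 / N) * S2


# Hypothetical population of two strata.
N_i = [500, 500]
ybar_i = [10.0, 20.0]
S2_i = [1.0, 100.0]
N, n = sum(N_i), 100

# Overall mean square built from the within and between sums of squares.
grand = sum(Nh * y for Nh, y in zip(N_i, ybar_i)) / N
S2 = (sum((Nh - 1) * v for Nh, v in zip(N_i, S2_i))
      + sum(Nh * (y - grand) ** 2 for Nh, y in zip(N_i, ybar_i))) / (N - 1)

v_us = var_unstratified(N, n, S2)
v_prop = var_stratified(N_i, S2_i, [50, 50])   # proportional allocation
v_bad = var_stratified(N_i, S2_i, [98, 2])     # far from proportional
print(v_prop, v_us, v_bad)
```

With these figures the proportional allocation improves on the unstratified sample, while the allocation that starves the more variable stratum is many times worse than either.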
3a.6* Practical Difficulties in Adopting the Neyman Method
of Allocation
There are certain limitations to the use of the Neyman allocation in practice which will now be pointed out. If more than one
character is to be estimated from a sample survey, then the
allocation of the sample into different strata on the basis of any
one character, using the Neyman method, may lead to loss in precision on other characters as compared to the method of proportional allocation. If, however, the characters are correlated, or
if certain characters are more important than others, then gains
in precision on the estimates of the more important characters
can still be secured by using the Neyman method of allocation.
However, the more severe limitation on the use of the Neyman
allocation is the absence of the knowledge of the $S_i$'s. One method
of overcoming this limitation is to estimate the $S_i$'s from a preliminary
sample of $n'$ (Sukhatme, 1935). These estimates will, however,
= SI
(i
= 1,2, ... , k)
(42)
The allocation of the total sample among the different strata will
now be made in accordance with the formula*
(43)
t{t
p,2S,2
j=1
1=1
PJlJS?
I""J=l
~:} - ~ t
p,S/
'E1
(44)
(.Y~)p =
(! - ~) L p,S,2
k
(45)
.=1
* Where, as in this case, the decision regarding the size of the additional sample to be drawn from each stratum depends upon the results of the first sample,
the procedure is essentially what is called sequential sampling.
{t. p,'S.'
t. p,S,'
+ ,t,',p,s,s,} - :
(46)
where
1
+ 2n'
Substituting for e from
I}
= 1
E { V (y.,)
I n')J
5j
(47)
I
=
~ [I, S ,
)2
k
I
,2
N ~ [liS.
C- ~) I:
Pi S j
i=L
!L
p, (Sf --
~tr
i=l
P/S/}
i=l
(4~)
p.(S, - S')'}
2!n,{(t
i=1
p, (YN, - YN)'
piSI
)2 -
'(=1
P/S/} (&9)
98
The first part on the right-hand side denotes the variance of the
mean under Neyman allocation when the Si are known. Consequently, when the Si are estimated from a preliminary sample
of size n', this variance is seen to increase, on an average, by
(50)
L:
k
Pi (Si -
8.. )2
(51)
or
(52)
= S,
+ E;
where
and
where
Then Si/S; can be expressed as
S,
sJ
= ~j (1 + ~j)
=
(1
SJ
S.
~; ( 1 .~
i:) (1 -
~j)-l
SJ
~: + s;~
- ... )
99
STRATIFIED SAMPLING
VV (sj)/Sj.
(S,)
'"" 8/8, (1 + C2)
s/
(53)
E{v(r.)
- ~ I>,S,2
k
i=l
I:
k
V(Yw)p -
,~
P,(Si -8.,)2
'~1
(55)
100
From Sections 2a.10 and 2a.11 of the last chapter, we know that,
for samples of $n'$, $C^2$ is approximately given by $(\beta_2 - 1)/4n'$, so
that the size of the preliminary sample should be such that
}; Pi (S,
(56)
- S,Y
1=1
101
STRATIFIED SAMPLING
where Si2 is the mean square in the sample drawn from the i-th
stratum. If the total sample had been selected by the procedure
of simple random sampling without stratification, then the variance
of the sample mean would be
(58)
<)
Sl", s~
, ... ) Sk 2
1=1
i=l
2.: Pi (J'NI -
(60)
YN)2
1=1
where
E (Ei)
:..0:
and
0,
= YN/ +
E,2
+ 2YNiE;
(62)
YN/
+ N,N,n,
---::!!. s ~
'
(63 )
102
Hence
PJNj2
'=1
.t N~~jn_,
p,S,!
(64)
i=l
It follows that
E,(.
(t P~N<') ~ t. P~..'
( 65)
(=1
4-=1
4=1
or
k
Y.,
YN
+ 1: P,f'i
(66)
i=l
YN 2
+'-=11: p,2f' 2 +
j
1:
p,Pjf',f'j
i~j~l
+ 2YN 1: PiEj
(67)
-1=1
Hence
(69)
Lr,
,-)
k
Est.
(YN,-YN)2
1:1',
.-1
I},
k
(Y,,;_}'.,)2 __
(I-p,)
!V.~~,nl s,
'~-l
(70)
103
STRATIFIED SAMPLING
~N~I
t.
(N, - I) .,'
+ N N_
{t
\>(
Est. S2 = N N
~J ~
1)
Pi -
Si
+ N N_
p,
li'., - Y.)'
If'
>_;)2
I ~ Pi (J"i
) '"
-P,I .,.}
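On simplifying, we obtain
$$\text{Est. } S^2 = \sum_{i=1}^{k} p_i s_i^2 + \frac{N}{N-1}\left\{\sum_{i=1}^{k} p_i (\bar{y}_{n_i} - \bar{y}_w)^2 - \sum_{i=1}^{k} p_i (1 - p_i)\frac{s_i^2}{n_i}\right\} \tag{71}$$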
t=l
N-n
(N - J) n
-t.
- 2
Pi (Yni - y",)
;:1
P. (I - P.I
~:1
(72)
104
i=t
i=l
N - n
I) n
+ (N -
-t
P, (I
{~(_
'::!
~ p,)
_.
Pi Yn, - Y.;)-
f}
(73)
x{t,
PdP., - Y.)'
t.
p, (I
~ p,) :':}
(74)
in
2:
k
ll
PiS/
(75)
1=1
and the first two terms in (74) vanish. The net reduction in
variance due to stratification is, therefore, given by the last two
terms in (74). On substituting ni/n for Pi, this takes the value
Est. VU' ..)us
STRATIFIED SAMPLING
105
~-k
L:: L::
i=l
(Yil - }'n,)2
= .1',/, say
(77)
since
--=
T) 8
2
j
i=!
=8.,2
Let
k
}; 11;
U'n. -
.f'w)2
n/k.
i:':1
where i1
(78)
= (k - I) iis h 2
(79)
The quantities $s_w^2$ and $\bar{n}s_b^2$ are called the mean squares within and
between strata respectively, and are best calculated from what
is familiarly known as the analysis of variance table given below:

Source of Variation   D.F.     Sum of Squares                                        Mean Square
Between Strata        k - 1    $\sum_i n_i (\bar{y}_{n_i} - \bar{y}_w)^2$            $\bar{n}s_b^2$
Within Strata         n - k    $\sum_i \sum_j (y_{ij} - \bar{y}_{n_i})^2$            $s_w^2$
Total                 n - 1    $\sum_i \sum_j (y_{ij} - \bar{y}_n)^2$
106
s,/
n
(80)
An estimate of the reduction in variance is now given by subtracting (80) from (79) or directly from (76), and equals
- )
ESt. J? (Yn
us
ES,t V (-)
- k - 2 1 {-,
2
YID P mb
n
21
SID J
(81)
The ratio of (81) to (80) gives the relative gain in precision due to
stratification and equals
(82)
Sill
1 =
(n-k)sw 2 +(k-l)iis b 2
(n - 1) sw 2
(83)
The gain in precision estimated this way is $n/(n-1)$ times the value
in (82), which is not likely to be of material difference in large
samples, provided the sample is allocated in proportion to the sizes
of the different strata. When the sample is not so allocated, neither
(82) nor (83) is likely to be satisfactory. The exact expression
given by the ratio of (73) to (57) should be used in that case.
3a.8 Use of Strata Sizes for Improving the Precision of an
Unstratified Sample
Stratified sampling presupposes the knowledge of the strata
sizes as well as the availability of the lists of sampling units for
the different strata. The latter are not, however, always available.
Thus, the classification of a population by age is known from the
census tables although the lists of persons belonging to different
age groups may not be available for the selection of samples from
the different age groups. Consequently, it is not possible to know
in advance to which stratum a sampling unit belongs until it is
contacted in the course of the survey itself. While the sample in such
cases has necessarily to be selected by the method of unstratified
random sampling, we can always classify the selected sample by the
strata and treat it as if it were a stratified sample. In this section we
shall examine the gain in precision arising from such a treatment.
If the sample is to be treated as if it were a stratified sample,
then $\bar{y}_w$ would be the appropriate estimate of the population mean.
This is easily seen to be an unbiased estimate of the population
mean, since
$$E(\bar{y}_w) = E\{E(\bar{y}_w \mid n_i)\} = E(\bar{y}_N)$$
Hence
$$E(\bar{y}_w) = \bar{y}_N \tag{84}$$
npt
+ 1_~
~j + 0
n PI
14)
(86)
E{V(YID)us}~
1: {nk + !ii;j~j +
(-J,) - -X}PJPi
2S j
1=1
~~ {CI - Z- Dtp,s,+~ts,.}
+ 0 (~,)
(87)
108
$$E\{V(\bar{y}_w)_{us}\} \simeq \frac{N-n}{Nn}\sum_{i=1}^{k} p_i S_i^2 + \frac{1}{n^2}\sum_{i=1}^{k} (1 - p_i) S_i^2 \tag{88}$$
It is seen that the first term in (88) is the variance of the mean
The smaller the strata the more alike will presumably be the
sampling units comprising them and the smaller, therefore, will
be the values of $S_i^2$. We may, therefore, expect that under
proportional allocation the precision of the estimate will generally
increase as the number of strata increases.
For small departures from proportionality, the effect of increasing
the number of strata is best studied with the help of (88). The
first term in this equation, it will be noticed, is identical with
(89) and will presumably decrease as k increases. On the other
hand, the contribution of the second term to the variance of $\bar{y}_w$
will increase as $k$ increases. For $N$ large and $S_i^2$ equal to, say,
$S_w^2$, (88) may be written as
$$E\{V(\bar{y}_w)_{us}\} \simeq \frac{1}{n}\left\{S_w^2 + \frac{k-1}{n}\,S_w^2\right\} \tag{90}$$
3a.10
Let $p_i$ denote the true but unknown weight of the i-th stratum
and $p_i'$ the inaccurate weight which is known. The sample
estimate of the population mean is then given by
$$\text{Est. } \bar{y}_N = \sum_{i=1}^{k} p_i' \bar{y}_{n_i} \tag{91}$$
.E (P/
1=1
- PI) YNi
(93)
110
) '=1
+ lOP!=1
]; p/p/ (y,,;
II
~1'}
,
Pk
(94)
For fixed $p_i'$'s the second term is clearly zero, and we are left with
(95)
The mean square error will be the sum of (95) and the square of
the bias term in (93). We have
{t
'"'1
(96)
As $n$ increases, (97) will decrease and so will the first term in (96).
The bias term is, however, independent of the sample size. It
follows that (96) may assume a larger value than (97) beyond
a certain n, making stratified sampling less accurate than simple
random sampling.
An example will help to illustrate the point. Suppose that according to an agricultural census taken in an earlier year, 80% of the
holdings were below 5 acres. Information for the current year
is not available but we will assume that the percentage of holdings below 5 acres has increased to 85. Suppose, further, that
we have selected a stratified sample of $n$ holdings allocated in
proportion to the known sizes of the two strata. Then, clearly,
the sample estimate of the population mean of the character
under study will be calculated from
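$$\bar{y}_w = .80\,\bar{y}_{n_1} + .20\,\bar{y}_{n_2} \tag{98}$$
The population mean is
$$\bar{y}_N = .85\,\bar{y}_{N_1} + .15\,\bar{y}_{N_2} \tag{99}$$
Now
$$E(\bar{y}_w) = .80\,\bar{y}_{N_1} + .20\,\bar{y}_{N_2} \tag{100}$$
so that the bias is
$$E(\bar{y}_w) - \bar{y}_N = -.05\,\bar{y}_{N_1} + .05\,\bar{y}_{N_2} \tag{101}$$
Hence
$$\text{M.S.E.}(\bar{y}_w) = \frac{1}{n}\{.80\,S_1^2 + .20\,S_2^2\} + \{-.05\,(\bar{y}_{N_1} - \bar{y}_{N_2})\}^2 \tag{102}$$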
are as
t 12
Stratum
Then,
M.S.E. (ji.,)
= I
+ .0025
(104)
and
-)
_1'1275
V(Yn
us n
(105)
The table below gives the values of (104) and (105) for five different values of n.
n
M.S.E. (y",)
V(Yn)US
25
0425
04510
50
'0225
02255
100
0125
01128
200
0075
00564
400
0050
'00282
It will be seen that for small n, the actual mean square error
of the stratified sample is smaller than that of the unstratified
simple random sample, but the superiority is lost after n = 51.
With a larger size of sample, the bias assumes still larger
proportions. It must, however, be pointed out that the bias will
not be known in practice and consequently the variance of the
mean of a stratified sample will continue to be estimated by the
first term in (96), thereby under-estimating the variance.
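The comparison in the table can be reproduced directly from the two expressions $1/n + .0025$ and $1.1275/n$ quoted in the text. The Python sketch below is only an illustration; the function names are hypothetical, and the constants 1, .05 and 1.1275 are read off from (104) and (105).

```python
def mse_stratified(n, within=1.0, bias=0.05):
    """Mean square error of the stratified mean with inaccurate weights:
    the variance term within / n plus the squared bias."""
    return within / n + bias ** 2


def var_unstratified(n, S2=1.1275):
    """Variance of the unstratified mean, S2 / n."""
    return S2 / n


rows = [(n, mse_stratified(n), var_unstratified(n))
        for n in (25, 50, 100, 200, 400)]
for n, mse, var in rows:
    print(f"{n:4d}  {mse:.4f}  {var:.5f}")

# Sample size at which the two expressions are equal:
# 1/n + .0025 = 1.1275/n, i.e. n = .1275 / .0025.
crossover = (1.1275 - 1.0) / 0.05 ** 2
print(crossover)
```

The variance term shrinks with $n$ while the squared bias stays fixed, so the stratified estimate with wrong weights loses its advantage once $n$ passes the crossover value of about 51.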
sample, Qi the number of units in the i-th stratum and IIi the size
of the sub-sample chosen out of Qi (i = 1,2, ... , k). We have
Est. Pi =
=
~i
P.'
and
1=1
k
The estimate
( 106)
i=1
+ E( ~ (P:
- p,) YN'),
(107)
L:
&
E (P/2) N
'=1
N-;'jn
8j
L:
I-=:1
k
+ ~
j:j;l
(lell)
114
= p.2
Wij
have
(109)
E ( '_ )2 = N - Q Pi (I - Pi)
Pi
Pi
N - 1
Q
(110)
and
(1 J J)
v(r:
k
(=1
'-)
PIYni
'=1
\ ' N - Q 1 (1
) -, 2
-f- W
N - 1 Q Pi
- Pi hi
( 112)
1=1
(113)
115
STRATIFIED SAMPLING
(t
P(Y")
+Z
+
p,
(Y .. - YN)'}
So'
(114)
where
II;
Ub
1: p, (jiN,
- YN)2
(I 17)
i-I
since the first term in (115) will be small relative to the second.
116
(118)
(119)
where the letters 'ds' stand for double sampling and, as before,
_
.E PiSi'
STRATIFIED SAMPLING
117
+ c~
n,)
( 121)
~c~
82
Co
(123)
(124)
118
Q ...
'i'.)
p/2S j
~j
.:....J
1=1
2.,,_ UbQ2
(125)
1=1
Co = c1Q
+ C2 .E n
(126)
p. S,2
'n
i
III
+ uQ +
b
IL
c1Q
~
+ C2 ~
nj
(127)
PIS;
2ub VILe;
(128)
(129)
and
(130)
119
STRATIFIED SAMPLING
i=l
giving us
1
Co
YP-
( v' C1 O"b
+ \/C2
(131)
i!' PiS.)
Hence
Co
p,Sj
n =
VC2
VC;
( v'C10"b
(132)
~-v'C2 i~ p,S,)
and
O"b
Co
( v'C-;O"b
ni
(133)
+ v'C2 i~ PiS.)
10
(134)
L~ad
to higher
CU
i.e., if
;=1
i.e., if
(135)
120
where (S
-i
PiSiY
jb
V
Ub
so that
S2 = 5
Example 3.1
Table 3.1 presents the summary of data for complete census
of all the 340 villages in Ghaziabad Subdivision. The villages
were stratified by size of their agricultural area into four strata
as shown in col. 2 of Table 3.1. The numbers of villages in the
different strata are given in col. 3. The population values of the
strata means for the area under wheat ($\bar{y}_{N_i}$) and those of the
standard deviations for the area under wheat ($S_{w_i}$) and for the
agricultural area ($S_{a_i}$) are given in the subsequent columns.
Calculate the sampling variance of the estimated area under
wheat for a sample of 34 villages:
(1) if the villages are selected by the method of simple random
sampling without stratification;
(2) if the villages are selected by the method of simple random
sampling within each stratum, and allocated in proportion
to (i) the sizes of the strata ($N_i$), (ii) the products $N_i S_{w_i}$,
and (iii) the products $N_i S_{a_i}$.
TABLE 3.1

Stratum   Size of Village   $N_i$   $\bar{y}_{N_i}$   $S_{w_i}$   $S_{a_i}$
Number    in Bighas*
(1)       (2)               (3)     (4)               (5)         (6)
1         0-500             63      112.1             56.3        129.6
2         501-1500          199     276.7             116.4       267.0
3         1501-2500         53      558.1             186.0       276.1
4         >2500             25      960.1             361.3       982.2

* 1 Bigha = ... acre.
I __ . [\' NS
N - I LJ
.2_
,to,
i=l
\'S
LJ
(:1.
2+ \ '
to!
LJ
N.V N .2
v,
k )2]
(,EN~Ni
1=1__
iCl
70850
+ 55577000 -
39372000J
TABLE 3.2
[Computational table for Example 3.1; col. 9 gives the products $N_i S_{w_i}$. The entries are not recoverable.]
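Hence
$$V(\bar{y}_n)_{us} = \frac{N-n}{Nn}\,S^2 = \frac{306}{340 \times 34}\,S^2 = 1875$$
2. (i) Proportional Allocation
We have
$$V(\bar{y}_w)_P = \frac{306 \times 7994000}{(340)^2 \times 34} = 622$$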
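2. (ii) Neyman Allocation
The allocation of the sample to the different strata will be in
proportion to $N_i S_{w_i}$ shown in col. 9 of Table 3.2. On substituting
in (22), we get
$$V(\bar{y}_w)_N = \frac{1}{N^2}\left[\frac{1}{n}\left(\sum_{i=1}^{k} N_i S_{w_i}\right)^2 - \sum_{i=1}^{k} N_i S_{w_i}^2\right] = \frac{1}{(340)^2}\left[\frac{(45600)^2}{34} - 7994000\right] = \frac{1}{115600}\,[61158000 - 7994000] = 460$$
When the sample is allocated in proportion to $N_i S_{a_i}$, we have
$$n_i = \frac{n N_i S_{a_i}}{\sum_{i=1}^{k} N_i S_{a_i}}$$
and the variance is
$$\frac{1}{(340)^2}\left\{\frac{1}{34}\,(100480)(21600) - 7994000\right\} = \frac{55840000}{115600} = 483$$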
VB
-1-
VA
VA
VB
TABLE 3.3

Method of Sampling                    Variance     R.E. compared     R.E. compared
                                      (Bighas)²    to Unstratified   to Proportional
                                                   Sampling          Sampling
Unstratified                          1875         -                 -
Stratified:
  (i) Proportional                    622          301%              -
  (ii) Neyman                         460          408%              135%
  (iii) Proportional to N_i S_ai      483          388%              129%
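The four variances of Example 3.1 can be recomputed from the population values of Table 3.1. The Python sketch below is not the author's working: it reads the table with the decimal points restored (an assumption, checked against the totals 7994000, 45600 and 100480 quoted in the text), keeps fractional stratum sample sizes as the text's arithmetic does, and uses hypothetical function names.

```python
N_i = [63, 199, 53, 25]                  # villages per stratum
ybar = [112.1, 276.7, 558.1, 960.1]      # stratum means, area under wheat
S_w = [56.3, 116.4, 186.0, 361.3]        # s.d. of area under wheat
S_a = [129.6, 267.0, 276.1, 982.2]       # s.d. of agricultural area
n = 34
N = sum(N_i)


def var_stratified(n_i):
    """Sum over strata of (1/n_i - 1/N_i) p_i^2 S_wi^2."""
    return sum((1.0 / m - 1.0 / Nh) * (Nh / N) ** 2 * S ** 2
               for m, Nh, S in zip(n_i, N_i, S_w))


def allocate(weights):
    """Split n among the strata in proportion to the given weights."""
    total = sum(weights)
    return [n * w / total for w in weights]


# Overall mean square from the within and between sums of squares.
grand = sum(Nh * y for Nh, y in zip(N_i, ybar)) / N
S2 = (sum((Nh - 1) * S ** 2 for Nh, S in zip(N_i, S_w))
      + sum(Nh * (y - grand) ** 2 for Nh, y in zip(N_i, ybar))) / (N - 1)

v_us = (1.0 / n - 1.0 / N) * S2
v_prop = var_stratified(allocate(N_i))
v_neyman = var_stratified(allocate([Nh * S for Nh, S in zip(N_i, S_w)]))
v_area = var_stratified(allocate([Nh * S for Nh, S in zip(N_i, S_a)]))
print(round(v_us), round(v_prop), round(v_neyman), round(v_area))
```

To the nearest unit the results agree with the values 1875, 622, 460 and 483 obtained in the text, and the ratios between them give the relative efficiencies of Table 3.3.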
Example 3.2
A yield survey on paddy was carried out in Kegalle District
(Ceylon) in Maha 1951-52 (Koshal, 1953). Twenty-eight villages
were selected, distributed in the various strata approximately in
proportion to the acreage under paddy. Three plots of 1/80 acre
were harvested in each village. The values of the means and the
mean squares of the village means for the different strata are given
in Table 3.4. Obtain the estimate of the district mean yield by
combining the strata means in proportion to the number of
villages in the strata. Calculate its variance and hence estimate
the efficiency of stratification as compared to unstratified simple
random sampling, treating the village means as the true means
of the respective villages.
TABLE 3.4

Stratum    $N_i$    $n_i$    $\bar{y}_{n_i}$    $s_i^2$
                             (Oz./plot)         (Oz./plot)^2
1          189      5        369                4330.9
2          242      7        301                14812.4
3          146      3        368                17309.0
4          178      3        171                1658.5
5          287      10       305                3452.7
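From col. 5 of Table 3.5, the estimate of the district mean yield is
\[ \text{Est. } \bar{y}_N = \bar{y}_w = \sum_{i=1}^{k} p_i \bar{y}_{n_i} = 301.6 \text{ oz./plot} \]
and from cols. 11 and 12,
\[ \text{Est. } V(\bar{y}_w) = \sum_{i=1}^{k} \frac{p_i^2 s_i^2}{n_i} - \sum_{i=1}^{k} \frac{p_i^2 s_i^2}{N_i} = 298.23 - 7.57 = 290.66 \]
For an unstratified simple random sample of the same size, we obtain
\[ \text{Est. } V_{us} = \Big(\frac{1}{28} - \frac{1}{1042}\Big)\,\text{Est. } S^2 = (0.034755)(10900) = 379 \]
Hence
\[ \text{Efficiency of stratification} = \frac{379}{291} = 1.30 \text{ or } 130\% \]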
TABLE 3.5
Maha 1951-52
Calculation of the District Mean Yield and its Variance

Stratum   $N_i$   $n_i$   $\bar{y}_{n_i}$   $p_i$      $p_i\bar{y}_{n_i}$   $p_i\bar{y}_{n_i}^2$   $s_i^2$
Number    (1)     (2)     (3)               (4)        (5) = (3)x(4)        (6) = (3)x(5)          (7)
1         189     5       369               0.181382   66.9                 24700                  4330.9
2         242     7       301               0.232246   69.9                 21000                  14812.4
3         146     3       368               0.140115   51.6                 19000                  17309.0
4         178     3       171               0.170825   29.2                 5000                   1658.5
5         287     10      305               0.275432   84.0                 25600                  3452.7
Total     1042    28      ..                ..         301.6                95300                  ..

Stratum   $p_i s_i^2$   $p_i^2 s_i^2$   $p_i s_i^2/n_i$    $p_i^2 s_i^2/n_i$   $p_i^2 s_i^2/N_i$
Number    (8)           (9)             (10) = (8)÷(2)     (11) = (9)÷(2)      (12) = (9)÷(1)
1         785.55        142.48          157.1              28.50               0.754
2         3440.12       798.95          491.4              114.14              3.301
3         2425.25       339.81          808.4              113.27              2.327
4         283.31        48.40           94.4               16.13               0.272
5         950.98        261.93          95.1               26.19               0.913
Total     7885.21       ..              1646.4             298.23              7.567
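The arithmetic of Example 3.2 can be checked directly from the stratum figures. The sketch below is only illustrative: the function names are not the author's, and the sample sizes for strata 1 to 4 (5, 7, 3, 3) are an assumption recovered from the ratios of the columns of Table 3.5.

```python
# Stratum data from Table 3.4 (yields in oz./plot).
# n_i for strata 1-4 is assumed to be 5, 7, 3, 3 (recovered from Table 3.5).
N_i = [189, 242, 146, 178, 287]
n_i = [5, 7, 3, 3, 10]
ybar_i = [369, 301, 368, 171, 305]
s2_i = [4330.9, 14812.4, 17309.0, 1658.5, 3452.7]


def stratified_mean(N_i, ybar_i):
    """Weighted mean of stratum means with weights p_i = N_i / N."""
    N = sum(N_i)
    return sum(Ni * yb for Ni, yb in zip(N_i, ybar_i)) / N


def stratified_variance(N_i, n_i, s2_i):
    """Estimated variance: sum of p_i^2 * s_i^2 * (1/n_i - 1/N_i)."""
    N = sum(N_i)
    return sum((Ni / N) ** 2 * s2 * (1 / ni - 1 / Ni)
               for Ni, ni, s2 in zip(N_i, n_i, s2_i))


mean_w = stratified_mean(N_i, ybar_i)
var_w = stratified_variance(N_i, n_i, s2_i)

print(f"district mean yield: {mean_w:.1f} oz./plot")
print(f"estimated variance:  {var_w:.2f}")
```

The two printed values correspond to the total of col. 5 and to the difference between the totals of cols. 11 and 12.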
1, 2, ... , k)
128
Zn,
"I
1:
II,
Z;J
"I
1:
__ o.
YII
N~ P~I
(136)
where
aj 2
NI
1} PI!
(Zjj -
i l . )2
(138)
11
ii'
(139)
1"'1
and
Z..
1} P, i ,.
=.)iN
(140)
L:
N,i"j
1=1
(141)
For,
1} PIE (Zn,)
''''1
(142)
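\[ E[\bar{z}_w - E(\bar{z}_w)]^2 = E\Big(\sum_{i=1}^{k} p_i\bar{z}_{n_i} - \sum_{i=1}^{k} p_i\bar{z}_{i.}\Big)^2 = E\Big\{\sum_{i=1}^{k} p_i(\bar{z}_{n_i} - \bar{z}_{i.})\Big\}^2 \]
\[ = \sum_{i=1}^{k} p_i^2 E(\bar{z}_{n_i} - \bar{z}_{i.})^2 + \sum_{i \neq i'=1}^{k} p_i p_{i'} E(\bar{z}_{n_i} - \bar{z}_{i.}) \times E(\bar{z}_{n_{i'}} - \bar{z}_{i'.}) \]
since samples are drawn independently from the i-th and the i'-th strata. Hence,
\[ V(\bar{z}_w) = \sum_{i=1}^{k} \frac{p_i^2 \sigma_{iz}^2}{n_i} \tag{143} \]
in virtue of (137).
Using Section 2b.3, an estimate of $V(\bar{z}_w)$ will be provided by
\[ \sum_{i=1}^{k} \frac{p_i^2 s_{iz}^2}{n_i} \tag{144} \]
where
\[ s_{iz}^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} (z_{ij} - \bar{z}_{n_i})^2 \]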
3b.2
The variance of the estimate, apart from the population constants $p_i$ and $\sigma_{iz}$, is seen to depend upon the allocation of the sample among the different strata. The cost of a survey will likewise depend upon the values of $n_i$. The principle of determining the optimum values of $n_i$, as stated in Section 3a.3, is to maximize the precision for given cost or minimize the cost for given precision. We shall illustrate the principle for the simple case for which the cost of the survey is represented by
\[ C = \sum_{i=1}^{k} c_i n_i \tag{145} \]
Consider
\[ \phi = V(\bar{z}_w) + \mu C \]
where $\mu$ is a constant.
Clearly $V(\bar{z}_w)$ is minimum for fixed cost, say $C_0$, or C is minimum for fixed variance, say $V_0$, when $\phi$ is a minimum. Now $\phi$ can be written as
=
i=l
'=1
(146)
(147)
= --
P,u
.. Co
k -
---
(149)
n, = n
.E
n,
p,a;zyc,
'~l
(151 )
Substi-
( 152)
=--"
11
[{t Pla.. }
i=1
.t
PI (C- -
a",zFJ
( 154)
i=l
where
\[ \bar{\sigma}_z = \sum_{i=1}^{k} p_i \sigma_{iz} \tag{155} \]
It follows that the efficiency of optimum over proportional allocation is due wholly to the variation among the strata standard deviations. If the $\sigma_{iz}$ are all equal, the two systems of allocation become equally efficient.
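The principle of maximizing precision for a given cost can be illustrated numerically. The sketch below assumes the standard solution for a linear cost function, namely $n_i$ proportional to $p_i\sigma_{iz}/\sqrt{c_i}$ with the constant fixed by the total cost $C_0$; the function name and the trial figures are hypothetical and are not taken from the text.

```python
from math import sqrt


def optimum_allocation(p, sigma, cost, budget):
    """Real-valued n_i minimising sum(p_i^2 sigma_i^2 / n_i) subject to
    sum(c_i n_i) = budget; n_i is proportional to p_i sigma_i / sqrt(c_i)."""
    weights = [pi * si / sqrt(ci) for pi, si, ci in zip(p, sigma, cost)]
    # sum(c_i n_i) = scale * sum(p_i sigma_i sqrt(c_i)) must equal the budget
    scale = budget / sum(pi * si * sqrt(ci) for pi, si, ci in zip(p, sigma, cost))
    return [scale * w for w in weights]


def variance(p, sigma, n):
    """V = sum p_i^2 sigma_i^2 / n_i (sampling with replacement in each stratum)."""
    return sum((pi * si) ** 2 / ni for pi, si, ni in zip(p, sigma, n))


# Hypothetical two-stratum population.
p = [0.5, 0.5]
sigma = [1.0, 3.0]

equal_cost = optimum_allocation(p, sigma, cost=[1.0, 1.0], budget=100.0)
unequal_cost = optimum_allocation(p, sigma, cost=[4.0, 1.0], budget=100.0)

print(equal_cost)      # allocation when both strata cost the same per unit
print(unequal_cost)    # fewer units go to the costlier stratum
print(variance(p, sigma, equal_cost), variance(p, sigma, [50.0, 50.0]))
```

With equal costs the rule reduces to allocation in proportion to $p_i\sigma_{iz}$, and the last line compares its variance with that of an equal split of the same total sample.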
zn
"
Lz,
L
- -
y,
NP I
(156)
(157)
11
where
N
1:
a. 2 =
P,
(Zl -
( 158)
Z.. )2
1;1
and
N
Z..
= 1: P,ZI =YN
(159)
1'''1
:1
(j
(160)
1:P,
Nj
ZID
=E
p,!",
(161)
'''1
V(!.,)
LP/ ~~~
t&l
(162)
where
CTiz
N,
= }}
P il
/=1
2 .
(ZiJ -
(163)
)2
P, =P,j)} P,
=PjIP.
(164)
where
(165)
Also
Z,
P.
= Pi.
(166)
ZII
L
N
CT.
P, (Z, - Z..
)2
1=1
'=1
J=l
\ ' ,e(
Ll
-=1
Pi. "
+L
\' lPi' (!!i
P
'-I
i.
ZI _
Z
..
)2
(167)
134
(168)
(169)
or
(170)
1:
k
VUS
VP
==
I
11
p I. (P.
p. Z- ..
-- Z- ..
,.
)2
(171)
{~P.
LJ"
i ....
Piai.
Pi.
nI
L
k
P, Z
i. (
Pi.
i. -
_
z.')
2}
(172)
.=1
The efficiency of a stratified sample will decrease as the allocation departs from the Neyman principle, and a point may be reached where the first term in (168) will not only be negative but larger in magnitude than the second term, thus making an unstratified sample more efficient than a stratified sample.
For the special case when $P_{i.} = p_i$, (168) takes the form
1:
k
{VUS -
VS}(Pi.=Pi) =
i=J
+~
L pdt. - z.Y
(173)
i=l
IVu, -
VN)w,.""
~!f
p,(u" - '.')'}
( 174)
3b.5
t p.2
&.=2
(n~i. - I:.)
+ 1 Est. {Lk
n
iel
P/Zi. _!
p i.
..
21J(175)
whence
(176)
136
Uiz
E (2,/) - 2, ,2
l1i
whence
Est. Z,,2 = Zoo 2
Pi 2
_!;/
i=l
Est.
{
k
~
p~:,
Z,,2}
,'<,
L
k
Est. {Vus - Vs }
i=l
+1
It.
p,2 zn /
Pi,
Ie
-'J
Zoo
L Pi~;;Z:_ (~i. -
1)
'=1
(179)
Est. {V us - VShPi. = po
! I:
1=1
(180)
REFERENCES
1. Neyman, J. (1934). "On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection," Jour. Roy. Statist. Soc., 97, 558-606.
2. Tschuprow, A. A. (1923). "On the Mathematical Expectation of the Moments of Frequency Distributions in the Case of Correlated Observations," Metron, 2, No. 4.
3. Sukhatme, P. V. (1935). "Contribution to the Theory of the Representative Method," Jour. Roy. Statist. Soc. Suppl., 2, 253-68.
4. Evans, W. D. (1951). "On Stratification and Optimum Allocations," Jour. Amer. Statist. Assoc., 46, 95-104.
5. Neyman, J. (1938). "Contribution to the Theory of Sampling Human Populations," Jour. Amer. Statist. Assoc., 33, 101-16.
6. Koshal, R. S. (1953). "Report to the Government of Ceylon on Sample Survey of Rice," E.T.A.P., Food and Agriculture Organisation of the United Nations, Rome.
CHAPTER IV
4a.1 Introduction
In developing the theory of simple random sampling in the
preceding chapters, we have considered only estimates based on
simple arithmetic means of the observed values in the sample.
In this and the next chapter, we shall consider other methods of
estimation which make use of the ancillary information and which,
under certain conditions, give more reliable estimates of the
population values than those based on the simple averages. Two
of these methods are of particular importance. They are: (i) the
ratio method of estimation, and (ii) the regression method of
estimation. In this chapter we shall consider the former.
4a.2 Notation and Definition of the Ratio Estimate
Let
Yt
139
rN =
rj
1=1
lation,
the simple arithmetic mean of the
ratios for the units in the sample,
y
-X
'lnd
R.
Y.
.i.
Ey.
n
EX.
140
so that
(3)
where
N-n
N
S/
n
(4)
Similarly, let
so that
(5)
where
(6)
(7)
We shall now suppose that the x's are positive and n is sufficiently large, so that
Expanding* then $(1 + \bar{\epsilon}'_n/\bar{x}_N)^{-1}$ as a series in $\bar{\epsilon}'_n$, we have
ill
XN
in '3
'2
E. ..
En
' XN
in '4
XN3+ XN 4
YNX N
in in '3
+ ... J
YN XN3
(8)
N-n
so that
II)'. = NYN - (N - n) YN~'
Similarly
Hence
Rn = RN
(I -
N;; n
tf,~)
(I _ N-; n
~~n
x~-n) <
xN
Expanding
(I _l'!_;; n
x;;;t
by Taylor's theorem and working out the expectation term by term, he reached the same expression for the expected value of $R_n$ as given in this and the next section.
Now
E( :)
~ ~.
=
E {(
t) (t:)}
J2 E ft.,' +
t fijfi/}
ioFl
[(t,) (t . . )
- tfiifit'}
'=1
N--n
1
Nn N-J
N -n J
--_
pS S
nil.
N
(lO)
xN )
(11)
RN
{!
-:-_n (~1I2
Nn
XN
- P~II~~)}
YNXN
(12)
where
143
and
2 -
pC~.
C)
(13)
C2
(l-p)
n
i.e.,
(14)
144
(15)
It follows that
E(RJdE(~ln
= E c{3x. 1
1. x.. r
={3
=RN
(16)
Appendix to this chapter. Using then (6), (10) and the formulae in the Appendix, and writing in terms of the moment notation given by
i=l
+ (N -
n) (N - 2n)
(N - I) (N - 2)
1 (
112
+ (N -
fLI2 _ _
fLIOfluI2
fLoa)
fluI 3
+ +
n) (N2 - 6Nn
N
6n 2)
(N - I) (N - 2) (N - 3)
n3
X
(/L04 _
fLOI 4
fLI3 )
fLlOfLOl 3
N(N-n)(N-n-l) 3(n-l)
I) (N - 2) (N =--3)
n3
+ (N -
We then have
E2 (Rn) = RN
10
[I
+ ~ (fL022 fLOI
fLU __ )
fLICfLoi
fLo22
n 2 fLoi
RN
RN
[I
+~
(C.2 - pC"C.)
(I
C. 2 ) ]
(18)
146
8
1
Bl
(1
C. 2 )
(19)
Equation (19) shows that the contribution of the third and fourth degree terms to the relative bias of a ratio estimate is $3C_x^2/n$ times the value of the latter to a first approximation. Unless n is small, the contribution can, therefore, be considered to be negligible. G. R. Ayachit (1953) has assessed the value of contributions to the bias from successive approximations by means of experimental sampling on a wide range of populations commonly met with in surveys, and found that the contribution of higher order terms is negligible. For appreciably large n, say 30 or larger, even the leading term is found to be of no consequence.
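The size of these contributions is easy to gauge numerically. The following sketch assumes the first approximation to the relative bias in the form $\frac{N-n}{Nn}(C_x^2 - \rho C_y C_x)$ and applies the factor $3C_x^2/n$ stated above for the next terms; the function names and the trial values are hypothetical.

```python
def relative_bias_first(n, c_x, c_y, rho, N=None):
    """First approximation to the relative bias of the ratio estimate.
    N=None is taken to mean a population large enough to ignore (N - n)/N."""
    fpc = 1.0 if N is None else (N - n) / N
    return fpc * (c_x ** 2 - rho * c_y * c_x) / n


def relative_bias_second(n, c_x, c_y, rho, N=None):
    """Adds the third and fourth degree terms, taken as 3*C_x^2/n times
    the first approximation (large-population form)."""
    first = relative_bias_first(n, c_x, c_y, rho, N)
    return first * (1 + 3 * c_x ** 2 / n)


# Hypothetical coefficients of variation and correlation.
b1 = relative_bias_first(n=25, c_x=0.5, c_y=0.5, rho=0.8)
b2 = relative_bias_second(n=25, c_x=0.5, c_y=0.5, rho=0.8)

print(f"first approximation:  {b1:.5f}")
print(f"second approximation: {b2:.5f}")
print(f"relative change:      {100 * (b2 / b1 - 1):.1f}%")
```

For these trial values the higher terms change the bias by about 3 per cent of an already small quantity, which is in line with the remark that they can be neglected unless n is small.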
4a.5 Variance of the Ratio Estimate
By definition,
(20)
RN2E(~"
YN
~;)8
XN
(22)
Let $V_1$ denote the variance of a ratio estimate to a first approximation. Taking the expectations term by term, we obtain
\[ V_1(R_n) = R_N^2 \cdot \frac{N-n}{Nn}\,(C_y^2 + C_x^2 - 2\rho C_y C_x) \tag{23} \]
or
(24)
When
2 (relative bias)
(26)
llyn.
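To obtain the variance of the estimate of the total, namely, $Y_R$, we multiply (23) by $N^2\bar{x}_N^2$, so that
\[ V_1(Y_R) = \frac{N(N-n)}{n}\{S_y^2 + R_N^2 S_x^2 - 2R_N\rho S_y S_x\} \tag{27} \]
\[ = N^2\{V(\bar{y}_n) + R_N^2 V(\bar{x}_n) - 2R_N\,\mathrm{Cov}(\bar{y}_n, \bar{x}_n)\} \tag{28} \]
since
\[ C_y^2 + C_x^2 - 2\rho C_y C_x = \frac{S_y^2}{\bar{y}_N^2} + \frac{S_x^2}{\bar{x}_N^2} - \frac{2\rho S_y S_x}{\bar{y}_N\bar{x}_N} \]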
(29)
and
NXi)2}
(30)
i~l
(31)
and
(33)
149
Hence
or simply
=
N; V (y I i)
VI(R.)
N-n
I
I
I"
N - I ' n' XN 2 N LJ Ni V(Yill x ;)
(34)
i=l
and
k
V I (YR ) =
N (N -- n) I " N V(
I )
N __ I
. n LJ i
Yo , x,
(35)
i==1
the value of x.
in which
(a) V (y
I x)
(b) V (y
I x) oc
constant say y,
(~
or V
GIx)
say
X2
say yx2, or V
)IX,
Ix) =, ;2
or V
(36a)
(36 b)
GIx) =
(36 c)
and
(c) V
Cv I x) oc
150
VI (Y R )
(a)
N - 1 . nXN 2
N2 (N - n)
N -I
N-n
y__
(b)
N - 1
nXN
. --. -
N-n
(37 a)
N2 (N - n) y_
.----.
XN
N -1
n
(37 b)
N (N - n)
N -1
(37 c)
4a.6
Just as $s_y^2$, $\bar{y}_n$, $s_x^2$ and $\bar{x}_n$ provide unbiased estimates of the corresponding population values, similarly $s_{yx}$ defined by
\[ s_{yx} = \frac{\sum_{i=1}^{n} (y_i - \bar{y}_n)(x_i - \bar{x}_n)}{n - 1} \]
VI (:n)
N
= NN- n (~~:
n
Yn
+ ~~_: _ 2S~.)
Xn
(38)
.f"x,.
Est.
VI (Rn)
= l'!_~__n . _1 ___
1_ f' (y, RN
Nn
y.2 n - 1 W
R"x,)t
(39)
151
and
(41)
The reader will note that these are biased estimates but the bias will be negligible if the coefficients of variation of y and x are small.
One special case of a ratio estimate, for which the estimated variance takes a particularly simple form, may be mentioned. It is the case of a weighted mean in which the weights are in the nature of ancillary information, varying from one sampling unit to another, with $y_i$ of the form $w_i\eta_i$ and $x_i = w_i$, so that
\[ R_n = \bar{\eta}_w = \frac{\sum_{i=1}^{n} w_i\eta_i}{\sum_{i=1}^{n} w_i} \tag{42} \]
The sample estimate of the variance for this case is given by
\[ \text{Est. } V_1(\bar{\eta}_w) = \frac{N-n}{Nn}\cdot\frac{1}{(n-1)\,\bar{w}_n^2}\sum_{i=1}^{n} w_i^2(\eta_i - \bar{\eta}_w)^2 \tag{43} \]
and
(44)
152
On squaring both sides, expanding the right-hand side and retaining terms up to and including the fourth power in $\bar{\epsilon}_n$ and $\bar{\epsilon}'_n$, and taking expectations, we obtain
E(Rn - RN)2
RN
E(n 2 +-
YN 2
.'2 _
XN
~n~:)
J'NX N
(47)
1-'02 _ 21-'11)
1-'01 2
/LIOI.LOI
2 (N - n) (N - 2n) I
(N - I) (N - 2) n 2
(N --- I) (N -- 2) (N - 3)
X (
1-'22
__ 21-'13
.,
2
3
1-'10 -1-'01
I-'lOfl'Ol
/'04)
4
/Lei
3 (n - I) N (N __ nL<N~ n ---:!)
n 3 (N - I) (N - 2) (N - 3)
X (1-'2flIl 02 + 21-'u
2
2
1'10 1-'-01
6}'U/L02a+
J.LIOJLol
~1-'o2.2)
(48)
JLol ..
1-'11 = (1
+ 2p2)
=0
3S.', I-'al
S,2S~2
153
so that
+ 32
{(I
J._
2 2) S.\lS~2 _ 6pS.S.3
L' 2X" 2
)-, "
.7N
N
N"N 3
,p
+ 3S.'}
.X- N4
V(Rn)
RN
3 C. 2
n
{V
(:;) +- ~ (C.
- pC.)2}
(49)
~~ RNy
--.C
VI (::)
+ ~ C/
{VI (::)
+ ~ (C. -
pC.)Z1
(SO)
Now
\[ E(R_n - R_N)^2 = E[R_n - E(R_n)]^2 + [E(R_n) - R_N]^2 = V(R_n) + (\text{bias})^2 \tag{51} \]
Hence, deducting from (50) the square of the relative bias term given by (13), we obtain
V2
(~:) =
VI
(~:) +
C.2Vl (;:)
+ ?-;:~
(C. - pC.)2
(52)
154
C4
R:R) = 2C2
---fj-- + n2
(l - p)
{6 (1 - p)
+ 5 (I
- p)2}
(53)
I x) = f1x.
(55)
=)lX
(56)
155
YR
=};
A,nJ'.,
(57)
'''I
}; A,n,f3x, = }; N,f3x,
or
k
}; (n)'i - N;)
Xi =
(59)
};n,zA.z
i-I
== x,) = )IX,
V(y", I Xi)
(60)
156
i.e.,
where Si2 is the mean square for the i-th class. Hence
(61 )
where
J.L
Clearly, $V(Y_R \mid n_1, n_2, \ldots, n_k)$ is minimum when each of the square terms on the right-hand side of (64) is zero, or in other words, when
(i = 1, 2, ..., k)
(65)
2;;
NXN
L (N(N;-!l :.r'
k
jml
whence
(i = I, 2. """. k)
(66)
(67)
and
(68)
and
(70)
showing that the ratio estimate under the conditions stated in the beginning of this section gives the best unbiased linear estimate, provided the $N_i$'s are large.
When, however, the $N_i$'s are not large, the estimate $Y_R$ will be only approximately given by (69) and the effect on the variance of $Y_R$ will be to multiply (70) by the usual finite multiplier $(N - n)/N$. For, estimating $N_i$ from the sample, we have
and hence
N,-l
Nj
=--,,; ::::
(71)
j{=1i
and
(73)
We notice that the variance depends upon the set of x's which happen to turn up in the sample. In repeated samples, to a first approximation, the average value of (73) is given by
\[ V(Y_R) = \frac{N(N-n)}{n}\,\gamma\bar{x}_N \tag{74} \]
We have just seen that when (i) the relationship between y and x is a straight line passing through the origin, and (ii) the variance of y about this line is proportional to x, the ratio estimate $R_n$ is the best unbiased linear estimate of the population ratio for a given set of x's, with the sampling variance given by
1_ (C 2 Vn.
r + C 2)~
2pC C
II
R~
Ca,oo)
}__ (C 2
Vn
+
C ~)l:s:::: I ~;::
"" --,
2pC C
_Rn
RN
1 tca, 00)
0n
(76)
(C.2 - 2pC.C,
+ C.B)'
Let
(77)
(S/ - 2RN
pS.s. + R
2S. I )
(78)
162
We have worked out in Table 4.1 the ratios of $\bar{y}_{N_i}$ to $\bar{x}_{N_i}$, and of $V(y \mid i)$ to $\bar{x}_{N_i}$, for the different classes. The ratio of $\bar{y}_{N_i}$ to $\bar{x}_{N_i}$ will be seen to be fairly constant, showing that the relationship between y and x is approximately linear. $V(y \mid i)$ also appears to vary as x up to 1000 acres but not beyond it. It has, however, to be observed that the coefficient of correlation for the last class, namely, with villages having area larger than 1000 acres, is rather large and the calculated value of $V(y \mid i)$ cannot possibly give for this class a correct idea of the variance of y about the line $y = R_N x$. On the other hand, any further division of this class to study the behaviour of the variance of y with x is also not feasible owing to the fact that the number of observations is small. As about 35% of the livestock population is accounted for by villages with agricultural area larger than 1000 acres, it appears advisable to study separately areas less than 1000 acres and those with larger acreage. The ratio estimate may be used to provide an efficient estimate of the livestock population for the first group comprising all the villages in the first six classes.
Example 4.2
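\[ V(N\bar{y}_n) = N^2\cdot\frac{N-n}{N-1}\cdot\frac{V(y)}{n} = (319)^2\cdot\frac{(319-64)}{318}\cdot\frac{8292}{64} = 10572000 \]
\[ V_1(Y_R) = \frac{N^2(N-n)}{(N-1)\,n}\Big\{V(y) - 2R_N\rho\sqrt{V(x)\,V(y)} + R_N^2 V(x)\Big\} \]
Now
\[ R_N = \frac{113.4}{367.5} = 0.3086 \]
and, therefore,
\[ R_N^2 = 0.09523 \]
Also
\[ V(x) = 39528 \]
\[ V(y) = 8292 \]
and
\[ \rho\sqrt{V(x)\,V(y)} = 12378 \]
Hence
\[ V_1(Y_R) = \frac{319 \times 255}{64} \times \frac{319}{318}\,[8292 - 2 \times 0.3086 \times 12378 + 0.09523 \times 39528] \]
\[ = 1271.0 \times \frac{319}{318}\,[8292 - 2 \times 3820 + 3764] \]
\[ = 1271.0 \times 4430 \]
\[ = 5631000 \]
Using (35), we have
\[ V_1(Y_R) = \frac{N(N-n)}{n} \times \frac{1}{N-1}\sum_{i=1}^{k} N_i V(y \mid i) = \frac{319 \times 255}{64} \times 4602 = 5849000 \]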
This value is only slightly larger than the one calculated above, as one would expect owing to the small values of $\rho$ within the classes and the close linear relationship between y and x.
It will be seen that the variance of the simple arithmetic mean estimate of the total livestock population exceeds the variance of the ratio estimate by 88%, showing thereby that the latter is 88% more efficient than the former. This large gain in efficiency of the ratio estimate is to be expected in view of the high correlation between the agricultural area in a village and the number of livestock in it.
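The comparison just made can be retraced from the population figures quoted in the calculation. The sketch below is illustrative only: the names are not the author's, and V(x), V(y) are treated as population variances with divisor N, which is why the factor N/(N - 1) appears.

```python
# Population figures quoted in the worked calculation for the 319 villages.
N, n = 319, 64
V_Y = 8292          # variance of livestock numbers per village
V_X = 39528         # variance of agricultural area per village
COV_XY = 12378      # rho * sqrt(V(x) V(y))
R_N = 113.4 / 367.5 # population ratio of the two means

# Common multiplier N^2 (N - n) / ((N - 1) n).
factor = N ** 2 * (N - n) / ((N - 1) * n)


def var_simple_total(factor, v_y):
    """Variance of the simple-mean estimate of the total."""
    return factor * v_y


def var_ratio_total(factor, v_y, v_x, cov_xy, r):
    """First approximation to the variance of the ratio estimate of the total."""
    return factor * (v_y - 2 * r * cov_xy + r ** 2 * v_x)


v_mean = var_simple_total(factor, V_Y)
v_ratio = var_ratio_total(factor, V_Y, V_X, COV_XY, R_N)
gain_percent = 100 * (v_mean / v_ratio - 1)

print(f"simple mean estimate: {v_mean:,.0f}")
print(f"ratio estimate:       {v_ratio:,.0f}")
print(f"gain in efficiency:   {gain_percent:.0f}%")
```

The small differences from the printed figures arise only from the rounding of intermediate values in the hand calculation.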
TABLE 4.1
[Tabular entries not legible in the source copy.]
4a.11
NI}
.Ek f R n , x 2,'
X Ii
j
t=l
\.
1-=1
=-=
E R.,
N,x N ,
(84)
'=1
Now
k
(85)
1=1
167
(87)
E (R.,)
RN ,
VI (YR ) =
2:
N,2 XN /V I (R.,)
(88)
t=1
t=l
k
where Pt
L:
p, (N, - n,) (S
2..L R 2S
. _ hi
Nt
h
n,
2 -
2RNtP,~StillSI.) (89)
1: PrY.,
'-1 _____ _
~
1:
(91)
PIX.!
1=1
and denote this ratio by Rn. and the estimate of the population
total by YR. in order to distinguish them from the corresponding
168
E Pt.l'nt =YN
t=l
+- 11
00-= iN
+-
n'
(92)
t=1
where
E (n) = 0, E (n')
= ()
t=1
(93)
t=l
(94)
(96)
pC C )
, ,
(97)
It follows that even when the size of the sample within each stratum is small, a combined ratio estimate can give a satisfactory estimate of the population total provided the total sample is sufficiently large.
+[ N,N-,n,n,
. p,
t=1
8,/
X., 2
'!~_~ n,
N,n,
N, -
n,
n,. p, fS
(' I.2 TL
R N2S I2
2RNP,,,
S 8 ,_}
t=1
(~8)
n,
R Nt 2)
2 (R N
R)
Nt
S SIx }
PI' tv
t=l
=L
t=l
X (R Nt S,,2 - PtSt.S,.)}
(99)
It will be seen that (99) depends upon the magnitude of the variation between the strata ratios and the value of
\[ (R_{N_t} S_{tx}^2 - \rho_t S_{ty} S_{tx}) \]
(100)
n,
(101)
Clearly,
r/>
can be written as
+ terms independent of n,
(102)
n, oc
N,8,:
(103)
171
Example 4.3
From the livestock data referred to earlier, it is proposed to draw a stratified random sample of 73 villages (amounting to 20% of the total) and to estimate the total livestock population for the entire subdivision. The villages having agricultural area up to 1000 acres constitute the first stratum and the remaining villages the second stratum. Calculate the optimum allocation of the villages between the two strata if the method of estimation to be adopted is (i) ratio method with a common ratio for both strata, (ii) ratio method with separate ratios for the two strata, and (iii) simple estimation within each stratum.
Also calculate the sampling variance of the estimated total by each of the above methods and hence compare their efficiencies.
The relevant calculations for each method step by step are presented in Tables 4.2, 4.3 and 4.4. The tables are self-explanatory. The results are tabulated below:
Method of Estimation                         Number of Villages         Sampling     Efficiency
                                             in the Sample              Variance
                                             Stratum 1    Stratum 2
(i) Ratio method with a common ratio         54           19            8707000      191.5
    for both strata
(ii) Ratio method with separate ratios       54           19            8688000      192.0
     for the two strata
(iii) Simple estimation within each          53           20            16677000     100.0
      stratum
TABLE 4.2
Stratified Random Sampling with a Single Combined Ratio for Both Strata

$R_{Nc} = 151.9/520.4 = 0.2919$          $R_{Nc}^2 = 0.08521$

                                                        Stratum 1        Stratum 2
                                                        Agricultural     Agricultural
                                                        Area             Area
                                                        0-1000 Acres     > 1000 Acres
(1) $N_t$                                               319              45
(2) $V(y \mid i)$                                       8292             58402
(3) $\rho_t\sqrt{V(x \mid i)\,V(y \mid i)}$             12378            107673
(4) $R_{Nc}$ . (3)                                      3613             31430
(5) $V(x \mid i)$                                       39528            377255
(6) $R_{Nc}^2$ . (5)                                    3368             32146
(7) $S_{tu}'^2 = \frac{N_t}{N_t - 1}$ [(2) - 2.(4) + (6)]   4448         28317
(8) $S_{tu}'$                                           66.7             168.3
(9) $N_t S_{tu}'$                                       21300            7600
(10) $n_t$                                              54               19
(11) $N_t (N_t - n_t)$                                  84535            1170
(12) $V(Y_{Rt})$ = (7).(11)/(10)                        6963000          1744000

$V(Y_{Rc})$ = 6963000 + 1744000 = 8707000
S.E. = 2951
Y = (151.9)(364) = 55300
% S.E. = (2951/55300) . 100 = 5.3
TABLE 4.3

                                                        Stratum 1        Stratum 2
                                                        Agricultural     Agricultural
                                                        Area             Area
                                                        0-1000 Acres     > 1000 Acres
(1) $N_t$                                               319              45
(2) $R_{N_t} = \bar{y}_{N_t}/\bar{x}_{N_t}$             0.3086           0.2650
(3) $R_{N_t}^2$                                         0.09523          0.07023
(4) $V(y \mid i)$                                       8292             58402
(5) $\rho_t\sqrt{V(x \mid i)\,.\,V(y \mid i)}$          12378            107673
(6) (2).(5)                                             3820             28533
(7) $V(x \mid i)$                                       39528            377255
(8) $R_{N_t}^2 V(x \mid i)$ = (3).(7)                   3764             26495
(9) $S_{tu}'^2 = \frac{N_t}{N_t - 1}$ [(4) - 2.(6) + (8)]   4430*        28464
(10) $S_{tu}'$                                          66.6             168.7
(11) $N_t S_{tu}'$ = (1).(10)                           21200            7600
(12) $n_t$†                                             54               19
(13) $N_t (N_t - n_t)$                                  84535            1170
(14) $V(Y_{Rt})$ = (9).(13)/(12)                        6935000          1753000

$V(Y_R)$ = 6935000 + 1753000 = 8688000
S.E. = 2948
% S.E. = (2948/55300) . 100 = 5.3

* The steps leading to this figure are reproduced from Example 4.2.
† Obtained by distributing the total sample in proportion to $N_t S_{tu}'$ shown in row (11).
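TABLE 4.4

                                                        Stratum 1        Stratum 2
                                                        Agricultural     Agricultural
                                                        Area             Area
                                                        0-1000 Acres     > 1000 Acres
(1) $N_t$                                               319              45
(2) $V(y \mid i)$                                       8292             58402
(3) $S_{ty}'^2 = \frac{N_t}{N_t - 1}$ . (2)             8318             59729
(4) $S_{ty}'$                                           91.2             244.4
(5) $N_t S_{ty}'$                                       29100            11000
(6) $n_t$                                               53               20
(7) $N_t (N_t - n_t)$                                   84854            1125
(8) $V(Y_t)$ = (7).(3)/(6)                              13317000         3360000

$V(Y)$ = 13317000 + 3360000 = 16677000
Y = 55300
S.E. = 4084
% S.E. = (4084/55300) . 100 = 7.4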
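The allocations and variances of Tables 4.2 and 4.4 can be reproduced in a few lines. This is a minimal sketch, assuming the stratum mean squares from row (7) of Table 4.2 and row (3) of Table 4.4; the function names are illustrative, and simple rounding is used for the sample sizes, which happens to preserve the total of 73 here but need not do so in general.

```python
from math import sqrt

N_T = [319, 45]       # villages in stratum 1 and stratum 2
N_SAMPLE = 73         # total sample size


def optimum_sizes(N_t, s2_t, n_total):
    """Allocate the sample in proportion to N_t * S_t and round to integers."""
    weights = [Nt * sqrt(s2) for Nt, s2 in zip(N_t, s2_t)]
    return [round(n_total * w / sum(weights)) for w in weights]


def variance_of_total(N_t, n_t, s2_t):
    """Sum over strata of N_t (N_t - n_t) S_t^2 / n_t."""
    return sum(Nt * (Nt - nt) * s2 / nt for Nt, nt, s2 in zip(N_t, n_t, s2_t))


s2_combined_ratio = [4448, 28317]   # row (7) of Table 4.2
s2_simple = [8318, 59729]           # row (3) of Table 4.4

n_ratio = optimum_sizes(N_T, s2_combined_ratio, N_SAMPLE)
n_simple = optimum_sizes(N_T, s2_simple, N_SAMPLE)
v_ratio = variance_of_total(N_T, n_ratio, s2_combined_ratio)
v_simple = variance_of_total(N_T, n_simple, s2_simple)
efficiency = 100 * v_simple / v_ratio

print(n_ratio, round(v_ratio))
print(n_simple, round(v_simple))
print(f"efficiency of the combined ratio method: {efficiency:.1f}")
```

The same two functions applied to row (9) of Table 4.3 would give the figures for the method with separate ratios.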
and
Let $y_i$ (i = 1, 2, ..., N) be assumed to have the value 1 whenever it falls in class 1, and 0 if it falls in class 2; and let $x_i$ (i = 1, 2, ..., N) be assumed to take the value 0 whenever it falls in class 1, and 1 if it falls in class 2. It is then easy to see that
\[ R_n = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i} = \frac{n_1}{n_2} \tag{105} \]
and
\[ R_N = \frac{\sum_{i=1}^{N} y_i}{\sum_{i=1}^{N} x_i} = \frac{N_1}{N_2} \tag{106} \]
(107)
176
N-I
Np - Np2
N-1
- -N- _ . p . q
-- N - 1
(l08)
Similarly,
(109)
and
N
N __::_
S/=
(110)
1 pq
Lastly,
N
1:
pS.Sv =
NXNYN
N _ 1
XiYi -
j=L
_ 0 -- Npq
-
so that
E
(Ill)
N-l
= -
1.
G:) Z: {I -rcoc
Nl
N2
Nl
N2
N N~ II (N
1. ~ + N ~ I)}
{I + NN-- nI . 1(Pq + I)}
II
{I
+ N-:- n
N -1
I}
(112)
nq
exceed 2%.
q
500
250
167
125
100
177
v
1
(Rn)
= NRN
N - 1
II
J (pq
n
+ P fq
2)
N-n
N-J
N-n
N - 1
npq
(113)
= n"
_ N,
N -
IlJ
NI
(114)
YN
p"
xN =PI
(115)
8;1.
N ~ 1 Pi (1 - Pi)
(116)
8 ,.2 = N :._ 1 PI (I J2
PI)
(117)
118
and
(118)
so that
P=
. /
P,PI
V(I--p,)(I-PI)
(119)
(120)
It will be seen that the formula is identical with (112) except that
q is now replaced by Pj.
To obtain the variance, we substitute from (115) to (118) in
(24) and obtain
+ N~ -1
p,
==
(121)
179
B.
4b.1
z i --_ NP.
,
}~N
-+-'
E,
and
Xi
E(Vj)=XN
iN
E (1\) = XN
E (zn) =
L
N
P, (z. - SN)2
i=l
= n PC1.C1.
ISO
{I + n1(~.22
_ YNX
~(T~(T.)}
xN
N
(122)
(124)
or
(125)
It follows that
VI (YR )
~I
f.t
L=l
Pj
(Zj -
RNVj'f}
(126)
or
(127)
181
"
1 _12 _1
" (z _ R V )2
n V" n - 1 W i " I
(128)
and
(129)
Xi
so that
z.
VI
and
NE (zn)
=NYN
showing that the ratio estimate for this case provides an unbiased
estimate of the population total. Further, from (126), we have
182
Example 4.4
Table 4.5 shows the total cultivated area during 1931 as also the area under wheat in two consecutive years 1936, 1937 for a sample of 34 villages in Lucknow subdivision (India). The villages were selected with replacement with probability proportional to the cultivated area (including fallows) as recorded in 1931. The total cultivated area in 1931 and the total area under wheat in 1936 for all the 170 villages in Lucknow subdivision were known to be 78019 and 21288 acres respectively. Estimate the area under wheat for the subdivision for the year 1937 using the ratio method of estimation and calculate the standard error of the estimate so made.
What would be the standard error of the estimate if the information for the previous year were not used?
TABLE 4.5

Serial No.   Total Cultivated   Area under Wheat         $v' = v/0.45894$   $z' = z/0.45894$
of Village   Area in 1931       1936         1937        $= 1000x/a$        $= 1000y/a$
             (Acres)            (Acres)      (Acres)
             'a'                'x'          'y'
(1)          (2)                (3)          (4)         (5)                (6)
1            401                75           52          187                130
2            634                163          149         257                235
3            1194               326          289         273                242
4            1770               442          381         250                215
5            1060               254          278         240                262
6            827                125          111         151                134
7            1737               559          634         322                365
8            1060               254          278         240                262
9            360                101          112         281                311
10           946                359          355         379                375
11           470                109          99          232                211
12           1625               481          498         296                306
13           827                125          111         151                134
14           96                 5            6           52                 63
15           1304               427          399         327                306
16           377                78           79          207                210
17           259                78           105         301                405
18           186                45           27          242                145
19           1767               564          515         319                291
20           604                238          249         394                412
21           701                92           85          131                121
22           524                247          221         471                422
23           571                134          133         235                233
24           962                131          144         136                150
25           407                129          103         317                253
26           715                192          179         269                250
27           845                663          330         785                391
28           1016               236          219         232                216
29           184                73           62          397                337
30           282                62           79          220                280
31           194                71           60          366                309
32           439                137          100         312                228
33           854                196          141         230                165
34           824                255          265         309                322

Total                                                    9511               8691
Crude Sum of Squares                                     3166531            2505925
Crude Sum of Products                                    2727616
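Let $a_i$ denote the cultivated area in the i-th village and $x_i$ and $y_i$ the areas under wheat for the years 1936 and 1937 respectively. Then $P_i$, the selection probability for the i-th village, is given by
\[ P_i = a_i/A, \quad \text{where } A = \sum_{i=1}^{N} a_i = 78019. \]
Also
\[ z_i' = \frac{1000N}{A}\,z_i = \frac{z_i}{0.45894} = \frac{1000\,y_i}{a_i} \]
and
\[ v_i' = \frac{1000N}{A}\,v_i = \frac{v_i}{0.45894} = \frac{1000\,x_i}{a_i} \]
Now
\[ R_n = \frac{\sum z_i}{\sum v_i} = \frac{\sum z_i'}{\sum v_i'} = \frac{8691}{9511} = 0.9138 \]
\[ Y_R = R_n X = (0.9138)(21288) = 19453 \text{ acres} \]
\[ \sum_{i=1}^{n} (z_i' - R_n v_i')^2 = \sum z_i'^2 - 2R_n\sum z_i' v_i' + R_n^2\sum v_i'^2 = 165082 \]
or
\[ \sum_{i=1}^{n} (z_i - R_n v_i)^2 = \Big(\frac{A}{1000N}\Big)^2 \sum_{i=1}^{n} (z_i' - R_n v_i')^2 = (0.45894)^2\,(165082) = 34770 \]
Hence
\[ \text{Est. } V_1(Y_R) = \frac{(170)^2}{(34)(33)} \times 34770 = 895600 \text{ acres}^2 \]
\[ \text{S.E. } Y_R = \sqrt{895600} = 946 \text{ acres} \]
If the information for the previous year had not been used, the estimate of the area under wheat in 1937 would have been
\[ (0.45894)\,\frac{N}{n}\sum_{i=1}^{n} z_i' = 19943 \text{ acres} \]
and
\[ \text{Est. } V = \frac{(170)^2}{(34)(33)}\,(0.45894)^2\,(2505925 - 2221573) = \frac{(170)^2}{(34)(33)}\,(0.21063)(284352) = 1542700 \text{ acres}^2 \]
or
\[ \text{S.E.} = 1242 \text{ acres} \]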
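The whole of Example 4.4 depends only on the column totals of Table 4.5, so it can be checked with a short computation. The sketch below is illustrative: the names are not the author's, and the scale factor A/(1000N) is computed exactly instead of being rounded to 0.45894.

```python
from math import sqrt

# Known constants of the example.
N, n = 170, 34            # villages in the subdivision and in the sample
A = 78019                 # total cultivated area in 1931 (acres)
X_TOTAL = 21288           # total area under wheat in 1936 (acres)

# Column totals of Table 4.5 for v' = 1000x/a and z' = 1000y/a.
SUM_V, SUM_Z = 9511, 8691
SS_V, SS_Z, SP_ZV = 3166531, 2505925, 2727616

scale = A / (1000 * N)                 # converts z', v' back to z, v
mult = N ** 2 / (n * (n - 1))          # N^2 / (n (n - 1))

# Ratio estimate of the 1937 total and its standard error.
r_n = SUM_Z / SUM_V
y_ratio = r_n * X_TOTAL
resid_ss = SS_Z - 2 * r_n * SP_ZV + r_n ** 2 * SS_V
var_ratio = mult * scale ** 2 * resid_ss
se_ratio = sqrt(var_ratio)

# Estimate ignoring the 1936 information.
y_plain = scale * (N / n) * SUM_Z
var_plain = mult * scale ** 2 * (SS_Z - SUM_Z ** 2 / n)
se_plain = sqrt(var_plain)

gain_percent = 100 * (var_plain / var_ratio - 1)

print(f"ratio estimate: {y_ratio:.0f} acres, S.E. {se_ratio:.0f} acres")
print(f"plain estimate: {y_plain:.0f} acres, S.E. {se_plain:.0f} acres")
print(f"gain in efficiency from the ratio method: {gain_percent:.1f}%")
```

Carrying the working in the scaled variables z' and v' and rescaling once at the end mirrors the hand calculation and avoids handling 34 pairs of fractional values.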
The gain in efficiency from using the information for the previous year is thus 72.3%.
REFERENCES
1. David, F. N. and Neyman, J. (1938)
2. Cochran, W. G. (1940)
3. Sukhatme, P. V. (1944)
APPENDIX
(ii) E (i"i"'8)
where
and consequently
Let
N
1:
,G/a.
Np.G, a.
(1)
(2)
Also
NP.a+b, a.+p)
Y -
Np..Hte, a.'t~+')')
+ p..+e,
h')' fib,
189
!3 E [(
= n\
f ,) (f E,JJ
[e1NfL12
+ ea (- Np12 -
2NfLu)
+ ea (2Npl'l) ]
fLu
(N - n) (N - 2n) fLI2
(N _:-l)(N ="-2)' it2
(5)
It follows that
(N - n) (N - 2n) fL03
(N~ i) (N- 2) n2
(6)
Similarly
E (i,,,;1)
= E [(i:) (i"i,,'I)]
E(l"i,,'8)
= ;,
190
= n'
- Np.18)}
= ii'-
[(e1
7e.
+ 12e. -
+ 3 (ell
e, {6Nl'la- 3N2p.up.gJ ]
6eJ Nl'la
- 2ea
+ eJ N'JuP-OJ]
(7)
191
_ N - n [ (N2
N - 6nN 6n 2)
n3
(N - I) (N - 2) (N _ 3) 1-'13
3 (n - I) (N - n - 1)
+ (N -
I) (N - 2) (N - 3) Nl-'lll-'02
(8)
It follows that
E (En")
= ~4
6e,) NI-'04
+- 3 (1'2
or
2e a + e,) Ntl-'ozz]
(9)
P.04
3 (II -- I) (N - n -- I)
+ (N-I)(N-2)(N-3)
Nf'OI
.1
(10)
Lastly
14 E ["
.E, Ej 2'2
Ej
n
+.E
(2,E;E, '2 + , 2'2
EJ + 2, 2'"" 1
i=/-J
ft
+ 2E,EjE; ~/) +
'<FJ=/-k-F1
J,
[e1 Np.31
'1/ t'
i:
'-!<i+t
+ ,2/.' +2E1",,'".')J
+ e. (Nlf'sol'OI + 2N11-'1l' -
7Np.n)
12NJ.'n)
6NfsI) ]
192
N;;;
+ (N- 1) (N -- 2) (N -
3) N(fL20fL02+2fLl/)
(12)
CHAPTER V
where
and
\[ E(\epsilon_{ij}^2 \mid i) = \text{constant, say, } \gamma \tag{2} \]
\[ Y = N\alpha + \beta\sum_{i=1}^{k} N_i x_i \tag{3} \]
When $\alpha$ and $\beta$ are estimated from the sample and the population total of x is known, the right-hand side of (3) provides an estimate of the total. This estimate is known as the simple regression estimate. To distinguish it from the estimate of the population total based on the ratio and the simple mean methods, it will be denoted by $Y_l$, and the mean by $\bar{y}_l$.
Y,
L: n,"Ji",
(4)
j=1
L: n,", (a
~1
+ fix,)
= L: N, (a
~1
+ fix,)
L:
1_1
(n,", - N,) (a
+ (3x,) = 0
(5)
where
=y
(6)
Hence
(7)
195
</>
= 'Y :E
1.0
n/Il - p.
i=l
:E
(n,A, -- N,) (a
i=l
+ {3x,)
(8)
(L
(i
04>
()a
p.
:E
(njA, - N,)
co,
I, 2, .... k)
=0
(9)
(10)
,~[
and
( II)
+ {3x,)
a'
2')'
+ {l'x,
(12)
'Y
where
a' == p.a and
f3' ~__
p.f3
(13)
+ Wx ) = N
(14)
and
(1 S)
196
p' =
'-1
(16)
n, (x, - Xn)2
and
(17)
On substituting for $\beta'$ and $\alpha'$ from (16) and (17) in (12), we obtain
(Xi - Xn)
1f (.
I 2
1= ,
, ... ,
(18)
(19)
+ NxNS
and
.E"
'-1
n, (x, -
.E neAl
'-I
(21)
x.>"
Yz, we substitute for
from (18)
197
V(Y) =
I
{I
(X N -Xn)2}
~
I1j (Xi -
(22)
Xn)
i=l
N,
E E
(-=1
()'jJ -
a -- f3X,)2
(24)
J=1
and
~
Ni
E E
f3 = '=1 k J0:..1
E N
)';J
(x. - XN )
(x, - XN)2
i-I
or
EX(X
-xN )
1: (x
- XN)2
198
= P - , say
u.
(26)
(27)
(28)
A'
'm1
fA - fJ (x,
-_ XA )}2
199
Consider then
(29)
and calculate its conditional expectation for fixed $n_1, n_2, \ldots, n_k$. This is best done by expressing Q as a function of the $\epsilon$'s, which are defined by (1).
From (1) we have
(30)
whence
t
+ {3nx" + }; n,E .. ,
1.: n,ji", = na
i;:::l
'_1
i.e.,
-t BXn
(31)
Also
I.
p=
1.:
n, (x, -- Xn)2
1~1
1.:
n, (x, --
'=1
x.f
k
.in)
1.: n, (X,
,.,
E n,
+ '=1 k
- .i.)2
}; n, (X, _- Xn)2
,.,
f3
(32)
200
1=1 _;
~(
X:l.!. n.
(x.
~ _:.l ,,,}2
}; nt (x, - Xn)
1=1
(33)
we obtain
i==1
11/ n,
k
,.'::1
~~-----
II
- (x; x
xn) };
'=1
n, (x, I
x.) in,
.f n, n,
(X, -
nm2
X.)
}~
201
E ( . 2)
n,
and
we obtain
E (Q) = (n -- 2)
(35)
It follows that
Q
Est. 'Y = ( n - '"')
~
(36)
(37)
ma
Without loss of generality we may assume J.'t to be zeto.
Obviously, the expected value of mil/me can be determined
202
where
n (n -1) (n -2) . .. (n - j
N (N _:_. 1)(,v -. 2):-.. (N -- j
+ 1)
-f I)
4n _ N_l + nN
!_~
2
6_)
NI
(40)
203
4
--I-
., )
nN ' N?
9
n2
12) (41).
21 _ I _
nN
N
A"
Substituting in (39) from (40) and (41), and using the known
result that
E(ml')
112
(1n __ N1)
we have
where
To obtain the variance of the mean $\bar{y}_l$, we have only to divide the expression on the right-hand side of (44) or (45) by $N^2$.
= .E n,.:\Jl..,
(46)
'-I
i.e.,
k
1: n,>', (a
+ fjx,) =
1: N, (a
+ (3x
i)
or
t
1: (n,\ - N,) (a
'-I
+ fJx.) =
(47)
'=1
(48)
205
where
(49)
and
cfo
i=l
On differentiating
zero, we obtain
cfo
(/ =,
P and
equating to
1,2, ... , k)
(51)
whence
(52)
k
/1. J) (nj,\j -
N j) = 0
(53)
j=l
and
(54)
Substituting for
a'W
Ai'S
+ {J'Wx", =
(55)
and
(56)
where
a' -
p.a.
2' ,..R'
(51)
206
and
(58)
JV_Y:N -
X.,)
(59)
1: w,x/ -
WX.,2
'''1
and
a'
NX .. (XN ~
Ws../
x..)
(60)
where
(61)
A,
= ~~I
n'n,
{I + XNs...- / . (XI-X..)}
(62)
(63)
where
_
Y.
1 \'
= W LJ.
WcY.,
(64)
'-1
and
(65)
207
0.-=
N2
W
[1
+ (XN
-X"y]
S",.2
(66)
i", = in'
S",.2
!~
n, (x, -
Xn)2
iEI
and
k
1: n.Y' i
(Xl -
x.)
i-I
k
1: n, (X, -- x~)2
i-I
and on substitution we find that (63) and (66) reduce to (19) and (22) respectively.
An examination of the expression for $Y_{wl}$ in (63) shows that a knowledge of the true weights $w_i$ is not necessary for calculating $Y_{wl}$; numbers proportional to $w_i$ are sufficient for the purpose. The variance of $Y_{wl}$, on the other hand, requires a knowledge of the true weights $w_i$. This raises a practical difficulty, since the true weights $w_i$ are rarely known. Often, however, the relationship between the variance of y for a given x, and x can be guessed, and numbers proportional to $w_i$ can be known. It is, therefore, important to investigate the form which the variance of $Y_{wl}$ takes for certain well-known relationships between $V(y \mid x)$ and x. We shall consider only the simplest situation where $V(y \mid x)$ is proportional to x.
Let
\[ V(y \mid x) = \gamma x \tag{67} \]
nj . 1!_,_~}
)'x,
N, - n,
(68)
where
w/
(69)
and
W=
W'
(70)
wi'
and the
209
Now
Eij
is
where
E
(Eli
I i) =
and
E (<:J I i) =
V (y I A)
(73)
where
yx; N. - n l
n.
N,-I
(75)
w/
Also, from (74),
k
,_
I: w, Y., =
i=1
I:
i~l
i~l
i;::J
w: + f3 I: w:x, + I:
IV/i ft
= a + f3x .. + E..
(76)
where
(77)
14
210
[t
[t w:
x"')
W: (Xi -
Yni]
1=1
(x, --
.'.J (. + fox,
,.,1]
W'Secr 2
I..
L:
={3+
W/ (Xi - XU')
(78)
ini
;=1
On substituting for Yij from (1), for Yw from (76) and for
(78) in (72), we obtain
p from
t t :i,' {a +
,,=1
{3x,
(X, -
ij -
a- {3xw -
".J (p + w!'
flO)
flO
211
'=1
><
_ 2
f- f'
WW
[k;
11','
(x, -- X.,)
i.,12
'~l
11';'
tI;
i=l
, lw
f~ ' " _.- (II'J1
II"
.~,
(\'
X-)
II'
l t=1
tt
f'
f..II'i'
WW
iCl
fli
1:
k
W'S",,2
12
,el
i.,}2
,"',
'=1
j~l
1
{~
- W'S:,2
W
.".,-""
Wi
'2-
.j
2(
- )2
Xi _. X.,.
1=1
(79)
212
W,'
and
we have
(80)
Now, let
(82)
and
(83)
and we have
21~
=y
nl
then
WI
= y ., Y.. =y"
k
nj
.E .E
1=1
(Yii - ji,,)2
n
k
"i
.E.E
=-~ i= 1
E St y =
Q
n-2
==
nstIJ / (I --r2)
11-2
where
214
V(Y..I ) =k n.
\'
"-'
(R9)
(=1 a~'
where
and
J!'(Y.., ) = _n_
V (Y, )
~,
Y "-'
(90)
n,
,=1 Uj.
Now, let
uj
= y + Il (an
(911
so that
and
_ Il (u/)
y
provided
I~.~() I <
+ {1l(a/}}2
y~
__ ... )
(92)
215
(93)
where C 2"..2 is the square of the coefficient of variation of
The average value of (90) is, therefore, approximated by
V (YIll , ) }
E { VCY,)
~ 1+
at
(94)
C 2"./
Example 5.1
Table 5.1 summarizes the data for a simple random sample of 64 villages drawn from the total of 319 villages referred to in Example 4.2. Assuming that villages within a class are of the same size, equal to the mean value per village in that class,

TABLE 5.1

Serial No.   Area of Village   $n_i$   $\sum_j y_{ij}$   $\sum_j y_{ij}^2$
of Class     ($x_i$)
1            63.73             2       16                256
2            155.33            5       333               26895
3            245.68            18      1810              281314
4            344.40            16      1991              330113
5            491.56            13      1815              287079
6            767.49            10      2352              605510
Total                          64      8317              1531167
and, II.
\[ V(y \mid x) = \gamma x \]
Method I
The relevant formulae for the regression estimate of the population total and its variance are given by (63) and (85). To evaluate these we require the values of $\bar{x}_w$, $\bar{y}_w$, $s_{wx}^2$, $s_{wy}^2$ and the estimate of $\beta$. The calculations leading to these values are given in Table 5.2.
From this table we obtain
Total of col. 5
Tofal of col. 1
x'"
=
29424
Total of col. 9
ji", = TotaiOfcor-7
10272
k
.E w;'
(Xi - XID )2
W'
Total of col. 16
- -total of col. 7
7952
27297
29131
y-, w/
L...J n. .L: Y./ ft;
S..,2
W'
W'Y.,2
46464 - 28802
27297
17662
27297
= 64703
S",.
8044
=
k
E W,' Yn,
(x, - i .. )
leI
23146
(13731) (7.7297)
84793
13731
6175
k
E W/Y.,
(XI -
i,.)
231455
7952
2911
+ P(iN - i.H
= 319 [1027 + .2911 (367, 5 = 319 [1027 + 2911 (73'3)]
Y..1 = 319 [y ..
294 2)1
= 319 f1240J
= 39556
Est
..1
(73'W]
29131
101761'>( 511136 [1
+ '18441
101761 x 60'5389
';,-;:, S..
Y",I
V60'~54
100
..
-124
~~
77R
124
627
Method II
The relevant formulae are given by (19) and (86) respectively.
We have
8317
64
129 95
24902
64
.'" 38909
"
2,'
ni
.E YIj
1=1
(Xi -~.
Xn)
.E
111 (Xi -
Xn)2
1=1
644382
c= 2455455
=, 02624
+ 02624 (367'5
=c 319 (124'28)
39645
Again
1531167 - 1080768
64
70375
- 389,1)1
119
1: n x,2 - nx. 2
'-I
24554547
64
=
3836648
and
k
x,,)
64438212
64
1006847
7037.5 _ (10068'47)2
3lS366'48
=
43952
V(Y) =
I
(I _.t (367'5338360.5
-389'09)2)
= 101761 x 7175
0/
/0
SE Y
. "
J()()Y717; = 6 81
124.3
5.2
nj
Ni
n, (N,-I)
NI-ni
w/x,
XI
wi'
Y"I
(I)
(2)
t3)
(4)
(5)
(6)
(7)
(8)
2
3
2
5
18
11
48
84
20
235
9
43
1494
66
22222
5 4651
226364
6373
15533
24568
03487
03518
09214
8'00
6660
10056
4
5
16
13
60
77
44
64
21 4545
154375
34440
06230
12444
49156
03141
13962
10
39
944
988
380
29
131034
76749
01707
235'20
"----~--~-~---~.-.
Total
64
--~--~-----~--.-.~
27297
"I
1) YH 2
(9)
(10)
~------------.----.------.----.
(10)
_,_.-. __ ._._----
(l1)w/
Xj-x ..
(9)' (I 3)
(XI-X .. )2
w/ (XI-X to )'
(12)
(13)
(14)
(15)
(16)
-230,51
- 6431
53135
1853
-13891 -325 47
- 4856 -44994
38887
5016
19732
865 35
19296
679
2358
2516
38935
217
157
1223
223966
3823
nj
(II)
256
128
1892
4'5
23430
26895
5379
3
4
5
92656
77526
281314
330113
15629
20632
14401
12854
43855
287079
22083
6936
40149
605510
60551
10336
Total
.. -.----.-
803191
W/Y"I
02790
-~~.-
319
Serial
No. of
Class
---
47325
46464
280406
190005
2314'55
7952
(l - p2)
-----
(95)
Comparing first the simple regression with the mean per sampling
unit estimate, we notice that the regression estimate is always
more accurate than the arithmetic mean estimate. Comparing
next the simple regression with the ratio estimate, we observe
that the former is more accurate than the latter if
i.e., if
i.e., if
$(\rho\sigma_y - R_N\sigma_x)^2 > 0$
(98)
$\dfrac{V(\bar{y}_l)}{V(\bar{y}_R)} = \dfrac{\sigma_y^2 (1 - \rho^2)}{\sigma_y^2 - 2R_N\rho\sigma_y\sigma_x + R_N^2\sigma_x^2} = \dfrac{4416.3}{4416.6} \approx 1$
There is thus little to choose between the simple regression and the
ratio methods of estimation, since the regression of the number
of livestock on agricultural area is almost a straight line passing
through the origin.
5.9 Comparison of Simple Regression with Stratified Sampling
The regression method of estimation achieves the same purpose
as stratification by size of the sampling unit, namely, to eliminate
the effect of variation in the size of the sampling unit from the
standard error of the estimated character. A comparison of the
two methods is, therefore, of interest.
We have seen that the sampling variance of the estimated total
in stratified sampling is given by
. Sl
nj
aw'J.,
(100)
(I _p2) (I + I)
II
(101)
+ b (Xl{ -
y, = y"
x,,)
(102)
where
I,
I; nJ.i
b
(XI -
j'n)
leI
I; n, (X,
- Xn)2
'''1
Since Q' is a random sample of N', $\bar{x}_{Q'}$ provides the best unbiased
linear estimate of $\bar{x}_{N'}$. Hence
Y". =y" + b
%' (X'll -
x,,)
(104)
tlk"
Yds
Q')
E (5'..)
Z'
E {b (X'll - x,,)}
We
(105)
= a + px"
(106)
and
(b)
=P
(107)
Also
I,
.E
fli
(Xi -
X n)~
(x, --
Xn)~
'1=1
}; 11,
1=1
Substituting .j'N ,
-,
+ 8Xi,
we get
(108)
(109)
-2
-_ E (v.)
N'2 {2.."
N2 E b (XQ
_ Xn)
- 2}
Now
(111)
Next
and
(Il2)
xSJ.
1.
n }; nj (X,
--i--
.._
Xn)
1 . . ___
tfW,
f' . ~
;=1
[{t +
x{t,n,
nj (a
{3X j ) }
1=1
(0
=
=
(a
HxJ (X,
.l)]
+ {3x.){3
(113)
EG.) E(b)
> E (X Q
+2
Z'
(a
+ {3x.) {3 E (x
Q' -
x.) -
h2
X.)2
(114)
Lastly,
E
UQ ,
.\',,)2
E (x~/ - XN'
+ XN' -
_(1Q' - N'I)S2
~IN'
Xn)2
+N2(_
N;~
XN -
-)2
X"
(115)
where
N'
S2~IN'
I
\'
N' _ 1 W
(Xi -
'eN' )"
i=l
Also
(116)
(117)
(118)
Also
y
i~ nj (Xi -- Xn)
}.~
~~
(120)
y
Q'n
Y [
E {V (Y,h)} ~ n
+ nI + Q'I ] + p2S,
Q'
(121)
E (Yd,)
(a
= E {yo -
Un -
+ b (x
Pin + (b -
a - piN
a -
Q'
+. - in)}Z
+ f3 (iQ'+n =
+ E {(b -
E {y" - a - Pi.}2
+ f32E (iQ'+n = ~
yE {
~XQ'"tn
- ,i,Y }
1: II, (x, - X )2
iN)}'
XN)2
II
i~l
(124)
If we
assum~
th~
(43),
( n1 -
1). 1
Q'+n
(125)
n - 3
~~
{I
+ Q'~-n
n~ 3} + J2~2n
(126)
y
r~~-'-'
1: n, (Xi
Xn)2
i=l
from (115)
Est.
(.~N -
.\',,)2
Z:2
{(-\'O' - -".)2 -
(~, ~ ~.)
S2,,0'}
fiCn - 2) {I
Q'
{i' -1- II
Q"----jII
-- l
n -- 3j
{.2
.1" -
n .- 2 j
(127)
+ .p2 v (V",)
We have easily
.pVO.... ,)=(I-.p) V(z)
(130)
Thus
v (z)
v (v;,)
~F v
(z)
(131 )
and
V (I'
)
. n,
S2
= -yn
2
where
Yn'h
- "Ill
X
Yn",.
h,.,
h 1
}; y (x - i.,)
,,'I,
1.,' (x - Xn'h)2
Sh 2 (I --
p2 h,h_I)'
we write
n'h
(135)
But
- )
V(Yi-I
+ "S2",
h_ 1
nh
2C OV Vh-I,
("
..,)
Xn'n
(136)
since
(137)
The validity of (137) follows from the well-known result that the
correlation between any unbiased estimate and an efficient estimate
tends to the square root of the ratio of their variances. The
approximation involved will affect only terms of order $(1/n'_h)^3$ in
the expression for $V(z_h)$. The result is, however, exact if units
are common only between two consecutive occasions, for we
have in that case
COY (ji,.
l'
Xn'.)
=c COY [{(I -
Ph 1) =h 1 + Ph I.V."h-J, X,,'IJ
~=
+" .l'n"h-I
,~- 0
- fLl,--tn
since
Thus
(138 )
x {n'I _
.1.
'1'.-1
I} + P2
n.
.-1
i, la-I
S 2 P.-I
".
n .-1
(139)
Writing
p
'2
II, ~-l
I-p\,
_
2
=
P A, 1--1 -
--,--- -
n,.-
i-l
(140)
'l
-p
l'
+ P h, ;-1'f'''-1}
'2
Ii-I
nAn
./,
"'h
h-I
(141)
Ii
"'1
v (z)
( 1 -
V(v:,)+ V(z)
)2 V(z)_
1_ (
I
V(z)
__ )2 V(- )
)''''
V (V",) t- V (z)
V U'",) V (z)
VU'",) + v(i)
(142)
- (1 - p2)
!/J =
J(l -
p2) {I _ p2
+ 4;2 !l~:
n'
p2
we obtain
(145)
1. Neyman, J. (1934). "On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection," Jour. Roy. Statist. Soc., 97, 558-606.
2. Neyman, J. and David, F. N. (1938). "Extension of the Markoff Theorem on Least Squares," Statist. Res. Mem., 2, 105-16.
3. Hasel, A. A. (1942). "Estimation of Volume in Timber Stands by Strip Sampling," Ann. Math. Statist., 13, 179-206.
4. Sukhatme, P. V. (1944). "Moments and Product Moments of Moment-Statistics for Samples of the Finite and Infinite Populations," Sankhya, 6, 363-82.
5. Cochran, W. G. (1942). "Sampling Theory when the Sampling Units are of Unequal Sizes," Jour. Amer. Statist. Assoc., 37, 199-212.
6. Yates, F. (1949). Sampling Methods for Censuses and Surveys, Charles Griffin & Co., Ltd., London.
7. Patterson, H. D. (1950). "Sampling on Successive Occasions with Partial Replacement of Units," Jour. Roy. Statist. Soc., Series B, 12, 241-55.
8. Narain, R. D. (1953). "On the Recurrence Formula in Sampling on Successive Occasions," Jour. Ind. Soc. Agr. Statist., 5, No. 1. In Press.
9. Tikkiwal, B. D. (1951). "Theory of Successive Sampling," Unpublished Thesis for Diploma, I.C.A.R., New Delhi.
10. Sukhatme, P. V. (1953). "The Variance of the Regression Estimate in Double Sampling from Finite Populations," Metron, 17, Nos. 1-2. In Press.
CHAPTER VI
6a.1 Cluster Sampling
. t.
to
the population,
I\,_
NWY..
(2)
'=1
Y..
and
the mean of cluster means in a simple random
sample of n clusters, given by
Yn.
_
Yn .
=
=
Clearly,
1 \' _
n W Yi,
(4)
In
the
$V(\bar{y}_{n.}) = \dfrac{N - n}{Nn}\, S_b^2$   (5)
where
$S_b^2 = \dfrac{1}{N - 1} \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{y}_{N.})^2$   (6)
S2
nM
(7)
(Y'i - YN.)2
(8)
where
N
N~ -1
1: ~
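$\dfrac{V(\bar{y}_{nM})}{V(\bar{y}_{n.})}$   (9)
$= \dfrac{\dfrac{NM - nM}{NM} \cdot \dfrac{S^2}{nM}}{\dfrac{N - n}{Nn} \cdot S_b^2} = \dfrac{S^2}{M S_b^2}$   (10)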
If we set up an analysis of variance for elements in the population, as shown in Table 6.1, this efficiency will be seen to be
equal to the ratio of the overall mean square between elements
to that between clusters in the population.
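TABLE 6.1
Analysis of Variance

Source of Variation    Degrees of Freedom    Mean Square
Between clusters       $N - 1$               $\frac{M}{N-1} \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{y}_{N.})^2 = M S_b^2$
Within clusters        $N(M - 1)$            $\frac{1}{N(M-1)} \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{y}_{i.})^2 = S_w^2$
Total population       $NM - 1$              $\frac{1}{NM-1} \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{y}_{N.})^2 = S^2$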
Example 6.1
To show how efficiency changes with the size of a cluster, we
give a numerical example from data relating to the use of
clusters of different sizes in estimating the area under wheat.
Table 6.2 gives values of the mean square between survey
numbers in a village ($S^2$) and the mean square between clusters
($M S_b^2$) pooled over 11 villages in the Meerut District (India).
TABLE 6.2

Size of Cluster (M)    $M S_b^2$    $S^2$    Efficiency
 2                     138.6        108.3    0.78
 4                     180.7        108.3    0.60
 8                     245.1        108.3    0.44
16                     333.9        108.3    0.32
It will be seen that the efficiency decreases rather rapidly with the
increase in the size of the cluster, clusters of 2 being only about four-fifths as efficient as individual survey numbers, those of 4 about three-fifths, while those of 16 are only one-third as efficient as individual
survey numbers. In other words, the sample of clusters of size 16
will have to be three times as large as the sample of individual
survey numbers in order to give an estimate of equal precision.
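The efficiencies quoted in Example 6.1 can be checked directly as the ratio of the mean square between survey numbers to the mean square between clusters. The following is a minimal sketch, assuming the mean squares of Table 6.2 correspond to cluster sizes 2, 4, 8 and 16 (the sizes named in the surrounding text); the variable and function names are illustrative only.

```python
# Mean squares from Table 6.2 (pooled over 11 villages, Meerut District).
S2 = 108.3  # mean square between individual survey numbers, S^2

# Cluster size M -> mean square between clusters on an element basis, M * S_b^2.
# The pairing of sizes with mean squares is an assumption based on the text.
between_cluster_ms = {2: 138.6, 4: 180.7, 8: 245.1, 16: 333.9}


def relative_efficiency(s2, m_sb2):
    """Efficiency of cluster sampling relative to sampling single elements."""
    return s2 / m_sb2


efficiency = {M: relative_efficiency(S2, ms) for M, ms in between_cluster_ms.items()}

for M, eff in efficiency.items():
    # 1 / eff is the factor by which the number of sampled elements must grow
    # for clusters of size M to match the precision of individual survey numbers.
    print(f"M = {M:2d}  efficiency = {eff:.2f}  sample inflation = {1 / eff:.1f}")
```

For M = 16 the inflation factor comes out close to three, which is the sense of the remark that the sample of clusters of size 16 has to be three times as large.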
If clusters are random samples of M from a population of N M
elements, and consequently composed of elements which are not
more alike than those of other clusters, then the mean squares
between and within clusters will behave as random variables and
their expected values will each be of the same order.
For,
E (Mean square between clusters)
N
=
~ I {t (
jN'
N':,~ M . ~)
,.-
N. - N. .,}
S2
(II)
Similarly
E (Mean square within clusters)
I
{NM
N (M -- I)
(.J'N l +- NMNM-- I
-- MN ( YN l
-/-
s~)
Sl)}
NM- M
NM
. M
(12)
6a.3
p =
(13)
= E {(Y,J
+ Yl.
+ (Y,k -
+ (Yi)
+ Yi. -
YN.)}
+ Ui. -
.VN. P}
(14)
since the expected value of the two middle terms is clearly zero.
To evaluate the first term of (14), we first work out the expectation
for a given i. We have
$E\{(y_{ij} - \bar{y}_{i.})(y_{ik} - \bar{y}_{i.}) \mid i\} = \dfrac{1}{M(M-1)} \sum_{j \ne k = 1}^{M} (y_{ij} - \bar{y}_{i.})(y_{ik} - \bar{y}_{i.})$   (15)
i=l
(16)
where
$S_w^2 = \dfrac{1}{N} \sum_{i=1}^{N} S_i^2$
The values of the second term in (14) and that of the denominator
in (13) are known, by definition, to be $(N - 1) S_b^2 / N$ and
$(NM - 1) S^2 / NM$ respectively. We thus have
N - 18
~",2
M
2 I,
( 17)
NA1_~ I 8 2
NM
(18)
= - NM-l
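Now, by definition, $S^2$, $S_b^2$ and $S_w^2$ are related to each other by
the identity
$(NM - 1)\, S^2 = M (N - 1)\, S_b^2 + N (M - 1)\, S_w^2$   (19)
$S_b^2 = \dfrac{NM - 1}{M (N - 1)} \cdot \dfrac{S^2}{M} \{1 + (M - 1) \rho\}$   (20)
$S_w^2 = \dfrac{NM - 1}{NM}\, S^2 (1 - \rho)$   (21)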
N - n
= --11/ - . nb
= ~N-n
. J-'(:v::A) .:~ 11 + (M
-1)
p}
(22)
= !ifM =-1 . {I
+ (M -
I) p}
(23)
For N large, the formulae (17) and (19) to (23) can be approximated by the simpler expressions
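$\rho \approx \dfrac{S_b^2 - S_w^2 / M}{S^2}$   (24)
$S^2 \approx S_b^2 + \dfrac{M - 1}{M} S_w^2$   (25)
$S_b^2 \approx \dfrac{S^2}{M} \{1 + (M - 1) \rho\}$   (26)
$S_w^2 \approx S^2 (1 - \rho)$   (27)
$V(\bar{y}_{n.}) \approx \dfrac{N - n}{N} \cdot \dfrac{S^2}{nM} \{1 + (M - 1) \rho\}$   (28)
and
$E \approx \dfrac{1}{1 + (M - 1) \rho}$   (29)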
NM - 1 8 2
NM
cr,
(32)
0-2
(1 -- p)
(36)
and
-;
__ N_Il.0- 2 f
}
V(}".)--N_I nMl1+(M--I)p
(37)
TABLE 6.3

M      $\rho$    $(M - 1) \rho$
 2     0.28      0.28
 4     0.22      0.66
 8     0.18      1.26
16     0.14      2.10
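The intra-class correlations shown for cluster sizes 2 to 16 appear to follow from the large-N relation between the cluster mean square and $1 + (M - 1)\rho$. The sketch below is an illustration under that assumption: it solves $M S_b^2 = S^2 \{1 + (M - 1)\rho\}$ for $\rho$, using the mean squares of Table 6.2. The function name is hypothetical.

```python
# Mean squares taken from Table 6.2; S2 is the mean square between elements.
S2 = 108.3
between_cluster_ms = {2: 138.6, 4: 180.7, 8: 245.1, 16: 333.9}  # M * S_b^2


def intraclass_correlation(M, m_sb2, s2):
    """Solve M * S_b^2 = S^2 * (1 + (M - 1) * rho) for rho (large-N form)."""
    return (m_sb2 / s2 - 1.0) / (M - 1)


rho = {M: intraclass_correlation(M, ms, S2) for M, ms in between_cluster_ms.items()}

for M, r in rho.items():
    # The efficiency 1 / (1 + (M - 1) rho) falls as (M - 1) rho grows,
    # even though rho itself decreases with the cluster size.
    eff = 1.0 / (1.0 + (M - 1) * r)
    print(f"M = {M:2d}  rho = {r:.2f}  efficiency = {eff:.2f}")
```

The point of the design is that $\rho$ decreases with M much more slowly than M - 1 increases, so the product $(M - 1)\rho$, and with it the loss of efficiency, keeps growing.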
6.4
-.~--~------.-
DescripFreProportion}; (Yij--!)2
Class lion of
quency
of Males !.::i=~l~_~
Household
PI
rl.
4
,E (1'11-y;')2
1=1
2 male
children
fir
1 male
child
-!
-!
No male
children
llr
"
n-
------~-~~"-~-~.---~-.--
Total population
-!
..-..
7
..J~
1H
-~---.--
..-.
table
Yij
Yij =
= 0, if it is a female.
=[p,
i=J
(38)
Also
7
32
(39)
and
S,,2
~=
11,,2
L" Pi 0\ - p2
,=1
I
32
(40)
4
The values of
1 .E
(Yij -
j=1
j=1
the different classes, and those for the whole population representing $\sigma^2$, $\sigma_w^2$ and $\sigma_b^2$ are also shown in Table 6.4. On substituting in (33), we obtain
I
Source of Variation
Degrees
of
Freedom
Between clusters
n -1
Within clusters
n (M-I)
Mean Square
tI
\' \'(,
~-(M~.-I)W
;
nM-1
_ )2_-2
-- s.,
,lil-Yi.
j-1
"
Total sample
;M1_IL
--_.-_-----------._-_.
(_1"I-Y.,)2=S2
__._----- -
1-1
Hence substituting from (41) for Est. ($S^2$) and writing $s_b^2$ for $S_b^2$
in (10), we obtain
Est. (Relative Efficiency)
$= \dfrac{(N - 1) M s_b^2 + N (M - 1) s_w^2}{(NM - 1)\, M s_b^2}$
$\approx \dfrac{1}{M} \left\{ 1 + (M - 1) \dfrac{s_w^2}{M s_b^2} \right\}$   (42)
for large N.
Example 6.2
Est. $S^2 \approx s_b^2 + \dfrac{M - 1}{M} s_w^2 = \dfrac{251.4}{8} + \dfrac{7}{8} (112.8) = 130.1$
Hence
Est. (Relative Efficiency) $= \dfrac{130.1}{251.4} = 0.52$
TABLE 6.6
Analysis of Variance of Area under Wheat

Source of Variation                   Degrees of Freedom    Mean Square
Between villages                       10                   290.1
Between clusters within villages       33                   251.4 = $M s_b^2$
Within clusters                       308                   112.8 = $s_w^2$
Total                                 351
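The arithmetic of Example 6.2 can be retraced from the two mean squares in the analysis of variance. This is a sketch under one stated assumption: the cluster size M = 8 is not printed in the surviving text and is inferred here from the degrees of freedom (44 clusters and 308 within-cluster degrees of freedom). Function and variable names are illustrative.

```python
# Mean squares from the analysis of variance of area under wheat.
ms_between = 251.4   # between clusters within villages, M * s_b^2
ms_within = 112.8    # within clusters, s_w^2
M = 8                # ASSUMPTION: inferred from 308 = 44 * (M - 1)


def estimated_total_mean_square(M, m_sb2, sw2):
    """Large-N estimate of S^2: s_b^2 + (M - 1)/M * s_w^2, with s_b^2 = m_sb2 / M."""
    return m_sb2 / M + (M - 1) / M * sw2


est_S2 = estimated_total_mean_square(M, ms_between, ms_within)

# Estimated efficiency of the cluster relative to a single survey number.
relative_efficiency = est_S2 / ms_between

print(f"Est. S^2 = {est_S2:.1f}")
print(f"Est. relative efficiency = {relative_efficiency:.2f}")
```

The between-village mean square is not used: the comparison is made within villages, which act as strata.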
$S_b^2 = \dfrac{S^2}{M^g}$   (43)
Table 6.7 shows the values of the mean squares between plots
within fields for plots of five different sizes, viz., equilateral
triangles of sides (a) 33', (b) 25', (c) 15', (d) 10' and (e) 5' each.
The data relate to the crop-cutting survey on wheat conducted
in Kangra District (India) during the year 1945-46. Altogether
76 fields were selected for the survey and in each, 10 plots, two
of each of the above sizes, were marked at random.
TABLE 6.7
Yield Survey on Wheat, 1945-46 (Kangra)

Size of Plot M (sq. ft.)    Observed    Fitted
471.5                       0.51        0.56
270.6                       0.83        0.75
 97.4                       1.21        1.26
 43.3                       2.14        1.91
 10.8                       3.63        3.88

log (fitted mean square) $= \log S^2 - g \log M = 1.117 - 0.5111 \log M$
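The fitted column for the Kangra data can be regenerated from the two constants of the fitted line. The sketch below assumes common (base 10) logarithms and reads the constants as 1.117 and 0.5111; both readings are inferences from the garbled print, checked only by the fact that they reproduce the fitted values. Names are illustrative.

```python
import math

# Plot sizes (sq. ft.) and observed mean squares as read from the table.
plot_sizes = [471.5, 270.6, 97.4, 43.3, 10.8]
observed = [0.51, 0.83, 1.21, 2.14, 3.63]

LOG_S2 = 1.117   # intercept of the fitted line, log10 S^2 (assumed reading)
G = 0.5111       # slope g of the fitted line (assumed reading)


def fitted_mean_square(M):
    """Mean square predicted by log10(ms) = LOG_S2 - G * log10(M)."""
    return 10 ** (LOG_S2 - G * math.log10(M))


fitted = [fitted_mean_square(M) for M in plot_sizes]

for M, obs, fit in zip(plot_sizes, observed, fitted):
    print(f"M = {M:6.1f}  observed = {obs:.2f}  fitted = {fit:.2f}")
```

A slope near one half means the mean square falls roughly as the square root of the plot size rather than in proportion to it, which is what one expects when neighbouring parts of a field are positively correlated.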
_!:
(44)
N(M - J)
$S_w^2 = \dfrac{M}{M - 1}\, S^2 (1 - M^{-g})$   (45)
where $S^2$ will now represent the total mean square in the infinite
population of which the finite population is itself a sample.
Equation (45) also shows that if we regard the population itself
as a single cluster and M is consequently very large, the within
cluster variance $S_w^2$ will approach $S^2$ as expected.
= M (~=- i)
{(NM - I) 52
- N (M - I)
52
(NM
M (N -I) (Mu -
}
1
M~-I
(I - M-,,) 52}
(46)
(b > 0)
(47)
(48)
The constants S2, a and b are evaluated from the data. For
this purpose, we require: (1) an estimate of the mean square
among elements in the population, and (2) an estimate of the
mean squares between elements within clusters for at least two
values of M. If we regard the total population as a single cluster
containing NM elements so that
$S^2 = a (NM)^b$
then we have
$S_b^2 = \dfrac{(NM - 1)\, a (NM)^b - N (M - 1)\, a M^b}{M (N - 1)}$   (49)
Fit Jessen's law to the within cluster mean square values for
clusters of sizes 2, 4, 8 and 16 survey numbers and the one
formed by all the survey numbers in the village, given in Table
6.8.
TABLE 6.8
Values of $S_w^2$ in Clusters of Different Sizes (Acres)$^2$

M             Observed Values    Fitted Values
 2             78.10              81.53
 4             84.28              84.25
 8             88.92              87.05
16             93.50              89.95
NM = 1176     108.33             110.22

+ 0.0473 log M
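One way to fit Jessen's law $S_w^2 = a M^b$ to the observed values is ordinary least squares on the logarithms. The sketch below assumes that method and common logarithms; the text does not state how the fit was made, but this choice appears to reproduce both the printed slope and the fitted column. The helper name `fit_power_law` is hypothetical.

```python
import math

# Cluster sizes and observed within-cluster mean squares (acres squared).
# The last point treats the whole village (NM = 1176) as a single cluster.
sizes = [2, 4, 8, 16, 1176]
observed_sw2 = [78.10, 84.28, 88.92, 93.50, 108.33]


def fit_power_law(xs, ys):
    """Least-squares fit of log10(y) = log10(a) + b * log10(x); returns (a, b)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    return 10 ** log_a, b


a, b = fit_power_law(sizes, observed_sw2)
fitted = [a * M ** b for M in sizes]

print(f"a = {a:.2f}, b = {b:.4f}")
for M, obs, fit in zip(sizes, observed_sw2, fitted):
    print(f"M = {M:4d}  observed = {obs:6.2f}  fitted = {fit:6.2f}")
```

With a and b in hand, the between-cluster mean square for any intermediate cluster size follows from (49) without further field data.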
6a.6
+cd
2
+ c nl
(50)
N -n .
~b:
n
(51)
(52)
()v
'On
(54)
we get
(55)
= '::-:_CB_{r:IIIl_i~lfc01)~
2c1M
(59)
oV
oM
(60)
()V
-V 5M
is independent of n. Equation (60) can, therefore, be solved
directly for M. An explicit expression for M is, however, difficult
to obtain and the solution has, therefore, to be obtained by trial
and error. On substituting the value of M so obtained
back in (59), we obtain the optimum value of n.
Since V decreases as M increases, we may expect the left-hand
side of (60) to be approximately constant whatever the value of
M. An examination of (60) also shows that the left-hand side is
independent of the cost factors while the right-hand side involves
M only in combination with the cost factors. It follows therefore
that M will respond to the variation in $c_1$, $c_2$ and C in such
a way that $c_1 C M / c_2^2$ is approximately constant. It follows that
M will be smaller if (1) $c_1$ increases, i.e., the cost of enumerating
an element increases; (2) $c_2$ decreases, i.e., travel becomes cheaper;
and (3) C is large, i.e., the amount of money available for the
survey is large. The algebraic solution thus confirms the calculations deduced from the actual data reproduced in Tables 6.9-6.11.
TABLE 6.9
Numbers of Sampling Units which can be Covered, given
Several Cost Situations, Two Expenditure Levels,
and Seven Different Sampling Units
Sampling Unit
Mileage at 2e/Mile
No. of
Length of Farm Visit
Farms/
Sampling --120
15
Unit
60
Min.
Min.
Min.
Mileage at 5e /Mile
---~---~~----~-------~--
--------- -
--
.,_
15
Min.
60
Min.
120
Min.
A.
Individual farm
1000
1644
650
371
1088
517
315
Quarter section
0914
1745
699
401
1140
551
339
Half section
1 828
1073
392
218
764
336
192
Section
3656
624
213
116
475
186
105
Two sections
7'312
347
113
60
278
102
56
Four sections
14624
187
59
31
156
54
29
131'616
21
17
Thirty-six sections
B.
1452
803
2886
1223
712
0914
4293
1569
871
3057
1314
769
Half section
1828
2494
852
462
1900
744
421
Section
3656
1388
451
241
1112
407
225
Two sections
7312
749
235
124
623
217
118
Four sections
14624
396
121
63
338
113
61
131616
44
14
38
I3
Individual farm
1000
Quarter section
Thirty-six sections
-.--.------~
6.10
_------_--
Items
Individual
S.
Farm
--------
S.
2S
4S
36S
I.
Number of swine
..
.,
267
282
274
290
336
411
999
2.
Number of horses .,
..
J 83
J 93
187
198
227
2'80
687
3.
Number of sheep
961
976
g'80
316
774
744
744
161
170
166
178
2'07
257
634
3 17
321
290
269
255
2'45
245
255
267
255
265
298
362
8(>6
7.
..
..
198
207
200
209
237
288
679
8.
234
245
232
239
264
3 15
717
9.
.,
299
311
293
297
324
379
855
..
154
163
I 57
164
187
228
558
II.
..
195
206
198
208
237
287
688
236
259
266
3 05
378
491
1276
82
90
94
109
136
17B
473
84
88
84
86
96
115
271
.,
4. Number of chickens
5.
6. Number of cattle
..
Corn yield
6n
7'06
760
914
117R
1571
4307
16.
396
436
451
521
648
R46
2236
..
316
349
364
423
529
693
1839
354
382
384
426
S'D
657
16l\2
TABLE 6.11
Summary of Sampling Unit Efficiencies
Expenditure, Mileage
Rate and Questionnaire
Length
--~--------------.-~-.--
Sampling Unit
84
4S
36S
I
2
I
2
S2
2S
--~-~~-
T. 2",/15 min.
II.
2",/60 min.
1938
1939
6
61
1938
1939
13
14
10
~
III.
2",/120 min.
1938
1939
16
16
IV.
5",/15 min.
1938
1939
I
4
12!
9
1938
1939
7!
10
1938
1939
II!
12
V.
5",/60 min.
..
I
2!
3
~q
4!
4
Expenditure of $2000
1938
1939
..
I
2
I
2
I
2
1938
1939
16
15
1938
1939
16
16
1938
1939
5
6
1938
1939
12t
12
3t
4
1938
1939
12!
14
3t
2
I
2
I
2
II
8
..
I
I
B. UNEQUAL CLUSTERS
M;
\'
(61)
= Aft LJ Yo
1=1
M.
J; J;'Ylj
}' .. =
;~I
i=l
(62)
EM,
Several estimates of the population value of the mean per
element can be formed from a random sample of n clusters.
We shall first consider the simplest, viz., the simple arithmetic
mean of the cluster means given by
$\bar{y}_{n.} = \dfrac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.}$   (63)
It is easy to see that this estimate will not give an unbiased estimate
of the population value, for
$E(\bar{y}_{n.}) = \dfrac{1}{n} \sum_{i=1}^{n} E(\bar{y}_{i.}) = \dfrac{1}{N} \sum_{i=1}^{N} \bar{y}_{i.}$   (64)
YN.
~'I:h
j=l
+ (bias)2
= 1!_=_n
. Sbn
N
+ (y
_y- )2
N...
(66)
= N~-l
1:.=1 (h -
YN.)2
f'M-'.)i.
nM L..J
nk L
n
E (jin.')
E (M'Yj.)
(67)
1 . -n
nM N
Mih
i==1
= y"
(68)
v (Yo,')
where
So"
1 S '2
n b
N~ I
t (Mt' - ~ t Mt)'
N~ 1
t (Mti' _y,.)2
(69)
(70)
t=1
(71)
N-nS"1
Nn
b
(72)
where
N
1
\ ' M,2 (,'
N - 1 LJ M2 Vi.
-)2
Y ..
(73)
i=l
(i
variate z by
=
Zjj
M;J'_iI
MoP;
(74)
whence
z_ = }; P;Zi.
;=1
=5' ..
(75)
It follows that
$E(\bar{z}_{n.}) = \bar{z}_{..} = \bar{y}_{..}$   (76)
where
u ,I: 2
= U bon
(77)
= .E
Pi (Zi. - Z.. )2
(78)
i=1
(79)
where Sbz 2 is the mean square between cluster h'S in the sample,
defined by
n- 1
V(zft. )
= U_l(_
n
(80)
(81)
where
(82)
and
E st. V(z-n.)
= S,,2
n
(83)
where
(84)
{Cv
ij -
(85)
By definition
E (5'i. -
Y.. )2
rJ}
(86)
Further, let
E {(Y'i - Yi. )2 Ii} =
rJ j
(87)
and
(88)
M;
.} 2
M,
== 0 _ J", (y .. - y-. )2
'J
j;::;'1
t.
Hence
(89)
2
C1W ,
we have
(91)
\'
NLJM;_:_1
1=1
We therefore
(92)
(93)
Also
(94)
Hence, eliminating
Ma~2
CTW 2
= a2 {I + (M -
I) p}
(95)
(96)
(97)
V (j n M) = nM
-
6b.4
(M _ 1)
p
(98)
Yn.
V (i ) =
n.
1 \'~. (-. _ - . )2
nN
(99)
fi.
M.S.E.
Vn.) : : : : n~
(y,. - YN.)2
+ UN. - Y.. )2
(100)
1)
Vi. - 5' .. )2
i=l
:E (ji i.
.l'N. )2
+ N U'N. -
.V .. )2
1=1
M.S.E.
Vn,)
~ n~
(f';.- Y.. )2
= - n~
L (~i - 1)
(Y . -
Y.. )1
"'I
+ (1
- D(h. -
Y.. )2
(102)
n~
L (~i - 1)
(Vi. -
Y.Y
1c1
= - nNW
J \' (~;
M
Ni
1) W}
\' (J). -
-V.. )2
(103)
1: Ni = N.
i=l
E (Vj.
I Mi) = Y.. ,
Clearly,
N'
1 \ ' {:;,
-)2
N, W Vj. -Y ..
1=1
J \'
nN
W if
= - n~
Vi.
ji
)2
'"
L (Z' - 1)
Ni
(Yi.1 M i )
(J04)
.=1
and
III.
V 0\ 1M,)
Case J.-
VO\! M,) =
(1 -!) (YN. -
5'..)2
(105)
Case II.-
1W
\' (M,
nN
if - 1) (-Yi. -
-)2
Y ..
i=l
i=l
(106)
which is again positive, so that for this case also we can expect
1) 0\ - _V .. )2 ,~ n~~2 ~ eMi
1=1
so that for this case also we would expect $\bar{z}_{n.}$ to be more efficient
than $\bar{y}_{n.}$.
These results are in fact obvious from an examination of the
first term in (102). This term, ignoring the sign, represents the
covariance between $M_i$ and $(\bar{y}_{i.} - \bar{y}_{..})^2$. Now, ordinarily $(\bar{y}_{i.} - \bar{y}_{..})^2$
will decrease as $M_i$ increases so that the covariance will be
negative and consequently the expression on the right-hand side
of (102) will be positive.
Yn./I
The mean square error of Yn./I when the finite multiplier is
( - /I)
MSE
. . . Yn .
1 ,\,Mj
,..,_,
nN W
2,.,
-)2
M2 \,n - Y ..
(108)
j~l
= nNwM
_! '\' ~i
(.Y -
Y' )2 (~'
M
j. . .
1)
(109)
'=1
Case I.-Let
V(j\ I M) =y
L Zi (Z - 1)
N
M.S.E.
CY ...")-
M.S.E. (z... )
:N
t:=l
(110)
It follows that
zn.
Case II.-Let
V(Yi.1 M i)
Case IlI.-Let
VCY . I M i ) =
};2.
n:Sf2
(M, - Sf)
(~)
i=l
iiJM3
t,..,
~ - nJM'
(Mi -
M)
(1 + ~~-8_rl
l:
j .. ,
(M, - Sf)!
(111)
Thus for this case, in contrast to the previous two, the estimate
$\bar{z}_{n.}$ is expected to be less efficient than the ratio estimate in
simple random sampling.
Example 6.5
Table 6.12 gives the number of villages and the area under
wheat in each of 89 administrative areas* in Hapur Subdivision
of Meerut District (India), and Table 6.13 gives the analysis of
variance on a village basis. It is required to estimate the total
area under wheat in the subdivision using an administrative
circle as the unit of sampling. We shall assume that a sample of
20 circles is to be selected. Calculate the sampling variance of
the estimate of the total area under wheat for each of the
following procedures of sampling and estimation:
(a) equal probability, mean of the cluster means estimate,
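$= M_0^2 \left\{ \dfrac{N - n}{Nn} \cdot \dfrac{1}{N - 1} \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{y}_{N.})^2 + (\bar{y}_{N.} - \bar{y}_{..})^2 \right\}$
$= (299)^2 \left\{ \dfrac{69}{89 \times 20} \cdot \dfrac{1}{88} (6499209) + (387.35 - 328.02)^2 \right\}$
$= 89401\, (2862.9 + 3511.75)$
$= 5699 \times 10^5$ acres$^2$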
6.12
Number of
Villages
(i)
(M,)
2
3
5
4
Area under
Wheat
(Acres)
(Miyd
Circle
No.
(i)
Number of
Villages
(M i )
Area under
Wheat
(Acres)
(MS i .)
- - - - - - - - - - - - - - - - - - _.. _-1562
29
2
583
1003
30
4
1150
1691
31
3
670
271
32
499
5
6
458
33
736
714
1081
1224
996
34
35
36
9
10
475
37
7
3
34
11
12
13
14
15
389
2675
868
1412
1027
1393
692
38
39
2
2
40
41
642
524
42
2050
2530
247
421
445
706
602
43
16
17
1522
2087
44
45
18
19
8
2
2474
46
687
461
20
21
846
1036
47
48
22
23
24
25
26
27
28
948
1412
438
941
710
387
3516
2002
3622
--_
3
5
2
3
49
50
51
10
5
977
52
53
54
9
2
2
814
319
55
56
3
8
2111
1400
1584
830
167
-----------.---.~---"-.----------.,---
----
Circle
No.
6.12-Contd.
- ._----------------
"~"--"----
Number of
Villages
(i)
(M i )
Area under
Wheat
(Acres)
57
58
59
60
273
2
2
781
622
591
2
2
63
64
65
79
928
1141
1633
902
1286
1299
69
70
71
72
73
2
7
74
.
1604
1621
2
6
1764
2668
1076
4
4
1224
1490
299
98078
87
88
348
89
574
852
51
82
. 1947
741
669
1187
1265
1423
794
83
84
85
86
(Mjy!')
--.---~
80
81
1208
67
68
---_,_,,_",-----_.
77
78
601
66
(M I )
---------.-...
75
4
76
1101
799
Area under
Wheat
(Acre;)
Number of
Villages
(i)
(M;y,. )
- ------,_-,
61
62
Circle
No.
Total
2554
-- -
~---.---.
TABLE 6.13

Source of Variation    Degrees of Freedom    Sum of Squares    Mean Square
Between circles         88                   10924581          124143
Within circles         210                    9588011           45657
Total population       298                   20512592           68834
It will be noticed that the bias exceeds the standard error proper
owing to the large variation in Mi. The method must, therefore,
be rejected from further consideration.
(b) Equal Probability, Mean of the Cluster Totals Estimate
$= 89 \cdot \dfrac{69}{20} \cdot 513613$
$= 1577 \times 10^5$ acres$^2$
(c) Equal Probability, Ratio Estimate
69x89
... _.... 342043
20
10
+ 203. (34'074)
'360)2 x 5
1050 ( I
$= 1107 \times 10^5$ acres$^2$
V (M - )
oY...
= M 02
\"!!II
(- _ '"
ji )2
L.J
Mo Yi.
1=1
$= (299)^2 \cdot \dfrac{1}{20} \cdot (36537)$
$= 1633 \times 10^5$ acres$^2$
NM-nM
liXY'-
nM
Mo -1 [
(Yu-Y .. )2
;=1
$= \dfrac{299 \times 69}{20}\, (68834)$
Sampling Unit    Sampling Method                     Method of Estimation       Relative Efficiency
(a) Circle       Equal probability                   Mean of cluster means       12
(b) Circle       Equal probability                   Mean of cluster totals      45
(c) Circle       Equal probability                   Ratio                       64
(d) Circle       Probability proportional to size    Mean of cluster means       43
(e) Village      Equal probability                   Mean per village           100
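The relative efficiencies in this comparison can be retraced from the quantities printed in Example 6.5. The sketch below is an illustration rather than the author's worksheet: the squared bias 3511.75 and the ratio-estimate variance 1107 x 10^5 are taken as printed, and the expression used for method (e) is one reading of a partly illegible line. All variable names are hypothetical.

```python
# Population of N = 89 circles containing M0 = 299 villages; n = 20 circles sampled.
N, n, M0 = 89, 20, 299

fpc_term = (N - n) / (N * n)  # (N - n) / (N n)

# (a) equal probability, mean of cluster means (mean square error incl. bias).
mse_a = M0 ** 2 * (fpc_term * 6499209 / (N - 1) + 3511.75)

# (b) equal probability, mean of cluster totals.
var_b = N * (N - n) / n * 513613

# (c) equal probability, ratio estimate: value taken as printed.
mse_c = 1107e5

# (d) probability proportional to size, mean of cluster means.
var_d = M0 ** 2 / n * 36537

# (e) villages sampled directly, same expected number of villages
#     (ASSUMED reading of the garbled line: M0 * (N - n) / n * S^2).
var_e = M0 * (N - n) / n * 68834

results = {"a": mse_a, "b": var_b, "c": mse_c, "d": var_d, "e": var_e}
relative_efficiency = {k: 100 * var_e / v for k, v in results.items()}

for k in "abcde":
    print(f"({k})  variance = {results[k] / 1e5:7.0f} x 10^5   "
          f"relative efficiency = {relative_efficiency[k]:5.1f}")
```

Method (a) fares worst because its mean square error is dominated by the squared bias, which does not shrink as the sample grows.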
6.14
----
360()<3800
<3600
<3400
<3200
<3000
<2800
~
<2600
~
c::
<2400
<2200
~
~
....
<2000
<1800
<::s
<1600
"'=
<1400
<1200
"....
<1000
< 800
2
3
2
<600
200< 400
0< 200
JO
The relationship between the two, i.e., $M_i \bar{y}_{i.}$ and $M_i$, is seen to be
approximately linear, the value of the coefficient of correlation
being 0.64. The variability among areas under wheat in circles
of the same size is seen to be rather independent of the size, and
explains the relative superiority of the ratio method (with equal
probability) over the simple arithmetic mean estimate when the
clusters are selected with probability proportional to size. The
CHAPTER VII
SUB-SAMPLING
7.1 Introduction
$y_{ij}$ = the value of the j-th second-stage unit in the i-th first-stage unit (j = 1, 2, ..., M; i = 1, 2, ..., N)
j\
and
N
iM 1: 1:
Y..
Yij
Further, let
Ylm
f' Yli
W
I
1 \' _
nw
Yim
"
..
selected sample, the sample mean $\bar{y}_{nm}$ provides the best unbiased
estimate of the population mean $\bar{y}_{..}$. In the next section we
shall derive the expected value and the sampling variance of
this estimate but the proof that it is the best linear estimate is
left to the reader.
Expected
ElY...)
~ E {!
~E
- E
I>.J
{! t Ii)}
{! ty<}
E
(v,.
= YR.
Since the first-stage units are equal,
$\bar{y}_{N.} = \bar{y}_{..}$
Hence
$E(\bar{y}_{nm}) = \bar{y}_{..}$   (1)
thus showing that the simple mean of all elements in the sample
gives an unbiased estimate of the population mean.
By definition, the variance of the sample mean is given by
$V(\bar{y}_{nm}) = E\{\bar{y}_{nm} - E(\bar{y}_{nm})\}^2 = E(\bar{y}_{nm} - \bar{y}_{N.})^2$   (2)
E(Y...
YB.)}
(3)
1 \' _
Yi.
nw
Now
Y" .. - Y., =
! L"
so that
E (Yom - Y.Y
(YII. -,Y,,)
=!2 [
E
(Yim - Yi,)
=!2
i'}J' (4)
The value of the second term under the summation sign is clearly
zero since sub-samples are drawn independently from the i-th
and i'-th first-stage units and the value of the first term under
the summation sign is given by the well-known result
$E\{(\bar{y}_{im} - \bar{y}_{i.})^2 \mid i\} = \left( \dfrac{1}{m} - \dfrac{1}{M} \right) S_i^2$   (5)
where
whence we obtain
E
'u
y-)2 = 1 E f' (J__ l)S2
Vnm-",
n2
W m
M
j
(6)
where
(7)
(8)
where
For,
~ E [(y, - 'N) x ~
=
t.
S',.- 1..)
Iii]
E [(jill. - h.) x 0]
(9)
=0
(10)
$V(\bar{y}_{nm}) = \left( \dfrac{1}{n} - \dfrac{1}{N} \right) S_b^2 + \dfrac{S_w^2}{nm}$   (12)
$= \dfrac{S_b^2}{n} + \dfrac{S_w^2}{nm}$   (13)
and Si2 denote the mean square between second-stage units drawn
from the i-th first-stage unit defined by
S2
,
(15)
I)
Sb
whence
(n - I) E (Sb 2) = E
(t Y; ..
2) -
nE (y"",2)
(16)
(i Yi",2)
= E
{i' E (Yim
~ E[ t
=
[t
Ii)}
{y" + (~ Y.2 + N
~) S,'1]
(~ -it)
SW
(17)
'''1
The value of the second term in (16) can be directly obtained from
(10). For, by definition,
V(Y,.",)
= ECY ....2)
-YN. 2
whence
nEcy"",2)
(1 - J) Sb
+ (~
-1) S..
+ nYN,1
(18)
(N - 1) Sb 2
= .E Yi,2 - NVN. 2
1"1
we obtain
(19)
whence
(20)
We thus have
i
=
(m~- 1) s?
n (m - 1)
(21)
and
(22)
(!-n - N}!) s
+ N_!
(23)
When (N - n)/N can be taken as unity, (21) and (22) will still
hold, giving on substitution in (11)
Est. V 0\,,) =
(24)
So
Est. 8,,2
= S/,2
- 2
Sir
(25)
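and
Est. $V(\bar{y}_{nm}) = \left( \dfrac{1}{n} - \dfrac{1}{N} \right) s_b^2 + \dfrac{1}{Nm}\, s_w^2$   (26)
Est. $V(\bar{y}_{nm}) = \dfrac{s_b^2}{n} = \dfrac{\text{Mean square between first-stage units in the analysis of variance of the sample}}{nm}$   (27)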
C=cnm
= (8 2 _ Sw2)
"
/0
1 _ 8,,2
n
N
~~-"':
Co
(30)
Co
c
-~-
(31)
Co
= eM
n =
+(1_1)8
m
M
IC
(34)
(35)
= c1n + c2nm
(36)
SW2/M >
or, approximately by
(39)
where
m=M
and
If Co is less than
and
C1
c~M,
iz is 1.
7.1
--
-.-.----------.-----~~.-----
-----
__
Wheat
1962
4838
49914
0884
Gram
552
1212
7002
1476
Maize
556
603
6982
0795
Sugarcane ..
526
1106
11277
0893
-----...
----~~~_
. ..
----- ..
------------.-~---~--
Example 7.1
A yield survey on paddy was carried out in West Godavari
District (India) in 1946-47. Five villages were selected in each
of the seven strata into which the district is divided, three fields
were harvested in each village and one plot of 1/100 acre was
harvested in each field. The data are reproduced in Table 7.2.
Obtain pooled values of $s_b^2$ and $s_w^2$ for the district and the
estimates of $S_b^2$ and $S_w^2$. Finite multipliers at the sub-sampling
stage may be ignored.
Calculate the sampling variance of the estimate of the district
mean yield and the percentage standard error.
Assuming that the sample of villages is to be allocated in
proportion to the numbers of villages in the several strata
and that the cost in rupees of the survey is represented by
$C = 7n + 2nm$
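Pooled $s_b^2 = \dfrac{\sum_{i=1}^{k} (n_i - 1)\, s_{bi}^2}{n - k} = \dfrac{4 \times (46655.2)}{28} = 6665.0$
Pooled $s_w^2 = \dfrac{\sum_{i=1}^{k} n_i (m - 1)\, s_{wi}^2}{n (m - 1)} = \dfrac{66979.5}{7} = 9568.5$
On substituting in (25), we obtain
Est. $S_b^2 = 6665.0 - \dfrac{9568.5}{3} = 3475.5$
and
Est. $S_w^2 = 9568.5$
where $p_i = N_i / N$.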
TABLE 7.2
Yield Survey on Paddy, West Godavari District (India), 1946-47

Stratum    $N_i$    $n_i$    Mean yield (oz./plot)    $s_{bi}^2$    $s_{wi}^2$    $p_i = N_i/N$    $p_i^2$
1           88      5        347.5                     1452.1        2791.5       0.109863         0.012070
2          142      5        297.8                     1937.0       27422.5       0.177278         0.031427
3          119      5        201.1                     7107.2        1864.0       0.148564         0.022071
4           90      5        438.9                     9603.9       11824.0       0.112360         0.012625
5          114      5        282.9                    20702.0       13628.2       0.142322         0.020256
6          102      5        301.9                     2510.8        2007.5       0.127341         0.016216
7          146      5        186.7                     3342.2        7441.8       0.182272         0.033223
Total      801     35                                 46655.2       66979.5                        0.147888
Hence
Est. $V(\bar{y}_{nm}) = 3475.5 \left\{ \dfrac{0.147888}{5} - 0.001248 \right\} + \dfrac{9568.5}{3} \times 0.029578 = 192.80$
whence
S.E. $(\bar{y}_{nm}) = 13.89$ oz./plot
But
$\bar{y}_{nm} = \sum_{i=1}^{k} p_i\, \bar{y}_{i,nm} = 282.90$ oz./plot
whence
% S.E. = 4.9
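The variance calculation of Example 7.1 can be retraced from the stratum figures. The sketch below is a reconstruction under stated assumptions: the per-stratum values are read from the garbled Table 7.2, the column taken as the stratum mean yield is an interpretation (it reproduces the district mean of 282.90), and finite multipliers at the sub-sampling stage are ignored as the example directs. Variable names are illustrative.

```python
import math

# Per-stratum data as read from Table 7.2 (assumed column assignment).
N_i = [88, 142, 119, 90, 114, 102, 146]                       # villages per stratum
mean_yield = [347.5, 297.8, 201.1, 438.9, 282.9, 301.9, 186.7]  # oz./plot
s_b2 = [1452.1, 1937.0, 7107.2, 9603.9, 20702.0, 2510.8, 3342.2]
s_w2 = [2791.5, 27422.5, 1864.0, 11824.0, 13628.2, 2007.5, 7441.8]
n_i, m = 5, 3   # villages sampled per stratum, fields per village

k = len(N_i)
N = sum(N_i)
n = n_i * k
p = [Ni / N for Ni in N_i]

# Pooled mean squares between villages and between fields within villages.
pooled_sb2 = sum((n_i - 1) * s for s in s_b2) / (n - k)
pooled_sw2 = sum(n_i * (m - 1) * s for s in s_w2) / (n * (m - 1))

# Estimates of the true components, as in equation (25).
est_Sb2 = pooled_sb2 - pooled_sw2 / m
est_Sw2 = pooled_sw2

# Variance of the stratified two-stage mean.
variance = sum(
    pi ** 2 * ((1 / n_i - 1 / Ni) * est_Sb2 + est_Sw2 / (n_i * m))
    for pi, Ni in zip(p, N_i)
)
std_error = math.sqrt(variance)
district_mean = sum(pi * y for pi, y in zip(p, mean_yield))
pct_std_error = 100 * std_error / district_mean

print(f"Est. S_b^2 = {est_Sb2:.1f}, Est. S_w^2 = {est_Sw2:.1f}")
print(f"Variance = {variance:.2f}, S.E. = {std_error:.2f} oz./plot")
print(f"District mean = {district_mean:.2f} oz./plot, % S.E. = {pct_std_error:.1f}")
```

The two contributions to the variance are nearly equal here, which is why the later cost calculation finds that neither very many villages nor very many fields per village is optimal.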
whence
$n = \dfrac{S_b^2 + S_w^2 / m}{148.3}$
Putting m = 1, 2, 3, 4 and 5 successively, we obtain the corresponding values of n. Substituting these in the equation for cost,
we get the corresponding values of cost. The relevant calculations
are given in Table 7.3.
TABLE 7.3

m      $S_w^2 / m$    $S_b^2 + S_w^2 / m$    n = (3) / 148.3    Cost = 7n + 2nm
(1)    (2)            (3)                    (4)                (5)
1      9568.5         13044.0                88                 792
2      4784.2          8259.7                56                 616
3      3189.5          6665.0                45                 585
4      2392.1          5867.6                40                 600
5      1913.7          5389.2                36                 612
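The search over m in Table 7.3 is easy to reproduce. The sketch below takes the estimated components 3475.5 and 9568.5 and the divisor 148.3 as given in the text (the surviving text does not show how 148.3 was arrived at), rounds n to the nearest whole village, and evaluates the cost function 7n + 2nm. Function names are hypothetical.

```python
# Estimated variance components from Example 7.1.
S_b2 = 3475.5   # between villages
S_w2 = 9568.5   # between fields within villages

DIVISOR = 148.3  # taken from the text as given


def villages_needed(m):
    """Number of villages n for m fields per village, rounded to a whole number."""
    return round((S_b2 + S_w2 / m) / DIVISOR)


def survey_cost(n, m):
    """Cost in rupees: 7 per village plus 2 per field."""
    return 7 * n + 2 * n * m


rows = []
for m in range(1, 6):
    n = villages_needed(m)
    rows.append((m, n, survey_cost(n, m)))

for m, n, cost in rows:
    print(f"m = {m}  n = {n:3d}  cost = {cost}")

best_m, best_n, best_cost = min(rows, key=lambda row: row[2])
print(f"Cheapest design: m = {best_m}, n = {best_n}, cost = {best_cost}")
```

The cost curve is flat near its minimum: m = 3 and m = 4 differ by only 15 rupees, so practical convenience can reasonably decide between them.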
further sub-sampling.
The variance of the mean of a simple random sample of nm
elements selected by procedure (i) is given by
(41)
To examine how this compares with the variance of a two-stage sample, it is convenient to express the latter in terms of the
intra-class correlation between elements of the first-stage units.
Substituting for $S_b^2$ and $S_w^2$ from (20) and (21) of the previous
chapter in (10), we obtain
li~kf! .!~
+
= NM-l
NM
~-~----
sa
nm
[(1 - Z) (l -p)
Z=~1 _ij.
{I
+ (M -
1) p}
[1 _M(N-I)
!!!..(n -__!1_
+p
{N -:_n T1J (M - 1)
_ M
;m}]
N-l
(42)
8
nm
'"
-n
+ P (NN -=-1
m - 1)]
(43)
(z- ~-7
(44)
m - 1)
1) 8
M _
( nm
N
(45)
1(Mm _ 1) (8 2_ 1M S 2)
pSI
(47)
! (Z -1)
(46)
II}
showing that the smaller the sub-sampling rate m/M, the larger
will be the reduction in variance of a two-stage sample over a
cluster sample. When $S_b^2 - S_w^2 / M < 0$, sub-sampling will lead
to loss of precision.
7.7 Effect of Change in Size of First-Stage Units on the Variance
We have seen in Section 7.6 that the variance of the mean of
a two-stage sample consisting of n first-stage units with m second-stage units from each can be expressed as
V(Y"m)
!V~MJ ~;
[1 - ~~~~R
+ Pl {Z~=l ~ (M -
1) -
~M_m}
- nC
+ pz { Nii=
C
MC (MC - 1) -
MC MC
m} J
NM - 1 S2
NM
nm
[m (n - I) (C -
Q~
M (N - 1) (N - C)
+ OVll - ~Pz ]
where
al
Z=~ Z
az
N-nC m
N--=-C MC (MC - 1) -
(M - 1) -
-i-t-m
MC-m
--Me
Since
_
al
= Tn
M
{(c:_-(N1)-(n 1)- (N
1) SNM - C)
and
m (n - 1) (C - 1)
M(N=-l) (N - C) ~ 0
we conclude that
I)} ~ 0
-
Let
N
first-stageJ
units
P
in each of NM second-stage
units in the population
and n, m and p the corresponding values in the sample
Further, let
Yill;
YII.
20
==; L
Yilk
y... =
LL I:
I
NMP
'''1
/-1
Yljk
k-1
and
Yli(~),
ji (nmp)
}'I(mp),
or, simply
I\,_
= N
LJ )" ..
;=1
y_.
(49)
v (.v.n",)
E (v""", - .I', .. )2
E ()'""", - J....
+ .I'" .. -
1
[{ n 2:" C)'
2
-I II
imp
_
2:
C
"
{
_ ....
I'
)
I' i,n" -
+ (Y"- .. -
Vf .. )
J2 -t C "..
. -~.
.V... )2
(. VfI..
- J' .. , )
2:" (.)'"",,- J
+ 11 fL
_,2
Y )"
F .. )2
Yi .. )
If - -
(y" .. - y ... )
J
[ f' l1(-Yil/,,' -- Ye,-)"- I .1]
n 2 EWE
(50)
sampling from the i-th and if -th first-stage units is carried out
independently,
V
(Y.~) ~ !, E
t {( ~ -k)
s; +
G- ~)
+ E (Y .. - .v..J2
+(1n __N1)S2
b
where
M
M -1
(j'ij -
y,.. )2
j=1
and
N
N~
8 b2 =
(y... - }.... )2
1=1
Sw 2 =
1L 8,2
and
'''' 1
we have finally
V(Ynmp)
(_!_p _ J)
Spa
P nm
(51)
) _ Sb
Ynmp
2 +Sw 2
nm
+ nmp
Sp2
(52)
Sij2,
S,2
giving us
E~=~
G- t) S,2
= soot +
G- ~) 8,,2
E (S,2) =
8,2
whence
E ($..2)
(54)
310
APPLICATIO~S
Finally
E {(n - I) S,,2} = E { i; (.vim" - jinmp)2 }
_. { .I.;..
-, 2
+ (m
I -
__ n S(1
_ N1) S/, + ( m1_ MI) 8n
l n
+ (1p _ P1) nm
8,,2
+ .I-,... Jl
2
= (n
1) { Sb 2
+ (~ - ~) 8.,2
+ (1 __~) 8,,2l
pPm j
or
Substituting from (53) and (54) in (55), and from (53) in (54),
we have
zo; 11 - S 2
Est."
.. - .
(!p _ P1) ,p
p
(56)
311
SUB-SAMPLING
and
Est. Sb 2
=Sb
(~
whence
Est. V (_V"IIII,) =
~)
G- ~)
S,,2
G- t) '~~
(57)
+ (~ -- ~) :\~ +
G- t) ~~
S,.2 -
(58)
When the other finite multipliers are also ignored, equations (54)
to (57) reduce to
- 2)
E (SM}
E (s
=S
2)
c; 2
~IQ
+ 1 S= 2
2
2
I,
(60)
+ Swm + mp
S.
(61)
(62)
(63)
where
Ch
c. and
Ca
(64)
312
From (64) and (51), deleting the bars over 8 W 2 and 8 p2 for
convenience, we consider the product
(v + ~2)
_
1 S 2) 1 +
- lf( s 2_ M1 S'" 2) + ( S2 _ p
Pm
I,
10
1
mp
SI'2}
+ {C2
(s 2
I,
1 SIn
M
+ {Ca ( s 2
+ {Ca (S 2
M1
IV
I,
~ S
2)
(s.,.2_ p1 S 2) mC1}
l'
2) P + .pS,,2}
.
C2
I'
S2) mp
IV
+ C1niiiS"z}
(65)
(v + S~2)
= {~Cl
{J
+ {J
+
Ca
Ca
T~~;--:_-~-Sl';);
(Sb
1 S~2)
_ ~~~~~2r
mp -
~c:n~2r
(66)
Clearly, (66) is minimum when the last three square terms are
all zero; then m is the nearest integer to
(67)
The
Sw 2 -
and
Case I
Suppose Sb 2
Sw2/M~ 0,
,2
2
(Sb -
.1 S. /~)
+ pm
J.
ClS,'
(69)
is minimum.
If
it- SIOZ)} ~ 0
the expression (69) is minimum when p has the maximum attainable value.
314
(70)
Case II
Suppose next that both Sb 2
negative or zero.
SW2/M and Sw 2
Sp2/P are
(8 "2-
1 S 2)
M
II)
1 S 2) C
+ (s 2 pPm
t
lQ
>
0, but Sw 2
Sp2jP~
O.
(73)
m,
i.e., .E M,
'=1
i.e., .E m, or .E a,m,
'=1
Yn(ml)
f'-
= It W Y'(m,)
f' m(I W
f' YjJ
Ii W
and
Further, let
Mj
y"
= YI(MI)
A-~i
Yi!
j-l
YN.
YN(MI)
= ~.
M,
L J, L
'-I
j.l
y"
316
and
f'-Yi
n1 w
(lII j )
where the summation extends over the units in the sample. This
can also be written as
N
y,. =
aLVilm;)
(74)
i;:l
where $a_i$ is a random variable such that $a_i = 1$ if the i-th first-stage unit is in the sample, and zero otherwise.
A second estimate, which we shall denote as $\bar y_s'$, is based on
the first-stage unit totals, and is given by
\[ \bar y_s' = \frac{1}{n\bar M}\sum_{i=1}^{n} M_i\,\bar y_{i(m_i)} = \frac{1}{n\bar M}\sum_{i=1}^{N} a_i \left(M_i\,\bar y_{i(m_i)}\right) \tag{75} \]
where
$M_i\,\bar y_{i(m_i)}$ is the estimate of the population total for the i-th first-stage unit
and
a, = 1
SUB-SAMPLING
317
where
and
x,'
= n~
R,
-,
= ~',
x,
M;xl(m;J
and
(77)
We shall study the properties of the different estimates in the
next section.
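The difference between the simple mean of the first-stage unit means and the weighted mean based on unit totals can be seen in a small sketch. It assumes the forms $\bar y_s = \frac{1}{n}\sum \bar y_{i(m_i)}$ and $\bar y_s' = \frac{1}{n\bar M}\sum M_i \bar y_{i(m_i)}$, with $\bar M$ known; the function names and the data are hypothetical and serve only as an illustration.

```python
from statistics import mean

def simple_mean_of_unit_means(samples):
    """y_s: unweighted mean of the sampled first-stage unit means."""
    return mean(mean(ys) for ys in samples)

def weighted_mean_of_unit_totals(samples, sizes, mean_size):
    """y_s': mean of the estimated unit totals M_i * ybar_i(m_i),
    divided by the population mean size M-bar (assumed known)."""
    n = len(samples)
    total = sum(M * mean(ys) for ys, M in zip(samples, sizes))
    return total / (n * mean_size)

# Hypothetical sample: two first-stage units of sizes 10 and 30,
# with 2 and 3 second-stage units observed in them respectively.
samples = [[4.0, 6.0], [10.0, 12.0, 14.0]]
sizes = [10, 30]

y_s = simple_mean_of_unit_means(samples)
y_s_prime = weighted_mean_of_unit_totals(samples, sizes, mean_size=20.0)
print(y_s, y_s_prime)
```

When all the $M_i$ are equal to $\bar M$ the two functions return the same value; they differ only when the first-stage units vary in size.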
318
Ys
{~ t
a, E (Yi(m;> I i)}
1=1
= ~
E (ai )
. li.
(78)
t."
E(Y,)
1L h
'=1
(79)
319
SUB-SAMPLING
1 \' _
/...J
y,.
'~1
+ E (jin. -
.PN.)2
+ E U'N.
- .V .. )I
(80)
~ n~ E [
t
+
E {(y" , -
t
'-Ft'
y,,)' Il)
n]
(81)
320
where
S.ll
, =
..
E (5'no
-)2
YN.
n1 -
1 ) Sh 2
N
(82)
where
N
Sb 2 =
N~l ~ Vi. -
YN.)2
1=1
i::e
{(j".c'-y,.)
Ii} (j .-YN.)}
~~
=0
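The fifth and the sixth terms are obviously zero. We are therefore left with
\[ \text{M.S.E.}(\bar y_s) = \left(\frac{1}{n} - \frac{1}{N}\right) S_b^2 + \frac{1}{nN}\sum_{i=1}^{N}\left(\frac{1}{m_i} - \frac{1}{M_i}\right) S_i^2 + (\bar y_{N.} - \bar y_{..})^2 \]
\[ = V(\bar y_s) + (\bar y_{N.} - \bar y_{..})^2 \tag{84} \]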
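The decomposition of the mean square error into a between-unit term, a within-unit term and the squared bias can be checked by complete enumeration on a small population. The sketch below assumes simple random sampling without replacement at both stages and the definitions $S_b^2 = \frac{1}{N-1}\sum(\bar y_{i.} - \bar y_{N.})^2$ and $S_i^2 = \frac{1}{M_i-1}\sum_j (y_{ij} - \bar y_{i.})^2$; the population values are invented for illustration.

```python
from itertools import combinations, product
from statistics import mean, variance

def mse_formula(pop, n, m):
    """Between term + within term + squared bias for the simple mean
    of unit means; pop is a list of first-stage units (lists of y)."""
    N = len(pop)
    unit_means = [mean(u) for u in pop]
    y_N = mean(unit_means)                    # mean of the unit means
    y_pop = mean(y for u in pop for y in u)   # mean per second-stage unit
    S_b2 = variance(unit_means)               # divisor N - 1
    within = sum((1 / m[i] - 1 / len(u)) * variance(u)
                 for i, u in enumerate(pop)) / (n * N)
    return (1 / n - 1 / N) * S_b2 + within + (y_N - y_pop) ** 2

def mse_by_enumeration(pop, n, m):
    """Exact M.S.E. obtained by listing every possible two-stage sample."""
    y_pop = mean(y for u in pop for y in u)
    per_unit_set = []
    for units in combinations(range(len(pop)), n):
        # means of all possible sub-samples from each selected unit
        sub_means = [[mean(c) for c in combinations(pop[i], m[i])]
                     for i in units]
        # sub-samples are equally likely once the units are fixed
        per_unit_set.append(mean((mean(combo) - y_pop) ** 2
                                 for combo in product(*sub_means)))
    # every set of n first-stage units is equally likely
    return mean(per_unit_set)

pop = [[1.0, 3.0, 5.0], [2.0, 8.0], [4.0, 6.0, 7.0, 9.0]]
n, m = 2, [2, 1, 2]
print(mse_formula(pop, n, m), mse_by_enumeration(pop, n, m))
```

The two printed values agree up to floating-point error, and the last term of the formula vanishes only when the mean of the unit means coincides with the population mean per element.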
TABLE 7.4

District            Simple Arithmetic Mean    Weighted Mean
                    $\bar y_s$                $\bar y_s'$
 1. Amritsar              1029                     1041
 2. Gurdaspur              829                      862
 3. Jullundur              839                      881
 4. Hoshiarpur             804                      796
 5. Ludhiana              1247                     1246
 6. Ferozepur             1052                     1079
 7. Ambala                 854                      820
 8. Karnal                 839                      868
 9. Hissar                1090                     1142
10. Rohtak                1004                      997
11. Gurgaon                766                      752
    Province               920                      927
322
ys'
For,
E(ji:)
~ E {n~
t.
E (M'p" .. ,
o}
~ e{nVt M.P,}
N
N~
AlIYi.
i=l
(86)
= Y..
To obtain the sampling variance of YS', we write
V (ji,/) = E <y,' - Y~.)2
= E<y,' -
+ 2E <Y,' -
Yn.')
<Y.: - y.. )
- y.')'
~ E {n~
~ n'~ E [
+
'7'0"
t
t.
(87)
M, (ji"." - j;,.j
y,.)' I i)
Ii,
n]
323
SUB-SAMPLING
L: M,~ (~I -- ~J
(HH)
8,2
i1
Y.Y
E(j".' -
=--'
(! - NJ) 8 /,
fI
(89)
'2
where
N
8 /, '2
N -I
L (M .
M
j.
_
j.
i'
. .)
N-I
L (UJ-. -
(90)
.i'.)2
,::::1
The last term in (87) is clearly zero and we are left with
\[ V(\bar y_s') = \left(\frac{1}{n} - \frac{1}{N}\right) S_b'^2 + \frac{1}{nN}\sum_{i=1}^{N} \frac{M_i^2}{\bar M^2}\left(\frac{1}{m_i} - \frac{1}{M_i}\right) S_i^2 \tag{91} \]
E (j") _ E (j_.')
8
E(un)
(92)
= Y~
since
E(un ) = 1
+ Y..2 V (un) -
V (YB") = V (y,')
2.Y .. COy
(ii.'.
(93)
an)
= (~n
I) S
(94)
where
N
'\'
Su2-- N 1
_ I W (u, - 1)2
and
COy (ji,', un)
= E {(u"
- 1)
('va' -
Y.,)}
= E [(u" - 1) E {(y.'
- _vu,') , nJ
G-1)
I_-'f
{t
(ujy.. -
I)}
, .. l
0=1
- 2Y ..
.=1
(uj -l)2
'''1
1)1
325
SUB-SAMPLING
(96)
j."
where
N
8/ =
N~ 1
L: ul(Yj. - Y..
)2
i::;::;t
7.5
y,
y,'
r/
-------
37
14'0
47
25
100
S'7
..
55
15'0
1l3
1945-46
..
69
14'0
132
326
+ nN
I\
( m,
I
W' 2
U
V (-')
x, -._ ( n - NI) S I.. '2
- MI ) S I~2 (98)
j
1=1
V(Y,')
(!n - N_!) S
'2
bv
+ nN
_!_ W
\'
u2
j
(_!_
- MI-) S j~
m;
(99)
''''1
and
+ CY .. .' -
- x...')}
327
SUB-SAMPLING
~ ~E[
UjU j '
>]
j#I'
Il)]
n~
1:
u.
(~I
- ~)
(101)
Slyr
i==!
where
(102)
where
(104)
= (IIi - N1)
8'
b
+ n~
L:
,",1
UI
(~I -
-k)
8 , (105)
328
(-,
J 1 YR -
(!n _ N_! )
+
j S'2
t /..
If'
_!_
u. 2
nN
+ L=
X..2
2ji " S
'2 _
X..
bx
(J __ __!_)
M
'mj
'1
bU')
2
'.
i=l
+ Y..
:X:,,2
L
N
Ut X ;: -
- 2
Nx ..
1=1
(!n _ .!)
__l_ '\'
N N-l W
U. 2
(~'.
,V"
Ri j
'
)2
i=l
1 \'
nN W u.
i-1
2(1m, _ Mi
_!_)D;2
(106)
SUB-SAMPLING
329
where
(107)
When
Xij
= 1, (107) becomes
=.
n ---1
(108)
1",
- n {)iN. s + V (ji,)}
330
Sbl!
= N~1
(Y;.-YN.)2
we obtain
(109)
Also
where
S,2 ,'"
m,
.L:
mi~
(111)
(Yii-_}';(lIIi,)2
Hence
1
n
S2 I,
(112)
Est. V(y-)
(1n _ N!)
S2
I,
1 '\'
nN W
(1m,__ MI1_)
S2
(113)
33]
SUB-SAMPLING
(114)
Sf.
'2 __
--
"
1
\'
n __ I LJ
(M.if -; __J.,,)2
(115)
.liC",,)
- n (Y ..2
+ V (.V,')}
Sh'2
+ ~-
L 1/ (l. - J)
Sj2
(116)
i=l
Also
(117)
Hence
(lIB)
332
(119)
'2
S"
(120)
ys"
Let
(I21)
On substituting for
333
SUB-SAMPLING
M2(n-l) E(SI:21 i) =
I:
Mi 2
+ (Im
~) SIZ}
[~M
(- 2 + ( m!.
W 3 e"
2
n
J:M
{.1\2 + (~-i
_ M
I ) S
'
t
_ M.
'.. ) S 2}
2} +. L'.MM.
' ,}-, . Y-.]
1'1- 1'
+
(
"
.E Ml' ""' M 2
.EM, )1 W
ft
(!m, - M,1,) Sl
334
y's
putting
)1 ft.
we have
)_1_ L..l
f' M'l.
h!J (".
In-
_ )-, ")2}
n.
\,ft.
S"2
b
S:'
so" -
(123)
)1 _ _~!'ii._ + 1; M,2 1
1: M. (1; M.)2 J
(124)
335
SUB-SAMPLING
(I _ N
~) S. N2 + W
f' l( 'M2
1,2 (1m, _ M,1) S,2 }
IJ
{~- n~-d!- ~)
-l~, + (i~j)}
or, to a first approximation,
(125)
YR
Est. VI (YR)
e-
~) n ~1
+ nN
1W
'f' U 2 (_Im,
j
_ M,1) d
(126)
where
(127)
It has been pointed out already that unless the $m_i$'s are equal it is
not valid to evaluate the variance of the estimate using the
analysis of variance table. Nevertheless, for moderate inequality
in the $m_i$'s, the method with certain adjustments has been recommended for use (Yates, 1949). The method consists in calculating
a number $m_0$ given by
n
Em,
(128)
n - 1
large;
and
(d) Si2 is constant for all i and equal to, say, Sw 2
337
SUB-SAMPLING
=-=
"
1 \'mE(I'-)
mo W
l
I.
(131 )
mo
,,2
, J ~~
_. __.
E l"-' m i (.1 ,('''i) J -. ) I
E
{ "
-.,
I: m:., (1'0(",,).1' .. )-
I
I"
., E (4..1
I ~ m - ()'
-mo
i{md 2
I'
-I-
I' -
r )
"j . . . I .....
-I- 1: mim,'
i-r-
(5'i("'il -
(y ,'(mi') -
t'
+- I: m,m.'
- + Y.'.
- -
{U'd",,) -. J'j.)
U'i'(m() -
<Y;, -
-)1f
Y..
y;,.
.v... )
- ji .. )
ji .. ) (j.,,- .I'..
h.)
)}]
338
=~
[1; m,2 {E
(YHmj) -
)'1.)2
+ 1; m,m
i#i'
i,
+ E ()'I. -
)' .. )2}
since the expectations of all the other terms are clearly zero.
Hence
VU~) ~ .,;.[
t me {(~; - It)
S.' + S,' }
-t
m,m,'
j~i'
s;;]
(132)
= n=1
[E t"' {y"
+s.'
(~ - ~)}
- ... (Y..'
.... __
I _ [862
n -1
vu_)}]
SUB-SAMPLING
339
whence
(134)
340
,.
sample, namely $c_2 \sum m_i$. This second component will, however,
vary from sample to sample of n first-stage units. We shall
therefore consider the average cost instead of the actual cost of
surveying a sample, given by
(136)
t (~ M,S,')
(137)
+ N~~ ~ (~:,
MlS,.2
i>i'=l
+ '!!i:
m
where
M.2S
"
2)
(138)
(139)
SUB-SAMPLING
341
i>i/~l
+~
L: (
,lc2 ::1m,
i=l
(140)
and this is minimum when each of the two square terms is zero,
giving us
(i ,..-" 1, 2, ... , N)
(141)
342
kM,
(142)
(v(i':) + S;'') c
(cdc,kR)
"
i=1
(143)
SUB-SAMPLING
343
$\bar y_s''$, he will notice that the sampling variance has the same form
as that for the estimate $\bar y_s'$ except that $S_b'^2$ is replaced by $S_b''^2$.
It follows that the optimum value of
is so determined that
/C;-. ~i s.
m i
mi
c2.J'
where
nI "1:
_
MJ'I(fIIll
344
Zl
Zl
= n~ 1: g.' = 124
_
n(11 -- I)
S~g2_l/g-2l
l ..,
19463 - 7750
50x49
478
z. = 13-34xO'80 = 10-7
SUB-SAMPLING
= ~2 E {( i; Mi)
E (zz)
Z2'
(t E
we have
U'j(",;) ,
= ~2
= ~2
1
n
1
II
i))}
{i MS;. +,f,MJ,,.l
{IL + (n _
{v
345
1) (
N
N-I
J\f.l'N _
It)}
N-I
+ (11- I) MYN
}
'
for large N.
Define
E (M --
M) ('" -)' )
_ _ _ _ _ .. :_ .. ______ _.::___!_:_ __ N.
'I
where
and
L (- -)'
N
l' - ,
,;,
)N, -
1=1
or
346
....,
-~
-Ii
-----
~ ~~~~
~~a:~ ~
6000aoooooobobbbbbooooobb~bb~06
::
- Ii
_I
.(o~~:;l8
:~o
~-:~"?'~"':'
00000
~~ooo
'i
.....,
N
600000~0000bobb~~~00boob6~~~~ob
;;-
.......
;::
.......
!:i'
.:::
Ii
II
~~~oo~
\0
l"-
e-
~ ~...
...
Il.I
~o~~oooo
1('\0
1('\01('\1('\0000 1('\
"'''''''00000
4)
-5
c=:J
0;
~c::
~~
'-J!
c
0""'
"'I('\I('\OO~O~
6NOOOOOO~oo~~6~~~~~o6oo66~~~~~~
,;;:.:
I~
~
:;..
00
~~OOOOOO~OO~~~~N~~~o~oo~b~~~~~~
~
-NN M
-~-MvM-
ON
347
SUB-SAMPLING
"..,
- Ii
,
..
~
.....
V"I
. .r
.....,
~~~
~~~
II
Noo~060666060666006
0V">00
V">V">00V">
OV">V">
OV">
~o~~6~O~~N~NO~~NO~~
~~-
---
O~O~
~~~o~
~~~
~oN~~60666~6066NO-0
.... '
....,
~I",
x~E,
o~
!!....I
IG
00
-D
'". ...,_,
'l
M
_~~~o~~~~~v~-~~~o-~
_
_ N _ _ _ _ _ _ N_NN
-N
N
000
"'-
348
p = r = 0261
8M 2 = 8 M 2 = 65'58
SM =SM =810
= 07347 -
50 (14,42)
=04463
S.
= 0668
Hence
138
Y,,(ml)
= (OJ>~)5~~~8) + (178)~g'7347)
423
+ 2616 + 0773
SUB-SAMPLING
349
and
M.S.E. (za)
= V (za)
+ bias
= 423
+ 1'90
= 613
7.16 Stratified Sub-Sampling
By far the most common design in surveys is stratified multi-stage sampling. In this design the population of first-stage units
is first divided into strata; within each stratum a sample of
first-stage units is selected, and each of the selected first-stage units
is further sub-sampled. Crop surveys with the subdivision as
the stratum, described in Example 7.1, and the corn borer survey
with the district as the stratum, described in Example 7.2, are
examples of this design. In this section we shall give the formulae
for the estimate of the population mean in stratified two-stage
sampling, and its variance. We shall consider the unbiased
estimate only.
Let the population be divided into k strata with $N_t$ first-stage
units in the t-th stratum, so that
M t
= 1:
_
Mil
NIM,
(=1
= 1:
n,
1=1
350
y"
where
and
- E .\,Y,,'
(146)
'''1
SUB-SAMPLING
351
(147)
where
and
(148)
(M~~::":'T)
SIl!
MU
I:
(Y"j - ,"" )2
j;;:"1
l:
(149)
2
(Ylli
l'
II(,"ti) )
,V,.
=-..0
1=1
where
"' M,
_1_ \ ' \ ' Ylli = Yh
n,m,
and
.\, =
l..J
l..J
,
J
352
l: 'lf(~ _
A2
11,
Nt
)S".2
'v
_l
1-
1
11,
1=1
where
Nt
Slb
Nt
~1
2:
(jill, -
11 ..)2
1:=1
and
and estimates of $S_{tb}^2$ and $S_{tw}^2$ are provided by the same formulae
as in (110) and (112), namely,
= StIJ 2 - (~-,
Est.
Sjb2
Est.
S,..' = 8,.,2
..JJ ~tf<2
(150)
SUB-SAMPLING
353
nm US
"""
(ln _ N
I) "S
(153)
\ ' (j
f.r--=:-I W
'i. -
-)9
Y. -
(154)
i=1
and
Sw 2 =
the mean square between second-stage units within first-stage units in the whole population
(155)
Y., =
1}
t=l
PtYt.
where Pt is the weight for the t-th stratum, and its sampling variance
(156)
= 1.:
(ji,._ Y.. )2
jml
Nt
23
Y..)2
354
N,
+ Y, .. - Y.. )2
.E .E (y", - y..,
'~l h"l
= .E (N, -
1) Su 2
'=1
+ .E N,Y, ..2 -
NY ..2
(157)
'=1
The estimate of $S_{tb}^2$ is known from (150), so that our problem
reduces to estimating the second and third terms in (157). Now,
from (10), we have
V(Y,n,m)
18
1
1
Cn,1 - N,)
S'b + (m - M) -;,~
2
=
whence
- 2
Est , Y,..
=-
Y"
2 _
(__!_
_ N
_!__),Sb 2
n
,
(158)
(159)
or
L
'-1
k
Est.
L'=1
k
N,y, ..2 =
L
k
N,y,.2 -
N,
(~,
k) S'b~
t=1
(160)
Also
V{Y ..)s=
E(J; P,Y,. )1
'-1
_y...
so that
Est. (NY._2)
= N(J; p, y,.)'
'&1
1) S,2
~ }
+-n,1 (1m
_-M
'"
(161)
SUB-SAMl-LiNG
355
Est.
{~~
N~'
1.1.
NI'
2 -
. ..
}~
2.
"
-,!
-, 2
Nlt'I'
-.ltD)
N 2
NP, )
I
S11,"
(162)
"N
k
S.2
= N-:_1 1!...J
(ji,2, - J'"
- 2)
t=l
(163)
SID
(164)
356
[t.
+ N~ {t
~ G- ~)
On
p,;;,,'
p, (V,. - Yw)2
t~l
+ (I__ _
1 )}_
n
[1 _!!_~
N-I
(165)
.
k
Est V V(:;,)
.. S -
\'p2
W
I
{(~
n
tb
Nt
(166)
Hence
I;
N - n
= n (N
_ 1)
\'
(- )~
P, Y,. - Yw-
1=1
\'
+ LA
_ PI!
n,
{N Nn
-n
+ 1!1}
N
P,
_ _!!_ Pt (1 - PI)
N - 1
n,
S ,'I.
tu
1 1) \' {PI
+ (m - M W n k
P, (1 - P,) _ PI?} S 2
X
n,
n,"
N- n
n (N - I)
(167)
357
SUB-SAMPLING
(_!_m - _!_)
M
S2
II!
Est. {Vus -
N-n
\'
Vs} =- n (N _ I) W P, eVil - )'",)2
t=l
_ p,2
n,
N-n
lv-I
PI (I ..._-.p).)
. I \
n,
j
Ib
(168)
which
IS
1.
Hansen, M. H. and
Hurwitz, W. N. (1943)
3. Sukhatme, P. V. and
Panse, V. G. (1951)
4. Yates, F. (1949)
2.
CHAPTER VIII
SUB-SAMPLING (Continued)
8.1 Introduction
In the preceding chapter we have developed the sampling
theory appropriate for sub-sampling systems involving the use of
equal probabilities of selection at each stage of sampling. When
the first-stage units are large and vary considerably in their sizes,
this system of sub-sampling is not usually efficient. This is even
more so in cases where practical considerations demand that the
survey should be confined to only a small number of first-stage
units within each stratum, with an equal number of second-stage
units from each first-stage unit, although the amount of sub-sampling from the selected first-stage units would be necessarily
unequal under optimum allocation. A system of sub-sampling
involving the use of varying probabilities has been used with
considerable gains in efficiency in such cases. In particular, a
sub-sampling design in which only one first-stage unit is selected
from each stratum, with probability proportional to the measure
of the size of the unit, and a fixed number of second-stage units
is selected with equal probabilities from each of the selected
first-stage units, has been found to bring about marked improvements in precision, compared with sub-sampling systems involving
the use of equal probabilities. The developments are due to
Hansen and Hurwitz (1943, 1949). In this chapter we shall give
the theory of sub-sampling systems involving the use of varying
probabilities.
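A minimal sketch of such a design may help fix ideas. It assumes that first-stage units are drawn with replacement with probabilities $P_i$, that a fixed number m of second-stage units is taken without replacement from each draw, and that each observation is scaled by $M_i/(M_0 P_i)$ before averaging; the function name, arguments and data are hypothetical and are not taken from the text.

```python
import random

def pps_two_stage_mean(units, probs, n, m, rng):
    """Estimate the population mean per second-stage unit.

    units : list of lists; units[i] holds the y-values of first-stage unit i
    probs : selection probabilities P_i of the first-stage units (sum to 1)
    n     : number of first-stage draws, made with replacement
    m     : second-stage units taken from each draw, without replacement
    rng   : a random.Random instance
    """
    M0 = sum(len(u) for u in units)                 # total second-stage units
    draws = rng.choices(range(len(units)), weights=probs, k=n)
    z_means = []
    for i in draws:
        sub = rng.sample(units[i], m)               # equal probabilities
        scale = len(units[i]) / (M0 * probs[i])     # M_i / (M_0 P_i)
        z_means.append(scale * sum(sub) / m)
    return sum(z_means) / n

# Hypothetical population of three first-stage units of unequal size,
# selected with probability proportional to size.
units = [[5.0] * 4, [5.0] * 10, [5.0] * 6]
probs = [len(u) / 20 for u in units]
estimate = pps_two_stage_mean(units, probs, n=3, m=2, rng=random.Random(0))
print(estimate)
```

With probabilities proportional to size the scaling factor is unity, so the estimate is simply the mean of the sub-sample means; for the constant population above it reproduces the common value whatever units happen to be drawn.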
will be drawn therefrom independently of each other, each sub-sample of $m_i$ being drawn without replacement.
Let $P_i$ denote the selection probability assigned to the i-th
first-stage unit of the population (i = 1, 2, ..., N) and $\sum_{i=1}^{N} P_i = 1$.
Further, let
\[ z_{ij} = \frac{M_i}{M_0 P_i}\, y_{ij} \tag{1} \]
and
\[ \bar z_s = \frac{1}{n}\sum_{i=1}^{n} \bar z_{i(m_i)} \tag{2} \]
where the summation is taken over all the n units in the sample.
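Then it is easily shown that $\bar z_s$ is an unbiased estimate of $\bar y_{..}$.
For, we have
\[ E(\bar z_s) = E\left\{\frac{1}{n}\sum_{i=1}^{n} \bar z_{i(m_i)}\right\} = E\left\{\frac{1}{n}\sum_{i=1}^{n} E\left(\bar z_{i(m_i)} \mid i\right)\right\} = \bar z_{..} \tag{3} \]
where
\[ \bar z_{..} = \sum_{i=1}^{N} P_i\, \bar z_{i.} \tag{4} \]
\[ = \sum_{i=1}^{N} \frac{M_i}{M_0}\, \bar y_{i.} = \bar y_{..} \tag{5} \]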
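\[ V(\bar z_s) = E(\bar z_s - \bar z_{..})^2 = E(\bar z_{n(m_i)} - \bar z_{n.} + \bar z_{n.} - \bar z_{..})^2 \]
\[ = E(\bar z_{n(m_i)} - \bar z_{n.})^2 + E(\bar z_{n.} - \bar z_{..})^2 + 2E(\bar z_{n(m_i)} - \bar z_{n.})(\bar z_{n.} - \bar z_{..}) \tag{6} \]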
~ E {~
~, n~ E
.t
{.t
(i"." -
z,.)},
(i",,,, - z,.)'
f' (+W
zi(ttq) -
- )(.
Zi.
Zi'(mi') -
- )1J
Zi'.
1f t '
- i.J'
~ 2.
{.t
E (i"." - i,.)'
.ii")}
i==i'
(7)
where
M.
Sjz' = M,l_ I
(Zjj -
Zj.)2
1=1
= M~P:2
Mi
. M,I_]
(Yij - .vi.}B
(8)
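\[ E(\bar z_{n.} - \bar z_{..})^2 = \frac{\sigma_{bz}^2}{n} \tag{9} \]
where
\[ \sigma_{bz}^2 = \sum_{i=1}^{N} P_i \left(\frac{M_i\, \bar y_{i.}}{M_0 P_i} - \bar y_{..}\right)^2 \tag{10} \]
Hence
\[ V(\bar z_s) = \frac{\sigma_{bz}^2}{n} + \frac{1}{n}\sum_{i=1}^{N} P_i \left(\frac{1}{m_i} - \frac{1}{M_i}\right) S_{iz}^2 \tag{11} \]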
{f M/p
LJ
1=,
M2-
Y.,2}
--nI
L
(12)
Z~
(i
Obz2
1,2, ... , N)
362
whence
N
\ ' M, (_!_
LJ Mo m,
ld'I" _ Y..
- )2 + n~
M v "
v (z,)
I-) S.2'
M,
(14)
If'
LJ
~ n ~ 1- {t
-2
((mi'
nz-2 n(m;)
jt
E ("".,,) - nE (Z'",.,,) }
(16)
= n ~ 1 [n {V (z'(m;
+ z. !}
- n {V (%"( ... ,)
+ z. t} ]
(17)
L
N
E (Sht)
= 17b/ +
;-1
P,
(~,
it)
SuI
(18)
SUB~SAMPLING
(Continued)
363
It follows that
\[ \text{Est. } V(\bar z_s) = \frac{s_{bz}^2}{n} \tag{19} \]
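In practice this estimate needs only the n values $\bar z_{i(m_i)}$ obtained from the selected first-stage units. The sketch below assumes that $s_{bz}^2$ is the mean square between those n values, with divisor n - 1, and that the first-stage units were drawn with replacement; the function name and the numbers are illustrative.

```python
from statistics import variance

def estimated_variance_of_mean(z_means):
    """Est. V(z_s) = s_bz^2 / n, with s_bz^2 the mean square (divisor n - 1)
    between the per-draw values z_i(m_i)."""
    n = len(z_means)
    return variance(z_means) / n

# Hypothetical per-draw estimates from n = 3 first-stage draws.
z_means = [2.0, 4.0, 6.0]
print(estimated_variance_of_mean(z_means))
```

No separate estimate of the within-unit component is required, since its contribution is already carried by the variation between the $\bar z_{i(m_i)}$.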
= cln' + C2 Em,
(20)
where
denotes the different first-stage units included in the
sample,
n'
1:
mi
C1
and
the cost per second-stage unit of collecting the
required information.
C2
(n')
= Cl .E
364
Cl
Cl
1: {I - (1 - PS'}
(21)
4=1
Cx
1: {I - (I - Pi)"}
+c n E
2
i=1
(23)
Pim,
i=l
= c1n + czn E
(24)
P,m,
1=1
{ ,
p.
{Cl + c2 p,m,}
(25)
SUB-SAMPLING
(continued)
365
(z,) . C = clLl
~:::l
PI \rc~-C;-,ifS~~2-
2C2
P,P.,SizSi'z
i>i';=1
(2~)
(/ = 1,2, ... , N)
(29)
366
Pm_' =
_'
M,S,
constant
(i=I,2, ... ,N)
(30)
a constant
(i
1,2, ... , N)
(31)
Knowing k, the value of n is obtained from (11) or (24), depending upon whether the cost of the survey is minimized for fixed
$V_0$, or the variance is minimized for fixed $C_0$. In the former case,
(32)
Co
Cl
+ cBkMo
(33)
alternatively, they can be related in a known way to the characteristics of the units to be selected.
The optimum values of selection probabilities are given by
minimizing the variance of $\bar z_s$ for given cost. In this section we
shall determine them assuming that: (a) the sub-sampling rate
$m_i/M_i$ for a specified first-stage unit i will be such that equation
(31) is satisfied, and (b) the cost function is independent of the $P_i$'s.
Following the Lagrange procedure, we consider the function $\phi$
given by
<p
V (z,)
+ >. (.&, P
1)
(34)
\ ' MS2
jj
nkMo2 W
(JrP
~P;
SMly,,2 _ MjS,zl
I'T J
- - nMoi t Pl'
+ >. =
Hence
(i
1, 2, ... , N)
368
P, =
(i
= ], 2,
"" N)
(36)
S/
M;Y~2
+ Ca 1:" mj + Ca
I:" M j
(37)
SUB-SAMPLING
369
(continued>
c1n
+ can 4=1
1:
P,m,
+ Can 4=1
1: P,M,
(38)
C11l
(39)
i=1
for $P_i m_i = k M_i$. The cost function (39) will now be seen to
depend upon the $P_i$'s. However, if the $M_i$'s are unknown, $M_0$ will also
not be known and the estimate $\bar z_s$ can no longer be used. Several
alternative estimates can be formed. We shall consider one
such estimate in this chapter, namely, the ratio estimate, and
thereafter resume discussion of the problem considered in this
section.
~-
(40)
MI
MoP, Y'j
Zij
VI!
Mo-P i x,!
;;
-,
v,
11
L:
L
"
II
M;
MoP,
M,
M~p.
YI{>II,)
(41)
-",(m"f
370
YR'
= - 2 {V(Z.~
V(j)
Y.,2
Y..
+ _rev,) _ 2c.()y_~i"
X.. 2
Y.. X..
v~}l
J
(42)
V(i,)
(t Z;~;2 - Y_2)
'~l
(43)
and by analogy
V(v) = 1
L ;~i (~, - ~)
N
+ n~2
S,/
(44)
i;;;l
Further,
COy
(.f., v,)
= E {(z"Cm,)
- in.
+ zn. X
= E {(z.Cmi)
Z.. )
+ (z . -
(45)
since the expectations of the other two product terms are zero.
Now taking the first term in (45), we have
E {(i. Cmi )
= E
z.) (ii'Cmi) - v. )}
.l)]
(i.(.,,-
SUB-SAMPLING
(continued)
371
(46)
- v. )
n Shz"
(47)
COy
(z" ii,)
(48)
372
where
(50)
and
R = )',.
x..
where
Uj
= ._--Mi -
MoP,
and
"
ii.
LUi
(52)
Also,
+W
~
1=1
M/
Pi
(53)
,~
!.
~1
L ~~il
(ji'I"'i) -
R,X 1lnli )2
(54)
SUB-SAMPLING
(continued)
373
where
R, = Est. R
8.7*
!.
VB
(55)
}; Pi (Zi. - RV i .)2
i=l
where
(56)
Cl
+ Ca
};
P,M,
(57)
i=l
(58)
where
LJ'
If Di is constant we reach the same result as (31), namely,
(59)
314
+'" (C -
Co)
+" (1
,-I PI
--
I)
(60)
</>=
M;2
_,
P, (Ji, - RxiJ
,i
+ ~
1
k
P,M i )
M,D,
Co}
+;1. C~Pj--I)
(61)
Differentiating
c/>
zero gives
(i
SUB-SAMPLING
(continued)
375
= P.nC3 })
P;M,
+ ,\
(65)
i=l
nk~o2
(66)
MjD,2 = p.nkc 2 M O
i=1
"~p.
(cln
+ c2nkMo -+ c n }) P.M;)
3
(67)
;"1
(68)
(69)
376
Di
~-=
(-Y . - R X,.
-)2 -
D.2
M
(70)
Solving the other equations the reader may verify that the
values of k and n are given by
~-
(71)
and
n=
('1
-I- c2 k Mil
(72)
+ C:; }; PiM,
The optima will naturally vary with the cost function and the
sampling system, and care is necessary to determine from pilot
studies the nature of the cost functions before deciding on the
optima to be adopted for the surveys.
8.8 Relative Efficiency of the Two Sub-Sampling Designs
We remarked in the introduction to this chapter that a sub-sampling design in which the selection probabilities are proportional to the size of the first-stage units, and a constant number
of second-stage units is drawn from each selected first-stage unit,
may bring about a marked improvement in precision compared
to the sub-sampling design involving the use of equal selection
probabilities. In this section we shall compare the two systems,
using $\bar z_s$ as the estimate for the former system and the simple
U.S.E. (zJ -
liN
L
'=1
'\'
nmN
~4
M
S,2
(73)
E {(Yi! -
Mi
(~i -
- 5'..
+ 5\ -- ,I'.. ) I i}
Y. ) Ii} + E {(Vi.
- J' .. )2 Ii}
Mi
I)
(Yi) -
+ (5'i. -- .V.. )2
!f'k c 1
Ai.(~, ~I)
[It -hI}' ~~ t
(Yo,
(y"
~-J'd'J
+ u.. = -
S.2
U.
.1".. )2
(74)
a2
nN
Li-'
"!.' Pi + __1___
M
nmN
M'S2
if
~i
(75)
)78
+ n2N
L 5.
N
1
5MJ
l
(1 -
~)
(J'N. - S' .. )2
,=1
(1 _ 1)
n (I'
.
N.
(76)
_ .Tt .. )2
The difference between the two mean square errors is, therefore,
given by
M.S.E. (Y,) - M.S.E. (f,)
=
N
'\' 5.2
(A!I
nmNW'M
1=1
(1 - D(J'N. -
J.Y
SUB-SAMPLING
(continued)
379
and
Zi,
m1'.
zs', given by
(78)
such that E
y.
1"'1
is given by the (r
{PI
+ (I
- P,W
namely,
P {y.
+ I )-th
r}
y.
\
is equal to r
380
E (z:)
~ E {t Y;Z;, ""ti}
i=l
= nE
{; Yi
z}
,.
=""
! I;
E(y) Zi.
(79)
E (:,')
1)'
L.., nPiz ..
i=1
= z.
(81)
=L
= Y..
(82)
SUB-SAMPLING
(continued)
381
V(z,') =E
1
n
1', "
~"J
_,,\,
'""j
i=t
~ E {t L
N
_
1',zi, '"I', -
)
n
1=1
( Zi'.
2: Y,*i,
N
1
1-
=i'.
I
"1-
Zi'.-
l'
2:
N
__
z 1'",
z..
12
,el
1=1
III)' -
11
=..
-)}
(83)
+ 2 (Zl, '"'" -
E [f,//{E
z,,) (z.. -
(Z","l'j-Zj,) 2j
z,)} ]
i,Yl)+Eii,-i,YIi,y.)
382
1:
;#;'=1
Y;Y/ (.i"
m')',
I
+ (.i;.,",)" + (.i
=
E {
.f
l.
.i;,
-.i,,) (i l ,.
i .. )}
In')' , j
i,'.
+ .i;',-.i,,)}
(85)
i==l=t'=t
(86)
= ;;" ,2 P (y = r)
,-1
'
l: r e)
P/(l -P,)-'
SUB-SAMPLING
(continued)
383
= E{E
E(y,y)
(Y;Y,1 Y,n
(88)
(89)
I-Pi
= E
{I' , . (n -
.__n_!:.J..
t - Pi
y.)
Pi . }
, I -Pi
0
(1'.)
..
PI
1- P,
E (I' 2)
I
I -PI Pi {P
n, (1 - P)
,
+ n1P,2}
(90)
Using (87) and (90), the third term in (86) may now be written as
E
f: (1',2) (i,. -
i .. )1
'=1
{n (n - 1) p,p/}
'.,k /-1
z.. )
384
+ n (n
=
N
z.. ) is
- 1)
) 2
nu bz2
(91)
clearly zero .
12 -
(92)
Il
Il
{t
N
n -1
(93)
1 \ ' M, (-)2
V (z-,') -_- n W
M~ h - Y ..
'-1
+ n1
\'
A!,
LJ Mo
(1m _ M1_) 8
i=l
,r:
N
n - 1
nMo
'=1
M,
M o- 8,2
(94)
SUB-SAMPLING
(continued)
385
"
u,,2
+ S",2
n
(1m _ M~)
n-l
(95)
/I
n- 1
n
8.10* Estimation of the Variance from the Sample when Sub-Sampling is carried out without Replacement
= E
I'
i==l
z2,_"'1' - nz,'2
1.
(96)
{;E,
'Y,E
(Z\
- nE (2,'2)
20
nE(.!.'II)
386
n-l
-n-
(97)
1=1
Hence
~
S b. 'I
n.: .:. 1
\'
LJ
~~I
M,
_1 _.
n- I
..
\'~:
LJ
".r.
(98)
SUB-SAMPLING
(continued)
.\' ( I
or
,,(n 1_ I)
my -
387
1) S,.
M;
'2
.
L (~ - ~) S,,'2
\ ' p, S,.'2
n W 'M,
(99)
where
m"Y,
1.:
(Zij -
Zj. nY,)2
my, - 1
all. -
2
W.C'
E (n') - I
n -- I
\'
whence, we get
E (n') -- I
[;
11-1
(100)
where
n'
"~i
1.: 2,;
.,.'2
(Zjj -
-)2
Zj. mY,
nm -n'
S '2 _
b
S '2
.,
{Em(n'~
-=J _
(n - 1)
In
the sample.
,~ + _I... }'
(101)
NM
388
+ ____
Sb.'2
_!
__ _
mn (n - I)
(102)
S'2
" '2
"b
fE (')
It
_.~--.
nm
t n-
1
1
Si'2
and can be
nm }
+ N--M:;o"
(103)
SUB-SAMPLING
(continued)
389
En, =n
'~1
M"
= -M-
1
. -p
'0
Ie
y",
U=
(104)
ft'
I" =
k~
%"("111
(lOS)
390
Z",
= ,.,
1)
A,
(106)
ita
(107)
A, = -
where
N,
= 1: Ph
'''I
(W9)
(Zli. - i l.. )2
and
M"
S2 , l("l
= M;;-=--r
(ZUj -
(110)
%11.)2
jet
i,
where
(112)
.,
V ( ,,)
+ ! \'
nnw
(1'b(.)
P ( ]
I
] ) S'
til, - M,
1(.)
(113)
where
(114)
SUBSAMPLING
(continued)
391
and
(lIS)
/=1
P,
p,.
(116)
where
N,
Pt. = 1: P,
A,
P I.
I:
_ A,
-z, .. P
t.
_
+Zl..
A,
P
2..)"
2, .. -
z..
I.
N,
L: L
p,.
tal
PI<
{~,~2 (2h. -
Z, .. )2
+ (~,.
y}
"=1
(117)
'-I
''''I
Also
).,t P SI
p.
I.
II
"til)
(118)
392
a2b(z)
1;
I: -p---: I:
NI
At 2
Pt;
~~) S2 ,Hzt )
(_!_
m
ti
1=1
1"1
(119)
{Vus- Vs }
At
n~t,
Z
(
n~) a2
tbt ./)
t:::;l
_~n
{f'
z2
W At:_
P t , t..
p}
..
t= I
+W
'\'AZ( 1
nP,. -
1)
n"
NI
.L
Pt;
(~ti
~J S2,H
t)
(120)
i=1
L
1='
A,2 (
n~,.
SUB-SAMPLING
(continued)
393
11.(.,)
(122)
S"II,(.t)
11, -
Also
S'2 II(,/)
S2
(123)
li('/)
We have
Nt
1: !" -- ~,~)
+ :,
P li
S2'i/)
i=l
so that
Est. (Z, ..2) =
nl
a2 Ib (ot)
- 2
Zto
;'7
111
1: (~'j
~J
S2'H:t)
(124)
Similarly
Z..2
E (Z., 2) - V (Z.,)
whence
k
Z..2
1:
'=1
+ n,1
At 2
{~:'~;~
.,
L (m!,
- 1') SI,,(.,)}
(125)
394
}
Z- 2
'\' ",
-_ %, 2 W P,."
k
",2
= "\'
.--. Z,- 2 W P,.'
'-1
- 2
Z
'"
1=1
.,
(1mil - .Ml
x )'
II
) S2 ti (zt)
(126)
L P,.
A,2 _
_ 1
Est. {Vus - Vs } - n
ZI8
Z,n
1=1
A-.
L A~~
(nn;". -
1 -
nP,.
+ ~) a2"'hll
1=1
(127)
+ "\'
W
1 + IiI)
A, (~ _ 1 - nP
n,
nP ,.
I.
S tbI.,1
(J28)
SUB-SAMPLING
(continued)
395
= Y,j = Ylij =
Z,ij
Also
(PI.=AtI
{t
A,
Z,,2 - Z.,2}
1=[
k
+ L n~,
Example 8.1
396
TABLE 8.1

Stratum   $N_t$   $N_t \bar M_t$   $n_t$   $n_t'$   $\bar y_{tn..}$   $s_{tb}^2$   $s_{tw}^2$   $s_{tb}'^2$   $s_{tw}'^2$
   1       434       71670          19       19        1.291          0.5058       1.0726       0.5058        1.0726
   2       405       44114          13       12        2.597          3.7764       8.1920       3.7724        7.9921
   3       565       33107          23       23        2.078          4.1255       9.0720       4.1255        9.0720
   4       851       93734          34       32        2.098          1.8717       4.4411       1.8105        4.4334
   5       271       24631          14       12        2.552          4.9885      12.3426       4.7173       12.1021
   6       471       51776          18       18        1.675          0.6760       3.6231       0.6760        3.6231
   7       347       44028          15       14        2.100          1.6487       3.3946       1.6482        3.3214
 Sums     3344      363060         136      130
independently;
(b) A single sample of size 4y was selected from each selected
II
S'b.
n,
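\[ \text{Est. } V(\bar y_w) = \sum_{t=1}^{k} \lambda_t^2\, \frac{s_{tb}^2}{n_t} \]
where
\[ \lambda_t = \frac{N_t \bar M_t}{N \bar M} = \frac{M_{t0}}{M_0} \]
The weights $\lambda_t$ are computed in the first column of Table 8.2.
Substituting from the table, we have
\[ \text{Est. } V(\bar y_w) = (0.19741)^2\, \frac{0.5058}{19} + (0.12151)^2\, \frac{3.7764}{13} + (0.091189)^2\, \frac{4.1255}{23} + (0.25818)^2\, \frac{1.8717}{34} \]
\[ \qquad + (0.067843)^2\, \frac{4.9885}{14} + (0.14261)^2\, \frac{0.6760}{18} + (0.12127)^2\, \frac{1.6487}{15} \]
\[ = 0.001037 + 0.004289 + 0.001492 + 0.003669 + 0.001640 + 0.000764 + 0.001616 \]
\[ = 0.01451 \]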
TABLE 8.2
Computations for Estimating Gain in Precision due to Stratification

Stratum   $\lambda_t = N_t \bar M_t / N \bar M$   $\lambda_t \bar y_{tn..}$   $\lambda_t \bar y_{tn..}^2$   $\lambda_t / (n n_t)$   $(n_t - 1) - \lambda_t (n - 1)$   (4) x (5) x $s_{tb}^2$
                   (1)                                  (2)                        (3)                          (4)                        (5)                           (6)
   1            0.19741                               0.2549                     0.3290                   0.000076397                 -8.6504                      -0.0003343
   2            0.12151                               0.3156                     0.8195                   0.000068727                 -4.4038                      -0.0011430
   3            0.091189                              0.1895                     0.3938                   0.000029152                 +9.6895                      +0.0011653
   4            0.25818                               0.5417                     1.1364                   0.000055835                 -1.8543                      -0.0001938
   5            0.067843                              0.1731                     0.4418                   0.000035632                 +3.8412                      +0.0006828
   6            0.14261                               0.2389                     0.4001                   0.000058256                 -2.2524                      -0.0000887
   7            0.12127                               0.2547                     0.5348                   0.000059446                 -2.3714                      -0.0002324
 Sums                                                 1.9684                     4.0554                                                                            -0.0001441
For example, for the fourth stratum,
\[ \frac{1.8105}{34} + \left\{\frac{2}{(34)(4)(33)} - \frac{1}{93734}\right\}(4.4334) = 0.053250 + 0.001928 = 0.055178 \]
so that
\[ \text{Est. } V'(\bar y_w) = (0.19741)^2 (0.026606) + (0.12151)^2 (0.30281) + (0.091189)^2 (0.17910) + (0.25818)^2 (0.055178) \]
\[ \qquad + (0.067843)^2 (0.36971) + (0.14261)^2 (0.037486) + (0.12127)^2 (0.11376) \]
\[ = 0.01481 \]
We note that, contrary to expectation, this value is larger than the
first estimate, a fact which may be attributed to sampling errors
in the estimates of $\sigma_{tb}^2$, $\sigma_{tb}'^2$, $S_{tw}^2$ and $S_{tw}'^2$; but the difference
is negligible.
The difference between the variance of the mean of a stratified
sample on assumption (a), and that of the mean of an unstratified
sample is estimated by equation (129). The necessary computations are made in Table 8.2. We have
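\[ \text{Est. } \{V_{us} - V_s\} = \frac{1}{n}\left\{\sum_{t=1}^{k} \lambda_t\, \bar z_{tn_t}^2 - \bar z_w^2\right\} + \sum_{t=1}^{k} \frac{\lambda_t}{n n_t}\left\{(n_t - 1) - \lambda_t (n - 1)\right\} s_{tb}^2 \]
\[ = \frac{1}{136}\left\{4.0554 - (1.968)^2\right\} + (-0.0001441) \]
\[ = 0.001341 - 0.000144 \]
\[ = 0.001197 \]
\[ \frac{0.001197}{0.01451} = 0.082 \]
or 8.2%.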
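The arithmetic of this example can be retraced with a short script. The figures for $N_t \bar M_t$, $n_t$, the stratum means and $s_{tb}^2$ are those of Table 8.1; the variable names are illustrative, and because the script carries full precision where the text works with rounded entries, its results agree with the printed ones only to about the third significant figure.

```python
# (N_t * M_t, n_t, stratum mean, s_tb^2) for the seven strata of Table 8.1
strata = [
    (71670, 19, 1.291, 0.5058),
    (44114, 13, 2.597, 3.7764),
    (33107, 23, 2.078, 4.1255),
    (93734, 34, 2.098, 1.8717),
    (24631, 14, 2.552, 4.9885),
    (51776, 18, 1.675, 0.6760),
    (44028, 15, 2.100, 1.6487),
]

NM = sum(s[0] for s in strata)            # total number of second-stage units
n = sum(s[1] for s in strata)             # total first-stage sample size
lam = [s[0] / NM for s in strata]         # stratum weights, column (1)

# Variance of the stratified estimate: sum of lambda_t^2 * s_tb^2 / n_t
v_strat = sum(w * w * s[3] / s[1] for w, s in zip(lam, strata))

# Weighted mean, column (2) total
y_w = sum(w * s[2] for w, s in zip(lam, strata))

# First term of the estimated gain: (sum of column (3) - y_w^2) / n
between = (sum(w * s[2] ** 2 for w, s in zip(lam, strata)) - y_w ** 2) / n

# Second term: sum of column (6)
adjust = sum(w / (n * s[1]) * ((s[1] - 1) - w * (n - 1)) * s[3]
             for w, s in zip(lam, strata))

gain = between + adjust
print(round(v_strat, 5), round(gain, 6), round(gain / v_strat, 3))
```

The relative gain comes out close to the 8 per cent obtained in the text, the small discrepancy arising from the rounding of the weighted mean to 1.968 in the hand computation.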
400
= M Io + M lo
+ (1
- A) jim."
(130)
where
(131)
SUB-SAMPLING
(continued)
401
(132)
where
N,
alb
1: PI, ()\i' - 5\ )2
1==1
N,
a 21J 2 -_
1: P 2i
.i 2 . )2
Ll'2). -
j=l
(133)
and
Mti
8Ii 2
MIl
-1
(YH, _. .i'H. )2
,:1
(134)
V (Y.) =2
{a
+ LP,(~,
-k)
8,
2
}
(135)
11
where
a.
1:
1=1
P, (Yz. -
Y.. )2
(136)
and
M,
8l
26
M,-l
L
,.1
(y" - YI.)2
(137)
402
P, =>J'u
(138)
U= 1,2, ... , NI )
(139)
NI
Au lb 2 + (1 - )..) U 2b 2
)..u lb
-+ (l
+ ).. <YI .. - Y.. + (I -)..) (jiz .. -)..) U2b 2 -+ )..h.2 -+ (1 - )..) h.2 - y}
)2
Y.. )I
(140)
We know that
(141)
SUB-SAMPLING
(continued)
403
and
(142)
giving us
Est. Yl . 2
Y.. ,,2 -
Ulb
(~lC
- ~~) SI,2
(143)
and
(144)
(146)
1
= 2(1=',,)
(-
\2
Y.... - YIIl}
+ 2,\1
(;,
VOl," -
- )2
YIIl
404
[a + >. (2~, 2
b
~1) 81,2
For>. =
Est.- v (ji,) = !
(Y "'S' - ji ",.6)2
(ISO)
SUB-SAMPLING
405
(continued)
denote
= E
X,
Xii
and
1-1
EXi
i:1
and
= E Yi
i=1
Yj
i-Xi
and
y
X
(151)
406
and
(152)
For,
M,
(r ll )
L~
'Ii
and
'''''
y
X
=R
Consider now the simple arithmetic mean of the ratios
_
rn ..
rij
given by
",
l , , \ , ) ' Yo
nm W W Xi}
(153)
~ n~ E {t> (t;
'" Ii)}
~ n~ E {m t>.}
=
1L"
=R
(R,)
(154)
SUB-SAMPLING
401
(continued)
E(rnm
where Rn = (lin)
-R. + R. -
R)2
1: Ri,
E (f nm
R.)2
+ E (R.
(155)
- R)2
{! t (',. - R,)},
~2 {t
t
E(' - R.)' ~ E
=
(r tm - R;)2
.,o.i'
(156)
the second term vanishing since sampling within the i-th and
i' -th units is carried out independently.
From (87) of Chapter VI, we may write
(157)
where
M.
fT,
L ~:
I-I
(r'l - R,)I
(IS8)
408
R,,)2
E (f,,,,, -
= n2
~~
{1:"j
G 2
(159)
E(k _ R)2
n
= un
b
(160)
where
(161)
Gb
+ nm
u
w
(162)
where
N
_ _ '\' Xi
ali) W X
(163)
Ui
i=l
n-I
{.t
E (',.') - nE ('.")}
n {V (r,,,,)
n-l [
+ R2} -
n {V U... ) + R2}J
SUB-SAMPLING
(continued)
409
whence
Est. V( i'om)
Sb
(166)
+ a",.. 2
- 2
+!!_,,__
nm
nmp
(168)
410
Let
_
nM,Yli
(169)
MoE(-;;:j-
Zji -
= ~
(170)
z,(mj)
~E
t t '". J
= E
E(f"."
lil}
For,
SUB-SAMPLING
(continued)
411
(171)
=Y..
(172)
Y ..
412
(173)
Y..2,
Expanding
V (i,)
=L
I - E (a,) M,2h 2
~ M~2- E (;;i)
1=1
and
= N
ys'
-,
:, = y,
\'MnM l.-J
;Yj(tftil
1
(175)
and the expression for the variance becomes identical with that
given by (91) of Section 7.12, as expected. Thus,
SUB-SAMPLING
(continued)
413
V (i,) =
G- I) L
+ nN
---I
1=1
where
I
N-I
W Ai
(177)
414
f'
= W
I - E (al ) M,2 E
{E (a,)}2
"
+ \'
W
'''''J
Mo2
/" 21 .)
I
St. Vi.
"
+2:
I
Ml
{E(a,)}2 Mo2
( I
m; -
Est. (Vi, h
1 ) E (S 2
M,
St.
j
1i, j)
I I')
"
,-,'oJ
+ f'
{E (ai)I E (a l) -
w E_1(al)_
E (!,al)}
'!1 (_!_ _
M.;a
m,
_!_) sl'
M,
(179)
SUB-SAMPLING
(continued)
415
V(Z,)
in';.
{E (at) E(a j )
1#1=1
E (a.)
I
-1).
S
(m,__1. - M,
.,
Hence
Est. V (i,)
X,
n12
L" (~, - ~)
(m_!_, -
S,,2
..!..)
M, s"
(180)
1. Hansen, M. H. and Hurwitz, W. N. (1943). "On the Theory of Sampling from Finite Populations," Ann. Math. Statist., 14, 333-62.
2. --- (1949). "On the Determination of Optimum Probabilities in Sampling," Ann. Math. Statist., 20, 426-32.
3. Sukhatme, P. V. and Narain, R. D. (1952). "Sampling with Replacement," Jour. Ind. Soc. Agr. Statist., 4, 42-49.
4. Mahalanobis, P. C. (1945). "Report on the Bihar Crop Survey, 1943-44," Sankhya, 7, 29-106.
5. --- (1948). "Report on Bengal Crop Survey, 1944-45," Department of Agriculture, Forests and Fisheries, West Bengal.
6. I.C.A.R., New Delhi (1950). "Report on the Sample Survey for Estimation of Acreage under Principal Crops in Orissa" (Unpublished).
7. Horvitz, D. G. and Thompson, D. J. (1952). "A Generalisation of Sampling without Replacement from a Finite Universe," Jour. Amer. Statist. Assoc., 47, 663-85.
8. Yates, F. (1949). Sampling Methods for Censuses and Surveys, Charles Griffin & Co., Ltd., London.
CHAPTER IX
SYSTEMATIC SAMPLING
9.1 Introduction
So far we have considered methods of sampling in which the
successive units (whether elements or clusters) were selected with
the help of random numbers. We shall now consider a method
of sampling in which only the first unit is selected with the help
of random numbers, the rest being selected automatically according
to a predetermined pattern. The method is known as systematic
sampling.
The pattern usually followed in selecting a systematic sample
is a simple pattern involving regular spacing of units. Thus,
suppose a population consists of N units, serially numbered from
1 to N. Suppose further that N is expressible as a product of
two integers k and n, so that N = kn. Draw a random number
less than k, say i, and select the unit with the corresponding
serial number and every k-th unit in the population thereafter.
Clearly, the sample will contain the n units i, i + k, i + 2k, ...,
i + (n - 1)k, and is known as a systematic sample. The selection
of every k-th strip in forest sampling for the estimation of
timber, the selection of a corn field, every k-th mile apart, for
observation on incidence of borers, the selection of every k-th
time-interval for observing the number of fishing craft landing on
the coast, the selection of every k-th punched card for advance
tabulation or of every k-th village from a list of villages, after
the first unit is chosen with the help of random numbers less
than k, are all examples of systematic sampling. In the first
three examples, the sequence of numbering is determined by
Nature, the first two providing examples of distribution
in space while the third that of distribution in time. In the
fourth and the fifth, the ordering may be either alphabetical or
arbitrary approximating to a random distribution. In the latter
case, a systematic sample will obviously be equivalent to a random
sample. The method is extensively used in practice on account
of its low cost and simplicity in the selection of the sample.
The latter consideration is particularly important in situations
where the selection of a sample is carried out by the field staff
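    E(\bar{y}_{i.}) = \frac{1}{k} \sum_{i=1}^{k} \bar{y}_{i.} = \bar{y}_{..}        (4)

    V(\bar{y}_{i.})_{sy} = \frac{1}{k} \sum_{i=1}^{k} (\bar{y}_{i.} - \bar{y}_{..})^2        (5)

    V(\bar{y}_n)_R = \left( \frac{1}{n} - \frac{1}{N} \right) S^2        (7)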
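    V(\bar{y}_{i.})_{sy} = \frac{1}{k} \sum_{i=1}^{k} (\bar{y}_{i.} - \bar{y}_{..})^2
        = \frac{1}{kn^2} \sum_{i=1}^{k} \left\{ \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..}) \right\}^2
        = \frac{1}{kn^2} \left\{ \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2
              + \sum_{i=1}^{k} \sum_{j \ne j'} (y_{ij} - \bar{y}_{..})(y_{ij'} - \bar{y}_{..}) \right\}
        = \frac{1}{kn^2} \left\{ (nk - 1) S^2
              + \sum_{i=1}^{k} \sum_{j \ne j'} (y_{ij} - \bar{y}_{..})(y_{ij'} - \bar{y}_{..}) \right\}        (8)

The intra-class correlation \rho between units of a column is given by

    \rho = \frac{E\{(y_{ij} - \bar{y}_{..})(y_{ij'} - \bar{y}_{..})\}}{E(y_{ij} - \bar{y}_{..})^2}
         = \frac{\frac{1}{kn(n-1)} \sum_{i=1}^{k} \sum_{j \ne j'} (y_{ij} - \bar{y}_{..})(y_{ij'} - \bar{y}_{..})}{\frac{kn - 1}{kn} S^2}        (9)

or

    \sum_{i=1}^{k} \sum_{j \ne j'} (y_{ij} - \bar{y}_{..})(y_{ij'} - \bar{y}_{..}) = (n - 1)(kn - 1) \rho S^2        (10)

so that

    V(\bar{y}_{i.})_{sy} = \frac{nk - 1}{nk} \cdot \frac{S^2}{n} \{1 + \rho (n - 1)\}        (11)

The variance of a systematic sample relative to that of a simple random sample is therefore

    \frac{V_{sy}}{V_R} = \frac{(nk - 1) \{1 + \rho (n - 1)\}}{n (k - 1)}        (12)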
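The relation between the variance of a systematic sample and the intra-class correlation can be checked numerically. The sketch below enumerates the k possible systematic samples of a small made-up population, computes the variance of the sample mean directly, and compares it with the expression ((nk - 1)/nk)(S²/n){1 + ρ(n - 1)}. The data and the function name `systematic_variance_check` are hypothetical, and ρ is computed as the intra-class correlation between units of the same column.

```python
def systematic_variance_check(y, k):
    """Compare the directly enumerated variance of a systematic sample
    mean with its expression in terms of the intra-class correlation."""
    N = len(y)
    n = N // k
    mean = sum(y) / N
    S2 = sum((v - mean) ** 2 for v in y) / (N - 1)
    # Column i holds units i, i + k, i + 2k, ...: one systematic sample.
    cols = [y[i::k] for i in range(k)]
    # Direct variance: average squared deviation of the k sample means.
    v_direct = sum((sum(c) / n - mean) ** 2 for c in cols) / k
    # Sum of cross-products between distinct units of the same column.
    cross = sum((c[j] - mean) * (c[jp] - mean)
                for c in cols
                for j in range(n) for jp in range(n) if j != jp)
    rho = cross / ((n - 1) * (N - 1) * S2)
    v_formula = (N - 1) / N * S2 / n * (1 + rho * (n - 1))
    v_random = (1 / n - 1 / N) * S2
    return v_direct, v_formula, v_random, rho

y = [3, 8, 1, 9, 4, 7, 2, 6, 5, 10, 12, 11]   # hypothetical population, N = 12
v_direct, v_formula, v_random, rho = systematic_variance_check(y, k=3)
print(v_direct, v_formula, v_random, rho)
```

A negative value of ρ makes the systematic sample more precise than a simple random sample of the same size, and a positive value makes it less precise.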
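    (n - 1)(kn - 1) \rho S^2
        = 2 \sum_{i=1}^{k} \sum_{a=1}^{n-1} \sum_{j=1}^{n-a} (y_{ij} - \bar{y}_{..})(y_{i,j+a} - \bar{y}_{..})
        = 2 \sum_{a=1}^{n-1} k (n - a) \rho_a S^2 \frac{kn - 1}{kn}        (13)

where \rho_a denotes the non-circular serial correlation between units of a column a places apart, whence

    \rho = \frac{2}{n (n - 1)} \sum_{a=1}^{n-1} (n - a) \rho_a        (14)

    V(\bar{y}_n)_{st} = \left( \frac{1}{n} - \frac{1}{N} \right) S_{wst}^2        (16)

    S_{wst}^2 = \frac{1}{n (k - 1)} \sum_{j=1}^{n} \sum_{i=1}^{k} (y_{ij} - \bar{y}_{.j})^2        (17)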
424
o?;
k
"
(YIJ -
Y.Y
n
+ f...J
"_ w
" (v.
1=1
'J
V ) ()!., - v,)
.1
t)
(18)
rrr=l
t~~
{t t
~
kn'
(y" -
+2
x
(y" - 5'.,)(y"," -
5'.,..)}
y,)'
(~ k (n - aJ P,.,. )
(f'W Wf'
J=l
(Y;1 - YJ
n(k -1)
)}
(19)
\=1
where
(20)
(Y.J - y.;)2
n (k -- If
    V(\bar{y}_{i.})_{sy} = \frac{k - 1}{k} \cdot \frac{S_{wst}^2}{n}
        \left\{ 1 + \frac{2}{n} \sum_{a=1}^{n-1} (n - a) \rho_{(a)wst} \right\}        (21)
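    y_h = \mu + h        (h = 1, 2, ..., N)        (22)

    \bar{y}_{..} = \frac{1}{N} \sum_{h=1}^{N} (\mu + h) = \mu + \frac{N + 1}{2}        (23)

    \sum_{h=1}^{N} y_h^2 = N \mu^2 + \frac{N (N + 1)(2N + 1)}{6} + \mu N (N + 1)        (24)

and hence

    S^2 = \frac{1}{N - 1} \left\{ \sum_{h=1}^{N} y_h^2 - N \bar{y}_{..}^2 \right\} = \frac{nk (nk + 1)}{12}        (25)

Similarly, since the units within each stratum increase by unity,

    S_{wst}^2 = \frac{k (k + 1)}{12}        (26)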
and for the same reason, since the column means corresponding
to k different systematic samples also increase by unity, we have
the mean square between column means given by
    S_c^2 = \frac{k (k + 1)}{12}        (27)
Substituting from (27) in (6), from (25) in (7), and from (26) in
(16), we get
    V_{sy} = \frac{k^2 - 1}{12}        (28)

    V_R = \frac{(k - 1)(nk + 1)}{12}        (29)

and

    V_{st} = \frac{k^2 - 1}{12n}        (30)

Hence

    V_{st} : V_{sy} : V_R = \frac{k + 1}{n} : k + 1 : nk + 1

or, approximately

    = \frac{1}{n} : 1 : n        (31)
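The three variances for a population with a linear trend can be verified by direct enumeration. The sketch below takes y_h = h (that is, μ = 0, which does not affect the variances) and computes the variance of the mean under stratified sampling with one unit per stratum of k consecutive units, under systematic sampling, and under simple random sampling. The function name `linear_trend_variances` is hypothetical.

```python
def linear_trend_variances(n, k):
    """Return (V_st, V_sy, V_R) for the population y_h = h, h = 1..nk."""
    N = n * k
    y = list(range(1, N + 1))
    mean = sum(y) / N
    # Systematic sampling: enumerate the k possible samples.
    v_sy = sum((sum(y[i::k]) / n - mean) ** 2 for i in range(k)) / k
    # Simple random sampling without replacement.
    S2 = sum((v - mean) ** 2 for v in y) / (N - 1)
    v_r = (1 / n - 1 / N) * S2
    # Stratified sampling: n strata of k consecutive units, one unit each.
    strata = [y[j * k:(j + 1) * k] for j in range(n)]
    S2_wst = sum((v - sum(s) / k) ** 2 for s in strata for v in s) / (n * (k - 1))
    v_st = (1 / n - 1 / N) * S2_wst
    return v_st, v_sy, v_r

v_st, v_sy, v_r = linear_trend_variances(n=4, k=5)
print(v_st, v_sy, v_r)
```

For n = 4 and k = 5 the enumeration agrees with (k² - 1)/12n, (k² - 1)/12 and (k - 1)(nk + 1)/12 respectively, so that with a linear trend stratified sampling is the most precise and simple random sampling the least.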
y~
sin
{a +(h -1)
1~}
Systematic sampling is found to be both efficient and convenient in sampling certain natural populations like forest areas
for estimating the volume of timber (Hasel, 1942; Griffith, 1945-46) and areas under different types of cover (Osborne, 1942).
We shall illustrate here its efficiency for sampling a certain natural
population distributed in time.
Example 9.1
(32)
6-7
7-8
8-9
42
52
19
23
56
23
39
33
27
52
31
25
13
41
14
15
16
16
Week
9-10 10-11
11-12 12-1
1-2
2-3
3-4
4-5
5-6
36
59
14
14
32
39
45
33
15
19
31
16
27
24
19
24
21
27
21
32
45
57
14
47
39
20
26
33
34
61
45
20
28
28
17
40
41
41
37
21
31
15
14
12
15
10
17
10
18
16
'",
430
9_2
Analysis of Variance Table together with the Values of p and of
the Variance of Systematic Relative to Random Sampling for the
Data Collected on the Third Monday
TABLE
=2
k = 4, n = 3
k = 3, n = 4
D.F.
Mean
Square
D.F.
Mean
Square
Mean
Square
Between (nS.')
4708
5297
7008
Within (S,..')
13658
11200
10164
10
10542
9590
II
9590
11
9590
11
9590
Source of Variation
-.
Total
6,n
Il
-0,55
% Vsr
---~---
55
Mean
Square
D.F.
075
-0,16
-0'27
49
VR
D.F.
k = 2, n
-0'199
73
---~---'-'."---.-------
.-.----.~--
9_3
TABLE
3-
Week
+-34
-,25
-,26
-,13
147
61
30
63
-,33
--36
-18
--16
74
33
63
37
-'55
-27
-,16
-,199
49
55
73
-29
-29
-31
--19
78
51
+51
-,48
+,04
-,193
166
154
-,79
-27
-32
--197
23
5S
-,58
--23
--16
-131
46
67
73
12
63
(33)
S",.
where s_{wc}^2 is the mean square between units within the selected
systematic sample, i.e., a column of diagram (1). Clearly, (33)
does not provide an unbiased estimate of (6), for,
_1 E
n- 1
(r;n )'
'
I)
2 _
nv
I,
2)
J=l
'-=1 [ E
=n
~ 1 [~
{t y,;} - "ElY,')]
tt
icl
y./-n {y ..2+ V
<5'ds.}]
J&l
= n
~ I [~
t. t
y,/- ny_'
- nn\-;; I ~ {I + (n p
= nkn"k 1 SI (l - p)
I)}]
(34)
432
_!_)
( !n _ nk
nk - 1 82 (1 _ p)
nk
(35)
+ 2 (Y.! -
Y.!+l) (y" - y)
Clearly, the value of the first and the second term is equal to
(n - 1) S_{wst}^2 each, that of the fourth is equal to -2 (n - 1) S_{wst}^2 \rho_{(1)wst},
and that of the fifth and sixth is zero each. Hence we may write
1 _ _!__) E
(n
nk
{~~l (Yij -
k k- I
J'ii+l)21
2 (n - I)
n { S.,,2 ( 1 - PU)"')
(37)
=1
(38)
434
jin ...
n~
. ...
L: L:
Yll
1 E -{
11
i: y,}
'
= y..
(39)
EOn", - Y.. )2
= EOn", -
E (jin ... -
+ Yn, - 5',,)2
Yn,)2 + E (ji.. , - Y.. )2
ji..,
+2EO..", -Yn.)(Y.
-y.. )
(40)
n2
n2
[E l' EW\m +
ii,
JI, )2/ i}
E (Ci\m -
h ) (5"'.,
- 'vI',)
I i, j'} ]
(41)
The value of the second term of (41) is clearly zero, since samples
are independently drawn from the i-th and i'-th first-stage units.
The value of the first term of (41) is derived from (11). We
have

    V(\bar{y}_{im} | i) = \left( 1 - \frac{1}{M} \right) \frac{S_i^2}{m} \{1 + \rho_i (m - 1)\}        (42)

where S_i^2 denotes the mean square between the second-stage units
of the i-th selected first-stage unit, and \rho_i denotes the intra-class
correlation between second-stage units within M/m columns of
m units each which can be formed out of the M units of the
i-th selected first-stage unit.
Substituting from (42) in (41), we then obtain
E(Y -y.>, ~ !, [E
t (I -- t) ~ {I
+p.{m
-I)}]
nlN (I -1t) L
S,2
.=1
{I+p, (m-I)}
(43)
where
N
8.
= N~ 1
L
'-1
(Y ..
-y_)2
436
while the last term is obviously zero. Interchanging the first and
the second terms in (40), and substituting from (43) and (44), we
thus obtain
V(Ynm)
G- ~)
+
So2
n~N
(1
-~)
V(ji"m)
(1n - N1) Sb
1
+ nm
1)}
(46)
where
I
1=1
Ii = - 1: PI
Had the first-stage units been selected with replacement, equation
(46) would be further simplified, giving
(47)
= (-~n _ N
1) S 2 + ~
n
(48)
n _
"
\'
l..J
{"
Vim -
-)2
(49)
y"".
= E ( 1: Ylm2)
- nE (ji,.",2)
(50)
\J
. {) + V (y,.,
II')S.J
[ ....,'p-,2
- E
i.
~ E[
{S" + (I
- ~) ::
;qI+p,(m~ I)}]
x {I -I
p,
(m--l)}}
(51)
11
~n
{Y .. 2 +
?'
v (y,,,,,)}
G.~ ~) So' + n~ (I - ~) ~
x
~I p,(m -I
(I
S.'}
(52)
2
E(Sb )
= Sb
+~
(I
-~) ~
I:
{I +Pi(m-I)}Sh53)
Est. V (ji.",)
= Sbn
(54)
438
Two~Stage
(56)
t '".,J
{~ t
~ {~
E
~E
~ {~ t
E
E('"."
"1
i)}
in virtue of (4),
N
= L.J
\' Pi.
110.
= Y..
(57)
we write
(i, -Z.,)2
=
=
zs.
+ zn. - i,,)2
zn.)2 + E (zn. - Z.. )2
+ 2 {(.in(m;) - Z". ) (z".
(in(m;) - in.
(in(ml) -
- z" )}
(58)
!2 E {i (z'("') -
--
(""'I(m;) _,,)2
n] B E {;'
LJ
"I.
z;. )}
+ 1:, (=I(mll
",of.,
= n~
1: E {(Zjf..., -
>}
.i, .)2 I i}
+ 1",,1'
1: {(Z'(tn;l- .i,.) I i}
X E {(Zj'( .. /l-.ii',)
Ii'}]
The expression under the summation sign in the first term represents the variance of the mean of a systematic sample of mi
selected out of Mi and can be written from (42). The value of
the second term is clearly zero. We, therefore, write
E
('.,~, -
'.J'
~ .; E [
v ('"." I
iJ,,]
L
'.1
N
= Ii
P, ( 1 -
~) ~~'
{I
+ P. (m, -
I)}
440
= 1:
u b/
Pi (Zl. -- Z.Y
j=1
L
N
V(z,)
= U~.2 + 11
Pi (1 -
Hence we have
k) !i;2
{I
+ Pi(m
-In
(59)
= 11 ~ 1
L:
(Zi(mj) -
z,(",,)2
(60)
11 V (z,)
whence
Est. V (z,) = Sb~
(62)
Example 9.2
Reference has been made in Example 9.1 to a pilot survey for
estimating the total catch of marine fish conducted along a
oc
and
with
Z;",
jiim
A_
NP, = 59A, Yi",
The values of Zim for the selected centres are given in the last
two columns of Table 9.4, and the various steps in the computation of the average catch per hour and its standard error are
442
9.4
No. of
Boats
(1)
(2)
Estimated Average
Catch (Mds./Hr.)
(.vIm)
(Aj)
-----~---.-----.-.-
1
2
3
4
..
6
7
8
9
10
(1) Total
m=2
m=6
m=2
(3)
(4)
(5)
(6)
----~--
68
36
96
45
103
74
'127
174
12
18
m=6
.....
-_ ...
387'50
840'17
1094'17
6167
899'00
140167
677'33
133117
76'50
10983
__
._-_.
729'50
1123 00
754'00
5500
1072'00
127700
16250
634'00
54'50
176'50
= 1} Zim
tnm
f~ (1} Zim)1
fo<1}Zilll)2
= 10 Sb.'
_._-_..
------_._.
44168
1808'90
883'41
10622
676'51
1468'13
41338
592'97
49412
472'93
83151
241783
60876
94'73
806'69
1337'54
99'17
282'42
352'02
760'01
7358'25
7590'68
735'83
75907
541438431
5761842'29
7862281'63
10147764'33
2447897'32
4385922'04
271988'59
487324'67
2719886
48732'47
1649
220'8
22
29
201
339
201
for m = 6
339
for m = 2
REFERENCES
1. Madow, W. G. and Madow, L. H. (1944)
2. Madow, L. H. (1946)
..
6. Griffith, A. L. (1945-46)
CHAPTER X
NON-SAMPLING ERRORS
A.
OBSERVATIONAL ERRORS
10a.1 Introduction
In developing the sampling theory in the preceding chapters,
we assumed that the character observed on the i-th unit of the
population (i = 1, 2, ... , N), takes a unique value Yi whenever
the unit is included in the sample, irrespective of the person
who enumerates it. By implication we assumed that a complete
count of all the N units gives a unique value for the mean or
the total of the population. In practice, however, the situation
is rarely so simple as the one described above, since the value
observed on any unit will also depend upon the enumerator
reporting the value. Thus, an eye-estimate of the yield of a crop
in a field will depend upon the judgment of the enumerator making
the estimate and will invariably be different from the true value
of the yield obtained by harvesting the crop in the field. The
magnitude and direction of the difference will depend upon the
enumerator's intrinsic tendencies or biases and the approach to
the selected unit or interviewee at the time of reporting the
value. Even with factual characters like those of farm facts,
e.g., the area under crops, the number of animals on the farm,
etc., there is found to be a marked variation in the performance
of the same or different enumerators. It follows that even when
the sampling fraction is unity, or, in other words, a complete
count of all the N units is made, the result will vary in repeated
counts. As the errors responsible for this variation arise in the
process of collecting data, they are properly observational errors
but are also referred to as response errors (Hansen et al., 1951).
Together with the errors arising from incomplete samples and
faulty procedures of estimation, they go to make up what are
termed as non-sampling errors.
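The point that a complete count need not give a unique value can be illustrated by a small simulation. The sketch below assumes, purely for illustration, a simple additive error model in which each reported value is the true value plus a bias peculiar to the enumerator plus an independent random deviation. The population, the biases and the noise level are all hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical true values for a population of 200 units.
true_values = [rng.uniform(5.0, 15.0) for _ in range(200)]
true_mean = sum(true_values) / len(true_values)

def complete_count_mean(values, bias, noise_sd, rng):
    """Mean of a complete count in which every reported value carries the
    enumerator's bias plus an independent random deviation."""
    reported = [x + bias + rng.gauss(0.0, noise_sd) for x in values]
    return sum(reported) / len(reported)

# Five hypothetical enumerators, each with a personal bias.
biases = [rng.gauss(0.0, 1.0) for _ in range(5)]
counts = [complete_count_mean(true_values, b, 0.5, rng) for b in biases]

for b, c in zip(biases, counts):
    print(f"bias {b:+.2f}  complete-count mean {c:.2f}  (true mean {true_mean:.2f})")
```

Although the sampling fraction is unity in every run, the five means differ, and each lies close to the true mean shifted by the bias of the enumerator concerned.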
We have given several examples in Chapter I to show that
the net effect of non-sampling errors on the value of the estimate
Let y_{ijk}, for k = 0, 1, 2, ..., n_{ij},
denote the value reported by the j-th enumerator on the i-th unit
for the k-th occasion. It will be seen that m enumerators have
been assumed to participate in the survey, with the j-th enumerator
making nij observations on the i-th unit in the sample.
The difference between the reported value and the true value
is called the error of observation, and for any given measurement
technique will depend upon the enumerator reporting the value,
the interaction of the enumerator with the true value of the unit,
and the mood and like causes at the time of reporting. The
reported value may, therefore, be considered as being made up
of four uncorrelated components as follows:
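    y_{ijk} = x_i + \alpha_j + \delta_{ij} + \epsilon_{ijk}        (1)

where
    x_i             is the true value of the i-th unit,
    \alpha_j        represents the bias of the j-th enumerator in repeated
                    observations on all units,
    \delta_{ij}     the interaction of the j-th enumerator with
                    the i-th unit, and
    \epsilon_{ijk}  the deviation from x_i + \alpha_j + \delta_{ij} when the
                    j-th enumerator reports on the i-th unit on
                    the k-th occasion,
with
    E(\epsilon_{ijk} | i, j) = 0
and
    E(\delta_{ij} | j) = 0        (2)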
n.; ( =
f nij)
f ni;)
NON-SAMPLING ERRORS
Y.j
447
and
Y..
Y.;
(4)
Y..
(5)
and
448
Y.i
Fz
.L:
X,n'j
+ +~ L:,
aj
(6)
,PH
and
h
Y..
L:
Xj
+~
L:a+ L:L:
i
l
n-
jjnjj
(7)
It follows that
1
E(y)=l
.i
N
+lJ
(8)
By definition,
V (jiJ)
= E tv J
E (ji.i)}2
Substituting for Y.j and E (Y.j) from (6) and (8), we have
( 10)
~E
(! ~ ,,)'+
x,n" -
-'1'
E(a,
+ ~,
t ,,,n.)
E (
(Ill
jj!
+ i~'
(~ ~ x,n" -
")'
t.
I~I'
S/
(12)
450
where
S.2
~1
(13)
(x, - fL)2
(1 - k)
Sa.l
(14)
where
M
Sa. 2 -_ M 1_]
\' (
.LJ
aj
-)2
(15)
1-1
{f
E ( ,l Ij)
nil
+ 1#"
i: E
ii2 (S,2
(jjI'J
Ij) n1jnl'J}
f n'l)
= S.2
(16)
iz
since
and
E (,,,'j) = 0
~)+Sa.2(1
k) +~ S.I
(17)
+ Sa.'
(18)
We write
V (Y.. )
Lv.. - E (Y .. W
t .,-'
+~ t t
~ E{! ~X'-" +~
+~
Now, since
population,
Xl' Xl!, .. , Xh
(t t
,,,n.}(19)
(20)
Similarly,
E
(Im W
f'
a _
J
CL
(21)
452
2
= 8<
8 <2
(22)
- - hp
since
E
(f
";jn ij
Ii)
(,4'
"i),nil' [ / )
=--"
_j_
I
8~
hp
(23)
8
+ hp
<
(24)
~/+ 8,2
h
+ 8ma 2
(25)
(26)
NON-SAMPLING ERRORS
453
454
lOa.4
..
2_
s. -
m- I
'\'(y
,I -
-)2
Y ..
(27)
.E
,
(m - 1)'E(s/) =
E(y,/) - mE (y .. 2)
L'[V(y)
+ {E(y)Pl -
m [v(Y .. ) + {E<Y .. Wl
m {V<Y,.)
+ (p.+a.)2)
J:, V (y,,) -
m V <Y.. )
(28)
2)
In
m-p
S -, -_. -
hp m - I
+ s" 2 +
m S2
hp
(29)
we get
(n - m)E(s 2) = E {
f f y,/ - f Y.l}
ii
(31)
Now putting
n = 1 in
(17), we have
455
NON-SAMPLING ERRORS
Lh
I
-- h - I
-
(h
I
-)2
(34)
J ..
1: E (j.2) - hE <Y.. 2)
i
~,~
1:
{V (yj. )} - h V (ji.. )
(35)
p.
The set of three equations (29), (33) and (37) provides the
estimates of S_x^2, S_\alpha^2 and S_\epsilon^2. In particular, we obtain
I=
Est...
S
P (m - 1) (h - 1)
pmh - ph - pm + WI
{2
s.
h (m _ 1)
hp
m'
(m -
of.
I) s.o! J
(38)
456
The
and
E ( s. 2) -- S2
It.
+ S2 + h-=-I
m- 1S
m
..
(41)
S.2
==
(m - I) h_
m
5. 2
+ (II
- m)
s./
(42)
5.
Iim
-;;
(43)
and
(44)
The expression for the estimate of Sa2 given in (43) can also be
derived directly from (38) by putting p = 1 and substituting for
SUI from (42).
It follows from (25) that for N and M infinitely
large and p = 1,
Est.
v(;,
v..
2
5.
(45)
5.
h-m
in-=!
I
-- (Sl-S
h
I)
(46)
Example 10.1
This example is taken from the crop survey for estimation of
the average yield of wheat conducted in Sind (Pakistan) in
1945-46. The design of the survey was stratified multi-stage
sampling with subdivisions as the strata, a village as the first-stage unit, a field as the second-stage unit of sampling and a plot
of 1/40 acre as the ultimate unit of sampling. Within each
stratum the work was divided into two independent samples, one
to be carried out by an official of the Department of Revenue,
and the other by an official of the Department of Agriculture.
Table 10.1 shows the estimates of the average yield for the two
samples, together with the pooled analysis of variance of the
whole sample. For one stratum, namely subdivision Kambar,
TABLE 10.1

Agency           Average Yield    No. of Fields
Revenue              100.3             25
Agriculture           54.9             12
Combined              85.6             37

Source                                    D.F.    Mean Square
Between enumerators                         1     167146 (= E)
Between villages within enumerators        15      73778 (= B)
Within villages                            20       3154 (= W)
;- L L J;;2}
J
+ m-I
"
where
~- ~ ~>}
true variance between village means;
number of fields in the sample in the subdivision;
number of fields in the sample harvested by
the j-th enumerator;
number of fields in the i-th village for the
j-th enumerator;
number of villages in the sample in the
subdivision;
and
    S_\alpha^2 = 4684
and
    S_\alpha^2 + S_b^2 + S_w^2 = 40902
S_\alpha^2 is thus seen to be a little larger than 10% of the total variation.
A test of significance of the differential bias is provided by the
ratio of E to B. The design being non-orthogonal, however, the
two mean squares are not independent, and the ratio of the two
cannot be regarded as following the F distribution. An appropriate
test may be provided by a comparison of E with
where
L t. (], ~ J,;)}
-(h ::__ m)
and
I _
kt')
kl
w + ~l' n12
kl
f
. (1 - Z:T ~; + (~:Y~;
On reference to Snedecor's F Tables, it will be found that the
observed ratio is smaller than the F value, showing that S_\alpha^2
is not significant.
Clearly, the sample mean for the t-th stratum will be given by
.,
m,
\'x,+_2_
\'a'+E'W
m t W!
I
Y-'=
..
h,
(47)
_._
Y..
E p"i' ..'
1;1
where
Nt
P,
(48)
(Y..' I h,)
= IL,
+ a,
(49)
,:,;;.1
1=1
~'IL'+a
(50)
S2
+ S2 + _!_,,:_
(51)
Similarly
V (Y,.' I h,)
2
S
,_ t.;_
h,
__!_
m,
hi
where S_{tx}^2 is the variance of x_{it}, S_{t\alpha}^2 the variance of \alpha_t and S_\epsilon^2,
for the sake of simplicity, is assumed to be constant from stratum
to stratum; and
V (Y"
I hI'
,"'I
462
and
m,
Ii;
S ,( )
(54)
whence
Est. V (Y ..' I h,)
(55)
and
(56)
Example 10.2
The data for this example are derived from a pilot survey for
comparing the relative efficiency of plots of different size in
estimating the average yield of irrigated wheat in Moradabad
District of U.P. (India) in 1944-45. The design of the survey
and the method of assigning enumerators were similar to those
described in Example 10.1. Thus, in each of the five subdivisions
of the district, two independent samples of two villages each were
selected and allocated to two enumerators designated A and B.
In each village two fields were selected and in each field two
plots of each size were marked. The data relating to the plot size
of an equilateral triangle of side 25 links (117.9 sq. ft.) are taken here
for illustration. Table 10.2 gives estimates of the average yield
together with the analysis of variance for individual subdivisions,
and also the pooled values for the district. Estimate the contribution to the total variation due to S_\alpha^2.
The calculations are straightforward. Substituting from the
table the values of E and B in the formula
I
BaS
t. ,. ==
s,. 2
m,..t.
h, 0)-,( )
(E, - B,)
= -8--
TABLE 10.2
[Estimates of the average yield and analysis of variance for individual subdivisions, with pooled values for the district. The entries of this table are not legible in the source.]
we obtain the values shown in the last row of the table. The
average magnitude of S_{t\alpha}^2 over the five subdivisions works out
to 65693.
Also
Est. (St. 2 + S/) =
ft
whence the average value of S_{tx}^2 + S_\epsilon^2 for the district works out
to 130748. The total variance of an observation is given by
    S_\alpha^2 + S_x^2 + S_\epsilon^2 = 196441
Thus
Sa2
- _I
Y .. - h
\'h-'
w
,Y..
(57)
and its expected value and variance for a given set of hI, h2'
... , hk by
t
1L
'~1
h, (iL,
+ ii,)
(58)
and
L:
}i2
ht (St/+ S/)
+ Z2
t::l
1::,.,
h, 8, ..2
(59)
It is seen from (58) that the conditional expected value of \bar{y}_{..} does
not equal \mu + \bar{\alpha}. To the expression (59) we must, therefore,
add the square of the bias component and then take the expectation
in order to obtain the variance of \bar{y}_{..}. We write
- E
JIi
1:
k
2'.
'2
h, (StJ . \ S.)
+ II112
I=J
1:A }
h, S'a
11101
(60)
Pt (S'J
+ S/) + Z
1:
PtSta
{t Gt - PI) + t)}2
{t (~ - p,Y +
(fL,
IEl
=E
(fL,
1=1
30
0.,)2
(61 )
466
I: Z=7. l!1~!h-=P_J
k
(/i,
+ ii,)2
1=1
I: %~-1' 1
P1 ..!. (/i,
'".""1
N -h 1
N":'_-1 . h
I
I:
+ ii,) (/i" +
P,P,' (/it
t~-I=t'=l
N-h
1
..... .
N-I h
\ ' P,
W
(/it
2
+ a,-- /i - a)
Ie 1
L
k
~ ~
P, (/i,
+ iii -
(62)
II- - a)2
1"1
!L
k
V(ji.. )
P, (St/+ S/)
+~
!L
L
k
PIS,,,'
P, (/it
+ iii -
" -
ii)2
(63)
'.1
from it (l/h) E
'-I
p~t,,".
we may write
lL
Pt (fLt
+ iit -
fL - ii)2
t=l
+Z
PtStr/ -
PtS tt1.
(64)
t_l
S/
+ S/ + Sta 2) +
,= }; PI (S/Z 2
t~l
2,' PI (fL,
+ ii t
--
fL --- a)2
(65)
t~l
whence
V (Y ) =
..
~/
+ ('m h
hi) W
'\' p,Stt1.
(66)
t_1
k
+ 1 + Ell -
E E {(x I
E E [(x, - fL)
0.
= - ~I + Sa'
(XI' -
(1 - k)
(0.
(XI'
0.
468
where
Cl
C.
c.
and
469
NON-SAMPLING ERRORS
+ S .. [~
2
~J + IL (c 1h + C2m +
('3
y'hm - Co)
(70)
(72)
and
(73)
where
S.. 2
(75)
When
(76)
470
Ca
to
= O. We get
(77)
+ czf3h = Co
Ca
= 0, we get
(78)
or
h
= ___ ~o.
Cl
+ czf3
(79)
and
m=
Cl
f3C o
+ c2i3
(80)
471
NON-SAMPLING ERRORS
10.3
_.-.---
.~---
% Illiterate
% Economically independent
.~~--'---
88
75
90
87
62
95
67
46
77
50
46
31
.- _-
.---~--."
- .
---_ .. -
Values
% Illiterate
% Economically independent
oiS.. 2 ami'S.'
-j
S..' + S.2
574
104
722
Example 10.4
This relates to the data on acreage collected in the course of
a surprise check to which a reference has already been made
in Section 1.8. Acreage under crops in India is compiled by the
village accountants by noting the names of the crops field by
field in the course of their administrative duties. As all the fields
are surveyed and mapped and the area of each field (survey
number) is, therefore, accurately known, the total area under any
crop is obtained by simply adding the area of the fields growing
that crop.
TABLE 10.4
Comparison of Crop-Acreages

Crop       Village         Statistical     Difference     Percentage
           Accountants     Staff           (2) - (3)      over (3)
(1)        (2)             (3)             (4)            (5)
Wheat      3313224         3227149           86075           2.7
Gram       2391840         2415085          -23245          -1.0
Barley     1873291         2029875         -156584          -7.7
Arhar      1888117         2103171         -215054         -10.2
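The difference and percentage columns of Table 10.4 follow directly from the two acreage columns. A minimal sketch of the arithmetic, using the figures as read from the table (the variable and function names are hypothetical):

```python
# (village accountants, statistical staff) acreages as read from Table 10.4
acreage = {
    "Wheat": (3313224, 3227149),
    "Gram": (2391840, 2415085),
    "Barley": (1873291, 2029875),
    "Arhar": (1888117, 2103171),
}

def compare(accountants, staff):
    """Return the difference (2) - (3) and its percentage over (3)."""
    diff = accountants - staff
    return diff, round(100 * diff / staff, 1)

results = {crop: compare(a, s) for crop, (a, s) in acreage.items()}
for crop, (diff, pct) in results.items():
    print(f"{crop:8s} {diff:9d} {pct:6.1f}")
```

The sign of the percentage shows whether the village accountants' figure lies above or below that of the statistical staff for the crop concerned.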
strengthening the supervision over the work of the field staff and
the conduct of similar checks in other parts of the country.
This conclusion is confirmed by treating the data by the methods
developed in this chapter. We have
On analysing the data, it was found that S_\alpha^2 did not exceed 5%
of the total variation in the case of any crop.
10a.9

TABLE 10.5

                     Number of Cases
Interval          Observed    Expected    Difference       χ²
0    - 0.05         109         18.95       +90.05       427.92
0.05 - 0.10          20         18.95       + 1.05         0.06
0.10 - 0.90         235        303.20       -68.20        15.34
0.90 - 0.95          12         18.95       - 6.95         2.55
0.95 - 1.00           3         18.95       -15.95        13.42
Total               379        379.00                    459.29
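The χ² column of Table 10.5 can be recomputed from the observed and expected numbers. In the sketch below the expected proportions (5, 5, 80, 5 and 5 per cent of the 379 cases) are inferred from the expected numbers shown in the table and should be treated as an assumption. The variable names are hypothetical.

```python
observed = [109, 20, 235, 12, 3]
# Assumed class proportions, inferred from the tabulated expected numbers.
proportions = [0.05, 0.05, 0.80, 0.05, 0.05]

total = sum(observed)
expected = [total * p for p in proportions]

# Contribution of each class: (observed - expected)^2 / expected.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

for o, e, c in zip(observed, expected, contributions):
    print(f"{o:5d} {e:8.2f} {o - e:+8.2f} {c:8.2f}")
print(f"total chi-square = {chi_square:.2f}")
```

Almost the whole of the total comes from the first class, where the observed number far exceeds the expected.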
and improving the fieldwork on the spot, whereas replicated samples will usually suggest the need for improvement when the survey is over.
(iv) A supervisor need not be present throughout the operations
connected with the enumeration of a selected unit,
whereas an enumerator under sample survey must enumerate completely every unit assigned to him.
(v) Units selected for supervision may or may not be selected
by the principle of random sampling, whereas in replicated samples they will necessarily be so selected. When
it is possible to arrange supervision on a probability
basis and the work done by the supervisors is considered
a sub-sample of the work done at the primary level,
supervision may be considered a very special form of
replicated samples subject to the differences mentioned
above. This way supervision can be utilised to improve
the estimates obtained from the work done at the
primary level.
(vi) Replicated samples will not reveal minor defects in an
investigator and will certainly not reveal faults which are
common to all the investigators, whereas this is possible
with supervisory checks.
(vii) Replicated samples alone can estimate observational errors
whereas supervision will not, unless conducted as visualised in (v).
It would be seen that supervision can provide a better control
over fieldwork in a variety of ways which is not possible in the
case of replicated samples. Replicated samples are no alternative
to supervisory check, though the latter can be an alternative to the former. Replicated
samples have a place either when the object of the survey is to
compare different methods or different classes of investigators, or
at the pilot stage of a large-scale survey for testing questionnaires
and procedures, but would hardly appear worth while for adoption
as a regular feature of surveys.
B. INCOMPLETE SAMPLES
Further, let N_1 and N_2 denote the sizes of the response and the
non-response classes in the population. Clearly, N_1 and N_2 cannot
be known and can only be estimated from the sample. We have
    Est. N_1 = \frac{n_1 N}{n}
and
= E {E (nJ'n, I nh n)}
=E
{nJ'N,
I n}
_ nN'YN,
--N-
and
E (naY., I n) = E E E (nJ'., I hs. Yl>
E E (n 2}., I nl. n)
E (nJ'N, I n)
y.,., n)
480
whence
E(j.,) = {NtYN1: N~N.}
(85)
{(J-' - )-, ) + n
n "(I)
2
tI
It~
_ "I'
"'l
J'N }
>}2
-n
- N
)h,
_',
(86)
G- ~)
82
y.y
where
It.
sal
E (YI - ji..,).
n. -1
(87)
481
NON-SAMPLING ERRORS
so that
where
N,
N2 -
" (J'
W
.
-)'
)2
. N,
i=l
Hence
(In _ NI) 52
+ f~-=-J
. NN z
n
S.2
Hence, from
(90)
11
C+!L(V- Vo)
482
and
(93)
which reduces to
Hence
(94)
{S2 + ~ (/ - 1) S22}
(NCo + Nlcl + ~2 Ca)
p.N
Now to find
p.
(95)
we note that
or
( V. + NS')1I =
nl
{N
SI + N
(j - 1) Sat
}2
(96)
NON-SAMPLING ERRORS
483
Z2 (f .....
Va
82
1)822
(97)
+ fir
Equations (94) and (97) thus provide the values of n and f required
to estimate the population mean with the desired standard error
at the minimum cost.
An example will serve to illustrate the method. Suppose the
response rate is 50% and S_2^2 for the non-response group is 4/5
of that in the whole population. In other words,
    \frac{N_1}{N} = \frac{N_2}{N} = 0.5
and
    S_2^2 = \frac{4}{5} S^2
To work out the cost of the survey let us assume that it costs
one rupee to contact a unit, four rupees to enumerate and
process information on that unit and eight rupees to enumerate
and process information on the unit in the non-response group.
The total cost of the survey is, therefore, given by
(99)
484
(100)
20
TABLE 10.6

  f        n      Cost (Rs.)     n_2/f
  1       100        700           50
  2       140        700           35
  3       180        780           30
  4       220        880           27
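The figures of Table 10.6 can be reproduced under the assumptions of the example. The sketch below takes the variance of the estimate, with N large, to be proportional to {1 + (f - 1)(N_2/N)(S_2²/S²)}/n and the expected cost to be one rupee per unit contacted, four per responding unit and eight per unit enumerated from the non-response group. The reference size of 100 units with complete follow-up is read from the first row of the table, and the function name `plan` is hypothetical.

```python
def plan(f, n_full=100, w2=0.5, s2_ratio=0.8, c0=1, c1=4, c2=8):
    """Sample size, expected cost and non-response sub-sample size that
    give the same precision as n_full units with complete follow-up.

    f        : one in f of the non-respondents is followed up
    w2       : proportion of non-respondents, N_2/N
    s2_ratio : S_2^2 / S^2
    """
    # Variance is proportional to {1 + (f - 1) * w2 * s2_ratio} / n.
    n = n_full * (1 + (f - 1) * w2 * s2_ratio)
    # Expected cost: contact all n, enumerate the respondents, and
    # enumerate one in f of the non-respondents at the higher rate.
    cost = n * (c0 + c1 * (1 - w2) + c2 * w2 / f)
    sub_sample = n * w2 / f
    return n, cost, sub_sample

rows = [plan(f) for f in (1, 2, 3, 4)]
for f, (n, cost, sub) in zip((1, 2, 3, 4), rows):
    print(f"f={f}  n={n:.0f}  cost={cost:.0f}  sub-sample={sub:.1f}")
```

On these assumptions the cost is the same for f = 1 and f = 2 and rises thereafter, so nothing is gained here by sub-sampling the non-respondents more thinly than one in two.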
REFERENCES
1. Hansen, M. H., Hurwitz, W. N., Marks, E. S. and Mauldin, W. P. (1951). "Response Errors in Surveys," Jour. Amer. Statist. Assoc., 46, 147-90.
2. Sukhatme, P. V. and Seth, G. R. (1952). "Non-Sampling Errors in Surveys," Jour. Ind. Soc. Agr. Statist., 4, 5-41.
3. I.C.A.R., New Delhi (1947). Report on the Crop-Cutting Survey for Estimating the Outturn of Wheat in Sind, 1945-46.
4. I.C.A.R., New Delhi (1947). Report on the Sampling Survey for Estimating the Outturn of Wheat in the United Provinces, 1944-45.
5. International Training Centre on Censuses and Statistics for S.E. Asia (1950)
6. Sukhatme, P. V. and Kishen, K. (1951)
INDEX
A
C-(contd.)
Ls
C :
Census
utilization for improving precision of
sample, 182-186.
comparison with sampling method,
453.
Cluster sampling, 5--6, 238-284.
notation, 239-240, 265, 268.
estimates, 240, 265-267, 268.
variances, 240, 247, 266-268, 269.
estimates of variances,
250-251,
269-270.
comparison with systematic sampling,
417-419.
comparison with two-stage sampling,
285, 302-303.
efficiency of, 240-250, 270-284.
efficiency estimated from sample,
250-252.
efficiency in terms of intra-class correlation, 243-247, 270-272.
with probability of selection proportional to size. 268-284.
Cluster size
optimum, 239, 257-264.
relation of variance to, 252-256.
Cochran, W. G., 154, 186, 204, 230, 237,
259, 284.
Collapsed strata, 399-404.
Confidence limits (or coefficient, or
interval)
for means, 38-40.
for proportions, 47.
for ratio estimate, 158-160.
Efficiency, 124.
of cluster sampling, 240-252, 270-284.
of different estimates, 169-170, 174,
213-215, 272-284, 325-326, 329,
343.
of ratio estimate, 160-164, 186.
of regression estimate, 220-223.
of sampling systems with varying
probabilities, 272-284, 376-379.
of stratified sampling, 124, 126, 134,
223, 352-357.
of sub-sampling designs, 302-303,
376-379.
Enumerators
bias, II, 13-14, 17,444-448.
covariance of response to, 467-468.
errors (see Observational errors).
optimum number of, 468-470.
selection and assignment of, 447, 460,
465, 473.
supervision of, 11,471--477.
variance of biases, 450, 453.
Evans, W. D., 98, 137.
F
H
Hansen, M. H., 167, 187, 247, 284, 305,
358,399,416,444,468,478,484,485.
Hasel, A. A., 204, 237, 428, 443.
Hendricks, W. A., 256, 284.
Horvitz, D. G., 70, 73, 412, 416.
Hurwitz, W. N., 167, 187, 247,284, 305,
358, 399, 416, 468, 478, 484, 485.
Hypergeometric distribution (see also
Qualitative characters).
confidence limits for, 47.
for two classes, 42--48.
generalized, 49-54.
mean value, 43-44.
variance, 44--47.
I
Incomplete samples, II, 12,478--484.
Indian Council of Agricultural Research,
8,9,17,389,416,428,443,472,484.
Intra-class correlation
between units of a column, 421.
efficiency of cluster sampling in terms
of, 243-247.
example of negative value, 248-250.
non-circular serial correlation, 423.
within first-stage units, 377.
within-stratum serial correlation, 425.
J
Jessen, R. J., 230, 255-259, 284.
K
Kiser, C. V., 11, 17.
Kishen, K., II, 17,485.
Koop, J. C., 141.
Koshal, R. S., 125, 137.
L
o
Observational errors, 11-12, 444.
control of, 473--477.
measurement of, 445-446.
notation, 445, 446, 447, 460.
estimation of population mean from
observations subject to, 448, 461,
464.
variance of observations subject to,
448--452, 461, 465-467, 468.
Optimum allocation
case of incomplete samples, 481--484.
in assigning units to enumerators,
468--470.
in double sampling, 117-118.
in stratified sampling with simple mean
estimate, 86-90, 92-93, 95-100.
INDEX
489
O-(contd.)
R-(contd.)
Random numbers
use of, 6-10.
table of, 18--19.
Ratio method
notation, 138--139, 179,369,405.
estimates. 139, 166--168, 179, 181,267,
317. 369. 405--406.
bias, 139--147. 166--168, 176. 178. 180.
variances, 146-154. 168-170. 177-178,
180--181, 370-372, 407-409.
estimates of variances. 150-151,181.
comparison with simple mean estimate,
160-161, 164.
p
Panse, V. G., 259, 284,321,357.
Partitional notation, 31-33.
Patterson, H. D., 235, 237.
Plot size, 14-17,253,462-464.
Preliminary sample, 42, 95-100.
Probability of inclusion, 4-5, 24-26,
65-71.
Probability sampling. 3, 10.
Probabilities of selection
determination
of optimum.
71,
366-369, 373-376.
equal,4-5.
unequal
at the i-th draw. 24-25.
proportional to size, 181-186,
268-284, 361. 368.
in single-stage sampling. 60-72,
179-186, 268-284.
in stratified sampling, 127-136.
in sub-sampling, 358-415.
in sub-sampling with systematic
sampling, 438-443.
Proportion (see Hypergeometric distribution or Qualitative characters).
Purposive sampling methods, 10.
Q
Qualitative characters, 42-60.
combined with quantitative characters,
54-60.
ratio estimates of. 174-178.
Quota sampling, 10.
S-(contd.)
estimate of variance, 431-433, 436-438,
440.
considered as cluster sampling, 418-419.
comparison with simple random
sampling, 420-423.
comparison with stratified sampling,
423-425.
in populations with linear trend,
425-427.
in populations with periodic variation,
427-428.
in natural populations, 428.
in randomly ordered populations, 432.
in two-stage sampling, 433-443.
u
Unbiased estimate, 21-22.
Unequal selection probabilities (see Probabilities of selection).
Unit of sampling
choice of (see Cluster sampling).
definition, 3.
effect of change of size of, 2S2-2S6,
303-305.
first-stage, 285.
in multipurpose surveys. 257-264, 297.
optimum, 239, 257-264, 305.
second-stage, 285.
third-stage, 285.
"