Chapter 8 Stratified Sampling

This chapter introduces a useful technique called stratification, which is the process of splitting a finite population into subgroups and then taking independent samples from each of those subgroups. The sampling within strata may be a simple random sample, or another design such as cluster sampling. We will however concentrate on the case of simple random sampling as the within-stratum sampling scheme.

Stratification is an example of using auxiliary information about the population at the design stage. This information is used to put each unit into one of the strata.

There are several reasons for stratification. Some of these are:

1. Efficiency: we partition the sample space so that fewer extreme samples can be selected, or so that influential sampling units are isolated and selected with a high probability.
2. Using auxiliary information to improve the efficiency of the sample design by forming homogeneous groups with smaller coefficients of variation (CVs).
3. The need to form subpopulation estimates of sufficient accuracy.
4. Using auxiliary information to overcome nonsampling errors; e.g. regions may have differential response rates, so that it is useful to group these areas and try to use auxiliary information to improve the estimates.
5. Administrative reasons, e.g. recruitment and training of the field force might be carried out in certain centres, and it would be natural for these to define strata.

We will briefly look at the first two of these reasons. In both cases, when we form strata we are trying to form subgroups of the population which are more homogeneous than the total population. One measure of homogeneity is the coefficient of variation (CV: \(S_Y/\bar{Y}\)) of the variable, and so we try to form subgroups whose CVs are much less than the population CV.

Example. \(N=1000\) students enrol in a first year statistics course. We are given a list of their names, and for each person we are also told whether or not they have a degree: 100 do have a degree already and 900 do not. We are asked to estimate the mean age of students in the class, and we are only allowed to sample 20 students.

How should we proceed?

We could take a simple random sample of students, calculate the mean age \(\bar{y}\) of the sample, and use that as our estimate. However it’s very likely that the people who already have degrees are older, and moreover the spread of ages in that group is likely to be much wider than in the no degree group. It would therefore make sense to take two separate samples, one from each group, separately estimate the mean of each group, and then combine those estimates to form the overall mean estimate. This is called stratified sampling, and it can lead to estimates which are much more precise than those from simple random sampling.

8.1 Notation for stratified sampling

In stratified sampling we require prior information on every unit in the population (not just the sampled units). We use this prior auxiliary information to classify every population unit into one, and only one stratum. We’ll leave the method of deciding how to form the strata for later.

For the moment suppose that we have determined our strata, and there are \(H\) of them. The \(N\) population units are divided up with \(N_h\) units in each stratum: \[\begin{equation} N = \sum_{h=1}^H N_h \end{equation}\] Every population unit belongs to one and only one stratum \(h\). The proportion of the population in stratum \(h\) is \(F_h\): \[\begin{equation} F_h = \frac{N_h}{N} \qquad \text{where} \qquad \sum_{h=1}^H F_h = 1 \end{equation}\]

Example continued. Our class of \(N=1000\) students can be split into \(H=2\) strata – those without and those with degrees:

\(h\)	Stratum	Stratum Size	Stratum Proportion
1	No degree	\(N_1=900\)	\(F_1=\frac{N_1}{N}=\frac{900}{1000}=0.9\)
2	Has degree	\(N_2=100\)	\(F_2=\frac{N_2}{N}=\frac{100}{1000}=0.1\)
Total		\(N=\sum_hN_h=1000\)	\(\sum_hF_h=1.0\)

We relabel each unit by its stratum \(h\) and unit number \(i\) within that stratum, so that \(Y_{hi}\) is the value for unit \(i\) in stratum \(h\). The within stratum total for stratum \(h\) is the sum of all the \(Y\) values for units in stratum \(h\): \[\begin{equation} Y_h = \sum_{i=1}^{N_h} Y_{hi} \end{equation}\] Similarly we have the within stratum mean and variance: \[\begin{eqnarray} \bar{Y}_h &=& \frac{1}{N_h} \sum_{i=1}^{N_h} Y_{hi} = \frac{Y_h}{N_h}\\ S_h^2 &=& \frac{1}{N_h-1} \sum_{i=1}^{N_h} (Y_{hi}-\bar{Y}_h)^2 \end{eqnarray}\] These formulae are identical to those we have had before, only we have the label \(h\) to show that they are being calculated separately in each stratum.

Example continued. For the ages of our \(N=1000\) students, the totals, means and variances within the \(H=2\) strata are:

\(h\)	Stratum	Stratum Total	Stratum Mean	Stratum Variance
1	No degree	\(Y_1=18435.6\)	\(\bar{Y}_1=20.5\)	\(S_1^2=2.08\)
2	Has degree	\(Y_2=3161.6\)	\(\bar{Y}_2=31.6\)	\(S_2^2=50.03\)
Total		\(Y=\sum_hY_h=21597.2\)

The overall population total is simply the sum of all the within stratum totals: \[\begin{equation} Y = \sum_{h=1}^H Y_h \end{equation}\] However we have to be careful when combining the within stratum results to form the overall population mean and variance. The population mean is given by \[\begin{eqnarray} \bar{Y} &=& \frac{Y}{N}\\ &=& \frac{\sum_{h=1}^{H} Y_h}{N}\\ &=& \frac{\sum_{h=1}^{H} N_h\bar{Y}_h}{N}\\ &=& \sum_{h=1}^{H} \frac{N_h}{N}\bar{Y}_h\\ &=& \sum_{h=1}^{H} F_h \bar{Y}_h \end{eqnarray}\] i.e. the population mean is a weighted sum of the stratum means \(\bar{Y}_h\), with the stratum proportions \(F_h\) as weights.

Example continued. To find the mean age of the 1000 students, form the weighted sum: \[ \bar{Y} = \sum_{h=1}^{H} F_h \bar{Y}_h = (0.9)(20.5) + (0.1)(31.6) = 21.6 \]

Example continued. To find the variance of the ages of the 1000 students, form the two components:

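\[\begin{eqnarray} \text{Within} &=& \frac{1}{N-1}\sum_{h=1}^H (N_h-1) S_h^2\\ &=& \frac{(899)(2.08)+(99)(50.03)}{999} = \frac{6822.89}{999} = 6.83\\ \text{Between} &=& \frac{1}{N-1}\sum_{h=1}^H N_h (\bar{Y}_h-\bar{Y})^2\\ &=& \frac{(900)(20.5-21.6)^2+(100)(31.6-21.6)^2}{999} = \frac{11089}{999} = 11.10\\ \text{Total} &=& S_Y^2 = \frac{1}{N-1}\sum_{h=1}^H (N_h-1) S_h^2 + \frac{1}{N-1}\sum_{h=1}^H N_h (\bar{Y}_h-\bar{Y})^2\\ &=& 6.83 + 11.10 = 17.93 \end{eqnarray}\]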

The total variance has been partitioned into two parts by the stratification – most of the total variance is between the strata, which is highly desirable.

The power of stratification lies in the separation of between and within stratum variance. All our sampling error comes from the within stratum variances \(S_h^2\), so if we can push as much of \(S_Y^2\) as possible into the between stratum variance, we won’t see that variance in our estimators and they will be much more accurate.
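The decomposition in the example can be checked numerically. The following is a minimal Python sketch: the variable names are illustrative, and the inputs are the rounded stratum summaries from the tables above, so the results agree with the worked example only up to rounding.

```python
# Stratum summaries for the student example (rounded values from the tables).
N_h = [900, 100]         # stratum sizes
Ybar_h = [20.5, 31.6]    # stratum mean ages
S2_h = [2.08, 50.03]     # stratum variances of age

N = sum(N_h)
F_h = [Nh / N for Nh in N_h]   # stratum proportions

# Population mean as a weighted sum of the stratum means.
Ybar = sum(F * m for F, m in zip(F_h, Ybar_h))

# Within- and between-stratum components of the population variance.
within = sum((Nh - 1) * s2 for Nh, s2 in zip(N_h, S2_h)) / (N - 1)
between = sum(Nh * (m - Ybar) ** 2 for Nh, m in zip(N_h, Ybar_h)) / (N - 1)
total = within + between

print(f"mean={Ybar:.2f} within={within:.2f} between={between:.2f} total={total:.2f}")
```

The share `between / total` is a convenient single number for judging how much of the population variance a proposed stratification removes from the sampling error.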

8.2 Estimation in Stratified Sampling

The key concept in stratified sampling is that we have divided the population into \(H\) groups, and we take completely independent samples from each stratum: it’s as if we were running \(H\) separate surveys.

This means that the sampling method can be different in each stratum: we could take a SRS in one stratum, a census in another, a cluster sample in another etc.

Then the natural estimator for the population total is \[\begin{equation} \widehat{Y}_{\text{str}} = \sum_{h=1}^{H} \widehat{Y}_h \end{equation}\] where \(\widehat{Y}_h\) is an estimator (appropriate to the sampling scheme) for the total for stratum \(h\). The variance of this estimator is easy to work out since the sampling is independent in each stratum: \[\begin{equation} \text{Var}\left[\widehat{Y}_{\text{str}}\right] = \sum_{h=1}^{H} \text{Var}\left[\widehat{Y}_h\right] \end{equation}\] where \(\text{Var}\left[\widehat{Y}_h\right]\) is the variance of the estimator for the total for stratum \(h\).

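The natural estimator of the population mean is: \[\begin{eqnarray} \widehat{\bar{Y}}_{\text{str}} &=& \frac{\widehat{Y}_{\text{str}}}{N}\\ &=& \frac{1}{N} \sum_{h=1}^{H} \widehat{Y}_h\\ &=& \frac{1}{N} \sum_{h=1}^{H} N_h\widehat{\bar{Y}}_h\\ &=& \sum_{h=1}^{H} F_h\widehat{\bar{Y}}_h \end{eqnarray}\] and its variance is \[\begin{equation} \text{Var}\left[\widehat{\bar{Y}}_{\text{str}}\right] = \sum_{h=1}^{H} F_h^2\, \text{Var}\left[\widehat{\bar{Y}}_h\right] \end{equation}\] This is a weighted sum of the variances of the stratum mean estimates, just as the mean was a weighted sum of the stratum means, but note that \(F_h\) appears as \(F_h^2\).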

8.3 Stratified Simple Random Sampling

To take a specific example of a sampling scheme, suppose that we take a SRS of size \(n_h\) from each stratum and these samples are independent. Then the within stratum estimates of totals, means and proportions are: \[\begin{eqnarray} \nonumber \widehat{Y}_h &=& \frac{N_h}{n_h} \sum_{i=1}^{n_h} y_{hi} = N_h\bar{y}_h\\ \widehat{\bar{Y}}_h &=& \frac{1}{n_h} \sum_{i=1}^{n_h} y_{hi} = \bar{y}_h\\ \nonumber \widehat{p}_h &=& \frac{1}{n_h} \sum_{i=1}^{n_h} y_{hi} = \bar{y}_h \end{eqnarray}\] (for a proportion, \(y_{hi}\) is a 0/1 indicator), with variances \[\begin{eqnarray} \nonumber \text{Var}\left[\widehat{Y}_h\right] &=& N_h^2\left(1-\frac{n_h}{N_h}\right)\frac{S_h^2}{n_h}\\ \text{Var}\left[\widehat{\bar{Y}}_h\right] &=& \left(1-\frac{n_h}{N_h}\right)\frac{S_h^2}{n_h}\\ \nonumber \text{Var}\left[\widehat{p}_h\right] &=& \left(1-\frac{n_h}{N_h}\right)\frac{S_h^2}{n_h} \end{eqnarray}\] which can be estimated from sample data by: \[\begin{eqnarray} \nonumber \widehat{\text{Var}}\left[\widehat{Y}_h\right] &=& N_h^2\left(1-\frac{n_h}{N_h}\right)\frac{s_h^2}{n_h}\\ \widehat{\text{Var}}\left[\widehat{\bar{Y}}_h\right] &=& \left(1-\frac{n_h}{N_h}\right)\frac{s_h^2}{n_h}\\ \nonumber \widehat{\text{Var}}\left[\widehat{p}_h\right] &=& \left(1-\frac{n_h}{N_h}\right)\frac{\widehat{p}_h(1-\widehat{p}_h)}{n_h-1} \end{eqnarray}\] These formulae are identical to the SRS formulae we have had earlier, with the single change that we have a subscript \(h\) to indicate that the estimates and variances are specific to stratum \(h\).

Example continued. Assume we have drawn a sample of size 10 from each of the two strata. The sample statistics are as given in the following table:

\(h\)	Stratum	Stratum Size \(N_h\)	Stratum Fraction \(F_h=\frac{N_h}{N}\)	Sample Size \(n_h\)	Sampling Fraction \(f_h=\frac{n_h}{N_h}\)	Sample Weight \(w_{hk}\)	Sample Mean Age \(\bar{y}_h\)	Sample Variance \(s_h^2\)
1	No degree	900	0.9	10	0.0111	90	20.3	3.22
2	Has degree	100	0.1	10	0.1000	10	37.8	56.2
Total		1000	1.0	20

Note that the sampling fractions and hence the sample weights are different in the two strata.

Using these data we can estimate the mean age in each stratum, and create a 95% confidence interval for each estimate:

\(h\)	Stratum	Estimated Mean Age \(\widehat{\bar{Y}}_h=\bar{y}_h\)	Variance of Estimate \(\text{Var}\left[\widehat{\bar{Y}}_h\right]\)	Std. Error of Estimate \(\text{SE}\left[\widehat{\bar{Y}}_h\right]\)	RSE \(\text{RSE}\left[\widehat{\bar{Y}}_h\right]\)	95% Conf. Int.
1	No degree	20.3	0.3181	0.56	0.028	(19.2, 21.4)
2	Has degree	37.8	5.0573	2.25	0.060	(33.4, 42.2)

Example continued. An overall estimate of the mean age of the \(N=1000\) students from the sample is given by \[ \widehat{\bar{Y}}_{\text{str}} = \sum_{h=1}^H F_h\bar{y}_h = (0.9)(20.3) + (0.1)(37.8) = 22.1 \] with variance \[\begin{eqnarray} \widehat{\text{Var}}\left[\widehat{\bar{Y}}_{\text{str}}\right] &=& \sum_{h=1}^{H} F_h^2 \left(1-\frac{n_h}{N_h}\right)\frac{s_h^2}{n_h}\\ &=& (0.9)^2\left(1-\frac{10}{900}\right)\frac{3.22}{10} +(0.1)^2\left(1-\frac{10}{100}\right)\frac{56.2}{10} = 0.3085 \end{eqnarray}\] leading to the following 95% confidence interval for the mean student age in the whole class of 1000 students: \[ 22.1 \pm (1.96)\sqrt{0.3085} = 22.1\pm1.1 = (21.0,23.2) \]
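The same calculation can be written as a small reusable function. This is a sketch only: `stratified_mean` is a hypothetical helper name, and it assumes SRSWOR within every stratum and a normal-approximation interval with multiplier 1.96.

```python
import math

def stratified_mean(N_h, n_h, ybar_h, s2_h, z=1.96):
    """Stratified SRSWOR estimate of a population mean.

    Returns the estimate, its estimated variance, and a confidence interval.
    """
    N = sum(N_h)
    est = 0.0
    var = 0.0
    for Nh, nh, yb, s2 in zip(N_h, n_h, ybar_h, s2_h):
        Fh = Nh / N                                    # stratum proportion
        est += Fh * yb                                 # weighted stratum mean
        var += Fh ** 2 * (1 - nh / Nh) * s2 / nh       # fpc * s_h^2 / n_h
    half_width = z * math.sqrt(var)
    return est, var, (est - half_width, est + half_width)

# Sample summaries from the student example.
est, var, ci = stratified_mean(
    N_h=[900, 100], n_h=[10, 10], ybar_h=[20.3, 37.8], s2_h=[3.22, 56.2]
)
print(f"estimate={est:.2f} variance={var:.4f} CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

Carrying the unrounded estimate 22.05 through gives an interval of roughly (20.96, 23.14); the interval (21.0, 23.2) in the text comes from rounding the estimate and the margin of error before combining them.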

Example continued. Now assume that in the example above we had asked about car ownership, and found that 3 of the sampled students without degrees owned a car, whereas there were 8 car owners among those with degrees. What is the proportion of car owners in the whole population?

First compute the proportions in the two strata: \[\begin{eqnarray} \widehat{p}_1 &=& \frac{3}{10} = 0.30\\ \widehat{p}_2 &=& \frac{8}{10} = 0.80 \end{eqnarray}\] Then combine these estimates to form the full population estimate: \[ \widehat{p} = \sum_hF_h\widehat{p}_h = (0.9)(0.30) + (0.1)(0.80) = 0.35 \] with variance \[\begin{eqnarray} \widehat{\text{Var}}\left[\widehat{p}\right] &=& \sum_h F_h^2\left(1-\frac{n_h}{N_h}\right) \frac{\widehat{p}_h(1-\widehat{p}_h)}{n_h-1}\\ &=& (0.9)^2\left(1-\frac{10}{900}\right)\frac{(0.30)(0.70)}{9} +(0.1)^2\left(1-\frac{10}{100}\right)\frac{(0.80)(0.20)}{9} = 0.01885 \end{eqnarray}\] leading to the following 95% confidence interval for the proportion of car owners in the whole class of 1000 students: \[ 0.35 \pm (1.96)\sqrt{0.01885} = 0.35\pm 0.27 = (0.08,0.62) \] Scaling these estimates by the population size \(N=1000\) gives an estimate of the total number of car owners \[ \widehat{Y} = N\widehat{p} = (1000)(0.35) = 350 \] with 95% confidence interval \[ N\times(0.08,0.62) = 1000\times(0.08,0.62) = (80,620) \]
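A corresponding sketch for proportions is given below. The function name `stratified_proportion` is illustrative, and the code assumes SRSWOR within strata with the \(n_h-1\) divisor used in the variance estimate above.

```python
import math

def stratified_proportion(N_h, n_h, count_h, z=1.96):
    """Stratified SRSWOR estimate of a population proportion.

    count_h holds the number of sampled units with the attribute in each stratum.
    """
    N = sum(N_h)
    est = 0.0
    var = 0.0
    for Nh, nh, a in zip(N_h, n_h, count_h):
        Fh = Nh / N
        p = a / nh                                        # stratum sample proportion
        est += Fh * p
        var += Fh ** 2 * (1 - nh / Nh) * p * (1 - p) / (nh - 1)
    half_width = z * math.sqrt(var)
    return est, var, (est - half_width, est + half_width)

# Car ownership: 3 of 10 sampled without degrees, 8 of 10 with degrees.
p_hat, p_var, p_ci = stratified_proportion([900, 100], [10, 10], [3, 8])
N = 1000
total_hat = N * p_hat                                   # estimated number of car owners
total_ci = (N * p_ci[0], N * p_ci[1])
print(f"p={p_hat:.2f} var={p_var:.5f} CI=({p_ci[0]:.2f}, {p_ci[1]:.2f})")
```

With only 10 units per stratum the normal approximation is rough, which is reflected in the very wide interval.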

8.4 Comparison of sampling schemes: the Design Effect

When we evaluate a sampling scheme, our main concern is usually to see whether it results in better or worse estimates than those obtainable under other sampling schemes. By better we usually mean estimates with a smaller sampling error, although other factors such as cost and the need to form good subpopulation estimates may be important as well.

The standard comparison we make is to compare the variance of an estimator \(\widehat{\theta}_{\text{complex}}\) under the proposed complex sampling scheme with the variance of the equivalent estimator \(\widehat{\theta}_{\text{SRS}}\) under simple random sampling with the same sample size. This comparison is made by forming the design effect (Deff), which is the ratio of the two variances: \[\begin{equation} \text{Deff}\left[\widehat{\theta}_{\text{complex}}\right] = \frac{\text{Var}\left[\widehat{\theta}_{\text{complex}}\right]}{\text{Var}\left[\widehat{\theta}_{\text{SRS}}\right]} \end{equation}\] If the Deff is greater than 1, then the variance of \(\widehat{\theta}_{\text{complex}}\) is greater than that of the SRS estimator \(\widehat{\theta}_{\text{SRS}}\). The estimator \(\widehat{\theta}_{\text{complex}}\) is then said to be less efficient than \(\widehat{\theta}_{\text{SRS}}\). A desirable Deff is therefore less than one, indicating that the complex design is more efficient.

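Example continued. We want to know if the stratified estimator of the mean student age is an improvement over simple random sampling. For \(n=20\) the SRS estimator of the mean has variance \[ \text{Var}\left[\widehat{\bar{Y}}_{\text{SRS}}\right] = \left(1-\frac{n}{N}\right)\frac{S_Y^2}{n} = \left(1-\frac{20}{1000}\right)\frac{17.93}{20} = 0.8786 \] For a stratified sample of \(n=20\) allocated with 10 units in each stratum \[\begin{eqnarray} \text{Var}\left[\widehat{\bar{Y}}_{\text{str}}\right] &=& \sum_{h=1}^H F_h^2\left(1-\frac{n_h}{N_h}\right)\frac{S_h^2}{n_h}\\ &=& (0.9)^2\left(1-\frac{10}{900}\right)\frac{2.08}{10} + (0.1)^2\left(1-\frac{10}{100}\right)\frac{50.03}{10} = 0.2117 \end{eqnarray}\] so the design effect is \[ \text{Deff}\left[\widehat{\bar{Y}}_{\text{str}}\right] = \frac{\text{Var}\left[\widehat{\bar{Y}}_{\text{str}}\right]}{\text{Var}\left[\widehat{\bar{Y}}_{\text{SRS}}\right]} = \frac{0.2117}{0.8786} = 0.24 \] This is less than one, showing that the stratified estimator (for this allocation of a sample of size \(n=20\)) is more efficient.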
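The design effect in this example can be reproduced with a short script. This is a sketch using the rounded population values quoted in the chapter; the variable names are illustrative.

```python
# Population quantities for the student example (rounded values from the text).
N_h = [900, 100]
n_h = [10, 10]
S2_h = [2.08, 50.03]   # within-stratum variances
S2_Y = 17.93           # overall population variance

N = sum(N_h)
n = sum(n_h)

# Variance of the SRS mean with the same total sample size.
var_srs = (1 - n / N) * S2_Y / n

# Variance of the stratified mean under this allocation.
var_str = sum(
    (Nh / N) ** 2 * (1 - nh / Nh) * s2 / nh
    for Nh, nh, s2 in zip(N_h, n_h, S2_h)
)

deff = var_str / var_srs
print(f"var_srs={var_srs:.4f} var_str={var_str:.4f} deff={deff:.2f}")
```

With these rounded inputs the stratified variance evaluates to about 0.2116 rather than 0.2117; the small difference presumably reflects unrounded stratum variances in the original calculation and does not change the design effect of 0.24.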

8.4.1 Estimation of the Deff for Stratified SRSWOR

In general \(S_Y^2\) and \(S_h^2\), which are required in the formula for the Deff, are unknown and must be estimated using sample data. The within stratum variances \(S_h^2\) are simply estimated by the corresponding sample variances \(s_h^2\), but the overall population variance is a little more complex. From Equation (8.3) we had \[ S_Y^2 = \frac{1}{N-1}\left( \sum_{h=1}^H (N_h-1) S_h^2 +\sum_{h=1}^H N_h (\bar{Y}_h-\bar{Y})^2\right) \] and, approximating both \((N_h-1)/(N-1)\) and \(N_h/(N-1)\) by \(F_h\), it follows that an estimate of \(S_Y^2\) is given by \[\begin{equation} \widehat{S}_Y^2 = \sum_{h=1}^H F_h s_h^2 + \sum_{h=1}^H F_h(\bar{y}_h-\widehat{\bar{Y}}_{\text{str}})^2 \ \ \ \text{where}\ \ \ \widehat{\bar{Y}}_{\text{str}} = \sum_{h=1}^H F_h\bar{y}_h \end{equation}\] So to calculate the Deff of the estimator of the mean from sample data in stratified SRSWOR we calculate \[\begin{eqnarray} \widehat{\text{Var}}\left[\widehat{\bar{Y}}_{\text{str}}\right] &=& \sum_{h=1}^H F_h^2\left(1-\frac{n_h}{N_h}\right)\frac{s_h^2}{n_h}\\ \widehat{\text{Var}}\left[\widehat{\bar{Y}}_{\text{SRS}}\right] &=& \left(1-\frac{n}{N}\right)\frac{1}{n} \left[ \sum_{h=1}^H F_hs_h^2 + \sum_{h=1}^H F_h (\bar{y}_h-\widehat{\bar{Y}}_{\text{str}})^2 \right] \end{eqnarray}\] and then compute \[ \widehat{\text{Deff}}\left[\widehat{\bar{Y}}_{\text{str}}\right] = \frac{\widehat{\text{Var}}\left[\widehat{\bar{Y}}_{\text{str}}\right]}{\widehat{\text{Var}}\left[\widehat{\bar{Y}}_{\text{SRS}}\right]} \] Note: to get a good estimate of the variance of the estimator and its Design Effect we need sufficient sample size in each stratum to get reliable estimates of \(\bar{y}_h\) and \(s_h^2\). Small samples lead to very unreliable estimates of variances in general.

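Example continued. We have already estimated \(\widehat{\text{Var}}\left[\widehat{\bar{Y}}_{\text{str}}\right]=0.3085\) from the sample. It remains to estimate \(\text{Var}\left[\widehat{\bar{Y}}_{\text{SRS}}\right]\): \[\begin{eqnarray} \widehat{\text{Var}}\left[\widehat{\bar{Y}}_{\text{SRS}}\right] &=& \left(1-\frac{n}{N}\right)\frac{1}{n} \left[ \sum_{h=1}^H F_hs_h^2 + \sum_{h=1}^H F_h (\bar{y}_h-\widehat{\bar{Y}}_{\text{str}})^2 \right]\\ &=& \left(1-\frac{20}{1000}\right)\frac{1}{20} [ (0.9)(3.22)+(0.1)(56.2) \\ && \qquad\qquad +(0.9)(20.3-22.1)^2+(0.1)(37.8-22.1)^2 ]\\ &=& (0.049)[8.518+27.565] = 1.7681 \end{eqnarray}\] leading to an estimate of the Deff of \[ \widehat{\text{Deff}}\left[\widehat{\bar{Y}}_{\text{str}}\right] = \frac{\widehat{\text{Var}}\left[\widehat{\bar{Y}}_{\text{str}}\right]}{\widehat{\text{Var}}\left[\widehat{\bar{Y}}_{\text{SRS}}\right]} = \frac{0.3085}{1.7681} = 0.17 \] which is similar to the true value of 0.24, and likewise indicates that the stratified design is much more efficient than SRSWOR.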
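The estimated design effect can be computed directly from the stratum sample summaries. The sketch below uses a hypothetical helper, `estimated_deff`, and assumes stratified SRSWOR with the approximate estimator of \(S_Y^2\) described in this section.

```python
def estimated_deff(N_h, n_h, ybar_h, s2_h):
    """Estimate the Deff of the stratified mean from stratum sample summaries."""
    N = sum(N_h)
    n = sum(n_h)
    F_h = [Nh / N for Nh in N_h]

    # Stratified estimate of the mean and its estimated variance.
    ybar_str = sum(F * yb for F, yb in zip(F_h, ybar_h))
    var_str = sum(
        F ** 2 * (1 - nh / Nh) * s2 / nh
        for F, Nh, nh, s2 in zip(F_h, N_h, n_h, s2_h)
    )

    # Estimate of the population variance: within part plus between part.
    s2_Y = sum(F * s2 for F, s2 in zip(F_h, s2_h)) + sum(
        F * (yb - ybar_str) ** 2 for F, yb in zip(F_h, ybar_h)
    )
    var_srs = (1 - n / N) * s2_Y / n
    return var_str / var_srs

deff_hat = estimated_deff([900, 100], [10, 10], [20.3, 37.8], [3.22, 56.2])
print(f"estimated Deff = {deff_hat:.2f}")
```

The function keeps the unrounded mean 22.05 rather than 22.1, so intermediate values differ slightly from the hand calculation, but the estimated Deff still rounds to 0.17.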

8.4.2 Interpretation of the Design Effect

The Deff can be used in two important ways:

For a specified accuracy, the Design Effect tells us by what factor our sample size is reduced (or increased) by the use of a complex design. \[ n_{\text{complex}} = \text{Deff}\times n_{\text{SRS}} \] Example. For a variable with a Design Effect of 0.2, what size sample is required to achieve the same accuracy as a SRSWOR with \(n_{\text{SRS}}=500\)? \[ n_{\text{complex}} = 0.2\times500 = 100 \] Only a sample of size 100.
For a specified sample size, the Design Effect tells us by what factor our margins of error are reduced (or increased) by use of a complex design. \[ \text{MOE}_{\text{complex}} = \sqrt{\text{Deff}}\times \text{MOE}_{\text{SRS}} \] Example. In a SRS we achieve a margin of error of \(\pm25\) for a sample size of 1000. For the same sample size in a complex design where the design effect is 0.2, what will the margin of error be? \[ \text{MOE}_{\text{complex}} = \sqrt{0.2}\times 25 = 11.2 \] Only a margin of error of \(\pm11\).

If the Deff is less than 1, the complex design requires a smaller sample size for the same accuracy, or achieves a lower margin of error for the same sample size, i.e. the complex design is better. If the Deff is greater than 1, the SRS is better than the complex design.
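Both uses of the Deff amount to one-line calculations. The following sketch wraps them in two hypothetical helper functions and reruns the two examples above.

```python
import math

def complex_sample_size(deff, n_srs):
    """Sample size under the complex design giving the same accuracy as SRS with n_srs."""
    return deff * n_srs

def complex_margin_of_error(deff, moe_srs):
    """Margin of error under the complex design for the same sample size as SRS."""
    return math.sqrt(deff) * moe_srs

n_complex = complex_sample_size(0.2, 500)
moe_complex = complex_margin_of_error(0.2, 25)
print(f"n_complex={n_complex:.0f} moe_complex={moe_complex:.1f}")
```

Note the square root in the second function: the Deff is a ratio of variances, whereas a margin of error scales with the standard error.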

8.4.3 Details

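In general the Design Effect is the ratio of the variance of an estimator under some complex design (such as stratified sampling), to the variance of an estimator under SRSWOR, with the same sample size: \[\begin{equation} \text{Deff}\left[\widehat{\bar{Y}}\right] = \frac{\text{Var}\left[\widehat{\bar{Y}}_{\text{complex}}\right]}{\text{Var}\left[\widehat{\bar{Y}}_{\text{SRS}}\right]} \end{equation}\]

Now assume we have a sample of size \(n_1\) in the complex design, and a sample of size \(n_2\) in a SRSWOR, and we have chosen \(n_2\) so that the variances of the two estimators are equal: \[ \text{Var}\left[\widehat{\bar{Y}}_{\text{complex}}\right]_{n_1} = \text{Var}\left[\widehat{\bar{Y}}_{\text{SRS}}\right]_{n_2} \] then the Deff becomes simply \[\begin{equation} \text{Deff}\left[\widehat{\bar{Y}}_{\text{complex}}\right] = \frac{n_1}{n_2} \end{equation}\] Thus if it takes a sample of size \(n_{\text{SRS}}\) to achieve a certain accuracy under SRSWOR, then it will take a sample of size \[\begin{equation} n_{\text{complex}} = n_{\text{SRS}}\times \text{Deff}\left[\widehat{\bar{Y}}_{\text{complex}}\right] \end{equation}\] to achieve the same precision under the complex design.

Example. For a variable with a Design Effect of 0.2, what size sample is required to achieve the same accuracy as a SRSWOR with \(n_{\text{SRS}}=500\)? \[ n_{\text{complex}} = 0.2\times500 = 100 \] Only a sample of size 100.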

The Design Effect thus tells us by what factor our sample size is reduced (or increased) by the use of the complex design.

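We can also use the design effect to see the effect of a design on confidence intervals. The SRS 95% confidence interval is \[ \widehat{\theta}_{\text{SRS}} \pm 1.96\times \text{SE}\left[\widehat{\theta}_{\text{SRS}}\right] \] Using the Deff, the variance and standard error of \(\widehat{\theta}_{\text{complex}}\) are \[ \begin{aligned} \text{Var}\left[\widehat{\theta}_{\text{complex}}\right] &= \text{Deff}\left[\widehat{\theta}_{\text{complex}}\right] \text{Var}\left[\widehat{\theta}_{\text{SRS}}\right]\\ \text{SE}\left[\widehat{\theta}_{\text{complex}}\right] &= \sqrt{\text{Deff}\left[\widehat{\theta}_{\text{complex}}\right]}\; \text{SE}\left[\widehat{\theta}_{\text{SRS}}\right]\\ \end{aligned} \] so under the complex design the equivalent 95% confidence interval \[ \widehat{\theta}_{\text{complex}} \pm 1.96\times \text{SE}\left[\widehat{\theta}_{\text{complex}}\right] \] can be written \[ \widehat{\theta}_{\text{complex}} \pm 1.96\times \sqrt{\text{Deff}\left[\widehat{\theta}_{\text{complex}}\right]}\;\text{SE}\left[\widehat{\theta}_{\text{SRS}}\right] \]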

8.5 Sample Weights in Stratified SRSWOR

The weight of sample member \(k\) in stratum \(h\) of a stratified simple random sample is \[ w_{hk} = \frac{N_h}{n_h} \] Estimates of totals in stratified SRSWOR are formed just as they are in SRS, but the weights can differ between sample members due to the differing sampling fractions in each of the strata.

The goal of stratification is to put units which are similar to each other, but different from the rest of the population, all together in a single stratum.

If a stratum contains a lot of highly unusual and influential units then we sample from that stratum with high probability, and consequently give each sample member from that stratum a low weight. A good example of this is in business surveys where a small number of big companies can dominate estimates of total revenues in certain sectors of the economy. Such companies are usually grouped together into a single stratum, and a census is taken in that stratum.
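To illustrate how the weights enter an estimate of a total, the sketch below applies \(w_{hk}=N_h/n_h\) to a tiny sample. The sampled ages are invented purely for illustration, and the dictionary layout is an assumption rather than anything prescribed by the chapter.

```python
# Stratum sizes from the student example; the sampled ages are made up.
N_h = {"no degree": 900, "degree": 100}
sample = {
    "no degree": [20, 21, 19],   # hypothetical sampled ages
    "degree": [30, 40],          # hypothetical sampled ages
}

# Every sampled unit in stratum h carries the weight N_h / n_h.
weights = {h: N_h[h] / len(ys) for h, ys in sample.items()}

# Weighted estimate of the population total, and the implied mean.
total_hat = sum(weights[h] * y for h, ys in sample.items() for y in ys)
mean_hat = total_hat / sum(N_h.values())

print(weights, total_hat, mean_hat)
```

The higher sampling fraction in the "degree" stratum gives its members a lower weight (50 rather than 300). In a full coverage stratum every weight is exactly 1.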

8.6 Steps in Stratified Sampling

Given a survey population of size \(N\) form a stratified sample of size \(n\) by the following steps.

1. Identify the stratification variable(s), \(X_i\). \(X_i\) must be known for every unit on the frame.
2. Form the strata. Decide how many there will be (\(H\)), and which values of \(X\) will belong to which stratum. Each unit \(i\) is then assigned to one and only one of the strata on the basis of its \(X_i\) value. There are \(N_h\) units from the population in stratum \(h\), so that \[ N = \sum_{h=1}^H N_h \] The stratum fractions are \[ F_h = \frac{N_h}{N}\] and these sum up to 1: \[ \sum_{h=1}^H F_h = \sum_{h=1}^H \frac{N_h}{N}= \frac{1}{N}\sum_{h=1}^H N_h = 1 \]
3. Allocate the sample \(n\) to the strata, i.e. decide how many units \(n_h\) will be sampled from stratum \(h\) (for \(h=1,\ldots,H\)). These stratum sample sizes add to the total sample size: \[ n = \sum_{h=1}^H n_h \] The fractions of the sample allocated to each stratum are \[ p_h = \frac{n_h}{n}\] and these sum up to 1: \[ \sum_{h=1}^H p_h = \sum_{h=1}^H \frac{n_h}{n}= \frac{1}{n}\sum_{h=1}^H n_h = 1 \]
4. Draw the sample. Draw \(n_h\) units from the \(N_h\) in each stratum \(h\), according to the chosen sampling scheme.
5. Calculate estimates and their variances. Make estimates both within strata as well as combined estimates for the whole population, together with their variances.
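The sample-drawing step can be sketched in Python using the standard library. This is an illustration under stated assumptions: the frame is a made-up dictionary mapping unit identifiers to stratum labels, `draw_stratified_sample` is a hypothetical helper, and SRSWOR is used within every stratum.

```python
import random

def draw_stratified_sample(frame, n_h, seed=None):
    """Draw an independent SRSWOR of size n_h[h] from each stratum h.

    frame maps each unit identifier to its stratum label;
    n_h maps each stratum label to its allocated sample size.
    """
    rng = random.Random(seed)

    # Group the frame units by stratum.
    strata = {}
    for unit, h in frame.items():
        strata.setdefault(h, []).append(unit)

    # Sample without replacement, separately within each stratum.
    return {h: rng.sample(units, n_h[h]) for h, units in strata.items()}

# Hypothetical frame: units 0-899 have no degree, units 900-999 have a degree.
frame = {i: ("no degree" if i < 900 else "degree") for i in range(1000)}
sample = draw_stratified_sample(frame, {"no degree": 10, "degree": 10}, seed=1)
print({h: sorted(units) for h, units in sample.items()})
```

Fixing the seed makes the draw reproducible, which is useful for documenting how a sample was selected.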

8.7 Formation of Strata

We have some auxiliary variable \(X_i\) which we believe is correlated with the variable of interest \(Y_i\) . We have a (measured or estimated) value of \(X_i\) for every unit in the population: this may have come from some previous survey or census.

How can we use this information to form strata?

We might use \(X_i\) as a way of identifying those very few population units which need special treatment, and after putting them into their own full-coverage stratum we might decide that no further stratification is necessary.
Sometimes the strata are defined via ‘natural’ subpopulations or geographic areas – in which case \(X_i\) is the region the unit belongs to.
We may use the stratification to produce strata which are as homogeneous as possible, with the aim of having improved efficiency of the final estimates.

One way of producing homogeneous strata is the cumulative \(\sqrt{f}\) rule: form stratum boundaries so that the intervals are equal on the cumulative \(\sqrt{f}\) scale. This requires carrying out the following steps:

Get the frequency distribution \(f(X)\) (histogram) for the auxiliary variable \(X\). (The number of histogram bins \(M\) should be much greater than the number of strata \(H\) required.)
Take the square root of the frequency in each bin: call this \(\sqrt{f_j}\), where \(f_j\) is the frequency in bin \(j\).
Form the cumulative \(\sqrt{f}\) for each bin \(j\), i.e. \(\sum_{k=1}^{j}\sqrt{f_k}\).
Split the total cumulative \(\sqrt{f}\) into \(H\) equal intervals:

i.e. divide the total cumulative \(\sqrt{f}\) by \(H\), call this \(I=\frac{1}{H}\sum_{j=1}^{M}\sqrt{f_j}\), and consider the \(H-1\) numbers \(I,\; 2\times I, \; \ldots, \; \left(H-1\right) \times I\)

For each of these numbers \(h\times I\), look for the histogram bin whose cumulative \(\sqrt{f}\) is closest to \(h\times I\); the stratum boundary is then the right hand end of that histogram bin. (The smallest and largest values of \(X\) are also boundaries.)

This method is an approximation which can be done with hand calculations and, say, a published table showing the frequency distribution in certain intervals.

An example of applying the cumulative \(\sqrt{f}\) rule

Suppose you wish to estimate the number of days injured people are off work by taking a sample of records: this is our variable of interest \(Y\) . Assume that you have information on the number of ACC claims that employers send in each week: this is your auxiliary information \(X\) which you have for every employer – and we have good reason to expect that \(X\) and \(Y\) are correlated, which means that \(X\) will be a good stratification variable.

Suppose that you wish to form four strata. For each of 5000 companies we know \(X\), the number of claims that company makes per week, on average. Table 8.1 sets out these data and the quantities needed for calculating the stratum boundaries.

(Note: These average numbers of claims are decimal values, and \((5,6]\) means the interval 5 to 6 excluding 5 but including 6.)

Table 8.1: \(X\) : Number of ACC claims per week

\(j\)	No. of ACC claims per week	\(f_j\)	\(\sqrt{f_j}\)	\(\sum_{k=1}^j\sqrt{f_k}\)
1	(0-1]	459	21.4	21.4
2	(1-2]	841	29.0	50.4
3	(2-3]	931	30.5	80.9
4	(3-4]	783	28.0	108.9
5	(4-5]	575	24.0	132.9
6	(5-6]	419	20.5	153.4
7	(6-7]	291	17.1	170.4
8	(7-8]	222	14.9	185.3
9	(8-9]	159	12.6	197.9
10	(9-10]	100	10.0	207.9
11	(10-11]	73	8.5	216.5
12	(11-12]	58	7.6	224.1
13	(12-13]	34	5.8	229.9
14	(13-14]	20	4.5	234.4
15	(14-15]	10	3.2	237.6
16	(15-16]	6	2.4	240.0
17	(16-17]	6	2.4	242.5
18	(17-18]	3	1.7	244.2
19	(18-19]	2	1.4	245.6
20	(19-20]	3	1.7	247.3
21	(20-21]	2	1.4	248.8
22	(21-22]	1	1.0	249.8
23	(22-23]	1	1.0	250.8
24	(23-24]	1	1.0	251.8

A histogram of the auxiliary data \(X\) is:

Figure 8.1: ACC Claims Distribution

Since \(\sum_{j=1}^{24}\sqrt{f_j}= 251.8 \approx 252\) and we want \(H=4\) strata, the interval boundaries on the cumulative \(\sqrt{f}\) scale are \(252/4, 2 \times 252/4, 3 \times 252/4\), i.e. \(63, 126, 189\). The actual cumulative \(\sqrt{f}\) numbers nearest these are \(50.4, 132.9, 185.3\), which belong to the bins 1-2 claims a week, 4-5 claims a week, and 7-8 claims a week. Taking the right hand end of each of these bins, the interval boundaries on the \(X\) scale are \(2, 5, 8\). Therefore the four strata we form are \((0,2], (2,5], (5,8], (8,24]\).
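The rule is mechanical enough to code directly. The sketch below is one possible implementation, not a standard library routine: `cum_sqrt_f_boundaries` is a hypothetical name, and the inputs are the bin frequencies and right hand bin ends from Table 8.1.

```python
import math

def cum_sqrt_f_boundaries(upper_edges, freqs, H):
    """Stratum boundaries from the cumulative sqrt(f) rule.

    upper_edges[j] is the right hand end of histogram bin j and
    freqs[j] is the frequency in that bin.
    """
    # Cumulative sum of the square roots of the bin frequencies.
    cum = []
    running = 0.0
    for f in freqs:
        running += math.sqrt(f)
        cum.append(running)

    step = cum[-1] / H   # width of each interval on the cumulative scale
    boundaries = []
    for h in range(1, H):
        target = h * step
        # Bin whose cumulative sqrt(f) is closest to the target.
        j = min(range(len(cum)), key=lambda k: abs(cum[k] - target))
        boundaries.append(upper_edges[j])
    return boundaries

# Frequencies f_j from Table 8.1; bins are (0,1], (1,2], ..., (23,24].
freqs = [459, 841, 931, 783, 575, 419, 291, 222, 159, 100, 73, 58,
         34, 20, 10, 6, 6, 3, 2, 3, 2, 1, 1, 1]
upper_edges = list(range(1, 25))

boundaries = cum_sqrt_f_boundaries(upper_edges, freqs, H=4)
print(boundaries)
```

For these data the function returns the same interior boundaries as the hand calculation. With coarse bins, two targets can fall nearest the same bin, in which case a finer histogram is needed.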

Where we have access to detailed information from a previous Census, powerful computers make it feasible to treat this problem as one of minimizing the variance of the stratified estimator subject to constraints such as stratum size. In practice, however, a survey does not collect just one variable, and it is very likely that different variables would call for different stratifications, which is not practicable. Hence we need to find compromise stratifications, and this is only practical by using computers.

Alternatively, we can consider applying multivariate classification methods to form homogeneous groups within the population which will become our strata, or building blocks for them. Such methods appeal to infinite population models and hence may seem subject to criticisms about the reasonableness of such models. However, having formed the strata, under classical finite population sampling the inferential framework comes from the randomization of the independent samples and not any models used to form the strata. So our inference is still somewhat assumption free.

Moreover, whatever statistical methods we use for forming strata, our aim is to form strata which are robust to changes in the population. Hence we shouldn’t blindly optimize our design on historic data.

8.8 Allocation

Once the strata have been defined we have a total sample size \(n\), which we want to allocate to each stratum in proportions \(p_h\), so that the proportion of the sample allocated to stratum \(h\) is \[\begin{equation} p_h = \frac{n_h}{n} \end{equation}\] and \[\begin{equation} \sum_{h=1}^H p_h=1 \end{equation}\] If we know the allocation proportions \(p_h\) and the sample size \(n\) then the number allocated to stratum \(h\) is the nearest integer to: \[ n_h = p_h n \]

There are several methods for allocating the sample. Some of these are:

Equal Allocation \[ p_h = \frac{1}{H}\ \ \ \text{and}\ n_h = \frac{n}{H}\] Put an equal number of units into each stratum, irrespective of the stratum properties.
Proportional to population size \[ p_h = \frac{N_h}{N}= F_h\ \ \ \text{and}\ n_h = \frac{N_h}{N}\times n \] In large samples this is what you would expect if you took an SRS of the total population and then formed the strata. So this does not produce many gains in efficiency.

This is also called a self weighting design because the weights are the same for all sample members, no matter which stratum: \[ w_h = \frac{N_h}{n_h} = N_h\frac{N}{nN_h} = \frac{N}{n} \] which are the same weights we’d get if we took a SRSWOR. (However the design still needs to be analysed using the Stratified SRS formulae.)

Neyman Allocation \[ p_h = \frac{N_hS_h}{\sum_{k=1}^H N_kS_k}= \frac{F_hS_h}{\sum_{k=1}^H F_kS_k}\] Neyman allocation gives the lowest possible variance at a fixed sample size. Here we are allocating more of the sample to the strata with the greatest variance, but also accounting for the size of each stratum and hence how much it contributes to the overall estimate: recall \[ \text{Var}\left[\widehat{\bar{Y}}_{\text{str}}\right] = \sum_{h=1}^{H} F_h^{2}\left(1 - \frac{n_h}{N_h}\right)\frac{S_h^2}{n_h}\] where \(F_h=N_h/N\). If all the strata had the same or very nearly the same variances then Neyman allocation would be the same as proportional allocation.
Optimal Allocation \[ p_h = \frac{N_hS_h/\sqrt{c_h}}{\sum_{k=1}^H\left(N_kS_k/\sqrt{c_k}\right)} = \frac{F_hS_h/\sqrt{c_h}}{\sum_{k=1}^H\left(F_kS_k/\sqrt{c_k}\right)} \] where the cost of surveying is the sum of a base cost for the whole survey (\(c_0\)), and a varying cost per unit in each stratum (\(c_h\)), i.e. \[ \mbox{Cost} = c_0 + \sum_hc_hn_h \] Optimal allocation gives the lowest possible variance at a fixed survey cost. Note that Neyman allocation is the case where the cost is equal in each stratum.
Allocation for subpopulation estimates: make each \(n_h\) big enough so that you can form accurate stratum estimates, where the strata are subpopulations of interest. Typically Neyman allocation leads to very accurate population estimates but poor subpopulation estimates. One way of achieving a good compromise between population and subpopulation estimates is due to Bankier, which he calls Power Allocation. Neyman allocation is a special case of this.

8.8.1 Allocation example

In a survey of students, 100 students are to be allocated across two strata: undergraduate and postgraduate. It costs twice as much to survey a postgraduate as an undergraduate, and the standard deviation of age (a key design variable) is three times higher amongst postgraduates than among undergraduates. 20% of the student body are postgraduates.

Allocate these 100 students across the two strata using each of the following methods:

Equal Allocation
Proportional Allocation
Neyman Allocation
Optimal Allocation

There are \(H=2\) strata. The information we have is

\(h\)	Stratum	Stratum Fraction \(F_h\)	Cost, \(c_h\)	Std. Dev. \(S_h\)
1	Undergraduate	0.80	\(C\)	\(S\)
2	Postgraduate	0.20	\(2C\)	\(3S\)

Note that for these calculations we don’t actually have to know the value of the undergraduate cost \(C\) , or the standard deviation of undergraduate age \(S\) : just their relative sizes.

We have \(n=100\) students to allocate.

Equal Allocation
\(h\)	Stratum	\(p_h\)	\(n_h=np_h\)	Cost, \(n_hc_h\)
1	Undergraduate	0.50	50	\(50\times C=50C\)
2	Postgraduate	0.50	50	\(50\times 2C=100C\)
Total		1.00	100	\(150C\)

Proportional Allocation
\(h\)	Stratum	\(p_h\)	\(n_h=np_h\)	Cost, \(n_hc_h\)
1	Undergraduate	0.80	80	\(80\times C=80C\)
2	Postgraduate	0.20	20	\(20\times 2C=40C\)
Total		1.00	100	\(120C\)

Neyman Allocation
\(h\)	Stratum	\(F_hS_h\)	\(p_h=F_hS_h/\sum_k F_kS_k\)	\(n_h=np_h\)	Cost, \(n_hc_h\)
1	Undergraduate	\((0.8)(S) = 0.8S\)	\(0.8S/1.4S = 0.57\)	57	\(57\times C=57C\)
2	Postgraduate	\((0.2)(3S) = 0.6S\)	\(0.6S/1.4S = 0.43\)	43	\(43\times 2C=86C\)
Total		\(1.4S\)	1.00	100	\(143C\)

Optimal Allocation
\(h\)	Stratum	\(F_hS_h/\sqrt{c_h}\)	\(p_h \propto F_hS_h/\sqrt{c_h}\)	\(n_h=np_h\)	Cost, \(n_hc_h\)
1	Undergraduate	\((0.8)(S)/\sqrt{C} = 0.80S/\sqrt{C}\)	\(0.80/1.22 = 0.66\)	66	\(66\times C=66C\)
2	Postgraduate	\((0.2)(3S)/\sqrt{2C} = 0.42S/\sqrt{C}\)	\(0.42/1.22 = 0.34\)	34	\(34\times 2C=68C\)
Total		\(1.22S/\sqrt{C}\)	1.00	100	\(134C\)

Total costs are \[ \text{Total cost} = n_1c_1 + n_2c_2 = n_1C + n_2(2C) = (n_1+2n_2)C \]

Relative costs, compared to Equal allocation:

Method	Allocation	Total Cost	Relative Cost
Equal	(50,50)	\(150C\)	1.00
Proportional	(80,20)	\(120C\)	0.80
Neyman	(57,43)	\(143C\)	0.95
Optimal	(66,34)	\(134C\)	0.89

At a fixed sample size, Neyman allocation always gives the lowest variance. In this example it is also 5% cheaper than equal allocation. Proportional allocation is the cheapest, but not the most efficient.
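The four allocations in this example can be computed with a few lines of Python. This is a sketch: the helper `normalise` and the choice of relative units (\(S=1\), \(C=1\)) are assumptions made for illustration, which is harmless because only the relative sizes matter.

```python
import math

def normalise(scores):
    """Scale a list of non-negative scores so that they sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]

F_h = [0.8, 0.2]   # undergraduate, postgraduate stratum fractions
S_h = [1.0, 3.0]   # standard deviations in units of S
c_h = [1.0, 2.0]   # per-unit costs in units of C
n = 100

proportions = {
    "Equal": [1 / len(F_h)] * len(F_h),
    "Proportional": normalise(F_h),
    "Neyman": normalise([F * S for F, S in zip(F_h, S_h)]),
    "Optimal": normalise([F * S / math.sqrt(c) for F, S, c in zip(F_h, S_h, c_h)]),
}

allocation = {}
cost = {}
for method, p_h in proportions.items():
    allocation[method] = [round(n * p) for p in p_h]                    # nearest integer
    cost[method] = sum(nh * c for nh, c in zip(allocation[method], c_h))  # in units of C
    print(method, allocation[method], cost[method])
```

Working with unrounded intermediate values, the optimal allocation comes out as (65, 35) at a cost of \(135C\), rather than the (66, 34) and \(134C\) shown in the table, which rounds \(0.42\) and \(1.22\) before dividing. The other three allocations match the tables exactly.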

8.8.1.1 An example of allocation rules

Table 8.2 displays the difference between Proportional and Neyman Allocation for the ACC data and the four strata chosen by the cumulative \(\sqrt{f}\) rule.

Table 8.2: Number of ACC claims per week: allocation rules

Stratum \(h\)	Size \(N_h\)	Std. Dev. \(S_h\)	Fraction \(F_h\)	\(N_hS_h\)	Equal \(p_h=\frac{1}{H}\)	Proportional \(p_h=F_h\)	Neyman \(p_h=\frac{N_hS_h}{\sum_k N_kS_k}\)
1	1300	0.496	0.260	644.8	0.25	0.26	0.14
2	2289	0.853	0.458	1952.5	0.25	0.46	0.43
3	932	0.852	0.186	794.1	0.25	0.19	0.17
4	479	2.441	0.096	1169.2	0.25	0.10	0.26
Total	5000		1.000	4560.6	1.00	1.00	1.00
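The proportional and Neyman columns of Table 8.2 follow directly from the stratum sizes and standard deviations. The short sketch below recomputes them; the variable names are illustrative.

```python
# Stratum sizes and standard deviations for the four ACC strata (Table 8.2).
N_h = [1300, 2289, 932, 479]
S_h = [0.496, 0.853, 0.852, 2.441]

N = sum(N_h)
NS_h = [Nh * Sh for Nh, Sh in zip(N_h, S_h)]   # N_h * S_h
total_NS = sum(NS_h)

p_equal = [1 / len(N_h)] * len(N_h)
p_proportional = [Nh / N for Nh in N_h]        # F_h
p_neyman = [ns / total_NS for ns in NS_h]

for h in range(len(N_h)):
    print(h + 1, f"{p_equal[h]:.2f}", f"{p_proportional[h]:.2f}", f"{p_neyman[h]:.2f}")
```

Stratum 4 holds under 10% of the companies but receives about 26% of the sample under Neyman allocation, because its standard deviation is roughly three to five times that of the other strata.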

8.8.1.2 Notes

Because of rounding error it is possible that the number of units allocated to the strata is less than the desired overall sample size. In this case it is probably best to add the extra unit to the stratum with the highest variance.
With Neyman allocation and optimal allocation it is possible that the allocated sample size for a stratum is greater than the population size for that stratum. If that happens then:
1. make that stratum a full coverage stratum, i.e. select all population units in it with probability 1. Suppose there are \(N_h\) of them.
2. remove this full coverage stratum from the population, so the population size is now \(N-N_h\), and re-do the Neyman allocation or optimal allocation on the remaining strata, so the sample size to be allocated is now \(n-N_h\).
Note that \(\text{Var}\left[\widehat{\bar{Y}}_{\text{str}}\right]\) can be greater than \(\text{Var}\left[\widehat{\bar{Y}}_{\text{SRS}}\right]\) if the variability between strata is less than the variability within strata.

Note also that the decomposition of variance into within and between stratum components shows why a stratified SRS sample design is usually more efficient than an SRS sample design. With a good choice of stratification, the design effect of stratified SRS under Neyman or optimal allocation is often considerably less than 1.
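The full coverage adjustment described in the notes above can be sketched as a loop that caps any over-allocated stratum and re-allocates the rest. This is one possible implementation under stated assumptions: `neyman_with_full_coverage` is a hypothetical name, all \(S_h\) are assumed positive, allocations are left unrounded, and the total sample size of 2500 is chosen only to trigger the cap.

```python
def neyman_with_full_coverage(N_h, S_h, n):
    """Neyman allocation, making any over-allocated stratum full coverage."""
    H = len(N_h)
    alloc = [0.0] * H
    free = set(range(H))   # strata not yet made full coverage
    remaining = n          # sample size still to be allocated

    while free:
        total = sum(N_h[h] * S_h[h] for h in free)
        trial = {h: remaining * N_h[h] * S_h[h] / total for h in free}
        over = [h for h in free if trial[h] > N_h[h]]
        if not over:
            for h in free:
                alloc[h] = trial[h]
            break
        # Take a census in each over-allocated stratum, then re-allocate the rest.
        for h in over:
            alloc[h] = float(N_h[h])
            remaining -= N_h[h]
            free.remove(h)
    return alloc

# ACC strata from Table 8.2 with a deliberately large total sample size.
N_h = [1300, 2289, 932, 479]
S_h = [0.496, 0.853, 0.852, 2.441]
alloc = neyman_with_full_coverage(N_h, S_h, n=2500)
print([round(a, 1) for a in alloc])
```

Here plain Neyman allocation would assign about 641 units to stratum 4, which has only 479 companies, so that stratum becomes a census and the remaining 2021 units are shared among strata 1 to 3.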

8.9 Calculating Sample Sizes in Stratified SRSWOR

In general in the allocation of a total sample of size \(n\) across strata we can write \[ n_h = p_hn \] where \(p_h\) is the proportion of the sample allocated to stratum \(h\) , and \(\sum_hp_h=1\) .

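Now assume that we want a particular standard error \(\text{SE}\left[\widehat{Y}\right]\) for an estimate of the total. So we can write \[\begin{eqnarray} \text{Var}\left[\widehat{Y}\right] &=& \text{SE}\left[\widehat{Y}\right]^2\\ &=& \sum_{h=1}^H N_h^2 \left(1-\frac{n_h}{N_h}\right)\frac{S_h^2}{n_h}\\ &=& \sum_{h=1}^H N_h^2 \left(1-\frac{p_hn}{N_h}\right)\frac{S_h^2}{p_hn}\\ &=& \frac{1}{n}\sum_{h=1}^H \frac{N_h^2S_h^2}{p_h} - \sum_{h=1}^HN_hS_h^2 \end{eqnarray}\] which can be rearranged to \[\begin{equation} n = \frac{\sum_{h=1}^H \frac{N_h^2S_h^2}{p_h}}{ \text{SE}\left[\widehat{Y}\right]^2 + \sum_{h=1}^HN_hS_h^2} \end{equation}\] This expression is particularly simple for proportional allocation where \[ p_h = \frac{N_h}{N} \] in which case \[\begin{equation} n = \frac{N\sum_{h=1}^H N_hS_h^2}{ \text{SE}\left[\widehat{Y}\right]^2 + \sum_{h=1}^HN_hS_h^2} \end{equation}\] Neyman allocation, where \[ p_h = \frac{N_hS_h}{\sum_{k=1}^H N_kS_k} \] also results in a particularly simple form: \[\begin{equation} n = \frac{\left(\sum_{h=1}^H N_hS_h\right)^2}{ \text{SE}\left[\widehat{Y}\right]^2 + \sum_{h=1}^HN_hS_h^2} \end{equation}\]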
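The sample size formula can be checked numerically by computing \(n\) and then substituting it back into the variance formula. The sketch below does this for the ACC strata; the function names are illustrative and the target standard error of 100 for the total is an arbitrary value chosen for the example.

```python
def required_sample_size(N_h, S_h, p_h, se_total):
    """Total sample size giving standard error se_total for the estimated total."""
    numerator = sum(N ** 2 * S ** 2 / p for N, S, p in zip(N_h, S_h, p_h))
    denominator = se_total ** 2 + sum(N * S ** 2 for N, S in zip(N_h, S_h))
    return numerator / denominator

def variance_of_total(N_h, S_h, n_h):
    """Variance of the stratified SRSWOR estimator of the total."""
    return sum(N ** 2 * (1 - n / N) * S ** 2 / n for N, S, n in zip(N_h, S_h, n_h))

N_h = [1300, 2289, 932, 479]
S_h = [0.496, 0.853, 0.852, 2.441]
target_se = 100.0   # hypothetical target standard error for the total

N = sum(N_h)
total_NS = sum(Nh * Sh for Nh, Sh in zip(N_h, S_h))
p_prop = [Nh / N for Nh in N_h]
p_neyman = [Nh * Sh / total_NS for Nh, Sh in zip(N_h, S_h)]

n_prop = required_sample_size(N_h, S_h, p_prop, target_se)
n_neyman = required_sample_size(N_h, S_h, p_neyman, target_se)

# Substituting n back in should recover the target variance.
check = variance_of_total(N_h, S_h, [p * n_neyman for p in p_neyman])
print(round(n_prop), round(n_neyman), round(check ** 0.5, 6))
```

For the same target standard error, Neyman allocation needs a smaller total sample than proportional allocation. In practice the resulting \(n_h\) are rounded up to integers and checked against the stratum sizes \(N_h\).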

Note that strata may be used in a sample design not so much to control the variance, but to control the sample size in subpopulations. If that is the case it is best to calculate the minimum sample size required separately for each stratum, based on a required standard error for each stratum.