
CARIBBEAN COMMUNITY SECRETARIAT

RESTRICTED SCCS/2005/30/28

THIRTIETH MEETING OF THE STANDING COMMITTEE OF CARIBBEAN STATISTICIANS Kingston, Jamaica 26-28 October 2005

26 October 2005

CONSTRUCTING A PROXY MEANS TEST USING SURVEY DATA: AN EXPOSITION OF THE METHODOLOGY

By Mr. Anand Persaud

Presented at the Thirtieth Meeting of the Standing Committee of Caribbean Statisticians, Kingston, Jamaica, 26-28 October 2005

INTRODUCTION

A proxy means test is by far the most objective means test for assessing one's eligibility for social welfare assistance (see Grosh (1994) for an assessment of the various mechanisms in the literature). This mechanism assesses each potential beneficiary on the basis of his/her welfare status, rather than on the income or wealth required by the other assessment mechanisms. It uses a scoring formula, built from observable characteristics, to assess the true economic status of each potential beneficiary. For this reason, it is an excellent poverty assessment mechanism. By targeting those most affected, a proxy means test helps ensure that benefits go to the people who need them most and that transfers are made efficiently and transparently. This paper presents an exposition of the methodology for constructing a proxy means test from household/individual data (generated from surveys) by means of regression analysis.

Problems with the Other Means Tests

Usually, a means test assesses one's eligibility for social welfare benefits on the basis of one's income and wealth. This requires a fairly accurate measure of an individual's income, which should include not only earned income but also transfers, remittances and income from informal and other sources. In practice, such means tests suffer from several problems. Firstly, potential beneficiaries have an incentive to understate their income, and verifying that information is difficult in developing countries, since reliable data do not exist. Secondly, income is considered an imperfect measure of welfare in developing countries, since the value of own-produced goods, gifts and transfers, which together comprise a sizable proportion of one's income, is usually excluded because the data are unavailable or, when included, is not measured accurately. Thirdly, the incomes of the poor in developing countries are very volatile, owing to factors ranging from the seasonality of agricultural production to the sporadic nature of employment. This is especially so in rural communities and the informal sector. In light of these issues, any measure of income used to assess the welfare of potential beneficiaries is likely to be highly distorted.

Why a Proxy Means Test

A means test that avoids the problems involved in relying on reported income is therefore appealing. A proxy means test does just that. It uses data on household or individual characteristics as proxies for household income or welfare, and combines them to predict welfare. The obvious advantage of a proxy means test is that good predictors of welfare, such as socio-economic data, demographic data, housing characteristics and ownership of household durables/assets, are easier to collect and verify than direct measures like income. Grosh (1994), in her studies in Latin America, showed that proxy means tests tend to produce the best incidence outcomes when compared with the other targeting mechanisms.

Empirical Evidence

Haddad, Sullivan and Kennedy (1991) used household survey data from Ghana, the Philippines, Mexico and Brazil to show that some household variables, which are very simple to collect, could serve as good proxies for measuring the caloric adequacy of pre-school children. Glewwe and Kanaan (1989) used household and individual data for Côte d'Ivoire to predict welfare levels and demonstrated that simple regression predictions could markedly improve targeting. More recently, Grosh and Glinskaya (1997) used regression analysis with household data from Armenia to show how the targeting outcomes of a cash transfer programme could be improved. Using Living Standards Measurement Study data sets from Jamaica, Bolivia and Peru, Grosh and Baker (1995) explored what kind of information should be included in a proxy means test to improve targeting. Their results showed that using survey data markedly improved targeting. They also found that more information was generally better for the scoring formula, though with diminishing returns.

Designing a Proxy Means Test

Several issues need to be carefully considered when designing a proxy means test. The first is the choice of sample. Potential beneficiaries of welfare benefits tend to come from the lower quintiles of the survey data, so it would be inappropriate to use the full sample: the weights derived would then represent the average characteristics of the wider population. For social welfare programmes, beneficiaries are likely to come from the first to third quintiles, in which case the estimation should include only that segment of the population, so that the weights reflect the characteristics of the target group.
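The sample restriction can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the author's procedure: each household record is assumed to carry per capita consumption and a survey weight under the hypothetical field names `pc_consumption` and `weight`, and the boundary of the third quintile is taken as the lowest consumption value at which the cumulative weighted share reaches 60 percent.

```python
def weighted_cutoff(values, weights, share):
    """Smallest value at which the cumulative weighted share reaches `share`."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative / total >= share:
            return value
    return pairs[-1][0]


def bottom_quintiles(households, n_quintiles=3):
    """Keep households in the lowest `n_quintiles` weighted consumption quintiles.

    Households tied at the boundary value are all kept.
    """
    values = [h["pc_consumption"] for h in households]
    weights = [h["weight"] for h in households]
    cutoff = weighted_cutoff(values, weights, n_quintiles / 5)
    return [h for h in households if h["pc_consumption"] <= cutoff]


# Hypothetical survey records with equal weights.
consumption = [120, 45, 300, 80, 60, 95, 210, 150, 30, 70]
survey = [{"pc_consumption": c, "weight": 1.0} for c in consumption]
target_sample = bottom_quintiles(survey)
print(len(target_sample))  # prints 6
```

With ten equally weighted households, the six poorest (consumption up to 95) form quintiles 1 to 3 and are retained for estimation.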

The second issue involves the variables to be included in the analysis. The choice of sample shapes which variables should be included, because the household/individual characteristics of the entire survey sample could misrepresent those of potential beneficiaries, for the reason pointed out earlier. Handa (2001) pointed out that some variables found in the general population are not prevalent among its poorer segment: motor vehicles, for example, are found in only one percent of the population, and durables such as electric stoves and washing machines are not found in the lower 60 percent of the population. For any social assistance programme, therefore, such items should be excluded from the analysis, and as many items as possible that are correlated with the welfare of the target group should be included. This clearly suggests that sample issues must be settled first, since the variables for the proxy means test are selected from the sample. A word of caution is in order: the more variables there are, the greater the burden of verifying each. But, as Grosh and Baker (1995) pointed out, more information is generally better than less for targeting.
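One simple way to apply the exclusion rule is to compute each item's weighted prevalence within the target sample and drop rare items. The sketch below assumes items are coded 1 (owned) or 0 (not owned); the field names and the 5 percent threshold are hypothetical choices, not values from the source. It covers only the exclusion step; whether a retained item is actually correlated with welfare is still judged in the regression.

```python
def weighted_prevalence(households, item):
    """Weighted share of households with the item coded 1."""
    total = sum(h["weight"] for h in households)
    owning = sum(h["weight"] for h in households if h[item] == 1)
    return owning / total


def screen_items(households, items, min_share=0.05):
    """Drop items whose prevalence in the target sample is below `min_share`."""
    return [i for i in items if weighted_prevalence(households, i) >= min_share]


# Hypothetical target-group records (1 = owns the item, 0 = does not).
target_sample = [
    {"weight": 1.0, "radio": 1, "motor_vehicle": 0, "refrigerator": 0},
    {"weight": 1.0, "radio": 1, "motor_vehicle": 0, "refrigerator": 1},
    {"weight": 1.0, "radio": 0, "motor_vehicle": 0, "refrigerator": 0},
    {"weight": 1.0, "radio": 1, "motor_vehicle": 0, "refrigerator": 0},
]
kept = screen_items(target_sample, ["radio", "motor_vehicle", "refrigerator"])
print(kept)  # ['radio', 'refrigerator']
```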

The third issue has to do with the methodology. Latin American programmes have principally used a qualitative principal components approach, which constructs a proxy indicator of welfare from the characteristics of the household. The other methodology derives a scoring formula using regression analysis. Castano (2001) compared both methodologies using data from Colombia and found that they yield similar weights for the different variables. This paper focuses on the latter methodology, since it is less complex and its estimates are easier to derive and interpret.

The Regression Equation

The general regression equation used to derive the proxy means test takes the form:

$$Y_i = \beta_0 + \beta_i X_i + \beta_j X_j + e_i, \qquad e_i \sim N(0, \sigma^2)$$

where:

$Y_i$ is the log monthly/annual household consumption per person (log per capita consumption expenditure is found to work well in regressions);

$X_i$ is a set of variables describing the household, demographic and durable/asset characteristics, and the location variable;

$X_j$ is a set of variables describing the economic environment of the household (these include employment status, other sources of income, mode of transportation, etc.);

$\beta_i$ and $\beta_j$ are parameters to be estimated; and

$e_i$ is the random error term, assumed to be normally distributed with mean $\mu = 0$ and constant variance $\sigma^2$.

This regression equation is estimated using the Ordinary Least Squares (OLS) method.
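For reference, the OLS estimates have a standard closed form. Stacking the $n$ sample households, let $\mathbf{Y}$ be the vector of log per capita consumption, let each row of $\mathbf{X}$ hold a constant together with that household's $X_i$ and $X_j$ variables, and let $\boldsymbol{\beta}$ collect $\beta_0$, $\beta_i$ and $\beta_j$. Then

$$\hat{\boldsymbol{\beta}}_{OLS} = (\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top \mathbf{Y}.$$

If the weighted regression mentioned under Procedure uses survey sampling weights $w_1, \dots, w_n$ (an assumption, since the weights are not specified), the weighted least squares analogue is

$$\hat{\boldsymbol{\beta}}_{WLS} = (\mathbf{X}^\top \mathbf{W} \mathbf{X})^{-1}\mathbf{X}^\top \mathbf{W} \mathbf{Y}, \qquad \mathbf{W} = \mathrm{diag}(w_1, \dots, w_n).$$

Both require $\mathbf{X}^\top\mathbf{X}$ (or $\mathbf{X}^\top\mathbf{W}\mathbf{X}$) to be invertible, which fails under perfect multicollinearity; this is one reason collinear variables must be screened out.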

In the literature, consumption expenditure is generally considered a more accurate measure of welfare than income, for two major reasons. Firstly, consumption expenditure tends to be less variable than income over time and is more likely to reflect the household's true economic status. Secondly, in practice, consumption is generally measured far more accurately than income in a household survey. Using regression analysis, consumption expenditure (as a measure of household welfare) is regressed onto a set of explanatory variables that are correlated with household/individual welfare.
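The estimation step can be sketched without any statistical package by solving the (weighted) normal equations directly. This minimal Python sketch is an illustration only: the function names and the data are hypothetical, explanatory variables are assumed to be already coded numerically, and no standard errors are computed, so in practice a statistical package would be used. With all weights equal to 1 it reduces to ordinary OLS.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: check for multicollinearity")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_ols(rows, y, weights):
    """Return [b0, b1, ...] solving (X'WX) b = X'Wy; a constant is prepended to each row."""
    x = [[1.0] + list(row) for row in rows]
    k = len(x[0])
    xtwx = [[sum(w * xr[p] * xr[q] for xr, w in zip(x, weights)) for q in range(k)]
            for p in range(k)]
    xtwy = [sum(w * xr[p] * yi for xr, yi, w in zip(x, y, weights)) for p in range(k)]
    return solve_linear_system(xtwx, xtwy)


# Hypothetical data: (urban, household_size) and log per capita consumption
# generated without noise as 5.0 + 0.3*urban - 0.1*household_size.
rows = [(1, 2), (0, 5), (1, 4), (0, 3), (1, 6), (0, 7)]
y = [5.0 + 0.3 * u - 0.1 * s for u, s in rows]
weights = [1.0, 2.0, 1.0, 1.0, 3.0, 1.0]
beta = weighted_ols(rows, y, weights)
print([round(b, 6) for b in beta])  # [5.0, 0.3, -0.1]
```

Because the illustrative data contain no error term, the sketch recovers the generating coefficients exactly; with real survey data the estimates would of course carry sampling error.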

Issues

To predict welfare, the consumption variable is regressed (using OLS) on different sets of explanatory variables. This method is driven primarily by convenience and ease of interpretation, but it has its problems. First, many of the explanatory variables are likely to be endogenous to household welfare; however, because the aim is to identify the poor rather than to explain the causes of their poverty, this is not a major concern. Second, Grosh and Baker (1995) pointed out two problems with using OLS. Strictly speaking, OLS minimizes the squared errors between the true and the predicted levels of welfare, which is a different theoretical problem from that of minimizing poverty. Nevertheless, OLS is considered convenient and useful when a large number of predictor variables is available. Further, the weights derived for the targeting formula are easy to interpret and their economic meaning is clear, making the resulting indicators easy to understand. These features are likely to appeal to policymakers.

Selection of Variables

Selection of variables to predict welfare should take into account two criteria. Firstly, the variables should have a strong relationship with welfare in order to increase the accuracy of prediction. Secondly, the variables should be verifiable with relative ease (for example, by visits to the households). With these in mind, explanatory variables can be placed into different categories. Some suggested categories are as follows:

a. Location. Location refers to urban or rural residence. This is by far the most easily verifiable category.
b. Household demographics. These include union status, household size, age, dependants, sex of head, number of children at school, number of children under 5 years old and the physically challenged. These variables are not difficult to verify and households are less likely to misrepresent such information.
c. Housing characteristics. These include dwelling, land, construction materials, number of rooms/bedrooms, kitchen, toilet facilities, source of drinking water, and garbage disposal. These can all be verified with relative ease.
d. Household durables/assets. These can include radio/stereo, television, VCR, DVD, refrigerator, freezer, telephone (land line), cell phone, personal computer, sofa, table/chairs, bicycle, motorcycle, horse/donkey carts, domesticated animals and even newspapers. Again, these are all easily verifiable. Importantly, these variables tend to have high predictive power for welfare, and including them can therefore reduce mis-targeting.
e. Economic activity. This category includes schooling (education), employment, mode of transportation, other sources of livelihood, remittances, parental/spousal support, pension and disability benefits. These can all be verified with relative ease.
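Many of these variables are categorical (toilet facilities, source of drinking water, construction materials) and must be turned into 0/1 indicators before they can enter the regression. A minimal Python sketch, assuming hypothetical field names and category labels: one indicator is created per category, and one reference category is left out, which avoids perfect collinearity with the regression constant (the so-called dummy variable trap).

```python
def dummy_code(households, variable, reference):
    """One 0/1 column per category of `variable`, omitting the `reference` category."""
    categories = sorted({h[variable] for h in households} - {reference})
    return {
        f"{variable}_{cat}": [1 if h[variable] == cat else 0 for h in households]
        for cat in categories
    }


# Hypothetical housing records; "none" serves as the reference category.
households = [
    {"toilet": "pit"},
    {"toilet": "flush"},
    {"toilet": "none"},
    {"toilet": "pit"},
]
dummies = dummy_code(households, "toilet", reference="none")
print(dummies)  # {'toilet_flush': [0, 1, 0, 0], 'toilet_pit': [1, 0, 0, 1]}
```

Each estimated weight on an indicator is then read relative to the omitted category, here households with no toilet facility.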

Each category may have sub-categories with a number of variables, which, when put together, could result in a very extensive list in excess of seventy variables. However, care should be taken to ensure that they clearly represent the welfare of the targeted beneficiaries.

Missing Data

Sample data may contain missing values. One way to deal with this problem is to exclude those observations from the analysis altogether. However, when a sample is already small, one might not want to lose any observations. Instead of deleting them, the advice is to assign the observations with missing data a value of zero (0) and the others a value of one (1). Another option is to merge or reassign variables to form new, combined variables.
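Both options can be sketched briefly. In this minimal Python illustration, missing responses are marked `None` and all names are hypothetical. Following the text, an indicator takes the value 0 where data are missing and 1 otherwise; filling the missing entry itself with 0 so that the observation can stay in the regression is an added assumption, not something the text specifies. The second function shows one way of merging related items into a combined variable.

```python
def code_missing(values):
    """Return (filled, indicator) for a variable whose gaps are marked None.

    indicator: 0 where the value is missing, 1 otherwise (as described in the text).
    filled: missing entries set to 0 (an assumption, so the row can be kept).
    """
    indicator = [0 if v is None else 1 for v in values]
    filled = [0 if v is None else v for v in values]
    return filled, indicator


def combine_items(*columns):
    """Merge several 0/1 ownership columns into one: 1 if any of the items is owned."""
    return [1 if any(row) else 0 for row in zip(*columns)]


rooms = [3, None, 2, 4, None]  # hypothetical survey responses
rooms_filled, rooms_reported = code_missing(rooms)
print(rooms_filled, rooms_reported)  # [3, 0, 2, 4, 0] [1, 0, 1, 1, 0]

radio = [1, 0, 0]
stereo = [0, 0, 1]
print(combine_items(radio, stereo))  # [1, 0, 1]
```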

Procedure

The original set of variables in the broad categories is introduced into a weighted OLS regression of (log) per capita monthly/annual consumption expenditure. Different subsets of the variables are checked for possible multicollinearity and, simultaneously, for whether the a priori conditions hold. Some variables may then be adjusted or dropped as necessary to reduce these problems. (The pairwise method has proven effective in detecting multicollinearity.) A stepwise regression is then run using the remaining variables. This is designed to eliminate from the regression those variables that are not statistically significant and do not contribute to the model's overall explanatory power (R²).
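A pairwise multicollinearity screen can be sketched as follows. This assumes that the "pairwise method" means inspecting pairwise Pearson correlations between explanatory variables, which is an interpretation rather than something the text states, and the 0.8 threshold is a hypothetical rule of thumb. Flagged pairs are candidates for adjustment or dropping before the stepwise stage.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_collinear_pairs(columns, threshold=0.8):
    """List variable pairs whose absolute pairwise correlation exceeds `threshold`."""
    flagged = []
    for a, b in itertools.combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical explanatory variables for five households.
columns = {
    "rooms": [2, 3, 4, 5, 6],
    "bedrooms": [1, 2, 3, 3, 4],
    "urban": [0, 1, 0, 1, 0],
}
print(flag_collinear_pairs(columns))  # [('bedrooms', 'rooms', 0.971)]
```

Here rooms and bedrooms carry nearly the same information, so one of them, or a combined variable, would normally be kept.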

Different models then evolve, depending on the subset of variables entering the regression. Each is then analysed for its targeting errors, that is, the degree of under-coverage or leakage, before the most efficient model is selected. As would be expected, some eligible persons are likely to be incorrectly excluded (Type I error), while some ineligible persons are incorrectly identified as eligible (Type II error). Under-coverage is calculated by dividing the number of Type I errors by the total number of persons who should receive benefits. Leakage is calculated by dividing the number of Type II errors by the number of persons served by the programme. Under-coverage reduces the impact of the programme on the welfare of the potential beneficiaries, but carries no budgetary cost. Leakage, on the other hand, does not reduce the programme's welfare impact on the potential beneficiaries, but increases programme costs.
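The two error rates can be computed directly from a comparison of true and predicted eligibility. A minimal Python sketch, assuming true eligibility is already known for the sample households (for example, from measured consumption) and both lists are coded 1 (eligible) or 0; the function and variable names are hypothetical.

```python
def targeting_errors(truly_eligible, predicted_eligible):
    """Return (under_coverage, leakage) from two parallel 0/1 lists."""
    pairs = list(zip(truly_eligible, predicted_eligible))
    type_1 = sum(1 for t, p in pairs if t and not p)  # eligible but excluded
    type_2 = sum(1 for t, p in pairs if p and not t)  # ineligible but included
    should_get = sum(1 for t, _ in pairs if t)        # persons who should get benefits
    served = sum(1 for _, p in pairs if p)            # persons served by the programme
    under_coverage = type_1 / should_get if should_get else 0.0
    leakage = type_2 / served if served else 0.0
    return under_coverage, leakage


# Hypothetical sample of ten persons.
truly = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
uc, lk = targeting_errors(truly, predicted)
print(uc, round(lk, 3))  # 0.5 0.333
```

Two of the four eligible persons are missed (under-coverage of 50 percent), and one of the three persons served is ineligible (leakage of about 33 percent).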

Low levels of both leakage and under-coverage would be preferable. In reality, however, a trade-off becomes necessary. If the goal is to give priority to reaching the poor, it becomes more important to minimise under-coverage. On the other hand, if cost saving is the priority, it becomes more important to minimise leakage.
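The trade-off becomes visible when the eligibility cut-off on predicted welfare is varied. In this minimal Python sketch, a household is assumed to be truly eligible if its measured log per capita consumption is at or below a hypothetical poverty line of 4.6, and an applicant is accepted if predicted welfare is at or below the cut-off; all data and names are illustrative.

```python
def error_rates(true_welfare, predicted_welfare, poverty_line, cutoff):
    """Under-coverage and leakage when applicants scoring <= cutoff are accepted.

    A household counts as truly eligible if its measured welfare is at or
    below `poverty_line` (an assumption for illustration).
    """
    eligible = [t <= poverty_line for t in true_welfare]
    accepted = [p <= cutoff for p in predicted_welfare]
    type_1 = sum(e and not a for e, a in zip(eligible, accepted))
    type_2 = sum(a and not e for e, a in zip(eligible, accepted))
    under_coverage = type_1 / max(sum(eligible), 1)
    leakage = type_2 / max(sum(accepted), 1)
    return under_coverage, leakage


# Hypothetical log per capita consumption: measured vs predicted by the formula.
true_welfare = [4.2, 4.4, 4.5, 4.7, 4.9, 5.1]
predicted_welfare = [4.3, 4.7, 4.4, 4.5, 4.8, 5.0]
for cutoff in (4.4, 4.6, 4.8):
    uc, lk = error_rates(true_welfare, predicted_welfare, 4.6, cutoff)
    print(f"cut-off {cutoff}: under-coverage {uc:.2f}, leakage {lk:.2f}")
```

Raising the cut-off from 4.4 to 4.8 lowers under-coverage from 0.33 to 0.00 but raises leakage from 0.00 to 0.40, which is the trade-off described above.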

Finally, the estimated coefficients of the explanatory variables in the chosen model become the weights of the scoring formula, which is then used to determine the eligibility of potential beneficiaries for the programme at hand. This formula assigns a score to each applicant. Since not everyone will be eligible for programme benefits, it is necessary to identify one or more cut-off scores to select eligible beneficiaries. Of course, those whose scores fall within the cut-off points become eligible for programme benefits; the others are rejected. In fact, it is this approach that makes the proxy means test superior to the other means tests.
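Applying the scoring formula to an applicant can be sketched as below. The weights, constant and cut-off are hypothetical values, standing in for coefficients taken from an estimated model of log per capita consumption. The sketch assumes a single upper cut-off, with lower scores (lower predicted consumption) indicating greater need; a programme using several cut-off points would compare the score against a band instead.

```python
def score(applicant, weights, constant):
    """Scoring formula: constant plus the weighted sum of the applicant's characteristics.

    Characteristics absent from the record count as 0 (an assumption).
    """
    return constant + sum(w * applicant.get(name, 0) for name, w in weights.items())


def is_eligible(applicant, weights, constant, cutoff):
    """Treat applicants whose score is at or below the cut-off as eligible."""
    return score(applicant, weights, constant) <= cutoff


# Hypothetical weights, as if taken from an estimated model.
weights = {"urban": 0.3, "household_size": -0.1, "refrigerator": 0.25}
constant = 5.0
applicant_a = {"urban": 0, "household_size": 7, "refrigerator": 0}
applicant_b = {"urban": 1, "household_size": 2, "refrigerator": 1}
for name, applicant in (("A", applicant_a), ("B", applicant_b)):
    print(name, round(score(applicant, weights, constant), 2),
          is_eligible(applicant, weights, constant, cutoff=4.6))
```

Applicant A, a large rural household without a refrigerator, scores about 4.3 and falls below the cut-off; applicant B scores about 5.35 and is rejected.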


REFERENCES

Castano, E. (2001). Proxy Means Test Index for Targeting Social Programs: Two Methodologies and Empirical Results. School of Economics, University of Antioquia.

Glewwe, P. and O. Kanaan (1989). Targeting Assistance to the Poor: A Multivariate Approach Using Household Survey Data. Policy, Planning and Research Working Paper No. 225, World Bank, Washington D.C.

Grosh, M. (1994). Administering Targeted Social Programs in Latin America: From Platitudes to Practice. World Bank, Washington D.C.

Grosh, M. and J. Baker (1995). Proxy Means Tests for Targeting Social Programs: Simulations and Speculations. Living Standards Measurement Study Working Paper No. 118, World Bank, Washington D.C.

Grosh, M. and E. Glinskaya (1997). Proxy Means Testing and Social Assistance in Armenia (Draft). Development Economics Research Group, World Bank, Washington D.C.

Haddad, L., J. Sullivan and E. Kennedy (1991). Identification and Evaluation of Alternative Indicators of Food and Nutrition Security: Some Conceptual Issues and an Analysis of Extant Data (Draft). International Food Policy Research Institute, Washington D.C.

Handa, S. (2001). Notes on a Proposed Scoring Formula for a Proxy Means Test in Jamaica. Inter-American Bank.
