
Regression Analysis

Incidence of HIV, Alcohol Consumption and Prevalence of Modern Contraception


Evaluating the relationship of HIV, alcohol use and risky sex
In partial fulfillment of the requirement for ECONOMETRICS
Presented to Dr. Cesar Rufino of the Economics Department,
De La Salle University - Manila
By Bon Marie M. Tiangson 11017872
September 6, 2013

TABLE OF CONTENTS
I. INTRODUCTION
   RATIONALE OF THE STUDY
II. METHODOLOGY
   DATA PRESENTATION
III. REGRESSION RESULTS
IV. REGRESSION DIAGNOSTICS
   (ANOVA, NORMALITY, HETEROSKEDASTICITY & MULTICOLLINEARITY)
V. RECOMMENDATION
VI. BIBLIOGRAPHY
VII. APPENDIX

I. INTRODUCTION
In the 1980s, a group of gay men in New York and California started to develop a number of infections and cancers that were seemingly resistant to any treatment. It quickly became obvious that they were all suffering from a common syndrome, although it had yet to be named: acquired immunodeficiency syndrome (AIDS). The discovery of the human immunodeficiency virus (HIV) came soon after, when doctors observed a distinctive feature of all the men's cases: they lacked a particular type of blood cell that is crucial to a healthy immune system (Avert). The virus has been spreading widely ever since. Globally, more than 34 million people were infected with HIV/AIDS in 2012. In 2011, an estimated 2.5 million people were reported to have newly acquired the disease. Since the beginning of the epidemic, an estimated 70 million people have contracted HIV/AIDS and almost 35 million have died of HIV-related causes (WHO).

Global view of people living with HIV (illustration from CBC)

The virus is transmitted through unprotected sexual contact with an infected individual, direct contact with contaminated needles, mother-to-child transmission (infants of HIV-infected mothers) or transfusion of infected blood. According to the World Health Organization (WHO), unsafe sex is the most common medium of HIV transmission.¹ Studies suggest a correlation between alcohol consumption patterns and an increased probability of sexually risky behaviors such as having unprotected sex (Fisher, 2008). Specifically, the studies imply that heavy consumption of alcohol may increase sexually risky behavior by affecting a person's judgment and loosening socially learned constraints, hence excusing behaviors that the person otherwise wouldn't engage in when sober (NIAAA). Other than affecting risk-taking temperament, alcohol can also impair the body's normal immune responses to HIV by suppressing certain cellular activity that fights against bacterial infection² (MacGregor, 1986).

RATIONALE OF THE STUDY


The spread of HIV/AIDS poses a challenge not only to public health but also to the social and economic welfare of a country. The HIV/AIDS pandemic has been associated with unfavorable economic effects. Studies conducted by the World Bank (WB) revealed that HIV/AIDS may potentially subtract 1% a year from the GDP growth rate of some sub-Saharan countries and that it may deflate the GDP of South Africa by as much as 17% in the next decade. Studies on the impact of the prevalence of the disease in Africa also suggested a correlation between the disease and the derailment of certain poverty-reducing efforts. In Kenya, the disease has been wearing away the health benefits the country has gained since its independence (Dixon, 2002). Knowing what factors can increase the risk of infection can help legislators come up with a better policy mix that aids HIV/AIDS prevention and eradication.

1 See The World Health Report 2002: Reducing risks, promoting healthy life.

2 Such as macrophages, which keep the lungs free from infection. In developing countries, HIV/AIDS is often paired with another disease, tuberculosis (Guarneri, 1986).

This paper will empirically determine the relationship of total alcohol consumption and risky sex (prevalence rate of modern contraception) to the number of people living with HIV/AIDS using linear regression and 2006 data on the said variables.

II. METHODOLOGY
The software Gretl was used to empirically determine the relationship between alcohol consumption, the prevalence of modern contraceptives and the number of people living with HIV. HIV is the dependent variable, while alcohol consumption and the prevalence of modern contraception are the independent (explanatory) variables. The model was estimated by Ordinary Least Squares under the assumptions of the Classical Linear Regression Model.

Since it is suggested that high alcohol consumption (the APC variable) increases the number of people living with HIV, we will assume that APC has a positive relationship to the number of people living with HIV. And since the prevalence of modern contraception (CPR) is suggested to lessen the risk of contracting HIV, we will assume that it has a negative relationship to the number of people living with HIV. With this, we formulate our linear regression model:

$$\mathit{HIV}_i = \beta_0 + \beta_1\,\mathit{CPR}_i + \beta_2\,\mathit{APC}_i + u_i$$

and our hypotheses:

$$H_0: \beta_1 = \beta_2 = 0$$
$$H_1: \beta_1 < 0,\ \beta_2 > 0$$

The null hypothesis, H-naught, states that the slope coefficients are zero (0), that is, CPR and APC have nothing to do with HIV, whereas the alternative hypothesis, H-one, is in accordance with our assumptions.
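The model described above can be estimated outside Gretl as a cross-check. The following is a minimal sketch in pure Python, not the procedure actually used for this paper: it assumes that the three columns of Table 1 are paired row by row in the order in which the countries are listed, and the helper names `cross_dev` and `ols_two_regressors` are hypothetical. It solves the normal equations in deviation-from-mean form for a model with an intercept and two regressors.

```python
# Data transcribed from Table 1, assumed to be paired in the listed country order.
hiv = [5100, 60000, 280000, 130000, 440000, 530000, 150000, 7100, 11000,
       250000, 16000, 7400, 78000, 13000, 930000, 110000, 180000, 63000,
       55000, 41000, 35000, 5300, 160000, 62000, 150000, 1100000, 310000,
       210000, 17000, 1400000]
cpr = [51.8, 24.41, 28.99, 38.62, 20.91, 36.64, 28.41, 71.57, 18.38, 21.33,
       19, 34.42, 53.23, 54.04, 44.69, 15.23, 65.41, 59.97, 23.02, 41.06,
       16.88, 47.56, 58.11, 63.45, 17.5, 43.14, 62.89, 71.95, 42.31, 4.04]
apc = [1.38, 1.15, 6.98, 4.52, 4.22, 4.49, 1.73, 4.2, 1.66, 1.78, 3.06, 6.7,
       0.02, 6.61, 1.53, 0.54, 4.18, 0.18, 0.1, 0.01, 0, 5.24, 5.07, 1.55,
       1.33, 11.6, 9.13, 1.4, 0, 0.8]


def cross_dev(a, b):
    """Sum of cross-products of the deviations of a and b from their means."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))


def ols_two_regressors(y, x1, x2):
    """OLS for y = b0 + b1*x1 + b2*x2 using the normal equations in deviation form."""
    s11, s22, s12 = cross_dev(x1, x1), cross_dev(x2, x2), cross_dev(x1, x2)
    s1y, s2y = cross_dev(x1, y), cross_dev(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    n = len(y)
    # The fitted plane passes through the point of means, which pins down b0.
    b0 = sum(y) / n - b1 * sum(x1) / n - b2 * sum(x2) / n
    # Explained sum of squares divided by total sum of squares.
    r_squared = (b1 * s1y + b2 * s2y) / cross_dev(y, y)
    return b0, b1, b2, r_squared


b0, b_cpr, b_apc, r2 = ols_two_regressors(hiv, cpr, apc)
print(f"intercept={b0:.1f}  CPR={b_cpr:.2f}  APC={b_apc:.1f}  R-squared={r2:.6f}")
```

If the row pairing assumption holds, the printed values can be compared directly with the coefficient estimates and R-Squared reported in the regression results.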

DATA
The data used in this paper are 2006 cross-sectional data on the number of people living with HIV, total per capita alcohol consumption and the prevalence of modern contraception for 30 countries. Data on HIV and alcohol consumption were obtained from the WHO database, whereas data on the modern contraception prevalence rate were retrieved from the World Bank (WB) database.

Table 1. Data used

Country                     Number of people    Contraceptive          Total alcohol consumption
                            living with HIV     Prevalence Rate (%)    per capita (in liters)
Azerbaijan                  5100                51.8                   1.38
Benin                       60000               24.41                  1.15
Bosnia and Herzegovina      280000              28.99                  6.98
Burkina Faso                130000              38.62                  4.52
Côte d'Ivoire               440000              20.91                  4.22
Cameroon                    530000              36.64                  4.49
Central African Republic    150000              28.41                  1.73
Cuba                        7100                71.57                  4.2
Djibouti                    11000               18.38                  1.66
Ghana                       250000              21.33                  1.78
Guinea-Bissau               16000               19                     3.06
Guyana                      7400                34.42                  6.7
Iraq                        78000               53.23                  0.02
Kazakhstan                  13000               54.04                  6.61
Malawi                      930000              44.69                  1.53
Mali                        110000              15.23                  0.54
Namibia                     180000              65.41                  4.18
Nepal                       63000               59.97                  0.18
Niger                       55000               23.02                  0.1
Pakistan                    41000               41.06                  0.01
Somalia                     35000               16.88                  0
Suriname                    5300                47.56                  5.24
Swaziland                   160000              58.11                  5.07
Syrian Arab Republic        62000               63.45                  1.55
Togo                        150000              17.5                   1.33
Uganda                      1100000             43.14                  11.6
Uzbekistan                  310000              62.89                  9.13
Viet Nam                    210000              71.95                  1.4
Yemen                       17000               42.31                  0
Zimbabwe                    1400000             4.04                   0.8

Regional situation of countries used in the study: Africa (14) - Benin, Burkina Faso, Côte d'Ivoire, Cameroon, Central African Republic, Ghana, Guinea-Bissau, Malawi, Mali, Namibia, Niger, Swaziland, Togo and Uganda; America (3) - Cuba, Guyana and Suriname; Eastern Mediterranean (6) - Djibouti, Iraq, Pakistan, Somalia, Syrian Arab Republic and Yemen; Southeast Asia (1) - Nepal; Western Pacific (1) - Viet Nam
Table 2. Brief definitions of the variables, what they measure and how they were obtained.³

Number of People Living with HIV
Variable name: HIV (Dependent Variable)
Definition/what it measures: The number of people with HIV infection, whether or not they have developed symptoms of AIDS, estimated to be alive at the end of a specific year.
How it was obtained: Countries produce national estimates of the number of people living with HIV, which are compiled and published annually by UNAIDS and WHO. Standard methods and tools for HIV estimates that are appropriate to the pattern of the HIV epidemic are used. However, to obtain the best possible estimates, judgement needs to be used as to the quality of the data and how representative it is of the population.

Total Per Capita Alcohol Consumption
Variable name: APC (Independent Variable)
Definition/what it measures: Total alcohol per capita consumption (APC) is defined as the total amount of alcohol consumed per adult (15+ years) over a calendar year, in litres of pure alcohol. Recorded alcohol consumption refers to official statistics (production, import, export, and sales or taxation data).
Method of measurement: Recorded adult per capita consumption of pure alcohol is calculated as the sum of beverage-specific alcohol consumption of pure alcohol (beer, wine, spirits, other) from different sources.

Modern Contraception Prevalence Rate
Variable name: CPR (Independent Variable)
Definition/what it measures: Contraceptive prevalence rate is the percentage of women who are practicing, or whose sexual partners are practicing, any form of contraception.
How it was obtained: Obtained by compiling household surveys, including Demographic and Health Surveys by Macro International and Multiple Indicator Cluster Surveys by UNICEF.

Limitations: The data used were chosen because of their availability. The modern contraception prevalence rate was used due to the lack of data on worldwide condom use. Also, because the only available data on the total number of people living with HIV were from 2006, the number of countries covered by this study was limited to 30 (this is also our number of observations), including: Iraq, Kazakhstan, Malawi, Mali, Namibia, Nepal, Niger, Pakistan, Somalia, Suriname, Swaziland, Syrian Arab Republic, Togo, Uganda, Uzbekistan, Viet Nam, Yemen and Zimbabwe (cluster country).

Table 3. Summary Statistics

Variable Name   Mean       Median   Min    Max        Std.Dev.   C.V       Skewness   Ex. kurtosis
HIV             2.27E+05   94000    5100   1.40E+06   3.43E+05   1.5099    2.221      4.136
CPR             39.299     39.84    4.04   71.95      18.988     0.48318   0.094897   -1.1221
APC             3.0387     1.695    0      11.6       2.9702     0.97747   1.0951     0.66035
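For reference, the last three columns of Table 3 are usually defined from the sample central moments, as sketched below. This is a common moment-based definition; whether Gretl applies exactly these estimators (for example, which divisor it uses inside the skewness and kurtosis formulas) is an assumption, not something stated in the Gretl output.

```latex
\[
m_k = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^k, \qquad
\text{skewness} = \frac{m_3}{m_2^{3/2}}, \qquad
\text{excess kurtosis} = \frac{m_4}{m_2^{2}} - 3
\]
\[
\text{C.V.} = \frac{s}{\lvert \bar{x} \rvert}, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
\]
```

As a check on the C.V. column, the HIV row gives 3.4255e+05 / 2.2686e+05, which is approximately 1.51 and agrees with the reported value of 1.5099.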

Table 3 provides the summary statistics of each variable. The mean values (second column) imply that in 2006, on average, a country in the sample had about 226,863 people living with HIV, 39.299% of women (or their partners) practicing modern contraception, and just over 3 liters of alcohol consumed per person. The median (third column) denotes the middle term of the data series: 94,000 individuals for HIV, 39.840% for CPR, and 1.6950 liters for APC. Min and Max (fourth and fifth columns) show the minimum and maximum values in the data. The standard deviation (sixth column) measures how widely the observations are dispersed around the mean; a value of zero (0) would mean that all observations are identical. The standard deviations are 3.43E+05 for HIV, 18.988 for CPR and 2.9702 for APC, and the coefficient of variation (seventh column) shows that HIV in particular is widely dispersed relative to its mean (C.V. of 1.5099), so our dataset isn't tightly grouped. Adding more observations would not necessarily lower the standard deviation, but it would make the estimates more precise. Skewness and kurtosis provide information regarding the data's normality (a normal distribution is bell-curve shaped). Skewness (eighth column) indicates the symmetry of a data set; its value should be zero (0) for the data to be normal. Since the skewness values of HIV (2.2210) and APC (1.0951) are greater than zero, their distributions are right-skewed: most values are concentrated on the left, with a long tail to the right. CPR's skewness value of 0.0949 is close to zero, so its distribution is nearly symmetric with a very slight right skew. Kurtosis, on the other hand, describes the flatness or peakedness (height) of a distribution. Table 3 reports excess kurtosis (kurtosis minus 3), so its value should be zero (0) for the data to be normal. The excess kurtosis of HIV (4.1360) and APC (0.66035) is greater than zero, which means that their distributions are sharper than normal, while that of CPR (-1.1221) is less than zero, which means that it is flatter than normal. Again, it would be ideal to add more observations to the sample to obtain a more reliable picture of the distributions.⁴

3 Gretl and Stata were the software used for data and econometric model analysis.

III. REGRESSION RESULTS


Table 4. Results (dependent variable: HIV)

Variable              Coefficient Estimates
Intercept (cons)      326367 (142544)**
CPR (in %)            -5496.59 (3308.99)
APC                   38340.7 (21154.0)*
R-Squared             0.151388
Adjusted R-Squared    0.088528

Number of observations is 30. Standard errors are in parentheses. ** significant at 5%, * significant at 10%, no star indicates P-Value > 10%
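The significance markers in the results table can be checked by dividing each coefficient by its standard error and comparing the resulting t-ratio with the critical values of Student's t distribution. The sketch below assumes 27 degrees of freedom (30 observations minus 3 estimated parameters) and uses two-tailed critical values rounded from standard t tables; the function name `stars` is hypothetical.

```python
# Coefficient estimates and standard errors as reported in the results table.
estimates = {
    "Intercept": (326367.0, 142544.0),
    "CPR": (-5496.59, 3308.99),
    "APC": (38340.7, 21154.0),
}

# Two-tailed critical values for t with 27 degrees of freedom,
# rounded from standard tables (assumed, not taken from the Gretl output).
T_CRIT_5PCT = 2.052
T_CRIT_10PCT = 1.703


def stars(t_ratio):
    """Return the significance marker used in the results table."""
    if abs(t_ratio) > T_CRIT_5PCT:
        return "**"
    if abs(t_ratio) > T_CRIT_10PCT:
        return "*"
    return ""


t_ratios = {name: coef / se for name, (coef, se) in estimates.items()}
markers = {name: stars(t) for name, t in t_ratios.items()}
for name, t in t_ratios.items():
    print(f"{name:<10} t = {t:7.3f} {markers[name]}")
```

Under these assumptions, the intercept is marked at the 5% level, APC at the 10% level, and CPR is left unmarked, which matches the table.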

For this set of data, both the CPR and APC estimates are insignificant at the 5% level (p > .05). APC, on the other hand, has a partial effect that is significant at the 10% level (p < .10) in the full model. The signs of both estimates are in accordance with the hypothesized relationships: APC has a positive relationship with HIV; CPR has a negative relationship with HIV. The two variables were able to account for only 15.14% of the variance in the number of people living with HIV (R-Squared measures the explanatory power of the independent variables). Estimated model:

$$\widehat{\mathit{HIV}}_i = 326367 - 5496.59\,\mathit{CPR}_i + 38340.7\,\mathit{APC}_i$$

4 Definitions of underlined words are obtained from Walpole's Introduction to Statistics.


Figure: Residuals plotted against the fitted values of HIV (dependent variable) from the regression on APC and CPR (independent variables).

VERDICT: FOR THIS STUDY WE CANNOT CONCLUDE THAT ALCOHOL CONSUMPTION AND THE PREVALENCE OF MODERN CONTRACEPTION ARE RELATED TO THE NUMBER OF PEOPLE LIVING WITH HIV. SINCE THE P-VALUES OF THE PREDICTORS DO NOT MEET THE STANDARD P-VALUE OF .05, WE FAIL TO REJECT THE NULL.


IV. REGRESSION DIAGNOSTICS


A. Analysis of Variance (Using Gretl)

Table 5. ANOVA Table

              Sum of Squares   df   Mean Square
Explained     5.1516E+11       2    2.5758E+11
Unexplained   2.88775E+12      27   1.06954E+11
Total         3.40291E+12      29   1.17342E+11

R-Squared = 5.1516E+11 / 3.40291E+12 = 0.151388
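The quantities in the ANOVA table are linked by simple arithmetic, which can be verified with the short sketch below. It assumes that the sums of squares are on the scale implied by the Gretl output in the appendix (sum of squared residuals of about 2.89e+12), and the variable names are illustrative only.

```python
# Sums of squares from the ANOVA table, at the magnitudes implied by the Gretl output.
ess = 5.1516e11    # explained sum of squares
rss = 2.88775e12   # unexplained (residual) sum of squares
n, k = 30, 2       # number of observations and number of slope coefficients

tss = ess + rss                                   # total sum of squares
r_squared = ess / tss                             # share of variation explained
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
f_stat = (ess / k) / (rss / (n - k - 1))          # ratio of the two mean squares

print(f"TSS = {tss:.5e}")
print(f"R-squared = {r_squared:.6f}, adjusted R-squared = {adj_r_squared:.6f}")
print(f"F({k}, {n - k - 1}) = {f_stat:.4f}")
```

The resulting R-Squared, adjusted R-Squared and F statistic can be compared with the values listed in the Gretl output in the appendix.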

Total Sum of Squares (TSS), second column, third row: measures the total variation in the dependent variable (the sum of squared deviations of the dependent variable from its mean). In this model, the total sum of squares is 3.40291E+12. The TSS is fixed by the data; the goal is for the Unexplained Sum of Squares to be as small as possible relative to the TSS. Ideally, it should be zero. Values closer to zero mean that the model has smaller random error. R-Squared is the ratio of the Explained Sum of Squares to the Total Sum of Squares. It measures the goodness of fit of the model: the closer it is to 1, the better. The R-Squared value is 0.151388, so the MODEL IS NOT A GOOD MODEL.

B. Normality (Using Gretl)

The normality test checks whether the errors are normally distributed. If the errors are not normal, this violates the normality assumption of the Classical Linear Regression Model.


Upon visual inspection (informal testing), we can already see that the errors are not normally distributed; the distribution is left skewed.

Table 6. Test for Normality of Residual

Null Hypothesis                  Test Statistic (chi-squared)   p-value
Error is normally distributed    26.1556                        2.09E-06

With formal testing, we can affirm that the errors are not normally distributed. The chi-squared value is 26.1556 with a p-value of 2.09E-06; such a high chi-squared value (p-value below 0.05) indicates that the errors are not normally distributed. REJECT NULL. ERRORS ARE NOT NORMALLY DISTRIBUTED.

C. Heteroskedasticity (Using Gretl)

The homoskedasticity assumption states that the variance of the errors is constant and finite over all the values of the independent variables. Heteroskedasticity, the violation of this assumption, is more commonly encountered in cross-sectional data than in time series data. If the assumption is violated, the standard errors will be biased. In that case we can no longer trust the t-statistics, and the t-test becomes invalid. One can plot the residuals against the predicted values to get an indication of heteroskedasticity. Remember that the homoskedasticity assumption is needed to show the efficiency of OLS; hence, under heteroskedasticity, OLS is no longer BLUE.

Table 7. Breusch-Pagan Test for Linear Heteroskedasticity

Null Hypothesis                     Test Statistic (chi-squared)   p-value
Heteroskedasticity is not present   8.39959                        0.0149987

REJECT NULL. A high chi-squared would indicate that heteroskedasticity is present. With a chi-squared value of 8.39959 and a p-value of 0.0149987 (less than 0.05), linear heteroskedasticity is present.
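The p-values attached to the chi-squared statistics can be reproduced without a statistics package. The sketch below is an illustration, not Gretl's own routine: it uses the closed-form finite series for the upper tail of a chi-squared distribution with integer degrees of freedom, and the function name `chi2_sf` is hypothetical. The degrees of freedom (2 for the normality and Breusch-Pagan tests, 5 for White's test) are taken from the Gretl output in the appendix.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(Chi-square(df) > x) for a positive integer df."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        half = x / 2
        return math.exp(-half) * sum(
            half ** j / math.factorial(j) for j in range(df // 2)
        )
    # Odd df: two-sided normal tail plus a finite series in sqrt(x).
    root = math.sqrt(x)
    normal_density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    series = 0.0
    term = root  # first term: root^1 / 1
    for r in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * r + 1)  # next term: root^(2r+1) / (1*3*...*(2r+1))
    return math.erfc(root / math.sqrt(2)) + 2 * normal_density * series


p_normality = chi2_sf(26.1556, 2)       # test for normality of residual
p_breusch_pagan = chi2_sf(8.39959, 2)   # Breusch-Pagan test
p_white = chi2_sf(7.73648, 5)           # White's test (statistic from the appendix)

for label, p in [("Normality", p_normality),
                 ("Breusch-Pagan", p_breusch_pagan),
                 ("White", p_white)]:
    decision = "reject" if p < 0.05 else "fail to reject"
    print(f"{label:<14} p = {p:.6g} -> {decision} the null at the 5% level")
```

At the 5% level, this decision rule rejects the null for the normality and Breusch-Pagan tests and fails to reject it for White's test.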

Table 8. White's Test for Non-linear Heteroskedasticity

Null Hypothesis                     Test Statistic (chi-squared)   p-value
Heteroskedasticity is not present   7.73648                        0.171369

FAIL TO REJECT NULL. A high chi-squared would indicate that heteroskedasticity is present, but with a chi-squared value of 7.73648 and a p-value of 0.171369 (greater than 0.05), White's test does not detect non-linear heteroskedasticity.

D. Multicollinearity (Using Stata)

The Classical Linear Regression Model assumes that, when there is more than one independent variable, there is no perfect linear relationship among them. Intuitively, when X1 and X2 are highly correlated, a problem arises because the inclusion of both X1 and X2 adds little more information to the model than the inclusion of just one of them. Effectively, we are asking the regression model to estimate an additional parameter, but we are not supplying it with any additional information. Consequences of high multicollinearity: (1) Increased standard errors of the estimates of the βs (decreased reliability). (2) Often confusing and misleading results.

Table 9. Variance Inflation Factor

Variable   VIF    1/VIF
CPR        1.07   0.934200
APC        1.07   0.934200

1/VIF tells you the proportion of an x variable's variance that is independent of all the other x variables. A low proportion, for instance .10, indicates potential trouble. Since our 1/VIF value is 0.934200 (a VIF of 1.07), well above .10, multicollinearity is not a problem in this model.
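The link between VIF, 1/VIF and the correlation between the regressors can be made explicit with a short calculation. The sketch below starts from the 1/VIF value reported by Stata; the function name `vif_from_aux_r2` is hypothetical, and the .10 cut-off is the rule of thumb mentioned above rather than a formal test.

```python
import math


def vif_from_aux_r2(aux_r2):
    """VIF of a regressor, given the R-squared from regressing it on the other regressors."""
    return 1.0 / (1.0 - aux_r2)


tolerance = 0.934200            # 1/VIF reported by Stata for both CPR and APC
aux_r2 = 1.0 - tolerance        # share of a regressor's variance explained by the other
vif = vif_from_aux_r2(aux_r2)

# With only two regressors, the auxiliary R-squared equals the squared
# correlation between them, so its square root is the absolute correlation.
abs_corr = math.sqrt(aux_r2)

# Rule of thumb from the text: a tolerance near .10 or below signals trouble.
is_concern = tolerance < 0.10

print(f"VIF = {vif:.2f}, auxiliary R-squared = {aux_r2:.4f}, |corr(CPR, APC)| = {abs_corr:.4f}")
print("Multicollinearity concern:", is_concern)
```

Only about 6.6% of the variance of each regressor is shared with the other, which is why the VIF is so close to 1.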

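Summary:
(1) Alcohol consumption and the prevalence of modern contraception have no statistically significant relationship at the 5% level with the number of people living with HIV.
(2) The model is not a good model due to its low R-Squared.
(3) Residual errors are not normally distributed.
(4) Linear heteroskedasticity is present in the model according to the Breusch-Pagan test; White's test does not reject the null of no heteroskedasticity.
(5) Multicollinearity is not a problem in the model (VIF of 1.07).
(6) OLS is not BLUE because of the heteroskedasticity detected.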


V. RECOMMENDATION
1. If you want to do a similar test determining the relation of APC and CPR to HIV:
Rerun the tests with more observations (a larger sample) and check whether there are any differences in the p-values. Adding more observations would also give a more reliable picture of the distribution of the data. Using a binary variable for condom use instead of the prevalence of contraception as the measure of risky sex, and using patterns of alcohol consumption per capita instead of total consumption levels per capita, would make the model better; rerun again and check whether there are differences in the p-values. As a last resort, use other estimators such as weighted least squares, which can remain BLUE when heteroskedasticity is present, provided that the weights reflect the error variances.

2. If you are planning to do a study that explains the increasing number of people living with HIV:
It would be best to increase the number of observations (a larger sample size) to decrease the standard errors of the estimates, and to add more explanatory variables (such as income, health expenditure, sexual activity frequency and caloric intake) to increase the R-Squared value, or goodness of fit. It would be better to use binary variables for contraception usage instead of the prevalence of modern contraception as the measure of risky sex (if data is available).


VI. BIBLIOGRAPHY
Diagnostic testing. Retrieved August 26, 2013 from http://www.bi.no/BibliotekFiles/_nedlastingsfiler/eviews/Diagnostic%20testing.pdf

The Classical Linear Regression Model. Retrieved August 26, 2013 from http://irving.vassar.edu/faculty/wl/Econ210/reg210f02.pdf

Dixon, S., McDonald, S., & Roberts, J. (2002). The impact of HIV and AIDS on Africa's economic development. Retrieved from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1122139/

Fisher, J. C., Cook, P. A., Sam, N. E., & Kapiga, S. H. (2008). Patterns of alcohol use, problem drinking, and HIV infection among high-risk African women. Sexually Transmitted Diseases, 35(6), 537-544.

International Center for Alcohol Policies. (n.d.). HIV/AIDS risks and drinking patterns. Retrieved from http://www.icap.org/policytools/icapbluebook/bluebookmodules/24hivaidsrisksanddrinkingpatterns/tabid/182/default.aspx

Gujarati, D. N., & Porter, D. C. (2009). Basic econometrics. Boston: McGraw-Hill.

Heteroscedasticity. Retrieved August 26, 2013 from http://www3.nd.edu/~rwilliam/stats2/l25.pdf

Mendenhall, W., & Sincich, T. (2003). A second course in statistics: Regression analysis (6th ed.).

MacGregor, R. (1986). Alcohol and immune defense. Journal of the American Medical Association, 256(11), 1474-1479.

Walpole, R. (1968). Introduction to statistics. New York: Macmillan.

World Bank. (n.d.). Contraceptive prevalence (% of women ages 15-49). Retrieved from http://data.worldbank.org/indicator/SP.DYN.CONU.ZS

WHO. (n.d.). Indicator and measurement registry. Retrieved from http://apps.who.int/gho/data/view.main


VII. APPENDIX
(1) .txt files from Gretl

Mean dependent var    226863.3    S.D. dependent var    342551.9
Sum squared resid     2.89e+12    S.E. of regression    327038.0
R-squared             0.151388    Adjusted R-squared    0.088528
F(2, 27)              2.408329    P-value(F)            0.109037
Log-likelihood       -421.9227    Akaike criterion      849.8454
Schwarz criterion     854.0490    Hannan-Quinn          851.1902

White's test for heteroskedasticity
  Null hypothesis: heteroskedasticity not present
  Test statistic: LM = 7.73648
  with p-value = P(Chi-square(5) > 7.73648) = 0.171369

Breusch-Pagan test for heteroskedasticity
  Null hypothesis: heteroskedasticity not present
  Test statistic: LM = 8.39959
  with p-value = P(Chi-square(2) > 8.39959) = 0.0149987

Breusch-Pagan test for heteroskedasticity (robust variant)
  Null hypothesis: heteroskedasticity not present
  Test statistic: LM = 2.98834
  with p-value = P(Chi-square(2) > 2.98834) = 0.224434

Test for normality of residual
  Null hypothesis: error is normally distributed
  Test statistic: Chi-square(2) = 26.1556
  with p-value = 2.0911e-06

          Mean          Median      Minimum     Maximum
var1      2.2686e+05    94000.      5100.0      1.4000e+06
var2      39.299        39.840      4.0400      71.950
var3      3.0387        1.6950      0.0000      11.600

          Std. Dev.     C.V.        Skewness    Ex. kurtosis
var1      3.4255e+05    1.5099      2.2210      4.1360
var2      18.988        0.48318     0.094897    -1.1221
var3      2.9702        0.97747     1.0951      0.66035

          5% perc.      95% perc.   IQ range    Missing obs.
var1      5210.0        1.2350e+06  2.4075e+05  0
var2      10.194        71.741      33.833      0
var3      0.0000        10.242      3.9225      0


(2) .log file from Stata

. vif

    Variable |       VIF       1/VIF
-------------+----------------------
        var2 |      1.07    0.934200
        var3 |      1.07    0.934200
-------------+----------------------
    Mean VIF |      1.07
