Regression Analysis

CHAPTER 14

1. Suppose the president of an advertising agency ran an experiment on the perception of a set of ads for a grocery product. Each participant was randomly drawn from the population and randomly assigned to one of three groups. Group 1 was a control group that watched a television show segment and then filled out a questionnaire about their perception of the product. Group 2 saw the television show segment, including a very inexpensive-to-produce commercial, then filled out the same questionnaire. Group 3 saw the television show segment, including a very expensive commercial, then filled out the same questionnaire as the other groups. We can symbolize the experiment as follows:

R   X1   O
R   X2   O
R   X3   O

a. Assuming interval data, how would you instruct the analyst to analyze the data for the agency president?
b. Suppose you were not sure of the scale of the data. What would you do?
c. Suppose the main response measure was nominally scaled, such as 1 = I like the commercial; 2 = Not sure; 3 = Can't stand the commercial. What would you suggest?

Answer
a. ANOVA. If the overall F-test is significant, the group means can be compared pairwise with t-tests and shown graphically on the response scale. The purpose is to show whether Group 2 and Group 3 were significantly more favorable to the product than Group 1; if so, then some advertising would be justified. If, in addition, Group 3 is significantly more favorable than Group 2, then, perhaps, the expensive commercial is justified.
b. Graph the distributions and test for continuity of each distribution as well as homogeneity of variance.
c. Here a nonparametric ANOVA such as Kruskal-Wallis would be appropriate. Cross tabulation could also be used.

2. A retail grocery chain owner had an experiment done to test the response (change in sales) to a set of price changes.
He also felt that two different types of stores were represented in his chain: (1) highly advertised, large stores, and (2) small neighborhood stores. He used three levels of price (price raised 40 percent, no change, price cut 40 percent). His niece suggested that he use analysis of covariance to analyze the results of his experiment, with overall store sales as a covariate, to reduce the variance due to outside effects on store traffic and to account for the different sizes of stores. He argued that he could understand that he had a two-way analysis of variance (that is, price change and store type), but could not see how it was analysis of covariance.

a. Explain to the store owner how analysis of covariance differs from analysis of variance.
b. Which model would you prefer using? Why?
c. Do you think your choice in part (b) would lead to a better decision on pricing policy if you were the store chain owner?
d. If you divided the dependent variable by the covariate to give a standardized change score of sales, could you use a two-way ANOVA?

Answer
a. You are adding another explanatory variable to account for variance in the sales of the price-manipulated products that is due to changes in overall store sales (e.g., weather; construction in the street in front of a particular store).
b. ANCOVA. It gives more control over the manipulated situations by removing variance due to extraneous effects (i.e., nature).
c. Yes, by identical logic. Further, one could ignore most of the special pleading about bad weather, etc., from the various store managers, since it was controlled for.
d. Yes. It is a clever centering of the data, but not as revealing an explanation.

4. Discuss the types of situations where covariance analysis, analysis of variance, and dummy variable regression would be substitutable, and contrast these with situations where they are not. In the case of nonparametric data, would the same reasoning apply?

Answer
Answers will vary according to the field or major area of each student. (Note: Nonparametric analogues are difficult to generalize. They are generally situation specific.)
5. A sample survey of boat owners in Southeast Florida has yielded the following data on boat costs and buyer income.

______________________________________________________________
Respondent    Boat Cost, Y (000's)    Buyer Income, X (000's)
______________________________________________________________
 1             2.6                     20.2
 2            10.3                     30.5
 3            40.0                    100.0
 4            12.3                     38.1
 5             3.0                     16.4
 6             5.6                     20.2
 7             1.0                     14.6
 8             9.4                     40.2
 9             7.8                     23.4
10             3.5                     17.6
11             5.2                     19.3
12            50.1                    130.4
______________________________________________________________

a. Compute a linear regression of Y on X. How do you interpret the results?
b. Compute the R2 value. Interpret this measure.
c. Suggest possible uses of this regression equation if you were a boat seller in Florida.

Answer
a. The regression model for the boat data is calculated as:

Y' = -3.99 + .423(X)

with an overall F-value of 894.81 (p < .001). The regression weight attached to buyer income is statistically significant at the .01 level (p < .001). Its interpretation is that for every one-thousand-dollar increase in buyer income, the cost of the boat purchased increases by approximately $423 over the relevant range of X.
b. R-squared is equal to .99, meaning that nearly all of the variance in boat cost (99%) is accounted for by buyer income.
c. The results are of substantive significance to boat sellers because buyers may be qualified primarily on the basis of income. In fact, once buyers are qualified, the regression equation may be used to predict the average purchase cost of a boat for a prospective buyer by plugging his or her income into the equation.
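The fit in problem 5 can be reproduced with a few lines of code. The following is a minimal sketch in plain Python, assuming ordinary least squares on the twelve observations in the table; the function name `simple_ols` and the variable names are illustrative and not part of the original answer. Results should land close to the reported equation, with small differences possible from rounding.

```python
# Boat cost (Y) and buyer income (X), both in thousands of dollars,
# copied from the table in problem 5.
income = [20.2, 30.5, 100.0, 38.1, 16.4, 20.2, 14.6, 40.2, 23.4, 17.6, 19.3, 130.4]
cost = [2.6, 10.3, 40.0, 12.3, 3.0, 5.6, 1.0, 9.4, 7.8, 3.5, 5.2, 50.1]


def simple_ols(x, y):
    """Return (intercept, slope, r_squared, f_value) for a least-squares fit of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    # Overall F-statistic with 1 and n - 2 degrees of freedom.
    f_value = r_squared / (1 - r_squared) * (n - 2)
    return intercept, slope, r_squared, f_value


intercept, slope, r_squared, f_value = simple_ols(income, cost)
print(f"Y' = {intercept:.2f} + {slope:.3f}(X)  R2 = {r_squared:.2f}  F = {f_value:.1f}")

# Predicted boat cost (in $000's) for a buyer with an income of $40K.
predicted_40k = intercept + slope * 40
print(f"Predicted cost at X = 40: {predicted_40k:.2f}")
```

A seller could call `intercept + slope * x` for any income `x` inside the observed range (roughly 15 to 130); extrapolating beyond that range is not supported by these data.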
6.
Assume next that the study of boat owners also produced information on the age (Z) of the owner. The data on age are as follows:

______________________________________________________________
Respondent    1   2   3   4   5   6   7   8   9  10  11  12
______________________________________________________________
Age          25  33  45  35  28  26  39  21  31  29  25  55
______________________________________________________________

a. Using the age data together with the boat cost and income data from problem #5, compute a multiple linear regression of Y on X and Z. Interpret the results.
b. If you were told that a boat owner at age 35 had an income of $40K, what boat cost would you predict?
c. Using the computer program of your choice, run a multiple regression. Interpret the R2, standardized beta weights, and statistical significance.

Answer
a. The regression model of Y onto X and Z is:

Y' = -6.38 + .401(X) + .099(Z)

The overall F-value is significant (F(2, 9) = 451.91, p < .001), so we can conclude that the set of predictor variables, as a whole, contributes to the explained variance in Y. Again, buyer income is significant at the .01 level (p < .001), but age is not significant in the model (p = .34).
b. By plugging the appropriate values for each variable into the equation (X = 40, Z = 35), we predict a boat cost of Y' = 13.13. However, since age of the boat owner (Z) is nonsignificant in the model, the more parsimonious model contains only buyer income (X), and the predicted boat cost would be Y' = 12.93, using the model from problem #5 part a: Y' = -3.99 + .423(X).
c. R-squared (R2) for the model in part a of this problem is .99; that is, 99% of the variance in Y is explained by the set of predictor variables selected for the multiple regression. Dividing each regression weight by its standard error (.025 for X and .095 for Z) gives its t-statistic, which is the basis for the significance tests in part a. Standardized beta weights are obtained differently: each weight is multiplied by the ratio of the standard deviation of its predictor to the standard deviation of Y. Standardized betas allow us to compare the relative contribution of each variable. However, in this study age (Z) is not statistically significant, and a comparison is not warranted. The standard error of the regression is 1.72, so the scatter of points about the regression equation is relatively tight, given that Y-bar is 12.6. (Note: A plot of the residuals from the regression could also be used to look for outliers or additional violations of model assumptions.)
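The two-predictor model in problem 6 can be checked without a statistics package. The sketch below, in plain Python, solves the least-squares normal equations in closed form for two predictors using centered sums of squares and cross products. All function and variable names are illustrative assumptions, not part of the original answer. It also reports two quantities that are easy to confuse: the t-statistic (weight divided by its standard error) and the standardized beta (weight multiplied by the ratio of the predictor's standard deviation to that of Y).

```python
from math import sqrt

# Data from problems 5 and 6: income X and cost Y in $000's, age Z in years.
income = [20.2, 30.5, 100.0, 38.1, 16.4, 20.2, 14.6, 40.2, 23.4, 17.6, 19.3, 130.4]
age = [25, 33, 45, 35, 28, 26, 39, 21, 31, 29, 25, 55]
cost = [2.6, 10.3, 40.0, 12.3, 3.0, 5.6, 1.0, 9.4, 7.8, 3.5, 5.2, 50.1]


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of cross products of deviations from the means of u and v."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))


n = len(cost)
s_xx, s_zz, s_yy = cross(income, income), cross(age, age), cross(cost, cost)
s_xz, s_xy, s_zy = cross(income, age), cross(income, cost), cross(age, cost)

# Closed-form solution of the normal equations for two predictors.
det = s_xx * s_zz - s_xz ** 2
b_x = (s_xy * s_zz - s_zy * s_xz) / det
b_z = (s_zy * s_xx - s_xy * s_xz) / det
intercept = mean(cost) - b_x * mean(income) - b_z * mean(age)

# Fit statistics: residual sum of squares, R-squared, overall F(2, n - 3).
sse = s_yy - b_x * s_xy - b_z * s_zy
mse = sse / (n - 3)
se_regression = sqrt(mse)
r_squared = 1 - sse / s_yy
f_value = ((s_yy - sse) / 2) / mse

# t-statistic = weight / standard error (used for significance tests).
se_b_x = sqrt(mse * s_zz / det)
se_b_z = sqrt(mse * s_xx / det)
t_x, t_z = b_x / se_b_x, b_z / se_b_z

# Standardized beta = weight * sd(predictor) / sd(Y) (used to compare predictors).
beta_x = b_x * sqrt(s_xx / s_yy)
beta_z = b_z * sqrt(s_zz / s_yy)

# Part b: predicted cost for a 35-year-old owner with a $40K income.
predicted = intercept + b_x * 40 + b_z * 35

print(f"Y' = {intercept:.2f} + {b_x:.3f}(X) + {b_z:.3f}(Z)")
print(f"R2 = {r_squared:.2f}, F = {f_value:.1f}, SE of regression = {se_regression:.2f}")
print(f"t: X = {t_x:.2f}, Z = {t_z:.2f}; standardized beta: X = {beta_x:.3f}, Z = {beta_z:.3f}")
print(f"Predicted cost at X = 40, Z = 35: {predicted:.2f}")
```

With 9 residual degrees of freedom, a two-tailed test at the .05 level needs |t| above about 2.26, which is why income passes the test and age does not.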
7. A correlation matrix (correlation coefficients and probability level under the hypothesis rho = 0) for a company's sales force (age, years of service, and current sales) is given below. Comment.

                      Age        Years of Service    Current Sales
----------------------------------------------------------------------
Age                   1.00000    .68185              .21652
                      .0000      .0208               .5225
Years of service      .68185     1.00000             .64499
                      .0208      .0000               .0321
Current sales         .21652     .64499              1.00000
                      .5225      .0321               .0000

Answer: This correlation matrix shows that age is correlated with years of service (.68185), and the correlation is significant at the .05 level (p = .0208). Age is not highly correlated with current sales (.21652), and that correlation is not statistically significant at the .05 level (p = .5225). Years of service is fairly highly correlated with current sales (.64499). This relationship is statistically significant at the .05 level (p = .0321).

8. A manufacturer of disposable washcloth/wipes told a retailer that sales for this product category closely correlated with the sales of disposable diapers. The retailer thought he would check this out for his own sales-forecasting purposes. Where might a researcher find data to make this forecast?

Answer: Suppose the researcher says, "Sales of disposable washcloth/wipes can be predicted with knowledge of disposable diaper sales." Is this the right thing to say? The basic rule in correlation is "Correlation does not mean causation." Just because two variables were correlated in the past does not mean that they will be correlated in the future. However, once the limitations of correlation and regression are explained, this would be an appropriate forecasting method. As for where a researcher might find data to make this forecast, it is probably best to obtain the data from retail scanner data or from a syndicated wholesale withdrawal service, and then perform the actual calculations.

9. Interpret the following:
(A) Y = a + bX; Y = 3.5 + .7X, where Y = likelihood of buying a new car, and X = total family income.
(B) Y = a + bX; Y = 3.5 - .4X, where Y = likelihood of buying tickets to a rock concert, and X = age.

Answer:
A. The formula can be used to predict the dependent variable, the likelihood of buying a new car. The estimated Y intercept (a) is 3.5 and the estimated slope of the line (b) is .7. As income rises by one unit ($10,000), the likelihood of buying a new car increases by .7 on the 10-point scale.
B. In this example, the likelihood of buying tickets to a rock concert falls by .4 for each one-unit increase in X (age).

10. The ANOVA summary table below is the result of a regression of sales on year of sales. Is the relationship statistically significant at .05? Comment.

Source of variation          Sum of Squares    d.f.    Mean Square    F-value
--------------------------------------------------------------------------------
Explained by regression      605,370,750       1       605,370,750    3.12
Unexplained by regression    1,551,381,712     8       193,922,714
Total error                  -                 9

Answer: For 1 and 8 degrees of freedom, an F-value of 5.32 is required for statistical significance at the .05 level. Thus, the value 3.12 is not significant at the .05 level.

11. A metropolitan economist is attempting to predict the average total budget of retired couples in Phoenix, based on the average total budget of U.S. urban retired couples. An r2 of .7824 is obtained. Will the regression be a good predictive model?

Answer: If the p-value is equal to or less than .05, then an r2 of .7824 (about 78% of the variance explained) implies that the regression equation is a reasonably good model for prediction.

12. What is the relationship between the Pearson Product Moment Coefficient and Simple Linear Regression?

Answer
The analytic procedure that is widely used to predict or estimate is called linear regression. Linear regression is a method of analyzing the change (variability) of a dependent variable by using information available on one or more independent variables, given certain linear assumptions. We are seeking to answer the question: What are the expected changes in the dependent variable as a result of changes (observed or induced) in the independent variables?

The Pearson Product Moment Correlation Coefficient indicates to us that when two variables are correlated, we can use one to predict the other. Additionally, when one of the variables is designated as dependent and the other independent, the analysis is simple linear regression. With the addition of one or more independent variables, the focus of the analysis shifts to multiple linear regression. Both procedures use mathematics to develop the analysis. Tests of hypotheses can also be used if certain assumptions of the analytics are met.

Linear regression uses the same linear modeling approach and variance partitioning as previously discussed. In the simple linear regression model (SLR), we use pairs of observations, $(X, Y)$. In multiple linear regression (MLR), we use sets of observations $(X_1, X_2, \ldots, X_p, Y)$. The model for SLR is

$Y_i = \alpha + \beta X_i + e_i$

for parameters $\alpha$ and $\beta$, where $\alpha + \beta X_i$ is the fixed part of the individual score $Y_i$ of an individual at a given level of X, and $e_i$ is a random component, unique to individual i. Alpha ($\alpha$), the Y intercept, is the mean of the population when X = 0. Beta ($\beta$) is the regression coefficient in the population, or the slope of the regression line; it expresses the change in the dependent variable Y due to changes in the independent variable X. Also, if we standardize the data, $\beta$ is identical to the correlation of X and Y. The purpose of SLR is to estimate the coefficients $\alpha$ and $\beta$ using sample statistics to obtain a statistically correct, useful managerial equation.
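The link between the slope and the correlation coefficient described in problem 12 can be checked numerically. The sketch below, in plain Python, reuses the boat data from problem 5 as a convenient example; the choice of data set and all function names are illustrative assumptions. It shows that the least-squares slope computed on standardized (z-score) data equals Pearson's r, and that the raw slope equals r times the ratio of the standard deviation of Y to that of X.

```python
from math import sqrt

# Boat cost (Y) and buyer income (X) in $000's, from problem 5.
income = [20.2, 30.5, 100.0, 38.1, 16.4, 20.2, 14.6, 40.2, 23.4, 17.6, 19.3, 130.4]
cost = [2.6, 10.3, 40.0, 12.3, 3.0, 5.6, 1.0, 9.4, 7.8, 3.5, 5.2, 50.1]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def cross(u, v):
    """Sum of cross products of deviations from the means of u and v."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    return cross(x, y) / cross(x, x)


def pearson_r(x, y):
    """Pearson product moment correlation of x and y."""
    return cross(x, y) / sqrt(cross(x, x) * cross(y, y))


def standardize(values):
    """Convert values to z-scores (mean 0, standard deviation 1)."""
    m, sd = mean(values), sample_sd(values)
    return [(v - m) / sd for v in values]


r = pearson_r(income, cost)
raw_slope = ols_slope(income, cost)
standardized_slope = ols_slope(standardize(income), standardize(cost))
slope_from_r = r * sample_sd(cost) / sample_sd(income)

print(f"Pearson r                  = {r:.4f}")
print(f"Slope on standardized data = {standardized_slope:.4f}")
print(f"Raw slope                  = {raw_slope:.4f}")
print(f"r * sd(Y) / sd(X)          = {slope_from_r:.4f}")
```

The first two printed values agree, as do the last two, so in simple linear regression the correlation and the slope carry the same information up to the scaling of the two variables.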
