11 views

Uploaded by hassan_401651634

- CH14 Exercises Solutions
- Chapter 3 Methods and Procedures This Chapter
- Relative Valuation (Long Version)
- MPRA Paper 13560
- stats_project.docx
- Interpreting Multiple Regression
- Data Analysis of Electrostatic Charge in Finish Ball Mills
- jurnal
- Chapter 11
- regcorr5
- Lecture 2. Relaxing the Assumptions of CLRM_0
- xyApr6Lec26
- Answers.docx
- Regression
- Correlation and Regression
- High Risk Stress In Kelatan Schools PDF
- 971
- Adapting Teacher Interventions to Student Needs During Cooperative Learning How to Improve Student Problem Solving and Time on-Task
- 320C10
- AMBO201 QAMR Assignment

You are on page 1of 3

CORRELATION ANALYSIS

Aivaz Kamer-Ainur

Mirea Marioara

Ovidius University of Constanta, Faculty of Economics Sciences, Dumbrava Rosie St. 5, code 900613, E-

mail: elenacondrea2003@yahoo.com

Abstract

This paper describes the main errors and limitation associated with the methods of regression and correlation

analysis. Those methods have been developed specifically to study statistical relationships in data series.

Key Words: Assumption, linear regression, linear correlation, multiple regressions, multiple correlations.

Introduction

Regression analysis is concerned with developing the linear regression equation by which the value of a

dependent variable Y can be estimated given a value of an independent variable X.

If simple regression analysis is used, the assumptions for this technique should be satisfied. The assumption

required to develop the linear regression equation and to estimate the value of dependent variable by point

estimation is:

1. The relationship between the two variables is linear.

2. The value of the independent variable is a set at various values, while the dependent variable is a

random variable.

3. The conditional distributions of the dependent variable have equal variances.

If any interval estimation or hypothesis testing is done, additional required assumptions are:

1. Successive observations of the dependent variable are uncorrelated.

2. The conditional distributions of the dependent variable are normal distributions.

The scatter diagram is a graph that portrays the relationship between the two variables and can be used to

observe whether there is general compliance with the assumptions underlying regression analysis. An alternative

graph to determine such compliance is the residual plot, which is a plot of the residuals e = ( ) with

respect to the fitted values . The mathematical criterion generally used to determine the linear regression

equation is the least squares criterion by which the sum of the squared deviations between the actual and

estimated values of the dependent variable is minimized. The standard error of estimate s y , x is the measure of

variability, or scatter, with respect to the regression line. It is used to establish prediction intervals for the

dependent variable.

Interval estimation of the conditional mean of the dependent variable is based on use of the standard error of the

conditional mean sy,x, The standard error of forecast sy(next) is used to construct a complete prediction interval for

an individual value of the dependent variable. By complete, we mean that the uncertainty regarding the value

of the conditional mean is considered in addition the uncertainty represented by the scatter with respect

regression line. When the sample is relatively large, approximate prediction intervals based on use of only the

standard error of estimate are considered acceptable. A final area of inference that we considered was interval

estimation and hypothesis testing concerning the slope 1 of the linear regression model.

The most frequently used measure of relationship for sample data is the sample correlation coefficient r. The

sign of the correlation coefficient indicates the nature (direct or inverse) of the relationship between the two

variables, while the absolute value of the correlation coefficient indicates the extent of the relationship. The

coefficient of determination r2 indicates the proportion of variance in the dependent variable that is explained

statistically by knowledge of the independent variable (and vice versa). The null hypothesis most frequently

tested in correlation analysis is that the population correlation is zero, represented by = 0 . Rejection of this

hypothesis leads to the conclusion that 0 and that the two variables are related.

The value of the dependent variable cannot be legitimately estimated if the value of the independent variable is

outside the range of values in the sample data that served as the basis for determining the linear regression

equation. There is no statistical basis to assume that the linear regression model applies outside of the range of

the sample data.

710

If the estimate of the dependent variable in fact concerns prediction, the historical data used to determine the

regression equation might not be appropriate to represent future relationships. Unfortunately, one can only

sample past data, not future data.

The standard error estimate is by itself not a complete basis for constructing prediction intervals, because the

uncertainly concerning the accuracy of the regression equation, and specifically of the conditional mean is not

considered. The standard error forecast is the complete measure of variability. However, when the sample is

large, use of the standard error of estimate is generally considered acceptable.

If correlation analysis is used, all the assumptions for this technique should be satisfied. These assumptions are:

1. the relationship is linear

2. Both variables are random variables.

3. For each variable, the conditional distributions have equal variances.

4. Successive observations are uncorrelated for both variables.

5. The joint distribution is a bivariate normal distribution.

A significant correlation does not necessarily indicate causation, but rather may indicate a common linkage in a

sequence of events. One type of significant correlation situation is when both variables are influenced by a

common cause and therefore are correlated with one another. For example, individuals with a higher level of

income have both a higher level of savings and a higher level of spending. We might therefore find that there is

a positive relationship between level of savings and level of spending, but this does not mean that one of this

variable cause the other. Another type of situation is one in which two related variables are separated by several

steps in a cause-effect chain of events. An interesting example in the medical field is the following sequence of

events: warm winter, appearance of viruses, and release of the flu.

The existence of warm, the climatic conditions is not itself the cause of the flu, but these conditions is several

steps removed in the cause-effect sequence. For many years, the climatic conditions themselves were thought to

be the cause, and so this disease was called flu.

A significant correlation is not necessarily an important correlation. There is much confusion regarding the

meaning of significant in the popular press. It is usually implied that a relationship that is significant is also

thereby important. However, from the statistical point of view, a significant correlation simply indicates that a

true relationship exists and that the correlation coefficient for the population is different from 0. Significance

is necessary to conclude that a relationship exists, but the coefficient of determination r2 is more useful in

judging the importance of the relationship.

Given a very large sample, a correlation of, say, r = 0,10 can be significantly different from 0 at = 0,05 . Yet

the coefficient of determination of r2 = 0,01 from this example indicated that only 1 percent of the variance of

the dependent variable is statistically explained by knowledge of the independent variable.

In a comparison of simple linear regression analysis and correlation analysis, the principal difference in the

assumption is that in regression analysis there is one random variable, while in correlation analysis, both

variables have to be random. The sapling design for a study therefore should consider the analysis to be

performed. Regression analysis is used when the main objective is to estimate values of the dependent variable,

whereas correlation analysis is used when the main objective is to measure and express the degree of

relationship between the two variables. When both variables are random variables, either regression analysis or

correlation analysis, or both, can usually be applied to the data.

In multiple regression analysis the value of the dependent variable is estimated on the basis of know values of

two or more independent variables, while the extent of the relationship between the independent variables, while

the extent of the relationship between the independent variables taken as a group and the dependent variable is

measured in multiple correlation analysis. For multiple regression analysis the principal assumption is:

1. The relationship can be represented by a linear model

2. The dependent variable is a continuous random variable

3. The variances of the conditional distributions of the dependent variable are all equal (homoscedasticity)

4. Successive observed values of the dependent variable are uncorrelated

5. The conditional distributions of the dependent variable are all normal distributions.

On the general level, the assumptions associated with multiple regressions and multiple correlation analysis has

to be satisfied if the results are to be meaningful. As in simple analysis involving one independent variable, the

assumption of linearity and equality of conditional variances can be investigated by obtaining a residual plot

based on the multiple regression models.

A specific area of concern when there are several independent variables is the possible existence of

multicollinearity. This term describes the situation in which two or more independent variables are highly

correlated with one another. Under such conditions, the meaning of the partial regression coefficient in the

multiple regression equation is unclear. Similarly, the meaning of the coefficients of partial correlation with a

given independent variable is highly negative even through the simple correlation is highly positive. The

statistical procedures that represent attempts to handle the problem of multicollinearity it is sometimes to

711

eliminate one of two highly correlated independent variables from the analysis, recognizing that the two

variables essentially are measuring the same factors. When correlated variables must be included in the analysis,

care must be taken in ascribing practical meaning to the partial regression coefficients and to the coefficients of

partial correlation. However, multicollinearity causes no special problem for inferences associated with the

overall regression model, such as F test for the significance of the regression effect, confidence intervals for the

mean of the dependent variable, and prediction intervals for individual values of the dependent variable. Of

course, for any interval estimates the values of the independent variables should be within the ranges of values

included in the sample data.

Another area of specific concern in multiple regressions and multiple correlation analysis is the possibility that

successive observed values of the dependent variables are correlated rather than uncorrelated. The existence of

such a correlation is called autocorrelation. The assumption that the successive values of the dependent variable

are uncorrelated has already been identified as a principal assumption in simple regression and simple

correlation analysis. However, in simple analysis the existence of such a correlation is easier to observe than a

multiple analysis. Typically, autocorrelation occurs when values of the dependent variable are collected as time

series values, that is, when they are collected in a series of time periods. When successive values of the

dependent variable are correlated values, the point estimate of the dependent variable based on the multiple

regression equation is not affected. However, the standard error associated witch each partial regression

coefficient bk is understated, and the value of the standard error of estimate is understated. The results is that the

prediction and confidence intervals are narrower (more precise) than they should be, and null hypotheses

concerning the absence of relationship are rejected too frequently. In terms of correlation analysis, the

coefficients of multiple determinations and multiple correlations are both overstated in value.

Bibliography

1. Andrei T.,Stancu S.,Pele T.D.- Statistic. Teorie i aplicaii, Editura Economic, 2002

2. Baron, T.Biji, E.Tovissi, L.Isaic-Maniu, A.- Statistic teoretic i economic, Editura Didactic i

Pedagogic, Bucureti, 1996

3. Biji E.,Baron T.- Statistic teoretic i economic, Editura Didactic i Pedagogic, Bucureti, 1966

4. Biji M., Biji E., Lilea E., Anghelache C.- Tratat de statistic, Editura Economic, Bucureti, 2002

5. Bdi, M., Cristache, S.- Statistic- Aplicaii practice, Editura Mondan, 1998

6. Isaic-Maniu, Al, Mitru, C.,Voineagu V.- Statistica pentru managementul afacerilor,Editura

economic, Bucureti, 1997

7. Jaba, E.- Statistic,Editura Economic, Bucuresti, 1998

8. Mihoc, Gh., Craiu, V.- Tratat de statistica matematica, Editura Academiei, Bucuresti, 1976

9. Trebici, V.- Mic enciclopedie de statistic,Editura tiinific, Bucureti, 1985

712

- CH14 Exercises SolutionsUploaded byhelloagainhello
- Chapter 3 Methods and Procedures This ChapterUploaded bychocoholic potchi
- Relative Valuation (Long Version)Uploaded byHernán Covarrubias
- MPRA Paper 13560Uploaded byhuweida
- stats_project.docxUploaded bySyeda Raiha Raza Gardezi
- Interpreting Multiple RegressionUploaded byRalph Wajah Zwena
- Data Analysis of Electrostatic Charge in Finish Ball MillsUploaded byMeteor Itis
- jurnalUploaded byVinsensia Dita
- Chapter 11Uploaded byDaniel Li
- regcorr5Uploaded byramakrishn34aanantha
- Lecture 2. Relaxing the Assumptions of CLRM_0Uploaded byJavohir Vahobov
- xyApr6Lec26Uploaded byIngga Permana
- Answers.docxUploaded byVihaan Kohli
- RegressionUploaded bypaliwalmanoj
- Correlation and RegressionUploaded bysujnahere7435
- High Risk Stress In Kelatan Schools PDFUploaded byMelody Amina Edwards
- 971Uploaded byAndreea Munteanu
- Adapting Teacher Interventions to Student Needs During Cooperative Learning How to Improve Student Problem Solving and Time on-TaskUploaded byElbHuargo
- 320C10Uploaded byKara Mhisyella Assad
- AMBO201 QAMR AssignmentUploaded byAnant Bhargava
- pearson_On the Reconstruction of the Stature of Prehistoric Races.pdfUploaded byPaula Veiga
- Projection and RegressionUploaded bysambi619
- Uji NormalitasUploaded byThsb Desta
- RegressionUploaded byAnonymous m8oCtJB
- laine reed linreg project reportUploaded byapi-251525028
- Arteco nr. 4Uploaded byDeea Melisse
- Prediction of Construction Project Performance using Regression Analysis and Artificial Neural NetworkUploaded byIJRASETPublications
- Part3. 实用教程--Practical Regression and ANOVA using RUploaded byapi-19919644
- 1.2b Fitting Lines With Technology Cheat SheetUploaded bybobjanes13
- Evaluating Semantic Analysis Methods for Short Answer Grading Using Linear RegressionUploaded byGlobal Research and Development Services

- Fiancial System and AuditingUploaded byhassan_401651634
- Consultancy and How to Write a Consultancy ReportUploaded byhassan_401651634
- postgradcvsresearch_08.pdfUploaded bydaskhago
- Reward Strategy ImplementationUploaded byhassan_401651634
- Ryanair FY2017 Annual ReportUploaded byhassan_401651634
- Ryanair FY2017 Annual ReportUploaded byhassan_401651634
- DBA SIA Module BookletUploaded byhassan_401651634
- 6556-24662-1-PBUploaded byhassan_401651634
- Business Analysis ReportUploaded byhassan_401651634
- PhD Academic CVUploaded byAndreeaAlexandraAnton
- corporate financial managementUploaded byhassan_401651634
- Human Resource managementUploaded byhassan_401651634
- Business OrganisationUploaded byhassan_401651634
- 1.pdfUploaded byhassan_401651634
- 1.pdfUploaded byhassan_401651634
- computer platform........new.docUploaded byhassan_401651634
- f2Uploaded byhassan_401651634
- New Microsoft Word Document (3)Uploaded byhassan_401651634
- New Microsoft Word Document (3)Uploaded byhassan_401651634
- Managing ChangeUploaded byhassan_401651634
- Assignment Computer PlatformsUploaded byhassan_401651634
- Assignment Computer PlatformsUploaded byhassan_401651634
- Business Analysis ReportUploaded byhassan_401651634
- Harvard ReferencingUploaded byhassan_401651634
- swot analysisUploaded byhassan_401651634
- Strategic Analysis of Marks and Spencer (M&S) GroupUploaded byhassan_401651634
- Microsoft WordUploaded byhassan_401651634
- Marks And Spencer One Of UK's Leading RetailersUploaded byhassan_401651634
- Human Resource managementUploaded byhassan_401651634

- A Software Development ProcessUploaded byhimanshi
- lesson planUploaded byapi-439144980
- Huizenga, L.a. - Mt 1,1, 'Son of Abraham' as Chris to Logical Category - HBT 30(2008), 103-113Uploaded bySteveMTN
- Swami Sivananda-A Modern SageUploaded by.Equilibrium.
- ‘It Made Me Feel Powerful’ Women’s Gendered Embodiment and Physical Empowerment in the Martial ArtsUploaded byLuis Lanca
- Vastu Purusha Space GridUploaded byGaurav Pahuja
- Me and DambudzoUploaded byfanouska
- Grof Stanislav Modern Consciousness ResearchUploaded byMojka Magu
- LEAP ResearchUploaded byfuturearuna
- Glory- Phsics 32.1 ReportUploaded byGabriel Rafael S. Viray
- Teaching Language ConstructionUploaded byMerita Devi Agni
- Understandingof5Swithinajapanese.pdfUploaded byGLENDA ROSE BORBAJO
- st pierre choral responding handoutUploaded byapi-313154885
- Bộ Đề Thi Thử TN Quốc Gia (With KeyUploaded byBlissKed
- Yaga Laughter Funny Funny Healing PowerUploaded byBob Cook
- cep reflectionUploaded byapi-302381669
- Au Awards & Achievements Criteria HandbookUploaded byarlsphinx
- Block, Ned - The Mind as Software in the BrainUploaded bychocopandini
- PowerPoint - Problem SolvingUploaded byRonny Aditya Tampake
- Retail Module 1- Customer ServiceUploaded byIspas Ioana
- MIS Module 1Uploaded byjsji14
- OperationalismUploaded bythadikkaran
- Term Paper Cross Cultural IssuesUploaded byRaghuram Indupuri
- Why Study Philosophy_Peter HackerUploaded byjoakinen
- Acct 530Uploaded bysocalsurfy
- praise and worship songsUploaded byshandawanda
- A system of logicUploaded byCezara Enăștescu
- Bennett Consolidating Music ScenesUploaded bymaestro_barth
- Cosmetics in Ayurveda Concept and PracticeUploaded byAnanthram Sharma
- A conceptual framework for implementation fidelity.pdfUploaded bySalvador