
Parametric versus non-parametric

A potential source of confusion in working out what statistics to use in analysing data is whether your data allows for parametric or non-parametric statistics. The importance of this issue cannot be overestimated! If you get it wrong, you risk using an incorrect statistical procedure or a less powerful one. Non-parametric statistical procedures are less powerful because they use less information in their calculation. For example, a parametric correlation uses information about the mean and deviation from the mean, while a non-parametric correlation uses only the ordinal position of pairs of scores. The basic distinction between parametric and non-parametric is:

If your measurement scale is nominal or ordinal, then you use non-parametric statistics.

If you are using interval or ratio scales, you use parametric statistics.

There are other considerations which have to be taken into account:

You have to look at the distribution of your data. If your data is supposed to take parametric stats, you should check that the distributions are approximately normal. The best way to do this is to check the skew and kurtosis measures from the frequency output from SPSS. For a relatively normal distribution, both values should be close to zero; a common rule of thumb is that each lies within about ±1.0:

skew: between -1.0 and +1.0
kurtosis: between -1.0 and +1.0

If a distribution deviates markedly from normality, then you take the risk that the statistic will be inaccurate. The safest thing to do is to use an equivalent non-parametric statistic.

Non-parametric statistics

Descriptive
Name      For what            Notes
Mode      Central tendency    Greatest frequency
Median    Central tendency    50% split of distribution
Range     Distribution        Lowest and highest value
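As a minimal sketch, the three descriptive measures in the table above can be computed with Python's standard library. The ratings below are hypothetical ordinal scores invented for illustration; they are not data from this document.

```python
from statistics import median, multimode

# Hypothetical ordinal ratings (1 = lowest, 5 = highest); illustration only.
ratings = [2, 3, 3, 4, 1, 3, 5, 4, 2, 3]

modes = multimode(ratings)                   # value(s) with the greatest frequency
mid = median(ratings)                        # 50% split of the ordered scores
value_range = (min(ratings), max(ratings))   # lowest and highest value

print("mode(s):", modes)
print("median:", mid)
print("range:", value_range)
```

None of these measures relies on the mean or on distances between scores, which is why they remain usable on ordinal data.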

Association
Name              For what       Notes
Spearman's Rho    Correlation    Based on rank order of data
Kendall's Tau     Correlation
Chi square        Tabled data
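The difference between a parametric and a rank-based correlation can be sketched in a few lines. The example below uses the five X and Y pairs from the worked correlation example later in this document. The helper names (`pearson`, `ranks`, `spearman`) are illustrative choices, and the rank-based coefficient is computed here as a Pearson correlation of the ranks, which is one common way to obtain Spearman's Rho.

```python
from math import sqrt

def pearson(xs, ys):
    """Parametric correlation: uses means and deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Rank-based correlation: only the ordinal position of each score is used."""
    return pearson(ranks(xs), ranks(ys))

x = [60, 61, 62, 63, 65]
y = [3.1, 3.6, 3.8, 4.0, 4.1]

r = pearson(x, y)      # about 0.91
rho = spearman(x, y)   # 1.0, because Y rises every time X rises

print("Pearson r:", round(r, 4))
print("Spearman rho:", round(rho, 4))
```

Because Y increases whenever X increases, the ranks agree perfectly and the rank-based coefficient is 1.0, even though the parametric coefficient is lower. The rank-based statistic discards the information about how far apart the scores are.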

Descriptive vs. Inferential


Statistical procedures can be divided into two major categories: descriptive statistics and inferential statistics. Before discussing the differences between descriptive and inferential statistics, we must first be familiar with two important concepts in social science statistics: population and sample.

A population is the total set of individuals, groups, objects, or events that the researcher is studying. For example, if we were studying employment patterns of recent U.S. college graduates, our population would likely be defined as every college student who graduated within the past year from any college across the United States.

A sample is a relatively small subset of people, objects, groups, or events that is selected from the population. Instead of surveying every recent college graduate in the United States, which would cost a great deal of time and money, we could instead select a sample of recent graduates and use the findings from that sample to generalize to the larger population.

Descriptive Statistics

Descriptive statistics includes statistical procedures that we use to describe the population we are studying. The data could be collected from either a sample or a population, but the results only help us organize and describe that data. Descriptive statistics can only be used to describe the group that is being studied. That is, the results cannot be generalized to any larger group. Descriptive statistics are useful and serviceable if you do not need to extend your results to any larger group. However, much of social science consists of studies that aim to give us universal truths about segments of the population, such as all parents, all women, all victims, etc.

Frequency distributions, measures of central tendency (mean, median, and mode), and graphs like pie charts and bar charts that describe the data are all examples of descriptive statistics.

Inferential Statistics

Inferential statistics is concerned with making predictions or inferences about a population from observations and analyses of a sample. That is, we can take the results of an analysis using a sample and generalize them to the larger population that the sample represents. In order to do this, however, it is imperative that the sample is representative of the group to which it is being generalized.

To address this issue of generalization, we have tests of significance. A chi-square test or t-test, for example, helps us judge whether a result found in the sample is likely to hold in the population that the sample represents. More precisely, these tests of significance tell us the probability that results like ours could have occurred by chance when there is no relationship at all between the variables in the population we studied.

Examples of inferential statistics include linear regression analyses, logistic regression analyses, ANOVA, correlation analyses, structural equation modeling, and survival analysis, to name a few.
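To make the idea of a test of significance concrete, here is a minimal sketch of a chi-square test on a 2 x 2 table of counts. The counts are hypothetical, the function name `chi_square_2x2` is an illustrative choice, and no continuity correction is applied. The p-value uses the fact that a chi-square statistic with one degree of freedom is the square of a standard normal value.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Chi-square statistic and p-value for a 2 x 2 table of observed counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square >= stat) = erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

# Hypothetical sample counts: rows are two groups, columns are two outcomes.
observed_counts = [[30, 20],
                   [20, 30]]

stat, p_value = chi_square_2x2(observed_counts)
print("chi-square:", round(stat, 3))
print("p-value:", round(p_value, 4))
```

The p-value is the probability of seeing a table at least this uneven by chance if the two variables were unrelated in the population. A small value is taken as evidence against "no relationship".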

This material covers both normal and non-normal data; read only the parts about normally distributed data.

Collection of data: normal distribution

Normally distributed data exhibit predictable traits and probabilities. These characteristics are used to define rules that identify control violations. The most common rules define conditions that would only be expected to occur by chance 0.3% of the time, provided the data are normally distributed. In practice, we are frequently confronted with data that is not normal. It is useful to understand how non-normal data behaves when it is analyzed by tools that are based on the normal distribution.

This discussion will compare the results of 2 data sets with similar means and standard deviations, but different distributions. The data is the total minutes spent in the emergency room, measured from the moment the patient enters the ER to the time recorded when the person is discharged from the ER. The data is total time per person.

The first step is to look at how the data is distributed. This will be done with a process capability chart. There are no specifications for this data, so some of the statistics will be unavailable. However, the value of the chart is to see the shape of the distribution and to verify the mean and variation of the data.

The first chart shows that the data is approximately normally distributed. The mean, mode and median are very close to being equal. The data show very little skewness. The mean is 166.9 and the standard deviation is 76.1, with 24 cases.
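The checks described above (mean close to median, little skewness) can be sketched numerically. The times below are hypothetical values invented for illustration; the 24 ER cases behind the chart are not reproduced in this document. The skewness and kurtosis here are the simple moment-based versions, so they may differ slightly from the sample-adjusted figures that SPSS reports.

```python
from statistics import mean, median

def skewness(values):
    """Moment-based skewness: 0 for a perfectly symmetric distribution."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3

# Hypothetical total minutes in the ER (illustration only).
minutes = [90, 120, 140, 150, 160, 170, 180, 190, 200, 220, 250]

print("mean:", mean(minutes))
print("median:", median(minutes))
print("skewness:", round(skewness(minutes), 3))
print("excess kurtosis:", round(excess_kurtosis(minutes), 3))
```

A mean and median that nearly coincide, together with skewness and excess kurtosis near zero, are consistent with an approximately normal shape. They do not prove it, which is why the discussion goes on to use probability plots.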

The second process capability chart shows a very different picture. The mean for this set of data is 167.2, with a standard deviation of 82.5 and 24 cases. One of the obvious features of this distribution is that it is bimodal. In normally distributed data, the mean = median = mode. The bimodal feature clearly violates this relationship. Bimodal distributions are very common in certain settings. There are many reasons why bimodal or multimodal data may be unavoidable, such as demand for services: there may be natural peaks or modes during certain times of the day or certain days of the week.

Although the two data sets visually appear to exhibit normal and non-normal tendencies, the next examples provide further evidence. The following plots are probability plots. The probability plot draws a theoretical line through the data points and evaluates how closely the actual data points adhere to the theoretical normal distribution. The plot is augmented by the p-value. When the p-value is smaller than a critical value, .05 in this discussion, we reject the hypothesis that the distribution is normal. In this case, the conclusion is that the distribution is non-normal. If the p-value is greater than .05, we do not have enough evidence to reject the hypothesis that the distribution is normal. The following graph is for the same data set that appears visually to be normally distributed in the process capability plot.
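The construction behind a normal probability plot can be sketched as follows. This is only an illustration under stated assumptions: the data are hypothetical, the plotting positions (i - 0.5)/n are one of several conventions in use, and the straightness measure is a plain correlation. It does not reproduce the p-value reported by the charting software used in this discussion.

```python
from math import sqrt
from statistics import NormalDist

def normal_probability_points(values):
    """Pair each ordered observation with its theoretical normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))

def straightness(points):
    """Correlation between quantiles and data; near 1 means a nearly straight plot."""
    n = len(points)
    mean_q = sum(q for q, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    cov = sum((q - mean_q) * (v - mean_v) for q, v in points)
    var_q = sum((q - mean_q) ** 2 for q, _ in points)
    var_v = sum((v - mean_v) ** 2 for _, v in points)
    return cov / sqrt(var_q * var_v)

# Hypothetical total minutes in the ER (illustration only).
minutes = [90, 120, 140, 150, 160, 170, 180, 190, 200, 220, 250]

points = normal_probability_points(minutes)
for quantile, value in points:
    print(f"{quantile:6.3f}  {value}")
print("straightness:", round(straightness(points), 4))
```

Points from normally distributed data fall close to a straight line when plotted this way. Systematic curvature or clusters away from the line are the visual signs of non-normality that the p-value then quantifies.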

The p-value for this data is .467. To reiterate, since .467 > .05, we cannot reject the hypothesis that the data is normal. This is consistent with the visual impression that this data is normally distributed. In contrast, the second set of data follows a very different pattern in the probability plot, as seen in the next graph.

The data points are not scattered randomly about the theoretical line. There is also wider divergence from the line than is shown with the normal set of data. The p-value for this data is .046. Since .046 < .05, we reject the hypothesis that the data is normally distributed.

Now that we have both visual and statistical evidence that one set of data is approximately normally distributed and one is not, we will proceed to see how the different data sets behave in a variable control chart. The data points are individual values. The most appropriate chart is the I-chart. The data for the first data set does not violate any control rules.

Calculation of correlation coefficient:

Correlation Co-efficient Definition: A measure of the strength of linear association between two variables. Correlation will always be between -1.0 and +1.0. If the correlation is positive, we have a positive relationship. If it is negative, the relationship is negative.

Formula: Correlation Co-efficient:

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$$

where
$N$ = Number of values or elements
$X$ = First Score
$Y$ = Second Score
$\sum XY$ = Sum of the product of First and Second Scores
$\sum X$ = Sum of First Scores
$\sum Y$ = Sum of Second Scores
$\sum X^2$ = Sum of squared First Scores
$\sum Y^2$ = Sum of squared Second Scores

Correlation Co-efficient Example: To find the correlation of

X Values    Y Values
60          3.1
61          3.6
62          3.8
63          4
65          4.1

Step 1: Count the number of values. N = 5

Step 2: Find X*Y, X*X and Y*Y for each pair. See the table below.

X Value    Y Value    X*Y      X*X     Y*Y
60         3.1        186      3600    9.61
61         3.6        219.6    3721    12.96
62         3.8        235.6    3844    14.44
63         4          252      3969    16
65         4.1        266.5    4225    16.81

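Step 3: Find the sums $\sum X$, $\sum Y$, $\sum XY$, $\sum X^2$ and $\sum Y^2$.

$\sum X = 311$
$\sum Y = 18.6$
$\sum XY = 1159.7$
$\sum X^2 = 19359$
$\sum Y^2 = 69.82$

Step 4: Now substitute these values into the formula given above.

$$\begin{aligned}
r &= \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}} \\
  &= \frac{(5)(1159.7) - (311)(18.6)}{\sqrt{\left[(5)(19359) - (311)^2\right]\left[(5)(69.82) - (18.6)^2\right]}} \\
  &= \frac{5798.5 - 5784.6}{\sqrt{\left[96795 - 96721\right]\left[349.1 - 345.96\right]}} \\
  &= \frac{13.9}{\sqrt{74 \times 3.14}} = \frac{13.9}{\sqrt{232.36}} = \frac{13.9}{15.24336} = 0.9119
\end{aligned}$$

This example shows how to find the relationship between two variables by calculating the Correlation Co-efficient with the steps above.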
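The four steps above can be checked with a short script. This is a minimal sketch of the same sums-based formula applied to the five pairs from the example. The function name `correlation` is an illustrative choice.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient computed from the five sums used in the steps above."""
    n = len(xs)                                  # Step 1: number of values
    sum_x = sum(xs)                              # Step 3: sum of first scores
    sum_y = sum(ys)                              # sum of second scores
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # sum of products
    sum_x2 = sum(x * x for x in xs)              # sum of squared first scores
    sum_y2 = sum(y * y for y in ys)              # sum of squared second scores
    # Step 4: substitute into the formula.
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

x_values = [60, 61, 62, 63, 65]
y_values = [3.1, 3.6, 3.8, 4.0, 4.1]

r = correlation(x_values, y_values)
print("r =", round(r, 4))  # matches the hand calculation, about 0.9119
```

The sums-based form mirrors the hand calculation, but it subtracts two large, nearly equal numbers in the numerator and in each bracket. For data with many digits, a version based on deviations from the means is usually less sensitive to rounding.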
