
Final Project: Baseball Player Statistics
Ashly McLain
University of Central Oklahoma
ORGL 3333: Data Analysis and Interpretation
March 7, 2013

Part I:
In this report I will analyze data on 254 Major League Baseball players to help determine the correlation between a player's salary and his performance, including batting average and home runs. I will also look at the highest-paid player and compare his performance to everyone else's. Below are the qualitative and quantitative variables used in the analysis for this project.

Variable Name      Variable Type   Data Type    Measurement Scale
Name               Qualitative     n/a          Nominal
Team               Qualitative     n/a          Nominal
Salary             Quantitative    Continuous   Ratio
Games Played       Quantitative    Discrete     Ratio
Hits               Quantitative    Discrete     Ratio
Homeruns           Quantitative    Discrete     Ratio
Runs Batted In     Quantitative    Discrete     Ratio
Batting Average    Quantitative    Continuous   Ratio

Part II:
By using several charts built from the quantitative data in this project, we can determine the best way to select the right players at a price that matches their performance per game. The first charts we will look at are the frequency tables. The first is the salary distribution, which shows how many players' salaries fall into each salary class. It shows that the majority of baseball players (133 of 254) are paid $3,500,000.00 or less per year; the next most common class, above $3,500,000.00 and up to $5,500,000.00, holds 46 players. The salaries are divided into 12 classes whose upper limits are spaced $2,000,000.00 apart, as shown in the table below. The descriptive statistics below give the mean, median, and mode salary of the baseball players. Based on these statistics, the best way to judge a player's salary is to go with the mean salary so you don't pay too little or too much.

Salary
Mean               4,689,717.22
Standard Error     301,727.4371
Median             3,500,000
Mode               380,000
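The summary statistics above look like the output of a spreadsheet's descriptive statistics tool. As a minimal sketch, assuming Excel-style definitions (sample standard deviation, standard error = SD / √n, and the adjusted skewness formula used by Excel's SKEW), the same figures can be computed in Python. The function name `describe` and the salary list are hypothetical and for illustration only; the last line checks the report's own standard error against its standard deviation and n = 254.

```python
import math
import statistics


def describe(values):
    """Summary statistics similar to Excel's Descriptive Statistics output.

    Assumes `values` holds at least three numbers that are not all equal
    (the skewness formula divides by (n - 1)(n - 2) and by the standard deviation).
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    # Adjusted Fisher-Pearson skewness, the formula behind Excel's SKEW().
    skewness = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    return {
        "mean": mean,
        "standard_error": sd / math.sqrt(n),  # standard error of the mean
        "median": statistics.median(values),
        "mode": statistics.mode(values),  # most common value (first one on ties)
        "standard_deviation": sd,
        "skewness": skewness,
    }


# Hypothetical salaries, for illustration only (not the report's data set).
example = describe([380_000, 380_000, 500_000, 1_000_000, 3_500_000, 9_000_000])
print(example)

# Consistency check on the report's own numbers: SE = SD / sqrt(n).
print(4_808_744.053 / math.sqrt(254))  # close to the reported 301,727.4371
```

A positive skewness, as in this hypothetical list, means a few large salaries pull the mean above the median.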

Class (upper limit)    Frequency
$3,500,000.00          133
$5,500,000.00          46
$7,500,000.00          17
$9,500,000.00          14
$11,500,000.00         11
$13,500,000.00         17
$15,500,000.00         10
$17,500,000.00         3
$19,500,000.00         0
$21,500,000.00         0
$23,500,000.00         3
$25,500,000.00         0
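Relative and cumulative frequencies make the salary table easier to read as shares of all players. The sketch below uses only the class limits and frequencies from the table above; the helper name `frequency_summary` is hypothetical, and it assumes each listed amount is the upper limit of its class.

```python
# Salary frequency table: upper class limits in dollars, spaced
# $2,000,000 apart from $3,500,000 to $25,500,000.
upper_limits = list(range(3_500_000, 25_500_001, 2_000_000))
frequencies = [133, 46, 17, 14, 11, 17, 10, 3, 0, 0, 3, 0]


def frequency_summary(limits, counts):
    """Return the total count and (limit, count, relative, cumulative) rows."""
    total = sum(counts)
    rows = []
    running = 0
    for limit, count in zip(limits, counts):
        running += count
        rows.append((limit, count, count / total, running / total))
    return total, rows


total, rows = frequency_summary(upper_limits, frequencies)
print(f"players: {total}")
for limit, count, relative, cumulative in rows:
    print(f"up to ${limit:,}: {count:3d}  {relative:6.1%}  cumulative {cumulative:6.1%}")
```

The first row shows that about 52% of the 254 players earn $3,500,000 or less, and the cumulative column shows how quickly the remaining classes thin out.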

Figure: Salary Distribution (histogram of the salary frequency table; x-axis: salary classes, y-axis: amount of players)

As you can see, we can't make a decision based on salary alone; we need to compare it to something, and I have chosen batting average and home runs. Let's look at batting average first.

The Batting Average bar chart shows the batting-average classes and the frequency of players in each class. Grouping the data into intervals called classes and recording the number of observations that fall into each class gives a frequency distribution of six classes covering the batting averages of all 254 players.

Class limit (batting average)    Frequency
0.09                             0
0.15                             0
0.21                             4
0.27                             86
0.33                             161
0.39                             3
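Grouping observations into classes can be sketched in a few lines. This assumes the tables were produced the way Excel's Histogram tool counts: each class limit is an upper bound, a value goes into the first class whose limit is at or above it, and an extra "More" bucket catches anything above the last limit. The function name and the batting averages below are hypothetical, for illustration only.

```python
from bisect import bisect_left


def excel_style_histogram(values, upper_limits):
    """Count values per class using upper class limits.

    Class k holds values v with upper_limits[k - 1] < v <= upper_limits[k]; the
    first class takes everything at or below the first limit, and one extra
    "More" bucket at the end catches values above the last limit.
    `upper_limits` must be sorted in increasing order.
    """
    counts = [0] * (len(upper_limits) + 1)
    for v in values:
        # bisect_left gives the index of the first limit >= v.
        counts[bisect_left(upper_limits, v)] += 1
    return counts


class_limits = [0.09, 0.15, 0.21, 0.27, 0.33, 0.39]  # limits from the table
# Hypothetical batting averages, for illustration only.
sample = [0.201, 0.255, 0.270, 0.284, 0.289, 0.312, 0.345]
print(excel_style_histogram(sample, class_limits))
```

Binary search keeps the lookup fast even with many classes, and a value equal to a limit (0.270 here) lands in that limit's class, matching the "up to and including" reading of the table.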

Figure: Batting Average Frequency (bar chart; horizontal axis: Frequency, vertical axis: class limit batting average)

Based on this chart, the majority of the players (161 of 254) have a batting average between .27 and .33. Some players with a batting average in this range are paid more than $5 million, but not all players with a high batting average make even $1 million, so batting average may not be the best basis for pay.

Let's now look at another comparison, between salary and home runs. The next chart is a scatterplot of the players' salaries against their home runs.

Figure: HR Line Fit Plot (scatterplot; x-axis: HR, y-axis: Salary; series: Salary and Predicted Salary, with linear trend lines). Fitted trend line: y = 27943x + 1E+06, R² = 0.5192.

Based on this chart, the highest-paid players do not have the most home runs; they fall roughly between 250 and 500 home runs. Overall, there is a fairly strong positive correlation between home runs and salary (R² = 0.5192, so r ≈ 0.72).
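The fitted line on the HR Line Fit Plot can be used directly to predict a salary from a home-run total, and the correlation coefficient r is the square root of R² (positive because the slope is positive). A minimal sketch, assuming the chart's displayed coefficients; Excel shows the intercept rounded to 1E+06, so the predictions are approximate. The constant and function names are hypothetical.

```python
import math

# Trend line shown on the HR Line Fit Plot: y = 27943x + 1E+06, R^2 = 0.5192.
# Excel displays the intercept rounded to 1E+06, so predictions are approximate.
SLOPE = 27_943          # additional salary (dollars) per home run
INTERCEPT = 1_000_000   # rounded intercept as displayed on the chart
R_SQUARED = 0.5192


def predicted_salary(home_runs):
    """Salary predicted by the fitted line for a given number of home runs."""
    return SLOPE * home_runs + INTERCEPT


# The slope is positive, so the correlation coefficient is the positive root of R^2.
r = math.sqrt(R_SQUARED)

print(predicted_salary(364))  # Giambi's home-run total from Part III
print(round(r, 3))
```

With Giambi's 364 home runs the line predicts about $11.2 million, less than half of his actual $23,428,571 salary, which illustrates the kind of outlier the conclusion warns about.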

Part III:

A) Salary
Mean                  4,689,717.22
Median                3,500,000
Mode                  380,000
Standard Deviation    4,808,744.053
Skewness              1.299066959
The mean is the starting point to negotiate; offers far below it are too low, and offers far above it are too high.

B) Batting Average
Mean                  0.275527559
Median                0.278
Mode                  0.284
Standard Deviation    0.022528972
Skewness              -0.562095159

C) Highest-paid player
Name: Giambi, Jason
Team: NYY
Salary: $23,428,571
G: 1,705
H: 1,699
HR: 364
RBI: 1,183
AVG: .289

The highest-paid player in the data set is shown above. Giambi does not have the highest numbers in every category, but his pay appears to reflect his consistency: his hits, home runs, and RBIs are high relative to the number of games he has played.

D) The coefficient of determination measures the strength of the relationship between salary and home runs. R² = 0.519, so home runs explain 51.9% of the variation in salary; the remaining 48.1% is due to other factors or chance. The p-value for the slope is 5.81581E-42 < 0.05, so the relationship is statistically significant.

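E) H0: there is no relationship between salary and the explanatory variable (the slope is zero). Ha: there is a relationship (the slope is not zero).
i. Salary and HR: home runs explain 51.9% of the variation in salary; the remaining 48.1% is due to other factors or chance.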

ii. Salary and AVG: batting average explains about 15% of the variation in salary; the remaining 85% is due to other factors or chance.
iii. Determine the appropriate goodness-of-fit regression equation: Salary = B0 + B1(HR) + B2(AVG)
iv. Identify and interpret the following coefficients:
01. What is the value given for the slope?
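P-value for the slope: Salary/AVG 1.52E-10; Salary/HR 5.81581E-42. For Salary/HR, the trend line gives a slope of 27,943, meaning each additional home run is associated with about $27,943 more in salary.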

02. What is the value given for the intercept?


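P-value for the intercept: Salary/HR 3.30614E-06; Salary/AVG 2.67E-07. For Salary/HR, the trend line gives an intercept of about 1E+06 ($1,000,000).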

v. What percentage of variation in the response variable can be explained by the explanatory variable for each pair? Salary/HR: 51.9%. Salary/AVG: about 15%.
vi. Comment on the strength of the relationship, if any, for each pair: both slopes have p-values well below 0.05 (5.81581E-42 for HR and 1.52E-10 for AVG), so both relationships are statistically significant. The relationship between salary and home runs is fairly strong, while the relationship between salary and batting average is weaker.

As shown in this report, salary is related to both home runs and batting average: the better the average and the more home runs, the better the player tends to be paid, but you have to watch out for outliers. These results will help in deciding pay because you can look at the average pay and start negotiating there. My suggestion for deciding on a pay scale would be to start out in the $10 million to $15 million range.
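Questions iii through vi rest on a least-squares line and a significance test for its slope. The sketch below assumes the report's figures came from a simple one-variable regression (such as Excel's Regression tool) and shows how the slope, intercept, and R² are computed, and how R² and n give the t statistic for H0: slope = 0. The function names and the small data set are hypothetical; the t statistic uses the report's R² = 0.5192 and n = 254.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1 * x.

    Returns (b0, b1, r_squared), where r_squared = 1 - SSE / SST.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope
    b0 = y_bar - b1 * x_bar        # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1 - sse / sst


def slope_t_statistic(r_squared, n):
    """|t| for H0: slope = 0 in simple regression (n - 2 degrees of freedom).

    Uses t = r * sqrt(n - 2) / sqrt(1 - r^2), written in terms of R^2.
    """
    return math.sqrt(r_squared * (n - 2) / (1 - r_squared))


# Hypothetical data, for illustration only.
b0, b1, r2 = fit_line([1, 2, 3, 4], [2, 4, 5, 8])
print(b0, b1, r2)

# Report's Salary/HR fit: R^2 = 0.5192 with n = 254 players.
t = slope_t_statistic(0.5192, 254)
print(round(t, 2))  # far beyond the two-sided 5% critical value of about 1.97
```

A t statistic near 16.5 with 252 degrees of freedom corresponds to an extremely small p-value, in line with the reported slope p-value for Salary/HR. If SciPy is installed, the two-sided p-value can be computed as `2 * scipy.stats.t.sf(t, 252)`.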
