
PROJECT TITLE

INDIA

FAR FROM ITS DEVELOPMENT

Department of Statistics
THE UNIVERSITY OF BURDWAN

Submitted By
Roll no.: BUR ST 2009/13
Registration no.: 478 of 2009-10

ACKNOWLEDGEMENT

In order to acquaint us with the applications of statistics, the Department of Statistics, University of Burdwan, has introduced project work to be carried out by each student for the M.Sc. Semester-4 examination. I have carried out my project work, a comparison of India with the G8 countries and China, under the supervision of my teacher Dr. Arindam Gupta of our department.
In this respect, with a deep sense of gratitude, I acknowledge all my teachers of the Department of Statistics, Burdwan University, under whose active guidance I have performed my project work.
I also convey my best regards to all the teachers and other non-teaching members of our department.

I would like to thank all my friends for their hearty co-operation.

Date.

Department of Statistics.
Burdwan University.

Introduction: Although we often say that our country, India, is developing, it is still far from its destination compared with the G8 countries and its big competitor, China. In this project I show how India differs from these countries.
1. Data Description:
2. Objective: To find the hidden causes of the differences between the G8 countries, China and India with respect to various health-related factors and GDP.
3. Sources:
United Nations Population Division. 2009. World Population Prospects: The 2008 Revision. New York: United Nations, Department of Economic and Social Affairs (advanced Excel tables).
Census reports and other statistical publications from national statistical offices.
Eurostat: Demographic Statistics.
Secretariat of the Pacific Community: Statistics and Demography Programme.
U.S. Census Bureau: International Database.
http://www.who.int/immunization_monitoring/routine/en/

4. Description of factors and countries: A total of 8 countries are considered, classified according to the International Monetary Fund's World Economic Outlook Report, April 2010:
Emerging and developing economies: China, India.
Advanced economies: Canada, France, Italy, Japan, United Kingdom, United States.

The variables under consideration are:
1. GDP per person employed (constant 1990 PPP dollars)
2. Labour force, total
3. Birth rate, crude (per 1,000 people)
4. Death rate, crude (per 1,000 people)
5. Immunization, DPT (% of children ages 12-23 months)
6. Life expectancy at birth, total (years)
7. Population growth (annual %)
8. Rural population (% of total population)
9. Urban population (% of total population)

[Figure: the variables grouped under three headings: Development, Health and Population.]

5. METHODOLOGY

First, I plot life expectancy against immunization for each country over the years.

Findings:

For China, life expectancy at first rises as immunization increases and then decreases. The graph for China also suggests autocorrelation, i.e. successive yearly observations are not independent of one another, which indicates that factors other than immunization (such as the time trend) influence life expectancy.

For India, autocorrelation is also present. Notably, life expectancy reaches its highest level, about 65 years, when the immunization level is about 70%, and decreases after that.

Among the G8 countries, all show autocorrelation except Japan.
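The findings above judge autocorrelation from the graphs. As a hedged sketch of how it could be checked numerically (the report does not state that any such statistic was computed), the lag-1 autocorrelation of a yearly series and the Durbin-Watson statistic of regression residuals can be computed as below; the function names and example series are illustrative only.

```python
import numpy as np

def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a series, after removing its mean."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[1:] * d[:-1]) / np.sum(d * d))

def durbin_watson(residuals):
    """Durbin-Watson statistic of regression residuals in time order.

    Values near 2 suggest no first-order autocorrelation; values well below 2
    suggest positive, and values well above 2 negative, autocorrelation.
    """
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e * e))

# Hypothetical series, not taken from the report's data
trend = [1.0, 2.0, 3.0, 4.0, 5.0]                  # smooth upward trend
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]    # sign flips every year

print(lag1_autocorrelation(trend))
print(durbin_watson(alternating))                  # 20/6, about 3.33
```

A Durbin-Watson value far below 2 for residuals ordered by year would support the visual impression of positive autocorrelation.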

To see how death rate and life expectancy change with immunization, I have plotted the crude death rate (per 1,000 people) against life expectancy for each country. The countries are divided into three groups according to their immunization level:

Immunization between 70% and 85%

Immunization below 70%

Immunization greater than 85%

In the first group, the observations for China and Japan are scattered.

In the second group, life expectancy in India is increasing although the death rate per 1,000 people remains more or less the same.

In the third group, which contains the countries with immunization rates greater than 85%, the observations for the G8 countries show a random pattern, except for the United States and the United Kingdom.
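The three immunization groups can be written as a simple rule. This is only a sketch: the text does not say how the boundary values 70 and 85 were treated, so placing them in the middle group is an assumption.

```python
def immunization_group(rate):
    """Return the immunization group for a DPT coverage rate (%).

    Assumption: the boundary values 70 and 85 belong to the middle group.
    """
    if rate < 70:
        return "below 70"
    if rate <= 85:
        return "70 to 85"
    return "above 85"

# Hypothetical coverage figures, only to show how the rule works
for rate in (62.0, 70.0, 85.0, 93.5):
    print(rate, immunization_group(rate))
```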
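Now we use Principal Component Analysis (PCA).
Definition of PCA: Principal Component Analysis is a method that reduces the dimensionality of data through an eigen-analysis of the covariance (or correlation) matrix of the variables. As such, it is suitable for data sets in many dimensions, such as a large gene-expression experiment.
[Figure: an example illustrating how PCA works with a microarray experiment.]
The goal of PCA is to explain the variability in a set of observed measures through as few linear combinations of them as possible; these linear combinations are the principal components.
For example, suppose the data form an $n \times p$ matrix $X$ ($n$ observations on $p$ variables). With $\mathbf{Z}$ the vector of standardized variables and $\mathbf{e}_k$ the eigenvector of the correlation matrix $R$ for its $k$-th largest eigenvalue $\lambda_k$, the $k$-th principal component is
$$Y_k = \mathbf{e}_k^{\top}\mathbf{Z}, \qquad \operatorname{Var}(Y_k) = \lambda_k, \qquad \sum_{k=1}^{p} \lambda_k = p,$$
so $Y_k$ explains the proportion $\lambda_k / p$ of the total variance. Since the variables here are measured in different units, the analysis below uses the correlation matrix.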
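A minimal sketch of PCA on the correlation matrix in Python with NumPy, assuming complete numeric data in an n × p array. It returns the quantities reported in the tables that follow: eigenvalues with their proportions, eigenvectors, loadings (eigenvector entry × √eigenvalue, which is how the factor-pattern values relate to the eigenvectors; for India, 0.37765 × √6.92968 ≈ 0.9941 for Birth_rate) and the number of eigenvalues of at least 1. The function name and the synthetic data are illustrative; this is not the software used for the report.

```python
import numpy as np

def pca_correlation(X, min_eigen=1.0):
    """Principal components of the n x p data matrix X, based on its correlation matrix.

    Returns the eigenvalues (largest first), the eigenvectors (as columns), the
    loadings (eigenvector times sqrt(eigenvalue)), the proportion of the total
    variance per component, and how many eigenvalues are at least `min_eigen`.
    """
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)               # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)           # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)   # drop tiny negative round-off
    eigvecs = eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)          # scales column k by sqrt(lambda_k)
    proportion = eigvals / eigvals.sum()           # eigvals.sum() equals p
    n_keep = int(np.sum(eigvals >= min_eigen))
    return eigvals, eigvecs, loadings, proportion, n_keep

# Hypothetical data: three strongly related variables and one unrelated variable
rng = np.random.default_rng(0)
common = rng.normal(size=50)
X = np.column_stack([common + 0.1 * rng.normal(size=50) for _ in range(3)]
                    + [rng.normal(size=50)])
eigvals, eigvecs, loadings, proportion, n_keep = pca_correlation(X)
print(np.round(eigvals, 3), np.round(proportion.cumsum(), 3), n_keep)
```

Each row of squared loadings sums to 1 (the variance of a standardized variable), and each column of squared loadings sums to that component's eigenvalue.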

First, we apply PCA to the data for India.


Correlations (India; lower triangle of the symmetric matrix, rounded to five decimals)
                                BR        PG        DR        IM        LE        LF        RU        UR
Birth_rate (BR)            1.00000
Population_Growth (PG)     0.97241   1.00000
Death_Rate (DR)            0.96966   0.96900   1.00000
Immunization (IM)         -0.01870  -0.14580  -0.18560   1.00000
Life_Expectency (LE)      -0.99786  -0.97902  -0.97418   0.06072   1.00000
Labour_Force (LF)         -0.99473  -0.98563  -0.97567   0.09748   0.99834   1.00000
Ruralpercentage (RU)       0.99657   0.97965   0.97432  -0.08321  -0.99948  -0.99910   1.00000
Urbanpercentage (UR)      -0.99657  -0.97965  -0.97432   0.08321   0.99948   0.99910  -1.00000   1.00000
Eigenvalues of the Correlation Matrix (India): Total = 8, Average = 1

      Eigenvalue    Difference    Proportion    Cumulative
1     6.92968236    5.92158948        0.8662        0.8662
2     1.00809288    0.97650152        0.1260        0.9922
3     0.03159136    0.00328722        0.0039        0.9962
4     0.02830413    0.02673630        0.0035        0.9997
5     0.00156783    0.00097564        0.0002        0.9999
6     0.00059219    0.00042294        0.0001        1.0000
7     0.00016925    0.00016925        0.0000        1.0000
8     0.00000000                      0.0000        1.0000

2 factors will be retained by the MINEIGEN criterion.
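The Proportion and Cumulative columns follow directly from the eigenvalues: for a correlation matrix the eigenvalues sum to the number of variables (here 8), so each proportion is the eigenvalue divided by 8. A short sketch reproducing India's columns from the reported eigenvalues:

```python
# Eigenvalues of India's 8 x 8 correlation matrix, as reported above
eigenvalues = [6.92968236, 1.00809288, 0.03159136, 0.02830413,
               0.00156783, 0.00059219, 0.00016925, 0.00000000]

total = sum(eigenvalues)          # trace of a correlation matrix = number of variables
cumulative = 0.0
table = []
for k, lam in enumerate(eigenvalues, start=1):
    proportion = lam / total      # share of the total (standardized) variance
    cumulative += proportion
    table.append((k, round(proportion, 4), round(cumulative, 4)))

for row in table[:3]:
    print(row)
```

So the first component alone accounts for 6.9297 / 8, about 86.62% of the total variance, and the second for 1.0081 / 8, about 12.60%.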

Factor Pattern (India)
                           Factor1   Factor2
Ruralpercentage            0.99823   0.03064
Birth_rate                 0.99413   0.09479
Population_Growth          0.98718  -0.03616
Death_Rate                 0.98375  -0.07765
Life_Expectency           -0.99776  -0.05293
Urbanpercentage           -0.99823  -0.03064
Labour_Force              -0.99895  -0.01599
Immunization              -0.11285   0.99340

Eigenvectors (India)
                                 1         2
Birth_rate                 0.37765   0.09441
Population_Growth          0.37501  -0.03601
Death_Rate                 0.37370  -0.07734
Immunization              -0.04287   0.98940
Life_Expectency           -0.37903  -0.05272
Labour_Force              -0.37948  -0.01592
Ruralpercentage            0.37921   0.03052
Urbanpercentage           -0.37921  -0.03052
Variance explained by each factor: Factor1 = 6.9296824, Factor2 = 1.0080929.

Conclusion: Here we can construct two linear combinations. In them each variable stands for its standardized value, and each coefficient is the factor-pattern loading (the eigenvector entry multiplied by the square root of the eigenvalue), so Y1 and Y2 are proportional to the first two principal components.

The first linear combination is
Y1 = 0.99823*Ruralpercentage + 0.99413*Birth_rate + 0.98718*Population_Growth + 0.98375*Death_Rate - 0.99776*Life_Expectency - 0.99823*Urbanpercentage - 0.99895*Labour_Force - 0.11285*Immunization

The second linear combination is
Y2 = 0.03064*Ruralpercentage + 0.09479*Birth_rate - 0.03616*Population_Growth - 0.07765*Death_Rate - 0.05293*Life_Expectency - 0.03064*Urbanpercentage - 0.01599*Labour_Force + 0.99340*Immunization

The first principal component explains 86.62% of the total variance (eigenvalue 6.9297 out of a total of 8), the second explains a further 12.60% (eigenvalue 1.0081), and together they explain 99.22%; both eigenvalues are greater than 1.
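As a check on the relationship between India's eigenvector and factor-pattern tables, the loadings can be recomputed as eigenvector entries times √λ1; the eigenvector should have unit length, and the squared loadings should add up to λ1 (the variance explained by Factor1). A sketch using the reported values, with signs taken from the first linear combination:

```python
import math

# India, first principal component. Variable order: Birth_rate, Population_Growth,
# Death_Rate, Immunization, Life_Expectency, Labour_Force, Ruralpercentage, Urbanpercentage
lam1 = 6.92968236
e1 = [0.37765, 0.37501, 0.37370, -0.04287, -0.37903, -0.37948, 0.37921, -0.37921]
reported_pattern = [0.99413, 0.98718, 0.98375, -0.11285, -0.99776, -0.99895, 0.99823, -0.99823]

# Loading = eigenvector entry times the square root of the eigenvalue
loadings = [v * math.sqrt(lam1) for v in e1]
max_gap = max(abs(a - b) for a, b in zip(loadings, reported_pattern))

# The eigenvector has unit length, and the squared loadings add up to the eigenvalue
unit_length = sum(v * v for v in e1)
variance_explained = sum(l * l for l in reported_pattern)
print(round(max_gap, 6), round(unit_length, 4), round(variance_explained, 4))
```

The small remaining gaps come from the five-decimal rounding of the printed values.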

For China:

Correlations (China; lower triangle of the symmetric matrix, rounded to five decimals)
                                BR        PG        DR        IM        LE        LF        RU        UR
Birth_rate (BR)            1.00000
Population_Growth (PG)     0.99568   1.00000
Death_Rate (DR)            0.11165   0.05528   1.00000
Immunization (IM)         -0.06690  -0.08598   0.69691   1.00000
Life_Expectency (LE)      -0.99000  -0.99388  -0.03093   0.09690   1.00000
Labour_Force (LF)         -0.98675  -0.99309  -0.00537   0.10801   0.99943   1.00000
Ruralpercentage (RU)       0.97737   0.98776  -0.06143  -0.17177  -0.99466  -0.99705   1.00000
Urbanpercentage (UR)      -0.97737  -0.98776   0.06143   0.17177   0.99466   0.99705  -1.00000   1.00000
Eigenvalues of the Correlation Matrix (China): Total = 8, Average = 1

      Eigenvalue    Difference    Proportion    Cumulative
1     5.97405101    4.26602020        0.7468        0.7468
2     1.70803081    1.40785990        0.2135        0.9603
3     0.30017091    0.28601627        0.0375        0.9978
4     0.01415463    0.01132214        0.0018        0.9996
5     0.00283249    0.00213868        0.0004        0.9999
6     0.00069381    0.00062748        0.0001        1.0000
7     0.00006633    0.00006633        0.0000        1.0000
8     0.00000000                      0.0000        1.0000

2 factors will be retained by the MINEIGEN criterion.
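In every country's output the last eigenvalue is 0.00000000 and the correlation between Ruralpercentage and Urbanpercentage is -1. This follows because the rural and urban shares add up to 100, so one column is an exact linear function of the other and the correlation matrix is singular. A minimal sketch with hypothetical series shows the effect:

```python
import numpy as np

# Hypothetical 17-year series: rural share falling slowly, urban share its complement
rng = np.random.default_rng(1)
rural = 75.0 - 0.5 * np.arange(17) + rng.normal(scale=0.2, size=17)
urban = 100.0 - rural                 # the two shares always add up to 100
other = rng.normal(size=17)           # an unrelated third variable

R = np.corrcoef(np.column_stack([rural, urban, other]), rowvar=False)
eigvals = np.linalg.eigvalsh(R)       # ascending order

print(round(R[0, 1], 6))              # correlation of rural with urban
print(eigvals)                        # smallest eigenvalue is zero up to round-off
```

For the same reason, the two variables always enter a component with loadings of equal size and opposite sign.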
Eigenvectors (China)
                                 1         2
Birth_rate                -0.40498   0.07925
Population_Growth         -0.40730   0.04767
Death_Rate                 0.00157   0.70924
Immunization               0.05772   0.69590
Life_Expectency            0.40840  -0.03256
Labour_Force               0.40857  -0.01714
Ruralpercentage           -0.40808  -0.03741
Urbanpercentage            0.40808   0.03741

Variance explained by each factor: Factor1 = 5.9740510, Factor2 = 1.7080308.
Factor Pattern (China)
                           Factor1   Factor2
Labour_Force               0.99861  -0.02240
Life_Expectency            0.99821  -0.04255
Urbanpercentage            0.99741   0.04889
Birth_rate                -0.98984   0.10358
Population_Growth         -0.99551   0.06231
Ruralpercentage           -0.99741  -0.04889
Death_Rate                 0.00384   0.92691
Immunization               0.14108   0.90948

Conclusion: Here we can construct two linear combinations.

The first linear combination is
Y1 = -0.99741*Ruralpercentage - 0.98984*Birth_rate - 0.99551*Population_Growth + 0.00384*Death_Rate + 0.99821*Life_Expectency + 0.99741*Urbanpercentage + 0.99861*Labour_Force + 0.14108*Immunization

The second linear combination is
Y2 = -0.04889*Ruralpercentage + 0.10358*Birth_rate + 0.06231*Population_Growth + 0.92691*Death_Rate - 0.04255*Life_Expectency + 0.04889*Urbanpercentage - 0.02240*Labour_Force + 0.90948*Immunization

The first principal component explains 74.68% of the total variance (eigenvalue 5.9741 out of a total of 8), the second explains a further 21.35% (eigenvalue 1.7080), and together they explain 96.03%; both eigenvalues are greater than 1.
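For China, Death_Rate and Immunization load almost entirely on the second factor. The communality of a variable (the sum of its squared loadings on the retained factors) shows how much of its standardized variance the two components reproduce, and summing the squared loadings down each column recovers the variance explained by each factor. A sketch using China's factor-pattern values:

```python
# China: loadings on Factor1 and Factor2, copied from the factor-pattern table
pattern = {
    "Labour_Force": (0.99861, -0.02240),
    "Life_Expectency": (0.99821, -0.04255),
    "Urbanpercentage": (0.99741, 0.04889),
    "Birth_rate": (-0.98984, 0.10358),
    "Population_Growth": (-0.99551, 0.06231),
    "Ruralpercentage": (-0.99741, -0.04889),
    "Death_Rate": (0.00384, 0.92691),
    "Immunization": (0.14108, 0.90948),
}

# Communality: part of each standardized variable's variance (which is 1)
# reproduced by the two retained components = sum of its squared loadings
communality = {name: f1 ** 2 + f2 ** 2 for name, (f1, f2) in pattern.items()}

# Column sums of squared loadings give the variance explained by each factor
factor1_variance = sum(f1 ** 2 for f1, _ in pattern.values())
factor2_variance = sum(f2 ** 2 for _, f2 in pattern.values())

for name, h2 in sorted(communality.items(), key=lambda item: item[1]):
    print(f"{name:18s} {h2:.3f}")
```

Immunization and Death_Rate are the least well represented variables for China, although both still have communalities of about 0.85.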

For Canada:

Correlations (Canada; lower triangle of the symmetric matrix)
                                BR        PG        DR        IM        LE        LF        RU        UR
Birth_rate (BR)            1.00000
Population_Growth (PG)     0.65862   1.00000
Death_Rate (DR)           -0.37488  -0.25296   1.00000
Immunization (IM)         -0.53213  -0.33260   0.14433   1.00000
Life_Expectency (LE)      -0.84769  -0.34934   0.20530   0.70055   1.00000
Labour_Force (LF)         -0.78267  -0.27643   0.15269   0.71838   0.98903   1.00000
Ruralpercentage (RU)       0.93172   0.48064  -0.25329  -0.67610  -0.97270  -0.94495   1.00000
Urbanpercentage (UR)      -0.93172  -0.48064   0.25329   0.67610   0.97270   0.94495  -1.00000   1.00000
Eigenvalues of the Correlation Matrix (Canada): Total = 8, Average = 1

      Eigenvalue    Difference    Proportion    Cumulative
1     5.57630797    4.47575875        0.6970        0.6970
2     1.10054921    0.36024271        0.1376        0.8346
3     0.74030650    0.24639745        0.0925        0.9271
4     0.49390905    0.42109809        0.0617        0.9889
5     0.07281096    0.06102880        0.0091        0.9980
6     0.01178216    0.00744801        0.0015        0.9995
7     0.00433415    0.00433415        0.0005        1.0000
8     0.00000000                      0.0000        1.0000

2 factors will be retained by the MINEIGEN criterion.
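Because Canada's correlations are printed to five decimals, the eigenvalues can be recomputed directly as a check. A sketch with NumPy (`numpy.linalg.eigvalsh`, for symmetric matrices); small differences in the last digits are expected from the rounding of the printed correlations:

```python
import numpy as np

# Canada's correlation matrix, lower triangle row by row. Order: Birth_rate,
# Population_Growth, Death_Rate, Immunization, Life_Expectency, Labour_Force,
# Ruralpercentage, Urbanpercentage
lower = [
    [],
    [0.65862],
    [-0.37488, -0.25296],
    [-0.53213, -0.33260, 0.14433],
    [-0.84769, -0.34934, 0.20530, 0.70055],
    [-0.78267, -0.27643, 0.15269, 0.71838, 0.98903],
    [0.93172, 0.48064, -0.25329, -0.67610, -0.97270, -0.94495],
    [-0.93172, -0.48064, 0.25329, 0.67610, 0.97270, 0.94495, -1.00000],
]
R = np.eye(len(lower))
for i, row in enumerate(lower):
    for j, r in enumerate(row):
        R[i, j] = R[j, i] = r          # fill both halves of the symmetric matrix

eigvals = np.linalg.eigvalsh(R)[::-1]  # largest first
n_retained = int(np.sum(eigvals >= 1.0))
print(np.round(eigvals, 4), n_retained)
```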
Eigenvectors (Canada)
                                 1         2
Birth_rate                -0.39455   0.21181
Population_Growth         -0.23008   0.52926
Death_Rate                 0.13286  -0.71879
Immunization               0.31522   0.20087
Life_Expectency            0.40847   0.18973
Labour_Force               0.39642   0.27262
Ruralpercentage           -0.41856  -0.06206
Urbanpercentage            0.41856   0.06206
Factor Pattern (Canada)
                           Factor1   Factor2
Urbanpercentage            0.98839   0.06511
Life_Expectency            0.96456   0.19904
Labour_Force               0.93612   0.28600
Immunization               0.74437   0.21072
Birth_rate                -0.93169   0.22220
Ruralpercentage           -0.98839  -0.06511
Population_Growth         -0.54333   0.55523
Death_Rate                 0.31375  -0.75406
Variance explained by each factor: Factor1 = 5.5763080, Factor2 = 1.1005492.

Conclusion: Here we can construct two linear combinations.

The first linear combination is
Y1 = -0.98839*Ruralpercentage - 0.93169*Birth_rate - 0.54333*Population_Growth + 0.31375*Death_Rate + 0.96456*Life_Expectency + 0.98839*Urbanpercentage + 0.93612*Labour_Force + 0.74437*Immunization

The second linear combination is
Y2 = -0.06511*Ruralpercentage + 0.22220*Birth_rate + 0.55523*Population_Growth - 0.75406*Death_Rate + 0.19904*Life_Expectency + 0.06511*Urbanpercentage + 0.28600*Labour_Force + 0.21072*Immunization

The first principal component explains 69.70% of the total variance (eigenvalue 5.5763 out of a total of 8), the second explains a further 13.76% (eigenvalue 1.1005), and together they explain 83.46%; both eigenvalues are greater than 1.

For France:

Correlations (France; lower triangle of the symmetric matrix, rounded to five decimals)
                                BR        PG        DR        IM        LE        LF        RU        UR
Birth_rate (BR)            1.00000
Population_Growth (PG)     0.19045   1.00000
Death_Rate (DR)           -0.16328  -0.44324   1.00000
Immunization (IM)         -0.05023   0.47397  -0.53015   1.00000
Life_Expectency (LE)       0.10109   0.76079  -0.69870   0.87630   1.00000
Labour_Force (LF)          0.09802   0.82529  -0.68207   0.83263   0.98265   1.00000
Ruralpercentage (RU)      -0.10454  -0.77503   0.60977  -0.88219  -0.98786  -0.98191   1.00000
Urbanpercentage (UR)       0.10454   0.77503  -0.60977   0.88219   0.98786   0.98191  -1.00000   1.00000
Eigenvalues of the Correlation Matrix (France): Total = 8, Average = 1

      Eigenvalue    Difference    Proportion    Cumulative
1     5.82964027    4.77236700        0.7287        0.7287
2     1.05727327    0.46244807        0.1322        0.8609
3     0.59482521    0.14481868        0.0744        0.9352
4     0.45000653    0.39811954        0.0563        0.9915
5     0.05188699    0.04078885        0.0065        0.9980
6     0.01109814    0.00582854        0.0014        0.9993
7     0.00526960    0.00526960        0.0007        1.0000
8     0.00000000                      0.0000        1.0000

2 factors will be retained by the MINEIGEN criterion.
Eigenvectors (France)
                                 1         2
Birth_rate                 0.05387   0.93904
Population_Growth          0.33062   0.17574
Death_Rate                -0.29251  -0.13432
Immunization               0.35904  -0.25299
Life_Expectency            0.41169  -0.03518
Labour_Force               0.41096  -0.01809
Ruralpercentage           -0.40923   0.04304
Urbanpercentage            0.40923  -0.04304

Factor Pattern (France)
                           Factor1   Factor2
Life_Expectency            0.99402  -0.03618
Labour_Force               0.99225  -0.01860
Urbanpercentage            0.98807  -0.04425
Immunization               0.86690  -0.26013
Population_Growth          0.79827   0.18070
Death_Rate                -0.70625  -0.13811
Ruralpercentage           -0.98807   0.04425
Birth_rate                 0.13006   0.96556
Variance explained by each factor: Factor1 = 5.8296403, Factor2 = 1.0572733.

Conclusion: Here we can construct two linear combinations.

The first linear combination is
Y1 = -0.98807*Ruralpercentage + 0.13006*Birth_rate + 0.79827*Population_Growth - 0.70625*Death_Rate + 0.99402*Life_Expectency + 0.98807*Urbanpercentage + 0.99225*Labour_Force + 0.86690*Immunization

The second linear combination is
Y2 = 0.04425*Ruralpercentage + 0.96556*Birth_rate + 0.18070*Population_Growth - 0.13811*Death_Rate - 0.03618*Life_Expectency - 0.04425*Urbanpercentage - 0.01860*Labour_Force - 0.26013*Immunization

The first principal component explains 72.87% of the total variance (eigenvalue 5.8296 out of a total of 8), the second explains a further 13.22% (eigenvalue 1.0573), and together they explain 86.09%; both eigenvalues are greater than 1.
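The two retained components approximately reproduce each correlation as a sum of products of loadings, r_ij ≈ l_i1·l_j1 + l_i2·l_j2. A worked check for France's Life_Expectency and Labour_Force, using their factor-pattern values and their printed correlation:

```python
# France: loadings on (Factor1, Factor2), from the factor-pattern table
life_expectancy = (0.99402, -0.03618)
labour_force = (0.99225, -0.01860)
observed_r = 0.98265336          # their correlation in France's correlation table

# With two retained components, r_ij is approximated by l_i1*l_j1 + l_i2*l_j2
reproduced_r = sum(a * b for a, b in zip(life_expectancy, labour_force))
residual_r = observed_r - reproduced_r
print(round(reproduced_r, 5), round(residual_r, 5))
```

The residual of about -0.004 is the part of this correlation carried by the six discarded components.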

For Italy:

Correlations (Italy; lower triangle of the symmetric matrix)
                                BR        PG        DR        IM        LE        LF        RU        UR
Birth_rate (BR)            1.00000
Population_Growth (PG)     0.43653   1.00000
Death_Rate (DR)           -0.35516   0.08599   1.00000
Immunization (IM)          0.15466   0.23929  -0.06429   1.00000
Life_Expectency (LE)      -0.02767   0.76261   0.24502  -0.00729   1.00000
Labour_Force (LF)          0.55002   0.89418   0.08796   0.19730   0.77348   1.00000
Ruralpercentage (RU)      -0.07121  -0.81498  -0.24408  -0.09571  -0.98421  -0.84026   1.00000
Urbanpercentage (UR)       0.07121   0.81498   0.24408   0.09571   0.98421   0.84026  -1.00000   1.00000
Eigenvalues of the Correlation Matrix (Italy): Total = 8, Average = 1

      Eigenvalue    Difference    Proportion    Cumulative
1     4.61237498    2.98109404        0.5765        0.5765
2     1.63128094    0.69779279        0.2039        0.7805
3     0.93348815    0.28325689        0.1167        0.8971
4     0.65023126    0.51488926        0.0813        0.9784
5     0.13534201    0.10597157        0.0169        0.9953
6     0.02937043    0.02145821        0.0037        0.9990
7     0.00791222    0.00791222        0.0010        1.0000
8     0.00000000                      0.0000        1.0000

2 factors will be retained by the MINEIGEN criterion.

Factor Pattern (Italy)
                           Factor1   Factor2
Urbanpercentage            0.97084  -0.18073
Life_Expectency            0.93395  -0.27514
Labour_Force               0.93370   0.26857
Population_Growth          0.91522   0.22507
Ruralpercentage           -0.97084   0.18073
Birth_rate                 0.27053   0.85055
Immunization               0.16904   0.39703
Death_Rate                 0.20946  -0.69742
Eigenvectors (Italy)
                                 1         2
Birth_rate                 0.12596   0.66594
Population_Growth          0.42615   0.17622
Death_Rate                 0.09753  -0.54605
Immunization               0.07871   0.31086
Life_Expectency            0.43487  -0.21542
Labour_Force               0.43475   0.21027
Ruralpercentage           -0.45205   0.14150
Urbanpercentage            0.45205  -0.14150
Variance explained by each factor: Factor1 = 4.6123750, Factor2 = 1.6312809.

Conclusion: Here we can construct two linear combinations.

The first linear combination is
Y1 = -0.97084*Ruralpercentage + 0.27053*Birth_rate + 0.91522*Population_Growth + 0.20946*Death_Rate + 0.93395*Life_Expectency + 0.97084*Urbanpercentage + 0.93370*Labour_Force + 0.16904*Immunization

The second linear combination is
Y2 = 0.18073*Ruralpercentage + 0.85055*Birth_rate + 0.22507*Population_Growth - 0.69742*Death_Rate - 0.27514*Life_Expectency - 0.18073*Urbanpercentage + 0.26857*Labour_Force + 0.39703*Immunization

The first principal component explains 57.65% of the total variance (eigenvalue 4.6124 out of a total of 8), the second explains a further 20.39% (eigenvalue 1.6313), and together they explain 78.05%; both eigenvalues are greater than 1.
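For Italy the two retained components explain only 78.05% of the variance, and the third eigenvalue (0.93349) is close to the cut-off of 1. One common alternative to the eigenvalue rule is to keep the smallest number of components that reaches a chosen cumulative proportion; the 80% and 90% thresholds below are illustrative choices, not ones used in the report.

```python
from itertools import accumulate

# Eigenvalues of Italy's correlation matrix
eigenvalues = [4.61237498, 1.63128094, 0.93348815, 0.65023126,
               0.13534201, 0.02937043, 0.00791222, 0.00000000]
total = sum(eigenvalues)
cumulative = [c / total for c in accumulate(eigenvalues)]

def components_needed(cumulative, threshold):
    """Smallest number of leading components whose cumulative proportion reaches threshold."""
    for k, c in enumerate(cumulative, start=1):
        if c >= threshold:
            return k
    return len(cumulative)

kaiser = sum(1 for lam in eigenvalues if lam >= 1.0)   # eigenvalue-at-least-1 rule
print(kaiser, components_needed(cumulative, 0.80), components_needed(cumulative, 0.90))
```

Under an 80% rule Italy would need a third component, which the eigenvalue rule discards.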

For Japan:

Correlations (Japan; lower triangle of the symmetric matrix)
                                BR        PG        DR        IM        LE        LF        RU        UR
Birth_rate (BR)            1.00000
Population_Growth (PG)     0.92534   1.00000
Death_Rate (DR)           -0.90576  -0.80518   1.00000
Immunization (IM)         -0.56983  -0.61445   0.50950   1.00000
Life_Expectency (LE)      -0.88567  -0.83360   0.94337   0.58974   1.00000
Labour_Force (LF)         -0.12281  -0.02699   0.26533  -0.34047   0.33058   1.00000
Ruralpercentage (RU)       0.89328   0.79478  -0.95790  -0.47859  -0.97631  -0.43386   1.00000
Urbanpercentage (UR)      -0.89328  -0.79478   0.95790   0.47859   0.97631   0.43386  -1.00000   1.00000
Eigenvalues of the Correlation Matrix (Japan): Total = 8, Average = 1

      Eigenvalue    Difference    Proportion    Cumulative
1     5.94015461    4.50593284        0.7425        0.7425
2     1.43422178    1.08541825        0.1793        0.9218
3     0.34880353    0.16695644        0.0436        0.9654
4     0.18184708    0.12688049        0.0227        0.9881
5     0.05496659    0.02439830        0.0069        0.9950
6     0.03056830    0.02113019        0.0038        0.9988
7     0.00943811    0.00943811        0.0012        1.0000
8     0.00000000                      0.0000        1.0000

2 factors will be retained by the MINEIGEN criterion.
Eigenvectors (Japan)
                                 1         2
Birth_rate                -0.38964   0.09884
Population_Growth         -0.36588   0.19893
Death_Rate                 0.39514   0.04290
Immunization               0.24913  -0.55282
Life_Expectency            0.40274   0.04867
Labour_Force               0.11307   0.76929
Ruralpercentage           -0.40071  -0.15657
Urbanpercentage            0.40071   0.15657
Factor Pattern (Japan)
                           Factor1   Factor2
Life_Expectency            0.98157   0.05829
Urbanpercentage            0.97662   0.18751
Death_Rate                 0.96305   0.05138
Population_Growth         -0.89174   0.23823
Birth_rate                -0.94965   0.11837
Ruralpercentage           -0.97662  -0.18751
Labour_Force               0.27558   0.92130
Immunization               0.60718  -0.66205

Variance explained by each factor: Factor1 = 5.9401546, Factor2 = 1.4342218.

Conclusion: Here we can construct two linear combinations.

The first linear combination is
Y1 = -0.97662*Ruralpercentage - 0.94965*Birth_rate - 0.89174*Population_Growth + 0.96305*Death_Rate + 0.98157*Life_Expectency + 0.97662*Urbanpercentage + 0.27558*Labour_Force + 0.60718*Immunization

The second linear combination is
Y2 = -0.18751*Ruralpercentage + 0.11837*Birth_rate + 0.23823*Population_Growth + 0.05138*Death_Rate + 0.05829*Life_Expectency + 0.18751*Urbanpercentage + 0.92130*Labour_Force - 0.66205*Immunization

The first principal component explains 74.25% of the total variance (eigenvalue 5.9402 out of a total of 8), the second explains a further 17.93% (eigenvalue 1.4342), and together they explain 92.18%; both eigenvalues are greater than 1.
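The overall sign of each principal component is arbitrary: for example, Ruralpercentage enters Y1 with a positive coefficient for India but with a negative one for China and Japan. Both v and -v are eigenvectors for the same eigenvalue, so only the relative signs within a component should be compared across countries. A small illustration with a hypothetical correlation matrix:

```python
import numpy as np

# Hypothetical 3 x 3 correlation matrix; the third variable moves against the other two
R = np.array([[1.0, 0.8, -0.6],
              [0.8, 1.0, -0.5],
              [-0.6, -0.5, 1.0]])
eigvals, eigvecs = np.linalg.eigh(R)
lam, v = eigvals[-1], eigvecs[:, -1]        # largest eigenvalue and its unit eigenvector

# v and -v both solve R x = lam x, so either sign gives a valid first component
both_valid = np.allclose(R @ v, lam * v) and np.allclose(R @ (-v), lam * (-v))

# What does not depend on the sign is the pattern of relative signs within the vector
print(np.sign(v), np.sign(-v))
```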

For United States:

Correlations (United States; lower triangle of the symmetric matrix)
                                BR        PG        DR        IM        LE        LF        RU        UR
Birth_rate (BR)            1.00000
Population_Growth (PG)     0.90932   1.00000
Death_Rate (DR)            0.48947   0.67836   1.00000
Immunization (IM)         -0.84659  -0.72240  -0.19973   1.00000
Life_Expectency (LE)      -0.82652  -0.88435  -0.82964   0.60050   1.00000
Labour_Force (LF)         -0.90738  -0.92875  -0.74538   0.71727   0.98132   1.00000
Ruralpercentage (RU)       0.89601   0.94334   0.77842  -0.69884  -0.97983  -0.99519   1.00000
Urbanpercentage (UR)      -0.89601  -0.94334  -0.77842   0.69884   0.97983   0.99519  -1.00000   1.00000
Eigenvalues of the Correlation Matrix (United States): Total = 8, Average = 1

      Eigenvalue    Difference    Proportion    Cumulative
1     6.79170864    5.85749925        0.8490        0.8490
2     0.93420939    0.81138913        0.1168        0.9657
3     0.12282026    0.02755265        0.0154        0.9811
4     0.09526761    0.04905750        0.0119        0.9930
5     0.04621012    0.03804042        0.0058        0.9988
6     0.00816970    0.00655541        0.0010        0.9998
7     0.00161428    0.00161428        0.0002        1.0000
8     0.00000000                      0.0000        1.0000

1 factor will be retained by the MINEIGEN criterion.
Eigenvectors (United States)
                                 1
Birth_rate                -0.35462
Population_Growth         -0.36726
Death_Rate                -0.28777
Immunization               0.28547
Life_Expectency            0.37135
Labour_Force               0.38106
Ruralpercentage           -0.38207
Urbanpercentage            0.38207

Factor Pattern (United States)
                           Factor1
Urbanpercentage            0.99571
Labour_Force               0.99308
Life_Expectency            0.96776
Immunization               0.74395
Death_Rate                -0.74995
Birth_rate                -0.92417
Population_Growth         -0.95711
Ruralpercentage           -0.99571
Variance explained by the factor: Factor1 = 6.7917086.

Conclusion: Here we can construct one linear combination.

The first linear combination is
Y1 = -0.99571*Ruralpercentage - 0.92417*Birth_rate - 0.95711*Population_Growth - 0.74995*Death_Rate + 0.96776*Life_Expectency + 0.99571*Urbanpercentage + 0.99308*Labour_Force + 0.74395*Immunization

The first principal component explains 84.90% of the total variance (eigenvalue 6.7917 out of a total of 8), and its eigenvalue is greater than 1.
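The MINEIGEN criterion keeps the components whose eigenvalue is at least a threshold. Assuming the threshold is 1 (the "Average = 1" printed with every eigenvalue table), the number of retained components for all eight countries follows from their largest eigenvalues. Only the United States has a second eigenvalue below 1 (0.93421), while India's second eigenvalue (1.00809) only just passes, so the rule is sensitive near the cut-off.

```python
# Three largest eigenvalues of each country's correlation matrix
eigenvalues = {
    "India": (6.92968236, 1.00809288, 0.03159136),
    "China": (5.97405101, 1.70803081, 0.30017091),
    "Canada": (5.57630797, 1.10054921, 0.74030650),
    "France": (5.82964027, 1.05727327, 0.59482521),
    "Italy": (4.61237498, 1.63128094, 0.93348815),
    "Japan": (5.94015461, 1.43422178, 0.34880353),
    "United States": (6.79170864, 0.93420939, 0.12282026),
    "United Kingdom": (6.02232219, 1.34178570, 0.56610460),
}

def retained(lams, min_eigen=1.0):
    """Count the components whose eigenvalue is at least min_eigen."""
    return sum(1 for lam in lams if lam >= min_eigen)

# The remaining eigenvalues are smaller than the third, so three per country suffice
for country, lams in eigenvalues.items():
    print(f"{country:15s} keeps {retained(lams)} component(s)")
```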

For United Kingdom:

Correlations (United Kingdom; lower triangle of the symmetric matrix)
                                BR        PG        DR        IM        LE        LF        RU        UR
Birth_rate (BR)            1.00000
Population_Growth (PG)    -0.22428   1.00000
Death_Rate (DR)            0.45618  -0.93436   1.00000
Immunization (IM)         -0.17259  -0.39017   0.19475   1.00000
Life_Expectency (LE)      -0.52383   0.92604  -0.98137  -0.16345   1.00000
Labour_Force (LF)         -0.28063   0.98852  -0.93065  -0.40645   0.93326   1.00000
Ruralpercentage (RU)       0.63046  -0.87983   0.95484   0.13343  -0.98481  -0.90068   1.00000
Urbanpercentage (UR)      -0.63046   0.87983  -0.95484  -0.13343   0.98481   0.90068  -1.00000   1.00000
Eigenvalues of the Correlation Matrix (United Kingdom): Total = 8, Average = 1

      Eigenvalue    Difference    Proportion    Cumulative
1     6.02232219    4.68053649        0.7528        0.7528
2     1.34178570    0.77568110        0.1677        0.9205
3     0.56610460    0.52362039        0.0708        0.9913
4     0.04248421    0.02841358        0.0053        0.9966
5     0.01407063    0.00642412        0.0018        0.9983
6     0.00764651    0.00206035        0.0010        0.9993
7     0.00558616    0.00558616        0.0007        1.0000
8     0.00000000                      0.0000        1.0000

2 factors will be retained by the MINEIGEN criterion.

Factor Pattern (United Kingdom)
                           Factor1   Factor2
Life_Expectency            0.99178  -0.06239
Urbanpercentage            0.98560  -0.15376
Labour_Force               0.95438   0.25879
Population_Growth          0.94133   0.28271
Death_Rate                -0.97823  -0.00017
Ruralpercentage           -0.98560   0.15376
Birth_rate                -0.52635   0.68102
Immunization              -0.25486  -0.82458
Eigenvectors (United Kingdom)
                                 1         2
Birth_rate                -0.21448   0.58792
Population_Growth          0.38358   0.24406
Death_Rate                -0.39862  -0.00015
Immunization              -0.10385  -0.71185
Life_Expectency            0.40414  -0.05386
Labour_Force               0.38890   0.22341
Ruralpercentage           -0.40162   0.13274
Urbanpercentage            0.40162  -0.13274
Variance explained by each factor: Factor1 = 6.0223222, Factor2 = 1.3417857.

Conclusion: Here we can construct two linear combinations.

The first linear combination is
Y1 = -0.98560*Ruralpercentage - 0.52635*Birth_rate + 0.94133*Population_Growth - 0.97823*Death_Rate + 0.99178*Life_Expectency + 0.98560*Urbanpercentage + 0.95438*Labour_Force - 0.25486*Immunization

The second linear combination is
Y2 = 0.15376*Ruralpercentage + 0.68102*Birth_rate + 0.28271*Population_Growth - 0.00017*Death_Rate - 0.06239*Life_Expectency - 0.15376*Urbanpercentage + 0.25879*Labour_Force - 0.82458*Immunization

The first principal component explains 75.28% of the total variance (eigenvalue 6.0223 out of a total of 8), the second explains a further 16.77% (eigenvalue 1.3418), and together they explain 92.05%; both eigenvalues are greater than 1.

Regression Analysis:

For India:

To understand the relationships between the variables, we consider a matrix plot (scatter-plot matrix), which shows a scatter plot for every possible pair of the variables under consideration.

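LINEAR REGRESSION MODEL:

The data consist of n observations on a dependent variable Y and p independent variables X_1, X_2, ..., X_p. The linear relationship between the response variable Y and the p predictor variables is formulated as the linear model
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon,$$
where $\beta_0, \beta_1, \ldots, \beta_p$ are the regression coefficients and $\varepsilon$ is the random disturbance or error.

We may write the model equation in terms of the observations as
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, 2, \ldots, n.$$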
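A minimal sketch of fitting this model by ordinary least squares with NumPy, assuming the predictors are held in an n × p array; it also returns the coefficient standard errors that the t test relies on. The function name and the simulated data are hypothetical, and the report's own fits were produced with other software.

```python
import numpy as np

def ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bp*xp + e.

    X is an n x p array of predictors without an intercept column. Returns the
    coefficient estimates, their standard errors and the residuals.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])          # add the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    sigma2 = residuals @ residuals / (n - p - 1)       # n - p - 1 residual df
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta, np.sqrt(np.diag(cov)), residuals

# Hypothetical data from y = 2 + 3*x1 - x2 plus a little noise (n = 17, p = 2)
rng = np.random.default_rng(0)
X = rng.normal(size=(17, 2))
y = 2 + 3 * X[:, 0] - X[:, 1] + 0.01 * rng.normal(size=17)
beta, se, residuals = ols(X, y)
print(np.round(beta, 3), np.round(beta / se, 1))       # estimates and t values
```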

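7.1 TESTING OF HYPOTHESIS IN A LINEAR MODEL:

Our null hypothesis is $H_0: \beta_j = \beta_{j0}$ against $H_1: \beta_j \neq \beta_{j0}$, where $\beta_{j0}$ is chosen by the investigator. The test statistic is
$$t_j = \frac{\hat{\beta}_j - \beta_{j0}}{\operatorname{s.e.}(\hat{\beta}_j)},$$
which follows a $t$ distribution with $n - p - 1$ degrees of freedom under $H_0$.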
First, we model life expectancy on various covariates. The first model for life expectancy is
Life.Expectency ~ Birth.rate + Population.Growth + Labour.Force + GDP + Urbanpercentage
Residuals:
      Min        1Q    Median        3Q       Max
-0.061357 -0.018017 -0.001179  0.019070  0.033363

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)        4.211e+01  5.770e+00   7.299 1.54e-05 ***
Birth.rate        -1.807e-01  4.682e-02  -3.860  0.00266 **
Population.Growth  5.017e-01  4.650e-01   1.079  0.30367
Labour.Force       2.946e-08  1.521e-08   1.937  0.07883 .
GDP               -2.566e-04  8.102e-05  -3.167  0.00897 **
Urbanpercentage    4.664e-01  3.790e-01   1.231  0.24412
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.03107 on 11 degrees of freedom
Multiple R-squared: 0.9997,  Adjusted R-squared:
F-statistic: 8226 on 5 and 11 DF,  p-value: < 2.2e-16
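With $\beta_j^0 = 0$, each t value in this output is simply the estimate divided by its standard error. The check below recomputes these values from the printed estimates and standard errors. It then compares each |t| with the two-sided 5% critical value of the t distribution with 11 degrees of freedom, about 2.201, taken from standard t tables. Small differences in the third decimal come from rounding in the printed estimates.

```python
# Estimates and standard errors as printed in the India life-expectancy output.
coefs = {
    "(Intercept)":       (4.211e+01, 5.770e+00),
    "Birth.rate":        (-1.807e-01, 4.682e-02),
    "Population.Growth": (5.017e-01, 4.650e-01),
    "Labour.Force":      (2.946e-08, 1.521e-08),
    "GDP":               (-2.566e-04, 8.102e-05),
    "Urbanpercentage":   (4.664e-01, 3.790e-01),
}
T_CRIT_11 = 2.201  # two-sided 5% critical value, t distribution with 11 df (standard tables)

t_values = {}
for name, (est, se) in coefs.items():
    t_values[name] = est / se              # t statistic for H0: beta_j = 0
    verdict = "reject H0" if abs(t_values[name]) > T_CRIT_11 else "do not reject H0"
    print(f"{name:<18} t = {t_values[name]:7.3f}  {verdict} at the 5% level")
```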
Therefore the fitted model can be written as
Life.Expectency = 42.11 - 0.1807*Birth.rate + 0.5017*Population.Growth + 2.946e-08*Labour.Force - 0.0002566*GDP + 0.4664*Urbanpercentage
The residual vs. fitted plot, the histogram of the residuals and the normal Q-Q plot are shown below.

Conclusion:
Here birth rate and GDP are highly significant (p < 0.01), and both are negatively related to life expectancy.
Labour force is significant only at the 10% level (p = 0.079) and is positively related to life expectancy.
The multiple correlation coefficient is very high (R-squared = 0.9997).
The histogram of the residuals shows a negatively skewed distribution.
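Whether a residual histogram is negatively or positively skewed can also be checked numerically with the moment coefficient of skewness $g_1 = m_3 / m_2^{3/2}$, where $m_k$ is the k-th central moment. The individual residuals are not reproduced in this report. The sketch below (hypothetical helper `sample_skewness`) therefore uses small made-up samples only to show the sign convention.

```python
import numpy as np

def sample_skewness(x):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no small-sample correction).
    Negative values indicate a longer left tail, positive values a longer right tail."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)   # second central moment
    m3 = np.mean(d ** 3)   # third central moment
    return m3 / m2 ** 1.5

print(sample_skewness([-3.0, -1.0, 0.0, 1.0, 3.0]))   # symmetric sample: 0.0
print(sample_skewness([-5.0, 0.0, 0.5, 1.0, 1.5]))    # long left tail: negative
```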

Now we model GDP on various co-factors for India.


Modelling of GDP:
GDP ~ Labour.Force + Ruralpercentage
Residuals:
    Min      1Q  Median      3Q     Max
-223.58  -84.23   18.28   68.08  277.49

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)     -1.339e+05  6.380e+04  -2.099  0.05445 .
Labour.Force     7.610e-05  2.308e-05   3.297  0.00529 **
Ruralpercentage  1.516e+03  7.586e+02   1.998  0.06555 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 140.7 on 14 degrees of freedom
Multiple R-squared: 0.9854,  Adjusted R-squared: 0.9833
F-statistic: 471.8 on 2 and 14 DF,  p-value: 1.427e-13


Here the fitted model is
GDP = -133900 + 7.61e-05*Labour.Force + 1516*Ruralpercentage
The residual density plot for GDP and the normal Q-Q plot are shown below.

Conclusion:
Labour force is highly significant (p = 0.0053) and positively related to GDP.
Rural percentage is significant only at the 10% level (p = 0.066) and is also positively related to GDP.
The residual plot for GDP suggests that the residuals are approximately normally distributed.
The multiple correlation coefficient is high (R-squared = 0.9854).
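The adjusted R-squared and the overall F statistic in the output are functions of R-squared, the number of observations n and the number of predictors p: $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-p-1}$ and $F = \frac{R^2/p}{(1-R^2)/(n-p-1)}$. With p = 2 and 14 residual degrees of freedom, n = 17. The short Python check below uses the printed R-squared. Because R-squared is printed to four decimals, the recomputed F only approximates the reported value.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1) for p predictors and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def overall_f(r2, n, p):
    """Overall F statistic (R^2 / p) / ((1 - R^2) / (n - p - 1)) on (p, n - p - 1) df."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))

# India GDP model: p = 2 predictors and 14 residual df, so n = 14 + 2 + 1 = 17.
print(round(adjusted_r2(0.9854, 17, 2), 4))   # 0.9833, as reported
print(round(overall_f(0.9854, 17, 2), 1))     # roughly 472; the reported 471.8 used unrounded R^2
```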


For China: To understand the relationships between the variables, we consider the matrix plot (scatterplot matrix), which shows a scatter diagram for every possible pair of the variables under consideration.


Here we model life expectancy:
Life.Expectency ~ Birth.rate + Population.Growth + Labour.Force + GDP + Urbanpercentage


Residuals:
      Min        1Q    Median        3Q       Max
-0.030548 -0.012751  0.001812  0.009514  0.028484

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)        4.599e+01  1.772e+00  25.963 3.20e-11 ***
Birth.rate        -2.986e-02  3.196e-02  -0.934  0.37024
Population.Growth  1.338e-01  3.418e-01   0.392  0.70282
Labour.Force       2.975e-08  4.272e-09   6.964 2.38e-05 ***
GDP               -1.074e-04  2.479e-05  -4.334  0.00119 **
Urbanpercentage    1.258e-01  4.441e-02   2.833  0.01627 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.019 on 11 degrees of freedom
Multiple R-squared: 0.9999,  Adjusted R-squared:
F-statistic: 2.026e+04 on 5 and 11 DF,  p-value: < 2.2e-16

Here the fitted model is
Life.Expectency = 45.99 - 0.02986*Birth.rate + 0.1338*Population.Growth + 2.975e-08*Labour.Force - 0.0001074*GDP + 0.1258*Urbanpercentage

Conclusion:
Here labour force is highly significant (p < 0.001) and is positively related to life expectancy.
GDP is also highly significant (p = 0.0012) and is negatively related to life expectancy.
The multiple correlation coefficient is very high (R-squared = 0.9999).
The histogram of the residuals shows a positively skewed distribution.
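The multiple correlation coefficient R is the correlation between the observed response and the fitted values. For a least-squares fit with an intercept, its square equals the R-squared printed in the output, so for this model R = sqrt(0.9999), which is about 0.99995. The sketch below confirms the identity on synthetic data (not the project data).

```python
import numpy as np

# Synthetic data (not the project data): 17 observations, 3 predictors.
rng = np.random.default_rng(2)
X = rng.normal(size=(17, 3))
y = 1.0 + X @ np.array([0.5, -1.0, 0.2]) + rng.normal(scale=0.3, size=17)

design = np.column_stack([np.ones(17), X])          # intercept column + predictors
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ beta_hat

r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)   # coefficient of determination
R = np.corrcoef(y, fitted)[0, 1]                                    # multiple correlation coefficient
print(round(r2, 6), round(R ** 2, 6))   # agree up to floating-point rounding
```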

Modelling of GDP:
GDP ~ Labour.Force + Ruralpercentage

Residuals:
     Min       1Q   Median       3Q      Max
-360.346 -189.430   -5.045  153.007  611.453

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      2.691e+05  2.984e+04   9.017 3.31e-07 ***
Labour.Force    -1.864e-04  2.361e-05  -7.892 1.60e-06 ***
Ruralpercentage -2.006e+03  1.991e+02 -10.076 8.50e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 272.1 on 14 degrees of freedom
Multiple R-squared: 0.9845
F-statistic: 444.8 on 2 and 14 DF,  p-value: 2.145e-13

Here the fitted model is
GDP = 269100 - 0.0001864*Labour.Force - 2006*Ruralpercentage

Conclusion:
Labour force and rural percentage are both highly significant (p < 0.001) and are negatively related to GDP.
The residual plot for GDP shows a positively skewed distribution.
The multiple correlation coefficient is high (R-squared = 0.9845).
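The negative signs in the fitted equation are partial effects. Each one describes the change in predicted GDP when one predictor changes and the other is held fixed, which need not match the direction seen in a plot of GDP against that predictor alone. The sketch below evaluates the fitted China equation at hypothetical inputs. The input values and the helper `predicted_gdp_china` are illustrative assumptions, and the units are those of the project data, which this section does not state.

```python
# Coefficients of the fitted China GDP equation above.
B0, B_LABOUR, B_RURAL = 2.691e+05, -1.864e-04, -2.006e+03

def predicted_gdp_china(labour_force, rural_percentage):
    """Fitted GDP from the China model (hypothetical helper; units as in the project data)."""
    return B0 + B_LABOUR * labour_force + B_RURAL * rural_percentage

# Hypothetical inputs, chosen only to illustrate the "other variable held fixed" reading.
base = predicted_gdp_china(7.5e8, 55.0)
print(predicted_gdp_china(7.5e8, 56.0) - base)    # one more point of rural %: about -2006
print(predicted_gdp_china(7.51e8, 55.0) - base)   # one million more in the labour force: about -186.4
```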


Overall discussion: From the PCA we see that, for India, the cumulative proportion of variance explained by the first two factors is higher than for the other countries. In other words, for India these factors carry more information than the corresponding factors for the remaining countries. We therefore conclude that other factors not considered in this analysis may matter for the remaining countries (China, Canada, France, Italy, Japan, United States and United Kingdom).
From the regression analysis we can conclude the following for India:

From the plot of life expectancy, we see that life expectancy is increasing for India.
As the birth rate decreases, life expectancy increases.
As the labour force increases, GDP also increases.
As the urban percentage increases, GDP also increases.

