PROJECT TITLE
INDIA
Department of Statistics
THE UNIVERSITY OF BURDWAN
Submitted By
Roll no:-BUR ST 2009/13
ACKNOWLEDGEMENT
Date.
Department of Statistics.
Burdwan University.
4. Death rate, crude (per 1,000 people)
5. Immunization, DPT (% of children ages 12-23 months)
6. Life expectancy at birth, total (years)
7. Population growth (annual %)
8. Rural population (% of total population)
9. Urban population (% of total population)
[Table: indicators grouped under Development, Health and Population. Legible entries: Immunization, DPT (% of children ages 12-23 months); Life expectancy at birth, total (years); Urban population (% of total population).]
5.
METHODOLOGY
First, I plot the death rate against immunization for each country over the years.
Findings:
In the case of China, life expectancy at first rises as immunization increases and afterwards decreases. From the graph we also see that autocorrelation is present for China, i.e. there are some other factors which influence life expectancy.
To see how death rate and life expectancy change with immunization, I have plotted the crude death rate (per 1,000 people) against life expectancy for each country. Here I have considered three groups of countries. The three groups are taken such that -------------
In the first group of countries we see that the observations for China and Japan are scattered.
In the second group we see that in India life expectancy is increasing although the death rate per 1,000 people stays more or less the same.
At this point, it is helpful to recall that the goal of PCA is to explain the variability in a set of observed measures through as few linear combinations of them as possible; these combinations are the principal components.
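As a small illustration of this idea, the sketch below works out the principal components of just two standardized variables, where the eigenvalues of the correlation matrix are known in closed form (1 + r and 1 - r). The function name pca_2x2 is only an illustrative choice and is not part of the analysis reported here. The value r = -1 is used because rural and urban percentage have correlation -1 in the correlation tables of this report.

```python
import math


def pca_2x2(r):
    """Eigenvalues and unit eigenvectors of the correlation matrix [[1, r], [r, 1]]."""
    inv_sqrt2 = 1.0 / math.sqrt(2.0)
    # For two standardized variables the eigenvalues are 1 + r and 1 - r,
    # with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2) respectively.
    pairs = [
        (1.0 + r, (inv_sqrt2, inv_sqrt2)),
        (1.0 - r, (inv_sqrt2, -inv_sqrt2)),
    ]
    # Principal components are ordered by decreasing eigenvalue.
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    return pairs


# Rural and urban percentage are perfectly negatively correlated (r = -1).
components = pca_2x2(-1.0)
total = sum(eigenvalue for eigenvalue, _ in components)
for eigenvalue, vector in components:
    print(eigenvalue, vector, eigenvalue / total)
```

With r = -1 a single component carries all of the variation and the other eigenvalue is zero, which is consistent with the eighth eigenvalue being 0 in every eigenvalue table of this report.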
For example, suppose the following n × p matrix
Correlations

                    Birth_rate   Population_Growth   Death_Rate   Immunization   Life_Expectency
Birth_rate          1            0.97240775          0.96965991   -0.0186977     -0.9978552
Population_Growth   0.97240775   1                   0.96899843   -0.1458039     -0.9790156
Death_Rate          0.96965991   0.96899843          1            -0.1856009     -0.9741803
Immunization        -0.0186977   -0.1458039          -0.1856009   1              0.06072412
Life_Expectency     -0.9978552   -0.9790156          -0.9741803   0.06072412     1
Labour_Force        -0.9947254   -0.9856297          -0.9756698   0.0974756      0.99833819
Ruralpercentage     0.9965748    0.97964615          0.97431511   -0.0832094     -0.9994813
Urbanpercentage     -0.9965748   -0.9796462          -0.9743151   0.08320939     0.99948133

Correlations

                    Labour_Force   Ruralpercentage   Urbanpercentage
Birth_rate          -0.9947254     0.9965748         -0.9965748
Population_Growth   -0.9856297     0.97964615        -0.9796462
Death_Rate          -0.9756698     0.97431511        -0.9743151
Immunization        0.0974756      -0.0832094        0.08320939
Life_Expectency     0.99833819     -0.9994813        0.99948133
Labour_Force        1              -0.9990989        0.99909885
Ruralpercentage     -0.9990989     1                 -1
Urbanpercentage     0.99909885     -1                1
Eigenvalues of the Correlation Matrix:
Total = 8 Average = 1

     Eigenvalue   Difference   Proportion   Cumulative
1    6.92968236   5.92158948   0.8662       0.8662
2    1.00809288   0.97650152   0.1260       0.9922
3    0.03159136   0.00328722   0.0039       0.9962
4    0.02830413   0.02673630   0.0035       0.9997
5    0.00156783   0.00097564   0.0002       0.9999
6    0.00059219   0.00042294   0.0001       1.0000
7    0.00016925   0.00016925   0.0000       1.0000
8    0.00000000                0.0000       1.0000

Factor Pattern

                    Factor1    Factor2
Birth_rate          0.99413    0.09479
Population_Growth   0.98718    0.03616
Death_Rate          0.98375    0.07765
Life_Expectency     0.99776    0.05293
Urbanpercentage     0.99823    0.03064
Labour_Force        0.99895    0.01599
Immunization        -0.11285   0.99340
Ruralpercentage     0.99823    0.03064
Eigenvectors

                    1         2
Birth_rate          0.37765   0.09441
Population_Growth   0.37501   0.03601
Death_Rate          0.37370   0.07734
Immunization        0.04287   0.98940
Life_Expectency     0.37903   0.05272
Labour_Force        0.37948   0.01592
Ruralpercentage     0.37921   0.03052
Urbanpercentage     0.37921   0.03052

Variance Explained by Each Factor
Factor1      Factor2
6.9296824    1.0080929
-0.99823* Urbanpercentage-0.99895* Labour_Force-0.11285*
Immunization
Here we see that the first principal component explains 86.62% of the total variation, and together with the second principal component it explains 99.22%. The corresponding eigenvalues (6.930 and 1.008) are positive and greater than 1.
Of the total variance of 8 (one unit for each standardized variable), factor 1 accounts for 6.930 and factor 2 for 1.008, i.e. 86.62% and 12.60% respectively.
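The proportion and cumulative columns can be recomputed directly from the eigenvalues. The following is a minimal sketch of that arithmetic using the eigenvalues listed for India; the variable names are only illustrative.

```python
# Eigenvalues of the correlation matrix for India, as listed in the table above.
eigenvalues = [
    6.92968236, 1.00809288, 0.03159136, 0.02830413,
    0.00156783, 0.00059219, 0.00016925, 0.00000000,
]

# With 8 standardized variables the eigenvalues sum to 8.
total = sum(eigenvalues)

# Proportion of the total variation explained by each component.
proportions = [value / total for value in eigenvalues]

# Running total of the proportions.
cumulative = []
running = 0.0
for share in proportions:
    running += share
    cumulative.append(running)

for index, (share, cum) in enumerate(zip(proportions, cumulative), start=1):
    print(index, round(share, 4), round(cum, 4))
```

The first two rows reproduce the 86.62% and 99.22% quoted for India.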
For China:
Correlations

                    Birth_rate   Population_Growth   Death_Rate   Immunization   Life_Expectency
Birth_rate          1            0.99568447          0.11164548   -0.0669033     -0.9900038
Population_Growth   0.99568447   1                   0.05527694   -0.0859809     -0.9938804
Death_Rate          0.11164548   0.05527694          1            0.69691242     -0.0309305
Immunization        -0.0669033   -0.0859809          0.69691242   1              0.09690177
Life_Expectency     -0.9900038   -0.9938804          -0.0309305   0.09690177     1
Labour_Force        -0.9867502   -0.9930929          -0.0053732   0.10801046     0.99943335
Ruralpercentage     0.97736638   0.98775912          -0.0614283   -0.1717671     -0.9946585
Urbanpercentage     -0.9773664   -0.9877591          0.06142835   0.17176711     0.99465851

Correlations

                    Labour_Force   Ruralpercentage   Urbanpercentage
Birth_rate          -0.9867502     0.97736638        -0.9773664
Population_Growth   -0.9930929     0.98775912        -0.9877591
Death_Rate          -0.0053732     -0.0614283        0.06142835
Immunization        0.10801046     -0.1717671        0.17176711
Life_Expectency     0.99943335     -0.9946585        0.99465851
Labour_Force        1              -0.9970516        0.99705156
Ruralpercentage     -0.9970516     1                 -1
Urbanpercentage     0.99705156     -1                1
Eigenvalues of the Correlation Matrix:
Total = 8 Average = 1

     Eigenvalue   Difference   Proportion   Cumulative
1    5.9740510                 0.7468       0.7468
2    1.70803081   1.40785990   0.2135       0.9603
3    0.30017091   0.28601627   0.0375       0.9978
4    0.01415463   0.01132214   0.0018       0.9996
5    0.00283249   0.00213868   0.0004       0.9999
6    0.00069381   0.00062748   0.0001       1.0000
7    0.00006633   0.00006633   0.0000       1.0000
8    0.00000000                0.0000       1.0000
Eigenvectors
1
Birth_rate
Variance Explained
2 by Each Factor
Factor1
- 0.0792
0.4049
5.974051
5
8
0
Population_Gr
owth
- 0.0476
0.4073
7
0
Death_Rate
0.0015 0.7092
7
4
Immunization
0.0577 0.6959
2
0
Factor2
1.7080308
Life_Expectenc 0.4084
y
0 0.0325
6
Labour_Force
Ruralpercenta
ge
0.4085
7 0.0171
4
0.4080 0.0374
8
1
Factor Pattern
Factor Factor
1
2
Labour_Force
0.9986
1
0.0224
0
Life_Expectenc 0.9982
y
1
0.0425
5
Urbanpercenta 0.9974
ge
1
0.0488
9
Birth_rate
0.9898
4
0.1035
8
Population_Gr
owth
0.9955
1
0.0623
1
Ruralpercenta
ge
0.9974
1
0.0488
9
Death_Rate
0.0038
4
0.9269
1
Immunization
0.1410
8
0.9094
8
Here we see that the first principal component explains 74.68% of the total variation, and together with the second principal component it explains 96.03%. The corresponding eigenvalues (5.974 and 1.708) are positive and greater than 1.
Of the total variance of 8 (one unit for each standardized variable), factor 1 accounts for 5.974 and factor 2 for 1.708, i.e. 74.68% and 21.35% respectively.
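The factor pattern and the eigenvectors are linked through the eigenvalues: a loading is the eigenvector entry multiplied by the square root of the corresponding eigenvalue. The sketch below checks this for two entries of the China output. The two eigenvector entries are taken as absolute values from the listing above, and the helper name loading is only illustrative.

```python
import math

# Variance explained by the first two factors for China.
eigenvalues = {"Factor1": 5.9740510, "Factor2": 1.7080308}

# Two eigenvector entries (absolute values) as listed for China.
eigenvector_entries = {
    ("Labour_Force", "Factor1"): 0.40857,
    ("Death_Rate", "Factor2"): 0.70924,
}


def loading(entry, eigenvalue):
    """Factor loading = eigenvector entry * sqrt(eigenvalue)."""
    return entry * math.sqrt(eigenvalue)


loadings = {
    key: loading(entry, eigenvalues[key[1]])
    for key, entry in eigenvector_entries.items()
}

for (variable, factor), value in loadings.items():
    print(variable, factor, round(value, 4))
```

Up to rounding in the printed eigenvector, these agree with the factor pattern entries 0.99861 (Labour_Force, factor 1) and 0.92691 (Death_Rate, factor 2).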
For Canada:
Correlations

                    Birth_rate   Population_Growth   Death_Rate   Immunization   Life_Expectency
Birth_rate          1.00000      0.65862             -0.37488     -0.53213       -0.84769
Population_Growth   0.65862      1.00000             -0.25296     -0.33260       -0.34934
Death_Rate          -0.37488     -0.25296            1.00000      0.14433        0.20530
Immunization        -0.53213     -0.33260            0.14433      1.00000        0.70055
Life_Expectency     -0.84769     -0.34934            0.20530      0.70055        1.00000
Labour_Force        -0.78267     -0.27643            0.15269      0.71838        0.98903
Ruralpercentage     0.93172      0.48064             -0.25329     -0.67610       -0.97270
Urbanpercentage     -0.93172     -0.48064            0.25329      0.67610        0.97270

Correlations

                    Labour_Force   Ruralpercentage   Urbanpercentage
Birth_rate          -0.78267       0.93172           -0.93172
Population_Growth   -0.27643       0.48064           -0.48064
Death_Rate          0.15269        -0.25329          0.25329
Immunization        0.71838        -0.67610          0.67610
Life_Expectency     0.98903        -0.97270          0.97270
Labour_Force        1.00000        -0.94495          0.94495
Ruralpercentage     -0.94495       1.00000           -1.00000
Urbanpercentage     0.94495        -1.00000          1.00000

Eigenvalues of the Correlation Matrix:
Total = 8 Average = 1

     Eigenvalue   Difference   Proportion   Cumulative
1    5.57630797   4.47575875   0.6970       0.6970
2    1.10054921   0.36024271   0.1376       0.8346
3    0.74030650   0.24639745   0.0925       0.9271
4    0.49390905   0.42109809   0.0617       0.9889
5    0.07281096   0.06102880   0.0091       0.9980
6    0.01178216   0.00744801   0.0015       0.9995
7    0.00433415   0.00433415   0.0005       1.0000
8    0.00000000                0.0000       1.0000
Eigenvectors
1
Birth_rate
- 0.2118
0.3945
1
5
Population_Gr
owth
- 0.5292
0.2300
6
8
Death_Rate
0.1328
6 0.7187
9
Immunization
0.3152 0.2008
2
7
0.3964 0.2726
2
2
Ruralpercenta
ge
0.4185 0.0620
6
6
Factor Pattern
Factor Factor
1
2
Urbanpercenta 0.9883
ge
9
0.0651
1
Life_Expectenc 0.9645
y
6
0.1990
4
Labour_Force
0.9361
2
0.2860
0
Immunization
0.7443
7
0.2107
2
Birth_rate
0.9316
9
0.2222
0
Ruralpercenta
ge
0.9883
9
0.0651
1
Population_Gr
owth
Variance Explained
by Each Factor
Death_Rate
Factor1
Factor2
0.5433
3
0.5552
3
0.3137
5
0.7540
6
5.5763080 1.1005492
0.31375*
Death_Rate + 0.96456* Life_Expectency +0.98839*
Urbanpercentage+0.93612* Labour_Force+0.74437* Immunization
Here we see that the first principal component explains 69.70% of the total variation, and together with the second principal component it explains 83.46%. The corresponding eigenvalues (5.576 and 1.101) are positive and greater than 1.
Of the total variance of 8 (one unit for each standardized variable), factor 1 accounts for 5.576 and factor 2 for 1.101, i.e. 69.70% and 13.76% respectively.
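The rule used throughout this section, keeping the components whose eigenvalue is greater than 1 (the average eigenvalue of a correlation matrix), can be written as a short check. This is only a sketch of that selection rule applied to the Canada eigenvalues; the names are illustrative.

```python
# Eigenvalues of the correlation matrix for Canada.
eigenvalues = [
    5.57630797, 1.10054921, 0.74030650, 0.49390905,
    0.07281096, 0.01178216, 0.00433415, 0.00000000,
]

# Average eigenvalue; it equals 1 for a correlation matrix.
average = sum(eigenvalues) / len(eigenvalues)

# Keep the components whose eigenvalue exceeds 1.
retained = [
    index
    for index, value in enumerate(eigenvalues, start=1)
    if value > 1.0
]

# Share of the total variation carried by the retained components.
explained = sum(eigenvalues[index - 1] for index in retained) / sum(eigenvalues)

print(retained, round(explained, 4))
```

For Canada only the first two components pass the rule, and together they carry the 83.46% quoted above.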
For France:
Correlations
Birth
rate
Population
Growth
Death Immuniza
Rate
tion
Life
Expectancy
0.19045108
- -0.0502327
0.163276
6
0.10109229
Population_Gr
owth
0.190451
08
Death_Rate
0.163276
6
-0.4432427
Immunization
0.050232
7
0.47397151
0.530146
4
0.87629617
Life_Expectenc 0.101092
y
29
0.760789
0.698696
0.8762961
7
0.682066
Birth_rate
0.443242
7
0.4739715
1
0.760789
1 -0.5301464
-0.698696
Labour_Force
0.098017
8
0.82528718
0.8326272
5
0.98265336
Ruralpercenta
ge
0.104538
3
-0.9878581
Urbanpercenta 0.104538
ge
26
0.7750252
0.609769
8
0.8821888
1
0.98785807
Correlations
Labour_Fo Ruralpercen Urbanpercen
rce
tage
tage
Birth_rate
0.0980178
-0.1045383
0.10453826
0.82528718
-0.7750252
0.7750252
-0.682066
0.60976975
-0.6097698
0.83262725
-0.8821888
0.88218881
Life_Expectenc 0.98265336
y
-0.9878581
0.98785807
Population_Gr
owth
Death_Rate
Immunization
Correlations
Labour_Fo Ruralpercen Urbanpercen
rce
tage
tage
Labour_Force
-0.9819052
0.98190522
-0.9819052
-1
Urbanpercenta 0.98190522
ge
-1
Ruralpercenta
ge
Eigenvalues of the Correlation Matrix:
Total = 8 Average = 1

     Eigenvalue   Difference   Proportion   Cumulative
1    5.82964027   4.77236700   0.7287       0.7287
2    1.05727327   0.46244807   0.1322       0.8609
3    0.59482521   0.14481868   0.0744       0.9352
4    0.45000653   0.39811954   0.0563       0.9915
5    0.05188699   0.04078885   0.0065       0.9980
6    0.01109814   0.00582854   0.0014       0.9993
7    0.00526960   0.00526960   0.0007       1.0000
8    0.00000000                0.0000       1.0000
Eigenvectors
Factor Pattern
0.0538
7
0.9390
4
Population_Gr
owth
0.3306
2
0.1757
4
Death_Rate
0.2925
1
Immunization
Birth_rate
Factor Factor
1
2
Life_Expectenc
y
0.9940
2 0.0361
8
0.1343
2
Labour_Force
0.9922
5 0.0186
0
0.3590
4
0.2529
9
Urbanpercenta 0.9880
ge
7 0.0442
5
Life_Expectenc 0.4116
y
9
0.0351
8
Immunization
0.8669
0 0.2601
3
Labour_Force
0.0180
9
Population_Gr
owth
0.7982 0.1807
7
0
Death_Rate
0.7062 0.1381
5
1
Ruralpercenta
ge
- 0.0442
0.9880
5
7
Birth_rate
0.1300 0.9655
6
6
Ruralpercenta
ge
0.4109
6
0.4092
3
0.0430
4
Urbanpercenta 0.4092
ge
3
0.0430
4
The second linear combination is
Y2= 0.04425*Rural percentage +0.96556*Birth_rate
+0.18070*Population_Growth -0.13811*
Death_Rate -0.03618*
Life_Expectency -0.04425* Urbanpercentage-0.01860* Labour_Force0.26013* Immunization
Here we see that the first principal component explains 72.87% of the total variation, and together with the second principal component it explains 86.09%. The corresponding eigenvalues (5.830 and 1.057) are positive and greater than 1.
Of the total variance of 8 (one unit for each standardized variable), factor 1 accounts for 5.830 and factor 2 for 1.057, i.e. 72.87% and 13.22% respectively.
For Italy:
Correlations

                    Birth_rate   Population_Growth   Death_Rate   Immunization   Life_Expectency
Birth_rate          1.00000      0.43653             -0.35516     0.15466        -0.02767
Population_Growth   0.43653      1.00000             0.08599      0.23929        0.76261
Death_Rate          -0.35516     0.08599             1.00000      -0.06429       0.24502
Immunization        0.15466      0.23929             -0.06429     1.00000        -0.00729
Life_Expectency     -0.02767     0.76261             0.24502      -0.00729       1.00000
Labour_Force        0.55002      0.89418             0.08796      0.19730        0.77348
Ruralpercentage     -0.07121     -0.81498            -0.24408     -0.09571       -0.98421
Urbanpercentage     0.07121      0.81498             0.24408      0.09571        0.98421

Correlations

                    Labour_Force   Ruralpercentage   Urbanpercentage
Birth_rate          0.55002        -0.07121          0.07121
Population_Growth   0.89418        -0.81498          0.81498
Death_Rate          0.08796        -0.24408          0.24408
Immunization        0.19730        -0.09571          0.09571
Life_Expectency     0.77348        -0.98421          0.98421
Labour_Force        1.00000        -0.84026          0.84026
Ruralpercentage     -0.84026       1.00000           -1.00000
Urbanpercentage     0.84026        -1.00000          1.00000
Eigenvalues of the Correlation Matrix:
Total = 8 Average = 1

     Eigenvalue   Difference   Proportion   Cumulative
1    4.61237498   2.98109404   0.5765       0.5765
2    1.63128094   0.69779279   0.2039       0.7805
3    0.93348815   0.28325689   0.1167       0.8971
4    0.65023126   0.51488926   0.0813       0.9784
5    0.13534201   0.10597157   0.0169       0.9953
6    0.02937043   0.02145821   0.0037       0.9990
7    0.00791222   0.00791222   0.0010       1.0000
8    0.00000000                0.0000       1.0000

Factor Pattern

                    Factor1    Factor2
Urbanpercentage     0.97084    0.18073
Life_Expectency     0.93395    0.27514
Labour_Force        0.93370    0.26857
Population_Growth   0.91522    0.22507
Ruralpercentage     -0.97084   0.18073
Birth_rate          0.27053    0.85055
Immunization        0.16904    0.39703
Death_Rate          0.20946    0.69742
Eigenvectors

                    1         2
Birth_rate          0.12596   0.66594
Population_Growth   0.42615   0.17622
Death_Rate          0.09753   0.54605
Immunization        0.07871   0.31086
Life_Expectency     0.43487   0.21542
Labour_Force        0.43475   0.21027
Ruralpercentage     0.45205   0.14150
Urbanpercentage     0.45205   0.14150
combinations
Variance Explained by Each Factor
Factor1      Factor2
4.6123750    1.6312809
+0.22507*Population_Growth -0.69742*
Death_Rate -0.27514*
Life_Expectency -0.18073* Urbanpercentage+0.26857*
Labour_Force+0.39703* Immunization
Here we see that the first principal component explains 57.65% of the total variation, and together with the second principal component it explains 78.05%. The corresponding eigenvalues (4.612 and 1.631) are positive and greater than 1.
Of the total variance of 8 (one unit for each standardized variable), factor 1 accounts for 4.612 and factor 2 for 1.631, i.e. 57.65% and 20.39% respectively.
For Japan:
Correlations

                    Birth_rate   Population_Growth   Death_Rate   Immunization   Life_Expectency
Birth_rate          1.00000      0.92534             -0.90576     -0.56983       -0.88567
Population_Growth   0.92534      1.00000             -0.80518     -0.61445       -0.83360
Death_Rate          -0.90576     -0.80518            1.00000      0.50950        0.94337
Immunization        -0.56983     -0.61445            0.50950      1.00000        0.58974
Life_Expectency     -0.88567     -0.83360            0.94337      0.58974        1.00000
Labour_Force        -0.12281     -0.02699            0.26533      -0.34047       0.33058
Ruralpercentage     0.89328      0.79478             -0.95790     -0.47859       -0.97631
Urbanpercentage     -0.89328     -0.79478            0.95790      0.47859        0.97631

Correlations

                    Labour_Force   Ruralpercentage   Urbanpercentage
Birth_rate          -0.12281       0.89328           -0.89328
Population_Growth   -0.02699       0.79478           -0.79478
Death_Rate          0.26533        -0.95790          0.95790
Immunization        -0.34047       -0.47859          0.47859
Life_Expectency     0.33058        -0.97631          0.97631
Labour_Force        1.00000        -0.43386          0.43386
Ruralpercentage     -0.43386       1.00000           -1.00000
Urbanpercentage     0.43386        -1.00000          1.00000
Eigenvalues of the Correlation Matrix:
Total = 8 Average = 1

     Eigenvalue   Difference   Proportion   Cumulative
1    5.94015461   4.50593284   0.7425       0.7425
2    1.43422178   1.08541825   0.1793       0.9218
3    0.34880353   0.16695644   0.0436       0.9654
4    0.18184708   0.12688049   0.0227       0.9881
5    0.05496659   0.02439830   0.0069       0.9950
6    0.03056830   0.02113019   0.0038       0.9988
7    0.00943811   0.00943811   0.0012       1.0000
8    0.00000000                0.0000       1.0000
Eigenvectors

                    1         2
Birth rate          0.38964   0.09884
Population Growth   0.36588   0.19893
Death Rate          0.39514   0.04290
Immunization        0.24913   0.55282
Life Expectancy     0.40274   0.04867
Labour Force        0.11307   0.76929
Rural percentage    0.40071   0.15657
Urban percentage    0.40071   0.15657
Factor Pattern
Factor Factor
1
2
Life_Expectenc 0.9815
y
7
0.0582
9
Urbanpercenta 0.9766
ge
2
0.1875
1
Death_Rate
0.9630
5
0.0513
8
Population_Gr
owth
- 0.2382
0.8917
3
4
Birth_rate
- 0.1183
0.9496
7
5
Ruralpercenta
ge
0.9766 0.1875
2
1
Labour_Force
Variance Explained by
Each Factor
Immunization
Factor1
Factor2
5.9401546
1.4342218
0.2755
8
0.9213
0
0.6071
8
0.6620
5
Death_Rate +0.98157* Life_Expectency +0.97662*
Urbanpercentage+0.27558*Labour_Force +0.60718* Immunization
The second linear combination is
Y2 = -0.18751*Ruralpercentage + 0.11837*Birth_rate + 0.23823*Population_Growth + 0.05138*Death_Rate + 0.05829*Life_Expectency + 0.18751*Urbanpercentage + 0.92130*Labour_Force - 0.66205*Immunization
Here we see that the first principal component explains 74.25% of the total variation, and together with the second principal component it explains 92.18%. The corresponding eigenvalues (5.940 and 1.434) are positive and greater than 1.
Of the total variance of 8 (one unit for each standardized variable), factor 1 accounts for 5.940 and factor 2 for 1.434, i.e. 74.25% and 17.93% respectively.
Correlations

                    Birth_rate   Population_Growth   Death_Rate   Immunization   Life_Expectency
Birth_rate          1.00000      0.90932             0.48947      -0.84659       -0.82652
Population_Growth   0.90932      1.00000             0.67836      -0.72240       -0.88435
Death_Rate          0.48947      0.67836             1.00000      -0.19973       -0.82964
Immunization        -0.84659     -0.72240            -0.19973     1.00000        0.60050
Life_Expectency     -0.82652     -0.88435            -0.82964     0.60050        1.00000
Labour_Force        -0.90738     -0.92875            -0.74538     0.71727        0.98132
Ruralpercentage     0.89601      0.94334             0.77842      -0.69884       -0.97983
Urbanpercentage     -0.89601     -0.94334            -0.77842     0.69884        0.97983

Correlations

                    Labour_Force   Ruralpercentage   Urbanpercentage
Birth_rate          -0.90738       0.89601           -0.89601
Population_Growth   -0.92875       0.94334           -0.94334
Death_Rate          -0.74538       0.77842           -0.77842
Immunization        0.71727        -0.69884          0.69884
Life_Expectency     0.98132        -0.97983          0.97983
Labour_Force        1.00000        -0.99519          0.99519
Ruralpercentage     -0.99519       1.00000           -1.00000
Urbanpercentage     0.99519        -1.00000          1.00000
Eigenvalues of the Correlation Matrix:
Total = 8 Average = 1

     Eigenvalue   Difference   Proportion   Cumulative
1    6.79170864   5.85749925   0.8490       0.8490
2    0.93420939   0.81138913   0.1168       0.9657
3    0.12282026   0.02755265   0.0154       0.9811
4    0.09526761   0.04905750   0.0119       0.9930
5    0.04621012   0.03804042   0.0058       0.9988
6    0.00816970   0.00655541   0.0010       0.9998
7    0.00161428   0.00161428   0.0002       1.0000
8    0.00000000                0.0000       1.0000
Eigenvectors
Factor Pattern
1
Birth_rate
0.35462
Factor
1
Urbanpercenta
ge
0.9957
1
Labour_Force
0.9930
8
Life_Expectenc
y
0.9677
6
Life_Expectenc 0.37135
y
Immunization
0.7439
5
Labour_Force
0.38106
Death_Rate
Ruralpercenta
ge
0.38207
0.7499
5
Birth_rate
0.9241
7
Population_Gr
owth
0.9571
1
Ruralpercenta
ge
0.9957
1
Population_Gr
owth
0.36726
Death_Rate
0.28777
Immunization
0.28547
Urbanpercenta 0.38207
ge
Y1 = -0.99571*Ruralpercentage - 0.92417*Birth_rate - 0.95711*Population_Growth - 0.74995*Death_Rate + 0.96776*Life_Expectency + 0.99571*Urbanpercentage + 0.99308*Labour_Force + 0.74395*Immunization
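Here we see that the first principal component explains 84.90% of the total variation, and its eigenvalue (6.792) is positive and greater than 1. The second eigenvalue (0.934) is less than 1, so only the first linear combination is considered.
Of the total variance of 8 (one unit for each standardized variable), factor 1 accounts for 6.792.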
Correlations

                    Birth_rate   Population_Growth   Death_Rate   Immunization   Life_Expectency
Birth_rate          1.00000      -0.22428            0.45618      -0.17259       -0.52383
Population_Growth   -0.22428     1.00000             -0.93436     -0.39017       0.92604
Death_Rate          0.45618      -0.93436            1.00000      0.19475        -0.98137
Immunization        -0.17259     -0.39017            0.19475      1.00000        -0.16345
Life_Expectency     -0.52383     0.92604             -0.98137     -0.16345       1.00000
Labour_Force        -0.28063     0.98852             -0.93065     -0.40645       0.93326
Ruralpercentage     0.63046      -0.87983            0.95484      0.13343        -0.98481
Urbanpercentage     -0.63046     0.87983             -0.95484     -0.13343       0.98481

Correlations

                    Labour_Force   Ruralpercentage   Urbanpercentage
Birth_rate          -0.28063       0.63046           -0.63046
Population_Growth   0.98852        -0.87983          0.87983
Death_Rate          -0.93065       0.95484           -0.95484
Immunization        -0.40645       0.13343           -0.13343
Life_Expectency     0.93326        -0.98481          0.98481
Labour_Force        1.00000        -0.90068          0.90068
Ruralpercentage     -0.90068       1.00000           -1.00000
Urbanpercentage     0.90068        -1.00000          1.00000
Eigenvalues of the Correlation Matrix:
Total = 8 Average = 1

     Eigenvalue   Difference   Proportion   Cumulative
1    6.02232219   4.68053649   0.7528       0.7528
2    1.34178570   0.77568110   0.1677       0.9205
3    0.56610460   0.52362039   0.0708       0.9913
4    0.04248421   0.02841358   0.0053       0.9966
5    0.01407063   0.00642412   0.0018       0.9983
6    0.00764651   0.00206035   0.0010       0.9993
7    0.00558616   0.00558616   0.0007       1.0000
8    0.00000000                0.0000       1.0000

Factor Pattern

                    Factor1    Factor2
Life_Expectency     0.99178    0.06239
Urbanpercentage     0.98560    0.15376
Labour_Force        0.95438    0.25879
Population_Growth   0.94133    0.28271
Death_Rate          0.97823    0.00017
Ruralpercentage     0.98560    0.15376
Birth_rate          0.52635    0.68102
Immunization        0.25486    0.82458
Eigenvectors

                    1          2
Birth_rate          -0.21448   0.58792
Population_Growth   0.38358    0.24406
Death_Rate          0.39862    0.00015
Immunization        0.10385    0.71185
Life_Expectency     0.40414    0.05386
Labour_Force        0.38890    0.22341
Ruralpercentage     -0.40162   0.13274
Urbanpercentage     0.40162    0.13274
+0.98560* Urbanpercentage+0.95438* Labour_Force-0.25486*
Immunization
The second linear combination is
Y2 = 0.15376*Rural percentage + 0.68102*Birth_rate
+0.28271*Population_Growth -0.00017*
Death_Rate -0.06239*
Life_Expectency -0.15376* Urbanpercentage+0.25879* Labour_Force0.82458* Immunization
Here we see that the first principal component explains 75.28% of the total variation, and together with the second principal component it explains 92.05%. The corresponding eigenvalues (6.022 and 1.342) are positive and greater than 1.
Of the total variance of 8 (one unit for each standardized variable), factor 1 accounts for 6.022 and factor 2 for 1.342, i.e. 75.28% and 16.77% respectively.
Regression Analysis:
For India:
; i=1(1) n
1Q
Median
3Q
Max
0.019070 0.033363
Coefficients:
                     Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)         4.211e+01    5.770e+00     7.299   1.54e-05 ***
Birth.rate         -1.807e-01    4.682e-02    -3.860    0.00266 **
Population.Growth   5.017e-01    4.650e-01     1.079    0.30367
Labour.Force        2.946e-08    1.521e-08     1.937    0.07883 .
GDP                -2.566e-04    8.102e-05    -3.167    0.00897 **
Urbanpercentage     4.664e-01    3.790e-01     1.231    0.24412
Multiple R-squared:
0.9997
Adjusted R-squared:
F-statistic: 8226 on 5 and 11 DF, p-value: < 2.2e-16
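Therefore the model can be written as
Life expectancy = 42.11 - 0.1807*Birth.rate + 0.5017*Population.Growth + 2.946e-08*Labour.Force - 0.0002566*GDP + 0.4664*Urbanpercentage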
Conclusion:
Here birth rate and GDP are highly significant and are negatively related with life expectancy.
Labour force may be significant (p = 0.079) and is positively related with life expectancy.
Here the multiple correlation coefficient is very high (multiple R-squared = 0.9997).
The histogram of the residuals shows a negatively skewed distribution.
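Each t value in the coefficient table is the estimate divided by its standard error. A minimal sketch of that check, using the estimates and standard errors reported for the life expectancy model for India, is given below; the dictionary layout is only an illustrative choice.

```python
# (estimate, standard error) for each term of the life expectancy model for India.
coefficients = {
    "(Intercept)": (4.211e+01, 5.770e+00),
    "Birth.rate": (-1.807e-01, 4.682e-02),
    "Population.Growth": (5.017e-01, 4.650e-01),
    "Labour.Force": (2.946e-08, 1.521e-08),
    "GDP": (-2.566e-04, 8.102e-05),
    "Urbanpercentage": (4.664e-01, 3.790e-01),
}

# t value = estimate / standard error.
t_values = {
    term: estimate / std_error
    for term, (estimate, std_error) in coefficients.items()
}

for term, t_value in t_values.items():
    print(term, round(t_value, 3))
```

The results agree with the reported t values up to the rounding of the printed estimates. A large absolute t value, as for Birth.rate and GDP, corresponds to a small p-value.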
Residuals:
      1Q   Median       3Q      Max
  -84.23    18.28    68.08   277.49

Coefficients:
                   Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)      -1.339e+05    6.380e+04    -2.099    0.05445 .
Labour.Force      7.610e-05    2.308e-05     3.297    0.00529 **
Ruralpercentage   1.516e+03    7.586e+02     1.998    0.06555 .
Conclusion:
Labour force is highly significant for GDP and is positively related with GDP.
Rural percentage may have a significant effect on GDP (p = 0.066) and is also positively related with GDP.
The residual plot for GDP shows that the residuals are normally distributed.
The multiple correlation coefficient is high.
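The wording "highly significant" and "may be significant" follows the significance codes printed next to the p-values. The sketch below maps the p-values of the GDP model for India to those codes, assuming the usual cut-offs 0.001, 0.01, 0.05 and 0.1; the function name signif_code is only illustrative.

```python
def signif_code(p_value):
    """Return the significance code for a p-value (cut-offs 0.001, 0.01, 0.05, 0.1)."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    if p_value < 0.1:
        return "."
    return " "


# p-values of the GDP model for India.
p_values = {
    "(Intercept)": 0.05445,
    "Labour.Force": 0.00529,
    "Ruralpercentage": 0.06555,
}

codes = {term: signif_code(p) for term, p in p_values.items()}

for term, code in codes.items():
    print(term, p_values[term], code)
```

Labour.Force is significant at the 1% level, while the intercept and Ruralpercentage are significant only at the 10% level.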
For China: To understand the relationships between the variables we consider the matrix plot, i.e. the set of scatter diagrams drawn for every possible pair of the variables under consideration.
Residuals:
      Min         1Q     Median         3Q        Max
-0.030548  -0.012751   0.001812   0.009514   0.028484

Coefficients:
                     Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)         4.599e+01    1.772e+00    25.963   3.20e-11 ***
Birth.rate         -2.986e-02    3.196e-02    -0.934    0.37024
Population.Growth   1.338e-01    3.418e-01     0.392    0.70282
Labour.Force        2.975e-08    4.272e-09     6.964   2.38e-05 ***
GDP                -1.074e-04    2.479e-05    -4.334    0.00119 **
Urbanpercentage     1.258e-01    4.441e-02     2.833    0.01627 *
Conclusion:
Here labour force is highly significant and is positively related with life expectancy.
GDP is also highly significant and is negatively related with life expectancy.
Here the multiple correlation coefficient is very high.
The histogram of the residuals shows a positively skewed distribution.
Residuals:
     Min        1Q    Median        3Q       Max
-360.346  -189.430    -5.045   153.007   611.453

Coefficients:
                   Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)       2.691e+05    2.984e+04     9.017   3.31e-07 ***
Labour.Force     -1.864e-04    2.361e-05    -7.892   1.60e-06 ***
Ruralpercentage  -2.006e+03    1.991e+02   -10.076   8.50e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 272.1 on 14 degrees of freedom
Multiple R-squared: 0.9845
F-statistic: 444.8 on 2 and 14 DF, p-value: 2.145e-13

Here the model is
GDP = 269100 - 0.0001864*Labour.Force - 2006*Ruralpercentage
Conclusion:
Labour force and rural percentage are both highly significant for GDP and are negatively related with GDP.
The residual plot for GDP shows a positively skewed distribution.
The multiple correlation coefficient is high (multiple R-squared = 0.9845).
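The F-statistic and the multiple R-squared of a fitted model are two views of the same fit. As a rough check, the sketch below recomputes F for the GDP model for China from the reported R-squared, using F = (R²/k) / ((1 - R²)/df) with k = 2 predictors and df = 14 residual degrees of freedom. The function name f_statistic is only illustrative.

```python
def f_statistic(r_squared, n_predictors, residual_df):
    """Overall F-statistic of a regression computed from its multiple R-squared."""
    explained = r_squared / n_predictors
    unexplained = (1.0 - r_squared) / residual_df
    return explained / unexplained


# GDP model for China: R-squared 0.9845, 2 predictors, 14 residual degrees of freedom.
f_value = f_statistic(0.9845, 2, 14)
print(round(f_value, 1))
```

Because the reported R-squared is rounded to four decimals, the result is close to, but not exactly, the reported F-statistic of 444.8.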
Overall discussion: From the PCA we see that, for India, the cumulative proportion of variation explained by the first two factors (99.22%) is high compared with the other countries. That is, for India these factors carry more information than the corresponding factors for the remaining countries. We therefore conclude that, for those countries (China, Canada, France, Italy, Japan, United States & United Kingdom), there may be many other factors which are not considered in this analysis.
Another point is that the variance explained by the first factor is greater for India (6.930 out of 8) than for the other countries.
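The comparison above can be made explicit by ranking the countries on the cumulative proportion explained by their first two components. The sketch below uses the values from the eigenvalue tables of the six sections that carry a country heading; the two sections without a heading are left out rather than guessed.

```python
# Cumulative proportion explained by the first two principal components.
cumulative_two_components = {
    "India": 0.9922,
    "China": 0.9603,
    "Canada": 0.8346,
    "France": 0.8609,
    "Italy": 0.7805,
    "Japan": 0.9218,
}

# Sort from the highest to the lowest cumulative proportion.
ranked = sorted(
    cumulative_two_components.items(),
    key=lambda item: item[1],
    reverse=True,
)

for country, share in ranked:
    print(country, share)
```

India comes first and Italy last, which is the pattern described in the discussion.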
From the regression analysis we can conclude the following for India:
From the histogram plot of life expectancy we can say that life expectancy is increasing for India.
If the birth rate decreases, then life expectancy increases.
If the labour force increases, then GDP also increases.
If the urban percentage increases, then GDP also increases.
References:
Regression Analysis by Example (3rd edition), by Samprit Chatterjee, Ali S. Hadi and Bertram Price, Wiley-Interscience.
An Introduction to Applied Multivariate Analysis, by Tenko Raykov and George A. Marcoulides.
Applied Multivariate Statistical Analysis (2nd edition), by Härdle and Simar.
Applied Multivariate Statistical Analysis, by Richard A. Johnson and Dean W. Wichern.