PCA Analisis de Componentes Principales

STHDA
S t at i s t i c al t o o ls fo r h i gh t h ro u gh p u t d at a an aly s i s
HOME BOOKS R/STATISTICS STAT SOFTWARES CONTACT
Search
Connect
Home / Easy Guides / R software / Factor analysis / FactoMineR and factoextra : Principal Component Analysis Actions menu for module Wiki
Visualization R software and data mining
FactoMineR and factoextra : Principal Component Analysis Visualization R software and data mining
Adsby Google Analysis Plot SPSSStatistics
Tools
Install and load FactoMineR package

Install and load factoextra for visualization
Prepare the data
Exploratory data analysis
Descriptive statistics
Correlation matrix
Principal component analysis
Variances of the principal components
Graph of individus and variables
Variables factor map : The correlation circle
Coordinates of variables on the principal components
Cos2 : quality of variables on the factor map
Contributions of the variables to the principal components
Graph of variables using FactoMineR base graph
Graph of variables using factoextra
Graph of individuals
Coordinates of individuals on the principal components
Cos2 : quality of representation of individuals on the principal components
Contribition of individuals to the princial components
Graph of individuals using FactoMineR base graph
Graph of individuals using factoextra
Change the color of individuals by groups
Principal component analysis using supplementary individuals and variables
Visualize supplementary quantitative variables
Visualize supplementary individuals
Supplementary qualitative variables
Dimension description
Infos
Principal component analysis (PCA) allows us to summarize the variations (informations) in a data set described by multiple variables. Each
variable could be considered as a different dimension. If you have more than 3 variables in your data sets, it could be very difficult to visualize a
multidimensional hyperspace.
The goal of principal component analysis is to transform the initial variables into a new set of variables which explain the variation in the data.
These new variables corresponds to a linear combination of the originals and are called principal components.
PCA reduces the dimensionality of multivariate data, to two or three that can be visualized graphically with minimal loss of information.
Several functions from different packages are available in R for performing PCA : prcomp and princomp (builtin R stats package), PCA (FactoMineR
package), dudi.pca(ade4 package).
This R tutorial describes :
1. How to perform a principal component analysis using R software and FactoMineR package
2. How to visualize the output of the PCA using the R package factoextra
Install and load FactoMineR package

FactoMineR (Husson et al.) is one of the most powerful R packages and my favorite one for performing a multivariate exploratory data analysis. A
rich documentation is available on the FactoMineR official website (http://factominer.free.fr/index.html) and on youtube. Many thanks to
Franois Husson for this effort
FactoMineR can be installed and loaded as follow :
install.packages("FactoMineR")
library("FactoMineR")
Install and load factoextra for visualization
The package factoextra has flexible methods for the classes PCA, prcomp, princomp and dudi in order to extract and visualize quickly the
results of the analysis. The ggplot2 plotting system is used for the data visualization.
Install and load factoextra as follow :
library("devtools")
install_github("kassambara/factoextra")
Load it :
library("factoextra")
Prepare the data

Well use the data sets decathlon2 from the package factoextra :
data(decathlon2)
head(decathlon2[,1:6])
X100mLong.jumpShot.putHigh.jumpX400mX110m.hurdle
SEBRLE11.047.5814.832.0749.8114.69
CLAY10.767.4014.261.8649.3714.05
BERNARD11.027.2314.251.9248.9314.99
YURKOV11.347.0915.192.1050.4215.31
ZSIVOCZKY11.137.3013.482.0148.6214.17
McMULLEN10.837.3113.762.1349.9114.38
This data is just a subset of the decathlon data in FactoMineR package

As illustrated below, the data used here describes athletes performance during two sporting events (Desctar and OlympicG). It contains 27
individuals (athletes) described by 13 variables :
Only some of these individuals and variables will be used to perform the principal component analysis (PCA).
The coordinates of the remaining individuals and variables on the factor map will be predicted after the PCA.
In PCA terminology, our data contains :
Active individuals (in blue, rows 1:23) : Individuals that are used during the principal component analysis.
Supplementary individuals (in green, rows 24:27) : The coordinates of these individuals will be predicted using the PCA informations and
parameters obtained with active individuals/variables
Active variables (in pink, columns 1:10) : Variables that are used for the principal component analysis.
Supplementary variables : As supplementary individuals, the coordinates of these variables will be predicted also.
Supplementary continuous variables : Columns 11 and 12 corresponding respectively to the rank and the points of athletes.
Supplementary qualitative variables : Column 13 corresponding to the two athletetic meetings (2004 Olympic Game or 2004 Decastar).
This factor variables will be used to color individuals by groups.
Extract only active individuals and variables for principal component analysis:
decathlon2.active<decathlon2[1:23,1:10]
head(decathlon2.active[,1:6])
SEBRLE11.047.5814.832.0749.8114.69
CLAY10.767.4014.261.8649.3714.05
BERNARD11.027.2314.251.9248.9314.99
YURKOV11.347.0915.192.1050.4215.31
ZSIVOCZKY11.137.3013.482.0148.6214.17
McMULLEN10.837.3113.762.1349.9114.38
Exploratory data analysis

Before principal component analysis, we can perform some exploratory data analysis such as descriptive statistics, correlation matrix and scatter
plot matrix.
Descriptive statistics
decathlon2.active_stats<data.frame(
Min=apply(decathlon2.active,2,min),#minimum
Q1=apply(decathlon2.active,2,quantile,1/4),#Firstquartile
Med=apply(decathlon2.active,2,median),#median
Mean=apply(decathlon2.active,2,mean),#mean
Q3=apply(decathlon2.active,2,quantile,3/4),#Thirdquartile
Max=apply(decathlon2.active,2,max)#Maximum
)
decathlon2.active_stats<round(decathlon2.active_stats,1)
head(decathlon2.active_stats)
MinQ1MedMeanQ3Max
X100m10.410.811.011.011.211.6
Long.jump6.87.27.37.37.58.0
Shot.put12.714.214.714.615.116.4
High.jump1.91.92.02.02.12.1
X400m46.849.049.449.450.051.2
X110m.hurdle14.014.214.414.514.915.7
Note that, you can also use the builtin R function summary() for the descriptive statistics but I dont like the format of the output on
data frame.
Correlation matrix
The correlation between variables can be calculated as follow :
cor.mat<round(cor(decathlon2.active),2)
head(cor.mat[,1:6])
X100m1.000.760.450.400.590.73
Long.jump0.761.000.440.340.510.59
Shot.put0.450.441.000.530.310.38
High.jump0.400.340.531.000.370.25
X400m0.590.510.310.371.000.58
X110m.hurdle0.730.590.380.250.581.00
Visualize the correlation matrix using a correlogram : the package corrplot is required.
#install.packages("corrplot")
library("corrplot")
corrplot(cor.mat,type="upper",order="hclust",
tl.col="black",tl.srt=45)
Read more about visualizing correlation matrix : Correlation matrix visualization
Make a scatter plot matrix showing the correlation coefficients between variables and the significance levels : the package PerformanceAnalytics
is required.
#install.packages("PerformanceAnalytics")
library("PerformanceAnalytics")
chart.Correlation(decathlon2.active[,1:6],histogram=TRUE,pch=19)
You can read more about this plot here : Correlation matrix visualization
Principal component analysis

The function PCA() [in FactoMiner package] can be used. A simplified format is :
PCA(X,scale.unit=TRUE,ncp=5,graph=TRUE)
X : a data frame. Rows are individuals and columns are numeric variables
scale.unit : a logical value. If TRUE, the data are scaled to unit variance before the analysis. This standardization to the same scale avoids
some variables to become dominant just because of their large measurement units.
ncp : number of dimensions kept in the final results.
graph : a logical value. If TRUE a graph is displayed.
In the R code below, the PCA is performed only on the active individuals/variables :
library("FactoMineR")
res.pca<PCA(decathlon2.active,graph=FALSE)
The output of the function PCA() is a list including :
print(res.pca)
**ResultsforthePrincipalComponentAnalysis(PCA)**
Theanalysiswasperformedon23individuals,describedby10variables
*Theresultsareavailableinthefollowingobjects:
namedescription
1"$eig""eigenvalues"
2"$var""resultsforthevariables"
3"$var$coord""coord.forthevariables"
4"$var$cor""correlationsvariablesdimensions"
5"$var$cos2""cos2forthevariables"
6"$var$contrib""contributionsofthevariables"
7"$ind""resultsfortheindividuals"
8"$ind$coord""coord.fortheindividuals"
9"$ind$cos2""cos2fortheindividuals"
10"$ind$contrib""contributionsoftheindividuals"
11"$call""summarystatistics"
12"$call$centre""meanofthevariables"
13"$call$ecart.type""standarderrorofthevariables"
14"$call$row.w""weightsfortheindividuals"
15"$call$col.w""weightsforthevariables"
The object that is created using the function PCA() contains many informations found in many different lists and matrices. These values
are described in the next section.
Variances of the principal components
The proportion of variances retained by the principal components can be extracted as follow :
eigenvalues<res.pca$eig
head(eigenvalues[,1:2])
eigenvaluepercentageofvariance
comp14.124213341.242133
comp21.838530918.385309
comp31.239140312.391403
comp40.81944028.194402
comp50.70155287.015528
comp60.42288284.228828
Eigenvalues correspond to the amount of the variation explained by each principal component (PC). Eigenvalues are large for the first
PC and small for the subsequent PCs.
A PC with an eigenvalue > 1 indicates that the PC accounts for more variance than accounted by one of the original variables in
standardized data. This is commonly used as a cutoff point to determine the number of PCs to retain.
Make a scree plot using base graphics : A scree plot is a graph of the eigenvalues/variances associated with components.
barplot(eigenvalues[,2],names.arg=1:nrow(eigenvalues),
main="Variances",
xlab="PrincipalComponents",
ylab="Percentageofvariances",
col="steelblue")
#Addconnectedlinesegmentstotheplot
lines(x=1:nrow(eigenvalues),eigenvalues[,2],
type="b",pch=19,col="red")
~60% of the informations (variances) contained in the data are retained by the first two principal components.
Make the scree plot using the package factoextra :
fviz_screeplot(res.pca,ncp=10)
Graph of individus and variables
The function plot.PCA() can be used. A simplified format is :
plot.PCA(x,axes=c(1,2),choix=c("ind","var"))
x : An object of class PCA

axes : A numeric vector of length 2 specifying the component to plot
choix : The graph to be plotted. Possible values are ind for the individuals and var for the variables
Variables factor map : The correlation circle
Coordinates of variables on the principal components
head(res.pca$var$coord)
Dim.1Dim.2Dim.3Dim.4Dim.5
X100m0.85062570.179398060.30155640.033573200.1944440
Long.jump0.79418060.280856950.19054650.115389560.2331567
Shot.put0.73391270.085404120.51759780.128468370.2488129
High.jump0.61008400.465214150.33008520.144550120.4027002
X400m0.70160340.290178260.28353290.430825520.1039085
X110m.hurdle0.76412520.024740810.44888730.016895890.2242200
Cos2 : quality of variables on the factor map
The quality of representation of the variables of the principal components are called the cos2.
head(res.pca$var$cos2)
X100m0.72356410.03218366410.090936280.00112715970.03780845
Long.jump0.63072290.07888062850.036307980.01331475060.05436203
Shot.put0.53862790.00729386360.267907490.01650412110.06190783
High.jump0.37220250.21642420700.108956220.02089473750.16216747
X400m0.49224730.08420342090.080390910.18561062690.01079698
X110m.hurdle0.58388730.00061210770.201499840.00028547120.05027463
Contributions of the variables to the principal components
Variable contributions in the determination of a given principal component are (in percentage) : (var.cos2 * 100) / (total cos2 of the
component)
head(res.pca$var$contrib)
X100m17.5442931.75050987.3386590.137552405.389252
Long.jump15.2931684.29041622.9300941.624859367.748815
Shot.put13.0601370.396722421.6204322.014072698.824401
High.jump9.02481111.77158388.7928882.5498795123.115504
X400m11.9355444.57992966.48763622.650905991.539012
X110m.hurdle14.1575440.033293316.2612610.034837357.166193
Graph of variables using FactoMineR base graph
plot(res.pca,choix="var")
Graph of variables using factoextra

The function fviz_pca_var() is used to visualize variables :
#Defaultplot
fviz_pca_var(res.pca)
#Changecolorandtheme
fviz_pca_var(res.pca,col.var="steelblue")+
theme_minimal()
Note that, using factoextra package, the color or the transparency of variables can be automatically controlled by the value of their
contributions, their cos2, their coordinates on x or y axis.
#Controlvariablecolorsusingtheircontribution
#Possiblevaluesfortheargumentcol.varare:
#"cos2","contrib","coord","x","y"
fviz_pca_var(res.pca,col.var="contrib")
#Changethegradientcolor
fviz_pca_var(res.pca,col.var="contrib")+
scale_color_gradient2(low="white",mid="blue",
high="red",midpoint=55)+theme_bw()
This is helpful to highlight the most important variables in the determination of the principal components.
Its also possible to control automatically the transparency of variables by their contributions :
#Controlthetransparencyofvariablesusingtheircontribution
#Possiblevaluesfortheargumentalpha.varare:
#"cos2","contrib","coord","x","y"
fviz_pca_var(res.pca,alpha.var="contrib")+
theme_minimal()
Read more about ggplot2 and colors here : ggplot2 colors How to change colors automatically and manually?
Graph of individuals
Coordinates of individuals on the principal components
head(res.pca$ind$coord)
SEBRLE0.19550471.58905670.64249120.083896521.16829387
CLAY0.80787952.47481371.38738271.298382320.82498206
BERNARD1.35913401.64809500.20055841.964094200.08419345
YURKOV0.88895320.44260672.52958430.712908370.40782264
ZSIVOCZKY0.10812162.06883771.33425910.101527960.20145217
McMULLEN0.12121951.01391020.86251701.341642911.62151286
Cos2 : quality of representation of individuals on the principal components
head(res.pca$ind$cos2)
SEBRLE0.0075301790.497473230.0813252320.0013866880.2689026575
CLAY0.0487012490.457016600.1436281170.1257917410.0507850580
BERNARD0.1971998040.289965550.0042940150.4118191830.0007567259
YURKOV0.0961098000.023825710.7782303220.0618126370.0202279796
ZSIVOCZKY0.0015743850.576419440.2397541520.0013882160.0054654972
McMULLEN0.0021754370.152194990.1101378720.2664865300.3892621478
Contribition of individuals to the princial components
head(res.pca$ind$contrib)
SEBRLE0.040294475.97145331.44839190.037345898.45894063
CLAY0.6880566414.48392486.75373818.944582834.21794385
BERNARD1.947401836.42341070.141134520.468194330.04393073
YURKOV0.833084150.463273322.45173962.696636051.03075263
ZSIVOCZKY0.0123241310.12171436.24643250.054692300.25151025
McMULLEN0.015490892.43108542.61027949.5505588816.29493304
Graph of individuals using FactoMineR base graph
plot(res.pca,choix="ind")
Graph of individuals using factoextra
The function fviz_pca_ind() is used to visualize individuals :
fviz_pca_ind(res.pca)
Remove the points from the graph, use texts only :
fviz_pca_ind(res.pca,geom="text")
Note that, allowed values for the argument geom are :

point to show only points (dots)
text to show only labels
c(point, text) to show both types
Control automatically the color of individuals using the cos2 values (the quality of the individuals on the factor map) :
fviz_pca_ind(res.pca,col.ind="cos2")+
high="red",midpoint=0.50)
Read more about ggplot2 and colors here : ggplot2 colors How to change colors automatically and manually?
Change the theme :
fviz_pca_ind(res.pca,col.ind="cos2")+
high="red",midpoint=0.50)+
theme_minimal()
Read more about ggplot2 themes here : ggplot2 themes and background colors
Make a biplot of individuals and variables :
fviz_pca_biplot(res.pca,geom="text")
Change the color of individuals by groups
We will use iris data sets in this section :
data(iris)
head(iris)
Sepal.LengthSepal.WidthPetal.LengthPetal.WidthSpecies
15.13.51.40.2setosa
24.93.01.40.2setosa
34.73.21.30.2setosa
44.63.11.50.2setosa
55.03.61.40.2setosa
65.43.91.70.4setosa
#ThevariableSpecies(index=5)isremoved
#beforePCAanalysis
iris.pca<PCA(iris[,5],graph=FALSE)
Individuals factor map :
#Defaultplot
fviz_pca_ind(iris.pca,label="none")
Change individual colors by groups :
fviz_pca_ind(iris.pca,label="none",habillage=iris$Species)
Add ellipses of point concentrations : the argument habillage is used to specify the factor variable for coloring the observations by groups.
fviz_pca_ind(iris.pca,label="none",habillage=iris$Species,
addEllipses=TRUE,ellipse.level=0.95)
Now, lets :
make a biplot of individuals and variables
change the color of individuals by groups
change the transparency of variable colors by their contribution values
show only the labels for variables
fviz_pca_biplot(iris.pca,
habillage=iris$Species,addEllipses=TRUE,
col.var="red",alpha.var="cos2",
label="var")+
scale_color_brewer(palette="Dark2")+
theme_minimal()
Principal component analysis using supplementary individuals and variables
Asqualitative
described above, the data sets decathlon2 contain supplementary continuous variables (quanti.sup, columns 11:12), supplementary
variables (quali.sup, column 13) and supplementary individuals (ind.sup, rows 24:27)
Supplementary variables and individuals are not used for the determination of the principal components. Their coordinates are predicted using
only the informations provided by the performed principal component analysis on active variables/individuals.
To specify supplementary individuals and variables, the function PCA() can be used as follow :
PCA(X,scale.unit=TRUE,ncp=5,ind.sup=NULL,
quanti.sup=NULL,quali.sup=NULL,graph=TRUE,axes=c(1,2))
X : a data frame. Rows are individuals and columns are numeric variables.
scale.unit : a logical value. If TRUE, the data are scaled to unit variance before the analysis.
ncp : number of dimensions kept in the final results.
ind.sup : a numeric vector specifying the indexes of the supplementary individuals
quanti.sup, quali.sup : a numeric vector specifying, respectively, the indexes of the quantitative and qualitative variables
graph : a logical value. If TRUE a graph is displayed.
axes : a vector of length 2 specifying the components to be plotted
Example of usage :
res.pca<PCA(decathlon2,ind.sup=24:27,
quanti.sup=11:12,quali.sup=13,graph=FALSE)
Visualize supplementary quantitative variables
All the results (coordinates, correlation and cos2) for the supplementary quantitative variables can be extracted as follow :
res.pca$quanti.sup
$coord
Rank0.70147770.245194430.18342940.055751860.07382647
Points0.96370750.077682620.15802250.166230920.03114711
$cor
Rank0.70147770.245194430.18342940.055751860.07382647
Points0.96370750.077682620.15802250.166230920.03114711
$cos2
Rank0.49207100.0601203100.033646350.003108270.0054503477
Points0.92873220.0060345890.024971100.027632720.0009701427
Variables factor map using FactoMineR base graph :
plot(res.pca,choix="var")
Supplementary quantitative variables are shown in blue color and dashed lines.
Its also possible to make the variables factor map using factoextra :
fviz_pca_var(res.pca)
Visualize supplementary individuals
The data sets decathlon2 contain some supplementary individuals from row 24 to 27.
#Dataforthesupplementaryindividuals
ind.sup<decathlon2[24:27,1:10]
ind.sup[,1:6]
KARPOV11.027.3014.772.0448.3714.09
WARNERS11.117.6014.311.9848.6814.23
Nool10.807.5314.261.8848.8114.80
Drews10.877.3813.071.8848.5114.01
Individuals factor map using FactoMineR base graph :
plot(res.pca,choix="ind")
Supplementary individuals are shown in blue. The levels of the supplementary qualitative variable are shown in magnenta color.
The results for supplementary individuals can be extracted as follow :
res.pca$ind.sup
$coord
KARPOV0.79472060.779512271.63302031.72422830.75070396
WARNERS0.38646450.121592371.73873320.70633410.03230011
Nool0.55913061.977488710.48303582.27845260.25461493
Drews1.10920380.017414773.04881821.53434680.32642192
$cos2
KARPOV0.051046774.911173e020.215537300.240286200.0455487744
WARNERS0.024227072.398250e030.490396770.080928620.0001692349
Nool0.028971493.623868e010.021622360.481087800.0060077529
Drews0.092070942.269527e050.695605470.176176090.0079736753
$dist
KARPOVWARNERSNoolDrews
3.5174702.4828993.2849433.655527
Supplementary qualitative variables
The data sets decathlon2 contain a supplementary qualitative variable at columns 13 corresponding to the type of competitions.
Qualitative variable can be helpful for interpreting the data and for coloring individuals by groups.
The argument habillage is used to specify the index of the supplementary qualitative variable :
plot(res.pca,choix="ind",habillage=13)
Its also possible to use factoextra :
fviz_pca_ind(res.pca,habillage=13,
addEllipses=TRUE,ellipse.level=0.68)+
scale_color_brewer(palette="Dark2")+
theme_minimal()
Supplementary individuals are shown in blue color

The results concerning the supplementary qualitative variable are :
res.pca$quali
$coord
Decastar1.3434510.12180970.037895240.18083570.1343364
OlympicG1.2314970.11165890.034737300.16576610.1231417
$cos2
Decastar0.90512330.0074409390.00072016690.016399560.009050062
OlympicG0.90512330.0074409390.00072016690.016399560.009050062
$v.test
Decastar2.9707660.40342560.15287670.89710360.7202457
OlympicG2.9707660.40342560.15287670.89710360.7202457
$dist
DecastarOlympicG
1.4121081.294433
$eta2
Dim1Dim2Dim3Dim4Dim5
Competition0.40115680.007397830.0010623320.036581590.02357972
Dimension description
The function dimdesc() can be used to identify the most correlated variables with a given principal component.
A simplified format is :
dimdesc(res,axes=1:3,proba=0.05)
res : an object of class PCA

axes : a numeric vector specifying the dimensions to be described
prob : the significance level
Example of usage :
res.desc<dimdesc(res.pca,axes=c(1,2))
#Descriptionofdimension1
res.desc$Dim.1
$quanti
correlationp.value
Points0.96370751.605675e13
Long.jump0.79418066.059893e06
Discus0.74320904.842563e05
Shot.put0.73391276.723102e05
High.jump0.61008401.993677e03
Javeline0.42822664.149192e02
Rank0.70147771.917657e04
X400m0.70160341.910387e04
X110m.hurdle0.76412522.195812e05
X100m0.85062572.727129e07
$quali
R2p.value
Competition0.40115680.001177378
$category
Estimatep.value
OlympicG1.2874740.001177378
Decastar1.2874740.001177378
#Descriptionofdimension2
res.desc$Dim.2
$quanti
correlationp.value
Pole.vault0.80745113.205016e06
X1500m0.78448029.384747e06
High.jump0.46521422.529390e02
Infos
This analysis has been performed using R software (ver. 3.1.2) and ggplot2 (ver. )
Adsby Google PCA LineGraphChart ScatterPlotData
Share 15 Like Share 15 Tweet Share 0 40 Share 28

WanttoLearnMoreonRProgrammingandDataScience?

FollowusbyEmail
Subscribe
byFeedBurner
OnSocialNetworks:
onSocialNetworks
Get involved :
Click to follow us on Facebook and Google+ :
Comment this article by clicking on "Discussion" button (topright position of this page)
Sign up as a member and post news and articles on STHDA web site.
Suggestions
Principal component analysis in R : prcomp() vs. princomp() R software and data mining
Correspondence Analysis in R: The Ultimate Guide for the Analysis, the Visualization and the Interpretation R software and data mining
Principal Component Analysis: How to reveal the most important variables in your data? R software and data mining
Multiple Correspondence Analysis Essentials: Interpretation and application to investigate the associations between categories of multiple
qualitative variables R software and data mining
Principal component analysis : the basics you should read R software and data mining
ade4 and factoextra : Principal Component Analysis R software and data mining
Correspondence analysis basics R software and data mining
Factor analysis
ca package and factoextra : Correspondence Analysis R software and data mining
ade4 and factoextra : Correspondence Analysis R software and data mining
MASS package and factoextra : Correspondence Analysis R software and data mining
This page has been seen 42681 times
License
(Click on the image below)
Welcome!
Want to Learn More on R Programming and Data Science?
Follow us by Email
Subscribe
by FeedBurner
on Social Networks
R Basi cs
Impo rt i ng D at a
Ex po rt i ng D at a
Reshapi ng D at a
D at a Mani pul at i o n
D at a Vi sual i zat i o n
Basi c St at i st i cs
Cl ust er A nal ysi s
Survi val A nal ysi s
Adsby Google
Correlation
Graphs
R Packages
R packages developed by STHDA for easier data analyses and visualization: factoextra, survminer and ggpubr.
Learn more >>
Fo rum
Co nt act

PCA Analisis de Componentes Principales

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

PCA Analisis de Componentes Principales

Uploaded by

Copyright:

Available Formats

STHDA

HOME BOOKS R/STATISTICS STAT SOFTWARES CONTACT

Install and load FactoMineR package

Install and load FactoMineR package

Install and load factoextra for visualization

Install and load factoextra as follow :

Prepare the data

This data is just a subset of the decathlon data in FactoMineR package

In PCA terminology, our data contains :

Exploratory data analysis

The correlation between variables can be calculated as follow :

Principal component analysis

The output of the function PCA() is a list including :

Variances of the principal components

x : An object of class PCA

Variables factor map : The correlation circle

Coordinates of variables on the principal components

Cos2 : quality of variables on the factor map

Contributions of the variables to the principal components

Graph of variables using FactoMineR base graph

Graph of variables using factoextra

Coordinates of individuals on the principal components

Cos2 : quality of representation of individuals on the principal components

Graph of individuals using FactoMineR base graph

Graph of individuals using factoextra

The function fviz_pca_ind() is used to visualize individuals :

Note that, allowed values for the argument geom are :

Change the color of individuals by groups

We will use iris data sets in this section :

Individuals factor map :

Principal component analysis using supplementary individuals and variables

Visualize supplementary quantitative variables

Variables factor map using FactoMineR base graph :

Visualize supplementary individuals

Individuals factor map using FactoMineR base graph :

Supplementary qualitative variables

Its also possible to use factoextra :

Supplementary individuals are shown in blue color

res : an object of class PCA

Share 15 Like Share 15 Tweet Share 0 40 Share 28

This page has been seen 42681 times

Cl ust er A nal ysi s

Survi val A nal ysi s

Learn more >>

You might also like