SPSS tutorial #2
The dataset in cpu_problem.xls (posted under the course documents for week 3)
contains 8 attributes (6 predictive, 2 non-predictive) used to predict the relative CPU
performance (the ninth attribute in the dataset). The attributes are described as follows:
v1. Vendor name: 30 distinct values
(adviser, amdahl, apollo, basf, bti, burroughs, c.r.d, cambex, cdc, dec,
dg, formation, four-phase, gould, honeywell, hp, ibm, ipl, magnuson,
microdata, nas, ncr, nixdorf, perkin-elmer, prime, siemens, sperry,
sratus, wang)
v2. Model Name: many unique symbols
v3. MYCT: machine cycle time in nanoseconds (integer)
v4. MMIN: minimum main memory in kilobytes (integer)
v5. MMAX: maximum main memory in kilobytes (integer)
v6. CACH: cache memory in kilobytes (integer)
v7. CHMIN: minimum channels in units (integer)
v8. CHMAX: maximum channels in units (integer)
v9. PRP: published relative performance (integer)
a) Import the Excel file into SPSS and make sure that the types of the variables in SPSS
match the types given in the attribute description above (if they do not, use the
Variable View to make the appropriate changes; also add labels to your variables
using the description above)
b) Visualize and interpret the data
II. Use box plots and histograms for the other variables
CSC367- Spring 2018, SPSS practice exercises
d) Calculate the distances among cases and identify the two most dissimilar cases
e) Perform a correlation analysis. Interpret the correlation matrix and summarize the relationships
among the variables based on this analysis. Are any variables strongly correlated
(correlation greater than 0.8)?
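
Parts d) and e) are meant to be done in SPSS, but the arithmetic behind them can be checked with a short script. The following is a minimal Python sketch, not the course's prescribed procedure. It makes these assumptions: the rows in `cases` are made-up toy values (not taken from cpu_problem.xls), the variables are z-standardized before computing Euclidean distances because they are on very different scales, and "strongly correlated" is taken to mean an absolute Pearson correlation above 0.8. All function names are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical toy rows, NOT taken from cpu_problem.xls.
# Column order: MYCT, MMIN, MMAX, CACH, CHMIN, CHMAX, PRP
names = ["MYCT", "MMIN", "MMAX", "CACH", "CHMIN", "CHMAX", "PRP"]
cases = [
    [100, 1000, 4000, 8, 1, 4, 20],
    [200, 512, 2000, 0, 1, 2, 10],
    [50, 4000, 16000, 64, 8, 32, 200],
    [150, 2000, 8000, 16, 2, 8, 40],
]


def zscores(rows):
    """Standardize each column to mean 0 and sample standard deviation 1.

    Assumes no column is constant (a constant column would divide by zero).
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def euclidean(a, b):
    """Euclidean distance between two cases (lists of equal length)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_dissimilar(rows):
    """Return the index pair (i, j), i < j, with the largest distance."""
    return max(
        combinations(range(len(rows)), 2),
        key=lambda p: euclidean(rows[p[0]], rows[p[1]]),
    )


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strong_pairs(rows, col_names, threshold=0.8):
    """List variable pairs whose absolute correlation exceeds the threshold."""
    cols = list(zip(*rows))
    pairs = []
    for i, j in combinations(range(len(col_names)), 2):
        r = pearson(cols[i], cols[j])
        if abs(r) > threshold:
            pairs.append((col_names[i], col_names[j], round(r, 3)))
    return pairs


if __name__ == "__main__":
    i, j = most_dissimilar(zscores(cases))
    print("Most dissimilar cases (0-based row indices):", i, j)
    for pair in strong_pairs(cases, names):
        print("Strongly correlated:", pair)
```

Without the standardization step, MMIN and MMAX would dominate the distances simply because their values are the largest, so it is worth stating which choice you made when reporting the result from SPSS.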
The Forest Fires dataset, provided by the University of California at Irvine repository for machine
learning algorithms (http://archive.ics.uci.edu/ml/datasets/Forest+Fires), contains the following attributes,
which are considered important when predicting the burned area of forest fires in the northeast region of
Portugal using meteorological and other data:
Attribute Information:
Problem 4 (Dimensionality Reduction) Repeat Problems 2 and 3 on the Auto MPG data from:
http://archive.ics.uci.edu/ml/datasets/Auto+MPG