
Oracle's In-Database Statistical Functions

[Figure: Oracle 11g DB, combining Data Warehousing, ETL, OLAP, Statistics, and Data Mining]

Charlie Berger
Sr. Director Product Management,
Data Mining Technologies
Oracle Corporation
charlie.berger@oracle.com
Copyright 2007 Oracle Corporation

Synopsis
Oracle has delivered on a multi-year strategy to transform the
database from a data repository into an analytical database by
bringing the "analytics" to the data (data mining, text mining, and
statistical functions).

This new analytical database, integrated with Oracle Business
Intelligence EE, opens new doors for better BI:

Why did something happen?
What corrective actions should be taken?
Which factors are influencing your business's key performance indicators?
Which things should I target?
What will happen in the future, and where should you focus limited resources?

Overview of the SQL statistical capabilities embedded in Oracle Database
"Repeat what I was shown" hands-on session


Agenda

Introduction
Oracle's In-Database Statistical Functions
Several Simple Demonstrations
Opportunities for Use Cases
Hands-on Exercises
User Stories


Market Trends
Analytics Provide Competitive Value
Competing on Analytics, by Tom Davenport
"Some companies have built their very businesses
on their ability to collect, analyze, and act on data."
"Although numerous organizations are embracing analytics, only a
handful have achieved this level of proficiency. But analytics
competitors are the leaders in their varied fields: consumer products,
finance, retail, and travel and entertainment among them."
"Organizations are moving beyond query and reporting"
- IDC 2006

Super Crunchers, by Ian Ayres
"In the past, one could get by on intuition and experience.
Times have changed. Today, the name of the game is data."
- Steven D. Levitt, author of Freakonomics
"Data-mining and statistical analysis have suddenly become
cool.... Dissecting marketing, politics, and even sports, stuff this
complex and important shouldn't be this much fun
to read."
- Wired

Market Trends
Analytics Save Lives
Super Crunchers, by Ian Ayres
"In December 2004, [Berwick] brazenly announced a plan to save 100,000
lives over the next year and a half. The 100,000 Lives Campaign challenged
hospitals to implement six changes in care to prevent avoidable deaths.
He noticed that thousands of ICU patients die each year from infections
after a central line catheter is placed in their chests. About half of all intensive
care patients have central line catheters, and ICU infections are deadly
(carrying mortality rates of up to 20 percent). He then looked to see if there
was any statistical evidence of ways to reduce the chance of infection. He
found a 2004 article in Critical Care Medicine that showed that systematic
hand-washing (combined with a bundle of improved hygienic procedures such
as cleaning the patient's skin with an antiseptic called chlorhexidine) could
reduce the risk of infection from central-line catheters by more than 90
percent. Berwick estimated that if all hospitals just implemented this one
bundle of procedures, they might be able to save as many as 25,000 lives per
year."
New York Times, August 23, 2007, "Attack of the Super Crunchers:
Adventures in Data Mining," by Melissa Lafsky

Competitive Advantage of BI & Analytics

[Figure: ladder of BI and analytics capabilities. Vertical axis: Competitive Advantage ($$). Horizontal axis: Degree of Intelligence. Callout: Oracle Data Mining & Statistical Functions. Steps from highest to lowest:]

Analytics
Optimization: What's the best that can happen?
Predictive Modeling: What will happen next?
Forecasting/Extrapolation: What if these trends continue?
Statistical Analysis: Why is this happening?

Access & Reporting
Alerts: What actions are needed?
Query/drill down: Where exactly is the problem?
Ad hoc reports: How many, how often, where?
Standard Reports: What happened?

Source: Competing on Analytics, by T. Davenport & J. Harris

Definition: Statistics

"There are three kinds of lies: lies, damned lies, and statistics." (1)

(1) This well-known saying is part of a phrase attributed to Benjamin
Disraeli and popularized in the U.S. by Mark Twain.
http://en.wikipedia.org/wiki/Statistics

Definition: Statistics
Statistics is a mathematical science pertaining to the
collection, analysis, interpretation or explanation, and
presentation of data. It is applicable to a wide variety
of academic disciplines, from the physical and social
sciences to the humanities. Statistics are also used for
making informed decisions and misused for other
reasons in all areas of business and government.

http://en.wikipedia.org/wiki/Statistics


Definitions: Statistics
Statistical methods can be used to summarize or
describe a collection of data; this is called descriptive
statistics. In addition, patterns in the data may be
modeled in a way that accounts for randomness and
uncertainty in the observations, and then used to draw
inferences about the process or population being
studied; this is called inferential statistics. Both
descriptive and inferential statistics comprise applied
statistics.
http://en.wikipedia.org/wiki/Statistics

Statistical Concepts

Statistics & SQL Analytics


Ranking functions
rank, dense_rank, cume_dist, percent_rank, ntile

Window Aggregate functions (moving and cumulative)
Avg, sum, min, max, count, variance, stddev, first_value, last_value

LAG/LEAD functions
Direct inter-row reference using offsets

Reporting Aggregate functions
Sum, avg, min, max, variance, stddev, count, ratio_to_report

Statistical Aggregates
Correlation, linear regression family, covariance

Linear regression
Fitting of an ordinary-least-squares regression line to a set of number pairs.
Frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions.

Descriptive Statistics
average, standard deviation, variance, min, max, median (via percentile_cont), mode, group-by & roll-up
DBMS_STAT_FUNCS: summarizes numerical columns of a table and returns count, min, max, range, mean,
stats_mode, variance, standard deviation, median, quantile values, +/- n sigma values, top/bottom 5 values

Correlations
Pearson's correlation coefficients, Spearman's and Kendall's (both nonparametric).

Cross Tabs
Enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa

Hypothesis Testing
Student t-test, F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann-Whitney test,
Kolmogorov-Smirnov test, One-way ANOVA

Distribution Fitting
Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal, Uniform, Weibull, Exponential

Note: Statistics and SQL Analytics are included in Oracle Database Standard Edition
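
As a rough illustration of several of the function families listed above, the sketch below uses the SH.SALES sample table and the AGE and WEIGHT columns of CBERGER.LYMPHOMA that appear elsewhere in this deck. The column aliases, the three-row window, and the choice of WEIGHT as the dependent variable are illustrative assumptions, not taken from the slides.

```sql
-- Monthly sales per channel, decorated with ranking, window,
-- LAG, and reporting aggregate functions.
SELECT channel_id,
       TRUNC(time_id, 'MM')                         AS sales_month,
       SUM(amount_sold)                             AS month_total,
       -- Ranking: the best month of each channel gets rank 1
       RANK() OVER (PARTITION BY channel_id
                    ORDER BY SUM(amount_sold) DESC) AS rank_in_channel,
       -- Window aggregate: average of the current row and the two preceding rows
       AVG(SUM(amount_sold)) OVER (PARTITION BY channel_id
                                   ORDER BY TRUNC(time_id, 'MM')
                                   ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
                                                    AS moving_avg_3,
       -- LAG: direct reference to the previous row of the same channel
       LAG(SUM(amount_sold), 1) OVER (PARTITION BY channel_id
                                      ORDER BY TRUNC(time_id, 'MM'))
                                                    AS prev_month_total,
       -- Reporting aggregate: each channel's share of that month's sales
       RATIO_TO_REPORT(SUM(amount_sold))
           OVER (PARTITION BY TRUNC(time_id, 'MM')) AS share_of_month
FROM   sh.sales
GROUP  BY channel_id, TRUNC(time_id, 'MM')
ORDER  BY channel_id, sales_month;

-- Linear regression family: the first argument is the dependent variable (y),
-- the second the independent variable (x).
SELECT REGR_SLOPE(WEIGHT, AGE)     AS slope,
       REGR_INTERCEPT(WEIGHT, AGE) AS intercept,
       REGR_R2(WEIGHT, AGE)        AS r_squared,
       CORR(WEIGHT, AGE)           AS pearson_r,
       COVAR_SAMP(WEIGHT, AGE)     AS sample_covariance
FROM   CBERGER.LYMPHOMA;
```

Because the analytic functions are evaluated after the GROUP BY, they can operate directly on the aggregated monthly totals.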


Descriptive Statistics
MEDIAN & MODE

Median: takes numeric or datetime values and returns the middle value
Mode: returns the most common value

A. SELECT STATS_MODE(EDUCATION) FROM CD_BUYERS;
B. SELECT MEDIAN(ANNUAL_INCOME) FROM CD_BUYERS;
C. SELECT EDUCATION, MEDIAN(ANNUAL_INCOME) FROM CD_BUYERS
   GROUP BY EDUCATION;
D. SELECT EDUCATION, MEDIAN(ANNUAL_INCOME) FROM CD_BUYERS
   GROUP BY EDUCATION ORDER BY MEDIAN(ANNUAL_INCOME) ASC;
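
MEDIAN is a special case of PERCENTILE_CONT at the 0.5 percentile, so the same pattern extends to other quantiles. The sketch below reuses the CD_BUYERS table from the queries above; the aliases and the choice of quartiles are illustrative assumptions.

```sql
-- Median, quartiles, and mode of income per education level.
SELECT EDUCATION,
       MEDIAN(ANNUAL_INCOME)                                       AS median_income,
       -- Equivalent to MEDIAN: the interpolated 50th percentile
       PERCENTILE_CONT(0.5)  WITHIN GROUP (ORDER BY ANNUAL_INCOME) AS pct_50,
       PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY ANNUAL_INCOME) AS pct_25,
       PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY ANNUAL_INCOME) AS pct_75,
       STATS_MODE(ANNUAL_INCOME)                                   AS most_common_income
FROM   CD_BUYERS
GROUP  BY EDUCATION
ORDER  BY median_income;
```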


DBMS_STAT_FUNCS Package
SUMMARY procedure
The SUMMARY procedure is used to summarize a numerical column
(ADM_PULSE); the summary is returned as a record of type summaryType.

DECLARE
  v_ownername   varchar2(8);
  v_tablename   varchar2(50);
  v_columnname  varchar2(50);
  v_sigma_value number;
  s1            dbms_stat_funcs.summaryType;
BEGIN
  v_ownername   := 'CBERGER';
  v_tablename   := 'LYMPHOMA';
  v_columnname  := 'ADM_PULSE';
  v_sigma_value := 3;  -- values beyond +/- 3 sigma are reported as extremes
  dbms_stat_funcs.summary(p_ownername   => v_ownername,
                          p_tablename   => v_tablename,
                          p_columnname  => v_columnname,
                          p_sigma_value => v_sigma_value,
                          s             => s1);
END;
/


DBMS_STAT_FUNCS Package
SUMMARY procedure
The fields of the returned summaryType record can be printed with DBMS_OUTPUT:

set echo off
connect CBERGER/CBERGER@ora10gr2
set serveroutput on
set echo on
declare
  s DBMS_STAT_FUNCS.SummaryType;
begin
  DBMS_STAT_FUNCS.SUMMARY('CBERGER','LYMPHOMA','ADM_PULSE',3,s);
  dbms_output.put_line('SUMMARY STATISTICS');
  dbms_output.put_line('Count: '||s.count);
  dbms_output.put_line('Min: '||s.min);
  dbms_output.put_line('Max: '||s.max);
  dbms_output.put_line('Range: '||s.range);
  dbms_output.put_line('Mean: '||round(s.mean));
  dbms_output.put_line('Mode Count: '||s.cmode.count);
  dbms_output.put_line('Mode: '||s.cmode(1));
  dbms_output.put_line('Variance: '||round(s.variance));
  dbms_output.put_line('Stddev: '||round(s.stddev));
  dbms_output.put_line('Quantile 5 '||s.quantile_5);
  dbms_output.put_line('Quantile 25 '||s.quantile_25);
  dbms_output.put_line('Median '||s.median);
  dbms_output.put_line('Quantile 75 '||s.quantile_75);
  dbms_output.put_line('Quantile 95 '||s.quantile_95);
  dbms_output.put_line('Extreme Count: '||s.extreme_values.count);
  dbms_output.put_line('Extremes: '||s.extreme_values(1));
  dbms_output.put_line('Top 3: '||s.top_5_values(1)||','||s.top_5_values(2)||','||s.top_5_values(3));
  dbms_output.put_line('Bottom 3: '||s.bottom_5_values(5)||','||s.bottom_5_values(4)||','||s.bottom_5_values(3));
end;
/


DBMS_STAT_FUNCS Package
SUMMARY procedure
A subset of the data returned after execution of the PL/SQL package
summarizes the use of the different SUMMARY procedures.

Summary Statistics and Histograms


Oracle Data Miner (GUI for the Oracle Data Mining Option) provides
graphical histograms with summary statistics.

Hypothesis Testing
Parametric Tests
Parametric tests make some assumptions about the data, typically that
the data is normally distributed, among other assumptions.

Oracle 10g parametric hypothesis tests include:
T-test
F-test
One-Way ANOVA

T-Test
T-tests are used to measure the significance of
a difference of means.
T-tests include the following:

One-sample T-test
Paired-samples T-test
Independent-samples T-test (pooled variances)
Independent-samples T-test (unpooled variances)


Basic Example
Compare the difference in blood pressure between people who eat meat
frequently and those who don't.

One-Sample T-Test
STATS_T_TEST_*
The t-test functions are:
STATS_T_TEST_ONE: A one-sample t-test
STATS_T_TEST_PAIRED: A two-sample, paired t-test (also known as
a crossed t-test)
STATS_T_TEST_INDEP: A t-test of two independent groups with the
same variance (pooled variances)
STATS_T_TEST_INDEPU: A t-test of two independent groups with
unequal variance (unpooled variances)

http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14200/functions157.htm


One-Sample T-Test
Query compares the mean of SURVIVAL_TIME_MO
to the assumed value of 35:

SELECT avg(SURVIVAL_TIME_MO) group_mean,
       stats_t_test_one(SURVIVAL_TIME_MO, 35, 'STATISTIC') t_observed,
       stats_t_test_one(SURVIVAL_TIME_MO, 35) two_sided_p_value
FROM LYMPHOMA;

Returns the observed t value and its related two-sided significance
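
The optional third argument selects which result of the test is returned. As a minimal sketch on the same LYMPHOMA column, the query below requests each documented return value explicitly; the aliases are illustrative assumptions.

```sql
-- One-sample t-test of SURVIVAL_TIME_MO against a hypothesized mean of 35,
-- requesting each return value by name instead of relying on the default.
SELECT stats_t_test_one(SURVIVAL_TIME_MO, 35, 'STATISTIC')     AS t_observed,
       stats_t_test_one(SURVIVAL_TIME_MO, 35, 'DF')            AS degrees_of_freedom,
       stats_t_test_one(SURVIVAL_TIME_MO, 35, 'ONE_SIDED_SIG') AS one_sided_p_value,
       stats_t_test_one(SURVIVAL_TIME_MO, 35, 'TWO_SIDED_SIG') AS two_sided_p_value
FROM   LYMPHOMA;
```

Naming the return value in every call makes the query easier to read than mixing explicit and default forms.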

Paired Samples T-Test


Query compares the mean of LOGWT for pig weights in
Week 3 to Week 8, grouped by Diet:

SELECT substr(diet,1,1) as diet,
       avg(LOGWT3) logwt3_mean,
       avg(LOGWT8) logwt8_mean,
       stats_t_test_paired(LOGWT3, LOGWT8, 'STATISTIC') t_observed,
       stats_t_test_paired(LOGWT3, LOGWT8) two_sided_p_value
FROM CBERGER.PIGLETS3
GROUP BY ROLLUP(DIET)
ORDER BY 5 ASC;

Returns the observed t value and its related two-sided significance

Independent Samples T-Test


(Pooled Variances)
Query compares the mean of AMOUNT_SOLD between
MEN and WOMEN within CUST_INCOME_LEVEL ranges:

SELECT substr(cust_income_level,1,22) income_level,
       avg(decode(cust_gender,'M',amount_sold,null)) sold_to_men,
       avg(decode(cust_gender,'F',amount_sold,null)) sold_to_women,
       stats_t_test_indep(cust_gender, amount_sold, 'STATISTIC', 'F') t_observed,
       stats_t_test_indep(cust_gender, amount_sold) two_sided_p_value
FROM sh.customers c, sh.sales s
WHERE c.cust_id = s.cust_id
GROUP BY rollup(cust_income_level)
ORDER BY 1;
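
The deck lists STATS_T_TEST_INDEPU for unequal variances but shows no query for it. A minimal sketch, assuming the same SH sample tables, puts the pooled and unpooled statistics side by side with an F-test on the two variances; the aliases are illustrative assumptions.

```sql
-- Pooled vs. unpooled independent-samples t-test, plus an F-test on the
-- two group variances to help judge which variant is appropriate.
SELECT substr(cust_income_level,1,22)                                      income_level,
       stats_t_test_indep (cust_gender, amount_sold, 'STATISTIC', 'F')     t_pooled,
       stats_t_test_indepu(cust_gender, amount_sold, 'STATISTIC', 'F')     t_unpooled,
       stats_t_test_indepu(cust_gender, amount_sold, 'TWO_SIDED_SIG', 'F') p_unpooled,
       stats_f_test(cust_gender, amount_sold, 'TWO_SIDED_SIG')             p_equal_variances
FROM   sh.customers c, sh.sales s
WHERE  c.cust_id = s.cust_id
GROUP  BY rollup(cust_income_level)
ORDER  BY 1;
```

A small p_equal_variances suggests the group variances differ, in which case the unpooled result is usually the safer one to report.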


F-Test
Query compares the variance in SIZE_TUMOR_MM
between males and females:

SELECT variance(decode(GENDER,'0', SIZE_TUMOR_MM, null)) var_tumor_men,
       variance(decode(GENDER,'1', SIZE_TUMOR_MM, null)) var_tumor_women,
       stats_f_test(GENDER, SIZE_TUMOR_MM, 'STATISTIC', '1') f_statistic,
       stats_f_test(GENDER, SIZE_TUMOR_MM) two_sided_p_value
FROM CBERGER.LYMPHOMA;

Returns the observed F value and its two-sided significance


F-Test
Query uses the one-way ANOVA F ratio to compare the mean SIZE_REDUCTION
across TREATMENT_PLANs, grouped by GENDER:

SELECT GENDER,
       stats_one_way_anova(TREATMENT_PLAN, SIZE_REDUCTION, 'F_RATIO') f_ratio,
       stats_one_way_anova(TREATMENT_PLAN, SIZE_REDUCTION, 'SIG') p_value,
       AVG(SIZE_REDUCTION)
FROM CBERGER.LYMPHOMA
GROUP BY GENDER ORDER BY GENDER;

Returns the observed F ratio, its significance, and the mean SIZE_REDUCTION for each GENDER


One-Way ANOVA
In statistics, analysis of variance (ANOVA, or
sometimes A.N.O.V.A.) is a collection of statistical
models, and their associated procedures, in which
the observed variance is partitioned into
components due to different explanatory variables.
Example
Group A is given vodka, Group B is given gin, and Group C
is given a placebo. All groups are then tested with a memory
task. A one-way ANOVA can be used to assess the effect of
the various treatments (that is, the vodka, gin, and placebo).

http://en.wikipedia.org/wiki/Statistics


One-Way ANOVA
Query compares the average SIZE_REDUCTION within different
TREATMENT_PLANs, grouped by LYMPH_TYPE:

SELECT LYMPH_TYPE,
       stats_one_way_anova(TREATMENT_PLAN, SIZE_REDUCTION, 'F_RATIO') f_ratio,
       stats_one_way_anova(TREATMENT_PLAN, SIZE_REDUCTION, 'SIG') p_value
FROM CBERGER.LYMPHOMA
GROUP BY LYMPH_TYPE ORDER BY 1;

Returns the one-way ANOVA F ratio and its significance, split by LYMPH_TYPE
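
STATS_ONE_WAY_ANOVA can also return the components of the ANOVA table behind the F ratio. The sketch below assumes the same LYMPHOMA columns as the query above; the aliases are illustrative assumptions.

```sql
-- Full one-way ANOVA table per LYMPH_TYPE:
-- F_RATIO is MEAN_SQUARES_BETWEEN divided by MEAN_SQUARES_WITHIN.
SELECT LYMPH_TYPE,
       stats_one_way_anova(TREATMENT_PLAN, SIZE_REDUCTION, 'SUM_SQUARES_BETWEEN')  ss_between,
       stats_one_way_anova(TREATMENT_PLAN, SIZE_REDUCTION, 'SUM_SQUARES_WITHIN')   ss_within,
       stats_one_way_anova(TREATMENT_PLAN, SIZE_REDUCTION, 'DF_BETWEEN')           df_between,
       stats_one_way_anova(TREATMENT_PLAN, SIZE_REDUCTION, 'DF_WITHIN')            df_within,
       stats_one_way_anova(TREATMENT_PLAN, SIZE_REDUCTION, 'MEAN_SQUARES_BETWEEN') ms_between,
       stats_one_way_anova(TREATMENT_PLAN, SIZE_REDUCTION, 'MEAN_SQUARES_WITHIN')  ms_within,
       stats_one_way_anova(TREATMENT_PLAN, SIZE_REDUCTION, 'F_RATIO')              f_ratio,
       stats_one_way_anova(TREATMENT_PLAN, SIZE_REDUCTION, 'SIG')                  p_value
FROM   CBERGER.LYMPHOMA
GROUP  BY LYMPH_TYPE
ORDER  BY 1;
```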


Hypothesis Testing
(Nonparametric)
Nonparametric tests are used when certain assumptions
about the data are questionable.
This may include the difference between samples that are
not normally distributed.
All tests involving ordinal scales (in which data is ranked)
are nonparametric.
Nonparametric tests supported in Oracle Database 10g:

Binomial test
Wilcoxon Signed Ranks test
Mann-Whitney test
Kolmogorov-Smirnov test
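
The slides give no queries for the nonparametric tests, so the following is a hedged sketch that reuses the LYMPHOMA and PIGLETS3 tables from the parametric examples. It assumes GENDER holds exactly two values ('0' and '1'), as the F-test query suggests; the hypothesized proportion of 0.5 and all aliases are illustrative assumptions.

```sql
-- Two-sample tests on SIZE_TUMOR_MM split by GENDER, plus a binomial test
-- of whether the proportion of GENDER = '1' differs from 0.5.
SELECT stats_mw_test(GENDER, SIZE_TUMOR_MM, 'STATISTIC')       mw_z,
       stats_mw_test(GENDER, SIZE_TUMOR_MM, 'TWO_SIDED_SIG')   mw_p_value,
       stats_ks_test(GENDER, SIZE_TUMOR_MM, 'STATISTIC')       ks_d,
       stats_ks_test(GENDER, SIZE_TUMOR_MM, 'SIG')             ks_p_value,
       stats_binomial_test(GENDER, '1', 0.5, 'TWO_SIDED_PROB') binomial_p_value
FROM   CBERGER.LYMPHOMA;

-- Wilcoxon Signed Ranks: nonparametric counterpart of the paired t-test.
SELECT stats_wsr_test(LOGWT3, LOGWT8, 'STATISTIC')     wsr_z,
       stats_wsr_test(LOGWT3, LOGWT8, 'TWO_SIDED_SIG') wsr_p_value
FROM   CBERGER.PIGLETS3;
```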


Customer Example

"..Our experience suggests that Oracle 10g Statistics and Data Mining
features can reduce development effort of analytical systems by an
order of magnitude."
Sumeet Muju
Senior Member of Professional Staff, SRA International (SRA supports NIH bioinformatics
development projects)


Correlation Functions

The CORR_S and CORR_K functions support nonparametric or
rank correlation (finding correlations between expressions that
are ordinal scaled).

Correlation coefficients take on a value ranging from -1 to 1, where:
1 indicates a perfect relationship
-1 indicates a perfect inverse relationship
0 indicates no relationship

The following query determines whether there is a correlation
between the AGE and WEIGHT of people, using Spearman's
correlation:

select CORR_S(AGE, WEIGHT) coefficient,
       CORR_S(AGE, WEIGHT, 'TWO_SIDED_SIG') p_value,
       substr(TREATMENT_PLAN, 1, 15) as TREATMENT_PLAN
from CBERGER.LYMPHOMA
GROUP BY TREATMENT_PLAN;
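
For comparison, Kendall's tau (CORR_K) and Pearson's r (CORR) can be computed alongside Spearman's rho. This is a sketch on the same AGE and WEIGHT columns, with illustrative aliases.

```sql
-- Pearson, Spearman, and Kendall correlations of AGE and WEIGHT per treatment plan.
SELECT substr(TREATMENT_PLAN, 1, 15)          AS TREATMENT_PLAN,
       CORR(AGE, WEIGHT)                      AS pearson_r,
       CORR_S(AGE, WEIGHT, 'COEFFICIENT')     AS spearman_rho,
       CORR_K(AGE, WEIGHT, 'COEFFICIENT')     AS kendall_tau,
       CORR_K(AGE, WEIGHT, 'TWO_SIDED_SIG')   AS kendall_p_value
FROM   CBERGER.LYMPHOMA
GROUP  BY TREATMENT_PLAN;
```

A large gap between the Pearson and rank-based coefficients is a hint that the relationship is monotonic but not linear, or that outliers are influencing Pearson's r.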


Cross Tabulations
This query analyzes the strength of the association between
TREATMENT_PLAN and GENDER, grouped by LYMPH_TYPE,
using a cross tabulation:

SELECT LYMPH_TYPE,
       stats_crosstab(GENDER, TREATMENT_PLAN, 'CHISQ_OBS') chi_squared,
       stats_crosstab(GENDER, TREATMENT_PLAN, 'CHISQ_SIG') p_value,
       stats_crosstab(GENDER, TREATMENT_PLAN, 'PHI_COEFFICIENT') phi_coefficient
FROM CBERGER.LYMPHOMA
GROUP BY LYMPH_TYPE ORDER BY 1;

Returns the observed chi-squared value, its p-value, and the phi coefficient


Cross Tabulations
The STATS_CROSSTAB function takes as arguments two expressions
(the two variables being analyzed) and a value that determines which
statistic to return. These values include the following:

CHISQ_OBS (observed value of chi-squared)
CHISQ_SIG (significance of observed chi-squared)
CHISQ_DF (degrees of freedom for chi-squared)
PHI_COEFFICIENT (phi coefficient)
CRAMERS_V (Cramer's V statistic)
CONT_COEFFICIENT (contingency coefficient)
COHENS_K (Cohen's kappa)

The function returns the single value specified by the third argument
(default is CHISQ_SIG)
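
Because each call returns one value, a query that needs several statistics repeats the call with a different third argument. A sketch on the same LYMPHOMA columns, with illustrative aliases:

```sql
-- Additional cross-tabulation statistics for GENDER vs. TREATMENT_PLAN.
SELECT LYMPH_TYPE,
       stats_crosstab(GENDER, TREATMENT_PLAN, 'CHISQ_DF')         degrees_of_freedom,
       stats_crosstab(GENDER, TREATMENT_PLAN, 'CRAMERS_V')        cramers_v,
       stats_crosstab(GENDER, TREATMENT_PLAN, 'CONT_COEFFICIENT') contingency_coefficient,
       stats_crosstab(GENDER, TREATMENT_PLAN, 'COHENS_K')         cohens_kappa
FROM   CBERGER.LYMPHOMA
GROUP  BY LYMPH_TYPE
ORDER  BY 1;
```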


Distribution-Fitting Functions
Distribution-fitting functions in Oracle Database 10g
(part of the DBMS_STAT_FUNCS package) include the following:

NORMAL_DIST_FIT function
UNIFORM_DIST_FIT function
POISSON_DIST_FIT function
WEIBULL_DIST_FIT function
EXPONENTIAL_DIST_FIT function

These functions test how well a sample of values fits a
particular distribution.
The test type IN parameter of each function specifies which of the
tests to use to measure the fit.
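
The deck shows no call to the distribution-fitting routines, so the block below is only a sketch. It assumes the NORMAL_DIST_FIT parameter order (owner, table, column, test type, mean, standard deviation, significance) and the 'KOLMOGOROV_SMIRNOV' test type as I understand them from the DBMS_STAT_FUNCS documentation; verify both against your release before relying on it. The table and column are the ones used in the SUMMARY example.

```sql
SET SERVEROUTPUT ON
DECLARE
  v_mean  NUMBER;
  v_stdev NUMBER;
  v_sig   NUMBER;
BEGIN
  -- Estimate the parameters of the normal distribution to compare against.
  SELECT AVG(ADM_PULSE), STDDEV(ADM_PULSE)
  INTO   v_mean, v_stdev
  FROM   CBERGER.LYMPHOMA;

  -- Assumed positional order: owner, table, column, test type,
  -- mean (IN OUT), stdev (IN OUT), significance (OUT).
  DBMS_STAT_FUNCS.NORMAL_DIST_FIT('CBERGER', 'LYMPHOMA', 'ADM_PULSE',
                                  'KOLMOGOROV_SMIRNOV',
                                  v_mean, v_stdev, v_sig);

  DBMS_OUTPUT.PUT_LINE('Mean:         ' || ROUND(v_mean, 2));
  DBMS_OUTPUT.PUT_LINE('Stddev:       ' || ROUND(v_stdev, 2));
  DBMS_OUTPUT.PUT_LINE('Significance: ' || v_sig);
END;
/
```

A small significance value suggests the column does not follow a normal distribution with those parameters.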


Opportunities for Use Cases


Control charts
Set flags on your data, e.g. when a value is above 3 sigma


Opportunities for Use Cases


Construction of a Control Chart
1. Calculate means and ranges for each sample
2. Chart
3. Apply out-of-control rules, e.g. outside of 3 sigma
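
The three steps above can be sketched in a single query. The table process_measurements(sample_id, measurement) is hypothetical, introduced only for illustration. As a simplification, the limits are set at 3 standard deviations of the sample means; classical X-bar charts instead derive the limits from the average range, which this sketch does not do.

```sql
-- Hypothetical table: process_measurements(sample_id, measurement)
WITH sample_stats AS (
  -- Step 1: mean and range of each sample
  SELECT sample_id,
         AVG(measurement)                    AS sample_mean,
         MAX(measurement) - MIN(measurement) AS sample_range
  FROM   process_measurements
  GROUP  BY sample_id
),
limits AS (
  -- Step 2: center line and spread computed over all samples
  SELECT s.*,
         AVG(sample_mean)    OVER () AS center_line,
         STDDEV(sample_mean) OVER () AS sigma
  FROM   sample_stats s
)
-- Step 3: flag samples whose mean falls outside center line +/- 3 sigma
SELECT sample_id,
       sample_mean,
       sample_range,
       center_line,
       center_line + 3 * sigma AS upper_limit,
       center_line - 3 * sigma AS lower_limit,
       CASE
         WHEN ABS(sample_mean - center_line) > 3 * sigma THEN 'OUT OF CONTROL'
         ELSE 'OK'
       END AS status
FROM   limits
ORDER  BY sample_id;
```

The result set holds the center line and limits on every row, so it can be fed directly to a charting tool.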


http://www.oracle.com/technology/products/bi/stats_fns/index.html

In-Database Statistics
Advantages
Data remains in the database at all times, with appropriate access security
control mechanisms: fewer moving parts
Straightforward inclusion within interesting and arbitrarily complex queries
Real-world scalability: available for mission-critical applications

[Figure: Oracle 10g DB, combining Data Warehousing, ETL, OLAP, Statistics, and Data Mining]

Industry Analysts
PREDICTIVE ANALYTICS: Extending the Value of Your
Data Warehousing Investment, By Wayne W. Eckerson
"According to our survey, most organizations plan to significantly
increase the analytic processing within a data warehouse database in
the next three years, particularly for model building and scoring, which
show 88% climbs. The amount of data preparation done in databases
will only climb 36% in that time, but it will be done by almost two-thirds
of all organizations (60%), double the rate of companies planning to
use the database to create or score analytical models."
"It's surprising that about one-third of organizations plan to build
analytical models in databases within three years."
"'We leverage the data warehouse database when possible,' says one
analytics manager. He says most analysts download a data sample to
their desktop and then upload it to the data warehouse once it's
completed. 'Ultimately, however, everything will run in the data
warehouse,' the manager says."

http://download.101com.com/pub/tdwi/Files/PA_Report_Q107_F.pdf


Analytics vs.
Oracle (in-database): Oracle 11g DB with Data Warehousing, ETL, OLAP, Statistics, Data Mining

1. In-Database Analytics Engine
   Basic Statistics (Free)
   Data Mining
   Text Mining
2. Costs (ODM: $20K cpu)
   Simplified environment
   Single server
   Security
3. IT Platform
   SQL (standard)
   Java (standard)

SAS (external):

1. External Analytical Engine
   Basic Statistics
   Data Mining
   Text Mining (separate: SAS EM for Text)
   Advanced Statistics
2. Costs (SAS EM: $150K/5 users)
   Duplicates data
   Annual Renewal Fee (AUF) (~45% each year)
3. IT Platform
   SAS Code (proprietary)


SAS In-Database Processing


3-Year Road Map
The goal of the SAS In-Database initiative is to
achieve deeper technical integration with database
providers, but also blends the best SAS data
integration and analytics with the core strengths
of databases.
Like all DBMS client applications, the SAS engine often must
load and extract data over a network to and from the DBMS. This
presents a series of challenges:
Network bottlenecks between SAS and the DBMS constrain
access to large volumes of data
The best practice today is to read data into the SAS environment for processing.
For highly repeatable processes, this might not be efficient because it takes time
to transfer the data and resources are used to temporarily store in the SAS
environment. In some cases, the results of the SAS processing must be
transferred back to the DBMS for final storage, which further increases the cost.
Addressing this challenge can result in improved resource utilization and enable
companies to answer business questions more quickly.

Oracle Data Mining is available today

Source: SAS In-Database Processing White Paper, October 2007

SAS In-Database Processing


3-Year Road Map

"It boils down to this simple equation:
Less data movement = faster analytics,
and faster analytics = faster delivery of
real-time BI throughout an enterprise."
Source: http://www.teradata.com/t/pdf.aspx?a=83673&b=178909

Use SAS to get more power out of your database


Move key components of BI, analytics and data integration processes from the server
or desktop to inside the database and help shorten your time to intelligence


IDC Worldwide Business Analytics Software

Oracle

http://www.oracle.com/corporate/analyst/reports/infrastructure/bi_dw/208699e.pdf



Source: Oracle 10gR2 Statistics Functions, OLSUG08 Workshop, Henri B. Tuthill, AstraZeneca & Charlie Berger, Oracle


Hands-on Exercises
Quick Start Statistics


More Information:
Oracle Data Mining 10g
oracle.com/technology/products/bi/odm/index.html

Oracle Statistical Functions
http://www.oracle.com/technology/products/bi/stats_fns/index.html

Oracle Business Intelligence Solutions
oracle.com/bi

Contact Information:
Email: Charlie.berger@oracle.com

Questions and Answers

This presentation is for informational purposes only and may not be incorporated into a contract or agreement.
