Lift charts and decile tables compare the results of a model against what the
results would be if no model was used.
Karl Rexer, founder of Rexer Analytics, uses lift charts and decile tables to test models that
predict binary behaviors, such as whether a lead on a website will convert to a sale.
Here's how it works:
1. Randomly split lead data into two samples: 60% = modeling sample, 40%
= hold-out sample.
2. Use data mining algorithms to find the best set of predictors that work in
the modeling sample and identify highly responsive leads.
3. Score leads on a scale of 1-100, 100 being the most likely to convert.
4. Rank order leads by score.
5. Split leads into 10 sections (deciles).
6. Evaluate the results in a decile table.
If the model is working well, the leads in the top deciles will have a much higher conversion
rate than leads in the lower deciles. The Lift column shows how much more successful the
model is than using no model at all.
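Steps 4 to 6 can be sketched in a few lines of Python. This is a minimal illustration, not Rexer's own procedure: the function name decile_table and the synthetic leads are hypothetical, and lift is assumed here to be a decile's conversion rate divided by the overall conversion rate, which is a common reading of "how much more successful the model is".

```python
import random


def decile_table(scores, converted, n_bins=10):
    """Rank leads by score (highest first), split them into n_bins groups,
    and report each group's conversion rate and lift.

    Assumption: lift = group conversion rate / overall conversion rate.
    """
    if len(scores) != len(converted):
        raise ValueError("scores and converted must have the same length")
    if len(scores) < n_bins:
        raise ValueError("need at least one lead per bin")

    # Step 4: rank order leads by score, best first.
    ranked = sorted(zip(scores, converted), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    overall_rate = sum(c for _, c in ranked) / n
    if overall_rate == 0:
        raise ValueError("no conversions in the sample, lift is undefined")

    rows = []
    for i in range(n_bins):
        # Step 5: integer arithmetic keeps the groups as equal as possible.
        start, end = i * n // n_bins, (i + 1) * n // n_bins
        group = ranked[start:end]
        conversions = sum(c for _, c in group)
        rate = conversions / len(group)
        # Step 6: one row of the decile table.
        rows.append({
            "decile": i + 1,
            "leads": len(group),
            "conversions": conversions,
            "rate": rate,
            "lift": rate / overall_rate,
        })
    return rows


# Hypothetical hold-out sample: higher scores convert more often.
rng = random.Random(0)
demo_scores = [rng.randint(1, 100) for _ in range(1000)]
demo_converted = [1 if rng.random() < s / 200 else 0 for s in demo_scores]

for row in decile_table(demo_scores, demo_converted):
    print(f"{row['decile']:>2}  leads={row['leads']:>4}  "
          f"rate={row['rate']:.3f}  lift={row['lift']:.2f}")
```

A lift above 1 in the top rows and below 1 in the bottom rows is the pattern the text describes for a model that is working well.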
This data is then plotted on a lift chart to illustrate the performance of the model.
If no model was used, the results would appear as a straight diagonal line (shown in red), i.e.
contacting leads in the first decile (the first 10% of leads) yields 10% of sales, contacting
20% of leads yields 20% of sales, etc.
The blue line represents the predictive model. The red X represents the lift of the first
decile above the random model.
The green line represents the perfect model, or the fewest leads you would have to
contact to yield 100% of sales.
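The three lines on the chart can be computed as cumulative shares of sales. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the perfect model is taken to be one that ranks every converting lead ahead of every non-converting lead.

```python
def cumulative_gains(scores, converted, n_bins=10):
    """Share of all sales captured after contacting the top 1, 2, ...,
    n_bins groups of leads, ranked by score (the model line)."""
    ranked = [c for _, c in sorted(zip(scores, converted),
                                   key=lambda pair: pair[0], reverse=True)]
    n, total = len(ranked), sum(ranked)
    if total == 0:
        raise ValueError("no conversions in the sample")
    return [sum(ranked[: (i + 1) * n // n_bins]) / total for i in range(n_bins)]


def random_baseline(n_bins=10):
    """No model: contacting x% of leads yields x% of sales (the diagonal)."""
    return [(i + 1) / n_bins for i in range(n_bins)]


def perfect_model(converted, n_bins=10):
    """Assumed perfect ranking: using the outcome itself as the score puts
    every converting lead first."""
    return cumulative_gains(converted, converted, n_bins)


# Hypothetical sample of 100 leads, 20 of which convert.
scores = list(range(1, 101))
converted = [1 if s % 5 == 0 else 0 for s in scores]

model = cumulative_gains(scores, converted)
baseline = random_baseline()
perfect = perfect_model(converted)

# Lift of the first decile above the random model.
first_decile_lift = model[0] / baseline[0]
print(model, baseline, perfect, first_decile_lift)
```

Plotting the three lists against the share of leads contacted reproduces the shape of the chart: the model line should sit between the diagonal and the perfect line.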
Target Shuffling
Target shuffling is a process that reveals how likely it is for results to have
occurred by chance.
John Elder, founder of Elder Research, uses target shuffling to test the statistical accuracy
of his data mining results.
Here's how target shuffling works:
1. Randomly shuffle the output (target variable) on the training data to
break the relationship between it and the input variables.
2. Search for combinations of variables having a high concentration of
interesting outputs.
3. Save the most interesting result and repeat the process many times.
4. Look at the distribution of these bogus "most interesting" results to see
how strong an apparent result can be extracted from random data.
5. Evaluate where on (or beyond) this distribution your actual results stand.
6. Use this as your significance measure.
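The six steps above can be sketched as follows. This is a simplified illustration, not Elder's implementation: the "search" is replaced by a hypothetical stand-in that picks the input column with the strongest absolute correlation to the target, and the significance measure is assumed to be the share of shuffled runs whose best result matches or beats the actual one.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_abs_correlation(columns, target):
    """Stand-in for the search step: the most 'interesting' result is the
    strongest absolute correlation between any input column and the target."""
    return max(abs(pearson(col, target)) for col in columns)


def target_shuffle(columns, target, n_shuffles=200, seed=0):
    """Return (actual result, list of bogus results, significance measure)."""
    rng = random.Random(seed)
    actual = best_abs_correlation(columns, target)

    shuffled = list(target)
    bogus = []
    for _ in range(n_shuffles):
        # Step 1: shuffling breaks the link between inputs and target.
        rng.shuffle(shuffled)
        # Steps 2-3: rerun the same search and save its best result.
        bogus.append(best_abs_correlation(columns, shuffled))

    # Steps 5-6: where does the actual result stand among the bogus ones?
    # The +1 terms count the actual result as one of the runs (an assumption).
    beaten = sum(1 for b in bogus if b >= actual)
    significance = (beaten + 1) / (n_shuffles + 1)
    return actual, bogus, significance


# Hypothetical data: one informative column among nine noise columns.
rng = random.Random(1)
target = [rng.randint(0, 1) for _ in range(100)]
informative = [t + rng.gauss(0, 0.5) for t in target]
noise = [[rng.gauss(0, 1) for _ in target] for _ in range(9)]

actual, bogus, significance = target_shuffle([informative] + noise, target)
print(f"actual={actual:.3f}  best bogus={max(bogus):.3f}  "
      f"significance={significance:.4f}")
```

The same search has to be rerun on every shuffle. Comparing the actual result against shuffled results from a narrower search would understate how much a wide search can find by chance.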
According to Elder, target shuffling is useful for preventing what he calls the "vast search
effect."
"The more variables you have, the easier it becomes to oversearch and identify false
patterns between them," he says.
Elder compares the bogus results using a histogram, or a graphical representation of
how data is distributed, and evaluates where on this distribution his model's initial results
stand.
If the initial result is stronger than the best result from the shuffled data, your
findings are unlikely to have occurred by chance.
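A histogram like this can be drawn without a plotting library. The sketch below is only an illustration: text_histogram is a hypothetical helper, and the bogus results and the actual result passed to it are made-up numbers standing in for the output of a target-shuffling run.

```python
def text_histogram(bogus, actual, n_bins=10, width=40):
    """Return the lines of a text histogram of the bogus results, marking
    the bin in which the actual result falls."""
    if not bogus:
        raise ValueError("need at least one bogus result")
    low = min(min(bogus), actual)
    high = max(max(bogus), actual)
    span = (high - low) or 1.0  # avoid dividing by zero if all values match

    def bin_of(value):
        # The top edge belongs to the last bin.
        return min(int((value - low) / span * n_bins), n_bins - 1)

    counts = [0] * n_bins
    for value in bogus:
        counts[bin_of(value)] += 1
    peak = max(counts)

    lines = []
    for i, count in enumerate(counts):
        left_edge = low + span * i / n_bins
        bar = "#" * round(count / peak * width)
        marker = "  <- actual result" if i == bin_of(actual) else ""
        lines.append(f"{left_edge:6.3f} | {bar}{marker}")
    return lines


# Made-up values for illustration only.
example_bogus = [0.10, 0.12, 0.15, 0.20, 0.22, 0.18, 0.25, 0.14]
example_actual = 0.90
print("\n".join(text_histogram(example_bogus, example_actual)))
```

An actual result that lands in a bin beyond all of the bars is the case described above, where it is stronger than the best result from the shuffled data.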
Bootstrap Sampling