
Netezza Best Practices

Prepared By
Sivakumar Nair/India/IBM
1. Introduction

2. Distribution

3. Datatypes

4. ZoneMaps

5. Statistics

6. Groom / Reclaim

7. ETL/ELT Guidelines
Introduction
Netezza sells itself on simplicity, so best practice should not mean hundreds of
rules and regulations to follow. The recommended basic principles cover
> Distribution
> Datatypes
> Statistics
> Zonemaps
> Reclaim
Together with some basic standards for ETL and a few general pointers, these
principles will help applications perform well in about 99% of cases. Best practice
means minimal effort early on for maximum gain.

Distribution
Good distribution is the fundamental element of performance. An SPU (Snippet
Processing Unit) is the individual unit of parallelism. If all SPUs have the same
amount of work to do, a query finishes sooner than if one SPU is asked to do the
whole job.
> Bad distribution is called data skew.
> Skew to one SPU is the worst-case scenario.
> Skew affects the query in hand and other queries, because the overloaded SPU
has more to do.
> Skew also means that the machine will fill up more quickly.
> Simple rule: good distribution, good performance.
> Never create a table without a distribution key.
> If no distribution key is specified, NPS chooses a distribution key and there is
no guarantee what that key is. This can eventually create data skew.
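The effect of key choice on skew can be sketched with a small simulation. This is only an illustration: the SPU count, the MD5-based hash, and the function names are assumptions made for the example, not Netezza's actual hashing.

```python
import hashlib
from collections import Counter

NUM_SPUS = 8  # hypothetical number of SPUs, chosen for illustration


def spu_for(key, num_spus=NUM_SPUS):
    """Map a key value to an SPU with a stable hash (a stand-in for the real one)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_spus


def rows_per_spu(keys, num_spus=NUM_SPUS):
    """Count how many rows land on each SPU for the given key values."""
    counts = Counter(spu_for(k, num_spus) for k in keys)
    return [counts.get(spu, 0) for spu in range(num_spus)]


def skew_ratio(counts):
    """Row count of the busiest SPU divided by the average; 1.0 is perfectly even."""
    return max(counts) / (sum(counts) / len(counts))


# High-cardinality key: 100,000 distinct ids.
good = rows_per_spu(range(100_000))
# Low-cardinality key: a two-valued flag, so at most two SPUs receive rows.
bad = rows_per_spu(i % 2 for i in range(100_000))

print("distinct ids :", good, round(skew_ratio(good), 2))
print("two-value key:", bad, round(skew_ratio(bad), 2))
```

With the two-valued key, most SPUs stay empty while one or two hold every row, which is the worst case described above.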
When choosing the distribution key, consider the following factors:
> The more distinct values the distribution key has, the better.
> The same distribution key value always goes to the same SPU.
> Tables used together should use the same columns for their distribution key
when possible.
> If a particular key is heavily used in equi-join clauses, that key is a good
choice for the distribution key.
> Check that there is no accidental processing skew even when records are evenly
distributed (for example, a common filter that leaves all qualifying rows on a
few SPUs).
> If in doubt, use random distribution, as it spreads rows evenly.
> For smaller tables, random distribution is usually a good choice.
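Why tables that are joined together benefit from sharing a distribution key can be shown with a toy model. The hash function, the SPU count, and the `orders`/customer layout are hypothetical. The sketch assumes the customer table is distributed on `customer_id` and counts the order rows that would sit on a different SPU from their matching customer row.

```python
import hashlib

NUM_SPUS = 4  # hypothetical SPU count


def spu_for(value, num_spus=NUM_SPUS):
    """Stable stand-in hash: the same value always maps to the same SPU."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_spus


def rows_to_move(orders, order_dist_col):
    """Count order rows whose SPU differs from the SPU holding their customer.

    The customer table is assumed to be distributed on customer_id.
    """
    moved = 0
    for order in orders:
        order_spu = spu_for(order[order_dist_col])
        customer_spu = spu_for(order["customer_id"])
        if order_spu != customer_spu:
            moved += 1
    return moved


orders = [{"order_id": 1000 + i, "customer_id": i % 50} for i in range(1000)]

same_key = rows_to_move(orders, "customer_id")
other_key = rows_to_move(orders, "order_id")
print("orders distributed on customer_id:", same_key, "rows to move")
print("orders distributed on order_id   :", other_key, "rows to move")
```

When both tables share the join column as their distribution key, every matching pair is already on the same SPU, so no rows need to be moved for the join.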
Criteria for selecting distribution keys:
> Choose columns for the distribution key that distribute the table rows evenly.
> Choose columns for the distribution key based on the selection set that you use
most frequently to retrieve rows from the table.
> Choose as few columns as possible for the distribution key (maximum 4 columns).
> Do not choose Boolean columns as the distribution key.

Data types
Picking the right data types gives better performance.
> Having columns of uniform type produces consistent results.
> Having columns of uniform type ensures that data is stored efficiently.
> Having columns of uniform type allows the system to process queries
efficiently.
> NUMERIC columns with a scale of 0 are similar to INTEGER datatypes. Switching
them to an integer datatype means zone maps can be used on the column.
> The INTERVAL datatype is cumbersome and hard to work with. Consider
storing the original TIME and TIMESTAMP values and calculating the interval on
the fly.
> Floating point datatypes are, by definition, lossy in nature. There may also be
a performance hit from using them.
> Inconsistent datatypes for the same column in different tables hurt performance.
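The lossy nature of floating point values is easy to see outside the database. The sketch below uses Python's `float` and `decimal.Decimal` as stand-ins for a floating point column and an exact NUMERIC column. The mapping to Netezza types is an analogy, not a statement about its internals.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly, so repeated sums drift.
float_total = sum(0.1 for _ in range(10))

# Decimal keeps exact base-10 digits, similar in spirit to an exact numeric type.
decimal_total = sum(Decimal("0.1") for _ in range(10))

print("float sum equals 1.0  :", float_total == 1.0)
print("decimal sum equals 1.0:", decimal_total == Decimal("1.0"))
```

Equality filters and joins on values like these behave predictably only with the exact type.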

ZoneMaps
> ZoneMaps improve the throughput and response time of SQL against large,
grouped or nearly ordered data that is continually augmented.
> Zonemaps are automatically generated, persistent, internal tables.
> Zonemaps work with large, grouped or nearly ordered data in date, timestamp,
byteint, smallint, integer, and bigint columns.
> Zonemaps take advantage of the inherent ordering or grouping of data to reduce
the disk scans required to retrieve data on restricted scan queries.
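The idea behind zone maps can be illustrated with a minimal model: keep the minimum and maximum value of each storage extent, and skip any extent whose range cannot contain the requested values. The extent size and function names below are assumptions for the example, not Netezza internals.

```python
import random

EXTENT_ROWS = 1000  # hypothetical rows per extent, for illustration only


def build_zone_map(values, extent_rows=EXTENT_ROWS):
    """Record the (min, max) of each fixed-size extent of a column."""
    zone_map = []
    for start in range(0, len(values), extent_rows):
        extent = values[start:start + extent_rows]
        zone_map.append((min(extent), max(extent)))
    return zone_map


def extents_to_scan(zone_map, low, high):
    """Count extents whose [min, max] range overlaps the filter [low, high]."""
    return sum(1 for lo, hi in zone_map if hi >= low and lo <= high)


ordered = list(range(100_000))        # e.g. rows loaded in date order
shuffled = ordered[:]
random.Random(42).shuffle(shuffled)   # same rows, no useful ordering

low, high = 20_000, 20_999            # a restricted scan on a narrow range
ordered_scans = extents_to_scan(build_zone_map(ordered), low, high)
shuffled_scans = extents_to_scan(build_zone_map(shuffled), low, high)
print("ordered data :", ordered_scans, "of 100 extents scanned")
print("shuffled data:", shuffled_scans, "of 100 extents scanned")
```

The same rows answer the same query in both cases. Only the physical ordering changes how many extents can be skipped.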

Statistics
> Netezza uses a cost-based optimizer.
> The more up to date and accurate the table statistics are, the better the plans
the query optimizer will generate.
> Statistics generation should be built into ETL or ELT processing wherever possible.
> Regular monitoring should be in place to check for out-of-date statistics.

Groom / Reclaim
Why is groom important?
> An update or delete of a table row does not remove the old tuple.
> Over time, outdated or deleted tuples are of no interest to any transaction
and must be removed to free up space.
When should you reclaim?
> Groom tables that receive frequent updates or deletes.
> Groom tables if you cancel or abort a large load operation.
Groom best practices
> If you have a table whose contents are deleted completely, consider using
TRUNCATE rather than DELETE, which eliminates the need to run the groom
command.
> Build groom into the ETL processing wherever possible.
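The reason space is not freed until a groom runs can be pictured with a toy table in which deletes and updates only flag the old tuple. The `ToyTable` class is purely illustrative and does not reflect how Netezza stores rows.

```python
class ToyTable:
    """Toy model of logical deletes: old row versions are flagged, not removed."""

    def __init__(self):
        self.tuples = []  # each entry is [value, is_deleted]

    def insert(self, value):
        self.tuples.append([value, False])

    def delete(self, predicate):
        for entry in self.tuples:
            if not entry[1] and predicate(entry[0]):
                entry[1] = True  # flagged only; the space is still occupied

    def update(self, predicate, change):
        # Iterate over a snapshot so newly appended versions are not revisited.
        for entry in list(self.tuples):
            if not entry[1] and predicate(entry[0]):
                entry[1] = True                                # old version stays
                self.tuples.append([change(entry[0]), False])  # new version added

    def groom(self):
        """Physically drop the flagged tuples, as a groom would."""
        self.tuples = [entry for entry in self.tuples if not entry[1]]

    def truncate(self):
        """Empty the table outright; nothing is left behind to groom."""
        self.tuples = []

    def visible_rows(self):
        return sum(1 for entry in self.tuples if not entry[1])

    def stored_tuples(self):
        return len(self.tuples)


table = ToyTable()
for i in range(1000):
    table.insert(i)
table.update(lambda v: True, lambda v: v + 1)  # touch every row once
before = (table.visible_rows(), table.stored_tuples())
table.groom()
after = (table.visible_rows(), table.stored_tuples())
print("before groom (visible, stored):", before)
print("after groom  (visible, stored):", after)
```

In this model a single full-table update doubles the stored tuples until the groom runs, while `truncate` leaves nothing to clean up.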

ETL / ELT Guidelines


> Avoid many small inserts / updates, especially single-row inserts.
> Use a bulk load method wherever possible.
> Avoid cursor-based processing.
> Order the data on the primary key, date or a common join column to optimize zone
maps.
> Look to establish standard load and ETL methods (best practices) for the ETL and
load tools that you use.
> Minimize I/O between the host and the ETL server wherever possible.
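The difference between row-at-a-time and bulk loading can be sketched as follows. Python's built-in `sqlite3` module is used only as a stand-in so the example runs anywhere. The `sales` table and the helper functions are hypothetical, and a real Netezza load would go through its own bulk load tooling rather than this API.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a small, hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (sale_date TEXT, amount INTEGER)")
    return conn


def load_row_by_row(conn, rows):
    """Pattern to avoid: one statement and one commit per row."""
    for row in rows:
        conn.execute("INSERT INTO sales VALUES (?, ?)", row)
        conn.commit()


def load_in_bulk(conn, rows):
    """Preferred pattern: sort once, then send the whole batch in one transaction."""
    ordered = sorted(rows, key=lambda r: r[0])  # order by the date column
    with conn:  # commits once at the end, rolls back on error
        conn.executemany("INSERT INTO sales VALUES (?, ?)", ordered)


# Made-up sample rows: (ISO date string, amount).
rows = [("2024-01-%02d" % (d % 28 + 1), d) for d in range(1000)]

slow, fast = make_db(), make_db()
load_row_by_row(slow, rows)
load_in_bulk(fast, rows)

count_sql = "SELECT COUNT(*) FROM sales"
slow_count = slow.execute(count_sql).fetchone()[0]
fast_count = fast.execute(count_sql).fetchone()[0]
print("row by row:", slow_count, "rows; bulk:", fast_count, "rows")
```

Both paths load the same rows. The bulk path issues one batch in one transaction and delivers the data already ordered by date, which is the ordering the zone map guideline asks for.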
