Netezza Best Practices

Prepared By
Sivakumar Nair/India/IBM
1. Introduction
2. Distribution
3. Datatypes
4. ZoneMaps
5. Statistics
6. Groom / Reclaim
7. ETL/ELT Guidelines
Introduction
Netezza sells itself on simplicity, so best practice should not mean hundreds of
rules and regulations to follow. The recommended basic principles cover:
> Distribution
> Datatypes
> Statistics
> Zonemaps
> Reclaim
Alongside some basic standards for ETL and a few general pointers, these principles
will help applications perform well 99% of the time. Best practice means minimal
effort early on for maximum gain.
Distribution
Good distribution is the fundamental element of performance. An SPU (Snippet
Processing Unit) is the individual unit of parallelism. If all SPUs have the same
amount of work to do, a query finishes sooner than if one SPU is asked to do the
whole job.
> Bad distribution is called data skew.
> Skew to one SPU is the worst-case scenario.
> Skew affects the query in hand and other queries too, because the overloaded SPU
has more to do.
> Skew also means that the machine will fill up more quickly.
> Simple rule: good distribution, good performance.
> Never create a table without a distribution key.
> If no distribution key is specified, the NPS chooses one and there is no guarantee
what that key will be. This can eventually create data skew.
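The effect of skew can be illustrated with a toy model. The sketch below is not
Netezza code: it assumes a hypothetical machine with 8 SPUs, places rows with a
simple modulo of Python's built-in hash (a stand-in for the real, internal hashing),
and measures skew as the busiest SPU's row count divided by the average. The names
NUM_SPUS, rows_per_spu and skew_ratio are invented for illustration.

```python
from collections import Counter

NUM_SPUS = 8  # hypothetical SPU count, chosen only for illustration


def rows_per_spu(key_values, num_spus=NUM_SPUS):
    """Assign each row to an SPU by its key value; equal keys share an SPU."""
    counts = Counter(hash(value) % num_spus for value in key_values)
    return [counts.get(spu, 0) for spu in range(num_spus)]


def skew_ratio(per_spu):
    """Busiest SPU's row count divided by the average row count per SPU."""
    average = sum(per_spu) / len(per_spu)
    return max(per_spu) / average


# 80,000 rows keyed on a unique integer id: every SPU receives 10,000 rows.
even = rows_per_spu(range(80_000))
# 80,000 rows keyed on a two-valued flag: only two SPUs hold any data.
skewed = rows_per_spu([i % 2 for i in range(80_000)])

print(skew_ratio(even))    # 1.0
print(skew_ratio(skewed))  # 4.0
```

In this model the two-valued key leaves six SPUs idle while the other two each hold
four times the average load, which is why low-cardinality keys make poor
distribution keys.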
When choosing the distribution key, consider the following factors:
> The more distinct values the distribution key has, the better.
> The same distribution key value always goes to the same SPU.
> Tables that are used together should use the same columns for their distribution
keys when possible.
> If a particular key is heavily used in equi-join clauses, that key is a good
choice for the distribution key.
> Check that there is no accidental processing skew, even when the records are
well distributed.
> If in doubt, use random distribution, as it gives an even distribution.
> For smaller tables, random distribution is usually a good choice.
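To see why tables that are joined together benefit from sharing a distribution key,
consider the toy model below. It is a sketch under stated assumptions, not Netezza
code: the 8-SPU machine, the modulo placement rule and the orders and customers
tables are all hypothetical. The customers table is assumed to be distributed on
customer_id, and the function counts how many order rows sit on a different SPU
from the customer row they join to.

```python
NUM_SPUS = 8  # hypothetical SPU count, for illustration only


def spu_for(key_value, num_spus=NUM_SPUS):
    """Toy placement rule: the same key value always maps to the same SPU."""
    return hash(key_value) % num_spus


# Hypothetical fact table: 1,000 orders spread over 100 customers,
# stored as (order_id, customer_id) pairs.
orders = [(order_id, order_id % 100) for order_id in range(1000)]


def rows_to_move(order_rows, distribute_on_customer):
    """Count orders stored on a different SPU from their customer row."""
    moved = 0
    for order_id, customer_id in order_rows:
        order_key = customer_id if distribute_on_customer else order_id
        if spu_for(order_key) != spu_for(customer_id):
            moved += 1
    return moved


print(rows_to_move(orders, distribute_on_customer=True))   # 0
print(rows_to_move(orders, distribute_on_customer=False))  # 500
```

When both tables are distributed on customer_id, every join partner is already on
the same SPU. When the orders are distributed on order_id instead, half of the rows
in this model would have to be moved between SPUs before the join could run.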
Criteria for selecting distribution keys
> Choose columns for the distribution key that distribute table rows evenly.
> Choose columns for the distribution key based on the selection set that you use
most frequently to retrieve rows from the table.
> Choose as few columns as possible for the distribution key (maximum 4 columns).
> Do not choose Boolean columns as the distribution key.
Data types
Picking the right data types gives better performance.
> Having columns of a uniform type produces consistent results.
> Having columns of a uniform type ensures that data is stored efficiently.
> Having columns of a uniform type allows the system to process queries
efficiently.
> NUMERIC columns with a scale of 0 are similar to INTEGER columns, and switching
to an integer datatype means zone maps can be used.
> The INTERVAL datatype is cumbersome and hard to work with. Consider storing the
original time and timestamp values and calculating the interval on the fly.
> Floating point datatypes are, by definition, lossy in nature. There may also be
a performance hit from using them.
> Inconsistent datatypes for the same column in different tables hurt performance.
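The lossy nature of floating point can be shown outside the database. The sketch
below uses Python's float (an IEEE 754 double) as a stand-in for a floating point
column and decimal.Decimal as a stand-in for an exact NUMERIC column. The analogy
to Netezza's own types is an assumption made for illustration only.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so the sum is not exactly 0.3.
float_total = 0.1 + 0.2
print(float_total == 0.3)  # False

# An exact decimal type keeps every digit, so the comparison holds.
exact_total = Decimal("0.1") + Decimal("0.2")
print(exact_total == Decimal("0.3"))  # True
```

The same effect is why equality joins and filters on floating point columns can
silently miss rows that look identical when printed.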
ZoneMaps
> ZoneMaps improve the throughput and response time of SQL against large, grouped,
or continually augmented, nearly ordered data.
> ZoneMaps are automatically generated, persistent, internal tables.
> They work with large, grouped or nearly ordered data of the date, timestamp,
byteint, smallint, integer, and bigint datatypes.
> ZoneMaps take advantage of the inherent ordering or grouping of data to reduce the
disk scans required to retrieve data on restricted scan queries.
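The idea behind zone maps can be sketched with a toy model: keep the minimum and
maximum value of a column for each block of storage, and skip any block whose range
cannot satisfy the query restriction. The code below is an illustration only. The
block size of 1,000 rows (EXTENT_ROWS) and the function names are hypothetical, and
Netezza's internal structures are not modelled.

```python
import random

EXTENT_ROWS = 1000  # hypothetical number of rows per storage block


def build_zone_map(values, extent_rows=EXTENT_ROWS):
    """Return one (min, max) pair per block of consecutive rows."""
    return [
        (min(values[i:i + extent_rows]), max(values[i:i + extent_rows]))
        for i in range(0, len(values), extent_rows)
    ]


def extents_to_scan(zone_map, low, high):
    """Count blocks whose [min, max] range overlaps the range [low, high]."""
    return sum(1 for lo, hi in zone_map if hi >= low and lo <= high)


ordered = list(range(100_000))        # rows stored in key order
shuffled = ordered[:]
random.Random(0).shuffle(shuffled)    # same rows stored in random order

# Restricted scan, equivalent to: WHERE value BETWEEN 5000 AND 5999
scans_ordered = extents_to_scan(build_zone_map(ordered), 5000, 5999)
scans_shuffled = extents_to_scan(build_zone_map(shuffled), 5000, 5999)

print(scans_ordered)   # 1
print(scans_shuffled)  # far more blocks, since every block spans a wide range
```

With ordered data only one of the 100 blocks has to be read. With the same rows in
random order almost every block contains values on both sides of the restriction,
so the min/max ranges rule out little or nothing.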
Statistics
> Netezza uses a cost-based optimizer.
> The more up to date and accurate the table statistics are, the better the plans
the query optimizer will generate.
> Statistics generation should be built into ETL or ELT processing wherever possible.
> Regular monitoring should be deployed to check for out-of-date statistics.
Groom / Reclaim
Why is groom important?
> An update or delete of a table row does not remove the old tuple.
> Over time, outdated or deleted tuples are of no interest to any transaction
and must be removed to free up space.
When should you reclaim?
> Groom tables that receive frequent updates or deletes.
> Groom tables if you cancel or abort a large load operation.
Groom best practices
> If you have a table whose contents are deleted completely, consider using
truncate rather than delete, which eliminates the need to run the groom
command.
> Build groom into the ETL processing wherever possible.
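One way to build groom into ETL processing is to generate the maintenance
statements together with the load. The sketch below is a minimal illustration, not
a definitive implementation. The function etl_statements, the table names and the
staging query are hypothetical, and the statements are only assembled as text,
never executed. Check the GROOM TABLE and GENERATE STATISTICS options against the
documentation for your release before using them.

```python
def etl_statements(table, load_sql, full_refresh=False, delete_filter=None):
    """Return, in order, the SQL statements for one load of `table`."""
    partial_delete = not full_refresh and delete_filter is not None
    statements = []
    if full_refresh:
        # Emptying the whole table: TRUNCATE leaves no old tuples to groom.
        statements.append(f"TRUNCATE TABLE {table}")
    elif partial_delete:
        # Deleted rows stay on disk as old tuples until the table is groomed.
        statements.append(f"DELETE FROM {table} WHERE {delete_filter}")
    statements.append(load_sql)
    if partial_delete:
        statements.append(f"GROOM TABLE {table} RECORDS ALL")
    # Refresh optimizer statistics once the data has changed.
    statements.append(f"GENERATE STATISTICS ON {table}")
    return statements


plan = etl_statements(
    "sales",                                        # hypothetical table
    "INSERT INTO sales SELECT * FROM sales_stage",  # hypothetical bulk insert
    delete_filter="sale_date < '2020-01-01'",
)
for statement in plan:
    print(statement)
```

For a full refresh the same function returns a TRUNCATE, the load and the statistics
step, with no groom, which mirrors the advice above.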
ETL / ELT Guidelines

> Avoid many small inserts / updates, especially single-row inserts.
> Use bulk load methods wherever possible.
> Avoid cursor-based processing.
> Order the data on the primary key, a date or a common join column to optimize zone
maps.
> Look to establish standard load and ETL methods (best practices) for the ETL and
load tools and methods that you use.
> Minimize I/O between the host and the ETL server wherever possible.
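As an illustration of preferring bulk loads to single-row inserts, the sketch below
sorts a batch of rows on a date column and serialises it into one delimited text
payload that a bulk loader could consume, instead of issuing one INSERT per row.
The row layout, the pipe delimiter and the function name to_load_file are
assumptions made for the example. How the payload is handed to the loader depends
on the tools in use and is not shown.

```python
import csv
import io


def to_load_file(rows, delimiter="|"):
    """Serialise rows into one delimited text payload for a bulk loader."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical batch of (id, sale_date, product) rows arriving out of order.
rows = [
    (3, "2024-01-02", "gizmo"),
    (1, "2024-01-01", "widget"),
    (2, "2024-01-01", "gadget"),
]

# Sort on the date column so the loaded data is ordered for zone maps.
rows_by_date = sorted(rows, key=lambda row: row[1])
payload = to_load_file(rows_by_date)
print(payload)
```

The whole batch travels as a single payload, which keeps the number of round trips
between the ETL server and the host low.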
