5 Replies Last post: Feb 5, 2010 6:31 PM by Dan Kuhn  
Singh   4 posts since
Jan 21, 2010

Jan 21, 2010 1:32 PM

SKEW Concept

Hi,

 

I am new to Netezza. Can anyone explain the concept of skew in the Netezza architecture? How does it affect the choice of distribution key when creating a table?

In the NPS Administrator tool I see skew values for different tables; they are high for some tables and low for others.

 

Would be a great help!!!!!!

 

Thanks,

Singh

David Birmingham   151 posts since
Sep 24, 2007
1. Jan 21, 2010 1:53 PM in response to: Singh
Re: SKEW Concept

Skew is simply a way to measure the efficiency of the data distribution. Anything close to 0.0 is probably what you are shooting for.

 

As an example, if you distribute randomly, you will often get a skew of 0, meaning that there is no statistical (and perhaps physical) difference in the total rows from SPU to SPU.
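
To make the "skew of 0" idea concrete, here is a minimal sketch in Python. The formula in `skew_ratio` is a made-up illustration, not the number NPS Administrator reports, and modeling random distribution as round-robin is an assumption on my part. It only shows that equal row counts per SPU give 0 and a lopsided layout gives a large value.

```python
def skew_ratio(rows_per_spu):
    """Illustrative skew measure: (max - min) / mean of per-SPU row counts.

    This is a hypothetical formula, not the one NPS Administrator reports.
    """
    mean = sum(rows_per_spu) / len(rows_per_spu)
    if mean == 0:
        return 0.0  # an empty table has no skew
    return (max(rows_per_spu) - min(rows_per_spu)) / mean


def distribute_round_robin(num_rows, num_spus):
    """Deal rows out one SPU at a time (assumed model of random distribution)."""
    counts = [0] * num_spus
    for row in range(num_rows):
        counts[row % num_spus] += 1
    return counts


even = distribute_round_robin(1000, 8)   # 125 rows on every SPU
lopsided = [400, 0, 0, 0]                # every row landed on one SPU

print(skew_ratio(even))      # 0.0
print(skew_ratio(lopsided))  # 4.0
```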

 

However, when using a distribution key, it is expected that some keys will have more records associated with them than others, but on average this usually balances out to an even distribution. This is provided that the given key has a fairly high cardinality compared to the data. If, for example, you chose a boolean value as a key, the data would end up on at most two SPUs, meaning that it's not really distributed at all. If even one SPU is "dogpiled", holding an inordinate share of the records for that table, it will slow down the query. The query will only be as fast as the slowest SPU, and if one SPU is overloaded it won't matter how fast the other SPUs did their work.
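
The cardinality point can be sketched with a toy simulation. Everything here is an assumption for illustration: `NUM_SPUS` is an arbitrary appliance size, and `zlib.crc32` is only a stand-in for whatever hash the appliance really applies to the distribution key. The "query cost" is simply the row count on the busiest SPU, reflecting the slowest-SPU rule above.

```python
import zlib
from collections import Counter

NUM_SPUS = 8  # hypothetical appliance size


def spu_for(key, num_spus=NUM_SPUS):
    """Map a key to an SPU; crc32 is a stand-in for the real hash (assumption)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_spus


def rows_per_spu(keys, num_spus=NUM_SPUS):
    """Count how many rows each SPU receives for the given key values."""
    counts = Counter(spu_for(k, num_spus) for k in keys)
    return [counts.get(spu, 0) for spu in range(num_spus)]


def busy_spus(counts):
    """Number of SPUs that hold at least one row."""
    return sum(1 for c in counts if c > 0)


NUM_ROWS = 100_000
high_cardinality = rows_per_spu(range(NUM_ROWS))               # unique id per row
boolean_key = rows_per_spu(i % 2 == 0 for i in range(NUM_ROWS))  # True/False only

# A full scan is only as fast as the SPU holding the most rows.
print("unique key :", busy_spus(high_cardinality), "SPUs busy, slowest scans",
      max(high_cardinality), "rows")
print("boolean key:", busy_spus(boolean_key), "SPUs busy, slowest scans",
      max(boolean_key), "rows")
```

With the boolean key, at least half of the table sits on a single SPU no matter how many SPUs the box has.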

 

Another form of skew is "process skew". Let's say you distribute your data on a transaction date, and when it's all done the records are physically distributed as evenly as you would expect. But when you do a query for a given date, the query is very slow. This is because only one SPU (the one containing the data for the date-in-question) is actually doing any work. Dates are notoriously bad choices for a distribution key.
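
Process skew can be sketched the same way. Again this is a toy model under stated assumptions: `NUM_SPUS`, the 100 rows per day, and the `zlib.crc32` hash are all invented for illustration and say nothing about the appliance's internals. The point is that storage can look balanced across a year of dates while a single-date query still touches only one SPU.

```python
import zlib
from datetime import date, timedelta

NUM_SPUS = 8  # hypothetical appliance size


def spu_for(key):
    """Map a key to an SPU; crc32 is a stand-in for the real hash (assumption)."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_SPUS


# One year of transactions, 100 rows per day, distributed on transaction date.
days = [date(2010, 1, 1) + timedelta(days=i) for i in range(365)]
rows = [(day, txn) for day in days for txn in range(100)]

# Storage view: how many rows each SPU holds.
storage = [0] * NUM_SPUS
for day, _ in rows:
    storage[spu_for(day)] += 1


def spus_working_for(query_date):
    """SPUs that hold at least one row matching WHERE txn_date = query_date."""
    return {spu_for(day) for day, _ in rows if day == query_date}


print("rows per SPU:", storage)
print("SPUs doing work for one date:", len(spus_working_for(date(2010, 1, 21))))
```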

 

 

Get a copy of the book "Netezza Underground" from Amazon.com - lots more detail on applications and pitfalls.

sdennyk   11 posts since
Nov 22, 2009
3. Jan 22, 2010 7:51 AM in response to: Singh
Re: SKEW Concept

Hi Vamsi,

 

The Netezza box comes as an appliance. You can install the Netezza Admin Windows GUI tool to monitor and administer the NPS box from Windows.

WinSql can be used as a query tool. Informatica also works fine with Netezza as an ETL tool.

 

Thanks

Sebastian

David Birmingham   151 posts since
Sep 24, 2007
4. Jan 22, 2010 8:43 AM in response to: Singh
Re: SKEW Concept

Netezza is a hardware product, not a software product.

 

If you want more interaction, especially for learning, engage their training program.

Dan Kuhn   1 posts since
Jul 3, 2008
5. Feb 5, 2010 6:31 PM in response to: sdennyk
Re: SKEW Concept

You can also take a look at the Aginity Netezza Workbench, which is a simple query browser that also includes some administrative capabilities, such as viewing distribution skew.
