Skip navigation
12327 Views 2 Replies Latest reply: Feb 28, 2010 2:28 PM by David Birmingham RSS
sdennyk New Enzee 18 posts since
Nov 22, 2009
Currently Being Moderated

Feb 28, 2010 7:12 AM

Simple Distribution doubt

Hi All,

 

I have a very simple doubt.

 

Consider a STAR schema as below. D1,D2,D3 and D4 are dimension tables with Keys K1,K2,K3 and K4. FACT is the fact table with a surrogate key S1 and dimension keys(foriegn keys) K1,K2,K3 and K4. Also you have other metrics like M1,M2 etc....

 

1) D1

    K1

 

2) D2

    K2

 

3) D3

    K3

 

4) D4

    K4

 

5) FACT

    S1

    K1

    K2

    K3

    K4

    M1

    M2

 

My doubt is if we distribute the dimensions tables on K1, K2 , K3 and K4 respectively and the FACT on S1, then in a query that joins all the these tables, we will always have redistributions and broadcasts.

 

Is there a way out. what is the ideal distribution for the FACT if it contains a billion rows.

 

Regards

Sebastian

  • Shawn Fox Enzee Exraordinaire 1,326 posts since
    Aug 15, 2006
    Currently Being Moderated
    1. Feb 28, 2010 8:07 AM (in response to sdennyk)
    Re: Simple Distribution doubt

    Redistributing small tables should never concern you, that happens very quickly.  Your primary concern should always be on how your largest tables are joined together.  If you are going to join really large tables together you want both tables to be distributed on the same key.

  • David Birmingham Active Enzee 426 posts since
    Sep 24, 2007
    Currently Being Moderated
    2. Feb 28, 2010 2:28 PM (in response to sdennyk)
    Re: Simple Distribution doubt

    An important factor here is how we understand the star schema and the reason for its existence. The star is an artificial construct necessary to prop up the lack of performance of common RDBMS.

     

    What you might want to do, is take a harder look at how the data is consumed, and rather than conform the data to an unnecessary model, conform the data to the machine's physics.

     

    I have seen in a number of cases where a simple reconfiguratioin of the data can allow more tablee to be distributed on a common key. That it might look like a star is circumstantial. The star is unnecessary in Netezza and should only be used if functionally necessary, not for performance..

More Like This

  • Retrieving data ...

Bookmarked By (0)