Feb 28, 2010 7:12 AM
Simple Distribution doubt
-
Like (0)
Hi All,
I have a very simple doubt.
Consider a STAR schema as below. D1,D2,D3 and D4 are dimension tables with Keys K1,K2,K3 and K4. FACT is the fact table with a surrogate key S1 and dimension keys(foriegn keys) K1,K2,K3 and K4. Also you have other metrics like M1,M2 etc....
1) D1
K1
2) D2
K2
3) D3
K3
4) D4
K4
5) FACT
S1
K1
K2
K3
K4
M1
M2
My doubt is if we distribute the dimensions tables on K1, K2 , K3 and K4 respectively and the FACT on S1, then in a query that joins all the these tables, we will always have redistributions and broadcasts.
Is there a way out. what is the ideal distribution for the FACT if it contains a billion rows.
Regards
Sebastian
Redistributing small tables should never concern you, that happens very quickly. Your primary concern should always be on how your largest tables are joined together. If you are going to join really large tables together you want both tables to be distributed on the same key.
An important factor here is how we understand the star schema and the reason for its existence. The star is an artificial construct necessary to prop up the lack of performance of common RDBMS.
What you might want to do, is take a harder look at how the data is consumed, and rather than conform the data to an unnecessary model, conform the data to the machine's physics.
I have seen in a number of cases where a simple reconfiguratioin of the data can allow more tablee to be distributed on a common key. That it might look like a star is circumstantial. The star is unnecessary in Netezza and should only be used if functionally necessary, not for performance..

